3 (declare (simple-string s))
4 (declare (optimize (speed 3) (safety 0) (debug 0)))
7 (dotimes (i (length s))
8 (when (eql (aref s i) #\1)
12 * On X86 I is represented as a tagged integer.
15 3: SLOT S!11[EDX] {SB-C::VECTOR-LENGTH 1 7} => t23[EAX]
16 4: MOVE t23[EAX] => t24[EBX]
18 --------------------------------------------------------------------------------
21 (declare (optimize (speed 3) (safety 0) (space 2) (debug 0)))
22 (declare (type (simple-array double-float 1) v))
24 (declare (type double-float s))
25 (dotimes (i (length v))
26 (setq s (+ s (aref v i))))
29 * Python does not combine + with AREF, so generates extra move and
32 * On X86 Python thinks that all FP registers are directly accessible
33 and emits costy MOVE ... => FR1.
35 --------------------------------------------------------------------------------
38 (declare (optimize (speed 3) (safety 0) (space 2))
40 (let ((v (make-list n)))
41 (setq v (make-array n))
44 * IR1 does not optimize away (MAKE-LIST N).
45 --------------------------------------------------------------------------------
48 (declare (optimize (speed 3) (safety 0) (space 2))
49 (type (simple-array base-char 1) v1 v2))
50 (dotimes (i (length v1))
51 (setf (aref v2 i) (aref v1 i))))
53 VOP DATA-VECTOR-SET/SIMPLE-STRING V2!14[EDI] t32[EAX] t30[S2]>t33[CL]
55 MOV #<TN t33[CL]>, #<TN t30[S2]>
56 MOV BYTE PTR [EDI+EAX+1], #<TN t33[CL]>
57 MOV #<TN t35[AL]>, #<TN t33[CL]>
58 MOV #<TN t34[S2]>, #<TN t35[AL]>
60 * The value of DATA-VECTOR-SET is not used, so there is no need in the
64 --------------------------------------------------------------------------------
67 (declare (optimize (speed 3) (safety 0) (debug 0)))
68 (declare (type (double-float 0d0 1d0) d))
69 (loop for i fixnum from 1 to 5
70 for x1 double-float = (sin d) ;;; !!!
71 do (loop for j fixnum from 1 to 4
72 sum x1 double-float)))
74 Without the marked declaration Python will use boxed representation for X1.
80 ;; use of X as DOUBLE-FLOAT
83 The initial binding is effectless, and without it X is of type
84 DOUBLE-FLOAT. Unhopefully, IR1 does not optimize away effectless
85 SETs/bindings, and IR2 does not perform type inference.
86 --------------------------------------------------------------------------------
87 #9 "Multi-path constant folding"
89 (if (= (cond ((irgh x) 0)
96 This code could be optimized to
102 --------------------------------------------------------------------------------
104 (inverted variant of #9)
107 (let ((y (sap-alien x c-string)))
111 It could be optimized to
113 (lambda (x) (list x x))
115 (if Y were used only once, the current compiler would optimize it)
116 --------------------------------------------------------------------------------
118 (typep (truly-the (simple-array * (*)) x) 'simple-vector)
121 --------------------------------------------------------------------------------
123 FAST-+/FIXNUM and similar should accept unboxed arguments in interests
124 of representation selection. Problem: inter-TN dependencies.
125 --------------------------------------------------------------------------------
127 The derived type of (/ (THE (DOUBLE-FLOAT (0D0)) X) (THE (DOUBLE-FLOAT
128 1D0) Y)) is (DOUBLE-FLOAT 0.0d0). While it might be reasonable, it is
129 better to derive (OR (MEMBER 0.0d0) (DOUBLE-FLOAT (0.0d0))).
130 --------------------------------------------------------------------------------
132 On the alpha, the system is reluctant to refer directly to a constant bignum,
133 preferring to load a large constant through a slow sequence of instructions,
134 then cons up a bignum for it:
137 (DECLARE (OPTIMIZE (SAFETY 1) (SPEED 3) (DEBUG 1))
138 (TYPE (INTEGER -10000 10000) A)
141 ((89 125 16) (ASH A (MIN 18 -706)))
142 (T (DPB -3 (BYTE 30 30) -1))))
143 --------------------------------------------------------------------------------
146 ((= i (the (integer 0 100) n)))
149 It is commonly expected for Python to derive (FIXNUMP I). (If ``='' is
150 replaced with ``>='', Python will do.)
151 --------------------------------------------------------------------------------
153 Type tests for (ARRAY BIT), (ARRAY T) and similar go through full
154 %TYPEP, even though it is relatively simple to establish the arrayness
155 of an object and also to obtain the element type of an array. As of
156 sbcl-0.8.12.30, this affects at least DUMP-OBJECT through
157 COMPOUND-OBJECT-P, and (LABELS MAYBE-EMIT-MAKE-LOAD-FORMS GROVEL)
158 through TYPEP UNBOXED-ARRAY, within the compiler itself.
159 --------------------------------------------------------------------------------
161 (lambda (x) (declare (null x)) (sxhash x)) goes through SYMBOL-HASH
162 rather than either constant-folding or manipulating NIL-VALUE or
164 --------------------------------------------------------------------------------
166 (defun-with-dx foo (x)
168 (let ((l (list nil nil)))
170 (setf (second l) (1- x))
173 (declare (dynamic-extent l))
176 Result of MAKE is not stack allocated, which means that
177 stack-allocation of structures is impossible.
178 --------------------------------------------------------------------------------
180 IR2 does not perform unused code flushing.
181 --------------------------------------------------------------------------------
183 Python does not know that &REST lists are LISTs (and cannot derive it).
184 --------------------------------------------------------------------------------
186 a. Iterations on &REST lists, returning them as VALUES could be
187 rewritten with &MORE vectors.
188 b. Implement local unknown-values mv-call (useful for fast type checking).
189 --------------------------------------------------------------------------------
191 SBCL cannot derive upper bound for I and uses generic arithmetic here:
195 (dotimes (i (length l))
197 (map-foo (lambda (x) (if x (return t)))
202 (So the constraint propagator or a possible future SSA-convertor
203 should know the connection between an NLE and its CLEANUP.)
204 --------------------------------------------------------------------------------
206 Initialization of stack-allocated arrays is inefficient: we always
207 fill the vector with zeroes, even when it is not needed (as for
208 platforms with conservative GC or for arrays of unboxed objectes) and
209 is performed later explicitely.
211 (This is harder than it might look at first glance, as MAKE-ARRAY is smart
212 enough to eliminate something like ':initial-element 0'. Such an optimization
213 is valid if the vector is being allocated in the heap, but not if it is being
214 allocated on the stack. You could remove this optimization, but that makes
215 the heap-allocated case somewhat slower...)
216 --------------------------------------------------------------------------------
218 a. Accessing raw slots in structure instances is more inefficient than
219 it could be; if we placed raw slots before the header word, we would
220 not need to do arithmetic at runtime to access them. (But beware:
221 this would complicate handling of the interior pointer).
223 b. (Also note that raw slots are currently disabled on HPPA)
224 --------------------------------------------------------------------------------
226 Python is overly zealous when converting high-level CL functions, such
227 as MIN/MAX, LOGBITP, and LOGTEST, to low-level CL functions. Reducing
228 Python's aggressiveness would make it easier to effect changes such as
231 * direct MIN/MAX on {SINGLE,DOUBLE}-FLOATs ({MIN,MAX}S{S,D})
234 * direct LOGBITP on word-sized integers and fixnums (BT + JC)
237 * branch-free MIN/MAX on word-sized integers and fixnums (floats could
238 be handled too, modulo safety considerations on the PPC)
241 * efficient LOGTESTs on word-sized integers and fixnums (TEST)
245 (The framework for this has been implemented as of 0.9.9.18; see the
246 vm-support-routine COMBINATION-IMPLEMENTATION-STYLE and its use in
247 src/compiler/ir1opt.lisp, IR1-OPTIMIZE-COMBINATION. The above
248 optimizations are left as an exercise for the reader.)
249 --------------------------------------------------------------------------------
254 FOO's IR1 representation is roughly:
261 However, if a full call is generated for < (and similarly for other
262 predicate functions), then the IF is unnecessary, since the return value
263 of (< x y) is already T or NIL.
264 --------------------------------------------------------------------------------
266 The typecheck generated for a declaration like (integer 0 45) on x86 looks
269 ; 12B: F6C203 TEST DL, 3
271 ; 130: 8BC2 MOV EAX, EDX
272 ; 132: 83F800 CMP EAX, 0
274 ; 137: 8BC2 MOV EAX, EDX
275 ; 139: 3DB4000000 CMP EAX, 180
278 A better code sequence for this would be:
286 Doing an unsigned comparison means that, similarly to %CHECK-BOUND, we can
287 combine the <0 and >=bound tests. This sort of test is generated often
288 in SBCL and any array-based code that's serious about type-checking its
290 --------------------------------------------------------------------------------
292 The code for a vector bounds check on x86 (similarly on x86-64) where
293 the vector is in EDX and the index in EAX looks like:
295 ; 49: L0: 8B5AFD MOV EBX, [EDX-3]
296 ; 4C: 39C3 CMP EBX, EAX
299 because %CHECK-BOUND is used for bounds-checking any array dimension.
300 A more efficient specialization (%CHECK-BOUND/VECTOR) would produce:
305 Which is slightly shorter and avoids using a register.
306 --------------------------------------------------------------------------------
308 Reports from the Java camp indicate that using an SSE2-based
309 floating-point backend on x86 when possible is highly preferable to
310 using the x86 FP stack. It would be nice if SBCL included an SSE2-based
311 floating point backend with a compile-time option to switch between the
313 --------------------------------------------------------------------------------
318 (declare (type (integer 0 45) x y))
321 results in the following error trapping code for type-checking the
324 ; 424: L0: 8B058CE31812 MOV EAX, [#x1218E38C] ; '(MOD 46)
325 ; 42A: 0F0B0A BREAK 10 ; error trap
327 ; 42E: 1F BYTE #X1F ; OBJECT-NOT-TYPE-ERROR
328 ; 42F: FECE01 BYTE #XFE, #XCE, #X01 ; EDI
329 ; 432: 0E BYTE #X0E ; EAX
330 ; 433: L1: 8B0590E31812 MOV EAX, [#x1218E390] ; '(MOD 46)
331 ; 439: 0F0B0A BREAK 10 ; error trap
333 ; 43D: 1F BYTE #X1F ; OBJECT-NOT-TYPE-ERROR
334 ; 43E: 8E BYTE #X8E ; EDX
335 ; 43F: 0E BYTE #X0E ; EAX
337 Notice that '(MOD 46) has two entries in the constant vector. Having
338 one would be preferable.
339 --------------------------------------------------------------------------------
344 (declare (type simple-vector a))
347 results in the following x86 code:
349 ; 115886E9: F7C703000000 TEST EDI, 3 ; no-arg-parsing entry point
351 ; 6F1: 8BC7 MOV EAX, EDI
352 ; 6F3: 83F800 CMP EAX, 0
354 ; 6F8: 8BC7 MOV EAX, EDI
355 ; 6FA: 3DF8FFFF7F CMP EAX, 2147483640
357 ; 701: L0: 8B057C865811 MOV EAX, [#x1158867C] ; '(MOD
359 ; 707: 0F0B0A BREAK 10 ; error trap
361 ; 70B: 1F BYTE #X1F ; OBJECT-NOT-TYPE-ERROR
362 ; 70C: FECE01 BYTE #XFE, #XCE, #X01 ; EDI
363 ; 70F: 0E BYTE #X0E ; EAX
364 ; 710: L1: 8B42FD MOV EAX, [EDX-3]
365 ; 713: 8BCF MOV ECX, EDI
366 ; 715: 39C8 CMP EAX, ECX
368 ; 719: 8B540A01 MOV EDX, [EDX+ECX+1]
370 ... plus the standard return sequence and some error blocks. The
371 `TEST EDI, 3' and associated comparisons are to ensure that `I' is a
372 positive fixnum. The associated comparisons are unnecessary, as the
373 %CHECK-BOUND VOP only requires its tested index to be a fixnum and takes
374 care of the negative fixnum case itself.
376 {HAIRY-,}DATA-VECTOR-REF are DEFKNOWN'd with EXPLICIT-CHECK, which would
377 seem to take care of this, but EXPLICIT-CHECK only seems to be used when
378 compiling calls to unknown functions or similar. Furthermore,
379 EXPLICIT-CHECK, as NJF understands it, doesn't have the right
380 semantics--it suppresses all type checking of arguments, whereas what we
381 really want is to ensure that the argument is a fixnum, but not check
383 --------------------------------------------------------------------------------
386 In #35, the CMP EAX, $foo instructions are all preceded by a MOV. They
387 appear to be unnecessary, but are necessary because in IR2, EDI is a
388 DESCRIPTOR-REG, whereas EAX is an ANY-REG--and the comparison VOPs only
389 accept ANY-REGs. Therefore, the MOVs are "necessary" to ensure that the
390 comparison VOP receives an TN of the appropriate storage class.
392 Obviously, it would be better if a) we only performed one MOV prior to
393 all three comparisons or b) eliminated the necessity of the MOV(s)
394 altogether. The former option is probably easier than the latter.
396 --------------------------------------------------------------------------------
399 (setf (subseq s1 start1 end1) (subseq s2 start2 end1))
401 could be transformed into
406 (replace s1 #:s2 :start1 start1 :end1 end1 :start2 #:start2 :end2 #:end2))
408 when the return value is unused, avoiding the need to cons up the new sequence.
410 --------------------------------------------------------------------------------
413 (let ((*foo* 42)) ...)
415 currently compiles to code that ensures the TLS index at runtime, which
416 is both a decently large chunk of code and unnecessary, as we could ensure
417 the TLS index at load-time as well.