Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!akgua!mcnc!idis!mi-cec!dvk
From: dvk@mi-cec.UUCP (Dan Klein)
Newsgroups: net.lang.c
Subject: C "optimization" (3 of 8)
Message-ID: <205@mi-cec.UUCP>
Date: Tue, 14-Feb-84 14:21:54 EST
Article-I.D.: mi-cec.205
Posted: Tue Feb 14 14:21:54 1984
Date-Received: Fri, 17-Feb-84 04:22:30 EST
Lines: 85

This is a continuation of my diatribe on "C doesn't optimize, it neatens".
In this and other articles, I compare a true optimizing compiler (Bliss-32
running under VMS) to a code neatener (C running under BSD 4.1c). Any and all
counterexamples are welcome. However, this is NOT a comparison of the
languages. Both C and Bliss have their good and bad points. This is simply a
comparison of the code they generate. As in all examples, the source code and
uncensored assembly code are presented. In all examples, the C source and
Bliss source are as nearly identical as language differences permit. I have
not taken advantage of any "tricks" to get either language to perform better
or worse than the other. The optimizer was enabled for both languages.

		-Dan Klein, Mellon Institute, Pittsburgh	(412)578-3382

=============================================================================

In this example, I demonstrate the ability (and lack thereof) of the
compilers to extract loop invariant code from the body of a loop. This is a
technique we were all taught in Programming-1. The Bliss compiler knows about
this technique, and does it for you when you forget (or when it is less
elegant to create a temporary variable to hold the invariant value).

What I do here is loop on "i" from 0 to "(5+a)/2", and do *nothing* in the
body of the loop. The invariant expression is "(5+a)/2". It doesn't take a
genius to see that that value will never change in the loop, especially since
the loop does nothing at all (let alone reference "a"). This is a very simple
example of loop invariant code.
Bliss can recognize more complex examples than this.

Neither compiler eliminates the loop altogether. This is a religious issue:
is the loop needed at all if you know you aren't going to do anything in it?
There is no "right" answer to that question, since it is really very
application dependent (i.e. do you really want to ignore software timing
delays?). So, on to the comparison:

1) Bliss recognizes the loop invariant section of the loop, and evaluates it
   once (before the loop is executed). Thereafter, it does not need to
   reevaluate the expression. The C compiler, on the other hand, evaluates
   the limit expression before each pass of the loop. Not only is this
   computationally redundant, it is also slower.

2) Bliss uses the AOBLEQ (Add One and Branch if Less or Equal) to effect the
   loop. The C compiler (after having recalculated the limit expression),
   uses an "incl" / "cmpl" / "jleq" combination. This is less efficient in
   both speed and space.

3) In the calculation of the limit expression, both compilers need a
   temporary variable in which to place the result. The Bliss compiler
   chooses R1, while the C compiler keeps the result in "r0" but puts the
   loop index in a stack location. This is a poor choice on the part of C,
   since stack accesses take far longer than register accesses and require
   more bytes of assembly code. The register "r1" is available (and does not
   need to be preserved on routine entry), so C should use it.

4) The C compiler allocates a single variable on the stack in the wrong way.
   It emits "subl2 $4,sp" / "clrl -4(fp)" when it could much more efficiently
   do "clrl -(sp)". Thereafter it refers to the variable as "-4(fp)" when it
   should use "(sp)". The latter takes 1 byte versus 2. However, as mentioned
   in 3) above, using "r1" is better all around.

5) The Bliss compiler sets the loop index variable to be 1 less than the
   starting value it needs, and immediately increments it (i.e. the loop
   increment is at the top of the loop).
   The loop increment in C is also at the top of the loop, but C sets the
   variable to be what it wants to start at, skips the loop increment the
   first time, and hits it each time afterward. For loop increments that are
   complex (e.g. ones that involve pointer dereferencing), this is a
   reasonable approach. However, for simple increments (like "i++"), the
   code is wasteful.

----------------------------------------+-------------------------------------
routine test(a) : novalue =             |  test(a)
    begin                               |  int a;
                                        |  {
    incr i from 0 to (5+.a)/2 do ;      |      int i;
                                        |
    end;                                |      for (i=0; i<=(5+a)/2; i++) ;
                                        |  }
                                        |
        .TITLE  FOO                     |          .data
                                        |          .text
        .PSECT  $CODE$,NOWRT,2          |  LL0:    .align  1
                                        |          .globl  _test
TEST:   .WORD   ^M<>                    |          .set    L12,0x0
        ADDL3   #5, 4(AP), R1           |          .data
        DIVL2   #2, R1                  |          .text
        MNEGL   #1, R0                  |  _test:  .word   L12
1$:     AOBLEQ  R1, R0, 1$              |          subl2   $4,sp
        RET                             |          clrl    -4(fp)
                                        |          jbr     L18
                                        |  L200001:incl    -4(fp)
                                        |  L18:    addl3   $5,4(ap),r0
                                        |          divl2   $2,r0
                                        |          cmpl    -4(fp),r0
                                        |          jleq    L200001
                                        |          ret
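
For a compiler that does not hoist the invariant, the remedy mentioned at the
top of this article (a temporary variable holding the limit) can be applied by
hand. What follows is only a minimal sketch and is not code from the original
posting: the names limit, limit_evals, passes_unhoisted, passes_hoisted and
check_same_passes are made up for illustration, and the counter exists purely
to make the number of limit evaluations visible. The loop bodies count passes
instead of being empty so that the two versions can be compared.

```c
#include <assert.h>

/* Hypothetical instrumentation (not in the original article):
   counts how many times the limit expression is evaluated. */
static long limit_evals;

/* The loop-invariant expression from the example, "(5+a)/2".
   Called with 5+a non-negative or even here, so the result does not
   depend on how the compiler rounds negative quotients. */
static int limit(int a)
{
    limit_evals++;
    return (5 + a) / 2;
}

/* Mirrors the code the C compiler emitted: the limit is
   recomputed before every pass of the loop. */
long passes_unhoisted(int a)
{
    long passes = 0;
    int i;

    for (i = 0; i <= limit(a); i++)
        passes++;
    return passes;
}

/* Hand-hoisted version: the limit is computed once, before the
   loop, which is what Bliss did automatically with R1. */
long passes_hoisted(int a)
{
    long passes = 0;
    int i;
    int bound = limit(a);

    for (i = 0; i <= bound; i++)
        passes++;
    return passes;
}

/* Hoisting must not change how many passes the loop makes. */
void check_same_passes(int a)
{
    assert(passes_unhoisted(a) == passes_hoisted(a));
}
```

With a = 5 the limit is 5, so both loops make 6 passes. The first version
evaluates the limit 7 times (once per test, including the final failing one),
while the hand-hoisted version evaluates it once.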