Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!amdahl!drivax!socha
From: socha@drivax.UUCP
Newsgroups: comp.arch
Subject: Re: AM29000 Booleans
Message-ID: <1512@drivax.UUCP>
Date: Fri, 8-May-87 14:05:32 EDT
Article-I.D.: drivax.1512
Posted: Fri May 8 14:05:32 1987
Date-Received: Sun, 10-May-87 05:36:16 EDT
References: <1270@aw.sei.cmu.edu> <138@neptune.AMD.COM> <3540@spool.WISC.EDU> <16587@amdcad.AMD.COM>
Reply-To: socha@drivax.UUCP (Henri J. Socha (x6251))
Organization: Digital Research, Monterey
Lines: 91

In article <16587@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) writes:
>In article <3540@spool.WISC.EDU>, lm@cottage.WISC.EDU (Larry McVoy) writes:
>> Could you please show the assembly language generated for the following,
>>
>> { int x;
>>
>>	if (x)
>>		;
>>	else
>>		;
>>
>>	if (!x)
>>		;
>>	else
>>		;
>> }

Well, to add some fuel to the fire about how a machine's performance can
depend on the smarts of the compiler used, I hand-modified the example code
given in the referenced article. The changed code is shown below. The changes
were limited to taking better advantage of the delayed branching: I only
rearranged and removed some code.

---------------------------------------
	.global	_main
_main:
	.align
;	if (x)
	cpneq	gr72,gr70,0	<-- sets the msb of gr72 if gr70 != 0
	jmpf	gr72,$16	<-- jumps if the msb of gr72 is not set
;	else
;		y = 1;
	const	gr71,1		<-- wasted assignment in the fall-through case
;		y = 0;
	const	gr71,0		<-- remember those delayed branches!
$16:
;	if (!x)
	cpeq	gr72,gr70,0
	jmpf	gr72,$18
;	else
;		y = 1;
	const	gr71,1
;	THEN	y = 0;
	const	gr71,0
$18:
	jmpi	lr00
	nop

>	-- Tim Olson
>	Advanced Micro Devices

The savings were these four instructions:

	nop
	jmp	$17	<-- remember those delayed branches!
	nop
	jmp	$19

And that is not so shabby for such a small programme, especially if the
hardware is smart enough to recognize that the jumped-to address is probably
already in the pipeline.
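To see why the delay-slot scheduling above is safe, here is a hypothetical C
model of the two hand-scheduled blocks. It assumes, as the listing implies,
that the instruction right after jmpf always executes, and that jmpf branches
only when the Boolean in gr72 is FALSE. The function names are made up;
gr70 and gr71 are modeled as plain ints.

```c
/* Hypothetical model of the first block: x stands in for gr70,
   the return value stands in for gr71. */
int first_block(int x)
{
    int cond = (x != 0);   /* cpneq gr72,gr70,0: TRUE when x != 0        */
    int gr71 = 1;          /* const gr71,1 sits in the delay slot, so it
                              runs whether or not jmpf is taken          */
    if (cond)              /* jmpf branches only when cond is FALSE,     */
        gr71 = 0;          /* so this const runs on fall-through only    */
    return gr71;
}

/* Hypothetical model of the second block (the !x test). */
int second_block(int x)
{
    int cond = (x == 0);   /* cpeq gr72,gr70,0: TRUE when x == 0         */
    int gr71 = 1;          /* delay slot again: always executed          */
    if (cond)
        gr71 = 0;          /* the THEN arm, reached on fall-through only */
    return gr71;
}
```

In this model first_block returns 0 for any nonzero x and 1 for x == 0, and
second_block gives the reverse. The up-front "y = 1" costs nothing when the
branch is taken and is simply overwritten on fall-through, which is why the
separate jmp and its nop can go.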
Now, don't try to say that the simple case of one instruction after the
branch doesn't occur (or occurs rarely). I've seen it often enough to think
that this is a worthwhile optimization for the compiler to make.

BTW, I like the fact that non-arithmetic instructions DO NOT change (affect)
the condition code (status) register. This makes other optimizations possible.

But I can't understand why the following sequence is needed for multiply:

	mul	gr72,gr71,#0	; step 1
	repeat	30		; (not in assembler? my addition)
	mul	gr72,gr71,gr72	; steps 2 to 31 (that's right, 30 words!)
	mull	gr72,gr71,gr72	; step 32

The above is from the Am29000 User's Manual, page 7-10, for a 32-bit integer
multiply. It takes 32 instructions to do it (similarly for divide). Now, I can
understand the advantages of a RISC processor, but this is going too far.
Should I put 32 instructions in the instruction stream each time I need to
multiply two numbers? Should I use a subroutine? Seems to me a perfect
time/space tradeoff decision. But what are the costs?
--
UUCP: ...!amdahl!drivax!socha                                 WAT Iron'75
"Everything should be made as simple as possible but not simpler." A. Einstein
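The 32-instruction multiply makes more sense if each mul is read as one step
of a shift-and-add multiply: one multiplier bit per step, so a 32-bit multiply
needs 32 steps. Below is a minimal C sketch of the unsigned case, assuming a
simplified step that conditionally adds the multiplicand and then shifts the
(acc:q) register pair right by one. The names mul_step and mul32x32 are
hypothetical, and this is not the exact Am29000 instruction behavior; in
particular the manual's sequence uses a distinct last step (mull), which this
unsigned model does not attempt to reproduce.

```c
#include <stdint.h>

/* Hypothetical shift-and-add multiply step on an (acc:q) pair;
   a simplified model, not the exact Am29000 mul semantics. */
void mul_step(uint32_t multiplicand, uint32_t *acc, uint32_t *q)
{
    uint64_t sum = *acc;               /* room for the carry out of bit 31  */
    if (*q & 1u)                       /* low multiplier bit decides the add */
        sum += multiplicand;
    /* shift the 65-bit value carry:acc:q right by one bit */
    *q   = (*q >> 1) | ((uint32_t)(sum & 1u) << 31);
    *acc = (uint32_t)(sum >> 1);
}

/* 32 steps, one per multiplier bit: the same count as the manual's sequence.
   Returns the full 64-bit unsigned product. */
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint32_t acc = 0, q = b;
    for (int i = 0; i < 32; i++)
        mul_step(a, &acc, &q);
    return ((uint64_t)acc << 32) | q;  /* high word in acc, low word in q */
}
```

Calling something like mul32x32 from every multiply site is the subroutine
side of the time/space tradeoff: one shared copy of the 32 steps, paid for
with a call and return at each use, versus 32 words inlined per site.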