Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!amdahl!drivax!socha
From: socha@drivax.UUCP
Newsgroups: comp.arch
Subject: Re: AM29000 Booleans
Message-ID: <1512@drivax.UUCP>
Date: Fri, 8-May-87 14:05:32 EDT
Article-I.D.: drivax.1512
Posted: Fri May  8 14:05:32 1987
Date-Received: Sun, 10-May-87 05:36:16 EDT
References: <> <138@neptune.AMD.COM> <3540@spool.WISC.EDU> <16587@amdcad.AMD.COM>
Reply-To: socha@drivax.UUCP (Henri J. Socha (x6251))
Organization: Digital Research, Monterey
Lines: 91

In article <16587@amdcad.AMD.COM>, tim@amdcad.AMD.COM (Tim Olson) writes:
>In article <3540@spool.WISC.EDU>, lm@cottage.WISC.EDU (Larry McVoy) writes:
>> Could you please show the assembly language generated for the following,
>> {	int x;
>> 	if (x)
>> 	    ;
>> 	else
>> 	    ;
>> 	if (!x)
>> 	    ;
>> 	else
>> 	    ;
>> }

Well, to add some fuel to the fire about how much a machine's performance can
depend on the smarts of its compiler, I hand-modified the assembly code given
in the referenced article.  The changed code is shown below.  The changes were
limited to taking better advantage of the delayed branches: I only rearranged
and removed some code.

	.global	_main

;	if (x)
	cpneq	gr72,gr70,0		<-- sets the msb of gr72 if gr70 != 0
	jmpf	gr72,$16		<-- jumps if the msb of gr72 is not set
;	else
;	    y = 1;
	const	gr71,1			<-- wasted assignment in the fall-through case

;	    y = 0;
	const	gr71,0			<-- remember those delayed branches!
$16:
;	if (!x)
	cpeq	gr72,gr70,0
	jmpf	gr72,$18
;	else
;	    y = 1;
	const	gr71,1

; THEN	    y = 0;
	const	gr71,0
$18:
	jmpi	lr00
>	-- Tim Olson
>	Advanced Micro Devices
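To check that the rearranged sequence still leaves the right value in y for
both outcomes of each test, here is a minimal C sketch that simulates it with
delayed branching.  It assumes, for illustration only, that a compare writes
TRUE as a word with just the msb set and FALSE as zero, that jmpf branches
when the msb is clear, that the one instruction after a taken jump always
executes before control transfers, and that the jmpi lr00 return can be
stood in for by a halt.  The opcode names, struct insn and run_delayed are
hypothetical and made up for this sketch; this is not an AMD simulator.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the rearranged sequence; not a real Am29000 simulator. */
enum op { OP_CPNEQ, OP_CPEQ, OP_JMPF, OP_CONST, OP_HALT };

/* Register slots standing in for gr70 (x), gr71 (y) and gr72 (flag). */
enum { REG_X, REG_Y, REG_T, NREGS };

#define AM_TRUE  INT32_MIN   /* assumed TRUE: only the msb set */
#define AM_FALSE 0

struct insn {
    enum op op;
    int dst;       /* destination register slot */
    int src;       /* source register slot */
    int32_t imm;   /* immediate operand */
    int target;    /* branch target index (jmpf only) */
};

/* The hand-rearranged code, with $16 at index 4 and $18 at index 8. */
static const struct insn rearranged[] = {
    { OP_CPNEQ, REG_T, REG_X, 0, 0 },  /* if (x)                           */
    { OP_JMPF,  0,     REG_T, 0, 4 },  /* to $16 if msb of flag is clear   */
    { OP_CONST, REG_Y, 0,     1, 0 },  /* delay slot: y = 1, always runs   */
    { OP_CONST, REG_Y, 0,     0, 0 },  /* fall-through: y = 0              */
    { OP_CPEQ,  REG_T, REG_X, 0, 0 },  /* $16: if (!x)                     */
    { OP_JMPF,  0,     REG_T, 0, 8 },  /* to $18 if msb of flag is clear   */
    { OP_CONST, REG_Y, 0,     1, 0 },  /* delay slot: y = 1                */
    { OP_CONST, REG_Y, 0,     0, 0 },  /* fall-through: y = 0              */
    { OP_HALT,  0,     0,     0, 0 },  /* $18: stands in for jmpi lr00     */
};

/* Run the first n instructions with input x and return the final y. */
static int32_t run_delayed(const struct insn *prog, int n, int32_t x)
{
    int32_t r[NREGS] = { x, -1, AM_FALSE };  /* y starts as "garbage" */
    int pc = 0;
    int pending = -1;   /* taken-branch target, applied after the delay slot */

    assert(n >= 0);
    while (pc < n) {
        const struct insn *in = &prog[pc];
        int next = pc + 1;
        int taken = -1;

        switch (in->op) {
        case OP_CPNEQ: r[in->dst] = (r[in->src] != in->imm) ? AM_TRUE : AM_FALSE; break;
        case OP_CPEQ:  r[in->dst] = (r[in->src] == in->imm) ? AM_TRUE : AM_FALSE; break;
        case OP_JMPF:  if (r[in->src] >= 0) taken = in->target; break; /* msb clear */
        case OP_CONST: r[in->dst] = in->imm; break;
        case OP_HALT:  return r[REG_Y];
        }

        if (pending >= 0) {  /* we just ran a delay slot: now transfer control */
            next = pending;
            pending = -1;
        }
        if (taken >= 0)      /* the branch takes effect after the next instruction */
            pending = taken;
        pc = next;
    }
    return r[REG_Y];
}
```

With n = 4 only the first if runs: x == 0 leaves y at 1 and any nonzero x
leaves it at 0.  Running all nine entries gives the result of the second if,
0 when x is zero and 1 otherwise, so dropping the two jmp instructions did not
change the meaning of either if/else.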

The savings were:
	jmp	$17			<-- remember those delayed branches!
	jmp	$19

And that is not so shabby for such a small programme, especially if the hardware
is smart enough to recognize that the jumped-to address is probably in the

Now, don't try to say that the simple case of a single instruction after the
branch doesn't occur (or occurs only rarely).  I've seen it often enough to
think that this is a worthwhile optimization for the compiler to make.
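As a rough illustration of the compiler optimization argued for here, this C
sketch emits code for an if/else whose arms are each a single const into the
same register: the else assignment goes in the delay slot and the
fall-through then assignment overwrites it, as in the hand-edited code above.
The function emit_if_else, its register and label parameters, and the
restriction to 16-bit unsigned constants (assumed range for const) are all
hypothetical choices for this sketch; a real compiler would also have to
prove that the delay-slot instruction has no other side effects.

```c
#include <stdio.h>
#include <string.h>

/* One arm of the if/else: "const grREG,VALUE". */
struct assign {
    int reg;
    long value;
};

/*
 * Emit "if (gr<cond_reg>) then_arm; else else_arm;" using the delay slot.
 * Returns the snprintf result, or -1 if the pattern does not apply
 * (different destination registers, or constants outside 0..65535).
 */
static int emit_if_else(char *buf, size_t size, int cond_reg, int flag_reg,
                        int label, struct assign then_arm, struct assign else_arm)
{
    if (then_arm.reg != else_arm.reg)
        return -1;   /* fall-through must overwrite the delay-slot result */
    if (then_arm.value < 0 || then_arm.value > 65535 ||
        else_arm.value < 0 || else_arm.value > 65535)
        return -1;   /* assumed 16-bit unsigned immediate for const */

    return snprintf(buf, size,
                    "\tcpneq\tgr%d,gr%d,0\n"  /* flag = (cond != 0)             */
                    "\tjmpf\tgr%d,$%d\n"      /* skip the then-arm when false    */
                    "\tconst\tgr%d,%ld\n"     /* delay slot: else-arm, always    */
                    "\tconst\tgr%d,%ld\n"     /* fall-through: then-arm          */
                    "$%d:\n",
                    flag_reg, cond_reg,
                    flag_reg, label,
                    else_arm.reg, else_arm.value,
                    then_arm.reg, then_arm.value,
                    label);
}
```

For example, emit_if_else(buf, sizeof buf, 70, 72, 16, (struct assign){71, 0},
(struct assign){71, 1}) produces the first if/else of the hand-edited code,
ending with the $16 label.  If the two arms write different registers, the
function declines rather than emit a delay slot that would clobber something.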

BTW, I like the fact that non-arithmetic instructions DO NOT change (affect)
the condition code (status) register.  This opens the door to other
optimizations.  But I can't understand why the following sequence is needed
for a multiply:

	mul	gr72,gr71,0		;step 1
	repeat	30			; (my shorthand; may not be in the assembler)
	mul	gr72,gr71,gr72		;steps 2 to 31 (that's right, 30 words!)

	mull	gr72,gr71,gr72		; step 32

The above is from page 7-10 of the Am29000 User's Manual, for a 32-bit integer
multiply.  It takes 32 instructions to do it (divide is similar).
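To see why a 32-bit multiply comes out as 32 separate steps, and why the last
step is a different instruction, here is a C sketch of plain radix-2
shift-and-add multiplication that consumes one multiplier bit per step.  It
does not model the Am29000's actual mul/mull datapath or its registers; it
only relies on the two's-complement fact that bit 31 of a signed multiplier
carries weight -2^31, so the final step must subtract rather than add, which
is presumably what a distinct "last step" instruction is for.  step_multiply
is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Signed 32 x 32 -> 64-bit multiply, one multiplier bit per step.
 * Steps 1..31 conditionally add a * 2^i; step 32 handles the sign bit.
 * Multiplying by powers of two (instead of <<) avoids ever left-shifting
 * a negative value, which is undefined behaviour in C.
 */
static int64_t step_multiply(int32_t a, int32_t b)
{
    uint32_t bits = (uint32_t)b;   /* two's-complement bit pattern of b */
    int64_t acc = 0;
    int steps = 0;

    for (int i = 0; i < 31; i++, steps++) {
        if (bits & (UINT32_C(1) << i))
            acc += (int64_t)a * ((int64_t)1 << i);
    }

    /* Last step: bit 31 has weight -2^31, so subtract instead of add. */
    if (bits & (UINT32_C(1) << 31))
        acc -= (int64_t)a * ((int64_t)1 << 31);
    steps++;

    assert(steps == 32);           /* 31 ordinary steps + 1 final step */
    return acc;
}
```

Each pass through the loop plays the role of one multiply step, so an inline
expansion naturally costs one instruction word per multiplier bit.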

Now, I can understand the advantages of a RISC processor, but this is going
too far.  Should I put 32 instructions in the instruction stream each time
I need to multiply two numbers?  Should I use a subroutine?
It looks like a classic time/space tradeoff to me.  But what are the costs?
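One way to frame that tradeoff, under made-up assumptions: each multiply step
is one word and (say) one cycle; a call costs c words at each call site and t
extra cycles per multiply (call, return, argument moves); and the subroutine
needs r extra words for its return.  With n call sites:

```latex
\[
\begin{aligned}
S_{\text{inline}} &= 32n, & T_{\text{inline}} &= 32,\\
S_{\text{sub}}    &= 32 + r + cn, & T_{\text{sub}} &= 32 + t.
\end{aligned}
\]
The subroutine uses less space when $32n > 32 + r + cn$, i.e.
\[
n > \frac{32 + r}{32 - c} \qquad (c < 32).
\]
```

With the hypothetical values c = 2 and r = 2 this gives n > 34/30, so two call
sites already favour the subroutine on space (38 words versus 64), at the
price of t extra cycles on every multiply.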

UUCP:...!amdahl!drivax!socha                                      WAT Iron'75
"Everything should be made as simple as possible but not simpler."  A. Einstein