Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site fortune.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxl!ihnp4!fortune!rpw3
From: rpw3@fortune.UUCP
Newsgroups: net.lang.c
Subject: Re: Re: Casting Pointers -- fast *portab - (nf)
Message-ID: <2482@fortune.UUCP>
Date: Wed, 8-Feb-84 07:03:53 EST
Article-I.D.: fortune.2482
Posted: Wed Feb  8 07:03:53 1984
Date-Received: Fri, 10-Feb-84 02:06:08 EST
Sender: notes@fortune.UUCP
Organization: Fortune Systems, Redwood City, CA
Lines: 32

#R:kobold:-27200:fortune:16200020:000:1178
fortune!rpw3    Feb  8 02:25:00 1984

And of course (?) everyone knows by now (?) that you can do even better
on a 68000 by using the move-multiple-long (register load/store)
instructions to eat and spew big gulps.

	1. Save a few regs
	2. While bunches left to do
	   a. gulp into the regs
	   b. spew out to memory
	   c. adjust indices
	3. copy the odd few words.
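
For anyone who would rather read C than an outline, here is a rough sketch
of the steps above. It is not the MIT "blt" code and not 68000 assembler:
the name gulp_copy, the constant GULP_WORDS, and the choice of ten
longwords per gulp (40 bytes, to match the figure quoted further down) are
all my own assumptions, and the locals r0..r9 only stand in for the
registers that a real moveml would load and store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed gulp size: 10 longwords = 40 bytes per load/store pair. */
enum { GULP_WORDS = 10 };

/* Copy n 32-bit words from src to dst; the regions must not overlap. */
void gulp_copy(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    /* Step 1 (save a few regs) is left to the C compiler here. */

    /* Step 2: while whole bunches are left to do */
    while (n >= GULP_WORDS) {
        /* 2a: gulp into locals, standing in for the registers */
        uint32_t r0 = src[0], r1 = src[1], r2 = src[2], r3 = src[3],
                 r4 = src[4], r5 = src[5], r6 = src[6], r7 = src[7],
                 r8 = src[8], r9 = src[9];

        /* 2b: spew out to memory */
        dst[0] = r0; dst[1] = r1; dst[2] = r2; dst[3] = r3; dst[4] = r4;
        dst[5] = r5; dst[6] = r6; dst[7] = r7; dst[8] = r8; dst[9] = r9;

        /* 2c: adjust indices */
        src += GULP_WORDS;
        dst += GULP_WORDS;
        n   -= GULP_WORDS;
    }

    /* Step 3: copy the odd few words */
    for (; n > 0; n--)
        *dst++ = *src++;
}
```

A call such as gulp_copy(dst, src, 37) would take three trips through the
gulp loop and then move the remaining seven words one at a time. Whether a
given compiler actually keeps r0..r9 in registers is up to the compiler.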

Now, that strategy by itself doesn't compare well against loop-unrolled
move-long, since the move-long takes care of the indices (movl a1@+,a2@+)
and the moveml doesn't, but the moveml's can be loop-unrolled too! In that
case, each load/store pair has a higher address offset word in the
instruction ("moveml ,a5(offset1)"), and you fix up the
whole loop with two adds at the end. In the limiting case (which you
can get close to attaining while doing buffer-block moves), you only
fetch 8 bytes of instructions for each 40 bytes of data copied (note
that's 80 bytes touched), or just over 10% overhead.
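
The unrolled form can be sketched in C as well, again only as an
illustration under assumptions of my own: the names move_gulp,
unrolled_gulp_copy and overhead_percent, the unroll factor of four, and
the ten-longword gulp are hypothetical and are not taken from the "blt"
source. Each move_gulp call uses a larger constant offset, as the unrolled
moveml's would, and the two pointers are fixed up once per pass. The last
function just redoes the arithmetic from the paragraph above: 8
instruction bytes against 2 * 40 = 80 data bytes touched is 10%, and the
"just over" presumably accounts for the adds and the loop branch spread
over the whole pass.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: 10 longwords (40 bytes) per gulp, 4 gulps per pass. */
enum { GULP_WORDS = 10, UNROLL = 4, PASS_WORDS = GULP_WORDS * UNROLL };

/* One load/store pair at a fixed offset from the unchanged base pointers.
 * The regs array stands in for the registers moveml would use. */
static void move_gulp(uint32_t *dst, const uint32_t *src, size_t off)
{
    uint32_t regs[GULP_WORDS];
    size_t i;

    for (i = 0; i < GULP_WORDS; i++)   /* gulp */
        regs[i] = src[off + i];
    for (i = 0; i < GULP_WORDS; i++)   /* spew */
        dst[off + i] = regs[i];
}

/* Copy n 32-bit words from src to dst; the regions must not overlap. */
void unrolled_gulp_copy(uint32_t *dst, const uint32_t *src, size_t n)
{
    while (n >= PASS_WORDS) {
        /* Each pair carries a higher constant offset. */
        move_gulp(dst, src, 0 * GULP_WORDS);
        move_gulp(dst, src, 1 * GULP_WORDS);
        move_gulp(dst, src, 2 * GULP_WORDS);
        move_gulp(dst, src, 3 * GULP_WORDS);

        /* The two adds that fix up the whole loop. */
        src += PASS_WORDS;
        dst += PASS_WORDS;
        n   -= PASS_WORDS;
    }

    /* Leftovers, one word at a time. */
    for (; n > 0; n--)
        *dst++ = *src++;
}

/* Instruction bytes fetched per 100 data bytes touched (read + written). */
unsigned overhead_percent(unsigned insn_bytes, unsigned data_bytes_copied)
{
    return 100u * insn_bytes / (2u * data_bytes_copied);
}
```

With the figures from the text, overhead_percent(8, 40) comes out to 10.
A production routine would likely follow the unrolled pass with a
single-gulp loop before falling back to word-at-a-time; that is left out
here to keep the sketch short.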

(See the code for "blt" that comes with the MIT "C" compiler.)

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065