Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site fortune.UUCP
Path: utzoo!watmath!clyde!floyd!harpo!ihnp4!fortune!rpw3
From: rpw3@fortune.UUCP
Newsgroups: net.micro.68k
Subject: Re: Re: 68020 vs 16k - is the 020 worth - (nf)
Message-ID: <2459@fortune.UUCP>
Date: Tue, 7-Feb-84 04:05:09 EST
Article-I.D.: fortune.2459
Posted: Tue Feb  7 04:05:09 1984
Date-Received: Thu, 9-Feb-84 13:36:05 EST
Sender: notes@fortune.UUCP
Organization: Fortune Systems, Redwood City, CA
Lines: 83

#R:utzoo:-349300:fortune:6600011:000:4379
fortune!rpw3    Feb  6 23:15:00 1984

Please, please, please, folks... don't fall into the trap of comparing
CPU clock speeds across different machine architectures (such as a
20 MHz 68k vs. a 6 MHz 16k). "It ain't that simple!" [Murphy's Law #27]

The CPU clock has only to do with the internal fineness of
the particular state-machine/microcode-engine used to implement
the chip. You have to look at how many clocks it takes for a
memory cycle, AND what access time is demanded of the memory
to achieve that cycle. Comparing CPU clocks is like saying,
"My car is faster than yours because my wheels have higher RPMs."
(What's the diameter of the wheels, Ollie?)

To get valid comparisons one must normalize the CPU clock to the memory
access time and then memory cycle times can be calculated using the bus
sequence of the particular chip.  Since processor clock speeds generally
evolve more quickly than memory access times (in the marketplace), one
has to look at how well the (expensive) memory is being used.

In extreme examples, equal speed memories can result in one
architecture being two or more times faster than another, simply
because the memory is left idle. This explains, for example, why the
obscure 6809 can stomp the familiar Z80, given equal access time
memories, even though the Z80 may be running with a 2.5 times faster
CPU clock. The 6809 uses one clock per memory cycle, the Z80 needs
three (data) or four (instruction fetch). The Z80 also leaves the
RAMs idle for a longer fraction of the cycle. (To get equivalent
performance from the Z80, you have to run the CPU clock at a MUCH
higher rate to balance the duty cycle while adding back wait states
to match the access time.)
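To make that 6809/Z80 arithmetic concrete, here is a quick sketch (mine, not part of the original argument). The clocks-per-memory-cycle figures are the ones given above; the 1.6 MHz and 4 MHz clock rates are illustrative stand-ins for a 2.5x clock ratio, not claims about any particular board:

```python
# Effective memory cycle time as driven by the CPU, in nanoseconds:
# (clocks per memory cycle) / (CPU clock in MHz) * 1000.
def mem_cycle_ns(cpu_clock_mhz, clocks_per_mem_cycle):
    return clocks_per_mem_cycle / cpu_clock_mhz * 1000.0

# Illustrative clocks: 6809 at 1.6 MHz vs. Z80 at 4 MHz (2.5x faster).
m6809     = mem_cycle_ns(1.6, 1)  # one clock per memory cycle
z80_fetch = mem_cycle_ns(4.0, 4)  # instruction fetch: four clocks
z80_data  = mem_cycle_ns(4.0, 3)  # data cycle: three clocks

print(m6809, z80_fetch, z80_data)  # 625.0 1000.0 750.0
```

Despite the 2.5x slower clock, the 6809 turns around a memory cycle in less time than the Z80 does for either kind of access.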

One of the main reasons I happen to like the 68000/68010 is simply that
the bus access-to-cycle time ratio nicely matches the access-to-cycle
ratio of current (and near-future) dynamic RAMs. (For hardware hackers,
the chip leaves the memories idle for just about the "RAS precharge
time".) It makes good use of the memories. (Who knows about the 68020?)
But don't let Motorola hype you. With the RAM chips we are going to have
available over the next 1-2 years, you don't NEED a 20 MHz CPU; 12-16 MHz
will do just fine, thank you.

(I have not done a careful study of the 16000, but from the few minutes
I have looked at the bus timing diagrams, it didn't look quite as
memory-efficient. Be that as it may, ...)

To do a fair comparison, one needs to presume some RAM access time,
add bus driver/receiver and memory system delays (to get a memory
SYSTEM access and cycle time), add MMU delays, and then compute the
fastest CPU clock speed (for each chip) that just makes that access
time work. (If one of the CPUs won't go fast enough to keep commercial
memory chips busy, you've got a real problem with that one.) From that
clock and the number of clocks per memory cycle, you can calculate
the effective system memory cycle time as driven by each processor.
Divide the raw memory system cycle time by the CPU-cum-memory system
cycle time to get percentage effective memory utilization. The result
is a pretty good first-order comparison of throughput between the CPU
architectures.
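That recipe fits in a few lines of Python (again mine, not the author's). Every number in the example below is hypothetical — a 3-clock access window, a 4-clock bus cycle, a 12.5 MHz rated part, and a 150 ns access / 260 ns cycle DRAM system — chosen only to show the arithmetic:

```python
def memory_utilization(raw_cycle_ns, ram_access_ns, buffer_ns, mmu_ns,
                       access_clocks, cycle_clocks, max_cpu_mhz):
    """First-order throughput comparison: what fraction of the time
    the memory system is actually kept busy by the CPU."""
    # Memory SYSTEM access time: RAM access plus driver/receiver
    # and MMU delays.
    sys_access_ns = ram_access_ns + buffer_ns + mmu_ns
    # Fastest CPU clock (MHz) whose access window just covers that
    # time, capped at the chip's rated maximum.
    clock_mhz = min(max_cpu_mhz, access_clocks * 1000.0 / sys_access_ns)
    # Effective memory cycle time as driven by this CPU.
    cpu_cycle_ns = cycle_clocks * 1000.0 / clock_mhz
    return raw_cycle_ns / cpu_cycle_ns

# Hypothetical chip on a hypothetical memory system: the CPU tops out
# at its 12.5 MHz rating, drives a 320 ns effective cycle against a
# 260 ns raw memory cycle, i.e. about 81% utilization.
print(memory_utilization(260, 150, 30, 40, 3, 4, 12.5))
```

Run the same function with each candidate chip's clock counts and rated maximum, against the same memory system, and the ratio of the two results is the first-order throughput comparison.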

If you have reason to believe that one machine is GROSSLY more instruction
stream efficient than the other (average bits/instruction), then you can
scale a little for that, but be careful. Such interpretations are tricky
(what is an "average instruction"?). The best way to do that is to take
some fairly large modules of frequently used code (say, pieces of "libc")
and hand code them in assembler as tight as possible. (Comparisons of
individual instructions are meaningless.) Look at total memory cycles
required for the entire function (don't forget a byte often costs the
same as a word), and scale by the memory utilization calculated above.
That gives you "functions per mem-access-time", which is a measure that
can be used across a fairly large evolution in CPU clock and memory
access times (which occur as chips get better).
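The "functions per mem-access-time" scaling can also be sketched in a couple of lines; the cycle counts and utilization figures here are made up for illustration, not measured from any real libc routine:

```python
# "Functions per mem-access-time": scale the total memory-cycle count
# of a whole hand-coded routine by each CPU's effective memory
# utilization (as computed from the bus timing analysis).
def functions_per_access_time(total_mem_cycles, utilization):
    return utilization / total_mem_cycles

# Hypothetical: CPU A needs 120 memory cycles for some libc-style
# routine at 0.81 utilization; CPU B needs 150 cycles at 0.65.
a = functions_per_access_time(120, 0.81)
b = functions_per_access_time(150, 0.65)
print(a / b)  # A comes out roughly 1.56x faster on this routine
```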

Whatever you do, don't try to compare CPU clock speeds alone. Even
within a chip family, it's bogus. (A 20 MHz 68000 is twice as fast
as a 10 MHz 68000 ONLY with an infinitely fast memory system with
no real-world components.)
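You can check that last claim with a wait-state sketch. The 4-clock bus cycle and 3-clock access window below are hypothetical round numbers, not a real 68000 timing model:

```python
import math

def bus_cycle_ns(clock_mhz, base_clocks, access_clocks, sys_access_ns):
    """Bus cycle time when wait states must be added so the access
    window covers a fixed memory system access time."""
    period_ns = 1000.0 / clock_mhz
    # Each wait state stretches the access window by one clock period.
    waits = max(0, math.ceil(sys_access_ns / period_ns) - access_clocks)
    return (base_clocks + waits) * period_ns

slow = bus_cycle_ns(10.0, 4, 3, 250)  # 400 ns/cycle, zero wait states
fast = bus_cycle_ns(20.0, 4, 3, 250)  # 300 ns/cycle: two waits appear
print(slow / fast)  # ~1.33x, nowhere near the 2x the clocks suggest
```

With the same 250 ns memory system, doubling the clock buys only about a third more bus bandwidth, because the added wait states hand most of the gain right back.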

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065