Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!floyd!harpo!seismo!rochester!ritcv!tropix!rcm
From: rcm@tropix.UUCP (Robert C. Moore)
Newsgroups: net.micro.68k
Subject: Re: Re: 68020 vs 16k - is the 020 worth - (nf)
Message-ID: <181@tropix.UUCP>
Date: Thu, 9-Feb-84 19:17:06 EST
Article-I.D.: tropix.181
Posted: Thu Feb  9 19:17:06 1984
Date-Received: Sat, 11-Feb-84 08:15:11 EST
References: fortune.2459
Lines: 43

Rob's comments on the relative speed comparisons of micros are quite
correct.  What matters is the effective instruction execution speed,
including the effects of MMUs, bus arbitration, memory speeds, and so
forth, unless there is an intervening cache.  In that case the cache
speed matters most, along with the cache size (and thus its hit rate).

For example, the 16k can get data from memory in only 3 clock cycles,
but with its MMU the number jumps to 4 (assuming very fast memory).
If the 68451 MMU is used alone with the 68000, getting only 2 wait
states at 12.5 MHz is considered pretty good (i.e., 6 clock cycles).
But if a translation cache is put around it, no-wait operation (4
clock cycles) is pretty easy with conventional dynamic RAM.  With both
a translation cache and a data cache, no-wait operation of the 68000
at 12.5 MHz is trivial, although the cache size will determine the
degradation due to imperfect hit rate.
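
The cycle counts above can be turned into times with a little
arithmetic.  This is only a rough sketch: the 4- and 6-clock figures
and the 12.5 MHz clock come from the paragraph above, but the hit
rates, and the assumption that a miss costs the full 6-clock cycle,
are hypothetical and chosen purely for illustration.

```python
# Bus-cycle arithmetic for the 68000 figures quoted above.
CLOCK_MHZ = 12.5      # from the text
NO_WAIT_CLOCKS = 4    # no-wait 68000 bus cycle, from the text
MMU_WAIT_STATES = 2   # 68451 used alone, from the text


def cycle_ns(clocks, clock_mhz=CLOCK_MHZ):
    """Length of a bus cycle in nanoseconds."""
    return clocks * 1000.0 / clock_mhz


def average_clocks(hit_rate, hit_clocks, miss_clocks):
    """Mean clocks per access, weighting hits and misses by hit rate."""
    return hit_rate * hit_clocks + (1.0 - hit_rate) * miss_clocks


slow_clocks = NO_WAIT_CLOCKS + MMU_WAIT_STATES
print("no-wait cycle: %.0f ns" % cycle_ns(NO_WAIT_CLOCKS))
print("2-wait cycle:  %.0f ns" % cycle_ns(slow_clocks))

# ASSUMPTION: these hit rates are hypothetical, and a miss is charged
# the 6-clock cycle; neither number comes from a measurement.
for hit_rate in (0.80, 0.90, 0.95):
    avg = average_clocks(hit_rate, NO_WAIT_CLOCKS, slow_clocks)
    print("hit rate %.2f: %.2f clocks (%.0f ns) per access"
          % (hit_rate, avg, cycle_ns(avg)))
```

At 12.5 MHz the no-wait cycle works out to 320 ns and the 2-wait
cycle to 480 ns.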

The 68020 provides a virtual cache inside the processor, neatly
avoiding the delays in address translation and main memory cycle time.
A hidden benefit is the fact that the internal cycles are synchronous,
avoiding the need to repeatedly sample the DTACK (actually DSACK)
asynchronous handshake line to prevent metastable states from
propagating into the chip.  ("You are not expected to understand
this.")  In short, an average of one byte is consumed off the
instruction stream on each clock cycle.  (The shortest instructions
require 2 cycles, and are two bytes long.)
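
To put the one-byte-per-clock figure in perspective, here is a small
calculation.  The byte and cycle counts are the ones quoted above;
the clock rate is an assumption (the same 12.5 MHz used for the 68000
earlier), not a stated figure for the 68020.

```python
# Instruction-stream consumption implied by the figures above.
BYTES_PER_CLOCK = 1.0       # average, from the text
SHORTEST_INSN_BYTES = 2     # from the text
SHORTEST_INSN_CLOCKS = 2    # from the text

# ASSUMPTION: hypothetical clock rate, borrowed from the 68000 example.
ASSUMED_CLOCK_MHZ = 12.5


def stream_bytes_per_second(clock_mhz):
    """Average instruction bytes consumed per second."""
    return BYTES_PER_CLOCK * clock_mhz * 1e6


def shortest_insns_per_second(clock_mhz):
    """Peak rate if every instruction were of the shortest kind."""
    return clock_mhz * 1e6 / SHORTEST_INSN_CLOCKS


print("instruction bytes/s:     %.0f"
      % stream_bytes_per_second(ASSUMED_CLOCK_MHZ))
print("shortest instructions/s: %.0f"
      % shortest_insns_per_second(ASSUMED_CLOCK_MHZ))
```

Note that the shortest instructions (2 bytes in 2 cycles) consume
exactly the average of one byte per clock.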

Compare this to the 32032.  There the state machine is unchanged from
the 16032.  It contains no cache.  The 16032 already underutilizes its
bus (in fact the 16008 is almost as fast, since the 8-byte prefetch
queue is almost always full).  The 32032 will only go slightly faster
than the 16032 in such circumstances.  It will, however, leave enough bus
time available that one could credibly run two processors on the same
bus!
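
The two-processor claim can be checked with a back-of-envelope model.
Everything numeric here is hypothetical: the 16032 utilization figure
is a placeholder, and the model assumes that doubling the bus width
halves the bus cycles needed to move the same byte stream, and that
arbitration costs nothing.

```python
def utilization_after_widening(util_narrow, width_ratio=2.0):
    """Bus utilization if the same byte stream moves over a wider bus."""
    return util_narrow / width_ratio


def processors_that_fit(util_per_cpu):
    """How many identical processors fit before the bus saturates."""
    if util_per_cpu <= 0.0:
        raise ValueError("utilization must be positive")
    return int(1.0 / util_per_cpu)


# ASSUMPTION: placeholder utilization for a 16032, not a measurement.
ASSUMED_16032_UTILIZATION = 0.8

u32 = utilization_after_widening(ASSUMED_16032_UTILIZATION)
print("32032 utilization under these assumptions: %.2f" % u32)
print("processors that fit on the bus: %d" % processors_that_fit(u32))
```

With real utilization figures substituted for the placeholder, the
same two functions would show whether a second processor actually
fits.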

All this discussion assumes that, with register-rich instruction sets,
most of the effect on system timing comes from the time needed to
access text (programs), and that data cycles have very little effect.
Does anyone have any hard numbers on 16k bus utilization or text/data
access ratios for either of these chips?

bob moore 
ihnp4!tropix!rcm