Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site oakhill.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxb!mhuxn!mhuxm!mhuxj!houxm!whuxlm!akgua!sdcsvax!sdcrdcf!hplabs!hao!seismo!ut-sally!oakhill!davet
From: davet@oakhill.UUCP (Dave Trissel)
Newsgroups: net.arch
Subject: Re: Re: Caltech's Cosmic Cube
Message-ID: <333@oakhill.UUCP>
Date: Fri, 8-Feb-85 04:50:31 EST
Article-I.D.: oakhill.333
Posted: Fri Feb  8 04:50:31 1985
Date-Received: Wed, 13-Feb-85 02:16:21 EST
Organization: Motorola Inc. Austin, Tx
Lines: 95

>>Dec 27's Electronic Design makes reference to a 64-node parallel processor
>>using 8086/87's having solved a high-order physics problem which, heretofore,
>>folk had only had the temerity to try out on a Cray.
>>      I'm curious.  Anyone know about this or know literature references?

>A machine consisting of 16 x {8086, 8087, 256kb} is known as a "Mark II".
>The architecture encourages (2^N)-node networks by making the maximum distance
>between nodes to be N links; hence, "hypercube".  I understand that different
>configurations  of the Mark II are being built, up to possibly 128-node.
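
For anyone who wants to see the hypercube property concretely, here is a
minimal sketch in Python.  It is not taken from the Caltech design; the
function names are made up for illustration, and it assumes the usual
binary-address scheme in which nodes are numbered 0 to 2^N - 1 and two nodes
are linked when their addresses differ in exactly one bit.

```python
def neighbors(node, dim):
    """Nodes directly linked to `node`: flip each of the `dim` address bits."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hops(a, b):
    """Minimum number of links between a and b: the count of differing bits."""
    return bin(a ^ b).count("1")


dim = 6                      # assumed example: 2**6 = 64 nodes
nodes = 2 ** dim
worst = max(hops(0, n) for n in range(nodes))
print(nodes, "nodes,", len(neighbors(0, dim)), "links per node,",
      "worst case", worst, "hops")
```

Under that assumption a 64-node machine needs six links per node, and no
message has to cross more than six links.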

I think it is important to size up the claims made for the power of multiple
microprocessors tied together in ANY configuration.  First let's look at the
raw power available.  The 8086 at 10 MHz (its highest rated speed) can do
at most 1.25 million integer operations per second (that's a 32-bit
register-to-register ADD).  The 8087 performs floating-point ADD and SUBTRACT
at 20 us a shot (MUL is around 30 and DIV is around 40) at its highest rated
speed of 5 MHz.  (Let's be good guys and forget for the moment that the 8086
cannot run faster than the 8087, which means it must run at 5 MHz, which
lowers its 32-bit integer add rate to .625 MIPS.)
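
As a rough check on those figures, the sketch below converts the quoted
latencies into peak rates.  It assumes back-to-back operations with no other
overhead and assumes that integer throughput scales linearly with clock
frequency; the function names are only illustrative.

```python
def ops_per_second(latency_us):
    """Peak rate for back-to-back operations taking latency_us microseconds each."""
    return 1e6 / latency_us


def scaled_mips(mips, rated_mhz, actual_mhz):
    """Assumption: throughput scales linearly with clock frequency."""
    return mips * actual_mhz / rated_mhz


# 8087 latencies at 5 MHz as quoted in the post (microseconds per operation).
latency_8087 = {"ADD/SUB": 20.0, "MUL": 30.0, "DIV": 40.0}
for op, us in latency_8087.items():
    print(f"8087 {op}: {ops_per_second(us) / 1e3:.1f} kflops")

# 8086 rated at 1.25 MIPS at 10 MHz, slowed to the 8087's 5 MHz.
print(f"8086 at 5 MHz: {scaled_mips(1.25, 10, 5):.3f} MIPS")
```

So a single 8087 peaks at roughly 50 thousand floating-point adds per second.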

Now the CRAY runs (I am quoting from memory but I don't think that I'm going
to be far off) scalar rates of 30 Megaflops and vector rates of over 80.
At the scalar rate of 30 Megaflops, and assuming no interconnect overhead or
idle time penalties on any of the 8087s, it would take about 600 8087s to
match the floating-point power of a CRAY!  That's right --- 600!  Even if the
cube had an array of 64 8086/8087 pairs, its power would only be about one
tenth that of a CRAY.  (Costwise though, 600 8086/8087 pairs would only run
about 200 grand - substantially cheaper than the CRAY.)
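
The arithmetic behind the 600 figure can be sketched as follows.  The
constants are the ballpark numbers quoted above (the CRAY rate is from
memory), perfect scaling is assumed, and the per-pair cost is simply the
200 grand divided evenly; the names are illustrative only.

```python
CRAY_SCALAR_FLOPS = 30e6    # scalar rate quoted above, from memory
FADD_8087_US = 20.0         # 8087 ADD/SUB latency at 5 MHz, as quoted
TOTAL_COST_DOLLARS = 200_000


def units_to_match(target_rate, latency_us):
    """Units needed to reach target_rate, assuming perfect scaling
    (no interconnect overhead, no idle time)."""
    return target_rate * latency_us / 1e6


needed = units_to_match(CRAY_SCALAR_FLOPS, FADD_8087_US)
fraction_64 = 64 / needed                     # share of a CRAY from 64 pairs
cost_per_pair = TOTAL_COST_DOLLARS / needed   # implied price of one pair

print(f"8087s needed: {needed:.0f}")
print(f"64 pairs deliver {fraction_64:.1%} of the CRAY scalar rate")
print(f"implied cost per pair: ${cost_per_pair:.0f}")
```

Any real interconnect or idle-time cost only pushes the required count higher.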

Assuming the same 30 MIPS figure for the CRAY integer processing, it would only
take about 50 8086's (at the 5 MHz rate; about 24 at 10 MHz) to match the CRAY.

Even though these are ballpark figures, I think the conclusion to be had is
quite obvious.  The cube does not approach the power of a CRAY.

>The next version, a "Mark III", is tentatively set to be 64 x {16 mhz 68020,
>68881, 1-4mb } for delivery in 1987.  For my purposes (massive discrete event
>simulations) that begins to look interesting.  I've heard claims that the
>68020/68881 pair is faster than a VAX-11/780...can someone comment on this?

Well, true and false.  At non-floating-point operations the '020 runs from
20 percent to 80 percent faster than the 780.  For floating-point (DEC gives
out no timings) we figure the 780 is slightly faster for single precision,
slightly slower for double and extended, and moderately slower at
transcendentals.  So the result is that the MC68020/881 combination is from
about the same to 80 percent faster than the VAX 11/780, depending upon what
you are doing.

Let's make the same ballpark comparison with the CRAY.  Floating ADD/SUB is
about 2.3 us on the MC68881.  That still means you would need about 44 881s
to match the CRAY's 30 Megaflops.  This is a little more
encouraging, as forty-four of something is more manageable than 600 of
something.  The MC68020 runs 32-bit register-to-register operations at an
impressive 8 MIPS, which would indicate that only four MC68020's would be
needed to approach the integer power of a CRAY.  (I am assuming a 30 MIPS
figure here for the CRAY.  Corrections welcomed from those in the know.  Sorry,
but my CRAY manual is in storage.)
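
The same back-of-the-envelope calculation for the Motorola parts is sketched
below, again assuming perfect scaling and the assumed 30 Megaflops / 30 MIPS
CRAY figures; the helper names are illustrative.  Note that taking 2.3 us at
face value yields about 69 units rather than 44 (44 units would correspond to
roughly 1.5 us per operation), so treat either number as approximate.

```python
CRAY_FLOPS = 30e6   # assumed scalar floating-point rate, as above
CRAY_MIPS = 30.0    # assumed integer rate, as above


def fpus_needed(latency_us):
    """Coprocessors needed to match CRAY_FLOPS at the given ADD/SUB latency."""
    return CRAY_FLOPS * latency_us / 1e6


def cpus_needed(mips):
    """Processors needed to match CRAY_MIPS at the given per-chip MIPS."""
    return CRAY_MIPS / mips


def implied_latency_us(units):
    """Per-operation latency at which `units` coprocessors reach CRAY_FLOPS."""
    return units / CRAY_FLOPS * 1e6


print(f"MC68881 at 2.3 us per ADD/SUB: {fpus_needed(2.3):.0f} units")
print(f"MC68020 at 8 MIPS: {cpus_needed(8.0):.2f} units")
print(f"latency implied by 44 units: {implied_latency_us(44):.2f} us")
```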

Fermi Lab near Chicago has a serious proposal to build a CRAY-power-equivalent
MC68020 multiprocessor system.  I have seen their prototype
running on MC68000s, and it, along with the software they have developed,
is truly impressive.  They are running ABSOFT FORTRAN on each node with
a VAX 780 controlling the whole thing.  However, their nodes do not seem
to be as closely coupled as those described here for the Cube.
I will post a synopsis of that machine if people are interested.

>I've also heard a rumor that a major firm plans to market its own Intel
>386-based hypercube.  I don't know enough about the 386 performance or
>schedule to know when this would be or whether the 68020 would be better.

   <<>>
We at Motorola have heard rumors that the 386 is in for its fourth redesign
and that the on-chip instruction cache is now being abandoned.  I fail to see
how any high-performance chip can be effective without an on-chip cache
of some type.  (The EDN benchmarks on the MC68020 show an improvement of over
25 percent when the cache is turned on.)  Intel's sales pitch may give
a clue about the 386's status.  It is a polished presentation which attempts
to prove that you don't need 32 bits for anything, and that the MC68020 is
overkill.

>The problem of effectively using this computing power is non-trivial
>(ask the folks with Illiac IV). ...
>        Joel West
>        CACI, Inc. - Federal 3344 N. Torrey Pines Ct La Jolla 92037
>        jww@bonnie.UUCP (ihnp4!bonnie!jww)
>        westjw@nosc.ARPA

I would have agreed 100 percent with that statement before I saw the Fermi
Lab demo.  Now I'm not so sure.  It may be non-trivial, but now I don't think
it's too difficult to tackle either.

Of course, all responses welcome.

Motorola Semiconductor Inc.                Dave Trissel
Austin, Texas          {ctvax,seismo,gatech,ihnp4}!ut-sally!oakhill!davet