Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!oliveb!pyramid!prls!mips!larry
From: larry@mips.UUCP
Newsgroups: comp.arch,comp.org.usenix
Subject: Re: Benchmarking the 532, 68030, MIPS, 386...at a Usenix!
Message-ID: <396@gumby.UUCP>
Date: Fri, 15-May-87 21:29:26 EDT
Article-I.D.: gumby.396
Posted: Fri May 15 21:29:26 1987
Date-Received: Sat, 16-May-87 20:59:27 EDT
References: <324@dumbo.UUCP> <809@killer.UUCP> <2417@homxa.UUCP> <4294@nsc.nsc.com> <2128@hoptoad.uucp> <826@rtech.UUCP>
Reply-To: larry@gumby.UUCP (Larry Weber)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 88
Xref: utgpu comp.arch:1214 comp.org.usenix:161

In article <826@rtech.UUCP> daveb@rtech.UUCP (Dave Brower) writes:
>In article <2128@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>>
>>	Let's have the bake-off in the trade show at, say, next Winter
>>	Usenix.  Probably the actual setup and running of the benchmarks
>>	can be done a day or two before the show, so the results can be
>>	printed for distribution, and to give the losers time to think
>>	up (and print up) good explanations before we descend on them :-).
>>
>>	Let's also make the same setup of machines available for people
>>	to run their own benchmarks...
>
>At last winter's Uniforum, I went around to a number of booths trying to
>run the infamous
>
>	/bin/time bc << !
>	2^4096
>	!
>
>At a distressing number of places the sales creatures in the booth would
>say things like, "I don't believe we're interested in running any
>benchmarks today.  Let me show you vi."  Now there are some good reasons
>for this, but it sure sounded like there was something being hidden.
I think we should aim for the bake-off to be done through the respective
engineering staffs.  I really like the sales folks, but this is really a
technical endeavor.  Having the benchmarks at a show is a wonderful
idea: it gives the engineering staffs a chance to explain, brag, boast,
or promise their results to lots of people.  By having each machine start
with a 'clean' benchmark tape we can remove all doubt about whether everyone
used exactly the same sources and ran under the same conditions.

>Problem 1 is getting some benchmarks run.  Problem 2 is trying to get a
>straight answer on the price of the system.  What you really want is the
>bang/buck of different benchmarks on different boxes.  The results would
>be embarrassing to many people wearing suits, which is why it may be
>difficult to get a lot of cooperation.
I think the benchmarks should be made available well in advance, and to
the 'world' at large.  There is too much comparison of machines using
different definitions of performance.  This activity would perform a
valuable service for the industry.

>PS:  Given my druthers, I'd like to see:
>
>	* the bc benchmark above
>	* Dhrystone
>	* Whetstones
>	* A paging thrasher.
>	* A system call overhead checker (looped getpid()s maybe).
>	* A process thrasher.
>
>I'd probably give up on disk speed and tty i/o.
The benchmarks should strive to illustrate how real-world programs run
on the machines.  Dhrystone, maligned as it is, is useful only as one of
a number of larger programs, and we will need to carefully document each
program with the range of optimizations used.  A page thrasher would be
wonderful, BUT it is so dependent on the I/O system, configuration, page
size, MMU ... in fact on so many things that I suspect it wouldn't be
useful.
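
Dave's system-call overhead checker, on the other hand, is easy to make
self-contained.  Here is a minimal sketch of what I have in mind - the
one-million loop count is an arbitrary choice of mine, so raise it until
the run swamps the timing error, and run it under /bin/time the same way
as the bc test above:

	/* syscall.c - crude system call overhead test: just make N
	 * getpid() calls and let /bin/time report the system time.
	 * The count N is arbitrary; pick it large enough that the
	 * run time is well clear of clock granularity.
	 */
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		long i, n = 1000000;

		for (i = 0; i < n; i++)
			(void) getpid();
		printf("did %ld getpid() calls\n", n);
		return 0;
	}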

I encourage the readers of this group to search for real programs that
range from modest to large size (maybe a couple of hundred Kbytes) and
that can be run without elaborate setup.  They should:
	Be easily checked for correctness (a sketch of what I mean
	  follows this list).
	Not rely on system files (e.g., a grep of /etc/passwd).
	Not use any system commands; if you want to grep, then the
	  code should be part of the benchmark.
	Be examples of integer, single and double precision floating
	  point, character oriented, and pointer oriented code - in
	  short, a nice mix of different application areas.
	Run long enough to be meaningful - none of these 0.1u times
	  that have more timing error than meaning.
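To make the 'easily checked' and 'long enough' points concrete, here is
a minimal sketch of the shape such a benchmark might take; the work()
kernel, the repetition count, and the expected answer are placeholders
for whatever real program is adopted:

	/* selfcheck.c - skeleton of a self-checking benchmark.  The
	 * work() kernel and EXPECTED value are stand-ins; a real
	 * benchmark would put its own computation and known-good
	 * answer here.
	 */
	#include <stdio.h>

	#define REPS	 100		/* repeat so the run is long enough */
	#define EXPECTED 49995000L	/* known-good answer for work() */

	static long work(void)
	{
		long i, sum = 0;

		for (i = 0; i < 10000; i++)	/* stand-in computation */
			sum += i;
		return sum;
	}

	int main(void)
	{
		long r = 0;
		int rep;

		for (rep = 0; rep < REPS; rep++)
			r = work();

		if (r != EXPECTED) {		/* correctness check */
			printf("WRONG ANSWER: got %ld\n", r);
			return 1;
		}
		printf("ok: %d reps\n", REPS);
		return 0;
	}
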
My suggestions include:
	Common benchmarks	Dhrystone, Whetstone, Linpack, Stanford
	Real programs		Doduc, Timberwolf, UCB Spice, YACC,
				  the C compiler (from Stallman)

We should agree ahead of time on how the results are to be reported.  I
suggest that we list individual results under specific conditions and
have some weighting method to give a simple overall result.  Maybe the
organizing group could select a base machine and weight the values so
that the base machine comes out at one.  The VAX 11/780 is often used
for this - so why not use it?
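
As one possible weighting - my own guess at how "the base machine is
one" might be carried out, not an agreed method - each result could be
expressed as the ratio of 11/780 time to test-machine time, and the
ratios combined with a geometric mean so that no single benchmark
dominates.  A sketch, with made-up ratios purely for illustration:

	/* weight.c - combine per-benchmark ratios into one figure.
	 * Each entry is (base machine time) / (test machine time),
	 * so the base machine scores exactly 1.0 on every entry.
	 * The geometric mean keeps one benchmark from dominating.
	 */
	#include <stdio.h>
	#include <math.h>

	int main(void)
	{
		double ratio[] = { 1.8, 2.3, 1.5, 2.0 };  /* hypothetical */
		int i, n = sizeof(ratio) / sizeof(ratio[0]);
		double logsum = 0.0;

		for (i = 0; i < n; i++)
			logsum += log(ratio[i]);

		printf("geometric mean: %.2f x base machine\n",
		       exp(logsum / n));
		return 0;
	}

(Compile with "cc weight.c -lm".)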

It is very good that non-vendors get involved to make sure that fair
representation is preserved.  Maybe the Uniforum organizing committee
can help identify the leaders.  Or maybe one of you wants to take the
lead.  Perhaps it will be known as the X suite, where X is YOU.

LET'S DO IT...