Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!oliveb!pyramid!prls!mips!larry
From: larry@mips.UUCP
Newsgroups: comp.arch,comp.org.usenix
Subject: Re: Benchmarking the 532, 68030, MIPS, 386...at a Usenix!
Message-ID: <396@gumby.UUCP>
Date: Fri, 15-May-87 21:29:26 EDT
Article-I.D.: gumby.396
Posted: Fri May 15 21:29:26 1987
Date-Received: Sat, 16-May-87 20:59:27 EDT
References: <324@dumbo.UUCP> <809@killer.UUCP> <2417@homxa.UUCP> <4294@nsc.nsc.com> <2128@hoptoad.uucp> <826@rtech.UUCP>
Reply-To: larry@gumby.UUCP (Larry Weber)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 88
Xref: utgpu comp.arch:1214 comp.org.usenix:161

In article <826@rtech.UUCP> daveb@rtech.UUCP (Dave Brower) writes:
>In article <2128@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
>>
>> Let's have the bake-off in the trade show at, say, next Winter
>> Usenix.  Probably the actual setup and running of the benchmarks
>> can be done a day or two before the show, so the results can be
>> printed for distribution, and to give the losers time to think
>> up (and print up) good explanations before we descend on them :-).
>>
>> Let's also make the same setup of machines available for people
>> to run their own benchmarks...
>
>At last winter's Uniforum, I went around to a number of booths trying to
>run the infamous
>
>	/bin/time bc << !
>	2^4096
>	!
>
>At a distressing number of places the sales creatures in the booth would
>say things like, "I don't believe we're interested in running any
>benchmarks today.  Let me show you vi."  Now there are some good reasons
>for this, but it sure sounded like there was something being hidden.

I think we should aim for the bake-off to be run by the respective
engineering staffs.  I really like the sales folks, but this is a
technical endeavor.  Having the benchmarks at a show is a wonderful
idea: it gives the engineering staffs a chance to explain, brag,
boast, or promise their results to lots of people.  By having each
machine start with a 'clean' benchmark tape we can remove all doubt
about whether everyone used exactly the same sources and ran under
the same conditions.

>Problem 1 is getting some benchmarks run.  Problem 2 is trying to get a
>straight answer on the price of the system.  What you really want is the
>bang/buck of different benchmarks on different boxes.  The results would
>be embarrassing to many people wearing suits, which is why it may be
>difficult to get a lot of cooperation.

I think the benchmarks should be published well in advance and made
available to the 'world'.  There is too much comparison of machines
using different definitions of performance; this activity would
perform a valuable service for the industry.

>PS: Given my druthers, I'd like to see:
>
>	* the bc benchmark above
>	* Dhrystone
>	* Whetstones
>	* A paging thrasher.
>	* A system call overhead checker (looped getpid()s maybe).
>	* A process thrasher.
>
>I'd probably give up on disk speed and tty i/o.

The benchmarks should strive to illustrate how real-world programs
run on the machines.  Dhrystone, as maligned as it is, is useful only
if it is one of a number of larger programs - we will need to
carefully document the program with a range of optimizations.  A page
thrasher would be wonderful, BUT it is highly dependent on the I/O
system, configuration, page size, MMU ... in fact on so many things
that I suspect it wouldn't be useful.
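Dave's system-call checker, on the other hand, is cheap to carry
along.  Here is a minimal sketch of what I imagine - the million-call
loop count and the crude one-second time() timing are just my
assumptions; scale the count up until the run is long enough to
measure, or just run the whole thing under /bin/time:

	#include <stdio.h>
	#include <unistd.h>
	#include <time.h>

	#define NCALLS 1000000L		/* assumed loop count */

	int main(void)
	{
	    long i;
	    time_t start, elapsed;

	    start = time(NULL);
	    for (i = 0; i < NCALLS; i++)
	        (void) getpid();	/* about the cheapest system call there is */
	    elapsed = time(NULL) - start;

	    printf("%ld getpid() calls in ~%ld seconds\n",
	           NCALLS, (long) elapsed);
	    return 0;
	}

Divide seconds by calls and the per-call overhead falls out.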
I encourage the readers of this group to search for real programs
that range from modest to large size (maybe a couple of hundred
Kbytes) and that can be run without elaborate setup.  They should:

	* Be easily checked for correctness.
	* Not rely on system files (e.g., grep of passwd).
	* Not use any system commands; if you want to grep, then the
	  code should be part of the benchmark.
	* Be examples of integer, single- and double-precision
	  floating point, character-oriented, and pointer-oriented
	  work - in short, a nice mix of different application areas.
	* Run long enough to be meaningful - none of these 0.1u times
	  that have more timing error than meaning.

My suggestions include:

	Common benchmarks:	Dhrystone, Whetstone, Linpack, Stanford
	Real programs:		Doduc, Timberwolf, UCB Spice, YACC,
				C compiler (from Stallman)

We should agree ahead of time how the results are to be reported.  I
suggest that we list individual results under specific conditions and
have some weighting method to give a simple result.  Maybe the
organizing group could select a base machine and weight the values so
that the base machine comes out at one.  The VAX 11/780 is often used
for this - so why not use it?

It is very good that non-vendors get involved to make sure that fair
representation is preserved.  Maybe the Uniforum organizing committee
can help identify the leaders.  Or maybe one of you wants to take the
lead.  Perhaps it will be known as the X suite, where X is YOU.

LET'S DO IT...
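PS: To make the weighting idea concrete, here is a minimal sketch.
The benchmark names, run times, and equal weights are all made up for
illustration, and the weighted geometric mean is just one candidate
combining rule - the actual weights and rule are exactly what the
organizing group would have to agree on:

	#include <stdio.h>
	#include <math.h>

	#define NBENCH 4

	int main(void)
	{
	    /* hypothetical times in seconds; base machine = VAX 11/780 */
	    static const char *name[NBENCH] =
	        { "Dhrystone", "Whetstone", "Linpack", "Stanford" };
	    static const double base[NBENCH]   = { 100.0, 80.0, 120.0, 60.0 };
	    static const double test[NBENCH]   = {  25.0, 40.0,  30.0, 20.0 };
	    static const double weight[NBENCH] = { 0.25, 0.25, 0.25, 0.25 };
	    double score = 1.0;
	    int i;

	    for (i = 0; i < NBENCH; i++) {
	        double ratio = base[i] / test[i];  /* >1 means faster than the 780 */
	        printf("%-10s %5.2f x 780\n", name[i], ratio);
	        score *= pow(ratio, weight[i]);    /* weighted geometric mean */
	    }
	    printf("overall:   %5.2f x 780 (the 780 itself scores 1.00)\n", score);
	    return 0;
	}

Compile with "cc score.c -lm".  By construction the base machine
scores exactly one, which is the property I was after above.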