Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!pasteur!ucbvax!bloom-beacon!gatech!hubcap!kruskal
From: kruskal@mimsy.UUCP (Clyde P. Kruskal)
Newsgroups: comp.parallel
Subject: Re: Breakthrough at Sandia?  + Iso-efficiency function
Message-ID: <1317@hubcap.UUCP>
Date: 6 Apr 88 12:05:57 GMT
Sender: fpst@hubcap.UUCP
Lines: 93
Approved: parallel@hubcap.clemson.edu

In article <1252@hubcap.UUCP> ut-sally!kumar@uunet.UU.NET (Vipin Kumar) writes:
>
>Clearly, it is fair to increase the problem size to test speedup
>on more processors.  If the problem size is fixed, then 
>the speedup will at best saturate at some point. (Each program
>has some (however small) fixed sequential component, which
>puts a ceiling on the obtainable speedup).
>Actually for many parallel algorithms, it is possible to obtain linear speedup
>for arbitrarily large N (the number of processors)
>by simply increasing the problem size.
>
>A more interesting question to ask is: how fast should
>the problem size (measured in terms of sequential execution time) 
>grow as a function of N  to maintain linear speedup (i.e., constant
>efficiency)?  In other words, what is the iso-efficiency function
>of the parallel formulation? (If the problem size needs to grow as
>f(N) to maintain constant efficiency, then the iso-efficiency function
>of the parallel formulation is f(N).) 
> 
	Discussion of iso-efficiency deleted
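
To make the quoted iso-efficiency idea concrete, here is a toy sketch of my
own (the cost model is an assumption for illustration, not anything from
Kumar's article): summing N numbers on P processors with an assumed parallel
time of N/P + log2(P).  With the problem size held fixed, efficiency decays
as P grows; growing N roughly like P log P holds it constant, so the
iso-efficiency function of this toy formulation is on the order of P log P.

# Python sketch under the assumed cost model above (hypothetical, for
# illustration only):
#   t(N)   = N               sequential time to sum N numbers
#   T(P,N) = N/P + log2(P)   assumed parallel time on P processors
from math import log2

def efficiency(N, P):
    # efficiency = sequential time / (P * parallel time)
    return N / (P * (N / P + log2(P)))

for P in [2, 16, 128, 1024]:
    N_fixed  = 10**4                   # fixed problem size: efficiency decays
    N_scaled = 100 * int(P * log2(P))  # N grown like P log P: efficiency holds
    print(P, round(efficiency(N_fixed, P), 3), round(efficiency(N_scaled, P), 3))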


Basically I agree with everything written here ...
but as long as the issue has arisen I want to add some
historical information and talk about some relevant recent work. 

The idea that the rate at which the efficiency grows is an important
concept appears in my thesis [1].  This was actually joint work
with Allan Gottlieb and later appeared in JACM [2].
The idea also appears in a paper of Vitter and Simons [3].
Some practical aspects are discussed in a paper
from a NATO summer school [4]; in particular it talks
about Amdahl's Law and other alleged bounds on parallel speedup.

Recently, Larry Rudolph, Marc Snir, and I have returned
to this subject and obtained the following (theoretical) results [5]:

We define an algorithm to be in the class PE (parallel efficient)
if the parallel time for a problem of size N on P processors is
		T(P,N) = O(t(N)/P + P^k) ,
where t(N) is the time of the ("best") sequential algorithm.
I.e., PE captures the problems with parallel algorithms
that achieve optimal speedup over the optimal sequential time
for all instances whose execution times are relatively (polynomially)
large with respect to the number of processors.  Notice that
it is sequential time t(N) we compare against rather than problem size N;
this is relevant for problems that have superpolynomial execution times.
(Actually, for other reasons, we define PE somewhat differently
but this definition is equivalent.)  We argue that this is the concept
of parallel performance that users of parallel machines desire.
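
To see what the definition demands, a toy example of my own (not one from
the paper): take sorting, with t(N) = N log N, and suppose a parallel
algorithm runs in time T(P,N) = (N log N)/P + P^2.  That is the PE form with
k = 2, and the speedup t(N)/T(P,N) approaches P once t(N) is polynomially
larger than P (here, once N log N dominates P^3).

# Python sketch of the assumed PE-form running time (hypothetical numbers):
#   t(N)   = N log N          "best" sequential sorting time
#   T(P,N) = t(N)/P + P**2    assumed parallel time, PE form with k = 2
from math import log2

def speedup(N, P):
    t = N * log2(N)
    T = t / P + P**2
    return t / T

P = 64
for N in [10**3, 10**5, 10**7]:
    print(N, round(speedup(N, P), 1))   # climbs toward P = 64 as N grows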

We prove that the class PE is invariant under the shared
memory models (PRAMs), independent of whether the machines
are synchronous or asynchronous, SIMD or MIMD, or have small
or large latency for the processors to communicate.
We also show that PE is invariant under the completely
connected memory module machines for probabilistic algorithms.

We define the class PE* of problems with almost efficient algorithms.
This class is invariant across PRAMs and down to the memory module
machines with the ability to route quickly, such as Hypercube
and Butterfly machines.

The paper is quite long: there are other definitions and results,
but enough said for now.

[1]	C. P. Kruskal, Upper and Lower Bounds on the Performance
	of Parallel Algorithms, Ph.D. Thesis, NYU, 1981.

[2]	A. Gottlieb and C. P. Kruskal,
	Complexity results for permuting data and other
	computations on parallel processors.
	Journal of the Association for Computing Machinery 31
	(1984) 193-209.
	
[3]	J. S. Vitter and R. A. Simons,
	New classes for parallel complexity: a study of unification and other
	complete problems for P.
	IEEE Transactions on Computers TC-35
	(1986) 403-418.
	
[4]	C. P. Kruskal,
	Performance bounds on parallel processors: an optimistic view.
	Control Flow and Data Flow: Concepts of Distributed Programming,
	NATO ASI series, Series F: Computer and System Sciences, Vol. 14,
	Manfred Broy (ed.), Springer-Verlag (1985) 331-344.
	
[5]	C. P. Kruskal, L. Rudolph, and M. Snir,
	A complexity theory of efficient parallel algorithms,
	Technical Report (RC 13572 (#60702) 3/4/88),
	IBM T. J. Watson Research Center, 1988.

	An extended abstract will appear in ICALP, July 88
	(Intl. Colloquium on Automata, Languages, and Programming).