Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!pasteur!ucbvax!bloom-beacon!gatech!hubcap!kruskal
From: kruskal@mimsy.UUCP (Clyde P. Kruskal)
Newsgroups: comp.parallel
Subject: Re: Breakthrough at Sandia? + Iso-efficiency function
Message-ID: <1317@hubcap.UUCP>
Date: 6 Apr 88 12:05:57 GMT
Sender: fpst@hubcap.UUCP
Lines: 93
Approved: parallel@hubcap.clemson.edu

In article <1252@hubcap.UUCP> ut-sally!kumar@uunet.UU.NET (Vipin Kumar) writes:
>
>Clearly, it is fair to increase the problem size to test speedup
>on more processors. If the problem size is fixed, then
>the speedup will at best saturate at some point. (Each program
>has some (however small) fixed sequential component, which
>puts a ceiling on the obtainable speedup.)
>Actually, for many parallel algorithms, it is possible to obtain linear
>speedup for arbitrarily large N (the number of processors)
>by simply increasing the problem size.
>
>A more interesting question to ask is: how fast should
>the problem size (measured in terms of sequential execution time)
>grow as a function of N to maintain linear speedup (i.e., constant
>efficiency)? In other words, what is the iso-efficiency function
>of the parallel formulation? (If the problem size needs to grow as
>f(N) to maintain constant efficiency, then the iso-efficiency function
>of the parallel formulation is f(N).)
>
    [Discussion of iso-efficiency deleted]

Basically I agree with everything written here ... but as long as the
issue has arisen I want to add some historical information and talk
about some relevant recent work.

The idea that the rate at which efficiency grows is an important concept
appears in my thesis [1].  This was actually joint work with Allan
Gottlieb and later appeared in JACM [2].  The idea also appears in a
paper of Vitter and Simons [3].  Some practical aspects are discussed in
a paper from a NATO summer school [4]; in particular, it discusses
Amdahl's Law and other alleged bounds on parallel speedup.

Recently, Larry Rudolph, Marc Snir, and I have returned to this subject
and obtained the following (theoretical) results [5]:

We define an algorithm to be in the class PE (parallel efficient) if the
parallel time for a problem of size N on P processors is

    T(P,N) = O( t(N)/P + P^k ),

where t(N) is the time of the ("best") sequential algorithm.  That is,
PE captures the problems with parallel algorithms that achieve optimal
speedup over the optimal sequential time, for all problems whose
execution times are relatively (polynomially) large with respect to the
number of processors.  Notice that it is the sequential time t(N) we
compare to, rather than the problem size N; this is relevant for
problems that have superpolynomial execution times.  (Actually, for
other reasons, we define PE somewhat differently, but this definition is
equivalent.)  We argue that this is the concept of parallel performance
that users of parallel machines desire.

We prove that the class PE is invariant under the shared-memory models
(PRAMs), independent of whether the machines are synchronous or
asynchronous, SIMD or MIMD, or have small or large latency for the
processors to communicate.  We also show that PE is invariant under the
completely connected memory-module machines for probabilistic
algorithms.

We define the class PE* of problems with almost efficient algorithms.
This class is invariant across PRAMs and down to the memory-module
machines with the ability to route quickly, such as Hypercube and
Butterfly machines.

The paper is quite long: there are other definitions and results, but
enough said for now.
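To make the iso-efficiency function and the PE definition a bit more
concrete, here is a small illustration of my own (a sketch only, not
taken from [5] or from Kumar's posting), written as a short Python
program just to do the arithmetic.  It assumes the standard cost model
for summing n numbers on P processors: sequential time t(n) = n-1 and
parallel time T(P,n) = n/P + log2(P) (local sums followed by a
reduction tree).

# Hedged sketch, assuming the parallel-summation cost model above:
#   sequential time  t(n)   = n - 1            (n-1 additions)
#   parallel time    T(P,n) = n/P + log2(P)    (local sums, then a reduction tree)
# Since log2(P) <= P, T(P,n) = O(t(n)/P + P^1), so this algorithm is in PE
# with k = 1.  Efficiency is E = t(n) / (P * T(P,n)); holding E constant
# forces n to grow as Theta(P log P), so the iso-efficiency function here
# is f(P) = Theta(P log P).

import math

def efficiency(n, P):
    """Efficiency under the parallel-sum cost model described above."""
    t_seq = n - 1
    t_par = n / P + math.log2(P)
    return t_seq / (P * t_par)

if __name__ == "__main__":
    print("     P   fixed n=10^5   n grown as ~1000*P*log2(P)")
    for P in (2, 64, 2048, 65536):
        n_fixed = 10**5                            # fixed problem size: efficiency decays
        n_scaled = 1000 * P * int(math.log2(P))    # scaled problem size: efficiency stays flat
        print(f"{P:6d}   {efficiency(n_fixed, P):6.3f}         {efficiency(n_scaled, P):6.3f}")

With the problem size held fixed, the efficiency column decays toward
zero as P grows (the saturation Kumar describes); growing n roughly as
P log P holds the efficiency essentially constant.
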
[1] C. P. Kruskal, Upper and Lower Bounds on the Performance of
    Parallel Algorithms, Ph.D. Thesis, NYU, 1981.

[2] A. Gottlieb and C. P. Kruskal, Complexity results for permuting
    data and other computations on parallel processors, Journal of the
    Association for Computing Machinery 31 (1984) 193-209.

[3] J. S. Vitter and R. A. Simons, New classes for parallel complexity:
    a study of unification and other complete problems for P, IEEE
    Transactions on Computers TC-35 (1986) 403-418.

[4] C. P. Kruskal, Performance bounds on parallel processors: an
    optimistic view, Control Flow and Data Flow: Concepts of
    Distributed Programming, NATO ASI Series, Series F: Computer and
    System Sciences, Vol. 14, Manfred Broy (ed.), Springer-Verlag
    (1985) 331-344.

[5] C. P. Kruskal, L. Rudolph, and M. Snir, A complexity theory of
    efficient parallel algorithms, Technical Report RC 13572 (#60702),
    March 4, 1988, IBM T. J. Watson Research Center.  An extended
    abstract will appear in ICALP, July 88 (Intl. Colloquium on
    Automata, Languages and Programming).