Usenet

Path: utzoo!mnetor!uunet!husc6!bbn!gatech!hubcap!ut-sally!kumar
From: ut-sally!kumar@uunet.UU.NET (Vipin Kumar)
Newsgroups: comp.parallel
Subject: Re: Breakthrough at Sandia?  + Iso-efficiency function
Message-ID: <1373@hubcap.UUCP>
Date: 12 Apr 88 12:33:07 GMT
Sender: fpst@hubcap.UUCP
Lines: 64
Summary: Iso-efficiency functions & Parallel Efficient Algorithms
Approved: parallel@hubcap.clemson.edu

In article <1317@hubcap.UUCP>, kruskal@mimsy.UUCP (Clyde P. Kruskal) writes:
> 
> 
> We define an algorithm to be in the class PE (parallel efficient)
> if the parallel time for a problem of size N on P processors is
> 		T(P,N) = O(t(N)/P + P^k) ,
> where t(N) is the time of the ("best") sequential algorithm.
> I.e., PE captures the problems with parallel algorithms
> that achieve optimal speedup over the optimal sequential time
> for all problems whose execution times are relatively (polynomially)
> large with respect to the number of processors.  Notice that
> it is sequential time t(N) we compare to rather problem size N; this
> is relevant for problems that have super polynomial executions times.
> (Actually, for other reasons, we define PE somewhat differently
> but this definition is equivalent.)  We argue that this is the concept
> of parallel performance that users of parallel machine desire.
> 
> We prove that the class PE is invariant under the shared
> memory models (PRAMs), independent of whether the machines
> are synchronous or asynchronous, SIMD or MIMD, or have small
> or large latency for the processors to communicate.
> We also show that PE is invariant under the completely
> connected memory module machines for probabilistic algorithms.
> 

According to the above definition of PE, any parallel algorithm
with polynomial iso-efficiency function is a parallel efficient algorithm.
This definition of PE appears to be too weak.  The ideal parallel
formulation should have a linear iso-efficiency function (O(P), where
P is the number of processors).

In references [1] and [2], we have  3 different parallel formulations 
of depth-first search on the ring architecture with iso-efficiency
functions of O(k^P), O(P^3 log P), O(P^2 log P), respectively.
Clearly the first formulation is not in PE (its iso-efficiency
function is exponential). The second and third formulations are
in PE, but the third formulation is far superior to the
second, as it has a better iso-efficiency function. (See [1] and [2]).

The choice of architecture is also very important.  For a shared
memory machine (with message combining), we have a parallel
depth-first search formulation whose iso-efficiency function
is O(P log P), which is close to linear (and is significantly
better than O(P^2 log P)).


I think that the users of parallel machines should look for
parallel formulations whose iso-efficiency functions are
linear (or almost linear).


Vipin
-----

References:

[1] Parallel Depth First Search on the Ring Architecture
by Vipin Kumar, V. Nageshwara Rao and K. Ramesh.
To appear in the proc. of 1988 Parallel Processing Conference.


[2] Parallel Depth-First on Multiprocessors, Part II: Analysis
by Vipin Kumar and V. Nageshwar Rao.
To appear in International Journal of Parallel programming.