Parallel external memory

This is an old revision of this page, as edited by Mwoelkde (talk | contribs) at 21:00, 21 January 2019 (Removed collapse templates according to https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style#Scrolling_lists_and_collapsible_content). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Parallel external memory (model)

In computer science, a parallel external memory (PEM) model is a cache-aware, external-memory abstract machine.[1] It is the parallel-computing analogy to the single-processor external memory (EM) model. In a similar way, it is the cache-aware analogy to the parallel random-access machine (PRAM). The PEM model consists of a number of processors, together with their respective private caches and a shared main memory.

Model

...

Read / Write conflicts

...

Complexity measure

...

Examples

Prefixsum

...

Multiway partitioning

...

Selection

The selection problem is about finding the k-th smallest item in an unordered list   of size  . The following code[1] makes use of PRAMSORT which is a PRAM optimal sorting algorithm which runs in  , and SELECT, which is a cache optimal single-processor selection algorithm.

if   then 
   
  return  
end if 

for each processor   in parallel do 
  //Find median of each  
   
end for 

// Sort medians
 

// Partition around median of medians
 

if   then 
  return  
else 
  return  
end if

Under the assumption that the input is stored in contiguous memory, PEMSELECT has an I/O complexity of:

 

Distribution sort

Distribution sort partitions an input list   of size   into   disjoint buckets of similar size. Every bucket is then sorted recursively and the results are combined into a fully sorted list.

If   the task is delegated to a cache-optimal single-processor sorting algorithm.

Otherwise the following algorithm[1] is used:

// Sample   elements from  
for each processor   in parallel do
  if   then
     
    Load   in  -sized pages and sort pages individually
  else
     
    Load and sort   as single page
  end if
  Pick every  'th element from each sorted memory page into contiguous vector   of samples
end for 

in parallel do
  Combine vectors   into a single contiguous vector  
  Make   copies of  :  
end do

// Find   pivots  
for   to   in parallel do
   
end for

Pack pivots in contiguous array  

// Partition  around pivots into buckets  
 

// Recursively sort buckets
for   to   in parallel do
  recursively call   on bucket  of size  
  using   processors responsible for elements in bucket  
end for

The I/O complexity of PEMDISTSORT is:

 

where

 

If the number of processors is chosen that  and   the I/O complexity is then:

 

Other PEM algorithms

PEM Algorithm I/O complexity Constraints
Mergesort[1]    
List ranking[2]    
Euler tour[2]    
Expression tree evaluation[2]    
Finding a MST[2]    

Where  is the time it takes to sort  items with  processors in the PEM model.

See also

  • ...

References

  1. ^ a b c d Arge, Lars; Goodrich, Michael T.; Nelson, Michael; Sitchinava, Nodari (2008). "Fundamental parallel algorithms for private-cache chip multiprocessors". Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures - SPAA '08. New York, New York, USA: ACM Press. doi:10.1145/1378533.1378573. ISBN 9781595939739.
  2. ^ a b c d Arge, Lars; Goodrich, Michael T.; Sitchinava, Nodari (2010). "Parallel external memory graph algorithms". 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). IEEE. doi:10.1109/ipdps.2010.5470440. ISBN 9781424464425.