Efficient Chip Multi Processor Programming

Master's thesis by Sune Tougaard-Andersen (September 2011)


Abstract

In this work a realistic machine model, the CMP-model, is investigated. This model captures the cache hierarchies on mainstream CMPs as well as the way these caches interact.

A parallel programming library for benchmarking is presented. The presented library introduces policy based scheduling, allowing fair evaluation of scheduling algorithms. The parallel-depth-first scheduler theoretically performs better than the widely used work-stealing scheduler, in the CMP-cache model. Both schedulers are implemented in the benchmarking library.

Efficient parallelizations of the well-known sequential sorting algorithms quicksort and multi-way mergesort are analysed based on the CMP-cache model. The parallel quicksort algorithm is based on a parallelization of the in-place sequential partitioning algorithm. The parallel multi-way mergesort is based on a f -way partitioning algorithm that makes it possible to partition multiple sequences.

The presented library is used to evaluate the two analysed parallel sorting algorithms using both the implemented schedulers. We find that efficient parallel sorting algorithms are limited by memory access. We also find that the algorithmic techniques know from the external-memory model can be used to gain n increase performance on a CMP.


The whole thesis is available for download: [pdf]
The source code is available as an archive: [zip]