next up previous contents
Next: The External Heapsort Up: Tuning the cache behavior Previous: Programming considerations

Performance of D-heaps

Three tests has been run to verify the performance of the d-heap. In the first test (figure 5.2) the heap is aligned suboptimally in memory and the fanout is set to 3,4 and 8 respectively. Both the absolute timing and the speed relative to the traditional heap are displayed. In the second test (figure 5.3) the heap is aligned optimally in memory but otherwise it is the same as the first test. It not possible to align a 3-heap correctly in memory though, so the run with the 3-heap was not included here. The third test (figure 5.4) is simply a rerun of the second test, but with a different compiler. The two first tests were done with the KCC compiler from Kuch and associates, while in the third test, the programs were compiled with gnu g++ version 2.8.1.

   figure181
Figure 5.2: Unaligned D-heaps on PowerPC 332MHz

   figure187
Figure 5.3: Aligned D-heaps on PowerPC 332MHz

   figure193
Figure 5.4: Aligned D-heaps on PowerPC 332MHz compiled with g++

The machine used for the tests, has a 32KB first level cache with a cache line size of 32 bytes, and a 256KB second level cache with a cache line size of 64 bytes. The elements sorted are double precision (8 byte) floating point values.

A few observations can be made from the graphs. First it is noted that alignment is important, especially in the runs with a large fanout. For a fanout of 8 the aligned heap is about 25% faster than the unaligned heaps on large problem sizes, and the speedup relative to the 2-heap improves from 1.8 to 2. It is also quite impressive that we can get a speedup of 2 from the 8-heap.- Theoretically the maximum obtainable speedup is tex2html_wrap_inline1323 .

The third test is interesting. When compiled with g++;- all the programs runs about 40% slower when the problem fits in the cache;- suddenly the 4-heap is faster than the 8-heap;- and the best speedup obtained is only 1.6. This all supports the remark in section 5.3 that the performance of the linear search for the maximum child is absolutely critical for the feasibility of a d-heap.


next up previous contents
Next: The External Heapsort Up: Tuning the cache behavior Previous: Programming considerations

Jesper Bojesen
Wed Nov 4 15:35:15 MET 1998