The memory model assumed for this discussion is a paged memory. The N elements to be sorted are stored in a memory of M pages, with room for S elements per page. Thus N=MS.
The heapsort algorithms seen so far are not very well suited for
external sorting. The expected number of page accesses for the basic
heapsort in a memory-constrained environment (e.g. 3 pages of memory) is
Θ(N log₂(N/M)), where N is the number of records and M is the number of
disk blocks. We would like it to be Θ((N/S) log₂(N/M)), a factor of S
fewer accesses, which typically is orders of magnitude faster.
(Example: 8-byte elements and a 4 KB page translate into S = 512
elements per page.)
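To make the gap concrete, a small back-of-the-envelope calculation using the element and page sizes from the example above (the record count N is an illustrative choice, not from the text):

```python
# Back-of-the-envelope comparison of page-access counts for the
# basic (binary) heapsort versus the desired external bound.
# Sizes follow the example in the text: 8-byte elements, 4 KB pages.
import math

ELEMENT_SIZE = 8                  # bytes per element
PAGE_SIZE = 4 * 1024              # bytes per page
S = PAGE_SIZE // ELEMENT_SIZE     # elements per page -> 512

N = 2**30                         # number of records (illustrative)
M = N // S                        # number of disk blocks, since N = M * S

basic = N * math.log2(N / M)            # ~ Theta(N log2(N/M)) accesses
desired = (N / S) * math.log2(N / M)    # a factor of S fewer accesses

print(f"S = {S} elements per page")
print(f"basic/desired ratio = {basic / desired:.0f}")   # -> 512
```

The ratio is exactly S, independent of N, which is why a larger page size widens the gap.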
NOTE: Be aware that the term 'page access' does not necessarily imply that disk accesses are controlled by the virtual memory subsystem. They may be completely under program control.
One may wonder if the d-heaps described in chapter 5 can be
used efficiently as an external sorting method. If we consider the
same situation as above, it is seen that if we use the number of
elements per page S as the fanout, and align the heap on external
memory so that all siblings fit in one page, the depth of the tree
will drop by a factor of log₂ S, and thus the number of page accesses
will drop accordingly. As observed in the chapter on d-heaps, the cost
of a page access should be weighed against the cost of the S
comparisons needed to find the maximal child. Still, because the cost
of a disk access is normally much higher than the time it takes to
search linearly through one page of data, this idea is definitely an
improvement over the traditional heap.
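The layout described above can be sketched as follows. This is an illustrative in-memory model, not an external implementation: the root sits alone at index 0, and the children of node i form the contiguous group S·i+1 .. S·i+S, so every sibling group starts on a page boundary (offset by the lone root) and one sibling-group scan corresponds to one page access. All names are illustrative, and the fanout is kept tiny so the example is readable.

```python
# Sketch of a d-ary max-heap with fanout d = S (elements per page),
# laid out so that every sibling group occupies exactly one "page".

S = 4  # fanout = elements per page (tiny here for readability)

def sift_down(heap, i):
    """Move heap[i] down until the max-heap property holds again.

    Each level costs one sibling-group scan (up to S comparisons),
    which in the external setting is one page access.
    """
    n = len(heap)
    while True:
        first = S * i + 1                  # first child = start of the page
        if first >= n:
            return                         # node i is a leaf
        group = heap[first:first + S]      # one page worth of siblings
        j = first + max(range(len(group)), key=group.__getitem__)
        if heap[j] <= heap[i]:
            return                         # heap property restored
        heap[i], heap[j] = heap[j], heap[i]
        i = j

def heapify(heap):
    # Sift down from the last internal node towards the root.
    for i in range(len(heap) // S, -1, -1):
        sift_down(heap, i)

data = [3, 17, 8, 1, 12, 9, 20, 5, 14, 2, 11, 7]
heapify(data)
print(data[0])   # -> 20, the maximum
```

With this indexing, one delete-max touches one page per level of the tree, i.e. about log_S N pages, instead of one page per comparison.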
The drawback of the idea is twofold. Firstly, it only improves the
external cost by a factor of log₂ S, not the desired factor of S.
Secondly, it increases the internal cost by a factor of roughly
S/log₂ S: each level now costs up to S comparisons to find the maximal
child, while the depth only shrinks by a factor of log₂ S. Thus it
actually makes sorting much slower in the situation where all data
fits in memory. Considering that gigabyte main memories are now
commonplace and that terabyte memories will eventually be affordable,
it is reasonable to require good internal performance even of external
methods.
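The two factors can be made concrete for the page size from the earlier example (S = 512). Assuming the external gain is log₂ S and the internal penalty is roughly S/log₂ S, consistent with the d-heap analysis above (S comparisons per level, depth log₂N/log₂S), the numbers below are straightforward arithmetic, not measurements:

```python
import math

S = 512                              # elements per page, from the example

external_gain = math.log2(S)         # page accesses drop by this factor
internal_penalty = S / math.log2(S)  # comparisons grow by roughly this factor

print(f"external cost improves by a factor of {external_gain:.0f}")       # -> 9
print(f"internal cost grows by a factor of about {internal_penalty:.0f}") # -> 57
```

A ninefold drop in page accesses bought at the price of a roughly fiftyfold increase in comparisons illustrates why this trade-off is unattractive once the data fits in main memory.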