In order to formulate a heap property for a heap with holes, we need a few definitions that allows us to keep track of, and control, the location of the holes in the heap.- Holes are classified as either white holes or black holes. A hole is defined to be black if it is the root of an empty subheap. Otherwise it is termed white.
The heap property for a heap with holes can now be defined as:
For every node it is true that
With these definitions it is clear that there must be a non branching path of white holes from the root downwards, that can be used to find the maximal page.- The second condition guarantees that the path does not branch, while the first condition ensures that the largest elements of the heap can be found as immediate descendants of these white holes.
The process of extracting the largest page, can now be loosely described as a siftup from the bottom of the white hole chain. During the siftup, the largest elements so far are merged with the siblings along the path bottom up.- If any of the siblings are black holes, the merge is replaced by a page move.- It is obvious that this process does indeed produce a page with the largest elements in the heap, but it takes more consideration to accept that the merges can be arranged so that the heap property is also valid after the siftup.
The ordering condition of the heap property is fairly easy to maintain. It is done as in the siftdown procedure, by ensuring that the smallest element of the two pages are not moved. I.e. the lower half of the elements merged, must go to the page that holds the smallest element.
Also it is quite obvious that no pairs of white holes can be introduced by this process.- All merges are merges of siblings, which leaves exactly one hole behind. So there is no change of producing a pair of siblings that are white holes by a merge. - The only situation where the merge is not done, is when the rising page passes a hole. But this hole is black and remains black.
The external memory model used in [5] uses indirect addressing of pages. Input to the sorting program is an array of page addresses and output is a similar list, which specifies the ordering of the pages for the sorted data. With this model, two pages can be swapped with no external cost at all, and holes in the heap can be reused as output area while sorting, so that the sort remains inplace on external memory, even though the heap does not shrink during the sort.
There is an implementation of an external heapsort with holes, without indirect addressing of pages, in heaplab directory exthole/. Unfortunately the implementation is not very readable because the bit map used to keep track of the location of holes, is implemented using C macroes. I was not able to get the C++ vector<bool> facility to work in a portable way.