When sorting a sequence of n elements, it is wellknown that every
comparisonbased sorting algorithm must perform at least n lg n −
O(n) element comparisons. Hence, it seems unavoidable that any
sorting algorithm will also execute about the same number of
conditional branches. Since nothing is known about the order of the
input elements, it is hard to predict the outcome of the branches
related to element comparisons. On an average, every second such
branch may lead to a branch misprediction. In this report, together with an accompanying tar ball, we
release the source code of the sorting programs discussed and
experimented in the following papers:

Amr Elmasry, Jyrki Katajainen, and Max Stenmark,
Branch mispredictions don’t affect mergesort,
Proceedings of the 11th International Symposium on Experimental
Algorithms, Lecture Notes in Computer Science 7276, SpringerVerlag
(2012), 160171

Amr Elmasry and Jyrki Katajainen,
Lean programs, branch mispredictions, and sorting,
Proceedings of the 6th International Conference on Fun with
Algorithms, Lecture Notes in Computer Science 7288, SpringerVerlag
(2012), 119130
The key idea in the described sorting programs is to decouple element
comparisons from conditional branches. Therefore, these programs will
incur significantly fewer branch mispredictions than, for example, the
C++ standardlibrary introsort (introspective medianofthree
quicksort that switches to heapsort if the recursion depth gets too
high). The sorting programs given can be divided into three categories
depending on how many branch mispredictions they incur during their
execution.

Lean:
 O(1) branch mispredictions incurred when the branch
predictor used by the underlying hardware is static.
 Moderately optimized:
 O(n) branch mispredictions
incurred under the same assumptions as above.
 Unoptimized:
 The number of branch mispredictions incurred is
proportional to the number of element comparisons performed.
As shown in [2], all programs can be made lean. However, in practice,
a moderatelyoptimized variant is often faster than a lean variant.
This is also true for the variants of heapsort, mergesort, and
quicksort studied here.
