Jyrki Katajainen: Publications

A reliable randomized algorithm for the closest-pair problem
Authors:	Martin Dietzfelbinger, Torben Hagerup, Jyrki Katajainen, and Martti Penttonen
Publication:	Report 93/25, Department of Computer Science, University of Copenhagen (1993), 31 pp.
Full text:	PDF (307 kB)
Abstract:	The following two computational problems are studied: Duplicate grouping: Assume that n items are given, each of which is labeled by an integer key from the set {0,…,U − 1}. Store the items in an array of size n such that items with the same key occupy a contiguous segment of the array. Closest pair: Assume that a multiset of n points in the d-dimensional Euclidean space is given, where d ≥ 1 is a fixed integer. Each point is represented as a d-tuple of integers in the range {0,…,U − 1} (or of arbitrary real numbers). Find a closest pair, i. e., a pair of points whose distance is minimal over all such pairs. In 1976 Rabin described a probabilistic algorithm for the closest-pair problem that takes linear expected time. As a subroutine, he used a hashing procedure whose implementation was left open. Only years later randomized hashing schemes suitable for filling this gap were developed. In this paper, we return to Rabin’s classic algorithm in order to provide a fully detailed description and analysis, thereby also extending and strengthening his result. As a preliminary step, we study randomized algorithms for the duplicate-grouping problem. In the course of solving the duplicate-grouping problem, we describe a new universal class of hash functions of independent interest. It is shown that both of the above problems can be solved by probabilistic algorithms that use O(n) space and finish in O(n) time with probability tending to 1 as n grows to infinity. The model of computation is a unit-cost RAM capable of generating random numbers and of performing arithmetic operations from the set {+, −, , div, log₂, exp₂}, where div denotes integer division, and log₂ and exp₂ are the mappings from ℕ to ℕ∪{0} with log₂(m) = ⌊log₂ m⌋ and exp₂(m) = 2^m, for all m ∈ ℕ. If the operations log₂ and exp₂ are not allowed, the running time of the algorithms increases by an additive term of O(log log U). All numbers manipulated by the algorithms consist of O(log n + log U) bits. We consider two variants of the algorithm for the closest-pair problem. One uses only O(log n + log U) random bits and still has probability O(n^−α) of exceeding the time bound O(n), where α ≥ 1 is a constant that can be chosen arbitrarily. The other one uses O(n log n + log U) random bits, but reduces the probability that the time bound is exceeded to 2^{−n^Ω(1)}. The algorithm for the closest-pair problem also works if the coordinates of the points are arbitrary real numbers, provided that the RAM is able to perform arithmetic operations from {+, −, , div} on real numbers, where a div b now means ⌊a / b⌋. In this case, the running time is O(n) with log₂ and exp₂ and O(n + log log(δ_max / δ_min)) without them, where δ_max is the maximum and δ_min is the minimum distance between any two distinct input points.
Related:	HTML (Journal article)
BibL^AT_EX:	@report{DHKP1993R, author = {Martin Dietzfelbinger and Torben Hagerup and Jyrki Katajainen and Martti Penttonen}, title = {A reliable randomized algorithm for the closest-pair problem}, type = {Report}, number = {93/25}, institution = {Department of Computer Science, University of Copenhagen}, year = {1993}, pagetotal = {31}, }

This page was generated by Jyrki Katajainen <jyrki@di.ku.dk> on 17.05.2017.