TY - GEN
T1 - Performance guarantees for B-trees with different-sized atomic keys
AU - Bender, Michael A.
AU - Hu, Haodong
AU - Kuszmaul, Bradley C.
PY - 2010/6/6
Y1 - 2010/6/6
N2 - Most B-tree papers assume that all N keys have the same size K, that F = B/K keys fit in a disk block, and therefore that the search cost is O(log f+1 N) block transfers. When keys have variable size, however, B-tree operations have no nontrivial performance guarantees. This paper provides B-tree-like performance guarantees on dictionaries that contain keys of different sizes in a model in which keys must be stored and compared as opaque objects. The resulting atomic-key dictionaries exhibit performance bounds in terms of the average key size and match the bounds when all keys are the same size. Atomic key dictionaries can be built with minimal modification to the B-tree structure, simply by choosing the pivot keys properly. This paper describes both static and dynamic atomic-key dictionaries. In the static case, if there are N keys with average size K, the search cost is O(⌈K/B⌉ log1+⌈K/B⌉ N) expected transfers. The paper proves that it is not possible to transform these expected bounds into worst-case bounds. The cost to build the tree is O(NK) operations and O(NK/B) transfers if all keys are presented in sorted order. If not, the cost is the sorting cost. For the dynamic dictionaries, the amortized cost to insert a key κ of arbitrary length at an arbitrary rank is dominated by the cost to search for κ Specifically the amortized cost to insert a key κ of arbitrary length and random rank is O(⌈K/B⌉ log1+⌈K/B⌉ N + |κ| /B) transfers. A dynamic-programming algorithm is shown for constructing a search tree with minimal expected cost.
AB - Most B-tree papers assume that all N keys have the same size K, that F = B/K keys fit in a disk block, and therefore that the search cost is O(log f+1 N) block transfers. When keys have variable size, however, B-tree operations have no nontrivial performance guarantees. This paper provides B-tree-like performance guarantees on dictionaries that contain keys of different sizes in a model in which keys must be stored and compared as opaque objects. The resulting atomic-key dictionaries exhibit performance bounds in terms of the average key size and match the bounds when all keys are the same size. Atomic key dictionaries can be built with minimal modification to the B-tree structure, simply by choosing the pivot keys properly. This paper describes both static and dynamic atomic-key dictionaries. In the static case, if there are N keys with average size K, the search cost is O(⌈K/B⌉ log1+⌈K/B⌉ N) expected transfers. The paper proves that it is not possible to transform these expected bounds into worst-case bounds. The cost to build the tree is O(NK) operations and O(NK/B) transfers if all keys are presented in sorted order. If not, the cost is the sorting cost. For the dynamic dictionaries, the amortized cost to insert a key κ of arbitrary length at an arbitrary rank is dominated by the cost to search for κ Specifically the amortized cost to insert a key κ of arbitrary length and random rank is O(⌈K/B⌉ log1+⌈K/B⌉ N + |κ| /B) transfers. A dynamic-programming algorithm is shown for constructing a search tree with minimal expected cost.
KW - B-tree with different-sized keys
KW - atomic keys
KW - dynamic programming
UR - https://www.scopus.com/pages/publications/77954736988
U2 - 10.1145/1807085.1807125
DO - 10.1145/1807085.1807125
M3 - Conference contribution
SN - 9781450300339
T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
SP - 305
EP - 316
BT - PODS'10 - Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
PB - Association for Computing Machinery (ACM)
T2 - 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010
Y2 - 6 June 2010 through 11 June 2010
ER -