TY - GEN
T1 - Tahoe
T2 - 16th European Conference on Computer Systems, EuroSys 2021
AU - Xie, Zhen
AU - Dong, Wenqian
AU - Liu, Jiawen
AU - Liu, Hang
AU - Li, Dong
N1 - Publisher Copyright: © 2021 Owner/Author.
PY - 2021/4/21
Y1 - 2021/4/21
N2 - Decision trees are widely used and often assembled as a forest to boost prediction accuracy. However, using decision trees for inference on GPUs is challenging because of irregular memory access patterns and imbalanced workloads across threads. This paper proposes Tahoe, a tree structure-aware, high-performance inference engine for decision tree ensembles. Tahoe rearranges tree nodes to enable efficient and coalesced memory accesses; Tahoe also rearranges trees, such that trees with similar structures are grouped together in memory and assigned to threads in a balanced way. Beyond memory access efficiency, we introduce a set of inference strategies, each of which uses shared memory differently and has different implications for reduction overhead. We introduce performance models to guide the selection of inference strategies for arbitrary forests and data sets. Tahoe consistently outperforms the state-of-the-art industry-quality library FIL by 3.82x, 2.59x, and 2.75x on three generations of NVIDIA GPUs (Kepler, Pascal, and Volta), respectively.
KW - Decision tree ensemble
KW - Decision tree inference
KW - Performance model
KW - Tree structure
UR - https://www.scopus.com/pages/publications/85105332442
U2 - 10.1145/3447786.3456251
DO - 10.1145/3447786.3456251
M3 - Conference contribution
T3 - EuroSys 2021 - Proceedings of the 16th European Conference on Computer Systems
SP - 426
EP - 440
BT - EuroSys 2021 - Proceedings of the 16th European Conference on Computer Systems
PB - Association for Computing Machinery, Inc
Y2 - 26 April 2021 through 28 April 2021
ER -