TY - CHAP
T1 - Hand parsing and gesture recognition with a commodity depth camera
AU - Liang, Hui
AU - Yuan, Junsong
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2014.
PY - 2014
Y1 - 2014
N2 - Hand pose tracking and gesture recognition are useful techniques in human-computer interaction (HCI) scenarios, but previous work in this field suffers from a lack of discriminative features to differentiate and track hand parts. In this chapter, we present a robust hand parsing scheme to obtain a high-level and discriminative representation of the hand from a raw depth image. A novel distance-adaptive feature selection method is proposed to generate more discriminative depth-context features for hand parsing. A random decision forest is adopted for per-pixel labeling, and it is combined with a temporal prior to form an ensemble of classifiers for enhanced performance. To enforce spatial smoothness and remove misclassified isolated regions, we further build a superpixel Markov random field, which is capable of handling per-pixel labeling errors at variable scales. To demonstrate the effectiveness of the proposed method, we compare it to benchmark methods. The results show that it produces 17.2% higher accuracy on the synthesized datasets for single-frame parsing. Tests on real-world sequences show that our method is more robust against complex hand poses. In addition, we develop a hand gesture recognition algorithm based on the hand parsing results. The experiments show that our method achieves good performance compared to state-of-the-art methods.
AB - Hand pose tracking and gesture recognition are useful techniques in human-computer interaction (HCI) scenarios, but previous work in this field suffers from a lack of discriminative features to differentiate and track hand parts. In this chapter, we present a robust hand parsing scheme to obtain a high-level and discriminative representation of the hand from a raw depth image. A novel distance-adaptive feature selection method is proposed to generate more discriminative depth-context features for hand parsing. A random decision forest is adopted for per-pixel labeling, and it is combined with a temporal prior to form an ensemble of classifiers for enhanced performance. To enforce spatial smoothness and remove misclassified isolated regions, we further build a superpixel Markov random field, which is capable of handling per-pixel labeling errors at variable scales. To demonstrate the effectiveness of the proposed method, we compare it to benchmark methods. The results show that it produces 17.2% higher accuracy on the synthesized datasets for single-frame parsing. Tests on real-world sequences show that our method is more robust against complex hand poses. In addition, we develop a hand gesture recognition algorithm based on the hand parsing results. The experiments show that our method achieves good performance compared to state-of-the-art methods.
UR - https://www.scopus.com/pages/publications/84984874803
U2 - 10.1007/978-3-319-08651-4_12
DO - 10.1007/978-3-319-08651-4_12
M3 - Chapter
T3 - Advances in Computer Vision and Pattern Recognition
SP - 239
EP - 265
BT - Advances in Computer Vision and Pattern Recognition
PB - Springer-Verlag London Ltd
ER -