Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs

  • Nanyang Technological University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

278 Scopus citations

Abstract

Articulated hand pose estimation plays an important role in human-computer interaction. Despite recent progress, the accuracy of existing methods is still not satisfactory, partly due to the difficulty of the embedded high-dimensional and non-linear regression problem. Unlike existing discriminative methods that regress the hand pose from a single depth image, we propose to first project the query depth image onto three orthogonal planes and use these multi-view projections to regress 2D heat-maps that estimate the joint positions on each plane. These multi-view heat-maps are then fused with learned pose priors to produce the final 3D hand pose estimate. Experiments show that the proposed method largely outperforms the state of the art on a challenging dataset. Moreover, a cross-dataset experiment demonstrates the good generalization ability of the proposed method.
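
The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the hand is already given as a list of 3D points, uses an axis-aligned bounding box as the projection frame (the abstract does not say which coordinate frame is used), and the function name `project_to_three_views` and the `size` parameter are hypothetical. Each view stores, per pixel, the smallest normalized distance from any point to the projection plane.

```python
def project_to_three_views(points, size=8):
    """Project 3D points onto the x-y, y-z and z-x planes.

    Each view is a size x size grid. A cell holds the smallest normalized
    distance (0..1) from any point landing in it to the projection plane,
    or 1.0 if no point lands there.
    """
    if not points:
        raise ValueError("points must be non-empty")

    # Axis-aligned bounding box of the cloud (an assumption of this sketch).
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    # Guard against a zero extent along any axis.
    span = [max(hi[a] - lo[a], 1e-9) for a in range(3)]

    # (row axis, column axis, depth axis) for each view; 0=x, 1=y, 2=z.
    views = {"xy": (1, 0, 2), "yz": (1, 2, 0), "zx": (0, 2, 1)}

    images = {}
    for name, (row_ax, col_ax, depth_ax) in views.items():
        img = [[1.0] * size for _ in range(size)]
        for p in points:
            # Normalize each coordinate to [0, 1] inside the bounding box.
            norm = [(p[a] - lo[a]) / span[a] for a in range(3)]
            r = min(int(norm[row_ax] * size), size - 1)
            c = min(int(norm[col_ax] * size), size - 1)
            # Keep the point closest to the projection plane.
            img[r][c] = min(img[r][c], norm[depth_ax])
        images[name] = img
    return images
```

Calling `project_to_three_views(points, size=96)` on a segmented hand point cloud would give three single-channel images, one per plane. In the pipeline the abstract outlines, each such image would be fed to a network that regresses per-joint 2D heat-maps; that regression and the fusion with pose priors are outside this sketch.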

Original language: English
Title of host publication: Proceedings - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
Publisher: IEEE Computer Society
Pages: 3593-3601
Number of pages: 9
ISBN (Electronic): 9781467388504
State: Published - Dec 9 2016
Event: 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 - Las Vegas, United States
Duration: Jun 26 2016 to Jul 1 2016

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2016-December

Conference

Conference: 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
Country/Territory: United States
City: Las Vegas
Period: 06/26/16 to 07/1/16
