Fast and Robust Hand Tracking Using Detection-Guided Optimization

Computer Vision and Pattern Recognition (CVPR) 2015, Boston, USA

Download Video (MP4, 1080p, 480 MB)


Markerless tracking of hands and fingers is a promising enabler for human-computer interaction. However, adoption has been limited because of tracking inaccuracies, incomplete coverage of motions, low framerate, complex camera setups, and high computational requirements. In this paper, we present a fast method for accurately tracking rapid and complex articulations of the hand using a single depth camera. Our algorithm uses a novel detection-guided optimization strategy that increases the robustness and speed of pose estimation. In the detection step, a randomized decision forest classifies pixels into parts of the hand. In the optimization step, a novel objective function combines the detected part labels and a Gaussian mixture representation of the depth to estimate a pose that best fits the depth. Our approach needs comparatively few computational resources, which makes it extremely fast (50 fps without GPU support). The approach also supports varying camera-to-scene arrangements, whether static or moving. We show the benefits of our method by evaluating on public datasets and comparing against previous work.
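To make the idea of combining part labels with a Gaussian mixture representation more concrete, the following is a minimal sketch and not the objective function from the paper. It assumes isotropic 3D Gaussians, and the names `Gaussian`, `gaussian_overlap`, `similarity`, and `label_weight` are hypothetical. It scores how well a set of model Gaussians overlaps a set of data Gaussians, and adds a bonus when the part labels of a pair agree.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Gaussian(NamedTuple):
    """Isotropic, unnormalized 3D Gaussian (illustrative assumption)."""
    mean: Tuple[float, float, float]
    sigma: float   # standard deviation
    label: str     # hypothetical hand-part label, e.g. "thumb"


def gaussian_overlap(a: Gaussian, b: Gaussian) -> float:
    """Closed-form integral over R^3 of the product of two
    unnormalized isotropic Gaussians exp(-|x - mean|^2 / (2 sigma^2))."""
    var_a, var_b = a.sigma ** 2, b.sigma ** 2
    var_sum = var_a + var_b
    sq_dist = sum((p - q) ** 2 for p, q in zip(a.mean, b.mean))
    scale = (2.0 * math.pi * var_a * var_b / var_sum) ** 1.5
    return scale * math.exp(-sq_dist / (2.0 * var_sum))


def similarity(model: Sequence[Gaussian],
               data: Sequence[Gaussian],
               label_weight: float = 1.0) -> float:
    """Sum of pairwise overlaps, plus an extra weighted overlap for
    pairs whose part labels match (an assumed way to use labels)."""
    total = 0.0
    for m in model:
        for d in data:
            overlap = gaussian_overlap(m, d)
            total += overlap
            if m.label == d.label:
                total += label_weight * overlap
    return total
```

In a tracker built along these lines, an optimizer would adjust the pose parameters that position the model Gaussians so that a score like `similarity` increases. The pose parameterization and the optimizer are outside the scope of this sketch.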



BibTeX, 1 KB

 author = {Sridhar, Srinath and Mueller, Franziska and Oulasvirta, Antti and Theobalt, Christian},
 title = {Fast and Robust Hand Tracking Using Detection-Guided Optimization},
 booktitle = {Proceedings of Computer Vision and Pattern Recognition ({CVPR})},
 url = {},
 numpages = {9},
 month = jun,
 year = {2015}

Related Pages

  • Investigating the Dexterity of Multi-Finger Input for Mid-Air Text Entry, CHI 2015 (webpage)
  • Real-time Hand Tracking Using a Sum of Anisotropic Gaussians Model, 3DV 2014 (webpage)
  • Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data, ICCV 2013 (webpage)
  • HandSonor: A Customizable Vision-based Control Interface for Musical Expression, Extended Abstracts, CHI 2013 (webpage)



This research was funded by the ERC Starting Grant projects CapReal (335545) and COMPUTED (637991), and the Academy of Finland. We would like to thank Christian Richardt.


Srinath Sridhar

