RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video

SIGGRAPH Asia 2020 Virtual Conference

Download Video (MP4, 190 MB)


Tracking and reconstructing the 3D pose and geometry of two interacting hands is a challenging problem that is highly relevant to several human-computer interaction applications, including AR/VR, robotics, and sign language recognition. Existing works are either limited to simpler tracking settings (e.g., considering only a single hand or two spatially separated hands), or rely on less ubiquitous sensors, such as depth cameras. In contrast, in this work we present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera that explicitly considers close interactions. In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN that regresses multiple complementary pieces of information, including segmentation, dense matchings to a 3D hand model, and 2D keypoint positions, together with newly proposed intra-hand relative depth and inter-hand distance maps. These predictions are subsequently used in a generative model-fitting framework in order to estimate pose and shape parameters of a 3D hand model for both hands. We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline through an extensive ablation study. Moreover, we demonstrate that our approach offers previously unseen two-hand tracking performance from RGB, and quantitatively and qualitatively outperforms existing RGB-based methods that were not explicitly designed for two-hand interactions. Finally, our method even performs on par with depth-based real-time methods.
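To illustrate the generative model-fitting stage described above, the sketch below minimizes an energy that combines the CNN's 2D keypoint predictions with intra-hand relative depth predictions. All function names, the toy two-joint "hand model", the weights, and the finite-difference optimizer are hypothetical placeholders for illustration only; they are not the paper's actual formulation or code.

```python
# Hypothetical sketch: fit pose parameters so a (toy) hand model agrees with
# predicted 2D keypoints and intra-hand relative depths. Illustrative only.

def project(joint3d, focal=500.0):
    """Toy pinhole projection of a 3D joint to 2D pixel coordinates."""
    x, y, z = joint3d
    return (focal * x / z, focal * y / z)

def forward_kinematics(pose):
    """Toy 'hand model': two joints rigidly translated by the pose vector."""
    tx, ty, tz = pose
    return [(tx, ty, tz + 1.0), (tx + 0.1, ty, tz + 1.1)]

def energy(pose, pred_2d, pred_rel_depth, weights=(1.0, 0.1)):
    """Weighted sum of a 2D keypoint reprojection term and a
    root-relative depth term (stand-ins for the paper's data terms)."""
    w2d, wz = weights
    joints = forward_kinematics(pose)
    e = 0.0
    for j, (u, v) in zip(joints, pred_2d):
        pu, pv = project(j)
        e += w2d * ((pu - u) ** 2 + (pv - v) ** 2)
    root_z = joints[0][2]  # depths are measured relative to the root joint
    for j, dz in zip(joints, pred_rel_depth):
        e += wz * ((j[2] - root_z) - dz) ** 2
    return e

def fit(pred_2d, pred_rel_depth, steps=200, lr=1e-6, eps=1e-5):
    """Minimize the energy by finite-difference gradient descent."""
    pose = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = []
        for i in range(len(pose)):
            hi = list(pose); hi[i] += eps
            lo = list(pose); lo[i] -= eps
            grad.append((energy(hi, pred_2d, pred_rel_depth)
                         - energy(lo, pred_2d, pred_rel_depth)) / (2 * eps))
        pose = [p - lr * g for p, g in zip(pose, grad)]
    return pose
```

In the real system, the energy additionally uses the predicted segmentation, dense model correspondences, and inter-hand distance maps, and is minimized over the full pose and shape parameters of both hands at real-time rates.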



BibTeX, 1 KB

@article{RGB2Hands2020,
  title={{RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video}},
  author={Wang, Jiayi and Mueller, Franziska and Bernard, Florian and Sorli, Suzanne and Sotnychenko, Oleksandr and Qian, Neng and Otaduy, Miguel A. and Casas, Dan and Theobalt, Christian},
  journal={ACM Transactions on Graphics (TOG)},
  year={2020}
}


This work was supported by the ERC Consolidator Grants 4DRepLy (770784) and TouchDesign (772738), and by the Spanish Ministry of Science (project RTI2018-098694-B-I00 VizLearning).


Jiayi Wang

