Dexter 1

A Dataset for Evaluation of 3D Articulated Hand Motion Tracking.
S. Sridhar, A. Oulasvirta, and C. Theobalt, Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data, Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 2013.


Dexter 1 consists of 7 sequences of challenging slow and fast hand motions that cover abduction-adduction and flexion-extension of the hand. Roughly the first 250 frames in each sequence correspond to slow motions, while the remaining frames are fast motions. All sequences show a single actor's right hand.
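To evaluate slow and fast motions separately, the frame indices of a sequence can be split at the approximate boundary mentioned above. The following is a minimal sketch, not part of the dataset tools. The function name, the zero-based indexing and the fixed boundary of 250 are assumptions for illustration; the true boundary is only approximate and may differ between sequences.

```python
# Approximate slow/fast boundary taken from the description above (assumption:
# the same value is used for every sequence).
SLOW_FRAME_COUNT = 250


def split_slow_fast(num_frames, slow_count=SLOW_FRAME_COUNT):
    """Return (slow, fast) ranges of zero-based frame indices.

    Hypothetical helper: the dataset's own frame numbering may start at a
    different index.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    # Clamp the boundary so short sequences yield an empty fast range.
    boundary = min(max(slow_count, 0), num_frames)
    return range(0, boundary), range(boundary, num_frames)


slow, fast = split_slow_fast(600)
print(len(slow), len(fast))  # 250 350
```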


  • RGB: 5 Sony DFW-V500 RGB cameras at 25 fps
  • Depth: 1 Creative Gesture Camera (Close-range ToF depth sensor) at 25 fps
  • Depth: 1 Kinect structured light camera at 30 fps
  • Ground Truth: Manually annotated on ToF depth data for 3D fingertip positions
While the RGB and ToF data is mostly synchronized, the structured light data is not synchronized.


  • Compressed Tarball: Single file (tar.gz, 2.9 GB), SHA-256:
  • Browse: Link
  • Camera Setup: [ Picture 1 | Picture 2 ]
Note that the RGB data used in the paper and previews has been enhanced by adjusting color levels. See script for more details.


If you use this dataset in your work, you are required to cite the following paper. BibTeX, 1 KB

author = {Sridhar, Srinath and Oulasvirta, Antti and Theobalt, Christian},
title = {Interactive Markerless Articulated Hand Motion Tracking using RGB and Depth Data},
booktitle = {Proceedings of the {IEEE} International Conference on Computer Vision ({ICCV})},
url = {},
numpages = {8},
month = Dec,
year = {2013}


The following table lists the average fingertip error for several published algorithms. The error is computed as the average Euclidean distance between the estimated 3D fingertip locations and the 5 ground truth 3D fingertip positions. (Only one method uses the palm center annotation.)

Algorithm                                  Average Error [mm]
Sridhar et al., ICCV 2013                  13.1 [1]
Sridhar et al., 3DV 2014                   24.1
Franziska Mueller, 2014 (offline method)
Sridhar et al., CVPR 2015                  19.6

[1] Includes palm center, first 250 frames only
[2] Palm center not used, all frames
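The error metric described above can be sketched as follows. This is a minimal illustration, not the evaluation code used for the published numbers. The function name and the input layout (a list of frames, each holding the same number of (x, y, z) positions in the same order) are assumptions, and the sketch assumes every evaluated frame has a complete set of annotations.

```python
import math


def average_fingertip_error(estimated, ground_truth):
    """Mean Euclidean distance over all fingertip pairs in all frames.

    estimated, ground_truth: sequences of frames; each frame is a sequence
    of (x, y, z) positions in matching order (assumed layout). The result
    has the same unit as the input, e.g. mm.
    """
    if len(estimated) != len(ground_truth):
        raise ValueError("frame counts differ")
    total = 0.0
    count = 0
    for est_frame, gt_frame in zip(estimated, ground_truth):
        if len(est_frame) != len(gt_frame):
            raise ValueError("point counts differ within a frame")
        for est_point, gt_point in zip(est_frame, gt_frame):
            total += math.dist(est_point, gt_point)
            count += 1
    if count == 0:
        raise ValueError("no points to evaluate")
    return total / count


# One frame with two points: distances are 5.0 and 0.0.
gt = [[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]]
est = [[(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]]
print(average_fingertip_error(est, gt))  # 2.5
```

Passing only the first 250 frames of both inputs, or appending the palm center to each frame, would give variants like those in the footnotes.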

If you would like to include your algorithm results here, please contact us.

Sequence Details

Please click on the image above for a larger preview, or on the links below for a video preview.
  1. adbadd: Abduction-adduction of all fingers together. Please note the spelling.
  2. fingercount: Counting using each finger and the thumb.
  3. fingerwave: Waving of fingers.
  4. flexex1: Flexion-extension of all fingers.
  5. pinch: Pinching while moving around.
  6. random: Random motions with articulation.
  7. tigergrasp: Making a posture like a tiger grasping.

Dataset Structure

The root directory (containing this file) consists of 3 sub-directories.
  1. calibration: Contains information for calibrating the cameras for intrinsic and extrinsic parameters. We recommend using the MATLAB calibration toolbox.
  2. preview: Montage video preview of 3 cameras and structured light data for all sequences.
  3. data: All the data resides here.
      • multicam: Data from RGB cameras stored as PNG files in separate directories.
      • tof: The ToF data.
        • confmap: Confidence map as 16-bit PNG.
        • depth: Depth map as 16-bit PNG.
        • rgb: Empty.
        • uvmap: Empty.
        • vertices: Point cloud from depth map stored in PCD format. Units are mm.
      • structlight: Unsynchronized structured light data in ONI format.
      • annotations: Contains manually annotated data and a preview of annotations. See README.txt inside the directory for more details.


We thank Thomas Helten for helping with data capture.