Gesture control using a depth camera


Gesture control enables hands-free operation and is useful in applications ranging from gaming, to human-machine interfaces (HMI) for industrial and medical equipment, to augmented reality (AR) and virtual reality (VR). To detect a hand gesture, one must first segment the hand from the background. The traditional method, which uses an RGB camera alone, relies on skin color for segmentation and is unreliable because users' skin color varies widely. Recently available depth cameras make gesture detection much more reliable because depth can be used for image segmentation. Shape analysis is then applied to the segmented foreground (the hand) to detect and track the fingertips and palm and so recognize the gesture. In this blog post I will describe how to detect gestures using a depth camera, more specifically a Texas Instruments time-of-flight sensor.


Segmenting the hand

Depth image sensors produce both an amplitude image and a depth image each frame. Each pixel in the depth image represents the Euclidean distance from that pixel to the corresponding point in the scene. If the hand is the object closest to the depth camera, image segmentation reduces to finding the closest pixel and the neighboring pixels with similar depth. The segmented hand image is then converted to a binary image through thresholding. The binary image often contains small specks that need to be cleaned up, which is a job for the morphological open operator (erosion followed by dilation). Opening removes specks smaller than its structuring element; small holes and cracks inside the blob are filled by the complementary close operator (dilation followed by erosion). The end result should be a clean blob.
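The segmentation and clean-up steps can be sketched in plain C++17 using only the standard library. This is a minimal illustration under stated assumptions, not TI's or OpenCV's implementation. The depth frame is assumed to be a row-major array of 16-bit values in which 0 marks an invalid pixel. The names `segmentClosest`, `maxStep` and `maxRange` are hypothetical. "Neighboring pixels with similar depth" is interpreted as region growing over 4-connected neighbours. Hand-written 3x3 opening and closing stand in for OpenCV's `cv::morphologyEx` with `cv::MORPH_OPEN` / `cv::MORPH_CLOSE`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <queue>
#include <vector>

// Hypothetical frame layout (assumption): row-major 16-bit depth values, 0 = invalid pixel.
struct DepthImage {
    int width = 0, height = 0;
    std::vector<uint16_t> depth;
};

using Mask = std::vector<uint8_t>;  // binary image: 1 = hand, 0 = background

// Region growing from the closest valid pixel. A 4-connected neighbour joins the
// region if its depth differs from the current pixel by at most maxStep and from
// the seed (closest) depth by at most maxRange. The output is already binary.
Mask segmentClosest(const DepthImage& img, int maxStep, int maxRange) {
    assert(static_cast<int>(img.depth.size()) == img.width * img.height);
    Mask mask(img.depth.size(), 0);
    int seed = -1;
    int closest = std::numeric_limits<int>::max();
    for (int i = 0; i < static_cast<int>(img.depth.size()); ++i)
        if (img.depth[i] != 0 && img.depth[i] < closest) { closest = img.depth[i]; seed = i; }
    if (seed < 0) return mask;  // no valid depth in the frame

    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    std::queue<int> frontier;
    frontier.push(seed);
    mask[seed] = 1;
    while (!frontier.empty()) {
        const int idx = frontier.front();
        frontier.pop();
        const int x = idx % img.width, y = idx / img.width;
        for (int k = 0; k < 4; ++k) {
            const int nx = x + dx[k], ny = y + dy[k];
            if (nx < 0 || ny < 0 || nx >= img.width || ny >= img.height) continue;
            const int n = ny * img.width + nx;
            const int d = img.depth[n];
            if (mask[n] || d == 0) continue;
            if (std::abs(d - static_cast<int>(img.depth[idx])) > maxStep) continue;
            if (d - closest > maxRange) continue;
            mask[n] = 1;
            frontier.push(n);
        }
    }
    return mask;
}

// 3x3 erosion (dilate == false) or dilation (dilate == true). Neighbours outside
// the image are ignored, so blobs touching the border are not eroded from outside.
Mask morph3x3(const Mask& in, int w, int h, bool dilate) {
    Mask out(in.size(), 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            bool any = false, all = true;
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx) {
                    const int nx = x + kx, ny = y + ky;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    const bool v = in[ny * w + nx] != 0;
                    any = any || v;
                    all = all && v;
                }
            out[y * w + x] = (dilate ? any : all) ? 1 : 0;
        }
    return out;
}

// Opening (erode, then dilate) removes specks smaller than the 3x3 kernel.
Mask open3x3(const Mask& m, int w, int h) { return morph3x3(morph3x3(m, w, h, false), w, h, true); }

// Closing (dilate, then erode) fills holes and cracks smaller than the 3x3 kernel.
Mask close3x3(const Mask& m, int w, int h) { return morph3x3(morph3x3(m, w, h, true), w, h, false); }
```

A typical call would be `open3x3(segmentClosest(frame, 30, 150), frame.width, frame.height)`. The thresholds 30 and 150 (millimetres, if the sensor reports millimetres) are placeholders, not values from the original demo.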


Finding the contour

The contour of a clean blob can be extracted with the OpenCV function cv::findContours().
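To show what contour extraction involves, here is a minimal sketch of Moore-neighbour boundary tracing for the outer boundary of the first blob found in raster order. It is a stand-in written in plain C++17, not the border-following algorithm `cv::findContours()` uses internally. The names `traceOuterContour` and `Point` are hypothetical. The loop stops when it is about to repeat its first move from the start pixel, so one-pixel-wide parts of the blob are traversed in both directions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Point { int x, y; };
inline bool operator==(const Point& a, const Point& b) { return a.x == b.x && a.y == b.y; }

// 8-neighbourhood in clockwise order on screen (y points down): E, SE, S, SW, W, NW, N, NE.
constexpr int kDx[8] = {1, 1, 0, -1, -1, -1, 0, 1};
constexpr int kDy[8] = {0, 1, 1, 1, 0, -1, -1, -1};

// Index of a unit offset (dx, dy) in kDx/kDy, or -1 if it is not an 8-neighbour step.
int directionOf(int dx, int dy) {
    for (int d = 0; d < 8; ++d)
        if (kDx[d] == dx && kDy[d] == dy) return d;
    return -1;
}

// Ordered outer boundary pixels of the first blob in raster order; empty if no foreground.
std::vector<Point> traceOuterContour(const std::vector<uint8_t>& mask, int w, int h) {
    assert(static_cast<int>(mask.size()) == w * h);
    auto fg = [&](int x, int y) { return x >= 0 && y >= 0 && x < w && y < h && mask[y * w + x] != 0; };
    std::vector<Point> contour;
    Point start{-1, -1};
    for (int y = 0; y < h && start.x < 0; ++y)
        for (int x = 0; x < w; ++x)
            if (fg(x, y)) { start = {x, y}; break; }
    if (start.x < 0) return contour;

    contour.push_back(start);
    Point cur = start, second{-1, -1};
    int back = 4;  // raster scan guarantees the west neighbour of start is background
    for (int guard = 0; guard <= 8 * w * h; ++guard) {  // each (pixel, back) state occurs once
        int found = -1;
        for (int k = 1; k <= 8; ++k) {  // sweep clockwise, starting after the backtrack pixel
            const int d = (back + k) % 8;
            if (fg(cur.x + kDx[d], cur.y + kDy[d])) { found = d; break; }
        }
        if (found < 0) break;  // isolated single pixel
        const Point next{cur.x + kDx[found], cur.y + kDy[found]};
        // About to repeat the very first move: the boundary is closed.
        if (cur == start && contour.size() > 1 && next == second) { contour.pop_back(); break; }
        if (contour.size() == 1) second = next;
        const int prev = (found + 7) % 8;  // background neighbour examined just before `next`
        back = directionOf(cur.x + kDx[prev] - next.x, cur.y + kDy[prev] - next.y);
        contour.push_back(next);
        cur = next;
    }
    return contour;
}
```

For a filled 3x3 square the sketch returns its 8 border pixels, starting at the top-left pixel and running clockwise on screen.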


OpenCV also provides functions that return contour length (cv::arcLength()) and contour area (cv::contourArea()). These can be used to reject invalid contours if any unwanted small blobs remain after the open operation.
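The idea behind this filtering can be sketched without OpenCV. The area of a closed contour follows from the shoelace formula and its length from summing segment lengths; a contour is kept only if both exceed a threshold. The function names and the `minArea` / `minLength` thresholds below are hypothetical and would need tuning per sensor. The results are comparable in spirit to `cv::contourArea()` and `cv::arcLength(contour, true)`, but this is not OpenCV's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { int x, y; };
using Contour = std::vector<Point>;

// Polygon area by the shoelace formula; the absolute value makes it orientation-independent.
double polygonArea(const Contour& c) {
    double twice = 0.0;
    for (std::size_t i = 0, n = c.size(); i < n; ++i) {
        const Point& a = c[i];
        const Point& b = c[(i + 1) % n];
        twice += static_cast<double>(a.x) * b.y - static_cast<double>(b.x) * a.y;
    }
    return std::fabs(twice) / 2.0;
}

// Perimeter of the closed polygon, including the segment from the last point back to the first.
double polygonPerimeter(const Contour& c) {
    double len = 0.0;
    for (std::size_t i = 0, n = c.size(); i < n; ++i) {
        const Point& a = c[i];
        const Point& b = c[(i + 1) % n];
        len += std::hypot(static_cast<double>(b.x - a.x), static_cast<double>(b.y - a.y));
    }
    return len;
}

// Keep only contours large enough to plausibly be a hand (thresholds are placeholders).
std::vector<Contour> filterContours(const std::vector<Contour>& all, double minArea, double minLength) {
    assert(minArea >= 0.0 && minLength >= 0.0);
    std::vector<Contour> kept;
    for (const auto& c : all)
        if (polygonArea(c) >= minArea && polygonPerimeter(c) >= minLength) kept.push_back(c);
    return kept;
}
```

A 4x4 square contour, for example, has area 16 and perimeter 16, so it survives `filterContours(all, 10.0, 10.0)` while a small triangle of area 0.5 does not.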

Finding the palm

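Palm, or hand center, can be found by computing the image moments of the binary hand image $I(x, y)$, defined by:

$$M_{pq} = \sum_{x}\sum_{y} x^{p}\, y^{q}\, I(x, y)$$

The hand center is then the centroid $(\bar{x}, \bar{y}) = \left(M_{10}/M_{00},\; M_{01}/M_{00}\right)$.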
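As a minimal sketch in plain C++17, not OpenCV's `cv::moments()`, the raw moments of a row-major binary mask can be computed directly, and their ratios give the hand center. The names `rawMoment`, `handCenter` and `Center` are hypothetical. For a binary mask, $M_{00}$ is simply the blob's pixel count.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <vector>

// Raw image moment M_pq = sum_x sum_y x^p * y^q * I(x, y) of a row-major binary mask.
double rawMoment(const std::vector<uint8_t>& mask, int w, int h, int p, int q) {
    assert(static_cast<int>(mask.size()) == w * h);
    double m = 0.0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (mask[y * w + x]) m += std::pow(x, p) * std::pow(y, q);  // I(x, y) = 1 here
    return m;
}

struct Center { double x, y; };

// Centroid (M10 / M00, M01 / M00); std::nullopt when the mask is empty (M00 == 0).
std::optional<Center> handCenter(const std::vector<uint8_t>& mask, int w, int h) {
    const double m00 = rawMoment(mask, w, h, 0, 0);
    if (m00 == 0.0) return std::nullopt;
    return Center{rawMoment(mask, w, h, 1, 0) / m00, rawMoment(mask, w, h, 0, 1) / m00};
}
```

For a filled 3x3 block covering columns 1 to 3 and rows 2 to 4, the sketch gives $M_{00} = 9$ and a center of (2, 3).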



Finding hand orientation

Hand orientation can be determined by performing Principal Component Analysis (PCA), which returns a major axis and a minor axis. OpenCV supports PCA (the cv::PCA class): it takes a collection of points as input and outputs two unit eigenvectors and two eigenvalues. In the image below, the eigenvectors are represented by the yellow and blue axes, drawn with lengths given by the corresponding eigenvalues.
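For 2-D points, PCA reduces to the eigen-decomposition of a 2x2 covariance matrix, which has a closed form. The sketch below is a stand-in for `cv::PCA`, not its implementation. The names `pcaOrientation`, `Vec2` and `Orientation` are hypothetical, and the covariance is normalised by N rather than N - 1 (an assumption; it only scales the eigenvalues). Each eigenvalue is the variance of the points along its axis, and the major axis lies at angle $\theta = \tfrac{1}{2}\operatorname{atan2}(2s_{xy},\, s_{xx} - s_{yy})$.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { double x, y; };

struct Orientation {
    Vec2 mean;
    Vec2 major, minor;                // unit eigenvectors
    double lambdaMajor, lambdaMinor;  // eigenvalues = variance along each axis
};

// PCA of 2-D points via the closed-form eigen-decomposition of the covariance
// matrix [[sxx, sxy], [sxy, syy]].
Orientation pcaOrientation(const std::vector<Vec2>& pts) {
    assert(!pts.empty());
    const double n = static_cast<double>(pts.size());
    Vec2 mean{0.0, 0.0};
    for (const auto& p : pts) { mean.x += p.x; mean.y += p.y; }
    mean.x /= n; mean.y /= n;

    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (const auto& p : pts) {
        const double dx = p.x - mean.x, dy = p.y - mean.y;
        sxx += dx * dx; syy += dy * dy; sxy += dx * dy;
    }
    sxx /= n; syy /= n; sxy /= n;

    // Eigenvalues of a symmetric 2x2 matrix: mid +/- sqrt(((sxx - syy) / 2)^2 + sxy^2).
    const double mid = 0.5 * (sxx + syy);
    const double r = std::sqrt(0.25 * (sxx - syy) * (sxx - syy) + sxy * sxy);
    const double theta = 0.5 * std::atan2(2.0 * sxy, sxx - syy);  // major-axis angle

    Orientation o;
    o.mean = mean;
    o.lambdaMajor = mid + r;
    o.lambdaMinor = mid - r;
    o.major = {std::cos(theta), std::sin(theta)};
    o.minor = {-std::sin(theta), std::cos(theta)};
    return o;
}
```

The input could be the contour points or all foreground pixel coordinates; which one the original demo used is not stated, so either choice is an assumption.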


Finding the fingers

Fingers can be found using the convex hull and its convexity defects. The convex hull is the smallest convex polygon enclosing the contour, and its vertices are contour points. A convexity defect is the contour point between two adjacent hull vertices that lies deepest inside the hull, that is, where convexity is violated. At this point, fingertips are essentially hull vertices. However, in some cases several hull vertices are grouped closely together, tracing out the curvature of a single fingertip, and they must be reduced to just one point. One way to do this is to pick, between each pair of adjacent convexity defects, the hull vertex that is farthest from the palm center.
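The finger-finding steps can be sketched in plain C++17 as follows. This is an illustration under stated assumptions, not OpenCV's `cv::convexHull` / `cv::convexityDefects` and not the original demo code. The hull is computed with Andrew's monotone chain and returned as contour indices. The `Defect` fields loosely mirror the start, end, farthest-point and depth values that `cv::convexityDefects` reports. All names and the `minDepth` threshold are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };
struct Defect { int start, end, farthest; double depth; };  // contour indices + depth

// Cross product of (a - o) and (b - o): which side of line o->a the point b lies on.
double cross(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Convex hull (Andrew's monotone chain) as contour indices sorted in contour order.
// Collinear points are dropped because cross <= 0 pops them.
std::vector<int> convexHullIndices(const std::vector<Pt>& pts) {
    const int n = static_cast<int>(pts.size());
    std::vector<int> order(n);
    for (int i = 0; i < n; ++i) order[i] = i;
    if (n < 3) return order;
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return pts[a].x < pts[b].x || (pts[a].x == pts[b].x && pts[a].y < pts[b].y);
    });
    std::vector<int> hull(2 * n);
    int k = 0;
    for (int i = 0; i < n; ++i) {  // first chain
        while (k >= 2 && cross(pts[hull[k - 2]], pts[hull[k - 1]], pts[order[i]]) <= 0) --k;
        hull[k++] = order[i];
    }
    for (int i = n - 2, t = k + 1; i >= 0; --i) {  // second chain, back to the start
        while (k >= t && cross(pts[hull[k - 2]], pts[hull[k - 1]], pts[order[i]]) <= 0) --k;
        hull[k++] = order[i];
    }
    hull.resize(k - 1);  // the last point repeats the first
    std::sort(hull.begin(), hull.end());
    return hull;
}

// Distance from p to the line through a and b.
double lineDistance(const Pt& a, const Pt& b, const Pt& p) {
    const double len = std::hypot(b.x - a.x, b.y - a.y);
    return len == 0.0 ? std::hypot(p.x - a.x, p.y - a.y) : std::fabs(cross(a, b, p)) / len;
}

// For each pair of consecutive hull vertices (wrapping around), the contour point between
// them farthest from the hull edge is a defect; shallow ones (< minDepth) are discarded.
std::vector<Defect> convexityDefects(const std::vector<Pt>& contour, const std::vector<int>& hull,
                                     double minDepth) {
    const int n = static_cast<int>(contour.size()), m = static_cast<int>(hull.size());
    assert(m <= n);
    std::vector<Defect> out;
    for (int h = 0; h < m; ++h) {
        const int s = hull[h], e = hull[(h + 1) % m];
        Defect d{s, e, -1, 0.0};
        for (int i = (s + 1) % n; i != e; i = (i + 1) % n) {
            const double dist = lineDistance(contour[s], contour[e], contour[i]);
            if (dist > d.depth) { d.depth = dist; d.farthest = i; }
        }
        if (d.farthest >= 0 && d.depth >= minDepth) out.push_back(d);
    }
    return out;
}

// One fingertip per gap between adjacent defects: the hull vertex strictly between them
// (in contour order) that is farthest from the palm center. Needs at least two defects.
std::vector<int> fingertips(const std::vector<Pt>& contour, const std::vector<int>& hull,
                            const std::vector<Defect>& defects, const Pt& palm) {
    std::vector<int> tips;
    const int n = static_cast<int>(contour.size()), nd = static_cast<int>(defects.size());
    if (nd < 2) return tips;
    for (int k = 0; k < nd; ++k) {
        const int a = defects[k].farthest, b = defects[(k + 1) % nd].farthest;
        const int span = (b - a + n) % n;
        int best = -1;
        double bestDist = -1.0;
        for (int h : hull) {
            const int off = (h - a + n) % n;
            if (off == 0 || off >= span) continue;  // not between the two defects
            const double d = std::hypot(contour[h].x - palm.x, contour[h].y - palm.y);
            if (d > bestDist) { bestDist = d; best = h; }
        }
        if (best >= 0) tips.push_back(best);
    }
    return tips;
}
```

Because the gap from the last defect back to the first runs around the wrist, it also yields one candidate, which may be an outer finger or a wrist corner. This sketch does not reject it. A practical implementation would likely need an extra check, for example on distance from the palm or the angle at the tip, and that check is an assumption beyond the method described above.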


The algorithm can be applied to depth sensors with resolutions as low as 80 x 60 pixels, the resolution of the OPT8320. In fact, when two hands are presented, both can be tracked.


Demo video

The algorithm was demonstrated in an automotive setting using the OPT8320-CDK-EVM clipped to the rearview mirror. It would have worked just as well mounted on the car's roof control console.

