Multi-touch–Part 3: Finger Tracking & Gesture Recognition


This is Part 3 of a 4-part series on multi-touch technology. In Part 1 I presented the theory of operation behind many capacitive multi-touch sensors. In Part 2 I discussed the noise filtering and touch extraction algorithms. In this article, I will present the common techniques for finger tracking and gesture recognition.

What’s a Touch?

Touch properties.

A touch has several attributes. It has a location, usually defined in screen coordinates; a contact area, usually modelled as an ellipse; and an orientation, defined by the ellipse’s major axis. Knowing the area may tell the software whether the touch is likely from a thumb or one of the fingers. Variation in the area from the same finger may indicate how hard the finger is pressing. Orientation can be used to indicate direction. Finally, touch location is used along with graphical widgets to define user interactions. Touches from the same finger, sequenced in time, form the trajectories used for gesture recognition. Building those trajectories is finger tracking, which is discussed next.
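As a rough illustration, the attributes above could be held in a small record like the one below. This is only a sketch: the class name `Touch`, its field names, and the choice of full axis lengths (rather than semi-axes) are assumptions made for the example, not the format of any particular driver.

```python
import math
from dataclasses import dataclass


@dataclass
class Touch:
    """One touch report from a single sensor frame (hypothetical layout)."""
    x: float      # location in screen coordinates
    y: float
    major: float  # full length of the contact ellipse's major axis
    minor: float  # full length of the contact ellipse's minor axis
    angle: float  # orientation of the major axis, in radians
    t: float      # timestamp, in seconds

    @property
    def area(self) -> float:
        """Area of the contact ellipse, computed from the two axis lengths."""
        return math.pi * (self.major / 2.0) * (self.minor / 2.0)
```

Comparing `area` across frames for the same finger gives the crude pressure hint mentioned above, and comparing it against a threshold gives a crude thumb-versus-finger guess.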

Finger Tracking

Proper touch-to-finger matching.

Finger tracking matches new touches to existing finger trajectories. The challenge is this: given a new set of touches, which touch should be matched to which finger? The simplest approach would be to match each finger to its nearest touch. But a touch can be the closest to more than one finger, and different assignments could result depending on the matching order. A more sophisticated approach is to first match the touch-finger pair with the shortest distance, eliminate that pair, and then repeat by finding the next pair with the shortest distance.

As the matching goes along, the distances of the remaining pairs grow larger, eventually to the point where a pair is separated so much that a correct match is unlikely, and its members are left unmatched. After the matching completes, any residual touches are new touch-downs, and a new finger trajectory is created for each new touch-down. Similarly, any residual fingers after matching are touch-ups, and their finger trajectories are deleted.
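The shortest-pair-first matching described above can be sketched as follows. The function name `match_touches`, the data layout, and the `max_dist` cut-off value are all assumptions for illustration; a real tracker would tune the threshold to its sensor resolution and frame rate.

```python
import math


def match_touches(fingers, touches, max_dist=50.0):
    """Greedy shortest-pair-first matching (illustrative sketch).

    fingers: dict mapping integer finger id -> (x, y) last known position
    touches: list of (x, y) positions reported in the new frame
    Returns (matches, new_touch_indices, lifted_finger_ids), where
    matches maps finger id -> index into touches.
    """
    # Sorting every finger-touch pair by distance once, then skipping pairs
    # whose finger or touch is already taken, is equivalent to repeatedly
    # picking the shortest remaining pair.
    pairs = sorted(
        (math.hypot(fx - tx, fy - ty), fid, ti)
        for fid, (fx, fy) in fingers.items()
        for ti, (tx, ty) in enumerate(touches)
    )
    matches = {}
    used = set()
    for dist, fid, ti in pairs:
        if dist > max_dist:
            break  # every remaining pair is too far apart to be plausible
        if fid in matches or ti in used:
            continue
        matches[fid] = ti
        used.add(ti)
    new_touches = [ti for ti in range(len(touches)) if ti not in used]  # touch-downs
    lifted = [fid for fid in fingers if fid not in matches]             # touch-ups
    return matches, new_touches, lifted
```

Greedy matching is not guaranteed to minimise the total distance over all pairs, but it is cheap and usually adequate when fingers move only a little between frames.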

When a touch is matched to a finger, the touch is added to the finger’s trajectory. A finger trajectory is simply a sequence of touches from the same finger that begins with a touch-down and ends with a touch-up. Whenever there are more touches than fingers in a match, there will be new touch-downs. Similarly, whenever there are more fingers than touches, there will be new touch-ups.
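The trajectory bookkeeping might look like the sketch below. It assumes the matching step, however it is implemented, hands back three things per frame: a finger-to-touch mapping, the indices of unmatched touches, and the ids of unmatched fingers. The class name `TrajectoryStore` and its interface are hypothetical. Finished trajectories are moved to a separate list rather than discarded outright, so that a gesture recogniser can still inspect them.

```python
class TrajectoryStore:
    """Keeps one list of touches per tracked finger (illustrative sketch)."""

    def __init__(self):
        self.active = {}     # finger id -> list of touches since touch-down
        self.finished = []   # trajectories that ended with a touch-up
        self._next_id = 0    # id to assign to the next new finger

    def update(self, touches, matches, new_touches, lifted):
        """Apply one frame's matching result.

        touches:     list of touches in the new frame
        matches:     dict finger id -> index into touches
        new_touches: indices of touches that matched no finger (touch-downs)
        lifted:      ids of fingers that matched no touch (touch-ups)
        """
        for fid in lifted:
            # Touch-up: the trajectory is complete and leaves the active set.
            self.finished.append(self.active.pop(fid))
        for fid, ti in matches.items():
            self.active[fid].append(touches[ti])
        for ti in new_touches:
            # Touch-down: start a fresh trajectory under a new finger id.
            self.active[self._next_id] = [touches[ti]]
            self._next_id += 1
```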

Simple Gestures

Most consumer devices today implement only simple gestures, such as single and double taps, and pinch and zoom. The figure below shows some of the more popular simple gestures.

Common simple gestures.

Simple gestures can be implemented using heuristics, which avoids bringing in a complex state-machine algorithm. For example, a single-tap is a finger trajectory with a short overall duration and a small displacement; a double-tap is a trajectory in which consecutive touches have a relatively large temporal spacing and a short spatial spacing. Two-finger gestures are parametric. One parameter is pinch-zoom, a value proportional to the distance between the two touches; another is rotation, which measures the change in slope of the line segment between the two fingers.
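A minimal sketch of two of these heuristics follows: a single-tap test and the two-finger pinch-zoom and rotation parameters. The function names and the threshold values (`max_duration`, `max_travel`) are placeholders chosen for illustration, and the scale is expressed as a ratio of current to initial finger separation, which is one of several reasonable conventions.

```python
import math


def is_tap(trajectory, max_duration=0.25, max_travel=10.0):
    """Single-tap test: short overall duration and small displacement.

    trajectory: list of (x, y, t) samples from touch-down to touch-up.
    """
    x0, y0, t0 = trajectory[0]
    x1, y1, t1 = trajectory[-1]
    return (t1 - t0) <= max_duration and math.hypot(x1 - x0, y1 - y0) <= max_travel


def two_finger_params(a0, b0, a1, b1):
    """Pinch-zoom scale and rotation for a two-finger gesture.

    a0, b0: (x, y) of the two fingers when the gesture started
    a1, b1: (x, y) of the same two fingers now
    Returns (scale, rotation) with rotation in radians within [-pi, pi).
    """
    d0 = math.hypot(b0[0] - a0[0], b0[1] - a0[1])
    d1 = math.hypot(b1[0] - a1[0], b1[1] - a1[1])
    if d0 == 0.0:
        raise ValueError("fingers must start at distinct positions")
    scale = d1 / d0
    ang0 = math.atan2(b0[1] - a0[1], b0[0] - a0[0])
    ang1 = math.atan2(b1[1] - a1[1], b1[0] - a1[0])
    # Wrap the difference so a small turn across the +/-pi seam stays small.
    rotation = (ang1 - ang0 + math.pi) % (2.0 * math.pi) - math.pi
    return scale, rotation
```

Using the angle of the segment rather than its raw slope avoids the division by zero that occurs when the two fingers are vertically aligned.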

Complex Gestures

Recognition of complex gestures is, in essence, the translation of multi-finger trajectories into symbols. One example is handwriting recognition, where finger trajectories are analysed to extract key parametric features, and the features are then compared to the stereotypical values of each symbol. The degree of resemblance is defined by the distance between the sample features and the prototype features in the feature space.

So, what are the features? Dean Rubine defined a set of complex gesture features commonly used today that includes:

  • Initial angle of the gesture
  • Length and angle of the bounding box diagonal
  • Angle between first and last touch of the trajectory
  • Trajectory’s overall length and duration and maximum speed
  • Total angle traversed
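To make the list concrete, here is one way these features could be computed from a sampled trajectory. It is a sketch rather than a faithful reproduction of any published feature set: the function name and dictionary keys are made up for the example, the initial angle is measured from the first to the third sample (an assumption, intended to reduce jitter), and angles are kept as radians.

```python
import math


def gesture_features(points):
    """Compute the listed features from (x, y, t) samples (needs >= 3 samples)."""
    if len(points) < 3:
        raise ValueError("need at least three samples")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0, t0 = points[0]
    x2, y2, _ = points[2]
    xn, yn, tn = points[-1]
    width, height = max(xs) - min(xs), max(ys) - min(ys)

    length = 0.0
    max_speed = 0.0
    total_angle = 0.0
    prev = None  # direction of the previous non-zero segment
    for (xa, ya, ta), (xb, yb, tb) in zip(points, points[1:]):
        dx, dy, dt = xb - xa, yb - ya, tb - ta
        seg = math.hypot(dx, dy)
        length += seg
        if dt > 0:
            max_speed = max(max_speed, seg / dt)
        if seg > 0:
            if prev is not None:
                # Signed turn between consecutive segments (cross, dot).
                cross = prev[0] * dy - prev[1] * dx
                dot = prev[0] * dx + prev[1] * dy
                total_angle += math.atan2(cross, dot)
            prev = (dx, dy)

    return {
        "initial_angle": math.atan2(y2 - y0, x2 - x0),
        "bbox_diag_length": math.hypot(width, height),
        "bbox_diag_angle": math.atan2(height, width),
        "first_last_angle": math.atan2(yn - y0, xn - x0),
        "length": length,
        "duration": tn - t0,
        "max_speed": max_speed,
        "total_angle": total_angle,
    }
```

For an L-shaped stroke that runs two units right and then two units up, `total_angle` comes out as a quarter turn and `length` as four units.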

These feature parameters form a feature vector. As a gesture progresses, its current feature vector is a point moving through the feature space. Each gesture symbol has a point in the feature space as well. The distance between the gesture’s feature point and each gesture symbol’s feature point can be computed and normalized to form a percentage, or matching probability.
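One possible way to turn distances into normalised scores is sketched below. The text does not specify a normalisation, so the softmax over negative Euclidean distance used here is purely an assumption, as are the function name `match_scores` and the data layout. In practice the features would also need to be scaled to comparable ranges first, otherwise the largest-valued feature dominates the distance.

```python
import math


def match_scores(feature, prototypes):
    """Score a feature vector against each gesture symbol's prototype.

    feature:    list of floats
    prototypes: dict mapping symbol name -> list of floats (same length)
    Returns a dict mapping symbol name -> score, with scores summing to 1.
    """
    dists = {name: math.dist(feature, proto) for name, proto in prototypes.items()}
    nearest = min(dists.values())
    # Subtracting the smallest distance keeps exp() from underflowing to
    # zero for every symbol when all distances are large.
    weights = {name: math.exp(-(d - nearest)) for name, d in dists.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

The symbol with the highest score is the best match; a recogniser would typically also reject the gesture when even the best score falls below some threshold.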

Another comparison method is the Hidden Markov Model (HMM), which is based on transition probabilities. That method is left for the reader to explore.

Gesture Vocabulary and Dictionary

Each gesture symbol needs to have a stereotype feature vector. To train the feature vector, the teacher should repeat the gesture multiple times. The training trials for a gesture will in general form a cluster of points in the feature space, with the cluster centre and the variance forming a probability model for the gesture. The probability model is considered a vocabulary. A group of gesture vocabularies for a specific application forms a dictionary that is loaded when the application starts.
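The training step can be sketched as computing a per-feature mean and variance over the trials. This treats each feature independently, which is a simplifying assumption (a fuller model would use a covariance matrix); the names `train_gesture` and `build_dictionary` are hypothetical.

```python
from statistics import fmean, pvariance


def train_gesture(trials):
    """Summarise repeated trials of one gesture.

    trials: list of feature vectors (lists of floats), one per repetition
    Returns (centre, variance), each a list with one entry per feature.
    """
    columns = list(zip(*trials))  # regroup values feature by feature
    centre = [fmean(col) for col in columns]
    variance = [pvariance(col) for col in columns]
    return centre, variance


def build_dictionary(training_sets):
    """training_sets: dict mapping gesture name -> list of trial vectors."""
    return {name: train_gesture(trials) for name, trials in training_sets.items()}
```

An application would build or load such a dictionary at start-up and compare each incoming feature vector against every entry's centre, optionally weighting each feature by the inverse of its variance.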


This is Part 3 of a 4-part series on multi-touch technology. Part 1 presented the theory of operation behind many capacitive multi-touch sensors. Part 2 discussed the noise filtering and touch extraction algorithms. This article presented the common techniques for finger tracking and gesture recognition: the different properties of a touch, how touches are matched to fingers to create finger trajectories, and how trajectories are used for gesture recognition. The next and last article will discuss how a multi-touch Linux driver works. Please follow this blog or check back frequently for the link to the final instalment.

(The above article is solely the expressed opinion of the author and does not necessarily reflect the position of his current and past employers)
