Authors:

crh13 | rz33

Hand Gesture Recognition

EE 547: Computer Vision Poster Project

Remik Ziemlinski     Christopher Hynes



Algorithm

A binary version of the image is first created using a threshold passed in as an argument. Edge detection is then performed using the Sobel maps. Combining the two resulting images yields a strong silhouette of the original data. Before any sensitive image processing, the binary image data is filtered twofold for noisy regions too small to be significant (the size limit is also specifiable by program argument).
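The thresholding and Sobel steps can be sketched as follows. This is a minimal illustration, not the project's code: the image is assumed to be a list of rows of grey values, the two images are assumed to be combined with a logical OR (the text does not say how they are combined), and the names threshold, sobel_magnitude, silhouette and the parameter edge_t are hypothetical.

```python
# Sobel kernels for the horizontal and vertical gradient maps.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def threshold(img, t):
    """Return a binary image: 1 where the pixel is >= t, else 0."""
    return [[1 if v >= t else 0 for v in row] for row in img]


def sobel_magnitude(img):
    """Gradient magnitude from the two Sobel maps; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def silhouette(img, t, edge_t):
    """Assumed combination: OR the binary image with the thresholded edges."""
    binary = threshold(img, t)
    edges = sobel_magnitude(img)
    return [[1 if binary[y][x] or edges[y][x] >= edge_t else 0
             for x in range(len(img[0]))]
            for y in range(len(img))]
```

On a uniform bright block, the Sobel response is zero in the interior and large along the rim, so the OR mainly reinforces the outline of the thresholded region.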


The center of mass and orientation are then computed. The center of mass uses a distance transform on the region pixels with respect to the boundary pixels. Orientation is determined by using slope densities. From this, a skeleton of the hand is created. Starting from the center of mass, a quadrilateral is created perpendicular to the orientation. It then balloons outwards in four directions along each edge until each hits a boundary of the hand. This is done to separate the "palm" and wrist from the fingers.

All the remaining branches are passed through a "maximum distance" filter, max_dist, that returns the longest path through each branch. This eliminates excess noise from the skeleton and locates the finger. The algorithm assumes a binary image and follows the skeleton from the start point until a branch point is reached. It then recurses to find the longer path, and terminates when the end of the skeleton is reached.
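A recursive longest-path search of this kind might look like the sketch below. It is an assumption-laden illustration rather than the project's max_dist: the skeleton is represented as a set of (row, col) pixels, 8-connectivity is assumed, and the helper name neighbors is hypothetical.

```python
def neighbors(p, skeleton):
    """8-connected skeleton neighbors of pixel p = (row, col)."""
    r, c = p
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and (r + dr, c + dc) in skeleton]


def max_dist(start, skeleton, visited=None):
    """Return the longest simple path (list of pixels) that begins at start."""
    visited = (visited or set()) | {start}
    best = []
    # At a branch point there are several unvisited neighbors; recurse into
    # each one and keep whichever continuation is longer.
    for n in neighbors(start, skeleton):
        if n not in visited:
            path = max_dist(n, skeleton, visited)
            if len(path) > len(best):
                best = path
    # With no unvisited neighbors the end of the skeleton has been reached.
    return [start] + best


# A Y-shaped skeleton: a stem, one long diagonal branch, one short spur.
stem = {(0, 2), (1, 2), (2, 2), (3, 2)}
long_branch = {(4, 3), (5, 4), (6, 5), (7, 6)}
short_spur = {(4, 1), (5, 0)}
path = max_dist((0, 2), stem | long_branch | short_spur)
```

Here path follows the stem and the long branch and drops the short spur. Because the search recurses once per pixel, a very long skeleton could exceed Python's default recursion limit; an explicit stack would avoid that.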



After checking all the branches, the longest five are selected as the fingers. These branches are used to create polylines, which in turn are used to determine the orientation of the fingers. Classification then follows a decision tree model.



We found that the moment method for calculating the center of mass is not robust to the morphable model of the hand, so a Euclidean distance transform was used instead. The actual implementation is very nearly a Medial Axis Transform, which is why, in future improvements, this step will be incorporated into the skeletonizing phase. The present skeletonizing algorithm can be modified with clever labeling to track a notion of a timestamp. Regardless, experimentation has shown the value of using a distance transform over the standard first-order moment descriptor.
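The difference between the two descriptors can be illustrated with a small sketch. It assumes the center is taken as the region pixel with the largest Euclidean distance to any background pixel (the text does not state how the center is read off the transform), uses a brute-force distance computation suitable only for tiny images, and tests on a hypothetical shape built by make_hand.

```python
import math


def make_hand(finger_len):
    """Hypothetical test shape: a 5x5 'palm' plus a 1-pixel-wide 'finger'."""
    mask = [[0] * 14 for _ in range(7)]
    for r in range(1, 6):
        for c in range(1, 6):
            mask[r][c] = 1
    for c in range(6, 6 + finger_len):
        mask[3][c] = 1
    return mask


def moment_centroid(mask):
    """First-order moment centroid: mean (row, col) of the region pixels."""
    pts = [(r, c) for r, row in enumerate(mask)
           for c, v in enumerate(row) if v]
    n = len(pts)
    return (sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n)


def distance_center(mask):
    """Region pixel farthest (Euclidean) from every background pixel."""
    cells = [((r, c), v) for r, row in enumerate(mask)
             for c, v in enumerate(row)]
    region = [p for p, v in cells if v]
    background = [p for p, v in cells if not v]
    return max(region,
               key=lambda p: min(math.dist(p, b) for b in background))


fist, open_hand = make_hand(0), make_hand(7)
```

Extending the finger moves the moment centroid from column 3.0 to column 4.3125, while the distance-based center stays at (3, 3) for both shapes, which is the robustness property described above.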

Skeletonizing

Skeletonizing is based on a single-pass algorithm [4]. It was used to readily obtain estimates of where the finger axes may be found. With this information, further refinement would be possible, much like that done by Kanade [1]. As natural as this operator is on a skeletal structure, the benefit of the estimate is offset by the subsequent filtering of branches. The skeletonizing algorithm was expected to perform quickly, but it did not, and it is not recommended for real-time applications.

Orientation by Slope Density

Since the hand is highly morphable, it was proposed that a slope density approach could more closely approximate the hand's orientation [5]. A problem arises when two candidate slopes compete because of redundancies in perpendicular surfaces of the hand. A clustering was therefore employed with a weight distribution a*t1 + (1 - a)*t2 on the competing angles t1 and t2 (see the source code listing for SceneAnalysis.getSlopeDensity).
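The idea can be sketched as below. This is not SceneAnalysis.getSlopeDensity: the boundary is assumed to be a closed polygon of (x, y) points, directions are binned modulo 180 degrees and weighted by segment length, and the function names, the bin width bin_deg, and the choice of a are all hypothetical.

```python
import math
from collections import Counter


def slope_density(points, bin_deg=10):
    """Length-weighted histogram of boundary directions, modulo 180 degrees."""
    hist = Counter()
    n = len(points)
    for i in range(n):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
        # Round to whole degrees so 180.0 and 0.0 land in the same bin.
        angle = round(math.degrees(math.atan2(y1 - y0, x1 - x0))) % 180
        hist[(angle // bin_deg) * bin_deg] += math.hypot(x1 - x0, y1 - y0)
    return hist


def competing_angles(hist):
    """The two strongest direction bins, strongest first."""
    return [angle for angle, _ in hist.most_common(2)]


def blend(t1, t2, a):
    """Weighted combination a*t1 + (1 - a)*t2 of two competing angles."""
    return a * t1 + (1 - a) * t2


# A 4x2 rectangle has two perpendicular candidate slopes: 0 and 90 degrees.
rect = [(0, 0), (4, 0), (4, 2), (0, 2)]
hist = slope_density(rect)
t1, t2 = competing_angles(hist)
```

For the rectangle, the 0 degree bin collects a total length of 8 and the 90 degree bin a total length of 4, so the two perpendicular slopes compete as described. Note that a plain linear blend does not handle wrap-around at 180 degrees; angles near 0 and near 180 would need to be unwrapped first.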



Classification

Classification under the proposed constraints follows a decision tree model. Since our approach is independent of any training, any data set is testable. This is a bonus for experimentation and confidence measurement; however, it does have its drawbacks.

The most significant shortcoming is that the decision tree is statically fixed. It approaches classification by exhaustive means and is not amenable to dynamic settings. A new encoding would be necessary to make the classifier extensible. Since the parameters that model the hand are finite and fully understood, a rigorous module could encompass the search space. However, that was not the goal of this investigation, so it is left for future work.
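To make "statically fixed" concrete, a hard-coded tree might look like the sketch below. Everything in it is hypothetical: the feature (a list of finger orientations in degrees), the 15 degree spread threshold, and the gesture labels are invented for illustration and are not the project's actual classes or rules.

```python
def classify(finger_angles):
    """Hypothetical static decision tree over detected finger orientations."""
    n = len(finger_angles)
    if n == 0:
        return "fist"
    if n == 1:
        return "point"
    if n == 2:
        # Split on how far apart the two fingers are (assumed threshold).
        spread = max(finger_angles) - min(finger_angles)
        return "victory" if spread >= 15 else "two-together"
    if n == 5:
        return "open-hand"
    # Any case the tree does not enumerate cannot be classified.
    return "unknown"
```

Every gesture is a branch written into the code, so adding a new one means editing the function itself, which is the extensibility limitation noted above.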