Authors:

crh13 | rz33

Hand Gesture Recognition

EE 547: Computer Vision Poster Project

Remik Ziemlinski, Christopher Hynes

Past Work

Hand tracking and gesture recognition are often mistaken for one and the same area, but they are not. The two overlap considerably, and each drives the other toward innovation and exploration. Before anyone can use high-level notions of gesture to drive a system, however, recurring problems must be addressed at a more fundamental level.

Before the finer details are resolved, though, an end goal is needed. It provides a common focal point and direction for any endeavor. The basis of the problem is the visualization of a hand, but a target at a higher level is needed to map perception to comprehension.

High-level abstraction is appealing to researchers, and in the last decade two abstractions have been exploited, and continue to be pursued, in the modeling of the hand. The classical model follows a robotics-control description; the more recent one follows a principal component description, based on principal component analysis (PCA). The first has several conceptual advantages over the second.

Because the robotics-control descriptor closely simulates a 3D hand/arm model, it gives a very intuitive feel for what the domain actually is. As a by-product, this description can be parameterized rigorously. The established parameter set, due to Denavit and Hartenberg (DH), represents the model with kinematic chains, that is, series of articulated links.
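To make the kinematic-chain idea concrete, the following is a minimal sketch of forward kinematics for a single finger using the standard DH link transform. It is an illustration only, not the model from [1] or [2]: the function names, the restriction to planar flexion (d = 0, alpha = 0 for every link), and the link lengths are all assumptions made for the example.

```python
import math

def dh_matrix(theta, d, a, alpha):
    """4x4 homogeneous transform of one link (standard DH convention)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ]

def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def fingertip_position(joint_angles, link_lengths):
    """Chain the link transforms from palm to fingertip; return (x, y, z).

    Assumption: a planar finger, so each link has d = 0 and alpha = 0.
    """
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, a in zip(joint_angles, link_lengths):
        T = matmul(T, dh_matrix(theta, 0.0, a, 0.0))
    return T[0][3], T[1][3], T[2][3]

# Hypothetical three-link finger (lengths in cm), slightly flexed at each joint.
tip = fingertip_position([0.2, 0.3, 0.1], [4.0, 3.0, 2.0])
```

Each joint contributes one transform, and multiplying them in order from the palm outward is what makes the chain "articulated": changing one joint angle moves every link after it.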

Kanade and Rehg employed this representation in their hand tracking system [1]. Their application of the kinematic model was novel in that the hand was no longer treated as a rigid object, as was commonly done with human body parts. Their modeling allowed for a highly adaptable state. For reasons of practicality their model was constrained to 27 degrees of freedom, a number that matches the 27 bones of a human hand [3]. Together with shape models describing visual appearance, the DH parameters constituted their modeling framework.

This framework translates well to the visual world because image features can be used to recover the model parameters, and vice versa. Each finger was modeled as a cylinder, and its axis, or bisector, naturally served as a cue to the parameterized link. Following the links from the palm to the fingertip then gives a fully articulated description of a finger. The fingers, as seen under that occlusion boundary axis formulation, constituted their feature space in the static analysis.

Their main interest, however, was in the dynamic nature of the hand. Each statically analyzed state was amenable to correction from previous observations. As a result, the line fitting was of significant help in difficult states where features could be lost. Incrementally adjusting the state to fit each static observation alone isn't sufficient, however. Kanade used a two-camera system to resolve the discontinuities inherent in the tracking.

Wu and Huang were well aware of this and extended the hand tracking work of Kanade in three respects [2]. First, they found that modeling the hand without restrictions produced a larger state space than necessary and thus made it more susceptible to articulation singularities. Second, they posed the problem as a separable one, so that once the constituent parts were refined to the desired level, they could be joined to form a more accurate model. Third, they introduced means to handle the inherent error and noise of image sampling.

The lack of these notions in the Kanade system may explain its reliance on a multi-camera view. For instance, the new constraints introduced to the kinematic hand model helped bound articulation. Specifically, Wu imposed angular limits on several of the DH parameters. In addition, one constraint established a relative relationship between two links, thus keeping the two more tightly coupled [2].
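The two kinds of constraint described here, angular limits and a coupling between two links, can be sketched as a simple projection of a candidate finger state onto the allowed range. The joint names, the numeric limits, and the 2/3 coupling ratio between the two outer joints below are illustrative assumptions, not values taken from [2].

```python
import math

# Hypothetical flexion limits (radians) for the three joints of one finger.
JOINT_LIMITS = {
    "mcp": (0.0, math.radians(90)),
    "pip": (0.0, math.radians(110)),
    "dip": (0.0, math.radians(90)),
}

# Assumed coupling: the outermost joint follows the middle joint at a fixed ratio.
DIP_PIP_RATIO = 2.0 / 3.0

def constrain_finger(angles):
    """Clamp each joint angle to its limits, then tie the dip angle to the pip angle."""
    out = {}
    for name, value in angles.items():
        lo, hi = JOINT_LIMITS[name]
        out[name] = min(max(value, lo), hi)
    # The coupled joint is no longer a free parameter of the state.
    out["dip"] = DIP_PIP_RATIO * out["pip"]
    return out

# A candidate state that violates the limits is pulled back into range.
state = constrain_finger({"mcp": -0.3, "pip": 3.0, "dip": 0.1})
```

Clamping bounds the search space, and the coupling removes one free parameter per finger, which is the sense in which such constraints shrink the state space.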

Another notion not made explicit by Kanade, and therefore an important development, is the separation of the global hand from the local fingers. This helps avoid traps in local minima. This divide-and-conquer approach allows many alternatives to be employed in treating each subproblem. The decoupled subproblems proposed by Wu were pose determination and inverse kinematics. Under the Kanade system the two were not treated as separate, and so the estimation was less guided.

Finally, convergence to more accurate estimates was also achieved by using robust statistics. The least median of squares (LMS) method tolerates up to 50% outliers in a sample, so it is the preferred choice in cases such as when the Gaussian noise assumption fails [2]. Kanade and Rehg overlooked this and opted for the method of least squares (LS), a non-robust approach [1].
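The difference between the two estimators can be illustrated on a simple line-fitting problem. The sketch below is not the estimator from [1] or [2]; it is a toy comparison under stated assumptions: the function names are hypothetical, and the least-median fit searches every pair of points exhaustively, which is only practical for small samples (practical implementations usually draw random subsets instead).

```python
from itertools import combinations
from statistics import median

def least_squares_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def least_median_line(points):
    """Pick the line through two sample points that minimizes the
    median of the squared residuals over all points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = m*x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        score = median((y - (slope * x + intercept)) ** 2 for x, y in points)
        if best is None or score < best[0]:
            best = (score, slope, intercept)
    return best[1], best[2]

# Seven points on y = 2x + 1 plus three gross outliers.
data = [(x, 2 * x + 1) for x in range(7)] + [(7, 60), (8, 70), (9, 80)]
ls_fit = least_squares_line(data)
lmeds_fit = least_median_line(data)
```

On data like this, the least-squares line is dragged toward the outliers because every squared residual contributes to the sum, whereas the median ignores the worst half of the residuals and the fit stays on the majority of the points.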

Even with these additions to kinematic modeling, the system was still sensitive to acquisition conditions. Wu stated explicitly that occlusion of any single fingertip would lead to tracking loss of that finger. This in turn implies that systems of the Wu design could not reasonably track hands projected under roll or pitch. It demonstrates that hand systems are still inadequate even with the aid of dynamic information.