Data Acquisition
The first step in the experiment was to acquire a data set for both training and testing. The ultimate goal is to determine whether a signature is authentic, which requires signatures from more than one subject. To ensure diversity, we collected signatures from five individuals, both male and female. They were instructed to produce their best forgeries and were even given indirect hints on how to do so, for example to create signatures of similar size.

Preprocessing
Median Filter
The signatures were written on clean white sheets of paper with no printing on either side and scanned on a clean scanner. Even so, "salt and pepper" noise (isolated black or white pixels) cannot be avoided, if only because of measurement error. A median filter removes this kind of isolated noise. Any large blocks of noise would have been detected and removed when the images were cropped.
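A median filter replaces each pixel with the median of its neighbourhood, so an isolated speck is outvoted by the paper or ink around it. The sketch below is illustrative only. It assumes the scan has already been binarized (1 = ink, 0 = paper) and stored as a list of rows, and it uses a 3x3 window by default. The text above states neither the binarization nor the window size.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = paper


def median_filter(img: Image, radius: int = 1) -> Image:
    """Apply a (2*radius+1) x (2*radius+1) median filter to a binary image.

    Windows are clipped at the image border; when a clipped window holds an
    even number of samples, the upper median is used.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Collect the in-bounds neighbourhood of (r, c), including itself.
            window = [
                img[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            out[r][c] = sorted(window)[len(window) // 2]
    return out
```

On a binary image the median reduces to a majority vote over the window. That is why single specks vanish while strokes at least a few pixels thick survive.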

Thinning and Pruning
To account for ink bleeding, different pens, and other variations in writing conditions, the signatures needed to be thinned. However, non-smooth surfaces caused phantom lines to appear in the thinned image. Pruning solves the phantom-line problem. After sampling several of the signatures, we judged that lines shorter than 20 were probably phantom and could be removed. In some cases the results were excellent, but in most cases a significant, though not vital, amount of detail was lost, such as the top of the "h" in "Mehler". Although the resulting signature may not be visually appealing, the preprocessing succeeds as long as most of its identifying characteristics remain.
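Pruning can be sketched as follows. Starting from every endpoint of the skeleton, we walk along the line and erase the branch if it runs into a junction before reaching the length threshold. This is a hypothetical Python version, not the code used in the experiment, and it rests on several assumptions:

- The input is an already-thinned, one-pixel-wide, 8-connected skeleton.
- The threshold of 20 is measured in pixels.
- Endpoints and junctions are found with the crossing number, a standard skeleton measure that the text does not mention.

Isolated segments (endpoint to endpoint) are deliberately kept, since they may be genuine marks such as dots.

```python
from typing import List, Tuple

Image = List[List[int]]  # thinned binary image: 1 = skeleton pixel
Pixel = Tuple[int, int]

# 8-neighbourhood in clockwise order: N, NE, E, SE, S, SW, W, NW
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def ink(img: Image, r: int, c: int) -> int:
    """Pixel value, treating everything outside the image as background."""
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0


def crossing_number(img: Image, r: int, c: int) -> int:
    """Count 0->1 transitions around (r, c): 1 = endpoint, >= 3 = junction."""
    ring = [ink(img, r + dr, c + dc) for dr, dc in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def ink_neighbors(img: Image, r: int, c: int) -> List[Pixel]:
    """All 8-connected skeleton neighbours of (r, c)."""
    return [(r + dr, c + dc) for dr, dc in RING if ink(img, r + dr, c + dc)]


def prune_spurs(skeleton: Image, min_length: int = 20) -> Image:
    """Erase branches shorter than min_length pixels that run from an
    endpoint into a junction; stand-alone segments are left untouched."""
    img = [row[:] for row in skeleton]
    endpoints = [(r, c) for r in range(len(img)) for c in range(len(img[0]))
                 if img[r][c] and crossing_number(img, r, c) == 1]
    for start in endpoints:
        if not img[start[0]][start[1]]:
            continue  # already erased as part of another branch
        path, cur = [start], start
        while len(path) < min_length:
            onward = [p for p in ink_neighbors(img, *cur) if p not in path]
            if not onward:
                break  # reached the other end: isolated segment, keep it
            if len(onward) > 1 or crossing_number(img, *onward[0]) >= 3:
                for r, c in path:  # short branch ending at a junction
                    img[r][c] = 0
                break
            cur = onward[0]
            path.append(cur)
    return img
```

Any branch of at least `min_length` pixels stops the walk before a junction is seen, so long genuine strokes are never erased. Short but genuine hooks, like the top of an "h", are lost, which matches the loss of detail described above.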

Bounding Box
Because the images were cropped by hand, the signatures were not necessarily positioned at the origin (the bottom-left corner). To position all signatures consistently and to reduce feature-extraction time, we compute the bounding box of each signature and produce a new image cropped to it.
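The bounding-box step can be sketched as below. This is an illustrative Python version, not the `bb` program listed under Programs. It assumes a binary image stored as a list of rows, with 1 for ink.

```python
from typing import List

Image = List[List[int]]  # 1 = ink, 0 = background


def crop_to_bounding_box(img: Image) -> Image:
    """Return the smallest sub-image that contains every ink pixel."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:
        return []  # blank page: no signature found
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

After cropping, every signature starts at the same corner. Height and width then equal the image dimensions, and later passes touch no empty margin.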

[Figure: Authentic Original / Authentic Preprocessed; Forged Original / Forged Preprocessed; Authentic Original w/o Pruning]


Feature Extraction
Global Features
Seven global features were used in the feature vector: signature height, width, area, center of mass (x, y), edge points, cross points, and closed loops. On their own, these features produced a very accurate classifier: with AdaBoost they reached about 94.9% correct, the best result in the Statistics table.
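The text does not define edge points, cross points, or closed loops. The sketch below therefore uses hypothetical but common definitions:

- Edge points are skeleton endpoints (crossing number 1).
- Cross points are junction pixels (crossing number 3 or more).
- Closed loops are 4-connected background regions that do not touch the image border.

Area is taken as the number of ink pixels. Center of mass is returned as (x, y), with x the column and y the row counted from the top, which is also an assumption. The input is assumed to be the thinned, cropped binary image.

```python
from collections import deque
from typing import Dict, List

Image = List[List[int]]  # thinned, cropped binary image: 1 = ink

# 8-neighbourhood in clockwise order: N, NE, E, SE, S, SW, W, NW
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def ink(img: Image, r: int, c: int) -> int:
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0


def crossing_number(img: Image, r: int, c: int) -> int:
    """0->1 transitions around (r, c): 1 = endpoint, >= 3 = junction."""
    ring = [ink(img, r + dr, c + dc) for dr, dc in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def count_closed_loops(img: Image) -> int:
    """Count 4-connected background regions that do not touch the border."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    loops = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] or seen[r][c]:
                continue
            touches_border = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one background region
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                loops += 1
    return loops


def global_features(img: Image) -> Dict[str, object]:
    pixels = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    area = len(pixels)
    return {
        "height": len(img),
        "width": len(img[0]),
        "area": area,
        "center_of_mass": (sum(c for _, c in pixels) / area,
                           sum(r for r, _ in pixels) / area),
        "edge_points": sum(1 for r, c in pixels if crossing_number(img, r, c) == 1),
        "cross_points": sum(1 for r, c in pixels if crossing_number(img, r, c) >= 3),
        "closed_loops": count_closed_loops(img),
    }
```

Flattening the returned dictionary (with the center of mass split into x and y) gives an eight-number vector. On real skeletons, a single crossing can span several adjacent junction pixels, so cross-point counts computed this way may run high.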

Local Features
We experimented with two kinds of local features. The first imposes a grid on the image and uses the area of the signature pixels in each grid cell as a feature. The second groups the signature pixels into "strokes", where a stroke attempts to capture a continuous curve of the pen. Strokes are found by scanning the image for an unmarked pixel and extending it upward and downward. A neighboring pixel is added to the stroke only if it continues in the same direction (i.e., all successive pixels lie above one another). Comparing the two, the grid features proved more sensitive to the particular signature: with AdaBoost, grid features reached 69.5% correct versus 92.3% for strokes. Because strokes are derived from the image data itself, they adapt better to difficult cases. For each stroke, we extract the topmost and bottommost points, together with the length and slope of the line connecting them.
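Both local feature types can be sketched in Python as below. This is a hypothetical reading of the description above, not the original `feature` program, and it makes these assumptions:

- The grid is split into roughly equal cells.
- When a stroke is extended, the pixel straight above or below is tried before the two diagonal ones.
- Top and bottom points are reported as (x, y), with y measured upward from the bottom row, matching the bottom-left origin mentioned under Bounding Box.

```python
from typing import List, Tuple

Image = List[List[int]]      # 1 = ink
Pixel = Tuple[int, int]      # (row, col)
Point = Tuple[int, int]      # (x, y), y measured up from the bottom row


def grid_features(img: Image, rows: int, cols: int) -> List[int]:
    """Ink-pixel area in each cell of a rows x cols grid, in row-major order."""
    h, w = len(img), len(img[0])
    feats = []
    for gi in range(rows):
        for gj in range(cols):
            r0, r1 = gi * h // rows, (gi + 1) * h // rows
            c0, c1 = gj * w // cols, (gj + 1) * w // cols
            feats.append(sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return feats


def extract_strokes(img: Image) -> List[List[Pixel]]:
    """Group ink pixels into strokes that only ever move up or down one row."""
    h, w = len(img), len(img[0])
    marked = [[False] * w for _ in range(h)]
    strokes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or marked[r][c]:
                continue
            marked[r][c] = True
            stroke = [(r, c)]
            for step in (-1, 1):  # extend upward first, then downward
                cr, cc = r, c
                while 0 <= cr + step < h:
                    nr = cr + step
                    # straight above/below first, then the two diagonals
                    nxt = next(((nr, cc + dc) for dc in (0, -1, 1)
                                if 0 <= cc + dc < w and img[nr][cc + dc]
                                and not marked[nr][cc + dc]), None)
                    if nxt is None:
                        break
                    marked[nxt[0]][nxt[1]] = True
                    stroke.append(nxt)
                    cr, cc = nxt
            strokes.append(stroke)
    return strokes


def top_and_bottom(stroke: List[Pixel], height: int) -> Tuple[Point, Point]:
    """Topmost and bottommost pixels of a stroke, converted to (x, y)."""
    top, bottom = min(stroke), max(stroke)
    return (top[1], height - 1 - top[0]), (bottom[1], height - 1 - bottom[0])
```

Because strokes extend only vertically, a purely horizontal run of ink becomes one single-pixel stroke per column. This follows directly from the "same direction" rule as described.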

Height: 161 / 165
Width: 621 / 924
Area: 2418 / 2782
Center of Mass: (307.320923, 61.564102) / (458.598846, 63.445721)
Edge Points: 5 / 8
Cross Points: 53 / 28
Closed Loops: 25 / 11


1st Stroke
Top Point: (74,165) / (38,164)
Bottom Point: (5,21) / (6,25)
Length: 159.677795 / 142.635895
Slope: 2.086957 / 4.343750
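The stroke length and slope follow directly from the top and bottom points. Length is the Euclidean distance between them, and slope is the rise over the run, Δy/Δx. For the first example, the points (74,165) and (5,21) give Δx = 69 and Δy = 144. That yields a length of √(69² + 144²) ≈ 159.6778 and a slope of 144/69 ≈ 2.086957, matching the reported values to about five decimal places. The helper below is a hypothetical illustration. How vertical strokes (Δx = 0) were handled is not stated, so returning infinity there is an assumption.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y), y increasing upward


def stroke_length_and_slope(top: Point, bottom: Point) -> Tuple[float, float]:
    """Length and slope of the segment joining a stroke's top and bottom points."""
    dx = top[0] - bottom[0]
    dy = top[1] - bottom[1]
    length = math.hypot(dx, dy)
    slope = dy / dx if dx != 0 else math.inf  # perfectly vertical stroke
    return length, slope
```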


Neural Network
We trained a network with one hidden layer of 15 nodes, each using a logistic node function, and a single output node using a tanh function. Training used a Quasi-Newton algorithm for 200 iterations. The final classification was the sign of the output.
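The network's forward pass can be sketched in Python as follows. Inputs pass through logistic hidden units and a tanh output unit, and the class is the sign of the output. This is an illustrative sketch only. The weights would come from the Quasi-Newton training, which is not shown, and mapping an output of exactly zero to +1 is an assumption.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def classify(x: Sequence[float],
             hidden_w: List[List[float]], hidden_b: List[float],
             out_w: List[float], out_b: float) -> int:
    """One hidden layer of logistic units, a tanh output, class = sign (+1 or -1)."""
    hidden = [logistic(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(hidden_w, hidden_b)]
    y = math.tanh(sum(wi * hi for wi, hi in zip(out_w, hidden)) + out_b)
    return 1 if y >= 0 else -1
```

With 15 hidden nodes, `hidden_w` holds 15 weight vectors, one per hidden node, each as long as the feature vector.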

We also used an AdaBoost program. Boosting lightly trains many very simple networks and relies on the vote of a large number of them being accurate. The program trained 1000 networks, each with one hidden layer of 3 logistic nodes and a tanh output. Each network was trained for at most 50 iterations. On the test sets, boosting performed better than the Quasi-Newton network (92% versus 84.4% correct).
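The boosting idea can be sketched with discrete AdaBoost in Python:

1. Each round fits a weak learner to weighted training data.
2. The learner receives a vote weight α = ½ ln((1 − ε)/ε) from its weighted error ε.
3. Misclassified samples are up-weighted for the next round.

To keep the sketch short and runnable, one-feature decision stumps stand in for the small 3-node networks used in the experiment. This is a deliberate simplification and not the original program. Labels are assumed to be +1 and −1.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity


def fit_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                stump = (f, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds: int) -> List[Tuple[float, Stump]]:
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    """Weighted vote of all weak learners; the class is the sign of the score."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Swapping the stump for a tiny network trained on the weighted data, capped at a few iterations, gives the setup described above: many lightly trained learners combined by a weighted vote.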

Statistics
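Method | Correct Classifications (%) | False Positives (%) | False Negatives (%) | Total Misclassifications (%)
AdaBoost (AB) | 92 | 5.2 | 2.7 | 8
Quasi-Newton | 84.4 | 7.7 | 7.9 | 15.6
Nearest Neighbor | 82.1 | 6.5 | 11.4 | 17.9
AB, grid features | 69.5 | 7.3 | 23.2 | 30.5
AB, global only | 94.91 | 1.9 | 3.2 | 5.1
AB, strokes only | 92.3 | 3.85 | 3.85 | 7.7
AB, 1st stroke only | 51.16 | 0 | 48.84 | 48.84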
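In the statistics, false positives plus false negatives equal the total misclassifications (up to rounding), and correct plus total misclassifications equals 100%. This suggests all three error columns are percentages of the whole test set rather than per-class rates. The sketch below computes them that way. It assumes that "positive" means a signature accepted as authentic, so a false positive is an accepted forgery and a false negative is a rejected genuine signature. The text does not define these terms, and the label encoding is hypothetical.

```python
from typing import Dict, Sequence

AUTHENTIC, FORGED = 1, -1  # hypothetical label encoding


def error_breakdown(truth: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Percentages of the whole test set, so false positives + false negatives
    equals the total misclassification rate."""
    n = len(truth)
    fp = sum(1 for t, p in zip(truth, predicted) if t == FORGED and p == AUTHENTIC)
    fn = sum(1 for t, p in zip(truth, predicted) if t == AUTHENTIC and p == FORGED)
    return {
        "correct": 100.0 * (n - fp - fn) / n,
        "false_positives": 100.0 * fp / n,
        "false_negatives": 100.0 * fn / n,
        "misclassified": 100.0 * (fp + fn) / n,
    }
```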
AB, 1st stroke only51.16048.8448.84


Programs
bb: Extracts bounding box
feature: Computes features
fv.pl: Automates creation of feature vectors
nn.pl: Computes nearest neighbor
random.pl: Randomly swaps lines between files
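The nearest-neighbor result in the statistics comes from nn.pl, whose code is not shown. A 1-nearest-neighbor classifier of the kind it presumably implements can be sketched in Python as below. This is a hypothetical illustration, not nn.pl itself. It uses Euclidean distance, which is an assumption, since the distance measure is not stated. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple


def nearest_neighbor(train: List[Tuple[Sequence[float], int]],
                     query: Sequence[float]) -> int:
    """Return the label of the training vector closest to query (Euclidean)."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label
```

Area values run into the thousands while edge points are single digits, so an unscaled Euclidean distance is dominated by the large-valued features. Standardizing each feature first is a common remedy, though the text does not say whether nn.pl did so.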