NAME
    nn.pl - Calculates nearest neighbor for a given test and training set

SYNOPSIS
    nn.pl [train] [test]

DESCRIPTION
    Using a training set of feature vectors, this program finds the
    nearest neighbor of each feature vector (of the same type) in the
    test set, then prints to stdout the number of test cases that fall
    into each of the 4 possible results:

        [Actual/Calculated]
        [0/0] [1/0] [0/1] [1/1]

    The two cases [0/0] and [1/1] represent correct predictions by the
    nearest neighbor algorithm, while [1/0] and [0/1] represent errors.
    For example, the output

        10 5 5 10

    means the program predicted 0 correctly for 10 of the test cases,
    1 correctly for 10 more of the test cases, and 0 or 1 incorrectly
    for 5+5 of the test cases, yielding an accuracy of 20/30, or about
    67%.

    The format for both the training and test files is:

        [feature vector] [0 || 1]

    where the feature vector represents any number of features, and
    [0 || 1] is the expected result. Given this example training file,

        0 0 0 0
        1 1 1 1

    the test vector 1 1 1 1 would produce the output 0 0 0 1, while the
    test vector 1 1 1 0 would produce the output 0 0 1 0. Similarly, a
    test vector .2 .2 .2 0 would produce the output 1 0 0 0, while
    .8 .8 .8 0 would produce the output 0 0 1 0.

    Nearest neighbor is calculated using Euclidean distance:

        d = sqrt((a1 - b1)^2 + (a2 - b2)^2 + (a3 - b3)^2 + ...)

    For every test vector, the distance to every training set vector is
    calculated and the training vector at the minimum distance is
    found. The expected result of that training vector becomes the
    prediction.

    The training and test files can be generated using fv.pl.

CONSTRAINTS
    Performance degrades quickly on large datasets, because every test
    vector is compared against every training vector.

    Perl must be located at /usr/local/bin/perl

OPTIONS
    train   training set file
    test    test set file

AUTHOR(S)
    U.Moszkowicz and A.Mehler

SEE ALSO
    fv.pl
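
EXAMPLE
    The procedure described above (Euclidean distance, minimum over the
    training set, tally of the four Actual/Calculated cases) can be
    sketched as follows. This is a minimal illustration in Python, not
    the source of nn.pl. The function names parse, distance, predict,
    tally, and accuracy are hypothetical, and breaking ties in favor of
    the first training vector is an assumption, since the description
    does not say how ties are resolved.

```python
import math

def parse(lines):
    """Parse '[feature vector] [0 || 1]' lines into (features, label) pairs."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append(([float(x) for x in fields[:-1]], int(fields[-1])))
    return rows

def distance(a, b):
    """Euclidean distance: sqrt((a1 - b1)^2 + (a2 - b2)^2 + ...)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(train, features):
    """Return the label of the training vector closest to `features`.

    Assumption: on a tie, the earliest training vector wins (min() keeps
    the first minimum it encounters).
    """
    _, label = min(train, key=lambda row: distance(row[0], features))
    return label

def tally(train, test):
    """Count results in the order [0/0] [1/0] [0/1] [1/1] (Actual/Calculated)."""
    counts = [0, 0, 0, 0]
    for features, actual in test:
        calculated = predict(train, features)
        counts[2 * calculated + actual] += 1
    return counts

def accuracy(counts):
    """Fraction of correct predictions: ([0/0] + [1/1]) / total."""
    return (counts[0] + counts[3]) / sum(counts)

# The example training file from the DESCRIPTION, with three test vectors.
train = parse(["0 0 0 0", "1 1 1 1"])
test = parse([".2 .2 .2 0", ".8 .8 .8 0", "1 1 1 1"])
print(*tally(train, test))  # prints "1 0 1 1"
```

    Applied to the sample output 10 5 5 10, accuracy([10, 5, 5, 10])
    gives 20/30, matching the figure quoted in the DESCRIPTION. The
    sketch compares each test vector with every training vector, which
    is also why large datasets slow the program down.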