VisionX V4                VCLASF                VisionX V4
NAME

vclasf − distance classification program

SYNOPSIS

vclasf kf=<known-library> [uf=<unknown library>] [of=<ofile>] [nn=<int value>] [os=<output stats file>] [s=<input stats file>] [-b] [-l] [-v] [-d]

DESCRIPTION

vclasf classifies unknown vectors against a library of known vectors using a K-Nearest Neighbors algorithm. To perform the classification, the Euclidean distance between each item in the unknown library and each item in the known library is computed. The items with the smallest distances (their number is set by nn=) determine the class of each unknown vector. With one neighbor, the class is simply the class of the nearest neighbor. With more than one neighbor, the majority class is selected. If several classes tie, the tie is broken as follows:

If there is no majority, the class that has the nearest item among the classes tied for the majority is selected.

If there is no majority and the tied classes have items at the same distance, the latest class in the library is selected.
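
The voting and tie-breaking rules above can be sketched in Python. This is an illustration only, not the vclasf source: the function name classify and the (class, features) tuple layout are hypothetical, "latest class in the library" is assumed to mean the tied item with the highest library position, and equal distances are compared exactly.

```python
import math
from collections import Counter

def classify(unknown, library, nn=1):
    """library: list of (class_id, features) tuples in library order."""
    # Euclidean distance from the unknown vector to every known item.
    scored = sorted(
        (math.dist(unknown, feats), idx, cls)
        for idx, (cls, feats) in enumerate(library)
    )
    nearest = scored[:nn]

    # Count votes per class and find the classes sharing the top count.
    votes = Counter(cls for _, _, cls in nearest)
    top = max(votes.values())
    tied = {cls for cls, count in votes.items() if count == top}

    # First rule: among the tied classes, prefer the nearest item.
    candidates = [(dist, idx, cls) for dist, idx, cls in nearest if cls in tied]
    min_dist = min(dist for dist, _, _ in candidates)

    # Second rule (assumed reading): if items are equally near,
    # prefer the one that appears latest in the library.
    closest = [(idx, cls) for dist, idx, cls in candidates if dist == min_dist]
    return max(closest)[1]
```

With a clear majority only one class remains in the tied set, so the same code path returns the majority class.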

The result is a VisionX vector format file with the feature vectors removed and with two or three additional vector elements added. The first is a vector of length (nn+1) holding the classified class as its first entry, followed by the classes of the nearest neighbors. A vector of distances follows, again of length (nn+1), whose first element is the distance to the classified class and whose remaining elements are the distances to the nearest neighbors. After that, if the library has associated ids, a vector of ids of length (nn+1) is included, with the first entry being the id of the nearest neighbor and the remaining entries the ids of the nearest neighbors.
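
The layout of the added vectors can be illustrated with a short Python sketch. It only mirrors the description above and does not read or write VisionX files. The name build_output and the (class, id, features) tuple layout are hypothetical, and the leading distance is assumed to be that of the closest neighbor belonging to the classified class.

```python
import math

def build_output(unknown, library, nn, winner):
    """library: list of (class_id, item_id, features); winner: class already chosen."""
    nearest = sorted(
        (math.dist(unknown, feats), idx, cls, item_id)
        for idx, (cls, item_id, feats) in enumerate(library)
    )[:nn]

    # Assumption: the leading distance is the distance to the closest
    # neighbor that belongs to the classified class.
    winner_dist = next(dist for dist, _, cls, _ in nearest if cls == winner)

    classes = [winner] + [cls for _, _, cls, _ in nearest]
    distances = [winner_dist] + [dist for dist, _, _, _ in nearest]
    # Following the text, the id vector leads with the id of the nearest neighbor.
    ids = [nearest[0][3]] + [item_id for _, _, _, item_id in nearest]
    return classes, distances, ids
```

Each returned list has nn+1 entries, matching the vector lengths described above.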

CONSTRAINTS

All items in a library must have the same feature length. Leave-one-out classification cannot correctly classify a class that has only one item, because no other item of that class remains in the library. IDs and classes must be integer values; features are real-valued. Each vector in the library must have a class specified. If the first vector has an ID, then all vectors must have ID values specified. If the known library file and the unknown vector file do not have the same number of frames, only the smaller number of frames is processed.
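
The single-item limitation of leave-one-out classification (the -l option) can be seen in a small Python sketch. This is an assumed illustration using one nearest neighbor, not the vclasf implementation. The name leave_one_out is hypothetical, and the library must hold at least two items.

```python
import math

def leave_one_out(library):
    """library: list of (class_id, features). Returns one predicted class per item."""
    predictions = []
    for i, (_, feats) in enumerate(library):
        # Classify item i against every other item in the library.
        others = [item for j, item in enumerate(library) if j != i]
        nearest_cls, _ = min(others, key=lambda item: math.dist(feats, item[1]))
        predictions.append(nearest_cls)
    return predictions

# Class 2 has a single item, so once that item is held out no
# class 2 item remains and it is assigned to class 1.
library = [(1, [0.0]), (1, [0.1]), (2, [5.0])]
print(leave_one_out(library))
```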

OPTIONS

kf=

Known library file

uf=

Unknown vector file. Required unless -l is specified.

of=

Output file as described above.

nn=

An integer specifying the number of neighbors to consider when performing classification. Defaults to 1 if not specified.

os=

Output file for the statistics calculated during balancing. The first floating point vector (VX_GFLOAT) contains the mean of each element, and the second floating point vector contains the standard deviation of each element.

s=

Balance using statistics read from a file in the format described for os=.

-b

Perform balancing of the data using either the statistics specified by s= or new statistics computed for each element from the known library file.

-l

Perform leave-one-out classification on the library specified by kf=. Each feature vector in the library is classified against all the other feature vectors in the library.

-v

Verbose mode

-d

Debugging output
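
The balancing options (-b, os=, s=) work with a per-element mean and standard deviation. The exact formula is not stated here, so the Python sketch below assumes the common choice of subtracting the mean and dividing by the standard deviation. The names compute_stats and balance are hypothetical, the population standard deviation is an assumption, and elements with zero deviation are mapped to 0.0.

```python
import statistics

def compute_stats(vectors):
    """Per-element mean and standard deviation over a list of feature vectors."""
    columns = list(zip(*vectors))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: population standard deviation; vclasf may use another estimator.
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds

def balance(vector, means, stds):
    """Scale one feature vector element by element (assumed z-score form)."""
    return [
        (x - m) / s if s > 0 else 0.0
        for x, m, s in zip(vector, means, stds)
    ]

known = [[1.0, 10.0], [3.0, 30.0]]
means, stds = compute_stats(known)
print(balance([3.0, 10.0], means, stds))
```

Scaling of this kind keeps a feature with a large numeric range from dominating the Euclidean distance. The same statistics would be applied to both known and unknown vectors.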

AUTHOR

A. P. Reeves and A. Jirapatnakul

SEE ALSO

vfcstat(1), vfvcat(1), vfvpick(1), vxtoxl(1), vxltovf(1)