VisionX V4                VRCLASSTT                VisionX V4
NAME

vrclasstt - two-class classification

SYNOPSIS

vrclasstt [tr=] [te=] [im=] [of=] [roc=] [os=] [om=] [cl=] [p=] [-e] [-q] [-v] [-d]

DESCRIPTION

Vrclasstt performs feature-based classification for two classes using the R statistics library. This command requires that the R packages cvAUC, e1071, and randomForest be installed. It can both train and test a classifier, and the trained model may be saved. There are three main modes of operation: training only, testing only, and training and testing with different datasets. The classifier is selected with the cl= parameter and its parameters are given by the p= parameter. A graphical representation of the ROC curve may be saved in .png format with the roc= option, and classification statistics may be saved with the os= option. The modes of operation are specified as follows:

1. Just training: the training dataset is specified by the tr= parameter and the resulting model is specified by the om= parameter. The response file (i.e. training performance) is specified by the of= parameter.

2. Just testing: the testing dataset is specified by the te= parameter and the model is specified by the im= parameter. The response file is specified by the of= parameter.

3. Training and testing: the training dataset is specified by the tr= parameter and the testing dataset is specified by the te= parameter. The response file is specified by the of= parameter.
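
The three modes above differ only in which file parameters are supplied. The following Python sketch lists the parameters for each mode; it only builds argument lists and does not run the command. The helper name vrclasstt_args and all file names are hypothetical placeholders.

```python
def vrclasstt_args(mode):
    """Return an illustrative vrclasstt argument list for one mode.

    All file names below are placeholders, not names required by vrclasstt.
    """
    if mode == "train":
        # Mode 1: training data in, trained model and training responses out.
        files = ["tr=train.csv", "om=model.out", "of=train_resp.csv"]
    elif mode == "test":
        # Mode 2: test data and a previously saved model in, responses out.
        files = ["te=test.csv", "im=model.out", "of=test_resp.csv"]
    elif mode == "both":
        # Mode 3: train on tr= and test on te= in a single run.
        files = ["tr=train.csv", "te=test.csv", "of=test_resp.csv"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ["vrclasstt", *files]


for mode in ("train", "test", "both"):
    print(" ".join(vrclasstt_args(mode)))
```

Because no cl= value is given, each of these command lines would use the default classifier.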

DATASET FORMATS

The training set and test set are comma-separated value (csv) files with a header row and a required column named "class". All columns that precede the "class" column are assumed to be identifiers and are ignored by the classifier. The class column specifies the true class of the item: the value 1 specifies the positive class, and any other value is mapped to 0 to specify the negative class, unless the -q option is specified. If the class of an item is unknown then its value is set to blank (""). All columns following the "class" column are taken to be feature values.
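
As an illustration of this layout, the Python sketch below writes a small dataset with one identifier column, the "class" column, and two feature columns. The column names id, f1, and f2 and all values are made up; an unknown class is written here as an empty field.

```python
import csv
import io

# Hypothetical items: (identifier, true class or None if unknown, features).
items = [
    ("case01", 1, [0.52, 13.1]),
    ("case02", 0, [0.17, 9.4]),
    ("case03", None, [0.33, 11.8]),  # class unknown
]


def to_csv(items, feature_names):
    """Serialize items as: identifier column, "class", then feature columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "class", *feature_names])
    for ident, label, features in items:
        # An unknown class is written as a blank field.
        writer.writerow([ident, "" if label is None else label, *features])
    return buf.getvalue()


text = to_csv(items, ["f1", "f2"])
print(text, end="")
```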

The response file specified by the of= parameter is a csv file that contains the identifier and class columns with an additional "response" column that contains the response value from the classifier.
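
A response file can be post-processed with any csv reader. The Python sketch below separates labeled from unlabeled items and computes a simple pairwise AUC on the labeled ones. The sample contents and the identifier column name id are made up, and the sketch assumes that a larger response value indicates the positive class, which the text above does not state.

```python
import csv
import io

# Made-up response file contents: identifier, class, response.
sample = """id,class,response
case01,1,0.91
case02,0,0.20
case03,,0.65
case04,1,0.40
case05,0,0.45
"""


def load_responses(text):
    """Return (labeled, unlabeled) lists parsed from response-file text."""
    labeled, unlabeled = [], []
    for row in csv.DictReader(io.StringIO(text)):
        score = float(row["response"])
        if row["class"] == "":
            unlabeled.append((row["id"], score))
        else:
            labeled.append((int(row["class"]), score))
    return labeled, unlabeled


def auc(labeled):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for c, s in labeled if c == 1]
    neg = [s for c, s in labeled if c != 1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labeled, unlabeled = load_responses(sample)
print(auc(labeled), unlabeled)
```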

CLASSIFIER SPECIFICATION

The classifier type is specified by the cl= parameter and associated parameters are specified as a string in p=; the default classifier is svmr.

svmr

SVM classifier with RBF kernel. Preset parameter kernel="radial". Typical additional parameter: p="cost=1.2" sets the cost of constraint violations to 1.2 (default cost=1). See svm() in the R package e1071 for additional optional parameters.

svmp

SVM classifier with polynomial kernel. Preset parameter kernel="polynomial". Typical additional parameter: p=d=4 sets the polynomial kernel degree to 4 (default d=3). See svm() in the R package e1071 for available additional parameters.

knn

K-Nearest-Neighbor classifier. Preset parameters: none (default). Typical additional parameter: p=k=3 sets the number of nearest neighbors to 3 (default k=1). See knn() in the R package class for more details.

log

Logistic regression classifier. Preset parameter settings are glm(family=binomial(link="logit")). See the R function glm() for available additional parameters.

rf

Random forest classifier. Preset parameters: none (default). Typical additional parameter: p=ntree=300 sets the number of trees to 300 (default ntree=500). See randomForest() in the R package randomForest for more details.

CONSTRAINTS

Classifiers are sensitive to certain parameters. For reasonable performance, k= for the knn classifier and ntree= for the rf classifier should not be set to very small values (ideally k>=3 and ntree>=100).

OPTIONS

p=<classifier-parameters>

This allows additional parameters to be specified for the classifier. The syntax is the same as is used in the R package; multiple parameters are specified by comma separated name=value pairs. If the parameter value is a quoted text string then care must be taken to correctly escape the quote characters when specifying this parameter.
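
The quoting issue can be made concrete with a short Python sketch that assembles a p= argument from name and value pairs and then applies shell quoting with shlex.quote. The helper name build_p is hypothetical. The names cost, gamma, and type are arguments of svm() in e1071; whether vrclasstt accepts each of them is an assumption of this example.

```python
import shlex


def build_p(params):
    """Join name=value pairs with commas; string values get R-style quotes."""
    parts = []
    for name, value in params.items():
        if isinstance(value, str):
            # Wrap text values in double quotes, escaping any embedded quote.
            value = '"' + value.replace('"', '\\"') + '"'
        parts.append(f"{name}={value}")
    return "p=" + ",".join(parts)


numeric = build_p({"cost": 1.2, "gamma": 0.5})
quoted = build_p({"type": "C-classification"})

# shlex.quote leaves shell-safe strings alone and single-quotes the rest.
print(shlex.quote(numeric))
print(shlex.quote(quoted))
```

Purely numeric parameter strings need no shell quoting, while a value containing double quotes must be protected so that the quotes reach R intact.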

tr=<trainfile>

Input training csv file with the class column specified.

te=<testfile>

Input testing csv file. The class column must be present but may contain blank values.

im=<model>

Input trained model from a previous classification.

of=<responsefile>

Output response file in csv format with the last column being the classifier response.

roc=<file>

Output the ROC curve in a graphic form in .png format. If testing data does not contain class labels, roc= will have no effect.

om=<outfile3>

Output trained model file. For knn classifier, om= has no effect.

os=<statfile>

Output classifier statistics in csv format. The file contains the following columns: AUC (average AUC value), CI low and CI high (bounds of the 95% confidence interval), SE (standard error), and Confidence (confidence level of the interval, i.e. 0.95). See the R package cvAUC for more details.
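
The relationship between the AUC, SE, and confidence interval columns can be sketched in Python as follows. This assumes the interval is a normal approximation clipped to the range 0 to 1, which is how ci.cvAUC in cvAUC is understood to compute it; the function name normal_ci and the input values are made up for illustration.

```python
from statistics import NormalDist


def normal_ci(auc, se, confidence=0.95):
    """Normal-approximation interval auc +/- z * se, clipped to [0, 1]."""
    # Two-sided quantile: for confidence=0.95 this is the 0.975 quantile.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return max(auc - z * se, 0.0), min(auc + z * se, 1.0)


low, high = normal_ci(0.90, 0.02)
print(round(low, 4), round(high, 4))
```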

cl=

Specifies the classifier type (svmr, knn, svmp, log, rf); the default is svmr.

-e

This option sets the evaluation mode, for which the class column is required. If it is not specified, only feature values are required (the class and identifier columns can be omitted).

-q

This option specifies that the given values in the class column are to be directly used for the class values for the classifier. There is no value mapping and values other than 1 and 0 may be used. The impact of different class values is classifier dependent.
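
The difference between the default class mapping and the -q behavior can be sketched in Python. The function name map_class is hypothetical, and leaving blank (unknown) values blank in both cases is an assumption of this sketch rather than documented behavior.

```python
def map_class(value, q=False):
    """Map one raw class field, with or without the -q behavior.

    Blank stays blank (unknown); that choice is an assumption of this sketch.
    """
    if value == "":
        return ""
    if q:
        # -q: the value is passed to the classifier as given.
        return value
    # Default: 1 is the positive class, any other value becomes 0.
    return "1" if value.strip() == "1" else "0"


raw = ["1", "0", "2", "benign", ""]
print([map_class(v) for v in raw])
print([map_class(v, q=True) for v in raw])
```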

-v

Verbose flag.

-d

Debug flag; additional information and temporary files are saved in the current directory.

AUTHOR

Y. Xie