VisionX V4    VRCLASSCV    VisionX V4
NAME

vrclasscv − classification with cross validation

SYNOPSIS

vrclasscv [tr=] [of=] [roc=] [om=] [os=] [cl=] [n=] [p=] [-q] [-v] [-d]

DESCRIPTION

Vrclasscv performs feature-based classification for two classes with N-fold cross-validation using the R statistics library. This command requires that the R packages cvAUC, e1071 and randomForest be installed. The input tr= is randomly split into N folds and cross-validation is performed. The trained model from each fold may be saved with the om= option. The classifier is selected with the cl= parameter and its parameters are given by the p= parameter. The classifier responses may be saved with the of= option, a plot of the ROC curve may be saved in .png format with the roc= option, and classification statistics may be saved with the os= option.
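The random split into N folds can be pictured with the following minimal Python sketch. It is not the code used by vrclasscv (which performs the split in R); the function names split_into_folds and train_test_pairs and the seed argument are hypothetical and serve only as illustration.

```python
import random

def split_into_folds(n_items, n_folds=5, seed=0):
    """Randomly assign item indices 0..n_items-1 to n_folds folds (hypothetical helper)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [indices[i::n_folds] for i in range(n_folds)]

def train_test_pairs(folds):
    """Yield (train_indices, test_indices), holding out each fold exactly once."""
    for held_out, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test

folds = split_into_folds(n_items=12, n_folds=5)
pairs = list(train_test_pairs(folds))
```

Each item is used for testing exactly once and for training in the other N-1 rounds, which is why one model per fold can be produced.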

DATASET FORMATS

The input file given by tr= is a comma-separated value (csv) file with a header row and a required column named "class". All columns that precede the "class" column are assumed to be identifiers and are ignored by the classifier. The class column specifies the true class of the item: the value 1 specifies the positive class and any other value is mapped to 0 to specify the negative class, unless the -q option is specified.
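As an illustration of this layout, the Python sketch below builds a small input file in memory and applies the default class mapping. The column names id, f1 and f2 and the helper names are hypothetical. Treating the columns after "class" as the features is an assumption, as is comparing the class value as text; the page only states that 1 is positive and that any other value maps to 0.

```python
import csv
import io

def map_class(value):
    """Default mapping without -q: 1 is positive, anything else becomes 0.
    Assumption: the value is compared as text after trimming whitespace."""
    return 1 if str(value).strip() == "1" else 0

def split_columns(header):
    """Columns before "class" are identifiers; those after are assumed to be features."""
    pos = header.index("class")
    return header[:pos], header[pos + 1:]

# Hypothetical example data: one identifier column, the class column, two features.
fieldnames = ["id", "class", "f1", "f2"]
rows = [
    {"id": "a01", "class": "1", "f1": "0.52", "f2": "1.30"},
    {"id": "a02", "class": "0", "f1": "0.11", "f2": "0.75"},
    {"id": "a03", "class": "2", "f1": "0.43", "f2": "1.10"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

id_cols, feature_cols = split_columns(fieldnames)
mapped = [map_class(r["class"]) for r in rows]
```

Here the third row has class value 2, which the default mapping treats as the negative class.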

The response file specified by the of= parameter is a csv file that contains the identifier and class columns with an additional "response" column that contains the response value from the classifier.
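A response file of this form can be checked independently of the tool. The Python sketch below reads the class and response columns and computes a pooled AUC by pair counting. It assumes that a larger response means "more likely positive", and the sample data and the identifier column name id are made up. The value is pooled over all items, so it need not equal the fold-averaged AUC that the tool reports.

```python
import csv
import io

def auc_from_responses(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties count as one half. Assumes a higher score means more positive."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one item of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical response file content: identifier, class, response.
sample = "id,class,response\na01,1,0.91\na02,0,0.20\na03,1,0.35\na04,0,0.40\n"
records = list(csv.DictReader(io.StringIO(sample)))
labels = [int(r["class"]) for r in records]
scores = [float(r["response"]) for r in records]
auc = auc_from_responses(labels, scores)
```

In this sample three of the four positive/negative pairs are ordered correctly.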

CLASSIFIER SPECIFICATION

The classifier type is specified with the cl= parameter and its associated parameters are specified as a string with p=; the default classifier is svmr.

svmr

SVM classifier with RBF kernel. Preset parameter kernel="radial". Typical additional parameter: p="cost=1.2" sets the cost of constraint violations to 1.2 (default cost=1). See svm() in the R package e1071 for additional optional parameters.

svmp

SVM classifier with polynomial kernel. Preset parameter kernel="polynomial". Typical additional parameter: p=d=4 sets the polynomial kernel degree to 4 (default d=3). See svm() in the R package e1071 for available additional parameters.

knn

K-Nearest-Neighbor classifier. No preset parameters. Typical additional parameter: p=k=3 sets the number of nearest neighbors to 3 (default k=1). See knn() in the R package class for more details.

log

Logistic regression classifier. Preset parameter settings are glm(family=binomial(link="logit")). See glm() in the R package stats for available additional parameters.

rf

Random forest classifier. No preset parameters. Typical additional parameter: p=ntree=300 sets the number of trees to 300 (default ntree=500). See randomForest() in the R package randomForest for more details.

CONSTRAINTS

Classifier performance is sensitive to some of the additional parameters. For reasonable performance, k= for the knn classifier and ntree= for the rf classifier should not be set to very small values (ideally k>=3 and ntree>=100).
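When vrclasscv is driven from a script, these recommendations can be checked before the command is run. The Python sketch below assembles an argument list from the documented options and flags the two small-value cases. The helper names build_command and check_params are hypothetical. The sketch also assumes that several settings in p= would be separated by commas, which this page does not specify, and that the p= value is passed as a single unquoted argument.

```python
import shlex

def build_command(tr, cl="svmr", p=None, n=5, of=None, roc=None, om=None, os_=None):
    """Assemble a vrclasscv argument list from the options documented on this page."""
    args = ["vrclasscv", f"tr={tr}", f"cl={cl}", f"n={n}"]
    for key, value in (("p", p), ("of", of), ("roc", roc), ("om", om), ("os", os_)):
        if value is not None:
            args.append(f"{key}={value}")
    return args

def check_params(cl, p=None):
    """Return warnings for k < 3 (knn) and ntree < 100 (rf).
    Assumption: multiple settings in p are comma separated."""
    settings = dict(item.split("=", 1) for item in (p or "").split(",") if "=" in item)
    warnings = []
    # Defaults follow this page: k=1 for knn, ntree=500 for rf.
    if cl == "knn" and int(settings.get("k", 1)) < 3:
        warnings.append("knn: k is below the recommended minimum of 3")
    if cl == "rf" and int(settings.get("ntree", 500)) < 100:
        warnings.append("rf: ntree is below the recommended minimum of 100")
    return warnings

command = build_command("train.csv", cl="rf", p="ntree=300", os_="stats.csv")
command_line = shlex.join(command)  # printable form of the argument list
```

Note that knn with no p= setting triggers the warning, because the documented default is k=1.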

OPTIONS

tr=<infile>

Input csv file with the class column specified.

of=<outfile>

Output response file in csv format with the last column being the classifier response.

roc=<outfile2>

Output the ROC curve, averaged over the N cross-validation folds, as a graphic in .png format.

om=<outfile3>

Prefix for output trained models.

os=<outfile4>

Output classifier statistics in csv format. The file contains the following columns: AUC (average AUC value), CI low and CI high (bounds of the 95% confidence interval), SE (standard error), and Confidence (the confidence level of the interval, i.e. 0.95). See the R package cvAUC for more details.
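The relation between these columns can be illustrated with a short Python sketch. This page does not state how the interval is computed; the sketch assumes a normal approximation, AUC plus or minus z times SE, clipped to the range [0, 1]. The function name normal_ci and the input values are hypothetical.

```python
from statistics import NormalDist

def normal_ci(auc, se, confidence=0.95):
    """Two-sided interval auc +/- z * se, clipped to [0, 1].
    Assumption: a normal approximation; not confirmed by this page."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 0.95
    low = max(0.0, auc - z * se)
    high = min(1.0, auc + z * se)
    return low, high

# Made-up values for illustration only.
low, high = normal_ci(auc=0.85, se=0.03)
```

Under this assumption a smaller SE gives a narrower interval at the same confidence level.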

cl=

Classifier type (knn, svmr, svmp, log, rf). Default is svmr.

n=

N-fold cross-validation N value (default n=5).

-q

This option specifies that the given values in the class column are to be directly used for the class values for the classifier. There is no value mapping and values other than 1 and 0 may be used. The impact of different class values is classifier dependent.

-v

Verbose flag.

-d

Debug flag; more information is printed and temporary files are saved in the current directory.

AUTHOR

Y. Xie