RLScore - regularized least-squares based machine learning algorithms for regression, classification, ranking, clustering, and feature selection.
| Authors: | Tapio Pahikkala, Antti Airola |
|---|---|
| Email: | firstname.lastname@utu.fi |
| Homepage: | http://staff.cs.utu.fi/~aatapa/software/RLScore/ |
| Version: | 0.5 |
| License: | The MIT License |
| Date: | 2012.06.19 |
RLScore is a Regularized Least-Squares (RLS) based algorithm package. It contains implementations of the RLS and RankRLS learners allowing the optimization of performance measures for the tasks of regression, ranking and classification. In addition, the package contains linear time greedy forward feature selection with leave-one-out criterion for RLS (greedy RLS). Finally, the package contains an implementation of a maximum margin clustering method based on RLS and stochastic hill climbing. Efficient cross-validation algorithms are integrated into the package, together with functionality for fast parallel learning of multiple outputs.
Reduced set approximation for large-scale learning with kernels is included. In this setting approximation is introduced also to the cross-validation methods. For learning linear models from large but sparse data sets, RLS and RankRLS can be trained using conjugate gradient optimization techniques.
Download RLScore.zip containing the Python source code of RLScore.
RLScore is written in Python and thus requires a working installation of Python 2.6.x. The package is also dependent on the NumPy 1.3.x package for matrix operations, and SciPy 0.7.x package for sparse matrix implementations. The psyco package is automatically used if installed.
RLScore is designed to be used by supplying a configuration file defining the learning task to the rls_core program.
The easiest way to use RLScore is by modifying one of the example configuration files delivered with the distribution, to match your task. The software supports a wide variety of different learning tasks, ranging from supervised learning to clustering and feature selection.
To run RLScore using a configuration defined in a file example.cfg, simply write:
python rls_core.py example.cfg
The structure of the configuration file is described in detail next.
There is also a programming interface to RLScore. The API is not yet documented, but for each example configuration file we also provide Python code that executes the same run.
The configuration file consists of [Sections], which contain attribute=value pairs. The configuration file is case sensitive, but the ordering of attributes within a section does not matter. Use # to start a comment. None of the attributes are mandatory; however, setting certain attributes requires some other attributes to be set as well. The sections of the configuration file are [Modules], [Parameters], [Input], and [Output].
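The structure described above is close to the common INI format, so it can be inspected with Python's standard library. The following is only an illustrative sketch written for Python 3's configparser; it is not the parser used by rls_core, and the helper name parse_config is made up for this example.

```python
from configparser import RawConfigParser

# A small configuration in the format described above (values taken from the
# examples in this document).
EXAMPLE_CFG = """
[Modules]
# lines starting with '#' are comments
learner=RLS
kernel=LinearKernel

[Parameters]
regparam=1

[Input]
train_features=./examples/data/class_train.features
train_labels=./examples/data/class_train.labels

[Output]
model=./examples/models/classacc.model
"""


def parse_config(text):
    """Return {section: {attribute: value}} for an INI-style string.

    Hypothetical helper for illustration; not part of RLScore.
    """
    parser = RawConfigParser()
    # configparser lowercases attribute names by default; keep them as
    # written, since the configuration file is case sensitive.
    parser.optionxform = str
    parser.read_string(text)
    return {section: dict(parser.items(section)) for section in parser.sections()}


config = parse_config(EXAMPLE_CFG)
```

All values are returned as strings; interpreting them (for example converting regparam to a float) is left to the caller.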
The Modules section defines the main modules used for model selection, learning and performance evaluation. The attributes in the section are
| learner: | by defining the learner you inform RLScore that it should train one of the available learning algorithms |
|---|---|
| kernel: | defines the used kernel function. For kernel parameters, see [Parameters] |
| measure: | defines the performance measure used for model selection and/or evaluating |
| mselection: | defines the used model selection strategy. The model selection strategies are not compatible with all the learners, and some not with all performance measures. |
RLScore currently has the following seven possible values of the learner attribute:
| Value: | RLS |
|---|---|
| Description: | Regularized least-squares regression, or accuracy maximizing classification. |
| Modules: | kernel , mselection (optional, compatible with LOOSelection, NfoldSelection, ValidationSetSelection) |
| Parameters: | regparam (or reggrid if mselection used), bias , kernel parameters |
| Input data: | train_features , train_labels |
| Value: | AllPairsRankRLS |
|---|---|
| Description: | Regularized least-squares ranking, or AUC-maximizing classification. |
| Modules: | kernel, mselection (optional, compatible with LPOSelection, NfoldSelection, ValidationSetSelection) |
| Parameters: | regparam (or reggrid if mselection used), kernel parameters |
| Input data: | train_features , train_labels |
| Value: | LabelRankRLS |
|---|---|
| Description: | Regularized least-squares ranking with a query-structure. |
| Modules: | kernel, mselection (optional, compatible with NfoldSelection, ValidationSetSelection) |
| Parameters: | regparam (or reggrid if mselection used), kernel parameters |
| Input data: | train_features , train_labels , train_qids |
| Value: | CGRLS |
|---|---|
| Description: | Regularized least-squares regression, or accuracy maximizing classification. Large-scale algorithm for large and high-dimensional but sparse data sets, using the linear kernel. Gives results equivalent to RLS. |
| Modules: | mselection (optional, compatible with ValidationSetSelection) |
| Parameters: | regparam (or reggrid if mselection used), bias |
| Input data: | train_features , train_labels . Supplying validation_features and validation_labels will automatically lead to using early stopping for faster training, by measuring sqerror on validation data, and terminating after no improvement is seen for 10 iterations. |
| Value: | CGRankRLS |
|---|---|
| Description: | Regularized least-squares ranking, or AUC-maximizing classification. Also ranking with a query-structure. Large-scale algorithm for large and high-dimensional but sparse data sets, using the linear kernel. Gives results equivalent to AllPairsRankRLS (or LabelRankRLS, if queries are supplied). |
| Modules: | mselection (optional, compatible with ValidationSetSelection) |
| Parameters: | regparam (or reggrid if mselection used) |
| Input data: | train_features , train_labels , train_qids (optional). Supplying validation_features , validation_labels and optionally validation_qids , will automatically lead to using early stopping for faster training, by measuring sqmprank-error on validation data, and terminating after no improvement is seen for 10 iterations. |
| Value: | GreedyRLS |
|---|---|
| Description: | Feature selecting regularized least-squares learner. |
| Modules: | kernel , mselection (compatible with ValidationSetSelection) |
| Parameters: | regparam (or reggrid if mselection used), subsetsize , bias |
| Input data: | train_features , train_labels |
| Value: | MMC |
|---|---|
| Description: | Maximum margin clustering based on evolutionary search and regularized least-squares. |
| Modules: | kernel |
| Parameters: | regparam (or reggrid if mselection used), number_of_clusters , bias , kernel parameters |
| Input data: | train_features |
Of these, the first six are supervised learners and the last is an unsupervised clustering method.
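The early stopping rule mentioned for CGRLS and CGRankRLS (terminate after no improvement on the validation data is seen for 10 iterations) can be sketched as a simple patience counter. This is a sketch under assumptions: the function name is hypothetical, and treating only a strictly lower validation error as an improvement is a choice made here, not something the description above specifies.

```python
def early_stopping_iteration(validation_errors, patience=10):
    """Return the index of the iteration at which training would stop.

    validation_errors holds the validation error measured after each
    iteration (lower is better). Training stops once `patience` consecutive
    iterations have passed without improving on the best error seen so far.
    Hypothetical helper for illustration; not part of RLScore.
    """
    best_error = float('inf')
    since_improvement = 0
    for iteration, error in enumerate(validation_errors):
        if error < best_error:
            # A new best value resets the patience counter.
            best_error = error
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return iteration
    # The stopping rule never triggered: training ran through all iterations.
    return len(validation_errors) - 1
```

For a sequence that improves for three iterations and then stagnates, the rule stops ten iterations after the last improvement.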
The kernel attribute defines the kernel function used. It should be supplied to the kernel-based learners; if it is not supplied, the linear kernel is used by default. Parameters for the kernel functions can be supplied in the [Parameters] section. For the kernel attribute, RLScore currently supports the following three values:
| Value: | LinearKernel |
|---|---|
| Description: | The linear kernel aka the standard inner product <x,z> of feature vectors x and z. This is the default value for the kernel attribute. |
| Parameters: | None. |
| Requirements: | None. |
| Value: | GaussianKernel |
|---|---|
| Description: | The Gaussian radial basis function kernel e^(-gamma*<x-z,x-z>) for feature vectors x and z, where gamma is the parameter controlling the width of the Gaussian kernel. |
| Parameters: | gamma (default 1) |
| Requirements: | gamma > 0 |
| Value: | PolynomialKernel |
|---|---|
| Description: | The polynomial kernel k(x,z) = (gamma * <x,z> + coef0)^degree for feature vectors x and z, where degree, coef0, and gamma are kernel parameters. |
| Parameters: | gamma (default 1), coef0 (default 0.), degree (default 2) |
| Requirements: | degree>0, coef0>=0, gamma>0. Moreover, degree must be an integer, while coef0 and gamma may be floats. |
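The three kernel formulas listed above can be written out directly. The following is a minimal sketch on plain Python lists, using the documented defaults; the function names are chosen for this example and do not correspond to RLScore's internal kernel classes.

```python
import math


def inner(x, z):
    """Standard inner product <x,z> of two equally long vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    """k(x,z) = <x,z>"""
    return inner(x, z)


def gaussian_kernel(x, z, gamma=1.0):
    """k(x,z) = e^(-gamma*<x-z,x-z>), with gamma > 0"""
    diff = [a - b for a, b in zip(x, z)]
    return math.exp(-gamma * inner(diff, diff))


def polynomial_kernel(x, z, gamma=1.0, coef0=0.0, degree=2):
    """k(x,z) = (gamma * <x,z> + coef0)^degree"""
    return (gamma * inner(x, z) + coef0) ** degree
```

Note that the Gaussian kernel of a point with itself is always 1, and that a polynomial kernel with gamma=1, coef0=0 and degree=1 reduces to the linear kernel.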
The measure attribute defines the performance measure used for model selection and/or evaluating test performance.
| Value: | sqerror |
|---|---|
| Description: | Mean squared error, for regression. |
| Requirements: | None |
| Value: | accuracy |
|---|---|
| Description: | Accuracy, for binary classification. |
| Requirements: | The correct labels must be +1 or -1. |
| Value: | auc |
|---|---|
| Description: | Area under ROC curve, for classification (bipartite ranking). |
| Requirements: | The correct labels must be +1 or -1. |
| Value: | ova_accuracy |
|---|---|
| Description: | Multiclass classification accuracy, one-vs-all strategy. |
| Requirements: | The correct labels must be +1 or -1 and there must be one and only one +1 per data point. |
| Value: | disagreement |
|---|---|
| Description: | Disagreement error, the number of misordered pairs in pairwise ranking. |
| Requirements: | None |
| Value: | sqmprank |
|---|---|
| Description: | Squared magnitude-preserving ranking error. Average value of ((f(x1)-f(x2))-(y1-y2))**2 over all data point pairs. |
| Requirements: | None |
| Value: | fscore |
|---|---|
| Description: | F1-score |
| Requirements: | The correct labels must be +1 or -1. |
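To make the definitions above concrete, here is a plain Python sketch of four of the measures. It follows the descriptions in the tables, not RLScore's own implementations. Two details are assumptions made here: a prediction of exactly 0 is counted as a misclassification in accuracy, and tied predictions contribute 0.5 to the AUC.

```python
def sqerror(correct, predicted):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(correct, predicted)) / float(len(correct))


def accuracy(correct, predicted):
    """Fraction of predictions whose sign agrees with the +1/-1 label."""
    hits = sum(1 for y, p in zip(correct, predicted) if y * p > 0)
    return hits / float(len(correct))


def auc(correct, predicted):
    """Probability that a random positive is scored above a random negative."""
    positives = [p for y, p in zip(correct, predicted) if y == 1]
    negatives = [p for y, p in zip(correct, predicted) if y == -1]
    score = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                score += 1.0
            elif pos == neg:
                score += 0.5  # assumption: ties count as half
    return score / (len(positives) * len(negatives))


def sqmprank(correct, predicted):
    """Average of ((f(x1)-f(x2))-(y1-y2))**2 over all data point pairs."""
    n = len(correct)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += ((predicted[i] - predicted[j]) - (correct[i] - correct[j])) ** 2
            pairs += 1
    return total / pairs
```

The pairwise loops are quadratic in the number of data points, which is fine for illustration but not for large data sets.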
The mselection attribute defines the model selection strategy used for selecting the regularization parameter. The model selection strategies are not compatible with all the learners, and some not with all performance measures.
| Value: | NfoldSelection |
|---|---|
| Description: | N-fold cross-validation or repeated hold-out for RLS or AllPairsRankRLS. Uses by default randomized 10-fold partition. User supplied hold-out sets can be provided via cross-validation_folds attribute in the [Input] section. For LabelRankRLS, each query forms a fold and user supplied hold-out sets are not supported. |
| Value: | LOOSelection |
|---|---|
| Description: | Leave-one-out cross-validation. Supported by RLS. |
| Value: | LPOSelection |
|---|---|
| Description: | Leave-pair-out cross-validation. Supported by AllPairsRankRLS. Based on disagreement error. |
| Value: | ValidationSetSelection |
|---|---|
| Description: | Parameter selection on a separate validation set. Supported by all the supervised learners. Requires in the [Input] section validation_features , validation_labels (also optionally for RankRLS learners, validation_qids ). |
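As an illustration of the randomized partition used by N-fold cross-validation, the sketch below shuffles the data point indices and deals them into folds. How RLScore itself draws the partition is not described here, so the shuffling scheme, the seed parameter and the function name are all assumptions. The last line shows how such folds could be written in the one-row-per-hold-out-set layout used for user supplied folds.

```python
import random


def random_folds(n_points, n_folds=10, seed=0):
    """Split the indices 0..n_points-1 into n_folds disjoint random folds.

    Hypothetical helper for illustration; not part of RLScore.
    """
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin, so fold sizes differ by at most 1.
    return [sorted(indices[k::n_folds]) for k in range(n_folds)]


folds = random_folds(25, n_folds=10)
# One line per hold-out set, indices separated by whitespace.
fold_file_lines = [' '.join(str(i) for i in fold) for fold in folds]
```

If there are fewer data points than folds, some folds will be empty, so n_folds should not exceed the number of data points.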
The [Parameters] section contains the parameters supplied to RLScore. The meaning of kernel and learner parameters differs for different learning and kernel modules.
| regparam: | Float valued regularization parameter. Supply it if you wish to train a learner with a pre-selected parameter value. This value is used if no model selection module is defined. Must be positive. The default value is 1. |
|---|---|
| reggrid: | Regularization parameter grid searched during model selection. The value of the attribute is given as lower_upper, where lower and upper must be integers, with upper > lower. The grid becomes 2**lower ... 2**upper, that is, all integer powers of 2 between 2**lower and 2**upper are tested as values of the regularization parameter and the one with the best performance is selected. The default grid is -5_5. Alternatively, all the parameter values in the grid can be given directly, e.g. '0.001, 0.1, 1, 10, 50'. |
| bias: | Float valued bias term that corresponds to a new constant-valued feature added to each data point. Allows learning models of the type f(x)+b, where a constant value (learned from data) is added to each prediction. The value must be non-negative; the default value is 0. Can be useful for RLS learners when using the linear kernel and low-dimensional data. |
| number_of_clusters: | Parameter supplied to the MMC learner. Its value is an integer specifying the desired number of clusters. |
| subsetsize: | Parameter supplied to the GreedyRLS learner. Its value is an integer defining the number of selected features. |
| gamma: | Float valued positive kernel parameter for the Gaussian or the polynomial kernel. For the Gaussian kernel k(x,z) = e^(-gamma*<x-z,x-z>), for the polynomial kernel k(x,z) = (gamma * <x,z> + coef0)^degree. The default value is 1. |
| coef0: | Float valued kernel parameter for the polynomial kernel k(x,z) = (gamma * <x,z> + coef0)^degree. The default value is 0. |
| degree: | Integer valued positive kernel parameter for the polynomial kernel k(x,z) = (gamma * <x,z> + coef0)^degree. The default value is 2. |
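The two accepted forms of the regularization parameter grid (lower_upper, or an explicit comma-separated list) can be expanded as in the following sketch. The function name is hypothetical and the parsing is a guess at reasonable behaviour based on the description above, not a copy of what rls_core does.

```python
def expand_reggrid(value):
    """Expand a reggrid attribute value into a list of floats.

    '-5_5'                  -> [2**-5, 2**-4, ..., 2**5]
    '0.001, 0.1, 1, 10, 50' -> [0.001, 0.1, 1.0, 10.0, 50.0]
    Hypothetical helper for illustration; not part of RLScore.
    """
    value = value.strip()
    if ',' in value:
        # Explicit list of regularization parameter values.
        return [float(v) for v in value.split(',')]
    # lower_upper form: both bounds are integer exponents of 2.
    lower, upper = value.split('_')
    lower, upper = int(lower), int(upper)
    if upper <= lower:
        raise ValueError('upper must be greater than lower')
    return [2.0 ** exponent for exponent in range(lower, upper + 1)]
```

With the default grid -5_5 this yields eleven candidate values, from 2**-5 = 0.03125 up to 2**5 = 32.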
The attributes in this section are names of RLScore variables used inside the RLScore software. The values of the attributes are filenames from which data is loaded to the variables. For example, the feature representations of the training data are loaded into a variable of name train_features. Some of the loaded [Modules] require certain variables to be loaded. The loaded variables also have an effect on what rls_core does.
All variables have their corresponding default file formats. Detailed descriptions of the variables and their default file formats are given in RLScore variables.
RLScore variables are used to refer to the different types of data inside the RLScore software. The contents of the variables can be loaded from a file via the [Input] section or they are generated by the software itself. For example, if the contents of the model and prediction_features variables are provided, the software uses the model to perform predictions for the data points represented by the prediction_features variable and the predictions are put to the variable predicted_labels. The contents of predicted_labels can then be saved into file or used for performance evaluation if the contents of the test_labels variable are also provided.
Variable containing features for training data. The default file format is the one described in Featurefile.
Variable containing labels for training data. Necessary when training supervised learners. The default file format is the one described in Labelfile.
Qids for the training data. The default file format is the one described in Qid file.
Use reduced set approximation to speed up training and prediction. Restricts the learned hypothesis to be represented only by the training data points whose indices are in the basis vector file. The default file format is the one described in Basis vectors.
Variable containing indices of holdout data points, one row per hold-out set. This can be used to define folds for cross-validation or, more generally, hold-out sets for repeated hold-out. The default file format is the one described in Fold file.
This variable contains a model learned from data. It will be generated if the user provides a learner attribute and training data for the learner. The model can be saved into a file via Python's pickle protocol. A previously learned model can be loaded from a file in order to perform predictions for unseen data.
Features for data one wishes to make predictions for. Prediction will be performed if a model is loaded from a file or if a predictor has been trained. The default file format is the one described in Featurefile.
Correct labels for test data, supply these if you want to measure performance on test data. The default file format is the one described in Labelfile.
Predicted labels for test data. These are generated if a model is used to perform predictions. These are also needed if one wants to measure performance on test data. The default file format is the one described in Labelfile.
Qids for test data, supply these if you want to evaluate performance on test data as an average over queries. The default file format is the one described in Qid file.
Results of MMC clustering on the training data (see Clustering with evolutionary maximum margin clustering).
The indices of the features selected by the GreedyRLS learner (see Feature selection with greedy RLS).
The list containing the LOO performances made by GreedyRLS during the greedy forward selection process (see Feature selection with greedy RLS).
Variable containing features for validation data. Necessary when using ValidationSetSelection for choosing the regularization parameter. The default file format is the one described in Featurefile.
Variable containing labels for validation data. Necessary when using ValidationSetSelection for choosing the regularization parameter. The default file format is the one described in Labelfile.
Qids for the validation data. Necessary when using ValidationSetSelection for choosing the regularization parameter, for query structured data with LabelRankRLS, or CGRankRLS (see the learner attribute).
The following types of files can be supplied as input for rls_core
Featurefile - the file containing attribute:value pairs for the training data.
Labelfile - the file containing the values of the correct labels for training data.
Fold file - Indices of holdout data, can be used to define folds for cross-validation.
Basis vectors - Indices of the training data points used as basis vectors, for the reduced set approximation. Normally, all the training data are basis vectors.
Qid file - File contains a query id for each data point. This can be used in query structured ranking tasks to define which document are related to the same query (information retrieval tasks), to define which parses correspond to the same sentence (parse ranking), etc.
The convention used when indexing features or data points is to start the indexing from zero. Thus if there are m distinct features/data points, the possible indices are from the range [0 ... m-1].
Below we give detailed descriptions of the file formats.
In all tasks, the data are provided in the input file one data point per line, using a sparse representation. Technically, the format of a line can be expressed as follows:
<line> .=. <index>:<value> <index>:<value> ... <index>:<value> # <comment>
<index> .=. <integer>
<value> .=. <float>
<comment> .=. <string>
The features are provided in tokens consisting of a feature index, a colon, and a real number indicating the value of the feature. The feature representation is sparse so that only the features whose values differ from 0 are present in the line. Further, the feature indices have to be given from the smallest to the largest starting from zero. For example, the line:
0:0.43 3:0.12 9284:0.2
specifies a data point that has non-zero values for features number 0, 3 and 9284, and value 0 for all the other possible features. If a data point has no non-zero valued attribute, then use 0:0 to differentiate this from an empty line.
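A line in this sparse format can be read into a dictionary mapping feature indices to values, as in the sketch below. It is only an illustration of the format; RLScore's own reader is not shown in this document, and the function name is made up.

```python
def parse_feature_line(line):
    """Parse '<index>:<value> ... # <comment>' into {index: value}.

    Hypothetical helper for illustration; not part of RLScore.
    """
    line = line.split('#', 1)[0]  # drop the optional trailing comment
    features = {}
    previous_index = -1
    for token in line.split():
        index, value = token.split(':')
        index = int(index)
        # Indices must be given from the smallest to the largest.
        if index <= previous_index:
            raise ValueError('feature indices must be strictly increasing')
        features[index] = float(value)
        previous_index = index
    return features
```

For the example line above this returns {0: 0.43, 3: 0.12, 9284: 0.2}, and the placeholder line 0:0 yields {0: 0.0}.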
Labels are the correct output values associated with some set of data points. These are required in training supervised learners and in performance estimation, but naturally not when making predictions for new examples. The labels are provided in the label file so that each line corresponds to one training data point, the data being in the same order as in the feature file. The label file has the following dense matrix format:
<line> .=. <value> <value> ... <value> # <comment>
<value> .=. <float>
<comment> .=. <string>
Note that there may be several labels per line but each line must have the same number of labels. Having multiple labels is useful for multi-class and multi-label classification tasks or in general if there are many learning tasks to be solved simultaneously. For classification 1 is used to represent the positive class and -1 the negative. For regression and ranking any real values can be used.
Examples:
Lines:
1
-1
1
could represent two positive data points (lines 1 and 3) and one negative data point in a binary classification task.
Line:
1 -1 -1 -1 1
could represent the labels for a data point in a multi-label classification task where a data point may belong to several different classes simultaneously. In this case the data point would belong to classes 1 and 5.
Lines:
1 -1 -1
-1 -1 1
-1 -1 1
-1 1 -1
could represent the labels for four data points in a multi-class classification task with three possible classes. In this setting each label corresponds to one class, and each data point has value 1 for the class it belongs to, and -1 for the other classes.
Lines:
1.123
3.433
0.0023
could represent real valued outputs for a simple regression task, where each data point is associated with one value, which we want to learn to predict.
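The dense label format can be read into a list of rows, one row per data point, as sketched below. The second helper recovers the class index from a one-vs-all encoded row (exactly one +1 per line). Both function names are hypothetical and only illustrate the format.

```python
def parse_label_lines(lines):
    """Parse label file lines into a list of rows of floats.

    Hypothetical helper for illustration; not part of RLScore.
    """
    rows = []
    for line in lines:
        line = line.split('#', 1)[0]  # drop the optional trailing comment
        rows.append([float(value) for value in line.split()])
    # Each line must have the same number of labels.
    if rows and any(len(row) != len(rows[0]) for row in rows):
        raise ValueError('all lines must contain the same number of labels')
    return rows


def ova_class_indices(rows):
    """For one-vs-all encoded rows, return the zero-based class of each row."""
    classes = []
    for row in rows:
        if row.count(1.0) != 1:
            raise ValueError('expected exactly one +1 per data point')
        classes.append(row.index(1.0))
    return classes
```

Applied to four data points with three classes, for example the rows 1 -1 -1, -1 -1 1, -1 -1 1 and -1 1 -1, the class indices are 0, 2, 2 and 1.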
The cross-validation folds file format is the following. For each separate hold-out set, there is a line in the file consisting of a list of indices of the training inputs that belong to the hold-out set. Technically, the format of a line can be expressed as follows:
<line> .=. <index> ... <index> # <comment>
<index> .=. <integer>
<comment> .=. <string>
The indices are separated by whitespace. An index cannot appear more than once on a single line. However, a single training input can belong to several hold-out sets simultaneously, and hence an index can appear on multiple lines. The indexing of the training inputs starts from zero.
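The constraints on the fold file (no repeated index within a line, repeats across lines allowed, zero-based indices) can be checked while reading it. The following sketch uses a hypothetical function name and takes the number of training points as an argument so that out-of-range indices are also caught.

```python
def parse_fold_lines(lines, n_points):
    """Parse fold file lines into a list of hold-out sets (lists of ints).

    Hypothetical helper for illustration; not part of RLScore.
    """
    folds = []
    for line in lines:
        line = line.split('#', 1)[0]  # drop the optional trailing comment
        indices = [int(token) for token in line.split()]
        # An index may not occur twice within one hold-out set ...
        if len(set(indices)) != len(indices):
            raise ValueError('repeated index within a hold-out set')
        # ... and must refer to an existing training point (zero-based).
        if any(i < 0 or i >= n_points for i in indices):
            raise ValueError('index out of range')
        folds.append(indices)
    return folds
```

A training input may still appear in several hold-out sets, as in the lines 0 1 2 and 2 3, which both contain index 2.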
The basis vectors file contains a single line, where the indices of the basis vectors are contained, separated by whitespace. The format can be expressed as follows:
<line> .=. <index> ... <index>
<index> .=. <integer>
For example:
0 23 25 44
would mean that the data points number 0, 23, 25 and 44 are used as basis vectors. An index cannot appear more than once in this file. The indexing of the training inputs starts from zero.
When performing ranking, the qid value is used to restrict the pairwise preference relations. By default, the preference relation covers all pairs of data points. Qids can be used to restrict which pairs are included in the relation. A pair of data points is included in the preference relation only if the value of "qid" is the same for both of them.
Each line in the query id file contains the id of the query the data point belongs to. The format can be expressed as follows:
<line> .=. <qid>
<qid> .=. <integer>
For example:
1
1
1
2
2
would mean that the first three data points belong to query number 1, and the last two to query number 2. In this case pairwise preferences would be observed between the first and second, first and third, second and third, and fourth and fifth data points. However, preferences between other pairs would not be considered, as they have different qids. The qids mainly have an effect on the pairwise performance measures, such as disagreement error or squared magnitude preserving ranking error. However, they may also have an effect on the other performance measures due to the averaging over the queries. For example, if squared error is used together with the qids provided in the above example file, the average squared error is first calculated for each query and the overall error is the average taken over the queries. Therefore, each of the first three data points has a lesser weight than each of the last two data points. This is in contrast to the case without qids, where the overall error is the average error taken over all data.
Currently, using qid file and a fold file together is not supported.
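The effect of qids on the preference relation and on query-averaged measures can be illustrated with a short sketch. The helper names are hypothetical; the code only mirrors the description above (pairs are formed within a query, and the squared error is averaged per query before averaging over queries).

```python
from collections import defaultdict


def group_by_qid(qids):
    """Map each qid to the list of data point indices that carry it."""
    groups = defaultdict(list)
    for index, qid in enumerate(qids):
        groups[qid].append(index)
    return groups


def preference_pairs(qids):
    """All index pairs (i, j), i < j, whose data points share a qid."""
    groups = group_by_qid(qids)
    pairs = []
    for qid in sorted(groups):
        members = groups[qid]
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                pairs.append((members[a], members[b]))
    return pairs


def query_averaged_sqerror(correct, predicted, qids):
    """Mean squared error per query, then averaged over the queries."""
    groups = group_by_qid(qids)
    per_query = []
    for qid in sorted(groups):
        members = groups[qid]
        error = sum((correct[i] - predicted[i]) ** 2 for i in members)
        per_query.append(error / float(len(members)))
    return sum(per_query) / len(per_query)
```

For the qids 1, 1, 1, 2, 2 the pairs are (0, 1), (0, 2), (1, 2) and (3, 4), using zero-based indices. With per-point squared errors 9, 0, 0, 4, 0 the query-averaged error is (3 + 2) / 2 = 2.5, whereas the plain average over all five points would be 2.6.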
The easiest way to use RLScore is by modifying one of the example configuration files presented next, to match your task. The software supports a wide variety of different learning tasks, ranging from supervised learning to clustering and feature selection. Examples of typical use-cases for each type of task are provided below.
The configuration files, and the example data sets used by them can be found in the 'examples' folder of the RLScore distribution. For example, to run the configuration 'reg_train.cfg' included in examples/cfgs from the command line, go to the folder containing the RLScore distribution, and execute the command 'python rls_core.py examples/cfgs/reg_train.cfg'
While the examples use Unix-style paths with '/' separator, they work also in Windows with no modifications needed.
In binary classification, the data is separated into two classes. Classification accuracy measures the fraction of correct classifications made by the learned classifier. This is perhaps the most widely used performance measure for binary classification. However, for very unbalanced data-sets it may be preferable to optimize the area under ROC curve (AUC) measure, considered in a later example, instead.
When training a classifier according to the accuracy criterion, using the RLS module which minimizes a least squares loss on the training set class labels is recommended. The approach is equivalent to the so-called least-squares support vector machine.
Requirements:
- class labels should be either 1 (positive) or -1 (negative)
#Binary classification (accuracy maximizing)
#
#Accuracy maximizing binary classification is done by training a RLS regressor
#with labels +1 for positive and -1 for negative class. The algorithm
#is equivalent to the so-called 'Least-squares support vector machine',
#and is known to provide similar performance as SVMs
#
#For Area under the ROC curve (AUC) maximizing binary classification, check
#the corresponding example.
#
#this example chooses regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once
[Modules]
#RLS regressor is used as the training algorithm
learner=RLS
#Leave-one-out cross-validation can be used for parameter selection
mselection=LOOSelection
#Alternatively, 10-fold cross-validation with randomized fold partition
#could be used. It is also possible to supply your own folds.
#mselection=NfoldSelection
#Accuracy measures the fraction of correct classifications made
measure=accuracy
#Linear kernel is always a reasonable first choice, advanced examples
#show how to learn non-linear models with other kernels.
kernel=LinearKernel
[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10
#bias is mostly useful for linear models with low-dimensional data
bias=1
[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels
[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy

# Load the training and test data (same files as in the [Input] section)
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')

# Same settings as in the [Modules] and [Parameters] sections
kwargs['reggrid'] = '-10_10'
kwargs['bias'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'RLS'
kwargs['measure'] = accuracy
kwargs['mselection'] = 'LOOSelection'

# Select the regularization parameter and train the model
trainresults = core.trainModel(**kwargs)
model = trainresults['model']

# Predict the test labels and measure the test performance
print('Making predictions on test data')
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print('Performance: %f %s' % (performance, accuracy.__name__))

# Save the model and the predictions (same files as in the [Output] section)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)
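For intuition about what the RLS learner computes in the linear case, the sketch below solves the textbook regularized least-squares problem, minimizing the squared error on the labels plus regparam times the squared norm of the weight vector w, whose solution is w = (X^T X + regparam*I)^(-1) X^T y. This is a plain Python illustration under stated assumptions, not RLScore's implementation: the function names are made up, the loss scaling is assumed to be an unnormalized sum, and the bias is emulated here by a constant feature appended to each data point.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def train_linear_rls(X, y, regparam=1.0):
    """Return w minimizing sum_i (<w,x_i> - y_i)**2 + regparam * <w,w>."""
    d = len(X[0])
    # X^T X + regparam * I
    gram = [[sum(row[i] * row[j] for row in X) + (regparam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    # X^T y
    rhs = [sum(row[i] * label for row, label in zip(X, y)) for i in range(d)]
    return solve(gram, rhs)


def predict(w, x):
    """Real valued prediction; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Toy data: one real feature plus a constant feature acting as the bias.
X = [[2.0, 1.0], [1.5, 1.0], [-1.0, 1.0], [-2.0, 1.0]]
y = [1.0, 1.0, -1.0, -1.0]
w = train_linear_rls(X, y, regparam=0.5)
```

On this toy data the signs of the predictions match the labels for all four training points. Forming and solving the d-by-d system directly is only practical for low-dimensional data.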
In binary classification, the data is separated into two classes, which are often referred to as the positive, and the negative class. AUC measures the probability, that a randomly drawn positive data point receives a higher predicted value than a randomly drawn negative one. The measure is especially suitable for unbalanced data.
When training a classifier according to the AUC criterion, using the RankRLS learner which minimizes a pairwise least-squares loss on the training set class labels is recommended. Leave-pair-out cross-validation is recommended for model selection, unless the data set is very large.
#Binary classification (AUC maximizing)
#
#AUC maximizing binary classification is done by training the RankRLS ranker
#with labels +1 for positive and -1 for negative class. The ranker aims
#to solve the bipartite ranking task of ranking positive examples higher
#than negative ones, which corresponds to AUC-maximization.
#
#(see "An efficient algorithm for learning to rank from preference
#graphs", Machine Learning, 2009 for further details on the training algorithm)
#
#Leave-pair-out cross-validation is the recommended strategy for model
#selection, as leave-one-out estimation of AUC is known to have serious
#negative bias in some cases
#
#(see A Comparison Of AUC-Estimators In Small-Sample Studies, JMLR proceedings
#of MLSB'09. 2010)
#
#this example chooses regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once
[Modules]
#RankRLS is used as the training algorithm
learner=AllPairsRankRLS
#Leave-pair-out cross-validation can be used for parameter selection
mselection=LPOSelection
#Alternatively, 10-fold cross-validation with randomized fold partition
#can be used for large data sets. It is also possible to supply your own
#folds.
#mselection=NfoldSelection
#AUC measures the probability, that a randomly chosen positive example
#receives a higher score than a randomly chosen negative (which corresponds
#to the area under the ROC curve).
measure=auc
#Linear kernel is always a reasonable first choice, advanced examples
#show how to learn non-linear models with other kernels.
kernel=LinearKernel
[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10
#bias is mostly useful for linear models with low-dimensional data
bias=1
[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels
[Output]
#the learned model is written here
model=./examples/models/classAUC.model
predicted_labels=./examples/predictions/classAUC.predictions
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import auc

# Load the training and test data (same files as in the [Input] section)
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')

# Same settings as in the [Modules] and [Parameters] sections
kwargs['reggrid'] = '-10_10'
kwargs['bias'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'AllPairsRankRLS'
kwargs['measure'] = auc
kwargs['mselection'] = 'LPOSelection'

# Select the regularization parameter and train the model
trainresults = core.trainModel(**kwargs)
model = trainresults['model']

# Predict the test labels and measure the test performance
print('Making predictions on test data')
predicted_labels = model.predict(prediction_features)
performance = auc(test_labels, predicted_labels)
print('Performance: %f %s' % (performance, auc.__name__))

# Save the model and the predictions (same files as in the [Output] section)
writer.write_pickle('./examples/models/classAUC.model', model)
writer.write_dense('./examples/predictions/classAUC.predictions', predicted_labels)
In ranking the aim is to learn a function, whose predictions result in an accurate ranking when ordering new examples according to the predicted values. That is, more relevant examples should receive higher predicted scores than less relevant.
Using qids means that instead of a total order over all examples, each query has its own ordering, and examples from different queries should not be compared. For example in information retrieval, each query might consist of the ordering of a set of documents according to a query posed by a user.
When training a ranker, using the RankRLS learner, which minimizes a pairwise least-squares loss on the training set labels, is recommended. Leave-query-out cross-validation is recommended for parameter selection.
In case you have a total order over all examples, instead of a query structure, proceed as follows:
- do not supply qid files
- replace LabelRankRLS with AllPairsRankRLS in the Modules section
If the data is both high dimensional and sparse, one should use the module CGRankRLS, which is optimized for such data (see Learning linear models from large sparse data sets).
In addition to learning from utility scores of data points, CGRankRLS also supports learning from pairwise preferences, see Config file (cgrank_test_with_preferences) and Python code (cgrank_test_with_preferences)
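To make the per-query evaluation concrete, the following is a minimal sketch of a pairwise disagreement error averaged over queries. The function names `pairwise_disagreement` and `mean_query_disagreement` are hypothetical and are not part of RLScore; the tie handling (a predicted tie counts as half an error) and the normalization by the number of comparable pairs are assumptions, so the numbers need not match those of `rlscore.measure.disagreement`.

```python
def pairwise_disagreement(true_labels, predictions):
    """Fraction of pairs with different true labels that are ordered wrongly."""
    wrong = 0.0
    pairs = 0
    n = len(true_labels)
    for i in range(n):
        for j in range(i + 1, n):
            if true_labels[i] == true_labels[j]:
                continue  # no preference between equally relevant examples
            pairs += 1
            true_diff = true_labels[i] - true_labels[j]
            pred_diff = predictions[i] - predictions[j]
            if pred_diff == 0:
                wrong += 0.5  # assumption: a predicted tie is half an error
            elif (true_diff > 0) != (pred_diff > 0):
                wrong += 1.0
    if pairs == 0:
        raise ValueError("no comparable pairs")
    return wrong / pairs


def mean_query_disagreement(true_labels, predictions, query_ids):
    """Average the pairwise error over queries; pairs never cross queries."""
    groups = {}
    for index, qid in enumerate(query_ids):
        groups.setdefault(qid, []).append(index)
    errors = []
    for indices in groups.values():
        y = [true_labels[i] for i in indices]
        p = [predictions[i] for i in indices]
        errors.append(pairwise_disagreement(y, p))
    return sum(errors) / len(errors)
```

Because examples are grouped by query id before pairs are formed, a document from one query is never compared against a document from another, which is the point of supplying qids.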
#Ranking with query ids.
#
#In ranking the aim is to learn a function, whose predictions result in an
#accurate ranking when ordering new examples according to the predicted
#values. That is, more relevant examples should receive higher predicted
#scores than less relevant.
#
#Using qids means that instead of a total order over all examples, each
#query has its own order, and examples from different queries should
#not be compared. For example in information retrieval, each query
#might consist of the ordering of a set of documents according to
#a query posed by a user.
#
#This example combines training, prediction and performance evaluation
#together
[Modules]
#LabelRankRLS is meant for ranking problems with qids
learner=LabelRankRLS
#For LabelRankRLS NfoldSelection performs leave-query-out cross-validation
#when choosing regularization parameter value
mselection=NfoldSelection
#Disagreement error measures the average number of pairwise mis-orderings per
#query
measure=disagreement
#Alternative: squared magnitude preserving ranking error
#measure=SqMPRankMeasure
#Linear kernel is always a reasonable first choice, advanced examples
#show how to learn non-linear models with other kernels.
kernel=LinearKernel
[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10
#bias is mostly useful for linear models with low-dimensional data
bias=1
[Input]
#features of the training examples
train_features=./examples/data/rank_train.features
#labels of the training examples
train_labels=./examples/data/rank_train.labels
#qids of the training examples
train_qids=./examples/data/rank_train.qids
#features of the test examples
prediction_features=./examples/data/rank_test.features
#true labels of the test examples
test_labels=./examples/data/rank_test.labels
#qids for the test examples
test_qids=./examples/data/rank_test.qids
[Output]
#the learned model is written here
model=./examples/models/rankqids.model
#the predicted labels are written here
predicted_labels=./examples/predictions/rankqids.predictions
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import disagreement
from numpy import mean
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/rank_train.labels')
test_labels = reader.read_dense('./examples/data/rank_test.labels')
kwargs['train_qids'] = reader.read_qids('./examples/data/rank_train.qids')
prediction_features = reader.read_sparse('./examples/data/rank_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/rank_train.features')
test_qids = reader.read_qids('./examples/data/rank_test.qids')
kwargs['reggrid'] = '-10_10'
kwargs['bias'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'LabelRankRLS'
kwargs['measure'] = disagreement
kwargs['mselection'] = 'NfoldSelection'
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
print 'calculating performance as averages over queries'
performances = []
for query in test_qids:
    performances.append(disagreement(test_labels[query], predicted_labels[query]))
performance = mean(performances)
print 'Performance: %f %s' % (performance, disagreement.__name__)
writer.write_pickle('./examples/models/rankqids.model', model)
writer.write_dense('./examples/predictions/rankqids.predictions', predicted_labels)
In regression, the task is to predict real-valued labels. The regularized least-squares (RLS) module is suitable for solving this task.
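As an illustration of what the RLS learner and the regularization grid do, here is a minimal sketch for a single feature without a bias term. It is not RLScore code: the helper names are hypothetical, the leave-one-out error is computed naively by retraining rather than with the package's computational shortcuts, and the interpretation of `reggrid=-10_10` as the grid 2^-10...2^10 is taken from the comments of the example configuration files.

```python
from __future__ import division  # true division also under Python 2


def train_rls_1d(xs, ys, regparam):
    """Closed-form RLS for one feature, no bias: w = <x,y> / (<x,x> + regparam)."""
    xy = sum(x * y for x, y in zip(xs, ys))
    xx = sum(x * x for x in xs)
    return xy / (xx + regparam)


def loo_sqerror(xs, ys, regparam):
    """Naive leave-one-out mean squared error: retrain once per held-out example."""
    total = 0.0
    for i in range(len(xs)):
        w = train_rls_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], regparam)
        total += (ys[i] - w * xs[i]) ** 2
    return total / len(xs)


def select_regparam(xs, ys, low=-10, high=10):
    """Pick the grid value 2^low ... 2^high with the smallest LOO error."""
    grid = [2.0 ** e for e in range(low, high + 1)]
    return min(grid, key=lambda r: loo_sqerror(xs, ys, r))
```

On noise-free data such as `ys = [2 * x for x in xs]`, the smallest grid value wins, because any regularization only shrinks the weight away from the exact slope.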
#Regression
#
#Regressor is trained by optimizing regularized least-squares loss
#
#this example chooses the regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once
[Modules]
#RLS regressor is used as the training algorithm
learner=RLS
#Leave-one-out cross-validation can be used for parameter selection
mselection=LOOSelection
#Alternatively, 10-fold cross-validation with randomized fold partition
#could be used. It is also possible to supply your own folds.
#mselection=NfoldSelection
#Mean squared error
measure=sqerror
#Linear kernel is always a reasonable first choice, advanced examples
#show how to learn non-linear models with other kernels.
kernel=LinearKernel
[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10
#bias is mostly useful for linear models with low-dimensional data
bias=2
[Input]
#features of the training examples
train_features=./examples/data/reg_train.features
#labels of the training examples
train_labels=./examples/data/reg_train.labels
#features of the test examples
prediction_features=./examples/data/reg_test.features
#true labels of the test examples
test_labels=./examples/data/reg_test.labels
[Output]
#the learned model is written here
model=./examples/models/reg.model
#the predicted labels are written here
predicted_labels=./examples/predictions/reg.predictions
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import sqerror
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/reg_train.labels')
prediction_features = reader.read_sparse('./examples/data/reg_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/reg_train.features')
test_labels = reader.read_dense('./examples/data/reg_test.labels')
kwargs['reggrid'] = '-10_10'
kwargs['bias'] = '2'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'RLS'
kwargs['measure'] = sqerror
kwargs['mselection'] = 'LOOSelection'
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = sqerror(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, sqerror.__name__)
writer.write_pickle('./examples/models/reg.model', model)
writer.write_dense('./examples/predictions/reg.predictions', predicted_labels)
In clustering, the task is to divide unlabeled data into several clusters. The aim is to find a cluster structure in which the data points within a cluster are similar to each other but dissimilar to the examples in the other clusters.
The clustering algorithm implemented in RLScore aims to divide the data so that the resulting division yields minimal regularized least-squares error. The approach is analogous to the maximum margin clustering approach. Because the resulting combinatorial optimization problem is NP-hard, stochastic hill climbing together with computational shortcuts is used to search for a locally optimal solution. Restarts may be necessary to discover a good clustering.
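The idea of searching for a labeling with minimal regularized least-squares error can be sketched on a toy problem. The code below is a deliberately simplified illustration, not the MMC module: it assumes two clusters, a single feature, no bias term and no cluster-balance constraint, and it recomputes the objective from scratch instead of using computational shortcuts. All function names are hypothetical.

```python
from __future__ import division
import random


def rls_objective(xs, labels, regparam):
    """min_w sum_i (y_i - w*x_i)^2 + regparam*w^2 = <y,y> - <x,y>^2 / (<x,x> + regparam)."""
    yy = sum(y * y for y in labels)
    xy = sum(x * y for x, y in zip(xs, labels))
    xx = sum(x * x for x in xs)
    return yy - xy * xy / (xx + regparam)


def hill_climb_clustering(xs, regparam=1.0, seed=0):
    """Flip one cluster label at a time, in random order, keeping improvements."""
    rng = random.Random(seed)
    labels = [rng.choice((-1, 1)) for _ in xs]  # random initial clustering
    best = rls_objective(xs, labels, regparam)
    improved = True
    while improved:
        improved = False
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            labels[i] = -labels[i]  # tentatively move example i to the other cluster
            candidate = rls_objective(xs, labels, regparam)
            if candidate < best - 1e-12:
                best = candidate
                improved = True
            else:
                labels[i] = -labels[i]  # no improvement: undo the move
    return labels, best
```

The loop stops at a local optimum, that is, when no single move lowers the objective. Calling the function with several different `seed` values and keeping the best result corresponds to the restarts mentioned above.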
#Performs maximum-margin clustering on the data set
#
#Details of the method can be found in
#'Fast Evolutionary Maximum Margin Clustering'
#
[Modules]
#The only clustering method currently supported
learner=MMC
[Parameters]
#number_of_clusters controls the number of clusters
number_of_clusters=2
#Currently model selection is not supported, so we fix this
#without search.
regparam=1
#bias is mostly useful for linear models with low-dimensional data
bias=1
[Input]
#features of the training examples
train_features=./examples/data/class_train.features
[Output]
#the predicted cluster memberships are written here
predicted_clusters_for_training_data=./examples/predictions/clusters.txt
from rlscore import core
from rlscore import reader
from rlscore import writer
kwargs = {}
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
kwargs['regparam'] = '1'
kwargs['bias'] = '1'
kwargs['number_of_clusters'] = '2'
kwargs['learner'] = 'MMC'
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
writer.write_ints('./examples/predictions/clusters.txt', trainresults['predicted_clusters_for_training_data'])
GreedyRLS, the feature selection module of RLScore, allows selecting a fixed-size subset of features. The selection criterion is the performance of an RLS learner trained on the selected features, measured using leave-one-out cross-validation. Both regression and classification tasks are supported.
In addition to feature selection, the module can be used to train sparse RLS predictors that use only a specified number of features for making predictions. Only linear learning is supported. The method scales linearly with respect to the number of examples, features and selected features.
The indices of the selected features are written to the file provided as the 'selected_features' parameter. The LOO performances obtained by GreedyRLS in each step of the greedy forward selection process are written to the file provided as the 'GreedyRLS_LOO_performances' parameter.
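The selection loop described above can be sketched as follows. This is a naive reference version, not GreedyRLS itself: it retrains an RLS model for every candidate feature and every held-out example, so it lacks the linear-time shortcuts of the actual module, and it uses mean squared error as the leave-one-out criterion. The function names are hypothetical; `subsetsize` and `regparam` mirror the configuration keys.

```python
from __future__ import division


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def train_rls(X, y, features, regparam):
    """Linear RLS on the given feature subset: (X^T X + regparam*I) w = X^T y."""
    A = [[sum(row[f] * row[g] for row in X) + (regparam if f == g else 0.0)
          for g in features] for f in features]
    b = [sum(row[f] * t for row, t in zip(X, y)) for f in features]
    return solve(A, b)


def loo_error(X, y, features, regparam):
    """Leave-one-out mean squared error, computed by explicit retraining."""
    total = 0.0
    for i in range(len(X)):
        w = train_rls(X[:i] + X[i + 1:], y[:i] + y[i + 1:], features, regparam)
        pred = sum(wf * X[i][f] for wf, f in zip(w, features))
        total += (y[i] - pred) ** 2
    return total / len(X)


def greedy_forward_selection(X, y, subsetsize, regparam=1.0):
    """Add, one at a time, the feature that gives the lowest LOO error."""
    selected, performances = [], []
    remaining = list(range(len(X[0])))
    for _ in range(subsetsize):
        best = min(remaining, key=lambda f: loo_error(X, y, selected + [f], regparam))
        performances.append(loo_error(X, y, selected + [best], regparam))
        selected.append(best)
        remaining.remove(best)
    return selected, performances
```

The two returned lists play the roles of the 'selected_features' and 'GreedyRLS_LOO_performances' outputs: the chosen feature indices in selection order, and the LOO error after each step.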
#Performs incremental forward selection, where k features
#which lead to good leave-one-out performance are chosen.
#
#Further, the method trains a sparse linear prediction model on the chosen
#features.
#
#This example is about binary classification, but by changing
#the performance measure the method is also suitable for
#regression or multiclass problems.
#
#Details of the method can be found in the forthcoming article
#'Linear Time Feature Selection for regularized least-squares'
#
#The quality of the learner model is tested on independent test data
#
[Modules]
#The only feature selecting learner currently supported
learner=GreedyRLS
#Since we are doing feature selection for a classification task,
#classification accuracy is a reasonable performance measure
measure=accuracy
[Parameters]
#subsetsize controls the number of selected features
subsetsize=3
#Currently cross-validated search for regularization parameter
#choosing is not supported for feature selection, so we fix this
#without search.
regparam=1
#bias is mostly useful for linear models with low-dimensional data
bias=1
[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#Calculating test performance is of course not necessary,
#but it gives some idea about the quality of selected feature
#set
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels
[Output]
#the learned model that has non-zero coefficients only for
#the selected features is written here
model=./examples/models/sparse.model
#the indices of selected features are written here
selected_features=./examples/predictions/selected.findices
#The LOO performances made by GreedyRLS in each step of
#the greedy forward selection process are written here
GreedyRLS_LOO_performances=./examples/predictions/GreedyRLS_LOO.performance
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['regparam'] = '1'
kwargs['subsetsize'] = '3'
kwargs['bias'] = '1'
kwargs['learner'] = 'GreedyRLS'
kwargs['measure'] = accuracy
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
writer.write_pickle('./examples/models/sparse.model', model)
writer.write_dense('./examples/predictions/GreedyRLS_LOO.performance', trainresults['GreedyRLS_LOO_performances'])
writer.write_ints('./examples/predictions/selected.findices', trainresults['selected_features'])
Most of the learning algorithms included in the RLScore package also support kernels other than the linear one. Efficient implementations for calculating the Gaussian and the polynomial kernel are included.
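For orientation, the two kernels can be written out in a few lines of plain Python. This is only a sketch, not the package's implementation. The polynomial kernel follows the formula quoted in the comments of the polynomial kernel example configuration, k(xi,xj) = (gamma * <xi,xj> + coef0)**degree. The Gaussian form exp(-gamma * ||xi - xj||^2) is an assumption about how the width parameter gamma is used. The helper names are hypothetical.

```python
import math


def dot(xi, xj):
    """Inner product of two dense feature vectors."""
    return sum(a * b for a, b in zip(xi, xj))


def polynomial_kernel(xi, xj, gamma=1.0, coef0=0.0, degree=3):
    """k(xi, xj) = (gamma * <xi, xj> + coef0) ** degree."""
    return (gamma * dot(xi, xj) + coef0) ** degree


def gaussian_kernel(xi, xj, gamma=0.01):
    """ASSUMED form: k(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    sqdist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sqdist)


def kernel_matrix(kernel, rows, **params):
    """Full n x n kernel matrix over the given examples."""
    return [[kernel(xi, xj, **params) for xj in rows] for xi in rows]
```

For example, `kernel_matrix(gaussian_kernel, rows, gamma=0.01)` builds the full matrix for a list of dense feature vectors `rows`.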
The training algorithms explicitly construct and decompose the full kernel matrix, resulting in quadratic memory use and cubic training time with respect to the number of training examples. Performing cross-validation or multiple output learning does not increase this complexity, due to computational shortcuts. In practice, kernels can be used with several thousand training data points. For large-scale learning with kernels, see reduced set approximation.
Grid search over kernel parameters is currently not supported; to accomplish this, write a wrapper script around rls_core.
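One possible starting point for such a wrapper is to generate one configuration per kernel parameter value. The sketch below only builds the configuration texts. The template reuses the keys of the Gaussian kernel example, while the output file names containing the gamma exponent and the function name are hypothetical. Writing each text to a file, passing it to rls_core and comparing the reported performances is left out here.

```python
# Template mirroring the Gaussian kernel example configuration; the output
# file names with the gamma exponent in them are hypothetical.
CONFIG_TEMPLATE = """[Modules]
learner=RLS
mselection=LOOSelection
measure=accuracy
kernel=GaussianKernel
[Parameters]
reggrid=-10_10
gamma=%(gamma)s
[Input]
train_features=./examples/data/class_train.features
train_labels=./examples/data/class_train.labels
prediction_features=./examples/data/class_test.features
test_labels=./examples/data/class_test.labels
[Output]
model=./examples/models/classacc_gamma_%(tag)s.model
predicted_labels=./examples/predictions/classacc_gamma_%(tag)s.predictions
"""


def configs_for_gamma_grid(exponents):
    """Return {gamma: config text} for gamma = 2^e, one entry per exponent e."""
    configs = {}
    for e in exponents:
        gamma = 2.0 ** e
        configs[gamma] = CONFIG_TEMPLATE % {"gamma": repr(gamma), "tag": str(e)}
    return configs
```

Note that choosing gamma by test set performance would give an optimistic estimate; a separate validation set or the cross-validation results reported during training are safer criteria.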
In the following example we train an RLS classifier using the Gaussian kernel; the other learners can be used with kernels in an analogous way. The only change needed to the earlier examples is to define 'kernel=GaussianKernel' and supply the kernel parameters under [Parameters].
#Binary classification (accuracy maximizing)
#
#In this example we utilize the gaussian kernel
#
#this example chooses the regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once
[Modules]
#RLS regressor is used as the training algorithm
learner=RLS
#Leave-one-out cross-validation can be used for parameter selection
mselection=LOOSelection
#Alternatively, 10-fold cross-validation with randomized fold partition
#could be used. It is also possible to supply your own folds.
#mselection=NfoldSelection
#Accuracy measures the fraction of correct classifications made
measure=accuracy
kernel=GaussianKernel
[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10
#width parameter of the gaussian kernel
gamma=0.01
[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels
[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['reggrid'] = '-10_10'
kwargs['gamma'] = '0.01'
kwargs['kernel'] = 'GaussianKernel'
kwargs['learner'] = 'RLS'
kwargs['measure'] = accuracy
kwargs['mselection'] = 'LOOSelection'
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)
#Binary classification (accuracy maximizing)
#
#In this example we utilize the polynomial kernel
#
#this example chooses the regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once
[Modules]
#RLS regressor is used as the training algorithm
learner=RLS
#Leave-one-out cross-validation can be used for parameter selection
mselection=LOOSelection
#Alternatively, 10-fold cross-validation with randomized fold partition
#could be used. It is also possible to supply your own folds.
#mselection=NfoldSelection
#Accuracy measures the fraction of correct classifications made
measure=accuracy
kernel=PolynomialKernel
[Parameters]
#The polynomial kernel is defined as
#k(xi,xj) = (gamma * <xi,xj> + coef0)**degree
#
#We use here a simple homogeneous polynomial kernel of
#degree 3
gamma=1
coef0=0
degree=3
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10
[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels
[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['reggrid'] = '-10_10'
kwargs['coef0'] = '0'
kwargs['degree'] = '3'
kwargs['gamma'] = '1'
kwargs['kernel'] = 'PolynomialKernel'
kwargs['learner'] = 'RLS'
kwargs['measure'] = accuracy
kwargs['mselection'] = 'LOOSelection'
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)
In settings where both the number of training examples and the number of features are large, but the data is sparse (most entries in the data matrix are zero), regression, classification and ranking can be done much more efficiently using the conjugate gradient training algorithms. In this case kernels are not supported, only linear models. The methods allow substantial savings in memory usage and improved scaling, since they need only the non-zero entries of the data matrix for training and avoid computing samples x samples or features x features sized matrices.
In this setting, the CGRLS module can be used analogously to the RLS module, and the CGRankRLS module can be used analogously to AllPairsRankRLS / LabelRankRLS. The CG implementations do not support cross-validation.
In addition to learning from utility scores of data points, CGRankRLS also supports learning from pairwise preferences.
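The reason conjugate gradient training only needs the non-zero entries can be seen from a small sketch. The code below solves the linear RLS system (X^T X + regparam*I) w = X^T y with a textbook conjugate gradient iteration, storing each example as a dictionary of its non-zero features. It is an illustration under these assumptions, not the CGRLS implementation, and the function names are hypothetical.

```python
from __future__ import division


def sparse_matvec(rows, w):
    """X w, where each row is a dict {feature_index: value} of non-zeros."""
    return [sum(v * w[j] for j, v in row.items()) for row in rows]


def sparse_rmatvec(rows, u, n_features):
    """X^T u, again touching only the stored non-zero entries."""
    out = [0.0] * n_features
    for row, ui in zip(rows, u):
        for j, v in row.items():
            out[j] += v * ui
    return out


def cg_rls(rows, y, n_features, regparam=1.0, tol=1e-10, max_iter=1000):
    """Solve (X^T X + regparam*I) w = X^T y without forming X^T X."""
    def apply_A(w):
        XtXw = sparse_rmatvec(rows, sparse_matvec(rows, w), n_features)
        return [a + regparam * b for a, b in zip(XtXw, w)]

    w = [0.0] * n_features
    r = sparse_rmatvec(rows, y, n_features)  # residual for the start w = 0
    p = list(r)
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old <= tol * tol:
            break  # squared residual norm is small enough
        Ap = apply_A(p)
        alpha = rs_old / sum(a * b for a, b in zip(p, Ap))
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return w
```

Each iteration costs two passes over the stored non-zeros, and neither a features x features nor a samples x samples matrix is ever built.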
#This config file runs Conjugate Gradient version of RLS
#The CGRLS is useful for very large and high-dimensional but sparse data sets,
#and can be used only with the linear kernel.
#
#
#Binary classification (accuracy maximizing)
#
#Accuracy maximizing binary classification is done by training a RLS regressor
#with labels +1 for positive and -1 for negative class. The algorithm
#is equivalent to the so-called 'Least-squares support vector machine',
#and is known to provide similar performance as SVMs
#
[Modules]
#RLS regressor is used as the training algorithm
learner=CGRLS
#Accuracy measures the fraction of correct classifications made
measure=accuracy
[Parameters]
regparam=1
#bias is mostly useful for linear models with low-dimensional data
bias=1
[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels
[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['regparam'] = '1'
kwargs['bias'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'CGRLS'
kwargs['measure'] = accuracy
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)
#This config file runs Conjugate Gradient version of RankRLS
#The CGRankRLS is useful for very large and high-dimensional but sparse data sets,
#and can be used only with the linear kernel.
#
[Modules]
#RankRLS is used as the training algorithm
learner=CGRankRLS
#Area under ROC Curve
measure=auc
[Parameters]
regparam=1
[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels
[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import auc
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['regparam'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'CGRankRLS'
kwargs['measure'] = auc
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = auc(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, auc.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)
#Ranking with query ids.
#Conjugate gradient RankRLS training
[Modules]
#CGRankRLS can be used for ranking problems with qids
learner=CGRankRLS
#We use the validation set for parameter selection
mselection=ValidationSetSelection
#Disagreement error measures the average number of pairwise mis-orderings per
#query
measure=disagreement
[Parameters]
#search regularization parameter
reggrid=0.001 0.1 10 1000
[Input]
#features of the training examples
train_features=./examples/data/rank_train.features
#labels of the training examples
train_labels=./examples/data/rank_train.labels
#qids of the training examples
train_qids=./examples/data/rank_train.qids
#validation set files
validation_features=./examples/data/rank_test.features
validation_labels=./examples/data/rank_test.labels
validation_qids=./examples/data/rank_test.qids
[Output]
#the learned model is written here
model=./examples/models/rankqids.model
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import disagreement
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/rank_train.labels')
kwargs['devel_labels'] = reader.read_dense('./examples/data/rank_test.labels')
kwargs['train_qids'] = reader.read_qids('./examples/data/rank_train.qids')
kwargs['devel_qids'] = reader.read_qids('./examples/data/rank_test.qids')
kwargs['train_features'] = reader.read_sparse('./examples/data/rank_train.features')
kwargs['devel_features'] = reader.read_sparse('./examples/data/rank_test.features')
kwargs['reggrid'] = '0.001 0.1 10 1000'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'CGRankRLS'
kwargs['measure'] = disagreement
kwargs['mselection'] = 'DevelSetSelection'
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
writer.write_pickle('./examples/models/rankqids.model', model)
#This config file runs Conjugate Gradient version of RankRLS with a list of pairwise preferences between data points rather than labeled data points.
#The CGRankRLS is useful for very large and high-dimensional but sparse data sets,
#and can be used only with the linear kernel.
#
[Modules]
#RankRLS is used as the training algorithm
learner=CGRankRLS
#Area under ROC Curve
measure=auc
#Linear kernel is always a reasonable first choice, advanced examples
#show how to learn non-linear models with other kernels.
kernel=LinearKernel
[Parameters]
regparam=1
[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#pairwise preferences between data points
train_preferences=./examples/data/rank_train.preferences
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels
[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import auc
kwargs = {}
kwargs['train_preferences'] = reader.read_preferences('./examples/data/rank_train.preferences')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['regparam'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'CGRankRLS'
kwargs['measure'] = auc
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = auc(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, auc.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)
Once the training set size exceeds several thousand examples, training the learning methods with (non-linear) kernels becomes infeasible. For this case RLScore implements the reduced set approximation algorithm, where only a pre-specified subset of the training examples is used to represent the learned dual solution.
To use the reduced set approximation, one should supply in a file the indices of those training examples which are used to represent the learned solution (the so-called 'basis vectors'). The file should contain one line, where the indices are separated by whitespace.
How best to select the basis vectors is an open research question; uniform random subsampling of training set indices usually provides decent results.
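A basis vector file of the kind described above can be produced with a few lines of Python. This is a minimal sketch with hypothetical function names. It assumes that the indices are zero-based, which is not stated here, so check the shipped bvectors.indices example file before relying on it.

```python
import random


def sample_basis_vectors(n_train, n_basis, seed=0):
    """Uniformly sample n_basis distinct training-example indices.

    ASSUMPTION: indices are zero-based (0 ... n_train - 1).
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_train), n_basis))


def format_basis_vectors(indices):
    """One line, with the indices separated by single spaces."""
    return " ".join(str(i) for i in indices) + "\n"


def write_basis_vectors(path, indices):
    """Write the single-line index file to the given path."""
    with open(path, "w") as f:
        f.write(format_basis_vectors(indices))
```

Fixing the `seed` makes the subsample reproducible, which helps when comparing runs that differ only in other parameters.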
While cross-validation can be performed with the reduced set approximation, the results are only approximate. For small regularization parameter values, a pessimistic bias has been observed in the cross-validation estimates.
#Binary classification (accuracy maximizing)
#
#In this example we utilize the gaussian kernel, and use the
#reduced set approximation with 10 basis vectors given in a file
#
#this example chooses the regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once
[Modules]
#RLS regressor is used as the training algorithm
learner=RLS
regparam=1.0
#Accuracy measures the fraction of correct classifications made
measure=accuracy
kernel=GaussianKernel
[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10
#width parameter of the gaussian kernel
gamma=0.01
[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels
#the ten basis vectors, the file contains indices of ten training examples
basis_vectors=./examples/data/bvectors.indices
[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions
from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['basis_vectors'] = reader.read_bvectors('./examples/data/bvectors.indices')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['reggrid'] = '-10_10'
kwargs['gamma'] = '0.01'
kwargs['kernel'] = 'GaussianKernel'
kwargs['learner'] = 'RLS'
kwargs['measure'] = accuracy
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)
Regularized least-squares regression with Kronecker kernels is a method that takes advantage of the computational short-cuts for inverting so-called shifted Kronecker product systems. The current implementation only works with the library interface and with kernel matrices for training and prediction that are constructed in advance.
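The kind of shortcut referred to here can be written down compactly. The derivation below is the standard eigendecomposition identity for shifted Kronecker product systems, given as a sketch rather than as a description of the KronRLS code. It assumes symmetric kernel matrices with eigendecompositions K_1 = V_1 Λ_1 V_1^T (size n_1 x n_1) and K_2 = V_2 Λ_2 V_2^T (size n_2 x n_2), a label matrix Y of size n_2 x n_1, a column-stacking vec operator, and element-wise division written as ⊘. The ordering of the two kernels and of the labels is an assumption.

```latex
% Assumptions: K_1 = V_1 \Lambda_1 V_1^T, K_2 = V_2 \Lambda_2 V_2^T,
% vec stacks columns, Y and A are n_2 x n_1, \lambda is the regularization parameter,
% \boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2 are the vectors of eigenvalues.
\begin{align*}
(K_1 \otimes K_2 + \lambda I)\,\mathrm{vec}(A) &= \mathrm{vec}(Y) \\
K_1 \otimes K_2 + \lambda I
  &= (V_1 \otimes V_2)\,(\Lambda_1 \otimes \Lambda_2 + \lambda I)\,(V_1 \otimes V_2)^{T} \\
(V_1 \otimes V_2)^{T}\,\mathrm{vec}(Y) &= \mathrm{vec}(V_2^{T} Y V_1) \\
A &= V_2 \left[ (V_2^{T} Y V_1) \oslash
     (\boldsymbol{\lambda}_2 \boldsymbol{\lambda}_1^{T} + \lambda\,\mathbf{1}\mathbf{1}^{T}) \right] V_1^{T}
\end{align*}
```

The two eigendecompositions cost O(n_1^3 + n_2^3) time, whereas a direct solve of the n_1 n_2 x n_1 n_2 system would cost O(n_1^3 n_2^3).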
from rlscore.learner import KronRLS
from rlscore import reader
from rlscore import writer
from rlscore.measure import sqerror
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/Kron_train.labels')
kwargs['kmatrix1'] = reader.read_dense('./examples/data/Kron_train_1.kernelm')
kwargs['kmatrix2'] = reader.read_dense('./examples/data/Kron_train_2.kernelm')
K_test1 = reader.read_dense('./examples/data/Kron_test_1.kernelm')
K_test2 = reader.read_dense('./examples/data/Kron_test_2.kernelm')
test_labels = reader.read_dense('./examples/data/Kron_test.labels')
kwargs['regparam'] = 0.001
learner = KronRLS.createLearner(**kwargs)
learner.train()
kronmodel = learner.getModel()
kronpred = kronmodel.predictWithKernelMatrices(K_test1, K_test2)
print sqerror(test_labels, kronpred)
| Former Contributors: | |
|---|---|
| Evgeni Tsivtsivadze - participated in designing the version 0.1 and co-authored some of the articles in which the implemented methods were proposed. | |
| [1] | Tapio Pahikkala, Sebastian Okser, Antti Airola, Tapio Salakoski, and Tero Aittokallio. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations. Algorithms for Molecular Biology, 7(1):11, 2012. |
| [2] | Antti Airola, Tapio Pahikkala, Willem Waegeman, Bernard De Baets, and Tapio Salakoski. An experimental comparison of cross-validation techniques for estimating the area under the ROC curve. Computational Statistics & Data Analysis, 55(4):1828-1844, April 2011. |
| [3] | Tapio Pahikkala, Antti Airola, and Tapio Salakoski. Speeding up greedy forward selection for regularized least-squares. In Sorin Draghici, Taghi M. Khoshgoftaar, Vasile Palade, Witold Pedrycz, M. Arif Wani, and Xingquan Zhu, editors, Proceedings of The Ninth International Conference on Machine Learning and Applications (ICMLA'10), pages 325-330. IEEE Computer Society, December 2010. |
| [4] | Tapio Pahikkala, Willem Waegeman, Antti Airola, Tapio Salakoski, and Bernard De Baets. Conditional ranking on relational data. In José L. Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag, editors, Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD'10), Part II, volume 6322 of Lecture Notes in Computer Science, pages 499-514. Springer, 2010. |
| [5] | Antti Airola, Tapio Pahikkala, and Tapio Salakoski. Large scale training methods for linear RankRLS. In Eyke Hüllermeier and Johannes Fürnkranz, editors, Proceedings of the ECML/PKDD-Workshop on Preference Learning (PL-10), 2010. |
| [6] | Fabian Gieseke, Tapio Pahikkala, and Oliver Kramer. Fast evolutionary maximum margin clustering. In Léon Bottou and Michael Littman, editors, ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, volume 382 of ACM International Conference Proceeding Series, pages 361-368, New York, NY, USA, June 2009. ACM. |
| [7] | Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola, Jouni Järvinen, and Jorma Boberg. An efficient algorithm for learning to rank from preference graphs. Machine Learning, 75(1):129-165, 2009. |
| [8] | Tapio Pahikkala, Antti Airola, Jorma Boberg, and Tapio Salakoski. Exact and efficient leave-pair-out cross-validation for ranking RLS. In Timo Honkela, Matti Pöllä, Mari-Sanna Paukkeri, and Olli Simula, editors, Proceedings of the 2nd International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'08), pages 1-8. Helsinki University of Technology, 2008. |
| [9] | Tapio Pahikkala, Antti Airola, Hanna Suominen, Jorma Boberg, and Tapio Salakoski. Efficient AUC maximization with regularized least-squares. In Anders Holst, Per Kreuger, and Peter Funk, editors, Proceedings of the 10th Scandinavian Conference on Artificial Intelligence (SCAI 2008), volume 173 of Frontiers in Artificial Intelligence and Applications, pages 12-19. IOS Press, Amsterdam, Netherlands, 2008. |
| [10] | Evgeni Tsivtsivadze, Tapio Pahikkala, Antti Airola, Jorma Boberg, and Tapio Salakoski. A sparse regularized least-squares preference learning algorithm. In Anders Holst, Per Kreuger, and Peter Funk, editors, Proceedings of the 10th Scandinavian Conference on Artificial Intelligence (SCAI 2008), volume 173, pages 76-83. IOS Press, 2008. |
| [11] | Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola, Jorma Boberg, and Tapio Salakoski. Learning to rank with pairwise regularized least-squares. In Thorsten Joachims, Hang Li, Tie-Yan Liu, and ChengXiang Zhai, editors, SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, pages 27-33, 2007. |
| [12] | Tapio Pahikkala, Jorma Boberg, and Tapio Salakoski. Fast n-fold cross-validation for regularized least-squares. In Timo Honkela, Tapani Raiko, Jukka Kortela, and Harri Valpola, editors, Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence, pages 83-90, Espoo, Finland, 2006. Otamedia Oy. |