RLScore

RLScore - regularized least-squares based machine learning algorithms for regression, classification, ranking, clustering, and feature selection.

Authors:Tapio Pahikkala, Antti Airola
Email:firstname.lastname@utu.fi
Homepage:http://staff.cs.utu.fi/~aatapa/software/RLScore/
Version:0.5
License:The MIT License
Date:2012.06.19

Overview

RLScore is a Regularized Least-Squares (RLS) based algorithm package. It contains implementations of the RLS and RankRLS learners, which allow the optimization of performance measures for the tasks of regression, ranking and classification. In addition, the package contains a linear-time greedy forward feature selection method for RLS that uses a leave-one-out criterion (greedy RLS). Finally, the package contains an implementation of a maximum margin clustering method based on RLS and stochastic hill climbing. Efficient cross-validation algorithms are integrated into the package, together with functionality for fast parallel learning of multiple outputs.

Reduced set approximation for large-scale learning with kernels is included. In this setting, the approximation is also applied in the cross-validation methods. For learning linear models from large but sparse data sets, RLS and RankRLS can be trained using conjugate gradient optimization techniques.

Support for different tasks

Download

Download RLScore.zip containing the Python source code of RLScore.

Software dependencies

RLScore is written in Python and thus requires a working installation of Python 2.6.x. The package is also dependent on the NumPy 1.3.x package for matrix operations, and SciPy 0.7.x package for sparse matrix implementations. The psyco package is automatically used if installed.

Usage

RLScore is designed to be used by supplying a configuration file defining the learning task to the rls_core program.

The easiest way to use RLScore is by modifying one of the example configuration files delivered with the distribution, to match your task. The software supports a wide variety of different learning tasks, ranging from supervised learning to clustering and feature selection.

To run RLScore using a configuration defined in a file example.cfg, simply write:

python rls_core.py example.cfg

The structure of the configuration file is described in detail next.

There is also a programming interface to RLScore. API documentation is not yet available, but for each configuration file we also provide example Python code that executes the same run.

Configuration file

The configuration file consists of [Sections], which contain attribute=value pairs. The configuration file is case sensitive, but the ordering within sections does not matter. Use # to start a comment. None of the attributes are mandatory; however, setting certain attributes requires some other attributes to be set as well. The sections in the configuration file are [Modules], [Parameters], [Input], and [Output].
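
As an illustration of the format, the following sketch reads a configuration into nested dictionaries using only standard Python. It is not the parser used by rls_core: the helper name parse_config, the stripping of whitespace around = and the treatment of everything after # as a comment are assumptions made for this example.

```python
def parse_config(text):
    """Parse [Section] headers and attribute=value pairs into nested dicts."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        # Assumption: everything after '#' on a line is a comment
        line = raw_line.split('#', 1)[0].strip()
        if not line:
            continue
        if line.startswith('[') and line.endswith(']'):
            current = line[1:-1]
            sections[current] = {}
        else:
            if current is None:
                raise ValueError('attribute outside of a section: %r' % raw_line)
            name, value = line.split('=', 1)
            # Names are kept exactly as written, since the format is case sensitive
            sections[current][name.strip()] = value.strip()
    return sections


example = """
#a minimal configuration
[Modules]
learner=RLS
measure=accuracy
[Parameters]
regparam=1
[Input]
train_features=./examples/data/class_train.features
train_labels=./examples/data/class_train.labels
"""

config = parse_config(example)
print(config['Modules']['learner'])
```

The result maps each section name to a dictionary of its attributes, so the order of attributes within a section plays no role.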

[Modules]

The Modules section defines the main modules used for model selection, learning and performance evaluation. The attributes in the section are

learner:by defining the learner you inform RLScore that it should train one of the available learning algorithms
kernel:defines the used kernel function. For kernel parameters, see [Parameters]
measure:defines the performance measure used for model selection and/or evaluating
mselection:defines the used model selection strategy. The model selection strategies are not compatible with all the learners, and some not with all performance measures.

learner

RLScore currently has the following seven possible values of the learner attribute:

  • Value:RLS
    Description:Regularized least-squares regression, or accuracy maximizing classification.
    Modules:kernel , mselection (optional, compatible with LOOSelection, NfoldSelection, ValidationSetSelection)
    Parameters:regparam (or reggrid if mselection used), bias , kernel parameters
    Input data:train_features , train_labels
  • Value:AllPairsRankRLS
    Description:Regularized least-squares ranking, or AUC-maximizing classification.
    Modules:kernel, mselection (optional, compatible with LPOSelection, NfoldSelection, ValidationSetSelection)
    Parameters:regparam (or reggrid if mselection used), kernel parameters
    Input data:train_features , train_labels
  • Value:LabelRankRLS
    Description:Regularized least-squares ranking with a query-structure.
    Modules:kernel, mselection (optional, compatible with NfoldSelection, ValidationSetSelection)
    Parameters:regparam (or reggrid if mselection used), kernel parameters
    Input data:train_features , train_labels , train_qids
  • Value:CGRLS
    Description:Regularized least-squares regression, or accuracy maximizing classification. Large-scale algorithm for large and high-dimensional but sparse data sets with the linear kernel. Gives results equivalent to RLS.
    Modules:mselection (optional, compatible with ValidationSetSelection)
    Parameters:regparam (or reggrid if mselection used), bias
    Input data:train_features , train_labels . Supplying validation_features and validation_labels will automatically lead to using early stopping for faster training, by measuring sqerror on validation data, and terminating after no improvement is seen for 10 iterations.
  • Value:CGRankRLS
    Description:Regularized least-squares ranking, or AUC-maximizing classification. Also supports ranking with a query-structure. Large-scale algorithm for large and high-dimensional but sparse data sets with the linear kernel. Gives results equivalent to AllPairsRankRLS (or LabelRankRLS, if queries are supplied).
    Modules:mselection (optional, compatible with ValidationSetSelection)
    Parameters:regparam (or reggrid if mselection used)
    Input data:train_features , train_labels , train_qids (optional). Supplying validation_features , validation_labels and optionally validation_qids , will automatically lead to using early stopping for faster training, by measuring sqmprank-error on validation data, and terminating after no improvement is seen for 10 iterations.
  • Value:GreedyRLS
    Description:Feature selecting regularized least-squares learner.
    Modules:kernel , mselection (compatible with ValidationSetSelection)
    Parameters:regparam (or reggrid if mselection used), subsetsize , bias
    Input data:train_features , train_labels
  • Value:MMC
    Description:Maximum margin clustering based on evolutionary search and regularized least-squares.
    Modules:kernel
    Parameters:regparam (or reggrid if mselection used), number_of_clusters , bias , kernel parameters
    Input data:train_features

Of these, the first six are supervised learners and the last is an unsupervised clustering method.

kernel

The kernel attribute defines the used kernel function. It should be supplied to the kernel-based learners; if it is not supplied, the linear kernel is used by default. Parameters for the kernel functions can be supplied in the [Parameters] section. For the kernel attribute, RLScore currently supports the following three values:

  • Value:LinearKernel
    Description:The linear kernel aka the standard inner product <x,z> of feature vectors x and z. This is the default value for the kernel attribute.
    Parameters:None.
    Requirements:None.
  • Value:GaussianKernel
    Description:The Gaussian radial basis function kernel e^(-gamma*<x-z,x-z>) for feature vectors x and z, where gamma is the width parameter of the Gaussian kernel.
    Parameters:gamma (default 1)
    Requirements:gamma > 0
  • Value:PolynomialKernel
    Description:The polynomial kernel k(x,z) = (gamma * <x,z> + coef0)^degree for feature vectors x and z, where degree, coef0, and gamma are kernel parameters.
    Parameters:gamma (default 1), coef0 (default 0.), degree (default 2)
    Requirements:degree>0, coef0>=0, gamma>0. Moreover, degree must be an integer, while coef0 and gamma may be floats.
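
The three kernel formulas above can be written out directly. The sketch below is a plain-Python illustration on dense feature vectors, not the RLScore implementation; the function names are chosen for this example only, while the default parameter values follow the list above.

```python
import math


def linear_kernel(x, z):
    """Standard inner product <x,z>."""
    return sum(xi * zi for xi, zi in zip(x, z))


def gaussian_kernel(x, z, gamma=1.0):
    """e^(-gamma*<x-z,x-z>), requires gamma > 0."""
    if gamma <= 0:
        raise ValueError('gamma must be positive')
    diff = [xi - zi for xi, zi in zip(x, z)]
    return math.exp(-gamma * linear_kernel(diff, diff))


def polynomial_kernel(x, z, gamma=1.0, coef0=0.0, degree=2):
    """(gamma * <x,z> + coef0)^degree, with a positive integer degree."""
    if degree <= 0 or int(degree) != degree:
        raise ValueError('degree must be a positive integer')
    if gamma <= 0 or coef0 < 0:
        raise ValueError('gamma must be positive and coef0 non-negative')
    return (gamma * linear_kernel(x, z) + coef0) ** degree


x = [1.0, 2.0]
z = [3.0, 4.0]
print(linear_kernel(x, z))
print(gaussian_kernel(x, z, gamma=0.5))
print(polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=2))
```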

measure

The measure attribute defines the performance measure used for model selection and/or evaluating test performance.

  • Value:sqerror
    Description:Mean squared error, for regression.
    Requirements:None
  • Value:accuracy
    Description:Accuracy, for binary classification.
    Requirements:The correct labels must be +1 or -1.
  • Value:auc
    Description:Area under the ROC curve, for classification (bipartite ranking).
    Requirements:The correct labels must be +1 or -1.
  • Value:ova_accuracy
    Description:Multiclass classification accuracy, one-vs-all strategy.
    Requirements:The correct labels must be +1 or -1 and there must be one and only one +1 per data point.
  • Value:disagreement
    Description:Disagreement error, the number of misordered pairs in pairwise ranking.
    Requirements:None
  • Value:sqmprank
    Description:Squared magnitude-preserving ranking error. Average value of ((f(x1)-f(x2))-(y1-y2))**2 over all data point pairs.
    Requirements:None
  • Value:fscore
    Description:F1-score
    Requirements:The correct labels must be +1 or -1.
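
To make the definitions concrete, the sketch below computes four of the measures for small lists of correct and predicted labels. It is an illustration only and not the code in rlscore.measure. Counting tied pairs as one half in AUC, treating a prediction of exactly 0 as an error in accuracy, and counting each unordered pair once in sqmprank are assumptions of this sketch.

```python
def sqerror(correct, predicted):
    """Mean squared error."""
    n = len(correct)
    return sum((y - p) ** 2 for y, p in zip(correct, predicted)) / float(n)


def accuracy(correct, predicted):
    """Fraction of predictions whose sign agrees with the +1/-1 label."""
    hits = sum(1 for y, p in zip(correct, predicted) if y * p > 0)
    return hits / float(len(correct))


def auc(correct, predicted):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(correct, predicted) if y == 1]
    neg = [p for y, p in zip(correct, predicted) if y == -1]
    if not pos or not neg:
        raise ValueError('both classes must be present')
    score = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                score += 1.0
            elif pp == pn:
                score += 0.5
    return score / (len(pos) * len(neg))


def sqmprank(correct, predicted):
    """Average of ((f(x1)-f(x2))-(y1-y2))**2 over all pairs of data points."""
    n = len(correct)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += ((predicted[i] - predicted[j]) - (correct[i] - correct[j])) ** 2
            pairs += 1
    return total / pairs


y = [1, -1, 1, -1]
f = [0.9, -0.2, -0.1, -0.8]
print(accuracy(y, f))
print(auc(y, f))
print(sqerror(y, f))
```

In this toy example one positive point is misclassified, so accuracy drops below 1, while every positive point still scores higher than every negative one, so AUC is unaffected.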

mselection

The mselection attribute defines the model selection strategy used for selecting the regularization parameter. The model selection strategies are not compatible with all the learners, and some not with all performance measures.

  • Value:NfoldSelection
    Description:N-fold cross-validation or repeated hold-out for RLS or AllPairsRankRLS. By default, a randomized 10-fold partition is used. User-supplied hold-out sets can be provided via the cross-validation_folds attribute in the [Input] section. For LabelRankRLS, each query forms a fold and user-supplied hold-out sets are not supported.
  • Value:LOOSelection
    Description:Leave-one-out cross-validation. Supported by RLS.
  • Value:LPOSelection
    Description:Leave-pair-out cross-validation. Supported by AllPairsRankRLS. Based on disagreement error.
  • Value:ValidationSetSelection
    Description:Parameter selection on a separate validation set. Supported by all the supervised learners. Requires in the [Input] section validation_features , validation_labels (also optionally for RankRLS learners, validation_qids ).
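
If you prefer to control the partition used by NfoldSelection yourself, a randomized partition can be generated and written out in the Fold file format. The following is a minimal sketch under stated assumptions: the helper names, the fixed random seed and the round-robin assignment are illustrative choices and do not describe how RLScore builds its default folds.

```python
import random


def random_folds(n_points, n_folds=10, seed=0):
    """Split indices 0..n_points-1 into n_folds disjoint hold-out sets."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one
    return [sorted(indices[k::n_folds]) for k in range(n_folds)]


def format_folds(folds):
    """One line per hold-out set, indices separated by whitespace."""
    return '\n'.join(' '.join(str(i) for i in fold) for fold in folds)


folds = random_folds(25, n_folds=10, seed=0)
print(format_folds(folds))
```

The printed text can be saved to a file and supplied through the cross-validation_folds attribute of the [Input] section.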

[Parameters]

The Parameters section contains the parameters supplied to RLScore. The meaning of kernel and learner parameters differs for different learning and kernel modules.

regparam

Supply a float valued regularization parameter if you wish to train a learner with a pre-selected parameter value. This value is used if no model selection module is defined. It must be positive. The default value is 1.

reggrid

Regularization parameter grid searched during model selection. The value of the attribute is given as lower_upper, where lower and upper must be integers, with upper > lower. The grid becomes 2**lower ... 2**upper, that is, all integer powers of 2 between 2**lower and 2**upper are tested as values of the regularization parameter and the one with the best performance is selected. The default grid is -5_5. Alternatively, all the parameter values in the grid can be given directly, e.g. '0.001, 0.1, 1, 10, 50'.
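
The two accepted forms of the reggrid value can be expanded as in the sketch below. This is an illustration rather than the RLScore code; the function name is hypothetical, and the sketch assumes that both end points 2**lower and 2**upper belong to the grid.

```python
def parse_reggrid(value):
    """Turn a reggrid attribute value into a list of regularization parameters."""
    if '_' in value:
        lower, upper = (int(v) for v in value.split('_'))
        if upper <= lower:
            raise ValueError('upper must be greater than lower')
        # Assumption: both end points are included in the grid
        return [2.0 ** k for k in range(lower, upper + 1)]
    # Otherwise the values are listed explicitly, separated by commas
    return [float(v) for v in value.split(',')]


print(parse_reggrid('-5_5'))
print(parse_reggrid('0.001, 0.1, 1, 10, 50'))
```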

bias

Float valued bias term that corresponds to a new constant-valued feature added to each data point. Allows learning models of the type f(x)+b, where a constant value (learned from data) is added to each prediction. The value must be non-negative; the default value is 0. Can be useful for RLS learners when using the linear kernel and low-dimensional data.

number_of_clusters

Parameter supplied to the MMC learner. Its value is an integer specifying the desired number of clusters.

subsetsize

Parameter supplied to the GreedyRLS learner. Its value is an integer defining the number of selected features.

gamma

Float valued positive kernel parameter for the Gaussian or the polynomial kernel. For the Gaussian kernel k(x,z) = e^(-gamma*<x-z,x-z>), for polynomial kernel k(x,z) = (gamma * <x,z> + coef0)^degree. (default = 1.).

coef0

Float valued kernel parameter for the polynomial kernel. k(x,z) = (gamma * <x,z> + coef0)^degree. (default = 0)

degree

Integer valued positive kernel parameter for the polynomial kernel. k(x,z) = (gamma * <x,z> + coef0)^degree. (default = 2)

[Input]

The attributes in this section are names of RLScore variables used inside the RLScore software. The values of the attributes are filenames from which data is loaded to the variables. For example, the feature representations of the training data are loaded into a variable of name train_features. Some of the loaded [Modules] require certain variables to be loaded. The loaded variables also have an effect on what rls_core does.

All variables have their corresponding default file formats. Detailed descriptions of the variables and their default file formats are given in RLScore variables.

[Output]

Analogously to the [Input] section, the attributes in this section are names of variables used inside RLScore. The values of the attributes are names of files into which the contents of the variable are written to. The files are written in the default format of the variable in question.

RLScore variables

RLScore variables are used to refer to the different types of data inside the RLScore software. The contents of the variables can be loaded from a file via the [Input] section or they are generated by the software itself. For example, if the contents of the model and prediction_features variables are provided, the software uses the model to perform predictions for the data points represented by the prediction_features variable and the predictions are put into the variable predicted_labels. The contents of predicted_labels can then be saved into a file or used for performance evaluation if the contents of the test_labels variable are also provided.

train_features

Variable containing features for training data. The default file format is the one described in Featurefile.

train_labels

Variable containing labels for training data. Necessary when training supervised learners. The default file format is the one described in Labelfile.

train_qids

Qids for the training data. The default file format is the one described in Qid file.

basis_vectors

Use reduced set approximation to speed up training and prediction. Restricts the learned hypothesis to be represented only by the training data points whose indices are in the basis vector file. The default file format is the one described in Basis vectors.

cross-validation_folds

Variable containing indices of holdout data points, one row per hold-out set. This can be used to define folds for cross-validation or, more generally, hold-out sets for repeated hold-out. The default file format is the one described in Fold file.

model

This variable contains a model learned from data. It will be generated if the user provides a learner attribute and training data for the learner. The model can be saved into a file via Python's pickle protocol. A previously learned model can be loaded from a file in order to perform predictions for unseen data.

prediction_features

Features for data one wishes to make predictions for. Prediction will be performed if a model is loaded from a file or if a predictor has been trained. The default file format is the one described in Featurefile.

test_labels

Correct labels for test data, supply these if you want to measure performance on test data. The default file format is the one described in Labelfile.

predicted_labels

Predicted labels for test data. These are generated if a model is used to perform predictions. These are also needed if one wants to measure performance on test data. The default file format is the one described in Labelfile.

prediction_qids

Qids for test data, supply these if you want to evaluate performance on test data as an average over queries. The default file format is the one described in Qid file.

selected_features

The indices of the features selected by the GreedyRLS learner (see Feature selection with greedy RLS).

GreedyRLS_LOO_performances

The list containing the LOO performances made by GreedyRLS during the greedy forward selection process (see Feature selection with greedy RLS).

validation_features

Variable containing features for validation data. Necessary when using ValidationSetSelection for choosing the regularization parameter. The default file format is the one described in Featurefile.

validation_labels

Variable containing labels for validation data. Necessary when using ValidationSetSelection for choosing the regularization parameter. The default file format is the one described in Labelfile.

validation_qids

Qids for the validation data. Necessary when using ValidationSetSelection for choosing the regularization parameter, for query structured data with LabelRankRLS, or CGRankRLS (see the learner attribute).

File formats

The following types of files can be supplied as input for rls_core

Featurefile - the file containing attribute:value pairs for the training data.

Labelfile - the file containing the values of the correct labels for training data.

Fold file - Indices of holdout data, can be used to define folds for cross-validation.

Basis vectors - Indices of the training data points used as basis vectors, for the reduced set approximation. Normally, all the training data are basis vectors.

Qid file - File contains a query id for each data point. This can be used in query structured ranking tasks to define which documents are related to the same query (information retrieval tasks), to define which parses correspond to the same sentence (parse ranking), etc.

The convention used when indexing features or data points is to start the indexing from zero. Thus if there are m distinct features/data points, the possible indices are from the range [0 ... m-1].

Below we give detailed descriptions of the file formats.

Featurefile

In all tasks, the data points are provided in the input file one per line using a sparse representation. Technically, the format of a line can be expressed as follows:

<line> .=. <index>:<value> <index>:<value> ... <index>:<value> # <comment>
<index> .=. <integer>
<value> .=. <float>
<comment> .=. <string>

The features are provided in tokens consisting of a feature index, a colon, and a real number indicating the value of the feature. The feature representation is sparse so that only the features whose values differ from 0 are present in the line. Further, the feature indices have to be given from the smallest to the largest starting from zero. For example, the line:

0:0.43 3:0.12 9284:0.2

specifies a data point that has non-zero values for features number 0, 3 and 9284, and value 0 for all the other possible features. If a data point has no non-zero valued features, use 0:0 to differentiate it from an empty line.
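
A line in this format can be read with a few lines of Python. The sketch below is only an illustration and is independent of the reader module shipped with RLScore; the function name and the choice of a dictionary as the sparse representation are assumptions of this example.

```python
def parse_feature_line(line):
    """Parse '<index>:<value> ...' into a dict {feature index: value}."""
    # Drop the optional trailing comment
    tokens = line.split('#', 1)[0].split()
    features = {}
    previous = -1
    for token in tokens:
        index_str, value_str = token.split(':')
        index, value = int(index_str), float(value_str)
        # Indices must be given from the smallest to the largest, starting from zero
        if index <= previous:
            raise ValueError('feature indices must be increasing: %r' % line)
        features[index] = value
        previous = index
    return features


point = parse_feature_line('0:0.43 3:0.12 9284:0.2 # an example data point')
print(point)
```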

Labelfile

Labels are the correct output values associated with some set of data points. These are required in training supervised learners and in performance estimation, but naturally not when making predictions for new examples. The labels are provided in the label file so that each line corresponds to one training data point, the data being in the same order as in the feature file. The label file has the following dense matrix format:

<line> .=. <value> <value> ... <value> # <comment>
<value> .=. <float>
<comment> .=. <string>

Note that there may be several labels per line but each line must have the same number of labels. Having multiple labels is useful for multi-class and multi-label classification tasks or in general if there are many learning tasks to be solved simultaneously. For classification 1 is used to represent the positive class and -1 the negative. For regression and ranking any real values can be used.

Examples:

Lines:

1
-1
1

could represent two positive data points (lines 1 and 3) and one negative data point in a binary classification task.

Line:

1 -1 -1 -1 1

could represent the labels for a data point in a multi-label classification task where a data point may belong to several different classes simultaneously. In this case the data point would belong to classes 1 and 5.

Lines:

1 -1 -1
-1 -1 1
-1 -1 1
-1 1 -1

could represent the labels for four data points in a multi-class classification task with three possible classes. In this setting each label corresponds to one class, and each data point has value 1 for the class it belongs to, and -1 for the other classes.

Lines:

1.123
3.433
0.0023

could represent real valued outputs for a simple regression task, where each data point is associated with one value, which we want to learn to predict.
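
The dense label format can be read as a list of rows, one per data point. The following sketch is an illustration that does not use the RLScore reader module; the function name and the skipping of blank lines are assumptions. It also recovers the class of each data point from the multi-class example above, using zero-based column indices.

```python
def parse_label_file(text):
    """Parse a dense label file into a list of rows of floats."""
    rows = []
    for line in text.splitlines():
        values = line.split('#', 1)[0].split()
        if not values:
            continue  # skip blank lines (an assumption of this sketch)
        rows.append([float(v) for v in values])
    if rows and any(len(row) != len(rows[0]) for row in rows):
        raise ValueError('each line must have the same number of labels')
    return rows


multiclass = parse_label_file("1 -1 -1\n-1 -1 1\n-1 -1 1\n-1 1 -1\n")
# Zero-based column index of the class marked with +1 on each line
classes = [row.index(1.0) for row in multiclass]
print(classes)
```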

Fold file

The cross-validation folds file format is the following. For each separate hold-out set, there is a line in the file consisting of a list of indices of the training inputs that belong to the hold-out set. Technically, the format of a line can be expressed as follows:

<line> .=. <index> ... <index> # <comment>
<index> .=. <integer>
<comment> .=. <string>

The indices are separated by whitespace. An index cannot appear more than once on a single line. However, a single training input can belong to several hold-out sets simultaneously, and hence an index can appear on multiple lines. The indexing of the training inputs starts from zero.
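
The sketch below shows how the lines of a fold file translate into training and hold-out index sets for repeated hold-out. It illustrates the format only and is not taken from RLScore; the function name and the range check against the number of training inputs are assumptions.

```python
def holdout_splits(fold_text, n_train):
    """Yield (training indices, hold-out indices) for each line of a fold file."""
    for line in fold_text.splitlines():
        tokens = line.split('#', 1)[0].split()
        if not tokens:
            continue
        holdout = [int(t) for t in tokens]
        if len(set(holdout)) != len(holdout):
            raise ValueError('an index appears twice on one line: %r' % line)
        if min(holdout) < 0 or max(holdout) >= n_train:
            raise ValueError('index out of range: %r' % line)
        held = set(holdout)
        # Everything that is not held out is used for training in this round
        training = [i for i in range(n_train) if i not in held]
        yield training, holdout


# Index 1 belongs to two hold-out sets, which the format allows
splits = list(holdout_splits("0 1\n2 3\n1 4\n", n_train=5))
for training, holdout in splits:
    print(training, holdout)
```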

Basis vectors

The basis vectors file contains a single line, where the indices of the basis vectors are contained, separated by whitespace. The format can be expressed as follows:

<line> .=. <index> ... <index>
<index> .=. <integer>

For example:

0 23 25 44

would mean that the data points number 0, 23, 25 and 44 are used as basis vectors. An index cannot appear more than once in this file. The indexing of the training inputs starts from zero.

Qid file

When performing ranking, the qid value is used to restrict the pairwise preference relations. By default, the preference relation covers all pairs of data points. Qids can be used to restrict which pairs are included in the relation. A pair of data points is included in the preference relation only if the value of qid is the same for both of them.

Each line in the query id file contains the id of the query the data point belongs to. The format can be expressed as follows:

<line>.=. <qid>
<qid>.=. <integer>

For example:

1
1
1
2
2

would mean that the first three data points belong to query number 1, and the last two to query number 2. In this case pairwise preferences would be observed between the first and second, first and third, second and third, and fourth and fifth data points. However, preferences between other pairs would not be considered, as they have different qids. The qids mainly have an effect on the pairwise performance measures, such as disagreement error or squared magnitude preserving ranking error. However, they may also have an effect on the other performance measures due to the averaging over the queries. For example, if squared error is used together with the qids provided in the above example file, the average squared error is first calculated for each query and the overall error is the average taken over the queries. Therefore, each of the first three data points has a lesser weight than each of the last two data points. This is in contrast to the case without qids, where the overall error is the average error taken over all data.
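
Both effects of the qids can be reproduced on the example file above. The sketch below is an illustration, not the RLScore implementation; the function names and the toy label values are made up for this example. It lists the pairs that share a qid and compares the query-averaged squared error with the plain average.

```python
def qid_pairs(qids):
    """All index pairs (i, j), i < j, whose data points share a qid."""
    n = len(qids)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if qids[i] == qids[j]]


def query_averaged_sqerror(correct, predicted, qids):
    """Mean squared error per query, then averaged over the queries."""
    errors = {}
    for y, p, q in zip(correct, predicted, qids):
        errors.setdefault(q, []).append((y - p) ** 2)
    per_query = [sum(e) / float(len(e)) for e in errors.values()]
    return sum(per_query) / len(per_query)


qids = [1, 1, 1, 2, 2]
print(qid_pairs(qids))

# Toy data: only the third data point is predicted incorrectly
correct = [1.0, 2.0, 3.0, 1.0, 2.0]
predicted = [1.0, 2.0, 0.0, 1.0, 2.0]
averaged = query_averaged_sqerror(correct, predicted, qids)
plain = sum((y - p) ** 2 for y, p in zip(correct, predicted)) / len(correct)
print(averaged)
print(plain)
```

With zero-based indices, the pairs correspond to the first and second, first and third, second and third, and fourth and fifth data points. The single error falls in the larger query, so its contribution is divided by three before the queries are averaged.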

Currently, using qid file and a fold file together is not supported.

Examples

RLScore is designed to be used by supplying a configuration file defining the learning task to the rls_core program.

The easiest way to use RLScore is by modifying one of the example configuration files presented next, to match your task. The software supports a wide variety of different learning tasks, ranging from supervised learning to clustering and feature selection. Examples of typical use-cases for each type of task are provided below.

The configuration files and the example data sets used by them can be found in the 'examples' folder of the RLScore distribution. For example, to run the configuration 'reg_train.cfg' included in examples/cfgs from the command line, go to the folder containing the RLScore distribution and execute the command 'python rls_core.py examples/cfgs/reg_train.cfg'.

While the examples use Unix-style paths with the '/' separator, they also work on Windows without modification.

Binary classification, maximize accuracy

In binary classification, the data is separated into two classes. Classification accuracy measures the fraction of correct classifications made by the learned classifier. This is perhaps the most widely used performance measure for binary classification. However, for very unbalanced data-sets it may be preferable to optimize the area under ROC curve (AUC) measure, considered in a later example, instead.

When training a classifier according to the accuracy criterion, we recommend the RLS module, which minimizes a least-squares loss on the training set class labels. The approach is equivalent to the so-called least-squares support vector machine.

Requirements:

  • class labels should be either 1 (positive) or -1 (negative)

Config file (classacc_all)

#Binary classification (accuracy maximizing)
#
#Accuracy maximizing binary classification is done by training a RLS regressor
#with labels +1 for positive and -1 for negative class. The algorithm
#is equivalent to the so-called 'Least-squares support vector machine',
#and is known to provide similar performance as SVMs
#
#For Area under the ROC curve (AUC) maximizing binary classification, check
#the corresponding example.
#
#this example chooses the regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once

[Modules]

#RLS regressor is used as the training algorithm
learner=RLS

#Leave-one-out cross-validation can be used for parameter selection
mselection=LOOSelection

#Alternatively, 10-fold cross-validation with randomized fold partition
#could be used. It is also possible to supply your own folds.
#mselection=NfoldSelection

#Accuracy measures the fraction of correct classifications made
measure=accuracy

#Linear kernel is always a reasonable first choice, advanced examples
#show how to learn non-linear models with other kernels.
kernel=LinearKernel

[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10

#bias is mostly useful for linear models with low-dimensional data
bias=1

[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels

[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions

Python code (classacc_all)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy

#load the training and test data
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
#same settings as in the [Parameters] and [Modules] sections of the config file
kwargs['reggrid'] = '-10_10'
kwargs['bias'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'RLS'
kwargs['measure'] = accuracy
kwargs['mselection'] = 'LOOSelection'
#choose the regularization parameter and train the model
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
#predict and measure performance on the test data
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
#write the model and the predictions to the [Output] files
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)

Binary classification with RankRLS, maximize area under ROC curve (AUC)

In binary classification, the data is separated into two classes, which are often referred to as the positive and the negative class. AUC measures the probability that a randomly drawn positive data point receives a higher predicted value than a randomly drawn negative one. The measure is especially suitable for unbalanced data.

When training a classifier according to the AUC criterion, we recommend the RankRLS learner, which minimizes a pairwise least-squares loss on the training set class labels. Leave-pair-out cross-validation is recommended for model selection, unless the data set is very large.

Config file (classAUC_all)

#Binary classification (AUC maximizing)
#
#AUC maximizing binary classification is done by training the RankRLS ranker
#with labels +1 for positive and -1 for negative class. The ranker aims
#to solve the bipartite ranking task of ranking positive examples higher
#than negative ones, which corresponds to AUC-maximization.
#
#(see "An efficient algorithm for learning to rank from preference
#graphs", Machine Learning, 2009 for further details on the training algorithm)
#
#Leave-pair-out cross-validation is the recommended strategy for model
#selection, as leave-one-out estimation of AUC is known to have serious
#negative bias in some cases
#
#(see "A Comparison Of AUC-Estimators In Small-Sample Studies", JMLR proceedings
#of MLSB'09. 2010)
#
#this example chooses the regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once

[Modules]

#RankRLS is used as the training algorithm
learner=AllPairsRankRLS

#Leave-pair-out cross-validation can be used for parameter selection
mselection=LPOSelection

#Alternatively, 10-fold cross-validation with randomized fold partition
#can be used for large data sets. It is also possible to supply your own
#folds.
#mselection=NfoldSelection

#AUC measures the probability, that a randomly chosen positive example
#receives a higher score than a randomly chosen negative (which corresponds
#to the area under the ROC curve). 
measure=auc

#Linear kernel is always a reasonable first choice, advanced examples
#show how to learn non-linear models with other kernels.
kernel=LinearKernel

[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10

#bias is mostly useful for linear models with low-dimensional data
bias=1

[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels

[Output]
#the learned model is written here
model=./examples/models/classAUC.model
predicted_labels=./examples/predictions/classAUC.predictions

Python code (classAUC_all)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import auc

#load the training and test data
kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
#same settings as in the [Parameters] and [Modules] sections of the config file
kwargs['reggrid'] = '-10_10'
kwargs['bias'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'AllPairsRankRLS'
kwargs['measure'] = auc
kwargs['mselection'] = 'LPOSelection'
#choose the regularization parameter and train the model
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
#predict and measure performance on the test data
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = auc(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, auc.__name__)
#write the model and the predictions to the [Output] files
writer.write_pickle('./examples/models/classAUC.model', model)
writer.write_dense('./examples/predictions/classAUC.predictions', predicted_labels)

Ranking with RankRLS, minimize pairwise mis-orderings

In ranking the aim is to learn a function, whose predictions result in an accurate ranking when ordering new examples according to the predicted values. That is, more relevant examples should receive higher predicted scores than less relevant.

Using qids means that instead of a total order over all examples, each query has its own ordering, and examples from different queries should not be compared. For example, in information retrieval each query might consist of the ordering of a set of documents according to a query posed by a user.

When training a ranker, we recommend the RankRLS learner, which minimizes a pairwise least-squares loss on the training set labels. Leave-query-out cross-validation is recommended for parameter selection.
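
To make the notion of pairwise mis-orderings per query concrete, here is a small sketch in plain Python. It is not RLScore's disagreement implementation: the function names are hypothetical, qids are assumed to be a list of index lists (one list per query), and ties in the predictions are counted as half an error.

```python
def pairwise_disagreement(true_labels, predictions):
    """Fraction of mis-ordered pairs among pairs with different true labels.

    Assumption of this sketch: a tie in the predictions counts as half an
    error. RLScore's own measure may treat ties differently.
    """
    errors = 0.0
    pairs = 0
    n = len(true_labels)
    for i in range(n):
        for j in range(i + 1, n):
            if true_labels[i] == true_labels[j]:
                continue  # no preference between equally relevant examples
            pairs += 1
            true_diff = true_labels[i] - true_labels[j]
            pred_diff = predictions[i] - predictions[j]
            if pred_diff == 0:
                errors += 0.5
            elif (true_diff > 0) != (pred_diff > 0):
                errors += 1.0
    if pairs == 0:
        raise ValueError('no pairs with different labels')
    return errors / pairs


def mean_query_disagreement(true_labels, predictions, qids):
    """Average the disagreement over queries; qids is a list of index lists."""
    scores = [
        pairwise_disagreement([true_labels[i] for i in query],
                              [predictions[i] for i in query])
        for query in qids
    ]
    return sum(scores) / len(scores)


labels = [3, 2, 1, 2, 1]
preds = [0.9, 0.5, 0.7, 0.2, 0.4]
queries = [[0, 1, 2], [3, 4]]
result = mean_query_disagreement(labels, preds, queries)
print(result)
```

Pairs are only formed inside a query, so examples from different queries are never compared.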

In case you have a total order over all examples instead of a query structure, proceed as follows:

  • do not supply qid files
  • replace LabelRankRLS with AllPairsRankRLS in the Modules section

If the data is both high dimensional and sparse, one should use the module CGRankRLS, which is optimized for such data (see Learning linear models from large sparse data sets).

In addition to learning from utility scores of data points, CGRankRLS also supports learning from pairwise preferences; see Config file (cgrank_test_with_preferences) and Python code (cgrank_test_with_preferences).

Config file (rankqids_all)

#Ranking with query ids.
#
#In ranking the aim is to learn a function, whose predictions result in an
#accurate ranking when ordering new examples according to the predicted
#values. That is, more relevant examples should receive higher predicted
#scores than less relevant.
#
#Using qids means that instead of a total order over all examples, each
#query has its own order, and examples from different queries should
#not be compared. For example in information retrieval, each query
#might consist of the ordering of a set of documents according to
#a query posed by a user.
#
#This example combines training, prediction and performance evaluation
#together


[Modules]

#LabelRankRLS is meant for ranking problems with qids
learner=LabelRankRLS

#For LabelRankRLS NfoldSelection performs leave-query-out cross-validation
#when choosing regularization parameter value
mselection=NfoldSelection

#Disagreement error measures the average number of pairwise mis-orderings per
#query
measure=disagreement

#Alternative: squared magnitude preserving ranking error
#measure=SqMPRankMeasure

#Linear kernel is always a reasonable first choice, advanced examples
#show how to learn non-linear models with other kernels.
kernel=LinearKernel

[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10

#bias is mostly useful for linear models with low-dimensional data
bias=1

[Input]
#features of the training examples
train_features=./examples/data/rank_train.features
#labels of the training examples
train_labels=./examples/data/rank_train.labels
#qids of the training examples
train_qids=./examples/data/rank_train.qids

#features of the test examples
prediction_features=./examples/data/rank_test.features
#true labels of the test examples
test_labels=./examples/data/rank_test.labels
#qids for the test examples
test_qids=./examples/data/rank_test.qids


[Output]
#the learned model is written here
model=./examples/models/rankqids.model
#the predicted labels are written here
predicted_labels=./examples/predictions/rankqids.predictions

Python code (rankqids_all)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import disagreement
from numpy import mean

kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/rank_train.labels')
test_labels = reader.read_dense('./examples/data/rank_test.labels')
kwargs['train_qids'] = reader.read_qids('./examples/data/rank_train.qids')
prediction_features = reader.read_sparse('./examples/data/rank_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/rank_train.features')
test_qids = reader.read_qids('./examples/data/rank_test.qids')
kwargs['reggrid'] = '-10_10'
kwargs['bias'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'LabelRankRLS'
kwargs['measure'] = disagreement
kwargs['mselection'] = 'NfoldSelection'
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
print 'calculating performance as averages over queries'
performances = []
for query in test_qids:
    performances.append(disagreement(test_labels[query], predicted_labels[query]))
performance = mean(performances)
print 'Performance: %f %s' % (performance, disagreement.__name__)
writer.write_pickle('./examples/models/rankqids.model', model)
writer.write_dense('./examples/predictions/rankqids.predictions', predicted_labels)

Regression

In regression, the task is to predict real-valued labels. The regularized least-squares (RLS) module is suitable for solving this task.
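
As an illustration of what training an RLS regressor involves, the sketch below solves the standard regularized least-squares problem for a linear model in plain Python. It is a toy version under stated assumptions: no bias term, dense lists instead of RLScore's data structures, and hypothetical function names. RLScore's own implementation works differently and is far more efficient.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - tail) / M[row][row]
    return x


def train_rls(X, y, regparam):
    """Return w minimizing sum_i (x_i . w - y_i)^2 + regparam * ||w||^2."""
    d = len(X[0])
    # Normal equations: (X^T X + regparam * I) w = X^T y
    A = [[sum(row[j] * row[k] for row in X) + (regparam if j == k else 0.0)
          for k in range(d)] for j in range(d)]
    rhs = [sum(row[j] * target for row, target in zip(X, y)) for j in range(d)]
    return solve_linear_system(A, rhs)


def predict(X, w):
    """Linear predictions x . w for every row of X."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


X_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_train = [1.0, 2.0, 3.0]
w = train_rls(X_train, y_train, regparam=1.0)
print(w)
print(predict([[2.0, 1.0]], w))
```

Larger values of regparam shrink the weights towards zero, which is why the examples search over a grid of regularization values.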

Config file (reg_all)

#Regression
#
#Regressor is trained by optimizing regularized least-squares loss
#
#this example chooses the regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once

[Modules]

#RLS regressor is used as the training algorithm
learner=RLS

#Leave-one-out cross-validation can be used for parameter selection
mselection=LOOSelection

#Alternatively, 10-fold cross-validation with randomized fold partition
#could be used. It is also possible to supply your own folds.
#mselection=NfoldSelection

#Mean squared error
measure=sqerror

#Linear kernel is always a reasonable first choice, advanced examples
#show how to learn non-linear models with other kernels.
kernel=LinearKernel

[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10

#bias is mostly useful for linear models with low-dimensional data
bias=2

[Input]
#features of the training examples
train_features=./examples/data/reg_train.features
#labels of the training examples
train_labels=./examples/data/reg_train.labels
#features of the test examples
prediction_features=./examples/data/reg_test.features
#true labels of the test examples
test_labels=./examples/data/reg_test.labels

[Output]
#the learned model is written here
model=./examples/models/reg.model
#the predicted labels are written here
predicted_labels=./examples/predictions/reg.predictions

Python code (reg_all)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import sqerror

kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/reg_train.labels')
prediction_features = reader.read_sparse('./examples/data/reg_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/reg_train.features')
test_labels = reader.read_dense('./examples/data/reg_test.labels')
kwargs['reggrid'] = '-10_10'
kwargs['bias'] = '2'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'RLS'
kwargs['measure'] = sqerror
kwargs['mselection'] = 'LOOSelection'
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = sqerror(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, sqerror.__name__)
writer.write_pickle('./examples/models/reg.model', model)
writer.write_dense('./examples/predictions/reg.predictions', predicted_labels)

Clustering with evolutionary maximum margin clustering

In clustering, the task is to divide unlabeled data into several clusters. The aim is to find a cluster structure such that the data points within a cluster are similar to each other, but dissimilar to the examples in the other clusters.

The clustering algorithm implemented in RLScore aims to divide the data so that the resulting division yields minimal regularized least-squares error. The approach is analogous to the maximum margin clustering approach. The resulting combinatorial optimization problem is NP-hard, so stochastic hill-climbing together with computational shortcuts is used to search for a locally optimal solution. Restarts may be necessary for discovering a good clustering.
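
The idea of searching over cluster assignments by stochastic hill-climbing can be sketched on a toy problem. The code below is a strong simplification and not the MMC learner: it handles two clusters only, one-dimensional data, no bias term, and recomputes the regularized least-squares error from scratch instead of using computational shortcuts. All function names are hypothetical.

```python
import random


def rls_error_1d(xs, labels, regparam):
    """Regularized least-squares error of the best 1-D fit w * x to the labels."""
    sxy = sum(x * y for x, y in zip(xs, labels))
    sxx = sum(x * x for x in xs)
    w = sxy / (sxx + regparam)
    return sum((y - w * x) ** 2 for x, y in zip(xs, labels)) + regparam * w * w


def hill_climb_clustering(xs, regparam=1.0, steps=2000, seed=0):
    """Flip one randomly chosen cluster label at a time, keeping improvements."""
    rng = random.Random(seed)
    labels = [rng.choice([-1, 1]) for _ in xs]
    best = rls_error_1d(xs, labels, regparam)
    for _ in range(steps):
        i = rng.randrange(len(xs))
        labels[i] = -labels[i]        # propose moving point i to the other cluster
        error = rls_error_1d(xs, labels, regparam)
        if error < best:
            best = error              # keep the improving move
        else:
            labels[i] = -labels[i]    # undo the move
    return labels, best


points = [-3.0, -2.5, -2.0, 2.0, 2.5, 3.0]
clusters, final_error = hill_climb_clustering(points)
print(clusters)
print(final_error)
```

On this well-separated toy data the search separates the negative points from the positive ones; on real data the result can depend on the random start, which is why restarts can help.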

Config file (clustering)

#Performs maximum-margin clustering on the data set
#
#Details of the method can be found in
#'Fast Evolutionary Maximum Margin Clustering'
#

[Modules]
#The only clustering method currently supported
learner=MMC

[Parameters]

#number_of_clusters controls the number of clusters
number_of_clusters=2

#Currently model selection is not supported, so we fix this
#without search.
regparam=1

#bias is mostly useful for linear models with low-dimensional data
bias=1

[Input]
#features of the training examples
train_features=./examples/data/class_train.features


[Output]
#the predicted cluster memberships are written here
predicted_clusters_for_training_data=./examples/predictions/clusters.txt

Python code (clustering)

from rlscore import core
from rlscore import reader
from rlscore import writer

kwargs = {}
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
kwargs['regparam'] = '1'
kwargs['bias'] = '1'
kwargs['number_of_clusters'] = '2'
kwargs['learner'] = 'MMC'
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
writer.write_ints('./examples/predictions/clusters.txt', trainresults['predicted_clusters_for_training_data'])

Feature selection with greedy RLS

GreedyRLS, the feature selection module of RLScore, allows selecting a fixed-size subset of features. The selection criterion is the performance of an RLS learner when trained on the selected features, which is measured using leave-one-out cross-validation. Both regression and classification tasks are supported.

In addition to feature selection, the module can be used to train sparse RLS predictors that use only a specified number of features for making predictions. Only linear learning is supported. The method scales linearly with respect to the number of examples, features and selected features.

The indices of the selected features are written to the file provided as the 'selected_features' parameter. The LOO performances made by GreedyRLS in each step of the greedy forward selection process are written to the file provided as the 'GreedyRLS_LOO_performances' parameter.
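
The selection procedure can be illustrated with a naive sketch of greedy forward selection driven by a leave-one-out criterion. Unlike GreedyRLS, this version retrains from scratch for every candidate feature and every held-out example, so it does not scale linearly. It also assumes squared error as the criterion and no bias term, and all function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(M[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (M[row][n] - tail) / M[row][row]
    return x


def train_rls(X, y, regparam):
    """Solve the normal equations (X^T X + regparam * I) w = X^T y."""
    d = len(X[0])
    A = [[sum(r[j] * r[k] for r in X) + (regparam if j == k else 0.0)
          for k in range(d)] for j in range(d)]
    rhs = [sum(r[j] * t for r, t in zip(X, y)) for j in range(d)]
    return solve(A, rhs)


def loo_squared_error(X, y, features, regparam):
    """Leave-one-out mean squared error of RLS restricted to the given features."""
    total = 0.0
    for held_out in range(len(X)):
        train_X = [[row[f] for f in features]
                   for i, row in enumerate(X) if i != held_out]
        train_y = [t for i, t in enumerate(y) if i != held_out]
        w = train_rls(train_X, train_y, regparam)
        prediction = sum(X[held_out][f] * wf for f, wf in zip(features, w))
        total += (prediction - y[held_out]) ** 2
    return total / len(X)


def greedy_forward_selection(X, y, subsetsize, regparam=1.0):
    """Add, one at a time, the feature that gives the lowest LOO error."""
    selected, performances = [], []
    candidates = list(range(len(X[0])))
    while len(selected) < subsetsize:
        scored = [(loo_squared_error(X, y, selected + [f], regparam), f)
                  for f in candidates]
        best_error, best_feature = min(scored)
        selected.append(best_feature)
        candidates.remove(best_feature)
        performances.append(best_error)
    return selected, performances


X = [[0.1, 1.0, 1.0],
     [-0.2, -1.0, 2.0],
     [0.3, 1.0, -1.0],
     [-0.1, -1.0, -2.0]]
y = [1.0, 2.0, -1.0, -2.0]
selected, performances = greedy_forward_selection(X, y, subsetsize=2)
print(selected)
print(performances)
```

The two returned lists play the same role as the 'selected_features' and 'GreedyRLS_LOO_performances' outputs: the chosen indices in selection order and the LOO performance after each step.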

Config file (fselection)

#Performs incremental forward selection, where k features
#which lead to good leave-one-out performance are chosen.
#
#Further, the method trains a sparse linear prediction model on the chosen
#features.
#
#This example is about binary classification, but by changing
#the performance measure the method is also suitable for
#regression or multiclass problems.
#
#Details of the method can be found in the forthcoming article
#'Linear Time Feature Selection for regularized least-squares'
#
#The quality of the learned model is tested on independent test data
#

[Modules]
#The only feature selecting learner currently supported
learner=GreedyRLS
#Since we are doing feature selection for a classification task,
#classification accuracy is a reasonable performance measure
measure=accuracy

[Parameters]

#subsetsize controls the number of selected features
subsetsize=3

#Currently cross-validated search for regularization parameter
#choosing is not supported for feature selection, so we fix this
#without search.
regparam=1

#bias is mostly useful for linear models with low-dimensional data
bias=1



[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels

#Calculating test performance is of course not necessary,
#but it gives some idea about the quality of selected feature
#set

#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels


[Output]
#the learned model that has non-zero coefficients only for
#the selected features is written here
model=./examples/models/sparse.model

#the indices of selected features are written here
selected_features=./examples/predictions/selected.findices

#The LOO performances made by GreedyRLS in each step of
#the greedy forward selection process are written here
GreedyRLS_LOO_performances=./examples/predictions/GreedyRLS_LOO.performance

Python code (fselection)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy

kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['regparam'] = '1'
kwargs['subsetsize'] = '3'
kwargs['bias'] = '1'
kwargs['learner'] = 'GreedyRLS'
kwargs['measure'] = accuracy
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
writer.write_pickle('./examples/models/sparse.model', model)
writer.write_dense('./examples/predictions/GreedyRLS_LOO.performance', trainresults['GreedyRLS_LOO_performances'])
writer.write_ints('./examples/predictions/selected.findices', trainresults['selected_features'])

Using kernels

Most of the learning algorithms included in the RLScore package also support kernels other than the linear one. Efficient implementations for calculating the Gaussian and the polynomial kernel are included.
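
For orientation, the kernels can be written out as plain Python functions. The polynomial form follows the definition given in the polynomial_kernel config file; the Gaussian form exp(-gamma * ||x - z||^2) is a common parameterization and is an assumption here, since this document only calls gamma the width parameter. The function names are hypothetical and unrelated to RLScore's optimized kernel classes.

```python
import math


def linear_kernel(x, z):
    """Inner product <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, gamma):
    """exp(-gamma * ||x - z||^2); this parameterization is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, z, gamma=1.0, coef0=0.0, degree=3):
    """(gamma * <x, z> + coef0) ** degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def kernel_matrix(X, kernel, **params):
    """Full kernel matrix between all pairs of rows of X."""
    return [[kernel(x, z, **params) for z in X] for x in X]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K_gauss = kernel_matrix(X, gaussian_kernel, gamma=0.01)
K_poly = kernel_matrix(X, polynomial_kernel, gamma=1.0, coef0=0.0, degree=3)
print(K_gauss[0][1])
print(K_poly[2][2])
```

The kernel matrix has one row and one column per training example, so its size grows quadratically with the training set.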

The training algorithms explicitly construct and decompose the full kernel matrix, resulting in quadratic memory and cubic training complexity. Performing cross-validation or multiple output learning does not increase this complexity due to computational shortcuts. In practice kernels can be used with several thousands of training data points. For large scale learning with kernels, see Reduced set approximation.

Currently grid searching for kernel parameters is not supported. The way to accomplish this is to write a wrapper script around rls_core.
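
One possible shape for such a wrapper is sketched below: it writes one config file per candidate gamma value, based on the gaussian_kernel example. The helper name, the config file names and the per-gamma output file names are hypothetical. How each file is then passed to rls_core depends on the installation, so that step is left as a comment.

```python
import os
import tempfile

CONFIG_TEMPLATE = """[Modules]
learner=RLS
mselection=LOOSelection
measure=accuracy
kernel=GaussianKernel

[Parameters]
reggrid=-10_10
gamma=%(gamma)s

[Input]
train_features=./examples/data/class_train.features
train_labels=./examples/data/class_train.labels
prediction_features=./examples/data/class_test.features
test_labels=./examples/data/class_test.labels

[Output]
model=./examples/models/classacc_gamma_%(gamma)s.model
predicted_labels=./examples/predictions/classacc_gamma_%(gamma)s.predictions
"""


def write_gamma_configs(gammas, directory):
    """Write one config file per gamma value and return the file paths."""
    paths = []
    for gamma in gammas:
        path = os.path.join(directory, 'gaussian_gamma_%s.cfg' % gamma)
        with open(path, 'w') as handle:
            handle.write(CONFIG_TEMPLATE % {'gamma': gamma})
        paths.append(path)
    return paths


config_dir = tempfile.mkdtemp()
config_paths = write_gamma_configs([0.001, 0.01, 0.1], config_dir)
for path in config_paths:
    # Each file would then be supplied to rls_core; the exact command line
    # is installation specific and is not shown here.
    print(path)
```

The performance reported by each run can then be compared to pick a gamma value. To avoid an optimistic estimate, the comparison should be based on cross-validation or validation data rather than on the final test set.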

In the following example we train an RLS classifier using the Gaussian kernel; the other learners can be used with kernels in an analogous way. The only change needed to the earlier examples is to define 'kernel=GaussianKernel' and supply the kernel parameters under [Parameters].

Config file (gaussian_kernel)

#Binary classification (accuracy maximizing)
#
#In this example we utilize the gaussian kernel
#
#this example chooses regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once

[Modules]

#RLS regressor is used as the training algorithm
learner=RLS

#Leave-one-out cross-validation can be used for parameter selection
mselection=LOOSelection

#Alternatively, 10-fold cross-validation with randomized fold partition
#could be used. It is also possible to supply your own folds.
#mselection=NfoldSelection

#Accuracy measures the fraction of correct classifications made
measure=accuracy

kernel=GaussianKernel

[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10

#width parameter of the gaussian kernel
gamma=0.01

[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels

[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions

Python code (gaussian_kernel)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy

kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['reggrid'] = '-10_10'
kwargs['gamma'] = '0.01'
kwargs['kernel'] = 'GaussianKernel'
kwargs['learner'] = 'RLS'
kwargs['measure'] = accuracy
kwargs['mselection'] = 'LOOSelection'
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)

Config file (polynomial_kernel)

#Binary classification (accuracy maximizing)
#
#In this example we utilize the polynomial kernel
#
#this example chooses regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once

[Modules]

#RLS regressor is used as the training algorithm
learner=RLS

#Leave-one-out cross-validation can be used for parameter selection
mselection=LOOSelection

#Alternatively, 10-fold cross-validation with randomized fold partition
#could be used. It is also possible to supply your own folds.
#mselection=NfoldSelection

#Accuracy measures the fraction of correct classifications made
measure=accuracy

kernel=PolynomialKernel

[Parameters]
#The polynomial kernel is defined as
#k(xi,xj) = (gamma * <xi,xj> + coef0)**degree
#
#We use here a simple homogeneous polynomial kernel of
#degree 3
gamma=1
coef0=0
degree=3

#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10

[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels

[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions

Python code (polynomial_kernel)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy

kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['reggrid'] = '-10_10'
kwargs['coef0'] = '0'
kwargs['degree'] = '3'
kwargs['gamma'] = '1'
kwargs['kernel'] = 'PolynomialKernel'
kwargs['learner'] = 'RLS'
kwargs['measure'] = accuracy
kwargs['mselection'] = 'LOOSelection'
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)

Learning linear models from large sparse data sets

In settings where both the number of training examples and the number of features are large, but the data is sparse (most entries in the data matrix are zeros), regression, classification and ranking can be done much more efficiently using the conjugate gradient training algorithms. In this case kernels are not supported, only linear models. The methods allow substantial savings in memory usage and improved scaling, since they need only the non-zero entries in the data matrix for training, and avoid computing samples x samples or features x features sized matrices.
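
The following sketch shows one way conjugate gradient can be applied to the regularized least-squares normal equations while touching only the non-zero entries of the data. It is an illustration under stated assumptions, not the CGRLS implementation: rows are stored as dictionaries mapping column index to value, there is no bias term, and the function names are hypothetical.

```python
def sparse_matvec(rows, v):
    """X v, where each row is a dict {column index: non-zero value}."""
    return [sum(value * v[col] for col, value in row.items()) for row in rows]


def sparse_rmatvec(rows, u, n_features):
    """X^T u, touching only the non-zero entries."""
    result = [0.0] * n_features
    for row, ui in zip(rows, u):
        for col, value in row.items():
            result[col] += value * ui
    return result


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cg_rls(rows, y, n_features, regparam, iterations=100, tol=1e-12):
    """Solve (X^T X + regparam * I) w = X^T y by conjugate gradient."""
    def apply_system(v):
        # Two sparse products; X^T X is never formed explicitly.
        xtxv = sparse_rmatvec(rows, sparse_matvec(rows, v), n_features)
        return [a + regparam * b for a, b in zip(xtxv, v)]

    w = [0.0] * n_features
    residual = sparse_rmatvec(rows, y, n_features)  # right-hand side, since w = 0
    direction = list(residual)
    rr = dot(residual, residual)
    for _ in range(iterations):
        if rr < tol:
            break
        a_dir = apply_system(direction)
        step = rr / dot(direction, a_dir)
        w = [wi + step * di for wi, di in zip(w, direction)]
        residual = [ri - step * ai for ri, ai in zip(residual, a_dir)]
        rr_new = dot(residual, residual)
        direction = [ri + (rr_new / rr) * di
                     for ri, di in zip(residual, direction)]
        rr = rr_new
    return w


sparse_rows = [{0: 1.0}, {1: 1.0}, {0: 1.0, 1: 1.0}]
targets = [1.0, 2.0, 3.0]
weights = cg_rls(sparse_rows, targets, n_features=2, regparam=1.0)
print(weights)
```

Each iteration costs time proportional to the number of non-zero entries, and only vectors of length n_features and the sparse rows themselves are kept in memory.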

In this setting, the CGRLS module can be used analogously to the RLS module, and the CGRankRLS module can be used analogously to AllPairsRankRLS / LabelRankRLS. The CG implementations do not support cross-validation.

In addition to learning from utility scores of data points, CGRankRLS also supports learning from pairwise preferences.

Config file (cgrls_test)

#This config file runs Conjugate Gradient version of RLS
#The CGRLS is useful for very large and high-dimensional but sparse data sets,
#and can be used only with the linear kernel.
#
#
#Binary classification (accuracy maximizing)
#
#Accuracy maximizing binary classification is done by training a RLS regressor
#with labels +1 for positive and -1 for negative class. The algorithm
#is equivalent to the so-called 'Least-squares support vector machine',
#and is known to provide performance similar to SVMs
#

[Modules]

#Conjugate gradient RLS regressor is used as the training algorithm
learner=CGRLS

#Accuracy measures the fraction of correct classifications made
measure=accuracy

[Parameters]
regparam=1

#bias is mostly useful for linear models with low-dimensional data
bias=1

[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels

[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions

Python code (cgrls_test)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy

kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['regparam'] = '1'
kwargs['bias'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'CGRLS'
kwargs['measure'] = accuracy
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)

Config file (cgrank_test)

#This config file runs Conjugate Gradient version of RankRLS
#The CGRankRLS is useful for very large and high-dimensional but sparse data sets,
#and can be used only with the linear kernel.
#

[Modules]

#RankRLS is used as the training algorithm
learner=CGRankRLS

#Area under ROC Curve
measure=auc

[Parameters]
regparam=1


[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels

[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions

Python code (cgrank_test)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import auc

kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['regparam'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'CGRankRLS'
kwargs['measure'] = auc
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = auc(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, auc.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)

Config file (cgrank_qids)

#Ranking with query ids.
#Conjugate gradient RankRLS training

[Modules]

#CGRankRLS can be used for ranking problems with qids
learner=CGRankRLS

#We use the validation set for parameter selection
mselection=ValidationSetSelection

#Disagreement error measures the average number of pairwise mis-orderings per
#query
measure=disagreement

[Parameters]
#search regularization parameter
reggrid=0.001 0.1 10 1000

[Input]
#features of the training examples
train_features=./examples/data/rank_train.features
#labels of the training examples
train_labels=./examples/data/rank_train.labels
#qids of the training examples
train_qids=./examples/data/rank_train.qids

#validation set files
validation_features=./examples/data/rank_test.features
validation_labels=./examples/data/rank_test.labels
validation_qids=./examples/data/rank_test.qids

[Output]
#the learned model is written here
model=./examples/models/rankqids.model

Python code (cgrank_qids)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import disagreement

kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/rank_train.labels')
kwargs['devel_labels'] = reader.read_dense('./examples/data/rank_test.labels')
kwargs['train_qids'] = reader.read_qids('./examples/data/rank_train.qids')
kwargs['devel_qids'] = reader.read_qids('./examples/data/rank_test.qids')
kwargs['train_features'] = reader.read_sparse('./examples/data/rank_train.features')
kwargs['devel_features'] = reader.read_sparse('./examples/data/rank_test.features')
kwargs['reggrid'] = '0.001 0.1 10 1000'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'CGRankRLS'
kwargs['measure'] = disagreement
kwargs['mselection'] = 'DevelSetSelection'
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
writer.write_pickle('./examples/models/rankqids.model', model)

Config file (cgrank_test_with_preferences)

#This config file runs Conjugate Gradient version of RankRLS with a list of pairwise preferences between data points rather than labeled data points.
#The CGRankRLS is useful for very large and high-dimensional but sparse data sets,
#and can be used only with the linear kernel.
#

[Modules]

#RankRLS is used as the training algorithm
learner=CGRankRLS

#Area under ROC Curve
measure=auc

#Linear kernel is always a reasonable first choice, advanced examples
#show how to learn non-linear models with other kernels.
kernel=LinearKernel

[Parameters]
regparam=1


[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#pairwise preferences between data points
train_preferences=./examples/data/rank_train.preferences
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels

[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions

Python code (cgrank_test_with_preferences)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import auc

kwargs = {}
kwargs['train_preferences'] = reader.read_preferences('./examples/data/rank_train.preferences')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['regparam'] = '1'
kwargs['kernel'] = 'LinearKernel'
kwargs['learner'] = 'CGRankRLS'
kwargs['measure'] = auc
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = auc(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, auc.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)

Reduced set approximation

Once the training set size exceeds several thousand examples, training the learning methods with (non-linear) kernels becomes infeasible. For this case RLScore implements the reduced set approximation algorithm, where only a pre-specified subset of training examples is used to represent the dual solution learned.

To use the reduced set approximation, one should supply the indices of those training examples which are used to represent the learned solution (so-called 'basis vectors') in a file. The file should contain one line, where the indices are separated by whitespace.

The best way of selecting the basis vectors is an open research question. Uniform random subsampling of training set indices usually provides decent results.
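
A basis vector file of the kind described above could be produced by uniform random subsampling as follows. This is a minimal sketch: the helper name is hypothetical, and zero-based indexing is an assumption that should be checked against the RLScore version in use.

```python
import os
import random
import tempfile


def write_random_basis_indices(n_train, n_basis, path, seed=None):
    """Write n_basis distinct training indices on one whitespace-separated line.

    Assumption: indices are zero-based. Check the convention expected by
    your RLScore version before using the file.
    """
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(n_train), n_basis))
    with open(path, 'w') as handle:
        handle.write(' '.join(str(i) for i in indices) + '\n')
    return indices


basis_path = os.path.join(tempfile.mkdtemp(), 'bvectors.indices')
chosen = write_random_basis_indices(n_train=1000, n_basis=10,
                                    path=basis_path, seed=1)
print(chosen)
```

Fixing the seed makes the subsample reproducible, which is useful when comparing runs with different regularization or kernel parameters.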

While cross-validation can be performed with the reduced set approximation, the results are only approximate. For small regularization parameter values, a pessimistic bias has been observed in the cross-validation estimates.

Config file (reduced_set)

#Binary classification (accuracy maximizing)
#
#In this example we utilize the gaussian kernel, and use the
#reduced set approximation with 10 basis vectors given in a file
#
#this example chooses regularization parameter, trains the method,
#makes test predictions and calculates test performance all at once

[Modules]

#RLS regressor is used as the training algorithm
learner=RLS

regparam=1.0

#Accuracy measures the fraction of correct classifications made
measure=accuracy

kernel=GaussianKernel

[Parameters]
#search regularization parameter from grid 2^-10...2^10
reggrid=-10_10

#width parameter of the gaussian kernel
gamma=0.01

[Input]
#features of the training examples
train_features=./examples/data/class_train.features
#labels of the training examples
train_labels=./examples/data/class_train.labels
#features of the test examples
prediction_features=./examples/data/class_test.features
#true labels of the test examples
test_labels=./examples/data/class_test.labels
#the ten basis vectors, the file contains indices of ten training examples
basis_vectors=./examples/data/bvectors.indices

[Output]
#the learned model is written here
model=./examples/models/classacc.model
#the predicted labels are written here
predicted_labels=./examples/predictions/classacc.predictions

Python code (reduced_set)

from rlscore import core
from rlscore import reader
from rlscore import writer
from rlscore.measure import accuracy

kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/class_train.labels')
prediction_features = reader.read_sparse('./examples/data/class_test.features')
kwargs['basis_vectors'] = reader.read_bvectors('./examples/data/bvectors.indices')
kwargs['train_features'] = reader.read_sparse('./examples/data/class_train.features')
test_labels = reader.read_dense('./examples/data/class_test.labels')
kwargs['reggrid'] = '-10_10'
kwargs['gamma'] = '0.01'
kwargs['kernel'] = 'GaussianKernel'
kwargs['learner'] = 'RLS'
kwargs['measure'] = accuracy
mselector = None
trainresults = core.trainModel(**kwargs)
model = trainresults['model']
print 'Making predictions on test data'
predicted_labels = model.predict(prediction_features)
performance = accuracy(test_labels, predicted_labels)
print 'Performance: %f %s' % (performance, accuracy.__name__)
writer.write_pickle('./examples/models/classacc.model', model)
writer.write_dense('./examples/predictions/classacc.predictions', predicted_labels)
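
For intuition about what the basis vectors do, the following NumPy sketch trains a reduced set approximation in its common "subset of regressors" form, where the model has one coefficient per basis vector instead of one per training example. It is an illustration under stated assumptions, not RLScore's internal implementation: the function names, the kernel parametrization exp(-gamma * ||x - z||^2), and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def gaussian_kernel(A, B, gamma):
    """Kernel matrix with entries exp(-gamma * ||a - b||^2) (assumed form)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A.dot(B.T)
    )
    # Clip tiny negative values caused by floating point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def train_reduced_set(X, y, basis_idx, gamma, regparam):
    """Fit one coefficient per basis vector (subset of regressors)."""
    X_basis = X[basis_idx]
    K_nm = gaussian_kernel(X, X_basis, gamma)        # n x m
    K_mm = gaussian_kernel(X_basis, X_basis, gamma)  # m x m
    # Solve (K_nm^T K_nm + regparam * K_mm) coef = K_nm^T y,
    # an m x m system instead of the full n x n one.
    lhs = K_nm.T.dot(K_nm) + regparam * K_mm
    coef = np.linalg.solve(lhs, K_nm.T.dot(y))
    return X_basis, coef


def predict_reduced_set(X_new, X_basis, coef, gamma):
    """Predictions only need kernel values against the basis vectors."""
    return gaussian_kernel(X_new, X_basis, gamma).dot(coef)


# Synthetic two-class data standing in for the example files.
rng = np.random.RandomState(0)
X_train = rng.randn(100, 5)
y_train = np.where(X_train[:, 0] >= 0, 1.0, -1.0)
X_test = rng.randn(40, 5)
y_test = np.where(X_test[:, 0] >= 0, 1.0, -1.0)

# Indices of ten training examples used as basis vectors.
basis_idx = np.arange(10)

X_basis, coef = train_reduced_set(X_train, y_train, basis_idx,
                                  gamma=0.01, regparam=1.0)
predictions = predict_reduced_set(X_test, X_basis, coef, gamma=0.01)

# Accuracy: fraction of test examples whose predicted sign is correct.
acc = np.mean(np.where(predictions >= 0, 1.0, -1.0) == y_test)
```

With m basis vectors and n training examples, the linear system solved here is m x m, which is why the approximation scales to data sets where the full n x n kernel matrix would be too large.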

Regression with Kronecker kernel

Regularized least-squares regression with Kronecker kernels exploits computational shortcuts for solving so-called shifted Kronecker product systems, that is, linear systems whose matrix is the Kronecker product of two kernel matrices plus a multiple of the identity matrix. The current implementation is available only through the library interface, and it requires the kernel matrices for training and prediction to be constructed in advance.

Python code (Kronecker RLS)

from rlscore.learner import KronRLS
from rlscore import reader
from rlscore import writer
from rlscore.measure import sqerror

kwargs = {}
kwargs['train_labels'] = reader.read_dense('./examples/data/Kron_train.labels')
kwargs['kmatrix1'] = reader.read_dense('./examples/data/Kron_train_1.kernelm')
kwargs['kmatrix2'] = reader.read_dense('./examples/data/Kron_train_2.kernelm')
K_test1 = reader.read_dense('./examples/data/Kron_test_1.kernelm')
K_test2 = reader.read_dense('./examples/data/Kron_test_2.kernelm')
test_labels = reader.read_dense('./examples/data/Kron_test.labels')
kwargs['regparam'] = 0.001
learner = KronRLS.createLearner(**kwargs)
learner.train()
kronmodel = learner.getModel()
kronpred = kronmodel.predictWithKernelMatrices(K_test1, K_test2)
print sqerror(test_labels, kronpred)
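
The shortcut for shifted Kronecker product systems can be illustrated with plain NumPy. The sketch below is not taken from the KronRLS source. It shows one standard way to solve such a system, using the eigendecompositions of the two small kernel matrices, and checks the result against an explicit solve of the full system. The function name solve_shifted_kronecker and the random test matrices are assumptions made for the example.

```python
import numpy as np


def solve_shifted_kronecker(K1, K2, Y, regparam):
    """Solve (K1 kron K2 + regparam * I) vec(A) = vec(Y) for A.

    K1 (n1 x n1) and K2 (n2 x n2) are symmetric kernel matrices and Y is
    the n1 x n2 label matrix; vec() is row-major, as in NumPy's ravel().
    """
    evals1, V1 = np.linalg.eigh(K1)
    evals2, V2 = np.linalg.eigh(K2)
    # Rotate the labels into the joint eigenbasis.
    Y_rot = V1.T.dot(Y).dot(V2)
    # In that basis the system is diagonal: eigenvalue products plus shift.
    A_rot = Y_rot / (np.outer(evals1, evals2) + regparam)
    # Rotate the solution back.
    return V1.dot(A_rot).dot(V2.T)


# Small random positive semidefinite kernel matrices for the check.
rng = np.random.RandomState(0)
B1 = rng.randn(4, 4)
B2 = rng.randn(3, 3)
K1 = B1.dot(B1.T)
K2 = B2.dot(B2.T)
Y = rng.randn(4, 3)
regparam = 0.001

A_fast = solve_shifted_kronecker(K1, K2, Y, regparam)

# Reference solution: build the full 12 x 12 system explicitly.
full_system = np.kron(K1, K2) + regparam * np.eye(12)
A_naive = np.linalg.solve(full_system, Y.ravel()).reshape(4, 3)
```

The eigendecomposition route costs on the order of n1^3 + n2^3 operations, whereas the explicit solve works on an (n1 * n2) x (n1 * n2) matrix and quickly becomes infeasible as the kernel matrices grow.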

History

Version 0.5 (2012.06.19)

  • CGRLS and CGRankRLS learners for conjugate gradient based training of RLS/RankRLS on large and high-dimensional, but sparse, data.
  • CGRankRLS supports learning from pairwise preferences between data points in addition to learning from utility values.
  • Library interface for Python. Code examples for almost all included learning algorithms.
  • Support for learning with Kronecker kernels.
  • Numerous internal changes in the software.

Version 0.4 (2010.04.14)

  • A linear time greedy forward feature selection with leave-one-out criterion for RLS (greedy RLS) included.
  • Example data and configurations for basic use cases included in the distribution.
  • Fixed a bug causing problems when reading/writing binary files in Windows.
  • Modifications to the configuration file format.
  • All command line interfaces other than rls_core.py removed.

Version 0.3 (2009.12.03)

  • Major restructuring of the code to make the software more modular.
  • Configuration files introduced for more flexible use of software.
  • Evolutionary maximum-margin clustering included.
  • Model file format changed.

Version 0.2.1 (2009.06.24)

  • Fixed a bug causing one of the features to get ignored.

Version 0.2 (2009.03.13)

  • Major overhaul of the file formats.
  • RLScore now supports learning multiple tasks simultaneously.
  • Reduced set approximation included for large scale learning.

Version 0.1.1 (2009.01.11)

  • Fixed a bug causing a memory leak after training with sparse data and linear kernel.

Version 0.1 (2008.10.18)

  • First public release.

Credits

Former Contributors:
 Evgeni Tsivtsivadze - participated in designing version 0.1 and co-authored some of the articles in which the implemented methods were proposed.

References

Papers about algorithms implemented in RLScore

[1] Tapio Pahikkala, Sebastian Okser, Antti Airola, Tapio Salakoski, and Tero Aittokallio. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations. Algorithms for Molecular Biology, 7(1):11, 2012.
[2] Antti Airola, Tapio Pahikkala, Willem Waegeman, Bernard De Baets, and Tapio Salakoski. An experimental comparison of cross-validation techniques for estimating the area under the ROC curve. Computational Statistics & Data Analysis, 55(4):1828-1844, April 2011.
[3] Tapio Pahikkala, Antti Airola, and Tapio Salakoski. Speeding up greedy forward selection for regularized least-squares. In Sorin Draghici, Taghi M. Khoshgoftaar, Vasile Palade, Witold Pedrycz, M. Arif Wani, and Xingquan Zhu, editors, Proceedings of The Ninth International Conference on Machine Learning and Applications (ICMLA'10), pages 325-330. IEEE Computer Society, December 2010.
[4] Tapio Pahikkala, Willem Waegeman, Antti Airola, Tapio Salakoski, and Bernard De Baets. Conditional ranking on relational data. In José L. Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag, editors, Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD'10), Part II, volume 6322 of Lecture Notes in Computer Science, pages 499-514. Springer, 2010.
[5] Antti Airola, Tapio Pahikkala, and Tapio Salakoski. Large scale training methods for linear RankRLS. In Eyke Hüllermeier and Johannes Fürnkranz, editors, Proceedings of the ECML/PKDD-Workshop on Preference Learning (PL-10), 2010.
[6] Fabian Gieseke, Tapio Pahikkala, and Oliver Kramer. Fast evolutionary maximum margin clustering. In Léon Bottou and Michael Littman, editors, ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, volume 382 of ACM International Conference Proceeding Series, pages 361-368, New York, NY, USA, June 2009. ACM.
[7] Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola, Jouni Järvinen, and Jorma Boberg. An efficient algorithm for learning to rank from preference graphs. Machine Learning, 75(1):129-165, 2009.
[8] Tapio Pahikkala, Antti Airola, Jorma Boberg, and Tapio Salakoski. Exact and efficient leave-pair-out cross-validation for ranking RLS. In Timo Honkela, Matti Pöllä, Mari-Sanna Paukkeri, and Olli Simula, editors, Proceedings of the 2nd International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'08), pages 1-8. Helsinki University of Technology, 2008.
[9] Tapio Pahikkala, Antti Airola, Hanna Suominen, Jorma Boberg, and Tapio Salakoski. Efficient AUC maximization with regularized least-squares. In Anders Holst, Per Kreuger, and Peter Funk, editors, Proceedings of the 10th Scandinavian Conference on Artificial Intelligence (SCAI 2008), volume 173 of Frontiers in Artificial Intelligence and Applications, pages 12-19. IOS Press, Amsterdam, Netherlands, 2008.
[10] Evgeni Tsivtsivadze, Tapio Pahikkala, Antti Airola, Jorma Boberg, and Tapio Salakoski. A sparse regularized least-squares preference learning algorithm. In Anders Holst, Per Kreuger, and Peter Funk, editors, Proceedings of the 10th Scandinavian Conference on Artificial Intelligence (SCAI 2008), volume 173, pages 76-83. IOS Press, 2008.
[11] Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola, Jorma Boberg, and Tapio Salakoski. Learning to rank with pairwise regularized least-squares. In Thorsten Joachims, Hang Li, Tie-Yan Liu, and ChengXiang Zhai, editors, SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, pages 27-33, 2007.
[12] Tapio Pahikkala, Jorma Boberg, and Tapio Salakoski. Fast n-fold cross-validation for regularized least-squares. In Timo Honkela, Tapani Raiko, Jukka Kortela, and Harri Valpola, editors, Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence, pages 83-90, Espoo, Finland, 2006. Otamedia Oy.