Performance measures
====================

RLScore implement a variety of performance measures for classification,
regression and ranking.

Let Y and P contain the
true outputs and predicted outputs for some problem. For single-target
learning problems both are one-dimensional lists or arrays of size [n_samples].
For multi-target problems, both are two-dimensional lists or arrays of
size [n_samples, n_targets].

A performance measure is a function measure(Y,P), that returns a floating
point value denoting how well P matches Y. If Y and P have several columns,
typically the performance measure is computed for each column separately and
then averaged. A performance measure has a property iserror, that is used by
the grid search codes to check whether large or small values are better. An
UndefinedPerformance error may be raised, if for some reason the performance
measure is not well defined for the given input.


Tutorial 1: Basic usage
***********************

First, let us consider some basic binary classification measures. These
performance measures assume that Y-values (true class labels) are from
set {-1,1}. P-values (predicted class labels) can be any real values, but
the are mapped with the rule P[i]>0 -> 1 and P[i]<=0 -> -1, before
computing the performance.

This is how one can compute simple binary classification accuracy.

.. literalinclude:: src/measure1.py

.. literalinclude:: src/measure1.out

Four out of five instances are correctly classified, so classification accuracy is 0.8.
Giving as input Y-values outside {-1, 1} causes an exception to be raised.

Next, we compute the area under ROC curve.

.. literalinclude:: src/measure2.py

.. literalinclude:: src/measure2.out

Everything works as one would expect, until we pass Y full of ones to auc. UndefinedPerformance
is raised, because AUC is not defined for problems, where only one class is present in the true
class labels.

Finally, we test cindex, a pairwise ranking measure that computes how many of the pairs where
Y[i] > Y[j] also have P[i] > P[j]. The measure is a generalization of the AUC.

.. literalinclude:: src/measure3.py

.. literalinclude:: src/measure3.out

We also observe, that when given Y and P with multiple columns, the performance measure is computed
separately for each column, and then averaged. This is what happens when using some performance
measure for parameter selection in cross-validation with multi-output prediction problems. The chosen
parameter is the one that leads to best mean performance over all the targets.

Tutorial 2: Multi-class accuracy
********************************

RLScore contains some tools for converting multi-class learning problems to several independent
binary classification problems, and for converting vector valued multi-target predictions back to
multi-class predictions.

.. literalinclude:: src/measure4.py

.. literalinclude:: src/measure4.out

When doing multi-class learning, one should use the ova_accuracy function for parameter selection and computing
the final performance.