Learning to rank
================

In this tutorial, we show how to train the ranking regularized least-squares (RankRLS) method for learning to rank [1]_ [2]_. We will use three variants of the method, depending on whether the data consists of (instance, utility score) pairs as in regression, of query-structured data, or of pairwise preferences. In our experience, competitive results can often be achieved in the first case also by simply using RLS regression, whereas for the latter two use cases RankRLS should be used. All of these learners support nonlinear kernels.

RankRLS minimizes the magnitude preserving ranking error ((y_i - y_j) - (f(x_i) - f(x_j)))^2. We will also make use of the concordance index (a.k.a. pairwise ranking accuracy), which computes the fraction of correctly ordered pairs (pairs such that y_i > y_j and f(x_i) > f(x_j), with tied predictions broken randomly). For the concordance index, trivial baselines such as a random predictor, or the mean or majority predictor, yield 0.5 performance. For bipartite ranking tasks, where there are only two possible output values, the concordance index is equivalent to the area under the ROC curve (AUC), a popular measure in binary classification.

Tutorial 1: Ordinal regression
******************************

First, let us assume an ordinal regression type of setting, where, as in regression, each instance is associated with a score. However, the aim is now to learn to predict the ordering of the instances correctly, rather than the scores exactly. We use the GlobalRankRLS implementation of RankRLS. "Global" in the name refers to the fact that there exists a single global ranking over the data, rather than many separate rankings, as with the query-structured data considered later.

The leave-pair-out cross-validation approach consists of leaving each pair of training instances out in turn as holdout data, and computing the fraction of cases where f(x_i) > f(x_j), given that y_i > y_j. This is implemented using a fast algorithm described in [3]_ [2]_.

Data set
--------

Again, we consider the classical Boston Housing data set from the UCI machine learning repository. The data consists of 506 instances, 13 features and 1 output to be predicted. The data can be loaded from disk and split into a training set of 250 and a test set of 256 instances using the following code.

.. literalinclude:: src/housing_data.py

.. literalinclude:: src/housing_data.out

Linear ranking model with default parameters
--------------------------------------------

First, we train RankRLS with default parameters (linear kernel, regparam=1.0) and compute the concordance index for the test set.

.. literalinclude:: src/ranking1.py

The resulting output is as follows.

.. literalinclude:: src/ranking1.out

Clearly the model works much better than the trivial baseline of 0.5.

Leave-pair-out cross-validation
-------------------------------

Next, we use the fast leave-pair-out cross-validation algorithm for performance estimation and regularization parameter selection. The LeavePairOutRankRLS module is a high-level interface to this functionality.

.. literalinclude:: src/ranking2.py

The resulting output is as follows.

.. literalinclude:: src/ranking2.out

We notice two things. First, the leave-pair-out estimates are very close to the cindex computed on the test set. Second, on this data set the ranking accuracy does not seem to be much affected by the choice of regularization parameter.
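
For readers who want to experiment outside the provided src/ files, the following is a minimal sketch of what the two steps above amount to. It assumes the RLScore package layout, with the GlobalRankRLS learner and the cindex measure named in this tutorial; the load_housing helper and the housing.data file name are assumptions standing in for src/housing_data.py.

.. code-block:: python

    import numpy as np

    from rlscore.learner import GlobalRankRLS
    from rlscore.measure import cindex

    def load_housing():
        # Load the 506 x 14 Boston Housing data, shuffle, and split into
        # 250 training and 256 test instances; the file name is an assumption.
        np.random.seed(1)
        D = np.loadtxt("housing.data")
        np.random.shuffle(D)
        X, Y = D[:, :-1], D[:, -1]
        return X[:250], Y[:250], X[250:], Y[250:]

    X_train, Y_train, X_test, Y_test = load_housing()

    # Train RankRLS with default parameters (linear kernel, regparam=1.0)
    learner = GlobalRankRLS(X_train, Y_train, regparam=1.0)

    # Predict scores for the test instances and measure the fraction of
    # correctly ordered test pairs (concordance index)
    P = learner.predict(X_test)
    print("test cindex %f" % cindex(Y_test, P))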
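
A corresponding sketch of the model-selection step, continuing from the previous snippet; the regparams constructor argument and the cv_performances attribute are assumptions patterned after RLScore's other model-selection wrappers.

.. code-block:: python

    import numpy as np

    from rlscore.learner import LeavePairOutRankRLS
    from rlscore.measure import cindex

    # Candidate regularization parameter values on a logarithmic grid
    regparams = [2.0 ** i for i in range(-10, 11)]

    # The wrapper trains RankRLS for each candidate value and selects the
    # one with the best fast leave-pair-out estimate
    learner = LeavePairOutRankRLS(X_train, Y_train, regparams=regparams)
    for regparam, perf in zip(regparams, learner.cv_performances):
        print("regparam 2**%.0f, leave-pair-out cindex %f" % (np.log2(regparam), perf))

    # Test set performance of the selected model
    P = learner.predict(X_test)
    print("test cindex %f" % cindex(Y_test, P))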
K-fold cross-validation
-----------------------

Even with the computational shortcuts, leave-pair-out cross-validation becomes impractical and unnecessary once the training set size grows beyond a couple of hundred instances. Instead, a fast holdout method may be used to compute k-fold cross-validation estimates for RankRLS as follows.

.. literalinclude:: src/ranking3.py

The resulting output is as follows.

.. literalinclude:: src/ranking3.out

Again, we may also use higher level wrapper code, which can also be used to select the regularization parameter.

.. literalinclude:: src/ranking4.py

The resulting output is as follows.

.. literalinclude:: src/ranking4.out

Kernel parameter selection
--------------------------

Finally, we consider how to select the regularization parameter and the kernel parameters together using k-fold cross-validation (alternatively, leave-pair-out could also be used here).

.. literalinclude:: src/ranking5.py

The resulting output is as follows.

.. literalinclude:: src/ranking5.out

The results are quite similar to before; the Gaussian kernel does not allow us to outperform the linear one.

Tutorial 2: Query-structured data
*********************************

Next we consider the setting where, instead of having a single global ranking, the data is partitioned into subsets ("queries"). Each instance corresponds to a query-object pair, and the pairs corresponding to the same query have a ranking defined between them. At test time the model needs to predict rankings for new queries. The terminology comes from the context of learning to rank for information retrieval, where the task is, given a user query, to rank documents according to how well they match the query. Terms such as listwise ranking, conditional ranking and dyadic ranking have also been used, and the setting is similar to that of label ranking.

We consider an application from the field of natural language processing known as parse ranking. Syntactic parsing refers to the process of analyzing natural language text according to some formal grammar. Due to the ambiguity of natural language ("I shot an elephant in my pyjamas." - just who was in the pyjamas?), a sentence can have multiple grammatically correct parses. In parse ranking, an automated parser generates a set of candidate parses, which need to be scored according to how well they match the true (human-made) parse of the sentence. This can be modeled as a ranking problem, where the data consists of inputs representing sentence-parse pairs, and outputs that are scores describing the 'goodness' of the parse for the sentence. Each sentence corresponds to a query, and the parses of that sentence are the objects that should be ranked. QueryRankRLS minimizes the magnitude preserving ranking error within each training query, but does not compare instances associated with different queries.

The data set
------------

The parse ranking data set can be downloaded from here. First, we load the training set and examine its properties.

.. literalinclude:: src/parse_data.py

.. literalinclude:: src/parse_data.out

As is common in natural language applications, the data is very high dimensional. In addition to the data, we load a list of sentence ids, denoting which sentence each instance is associated with. Finally, based on the ids we map the data to fold indices with the map_ids function, where each fold contains the indices of all training instances associated with a given sentence. Altogether there are 117 folds, each corresponding to a sentence.

Learning a ranking function
---------------------------

First, we learn a ranking function. The main difference to learning global rankings is in how the queries are handled. When computing the cindex for the test set, we compute the cindex for each test query separately, and finally take the mean.

.. literalinclude:: src/parse_ranking1.py

.. literalinclude:: src/parse_ranking1.out
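
A minimal sketch of this step, under the assumption that QueryRankRLS takes the query identifiers as its third constructor argument, and that the hypothetical variables X_train, Y_train, qids_train, X_test, Y_test and qids_test hold the data loaded as in src/parse_data.py:

.. code-block:: python

    import numpy as np

    from rlscore.learner import QueryRankRLS
    from rlscore.measure import cindex

    # Train a ranker that only compares instances within the same query;
    # passing the query ids as the third argument is an assumption
    learner = QueryRankRLS(X_train, Y_train, qids_train, regparam=1.0)
    P = learner.predict(X_test)

    # Compute the cindex separately for each test query, then take the mean
    perfs = []
    for qid in set(qids_test):
        mask = np.array(qids_test) == qid
        # The concordance index is undefined for queries where all true
        # scores are tied, so such queries are skipped
        if len(set(Y_test[mask])) > 1:
            perfs.append(cindex(Y_test[mask], P[mask]))
    print("mean test cindex %f" % np.mean(perfs))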
Leave-query-out cross-validation
--------------------------------

Next, we estimate the ranking accuracy on the training set using leave-query-out cross-validation, where each query is in turn left out of the training set as the holdout set. This is implemented using the efficient algorithm described in [2]_.

.. literalinclude:: src/parse_ranking2.py

.. literalinclude:: src/parse_ranking2.out

Leave-query-out and model selection
-----------------------------------

Finally, we use a higher level wrapper for leave-query-out, and also make use of automated model selection at the same time.

.. literalinclude:: src/parse_ranking3.py

.. literalinclude:: src/parse_ranking3.out

Tutorial 3: Learning from pairwise preferences
**********************************************

RankRLS also supports training data given as pairwise preferences of the type A > B, meaning that instance A is preferred to instance B. In the next example, we generate a set of such pairwise preferences sampled from the Housing data training set, train a model, and test to see how well the model predicts on independent test data.

.. literalinclude:: src/pairwise_ranking.py

.. literalinclude:: src/pairwise_ranking.out

With a sample of 1000 training pairs, the model works as well as the one trained on the y-scores directly.

Precomputed kernels, reduced set approximation
**********************************************

See the regression tutorial for examples. RankRLS also supports precomputed kernel matrices supplied together with the kernel="PrecomputedKernel" argument, as well as reduced set approximation for large data sets.

References
**********

.. [1] Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola, Jorma Boberg, and Tapio Salakoski. Learning to rank with pairwise regularized least-squares. In Thorsten Joachims, Hang Li, Tie-Yan Liu, and ChengXiang Zhai, editors, SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, pages 27--33, 2007.

.. [2] Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola, Jouni Jarvinen, and Jorma Boberg. An efficient algorithm for learning to rank from preference graphs. Machine Learning, 75(1):129--165, 2009.

.. [3] Tapio Pahikkala, Antti Airola, Jorma Boberg, and Tapio Salakoski. Exact and efficient leave-pair-out cross-validation for ranking RLS. In Proceedings of the 2nd International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'08), pages 1--8, Espoo, Finland, 2008.