Learning to rank

In this tutorial, we show how to train the ranking regularized least-squares (RankRLS) method for learning to rank [1] [2]. We will use three variants of the method, depending on whether the data consists of (instance, utility score) pairs similar to regression, of query-structured data, or of pairwise preferences. In our experience, competitive results can often be achieved in the first case also by simply using RLS regression, whereas for the latter two use cases RankRLS should be used. All of these learners support nonlinear kernels.

RankRLS minimizes the magnitude preserving ranking error ((y_i - y_j) - (f(x_i) - f(x_j)))^2. We will also make use of the concordance index (a.k.a. pairwise ranking accuracy), which computes the fraction of correctly ordered pairs (pairs such that y_i > y_j and f(x_i) > f(x_j), with tied predictions broken randomly). For the concordance index, trivial baselines such as a random predictor or a mean or majority voter yield 0.5 performance. For bipartite ranking tasks, where there are only two possible output values, the concordance index is equivalent to the area under the ROC curve (AUC), a popular measure in binary classification.
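
As a quick sanity check, the concordance index can also be computed directly from this definition. The following is a small sketch of ours (naive_cindex is not part of RLScore); ties in the predictions are counted as half correct, which matches breaking them randomly in expectation.

import numpy as np
from rlscore.measure import cindex

def naive_cindex(Y, P):
    #Fraction of correctly ordered pairs among all pairs with Y[i] != Y[j]
    correct, total = 0., 0.
    for i in range(len(Y)):
        for j in range(i + 1, len(Y)):
            if Y[i] == Y[j]:
                continue
            total += 1
            if (Y[i] - Y[j]) * (P[i] - P[j]) > 0:
                correct += 1
            elif P[i] == P[j]:
                #tied predictions count as half correct
                correct += 0.5
    return correct / total

Y = np.array([1., 2., 3., 4.])
P = np.array([1.5, 1.0, 3.0, 4.0])
print(naive_cindex(Y, P))
print(cindex(Y, P))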

Tutorial 1: Ordinal regression

First, let us assume an ordinal regression type of setting, where, similar to regression, each instance is associated with a score. However, now the aim is to learn to predict the ordering of the instances correctly, rather than the scores themselves. We use the GlobalRankRLS implementation of RankRLS. Global in the name refers to the fact that there exists a single global ranking over all the data, rather than many separate rankings as with the query-structured data considered later.

The leave-pair-out cross-validation approach consists of leaving out, in turn, each pair of training instances as holdout data, and computing the fraction of cases where f(x_i) > f(x_j), given that y_i > y_j. This is implemented using the fast algorithm described in [3] [2].
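
Conceptually, the same estimate could be computed with the naive quadratic-time scheme sketched below. This is only meant to illustrate the definition (LeavePairOutRankRLS, used later in this tutorial, relies on the much faster algorithm of [3]); the sketch assumes, as in the k-fold examples below, that the holdout method returns predictions for the held-out instances as if they had not been part of the training set.

import numpy as np
from rlscore.learner import GlobalRankRLS

def naive_leave_pair_out(X, Y):
    #Train once; holdout([i, j]) then gives predictions for the pair
    #computed as if it had been left out of the training set
    learner = GlobalRankRLS(X, Y)
    correct, total = 0., 0.
    for i in range(len(Y)):
        for j in range(i + 1, len(Y)):
            if Y[i] == Y[j]:
                continue
            P = learner.holdout([i, j])
            total += 1
            if (Y[i] - Y[j]) * (P[0] - P[1]) > 0:
                correct += 1
            elif P[0] == P[1]:
                #tied predictions count as half correct
                correct += 0.5
    return correct / total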

Data set

Again, we consider the classical Boston Housing data set from the UCI machine learning repository. The data consists of 506 instances, 13 features and 1 output to be predicted.

The data can be loaded from disk and split into a training set of 250, and test set of 256 instances using the following code.

import numpy as np

def load_housing():
    np.random.seed(1)
    D = np.loadtxt("housing.data")
    np.random.shuffle(D)
    X = D[:,:-1]
    Y = D[:,-1]
    X_train = X[:250]
    Y_train = Y[:250]
    X_test = X[250:]
    Y_test = Y[250:]
    return X_train, Y_train, X_test, Y_test

def print_stats():
    X_train, Y_train, X_test, Y_test = load_housing()
    print("Housing data set characteristics")
    print("Training set: %d instances, %d features" %X_train.shape)
    print("Test set: %d instances, %d features" %X_test.shape)

if __name__ == "__main__":
    print_stats()


Housing data set characteristics
Training set: 250 instances, 13 features
Test set: 256 instances, 13 features

Linear ranking model with default parameters

First, we train RankRLS with default parameters (linear kernel, regparam=1.0) and compute the concordance index for the test set.

from rlscore.learner import GlobalRankRLS
from rlscore.measure import cindex

from housing_data import load_housing


def train_rls():
    #Trains RankRLS with default parameters (regparam=1.0, kernel='LinearKernel')
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = GlobalRankRLS(X_train, Y_train)
    #Test set predictions
    P_test = learner.predict(X_test)
    print("test cindex %f" %cindex(Y_test, P_test))

if __name__=="__main__":
    train_rls()

The resulting output is as follows.

test cindex 0.857257

Clearly the model works much better than the trivial baseline of 0.5.
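
To make the comparison concrete, one can score random predictions on the same test set. A small sketch of ours, reusing load_housing from above; the exact value will vary slightly, but should lie close to 0.5.

import numpy as np
from rlscore.measure import cindex

from housing_data import load_housing

def random_baseline():
    X_train, Y_train, X_test, Y_test = load_housing()
    np.random.seed(1)
    #random scores carry no information about the true ordering,
    #so the concordance index should be close to 0.5
    P_random = np.random.random(Y_test.shape[0])
    print("random baseline cindex %f" %cindex(Y_test, P_random))

if __name__=="__main__":
    random_baseline()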

Leave-pair-out cross-validation

Next, we use the fast leave-pair-out cross-validation algorithm for performance estimation and regularization parameter selection. The LeavePairOutRankRLS module is a high level interface to this functionality.

from rlscore.learner import LeavePairOutRankRLS
from rlscore.measure import cindex

from housing_data import load_housing


def train_rls():
    #Trains RankRLS with automatically selected regularization parameter
    X_train, Y_train, X_test, Y_test = load_housing()
    regparams = [2.**i for i in range(-10, 10)]
    learner = LeavePairOutRankRLS(X_train, Y_train, regparams = regparams)
    lpo_performances = learner.cv_performances
    P_test = learner.predict(X_test)
    print("leave-pair-out performances " +str(lpo_performances))
    print("chosen regparam %f" %learner.regparam)
    print("test set cindex %f" %cindex(Y_test, P_test))


if __name__=="__main__":
    train_rls()

The resulting output is as follows.

leave-pair-out performances [ 0.85697212  0.85697212  0.85697212  0.85697212  0.85697212  0.85697212
  0.85697212  0.85697212  0.85697212  0.85700443  0.85703673  0.85687522
  0.85690752  0.85687522  0.85732743  0.85765044  0.85781194  0.85739203
  0.85723053  0.85642301]
chosen regparam 64.000000
test set cindex 0.857134

We notice two things. First, the leave-pair-out estimates are very close to the cindex computed on the test set. Second, on this data set the ranking accuracy does not seem to be much affected by the choice of regularization parameter.

K-fold cross-validation

Even with the computational shortcuts, leave-pair-out cross-validation becomes impractical and unnecessary once the training set size grows beyond a couple of hundred instances. Instead, the fast holdout method may be used to compute k-fold cross-validation estimates for RankRLS as follows.

import numpy as np
from rlscore.learner import GlobalRankRLS
from rlscore.measure import cindex
from rlscore.utilities.cross_validation import random_folds

from housing_data import load_housing


def train_rls():
    #Trains RankRLS with default parameters (regparam=1.0, kernel='LinearKernel')
    X_train, Y_train, X_test, Y_test = load_housing()
    #generate fold partition, arguments: train_size, k, random_seed
    folds = random_folds(len(Y_train), 5, 10)
    learner = GlobalRankRLS(X_train, Y_train)
    perfs = []
    for fold in folds:
        P = learner.holdout(fold)
        c = cindex(Y_train[fold], P)
        perfs.append(c)
    perf = np.mean(perfs)
    print("5-fold cross-validation cindex %f" %perf)
    P_test = learner.predict(X_test)
    print("test cindex %f" %cindex(Y_test, P_test))

if __name__=="__main__":
    train_rls()

The resulting output is as follows.

5-fold cross-validation cindex 0.857076
test cindex 0.857257

Again, we may use the higher level wrapper code, which can also be used to select the regularization parameter.

from rlscore.learner import KfoldRankRLS
from rlscore.measure import cindex
from rlscore.utilities.cross_validation import random_folds

from housing_data import load_housing


def train_rls():
    #Trains RankRLS with automatically selected regularization parameter
    X_train, Y_train, X_test, Y_test = load_housing()
    #generate fold partition, arguments: train_size, k, random_seed
    folds = random_folds(len(Y_train), 5, 10)
    regparams = [2.**i for i in range(-10, 10)]
    learner = KfoldRankRLS(X_train, Y_train, folds = folds, regparams = regparams, measure=cindex)
    kfold_perfs = learner.cv_performances
    P_test = learner.predict(X_test)
    print("kfold performances " +str(kfold_perfs))
    print("chosen regparam %f" %learner.regparam)
    print("test set cindex %f" %cindex(Y_test, P_test))

if __name__=="__main__":
    train_rls()

The resulting output is as follows.

kfold performances [ 0.85756786  0.85756786  0.85756786  0.85756786  0.85756786  0.85756786
  0.85756786  0.85756786  0.85740406  0.85740406  0.85707578  0.85740392
  0.85789748  0.85756974  0.85707578  0.8580598   0.85789641  0.85724066
  0.85740393  0.85838699]
chosen regparam 512.000000
test set cindex 0.855995

Kernel parameter selection

Finally, we consider how to select the regularization parameter and the kernel parameters together using k-fold cross-validation (alternatively, leave-pair-out could also be used here).

import numpy as np
from rlscore.learner import KfoldRankRLS
from rlscore.measure import cindex
from rlscore.utilities.cross_validation import random_folds

from housing_data import load_housing


def train_rls():
    #Selects both the gamma parameter for Gaussian kernel, and regparam with kfoldcv
    X_train, Y_train, X_test, Y_test = load_housing()
    folds = random_folds(len(Y_train), 5, 10)
    regparams = [2.**i for i in range(-15, 16)]
    gammas = regparams
    best_regparam = None
    best_gamma = None
    best_perf = 0.
    best_learner = None
    for gamma in gammas:
        #A new learner is initialized for each kernel parameter
        learner = KfoldRankRLS(X_train, Y_train, kernel = "GaussianKernel", folds = folds, gamma = gamma, regparams = regparams, measure=cindex)
        perf = np.max(learner.cv_performances)
        if perf > best_perf:
            best_perf = perf
            best_regparam = learner.regparam
            best_gamma = gamma
            best_learner = learner
    P_test = best_learner.predict(X_test)
    print("best parameters gamma %f regparam %f" %(best_gamma, best_regparam))
    print("best kfoldcv cindex %f" %best_perf)
    print("test cindex %f" %cindex(Y_test, P_test))

if __name__=="__main__":
    train_rls()

The resulting output is as follows.

best parameters gamma 0.000031 regparam 0.062500
best kfoldcv cindex 0.854094
test cindex 0.869137

The results are quite similar to those obtained before; the Gaussian kernel does not allow us to outperform the linear one.

Tutorial 2: Query-structured data

Next we consider the setting where, instead of having a single global ranking, the data is partitioned into subsets (“queries”). Each instance corresponds to a query-object pair, and the pairs corresponding to the same query have a ranking defined between them. At test time the model needs to predict rankings for new queries. The terminology comes from the context of learning to rank for information retrieval, where the task is, given a user query, to rank documents according to how well they match the query. Terms such as listwise ranking, conditional ranking and dyadic ranking have also been used, and the setting is similar to that of label ranking.

We consider an application from the field of natural language processing known as parse ranking. Syntactic parsing refers to the process of analyzing natural language text according to some formal grammar. Due to the ambiguity of natural language (“I shot an elephant in my pyjamas.” - just who was in the pyjamas?), a sentence can have multiple grammatically correct parses. In parse ranking, an automated parser generates a set of candidate parses, which need to be scored according to how well they match the true (human-made) parse of the sentence.

This can be modeled as a ranking problem, where the data consists of inputs representing sentence-parse pairs, and outputs that are scores describing the ‘goodness’ of the parse for the sentence. Each sentence corresponds to a query, and the parses of that sentence are objects that should be ranked.

QueryRankRLS minimizes the magnitude preserving ranking error within each training query, but does not compare instances associated with different queries.
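
To make the objective concrete, the following sketch of ours (query_ranking_loss is not part of RLScore) computes this loss for a given prediction vector and a partition of the instances into queries; the regularization term is omitted.

import numpy as np

def query_ranking_loss(Y, P, queries):
    #Magnitude preserving ranking error summed over all pairs
    #within each query; pairs from different queries are ignored
    loss = 0.
    for query in queries:
        for i in query:
            for j in query:
                if i < j:
                    loss += ((Y[i] - Y[j]) - (P[i] - P[j]))**2
    return loss

#toy example: two queries, with three and two instances respectively
Y = np.array([3., 1., 2., 1., 2.])
P = np.array([2.5, 0.5, 2.0, 1.0, 1.5])
print(query_ranking_loss(Y, P, [[0, 1, 2], [3, 4]]))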

The data set

The parse ranking data set can be downloaded from here.

First, we load in the training set and examine its properties.

import numpy as np
from rlscore.utilities.reader import read_sparse
from rlscore.utilities.cross_validation import map_ids

def print_stats():
    X_train =  read_sparse("train_2000_x.txt")
    Y_train =  np.loadtxt("train_2000_y.txt")
    ids =  np.loadtxt("train_2000_qids.txt", dtype=int)
    folds = map_ids(ids)
    print("Parse data set characteristics")
    print("Training set: %d instances, %d features" %X_train.shape)
    print("Instances grouped into %d sentences" %len(folds))
    

if __name__=="__main__":
    print_stats()
Parse data set characteristics
Training set: 2000 instances, 195100 features
Instances grouped into 117 sentences

As is common in natural language applications, the data is very high dimensional. In addition to the data we load a list of sentence ids, denoting which sentence each instance is associated with. Finally, based on these ids we map the data to fold indices with the map_ids function, where each fold contains the indices of all training instances associated with a given sentence. Altogether there are 117 folds, each corresponding to one sentence.
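
A toy example of the grouping that map_ids performs (the exact ordering of the returned folds is an implementation detail; we only rely on the grouping behavior described above):

import numpy as np
from rlscore.utilities.cross_validation import map_ids

#each entry tells which sentence the corresponding instance belongs to
ids = np.array([1, 1, 2, 2, 2, 5, 5])
folds = map_ids(ids)
#indices grouped by sentence id, e.g. [[0, 1], [2, 3, 4], [5, 6]]
print(folds)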

Learning a ranking function

First, we learn a ranking function. The main difference to learning global rankings is how the queries are handled. When computing the cindex for the test set, we compute the cindex for each test query separately and finally take the mean.

import numpy as np
from rlscore.learner import QueryRankRLS
from rlscore.measure import cindex
from rlscore.utilities.reader import read_sparse
from rlscore.utilities.cross_validation import map_ids

def train_rls():
    #Train QueryRankRLS with default parameters,
    #where instances related to a single sentence
    #form together a query
    X_train =  read_sparse("train_2000_x.txt")
    Y_train =  np.loadtxt("train_2000_y.txt")
    X_test =  read_sparse("test_2000_x.txt", X_train.shape[1])
    Y_test =  np.loadtxt("test_2000_y.txt")
    #list of sentence ids
    qids_train =  np.loadtxt("train_2000_qids.txt")
    qids_test = np.loadtxt("test_2000_qids.txt")
    learner = QueryRankRLS(X_train, Y_train, qids_train)
    P_test = learner.predict(X_test)
    perfs = []
    partition = map_ids(qids_test)
    #compute the ranking accuracy separately for each test query
    for query in partition:
        #skip such queries, where all instances have the same
        #score, since in this case cindex is undefined
        if np.var(Y_test[query]) != 0:
            perf = cindex(Y_test[query], P_test[query])
            perfs.append(perf)
    test_perf = np.mean(perfs)
    print("test cindex %f" %test_perf)


if __name__=="__main__":
    train_rls()
test cindex 0.730632

Leave-query-out cross-validation

Next, we estimate the ranking accuracy on the training set using leave-query-out cross-validation, where each query is in turn left out of the training set as the holdout set. This is implemented using the efficient algorithm described in [2].

import numpy as np
from rlscore.learner import QueryRankRLS
from rlscore.measure import cindex
from rlscore.utilities.reader import read_sparse
from rlscore.utilities.cross_validation import map_ids

def train_rls():
    #Estimate performance with leave-query-out cross-validation,
    #where instances related to a single sentence form
    #together a fold
    X_train =  read_sparse("train_2000_x.txt")
    Y_train =  np.loadtxt("train_2000_y.txt")
    X_test =  read_sparse("test_2000_x.txt", X_train.shape[1])
    Y_test =  np.loadtxt("test_2000_y.txt")
    #list of sentence ids
    qids_train =  np.loadtxt("train_2000_qids.txt")
    qids_test = np.loadtxt("test_2000_qids.txt")
    learner = QueryRankRLS(X_train, Y_train, qids_train)
    P_test = learner.predict(X_test)
    folds = map_ids(qids_train)
    perfs = []
    for fold in folds:
        if np.var(Y_train[fold]) != 0:
            P = learner.holdout(fold)
            c = cindex(Y_train[fold], P)
            perfs.append(c)
    perf = np.mean(perfs)
    print("leave-query-out cross-validation cindex %f" %perf)
    partition = map_ids(qids_test)
    test_perfs = []
    #compute the ranking accuracy separately for each test query
    for query in partition:
        #skip such queries, where all instances have the same
        #score, since in this case cindex is undefined
        if np.var(Y_test[query]) != 0:
            perf = cindex(Y_test[query], P_test[query])
            test_perfs.append(perf)
    test_perf = np.mean(test_perfs)
    print("test cindex %f" %test_perf)


if __name__=="__main__":
    train_rls()
leave-query-out cross-validation cindex 0.758013
test cindex 0.730632

Leave-query-out and model selection

Finally, we use a higher level wrapper for leave-query-out cross-validation, and at the same time make use of automated model selection.

import numpy as np
from rlscore.learner import LeaveQueryOutRankRLS
from rlscore.measure import cindex
from rlscore.utilities.reader import read_sparse
from rlscore.utilities.cross_validation import map_ids

def train_rls():
    #Select regparam with leave-query-out cross-validation,
    #where instances related to a single sentence form
    #together a fold
    X_train =  read_sparse("train_2000_x.txt")
    Y_train =  np.loadtxt("train_2000_y.txt")
    X_test =  read_sparse("test_2000_x.txt", X_train.shape[1])
    Y_test =  np.loadtxt("test_2000_y.txt")
    #list of sentence ids
    qids_train =  np.loadtxt("train_2000_qids.txt")
    qids_test = np.loadtxt("test_2000_qids.txt")
    regparams = [2.**i for i in range(-10, 10)]
    learner = LeaveQueryOutRankRLS(X_train, Y_train, qids_train, regparams = regparams, measure = cindex)
    lqo_perfs = learner.cv_performances
    P_test = learner.predict(X_test)
    print("leave-query-out performances " +str(lqo_perfs))
    print("chosen regparam %f" %learner.regparam)
    partition = map_ids(qids_test)
    #compute the ranking accuracy separately for each test query
    test_perfs = []
    for query in partition:
        #skip such queries, where all instances have the same
        #score, since in this case cindex is undefined
        if np.var(Y_test[query]) != 0:
            perf = cindex(Y_test[query], P_test[query])
            test_perfs.append(perf)
    test_perf = np.mean(test_perfs)
    print("test cindex %f" %test_perf)


if __name__=="__main__":
    train_rls()
leave-query-out performances [ 0.74507212  0.74531795  0.74594602  0.74685541  0.74635137  0.74748445
  0.75077485  0.75197701  0.75505724  0.75813978  0.75801346  0.75832107
  0.75813583  0.76626898  0.76610342  0.76648227  0.7598727   0.7566041
  0.76145765  0.75382555]
chosen regparam 32.000000
test cindex 0.726531

Tutorial 3: Learning from pairwise preferences

RankRLS also supports training data given as pairwise preferences of the type A > B, meaning that instance A is preferred to instance B. In the next example, we generate a set of such pairwise preferences sampled from the Housing training set, train a model, and test how well the model predicts on independent test data.

from rlscore.learner import PPRankRLS
from rlscore.measure import cindex

from housing_data import load_housing
import random
random.seed(33)

def train_rls():
    #Trains PPRankRLS with default parameters (regparam=1.0, kernel='LinearKernel')
    X_train, Y_train, X_test, Y_test = load_housing()
    pairs_start = []
    pairs_end = []
    #Sample 1000 pairwise preferences from the data 
    trange = range(len(Y_train))
    while len(pairs_start) < 1000:
        ind0 = random.choice(trange)
        ind1 = random.choice(trange)
        if Y_train[ind0] > Y_train[ind1]:
            pairs_start.append(ind0)
            pairs_end.append(ind1)
        elif Y_train[ind0] < Y_train[ind1]:
            pairs_start.append(ind1)
            pairs_end.append(ind0)
    learner = PPRankRLS(X_train, pairs_start, pairs_end)
    #Test set predictions
    P_test = learner.predict(X_test)
    print("test cindex %f" %cindex(Y_test, P_test))

if __name__=="__main__":
    train_rls()
test cindex 0.861997

With a sample of 1000 training pairs, the model works as well as the one trained on the y-scores directly.

Precomputed kernels, reduced set approximation

See the regression tutorial for examples. RankRLS also supports precomputed kernel matrices supplied together with the kernel="PrecomputedKernel" argument, as well as reduced set approximation for large data sets.
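
As a brief sketch of the precomputed kernel option (assuming, as in the regression tutorial, that the training kernel matrix takes the place of the data matrix and that predict is given the kernel matrix between test and training instances):

import numpy as np
from rlscore.learner import GlobalRankRLS
from rlscore.measure import cindex

from housing_data import load_housing

def train_rls():
    X_train, Y_train, X_test, Y_test = load_housing()
    #linear kernel computed explicitly
    K_train = np.dot(X_train, X_train.T)
    K_test = np.dot(X_test, X_train.T)
    learner = GlobalRankRLS(K_train, Y_train, kernel="PrecomputedKernel")
    P_test = learner.predict(K_test)
    print("test cindex %f" %cindex(Y_test, P_test))

if __name__=="__main__":
    train_rls()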

References

[1] Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola, Jorma Boberg, and Tapio Salakoski. Learning to rank with pairwise regularized least-squares. In Thorsten Joachims, Hang Li, Tie-Yan Liu, and ChengXiang Zhai, editors, SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, pages 27-33, 2007.
[2] Tapio Pahikkala, Evgeni Tsivtsivadze, Antti Airola, Jouni Jarvinen, and Jorma Boberg. An efficient algorithm for learning to rank from preference graphs. Machine Learning, 75(1):129-165, 2009.
[3] Tapio Pahikkala, Antti Airola, Jorma Boberg, and Tapio Salakoski. Exact and efficient leave-pair-out cross-validation for ranking RLS. In Proceedings of the 2nd International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'08), pages 1-8, Espoo, Finland, 2008.