Prediction

The RLScore learners make predictions via the learner.predict() function. Usually the user does not have to worry about the internal details of how this is implemented. In this tutorial we look under the hood of RLScore to see how the predictors are implemented.

Tutorial 1: Linear predictor

Data set

We consider the classical Boston Housing data set from the UCI machine learning repository. The data consists of 506 instances, 13 features and 1 output to be predicted.

The data can be loaded from disk and split into a training set of 250 instances and a test set of 256 instances using the following code.

import numpy as np

def load_housing():
    np.random.seed(1)
    D = np.loadtxt("housing.data")
    np.random.shuffle(D)
    X = D[:,:-1]
    Y = D[:,-1]
    X_train = X[:250]
    Y_train = Y[:250]
    X_test = X[250:]
    Y_test = Y[250:]
    return X_train, Y_train, X_test, Y_test

def print_stats():
    X_train, Y_train, X_test, Y_test = load_housing()
    print("Housing data set characteristics")
    print("Training set: %d instances, %d features" %X_train.shape)
    print("Test set: %d instances, %d features" %X_test.shape)

if __name__ == "__main__":
    print_stats()


Housing data set characteristics
Training set: 250 instances, 13 features
Test set: 256 instances, 13 features

The linear predictor is of the form w.T*x + b. The predictor takes as input a data matrix of shape [n_samples, n_features] and computes the predictions.

The predictor is stored in the “predictor” attribute of the trained learner. Its “W” and “b” attributes contain the coefficients of the learned model.

from rlscore.learner import RLS

from housing_data import load_housing


def train_rls():
    #Trains RLS with default parameters (regparam=1.0, kernel='LinearKernel')
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train)
    #This is how we make predictions
    P_test = learner.predict(X_test)
    #We can separate the predictor from learner
    predictor = learner.predictor
    #And do the same predictions
    P_test = predictor.predict(X_test)
    #Let's get the coefficients of the predictor
    w = predictor.W
    b = predictor.b
    print("number of coefficients %d" %len(w))
    print("w-coefficients " +str(w))
    print("bias term %f" %b)

if __name__=="__main__":
    train_rls()
number of coefficients 13
w-coefficients [-0.15710164  0.03234827  0.00937111  3.94567546 -2.65055668  5.44180178
 -0.01920908 -1.14675642  0.21950548 -0.01116277 -0.6555072   0.01360432
 -0.46781757]
bias term 10.682364

Here, we have taken the predictor apart. There are 13 w-coefficients, each corresponding to one feature, and additionally a bias term.
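As a sanity check, we can reproduce the predictions manually from these coefficients. The following is a minimal sketch (the function name check_linear_predictor is ours, not part of RLScore), assuming the prediction is computed exactly as X_test * W + b as described above.

import numpy as np

from rlscore.learner import RLS
from housing_data import load_housing

def check_linear_predictor():
    #Train RLS with default parameters, as in the example above
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train)
    predictor = learner.predictor
    #Manual predictions from the linear model w.T*x + b
    P_manual = np.dot(X_test, predictor.W) + predictor.b
    P_rlscore = predictor.predict(X_test)
    #The two should agree up to numerical precision
    print("predictions agree: %s" %np.allclose(P_manual, P_rlscore))

if __name__=="__main__":
    check_linear_predictor()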

Now let us perform feature selection using the greedy RLS method, selecting the 5 most useful features for this data set.

from rlscore.learner import GreedyRLS

from housing_data import load_housing


def train_rls():
    #Select the 5 most useful features with greedy RLS (regparam=1.0 by default)
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = GreedyRLS(X_train, Y_train, 5)
    #This is how we make predictions
    P_test = learner.predict(X_test)
    #We can separate the predictor from learner
    predictor = learner.predictor
    #And do the same predictions
    P_test = predictor.predict(X_test)
    #Let's get the coefficients of the predictor
    w = predictor.W
    b = predictor.b
    print("number of coefficients %d" %len(w))
    print("w-coefficients " +str(w))
    print("bias term %f" %b)

if __name__=="__main__":
    train_rls()
number of coefficients 13
w-coefficients [ 0.          0.          0.          0.          0.          5.48127826
  0.         -0.69774095  0.          0.         -0.84012061  0.01430101
 -0.5867683 ]
bias term 8.308351

Now, all but 5 of the w-coefficients are set to zero by the feature selection process.
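To see which features were selected, one can check which coefficients are non-zero. A minimal sketch (the function name selected_features is ours; GreedyRLS may also expose the selected indices directly, but the version below relies only on the coefficients shown above):

import numpy as np

from rlscore.learner import GreedyRLS
from housing_data import load_housing

def selected_features():
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = GreedyRLS(X_train, Y_train, 5)
    w = learner.predictor.W
    #Indices of the features with non-zero coefficients
    selected = np.nonzero(w)[0]
    print("selected feature indices: " +str(selected))

if __name__=="__main__":
    selected_features()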

If we were to perform multi-target learning (i.e. Y has several columns), “W” would be of size [n_features, n_targets] and “b” of length n_targets.
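As a quick illustration of the multi-target case, the sketch below stacks two artificial target columns (the housing prices and the prices doubled, purely for demonstration) and checks the resulting coefficient shapes; the function name multi_target_shapes is ours.

import numpy as np

from rlscore.learner import RLS
from housing_data import load_housing

def multi_target_shapes():
    X_train, Y_train, X_test, Y_test = load_housing()
    #Two artificial targets: the original prices and the prices doubled
    Y_multi = np.vstack([Y_train, 2.0 * Y_train]).T
    learner = RLS(X_train, Y_multi)
    predictor = learner.predictor
    print("W dimensions %d %d" %predictor.W.shape)
    print("b length %d" %len(predictor.b))
    print("prediction dimensions %d %d" %predictor.predict(X_test).shape)

if __name__=="__main__":
    multi_target_shapes()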

Tutorial 2: Non-linear predictor

Non-linear predictors are of the form K_test * A, where the rows of K_test correspond to test instances, columns to training instances, and elements to kernel evaluations.

from rlscore.learner import RLS

from housing_data import load_housing


def train_rls():
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train, kernel="GaussianKernel", regparam=0.0003, gamma=0.00003)
    #This is how we make predictions
    P_test = learner.predict(X_test)
    #We can separate the predictor from learner
    predictor = learner.predictor
    #And do the same predictions
    P_test = predictor.predict(X_test)
    #Let's get the coefficients of the predictor
    A = predictor.A
    print("A-coefficients " +str(A))
    print("number of coefficients %d" %len(A))

if __name__=="__main__":
    train_rls()
A-coefficients [ -3.21848518e+03  -7.18325391e+03   1.68292997e+03   1.88635418e+03
  -1.50372867e+04   2.11766278e+03  -5.47728475e+03  -6.03145381e+03
   2.11777586e+03  -7.94765717e+03  -5.22478904e+03   4.24563161e+03
   5.98455530e+03   7.26288767e+02  -2.89504333e+03   8.16153777e+03
   2.50715715e+03   1.09771576e+04   5.97832154e+03   1.14081891e+04
  -5.19149505e+03  -1.68006796e+02  -7.70424578e+02  -4.19819316e+02
  -1.56432037e+03  -1.46401058e+04  -1.47514750e+02  -6.50687349e+02
   1.29133974e+04  -5.33236403e+03  -3.50568691e+02   1.04280207e+04
   4.77455649e+02  -1.31758226e+04   2.03105963e+03   5.74486314e+03
  -4.45479845e+03   1.50911614e+04  -9.65225108e+03   8.15386522e+03
  -1.82814924e+03  -1.25716629e+04   3.72501352e+03   8.43071122e+03
  -2.92310407e+03  -2.35741375e+03   9.72316441e+03  -1.63538885e+04
  -2.80020260e+02   1.50297091e+02   2.21755472e+04  -1.00940447e+04
   4.40720525e+03   6.74705023e+03   1.57143628e+02  -2.70105781e+02
  -1.08095754e+04  -9.07256068e+03   8.66260167e+02   1.42571785e+04
  -1.05396832e+04  -4.55407520e+03  -1.89895434e+04   3.96783729e+03
   2.26347527e+03  -9.34782740e+03  -8.33040507e+03  -1.16722904e+03
  -4.06921874e+03  -1.11025086e+04  -3.67606545e+03   1.10582543e+04
   4.94145417e+02  -3.69447349e+03  -1.18339408e+03  -1.81421574e+03
   3.80781736e+04  -8.11065496e+03  -2.61031541e+04   1.17120682e+04
  -7.90548592e+03  -6.70312860e+03   3.08047608e+03   5.57060820e+02
   4.96089774e+03  -2.91608141e+03  -4.96452341e+03  -1.83163235e+03
  -5.85196933e+03   8.57200871e+02   5.48548444e+03   4.05906956e+03
  -1.03997054e+03   3.32149429e+03   1.84037796e+03  -3.83738852e+03
   4.66957152e+02  -5.30059936e+03   3.28132605e+03  -2.84824884e+03
  -3.62889939e+02   4.71469241e+03   2.25481777e+03   4.22668727e+03
   6.42860403e+03  -4.08451775e+03   2.18682124e+03  -8.91089093e+03
  -1.53658553e+03  -5.04682609e+03  -1.56882697e+03  -5.96755083e+03
   7.16668950e+02   3.11510936e+02  -3.33709709e+03   5.43541284e+03
  -6.77942022e+03   2.10605968e+02   1.66298111e+03   7.79417782e+03
  -1.39530503e+02   6.99465154e+03  -3.50757979e+03  -1.33938502e+02
   5.10700279e+02  -4.02934286e+03   1.16451922e+03   1.13987374e+04
   1.68288777e+04   8.95623678e+02  -1.39569596e+03  -3.12021736e+03
   3.80562429e+03   2.48401788e+03   2.46098834e+03   4.34997296e+03
  -6.88494036e+03  -4.23278092e+02   1.04971143e+03  -7.32519733e+01
   2.30193178e+03   4.16231543e+03   6.80232682e+03  -9.60062002e+03
  -1.30777778e+04   6.23700531e+03  -1.56085185e+03   2.24310939e+04
  -4.99962325e+03   7.39406234e+02  -8.45503311e+03   2.39773753e+03
   6.59558093e+03  -1.17229203e+03   2.75558500e+03   1.84164765e+02
  -1.81711063e+04  -6.93611087e+03   4.28305749e+03  -1.22503166e+04
   1.82647220e+04  -3.54672042e+04  -7.15054507e+02  -1.67638527e+04
   1.28315969e+03  -1.18110110e+03  -2.17810499e+03   8.89420746e+03
   3.86094711e+03   1.52533324e+03  -7.64662467e+03   1.43940242e+04
   6.85966088e+02  -3.59038988e+03  -2.16623124e+03  -7.33610171e+02
   2.01618287e+03  -3.43962120e+03   2.05688862e+04   6.40283527e+00
  -6.91568621e+03  -5.28574938e+03   9.57276294e+03   3.05178380e+03
   8.48751907e+03   1.74009471e+01  -2.36965980e+03  -1.08677190e+03
   1.94974046e+02   2.69794586e+03  -3.87858824e+03   5.41986795e+03
   4.96552017e+02  -1.69013789e+04  -1.39530891e+04   5.76851547e+03
   1.54474439e+04  -4.58658613e+03   1.28805307e+04   5.63442166e+04
   7.13409923e+03   1.07210419e+04   2.37307098e+03   7.05856550e+02
   2.27313687e+03   2.68410344e+03  -6.85249880e+03  -3.62251976e+03
  -8.07693905e+03  -2.73202012e+03  -1.47581808e+03  -2.09220456e+03
  -2.27206732e+03  -1.98714655e+03   1.05686459e+03  -2.17582290e+03
   2.85612719e+02   2.70191491e+03  -5.73578783e+03  -3.30621877e+03
  -2.35265161e+03   8.39953931e+03  -2.19487391e+03   3.02291581e+03
   1.32266314e+03   4.97991783e+03  -8.66046811e+03  -6.45146380e+03
   2.59967511e+03   6.35966247e+03   1.53722352e+03   8.50561416e+02
  -5.20179707e+03  -5.78315087e+03  -1.19178606e+04  -1.60309428e+03
   5.09204057e+03   3.91003925e+03  -2.19660056e+02   3.53319812e+03
  -6.88274513e+03  -2.33740088e+03   7.13366621e+02   7.08262004e+03
  -8.83582256e+02   2.55027331e+03   2.27309011e+03   4.17114726e+03
  -4.73291765e+03  -1.11451224e+04]
number of coefficients 250

Now there are as many dual coefficients as training examples. If we were to perform multi-target learning (i.e. Y has several columns), “A” would be of size [n_samples, n_targets].

The predictor automatically computes the kernel matrix between the training and test examples (unless the PrecomputedKernel option is used, in which case the caller must supply the kernel matrix).
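A similar sanity check can be done for the dual predictor by building the test kernel matrix explicitly and multiplying it with A. A minimal sketch (the function name check_dual_predictor is ours), assuming the prediction is computed exactly as K_test * A with the Gaussian kernel and gamma used above:

import numpy as np

from rlscore.learner import RLS
from rlscore.kernel import GaussianKernel
from housing_data import load_housing

def check_dual_predictor():
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train, kernel="GaussianKernel", regparam=0.0003, gamma=0.00003)
    predictor = learner.predictor
    #Kernel matrix between test and training instances
    kernel = GaussianKernel(X_train, gamma=0.00003)
    K_test = kernel.getKM(X_test)
    P_manual = np.dot(K_test, predictor.A)
    P_rlscore = predictor.predict(X_test)
    #The two should agree up to numerical precision
    print("predictions agree: %s" %np.allclose(P_manual, P_rlscore))

if __name__=="__main__":
    check_dual_predictor()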

Tutorial 3: Reduced set approximation

The reduced set approximation restricts the predictor to the form K_test * A, where the rows of K_test correspond to test instances, columns to basis vectors, and elements to kernel evaluations.

from rlscore.learner.rls import RLS

from housing_data import load_housing
import random
random.seed(55)

def train_rls():
    X_train, Y_train, X_test, Y_test = load_housing()
    #select randomly 20 basis vectors
    indices = range(X_train.shape[0])
    indices = random.sample(indices, 20)
    basis_vectors = X_train[indices]    
    learner = RLS(X_train, Y_train, basis_vectors = basis_vectors, kernel="GaussianKernel", regparam=0.0003, gamma=0.00003)
    #Test set predictions
    P_test = learner.predict(X_test)
    #We can separate the predictor from learner
    predictor = learner.predictor
    #And do the same predictions
    P_test = predictor.predict(X_test)
    #Let's get the coefficients of the predictor
    A = predictor.A
    print("A-coefficients " +str(A))
    print("number of coefficients %d" %len(A))

if __name__=="__main__":
    train_rls()
A-coefficients [  862.63230538   837.61114212    39.17191328   -15.99028084  -249.15000363
   476.52001256   -49.51264111 -5488.71521975    21.71807705   414.03875161
  4868.67259333 -1226.65851169   -10.0901432   -404.30643619  1261.68699787
  -605.84484937  -658.46617574  -462.07959936   122.59860061   405.20656904]
number of coefficients 20

Now the predictor needs to construct only 20 columns of the test kernel matrix, one per basis vector. Again, with multi-target learning, A would be of dimensions [n_bvectors, n_targets].
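The shape of this reduced test kernel matrix can be verified with a sketch analogous to the previous ones (the function name check_reduced_set is ours), assuming the reduced set predictor is exactly K_test * A with kernel evaluations against the 20 basis vectors:

import numpy as np
import random

from rlscore.learner.rls import RLS
from rlscore.kernel import GaussianKernel
from housing_data import load_housing

random.seed(55)

def check_reduced_set():
    X_train, Y_train, X_test, Y_test = load_housing()
    #Select the same 20 random basis vectors as above
    indices = random.sample(range(X_train.shape[0]), 20)
    basis_vectors = X_train[indices]
    learner = RLS(X_train, Y_train, basis_vectors = basis_vectors, kernel="GaussianKernel", regparam=0.0003, gamma=0.00003)
    predictor = learner.predictor
    #Kernel evaluations between test instances and the basis vectors only
    kernel = GaussianKernel(basis_vectors, gamma=0.00003)
    K_test = kernel.getKM(X_test)
    print("test kernel matrix dimensions %d x %d" %K_test.shape)
    P_manual = np.dot(K_test, predictor.A)
    P_rlscore = predictor.predict(X_test)
    print("predictions agree: %s" %np.allclose(P_manual, P_rlscore))

if __name__=="__main__":
    check_reduced_set()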

Tutorial 4: Pairwise predictors

Data set

The pairwise predictors take as input either two data matrices or two kernel matrices, corresponding to two sets of instances. By default, the pairwise models compute predictions for all pairwise combinations of the two sets of instances, in column-major order [(X1[0], X2[0]), (X1[1], X2[0]), …]. Alternatively, one can supply to the predictor two lists of indices, the first corresponding to the rows of X1 / K1 and the second to the rows of X2 / K2; in this case, predictions are computed only for these pairs.
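The column-major ordering can be made concrete with a small index sketch (purely illustrative; the sizes n1 and n2 below are arbitrary): in the all-pairs case, the X1 index varies fastest.

import numpy as np

#Purely illustrative sizes
n1, n2 = 3, 2
#Column-major enumeration of all pairs: the X1 index varies fastest
x1_ind = np.tile(np.arange(n1), n2)    #[0 1 2 0 1 2]
x2_ind = np.repeat(np.arange(n2), n1)  #[0 0 0 1 1 1]
print(list(zip(x1_ind, x2_ind)))
#With these index lists, predictor.predict(X1, X2, x1_ind, x2_ind) should
#return the same values as the all-pairs call predictor.predict(X1, X2),
#assuming the column-major ordering described above.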

For these experiments, we need to download the Davis et al. data from the drug-target binding affinity data sets page: the drug-target interaction affinities (Y), the drug-drug 2D similarities (X1), and the WS target-target similarities (X2). In the following, we use the similarity scores directly as features for the linear kernel, since the similarity matrices themselves are not valid positive semi-definite kernel matrices.

We can load the data set as follows:

import numpy as np

def load_davis():
    Y = np.loadtxt("drug-target_interaction_affinities_Kd__Davis_et_al.2011.txt")
    XD = np.loadtxt("drug-drug_similarities_2D.txt")
    XT = np.loadtxt("target-target_similarities_WS.txt")    
    return XD, XT, Y

def settingB_split():
    np.random.seed(1)
    XD, XT, Y = load_davis()
    drug_ind = list(range(Y.shape[0]))
    np.random.shuffle(drug_ind)
    train_drug_ind = drug_ind[:40]
    test_drug_ind = drug_ind[40:]
    #Setting 2: split according to drugs
    Y_train = Y[train_drug_ind]
    Y_test = Y[test_drug_ind]
    Y_train = Y_train.ravel(order='F')
    Y_test = Y_test.ravel(order='F')
    XD_train = XD[train_drug_ind]
    XT_train = XT
    XD_test = XD[test_drug_ind]
    XT_test = XT
    return XD_train, XT_train, Y_train, XD_test, XT_test, Y_test   

def settingC_split():
    np.random.seed(1)
    XD, XT, Y = load_davis()
    drug_ind = list(range(Y.shape[0]))
    target_ind = list(range(Y.shape[1]))
    np.random.shuffle(target_ind)
    train_target_ind = target_ind[:300]
    test_target_ind = target_ind[300:]
    #Setting 3: split according to targets
    Y_train = Y[:, train_target_ind]
    Y_test = Y[:, test_target_ind]
    Y_train = Y_train.ravel(order='F')
    Y_test = Y_test.ravel(order='F')
    XD_train = XD
    XT_train = XT[train_target_ind]
    XD_test = XD
    XT_test = XT[test_target_ind]
    return XD_train, XT_train, Y_train, XD_test, XT_test, Y_test  

def settingD_split():
    np.random.seed(1)
    XD, XT, Y = load_davis()
    drug_ind = list(range(Y.shape[0]))
    target_ind = list(range(Y.shape[1]))
    np.random.shuffle(drug_ind)
    np.random.shuffle(target_ind)
    train_drug_ind = drug_ind[:40]
    test_drug_ind = drug_ind[40:]
    train_target_ind = target_ind[:300]
    test_target_ind = target_ind[300:]
    #Setting 4: ensure that d,t pairs do not overlap between
    #training and test set
    Y_train = Y[np.ix_(train_drug_ind, train_target_ind)]
    Y_test = Y[np.ix_(test_drug_ind, test_target_ind)]
    Y_train = Y_train.ravel(order='F')
    Y_test = Y_test.ravel(order='F')
    XD_train = XD[train_drug_ind]
    XT_train = XT[train_target_ind]
    XD_test = XD[test_drug_ind]
    XT_test = XT[test_target_ind]
    return XD_train, XT_train, Y_train, XD_test, XT_test, Y_test    

if __name__=="__main__":
    XD, XT, Y = load_davis()
    print("Y dimensions %d %d" %Y.shape)
    print("XD dimensions %d %d" %XD.shape)
    print("XT dimensions %d %d" %XT.shape)
    print("drug-target pairs: %d" %(Y.shape[0]*Y.shape[1]))
Y dimensions 68 442
XD dimensions 68 68
XT dimensions 442 442
drug-target pairs: 30056

Linear pairwise model

In this example, we compute predictions first for all combinations of rows from X1_test and X2_test, and then only for three selected pairs. The number of coefficients in the linear model is the product of the number of features in X1 and the number of features in X2.

from rlscore.learner.kron_rls import KronRLS
import davis_data

def main():
    X1_train, X2_train, Y_train, X1_test, X2_test, Y_test = davis_data.settingD_split()
    learner = KronRLS(X1 = X1_train, X2 = X2_train, Y = Y_train, regparam=2.**30)
    predictor = learner.predictor
    print(predictor.W)
    #Predict labels for all X1_test - X2_test combinations
    #Order: column-major: [(X1[0], X2[0]), (X1[1], X2[0])...]
    P = predictor.predict(X1_test, X2_test)
    print("Number of predictions: %d" %P.shape)
    print("three first predictions: " +str(P[:3]))
    x1_ind = [0,1,2]
    x2_ind = [0,0,0]
    P2 = predictor.predict(X1_test, X2_test, x1_ind, x2_ind)
    print("three first predictions again: " +str(P2))
    print("Number of coefficients %d x %d" %predictor.W.shape)

if __name__=="__main__":
    main()
[[-0.00691324  0.02180716  0.02180716 ..., -0.03287607  0.00731334
   0.03048977]
 [ 0.00394694  0.01871022  0.01871022 ..., -0.01892131  0.00183264
   0.0281447 ]
 [-0.00636608 -0.03658065 -0.03658065 ..., -0.05397157 -0.0318122
   0.00433355]
 ..., 
 [ 0.00762781 -0.04508932 -0.04508932 ..., -0.03190214 -0.00378545
   0.02263159]
 [-0.00062419  0.00640143  0.00640143 ...,  0.00545435  0.00910089
   0.02487114]
 [ 0.01135781  0.00607456  0.00607456 ..., -0.01189411  0.00392702
   0.03315039]]
Number of predictions: 3976
three first predictions: [ 10040.18223234   5696.9725815   10743.33734885]
three first predictions again: [ 10040.18223234   5696.9725815   10743.33734885]
Number of coefficients 68 x 442

Kernel pairwise model

Next, we compute predictions first for all combinations of rows from K1_test and K2_test, and then only for three pairs. The number of coefficients in the dual model is the product of the number of training examples (rows/columns) in K1_train and K2_train. For sparse data, the number of dual coefficients would correspond to the number of training pairs with known labels.

from rlscore.learner import KronRLS
from rlscore.kernel import GaussianKernel
import davis_data

def main():
    X1_train, X2_train, Y_train, X1_test, X2_test, Y_test = davis_data.settingD_split()
    kernel1 = GaussianKernel(X1_train, gamma=0.01)
    kernel2 = GaussianKernel(X2_train, gamma=10**-9)
    K1_train = kernel1.getKM(X1_train)
    K1_test = kernel1.getKM(X1_test)
    K2_train = kernel2.getKM(X2_train)
    K2_test = kernel2.getKM(X2_test)
    learner = KronRLS(K1 = K1_train, K2 = K2_train, Y = Y_train, regparam=2**-5)
    predictor = learner.predictor
    P = predictor.predict(K1_test, K2_test)
    print("Number of predictions: %d" %P.shape)
    print("three first predictions: " +str(P[:3]))
    x1_ind = [0,1,2]
    x2_ind = [0,0,0]
    P2 = predictor.predict(K1_test, K2_test, x1_ind, x2_ind)
    print("three first predictions again: " +str(P2))
    print("Number of coefficients %d" %predictor.A.shape)

if __name__=="__main__":
    main()
Number of predictions: 3976
three first predictions: [ 10807.34978298   6328.36217703  10289.06372652]
three first predictions again: [ 10807.34978298   6328.36217703  10289.06372652]
Number of coefficients 12000

Saving the predictor

When saving a predictor for future use, one can either use pickle or save the model coefficients directly to a text file. Saving the predictor instead of the learner can save disk space and memory, since many of the learners internally store matrix decompositions and other structures needed by the fast cross-validation and regularization algorithms.
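For example, one could pickle a trained predictor roughly as follows (a minimal sketch; the file names and the function name save_and_load_predictor are arbitrary, not part of RLScore):

import pickle

import numpy as np

from rlscore.learner import RLS
from housing_data import load_housing

def save_and_load_predictor():
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train)
    predictor = learner.predictor
    #Save only the predictor, not the learner
    with open("housing_predictor.pkl", "wb") as f:
        pickle.dump(predictor, f)
    #Alternatively, save the coefficients of a linear predictor as text
    np.savetxt("housing_coefficients.txt", predictor.W)
    #Later: load the predictor and make predictions without the learner
    with open("housing_predictor.pkl", "rb") as f:
        predictor = pickle.load(f)
    P_test = predictor.predict(X_test)
    print("number of predictions %d" %len(P_test))

if __name__=="__main__":
    save_and_load_predictor()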