Prediction¶
The RLScore learners make predictions via the learner.predict() function. Usually the user does not need to worry about the internal details of how this is implemented. In this tutorial we look under the hood of RLScore to see how the predictors are implemented.
Tutorial 1: Linear predictor¶
Data set¶
We consider the classical Boston Housing data set from the UCI machine learning repository. The data consists of 506 instances, 13 features and 1 output to be predicted.
The data can be loaded from disk and split into a training set of 250 instances and a test set of 256 instances using the following code.
import numpy as np

def load_housing():
    np.random.seed(1)
    D = np.loadtxt("housing.data")
    np.random.shuffle(D)
    X = D[:,:-1]
    Y = D[:,-1]
    X_train = X[:250]
    Y_train = Y[:250]
    X_test = X[250:]
    Y_test = Y[250:]
    return X_train, Y_train, X_test, Y_test

def print_stats():
    X_train, Y_train, X_test, Y_test = load_housing()
    print("Housing data set characteristics")
    print("Training set: %d instances, %d features" %X_train.shape)
    print("Test set: %d instances, %d features" %X_test.shape)

if __name__ == "__main__":
    print_stats()
Housing data set characteristics
Training set: 250 instances, 13 features
Test set: 256 instances, 13 features
The linear predictor is of the form w.T*x + b. The predictor takes as input an [n_samples, n_features] data matrix and computes the predictions.
The predictor is stored in the “predictor” variable of the trained learner. The “W” and “b” attributes contain the coefficients and the bias term of the learned model.
from rlscore.learner import RLS
from housing_data import load_housing

def train_rls():
    #Trains RLS with default parameters (regparam=1.0, kernel='LinearKernel')
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train)
    #This is how we make predictions
    P_test = learner.predict(X_test)
    #We can separate the predictor from the learner
    predictor = learner.predictor
    #And do the same predictions
    P_test = predictor.predict(X_test)
    #Let's get the coefficients of the predictor
    w = predictor.W
    b = predictor.b
    print("number of coefficients %d" %len(w))
    print("w-coefficients " +str(w))
    print("bias term %f" %b)

if __name__=="__main__":
    train_rls()
number of coefficients 13
w-coefficients [-0.15710164 0.03234827 0.00937111 3.94567546 -2.65055668 5.44180178
-0.01920908 -1.14675642 0.21950548 -0.01116277 -0.6555072 0.01360432
-0.46781757]
bias term 10.682364
Here, we have taken the predictor apart. There are 13 w-coefficients, each corresponding to one feature, and additionally a bias term.
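Because the model is linear, the predictions can be reproduced directly from these coefficients. The following sketch (reusing the load_housing helper from above) checks that X_test*w + b matches predictor.predict; the function name check_linear_predictor is ours, not part of RLScore.

import numpy as np
from rlscore.learner import RLS
from housing_data import load_housing

def check_linear_predictor():
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train)
    predictor = learner.predictor
    #Predictions computed by RLScore
    P_test = predictor.predict(X_test)
    #Predictions computed manually from the coefficients: X*w + b
    P_manual = np.dot(X_test, predictor.W) + predictor.b
    #The two should agree up to numerical precision
    print(np.allclose(P_test, P_manual))

if __name__=="__main__":
    check_linear_predictor()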
Next, let us perform feature selection with the greedy RLS method, selecting the 5 most useful features for the data set.
from rlscore.learner import GreedyRLS
from housing_data import load_housing

def train_rls():
    #Trains GreedyRLS with default parameters, selecting the 5 best features
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = GreedyRLS(X_train, Y_train, 5)
    #This is how we make predictions
    P_test = learner.predict(X_test)
    #We can separate the predictor from the learner
    predictor = learner.predictor
    #And do the same predictions
    P_test = predictor.predict(X_test)
    #Let's get the coefficients of the predictor
    w = predictor.W
    b = predictor.b
    print("number of coefficients %d" %len(w))
    print("w-coefficients " +str(w))
    print("bias term %f" %b)

if __name__=="__main__":
    train_rls()
number of coefficients 13
w-coefficients [ 0. 0. 0. 0. 0. 5.48127826
0. -0.69774095 0. 0. -0.84012061 0.01430101
-0.5867683 ]
bias term 8.308351
Now, all but 5 of the w-coefficients are set to zero by the feature selection process.
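The indices of the selected features can be recovered from the coefficient vector by checking which entries are non-zero. A short sketch (selected_features is our own helper name):

import numpy as np
from rlscore.learner import GreedyRLS
from housing_data import load_housing

def selected_features():
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = GreedyRLS(X_train, Y_train, 5)
    #Features with non-zero coefficients are the ones chosen by greedy RLS
    selected = np.nonzero(learner.predictor.W)[0]
    print("selected feature indices " + str(selected))

if __name__=="__main__":
    selected_features()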
If we performed multi-target learning (i.e. Y has several columns), “W” would be of size [n_features, n_targets] and “b” of length n_targets.
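As an illustration of the multi-target case, we can stack an artificial second target column next to the housing prices and check the resulting coefficient shapes (a sketch; the second target column is purely synthetic):

import numpy as np
from rlscore.learner import RLS
from housing_data import load_housing

def multi_target_shapes():
    X_train, Y_train, X_test, Y_test = load_housing()
    #Artificial two-target problem: the second column is a scaled copy of the first
    Y_multi = np.vstack([Y_train, 2 * Y_train]).T
    learner = RLS(X_train, Y_multi)
    predictor = learner.predictor
    #Expected: W is [n_features, n_targets], b has one entry per target
    print("W shape " + str(predictor.W.shape))
    print("b " + str(predictor.b))

if __name__=="__main__":
    multi_target_shapes()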
Tutorial 2: Non-linear predictor¶
Non-linear predictors are of the form K_test * A, where the rows of K_test correspond to test instances, columns to training instances, and elements to kernel evaluations.
from rlscore.learner import RLS
from housing_data import load_housing

def train_rls():
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train, kernel="GaussianKernel", regparam=0.0003, gamma=0.00003)
    #This is how we make predictions
    P_test = learner.predict(X_test)
    #We can separate the predictor from the learner
    predictor = learner.predictor
    #And do the same predictions
    P_test = predictor.predict(X_test)
    #Let's get the coefficients of the predictor
    A = predictor.A
    print("A-coefficients " +str(A))
    print("number of coefficients %d" %len(A))

if __name__=="__main__":
    train_rls()
A-coefficients [ -3.21848518e+03 -7.18325391e+03 1.68292997e+03 1.88635418e+03
-1.50372867e+04 2.11766278e+03 -5.47728475e+03 -6.03145381e+03
2.11777586e+03 -7.94765717e+03 -5.22478904e+03 4.24563161e+03
5.98455530e+03 7.26288767e+02 -2.89504333e+03 8.16153777e+03
2.50715715e+03 1.09771576e+04 5.97832154e+03 1.14081891e+04
-5.19149505e+03 -1.68006796e+02 -7.70424578e+02 -4.19819316e+02
-1.56432037e+03 -1.46401058e+04 -1.47514750e+02 -6.50687349e+02
1.29133974e+04 -5.33236403e+03 -3.50568691e+02 1.04280207e+04
4.77455649e+02 -1.31758226e+04 2.03105963e+03 5.74486314e+03
-4.45479845e+03 1.50911614e+04 -9.65225108e+03 8.15386522e+03
-1.82814924e+03 -1.25716629e+04 3.72501352e+03 8.43071122e+03
-2.92310407e+03 -2.35741375e+03 9.72316441e+03 -1.63538885e+04
-2.80020260e+02 1.50297091e+02 2.21755472e+04 -1.00940447e+04
4.40720525e+03 6.74705023e+03 1.57143628e+02 -2.70105781e+02
-1.08095754e+04 -9.07256068e+03 8.66260167e+02 1.42571785e+04
-1.05396832e+04 -4.55407520e+03 -1.89895434e+04 3.96783729e+03
2.26347527e+03 -9.34782740e+03 -8.33040507e+03 -1.16722904e+03
-4.06921874e+03 -1.11025086e+04 -3.67606545e+03 1.10582543e+04
4.94145417e+02 -3.69447349e+03 -1.18339408e+03 -1.81421574e+03
3.80781736e+04 -8.11065496e+03 -2.61031541e+04 1.17120682e+04
-7.90548592e+03 -6.70312860e+03 3.08047608e+03 5.57060820e+02
4.96089774e+03 -2.91608141e+03 -4.96452341e+03 -1.83163235e+03
-5.85196933e+03 8.57200871e+02 5.48548444e+03 4.05906956e+03
-1.03997054e+03 3.32149429e+03 1.84037796e+03 -3.83738852e+03
4.66957152e+02 -5.30059936e+03 3.28132605e+03 -2.84824884e+03
-3.62889939e+02 4.71469241e+03 2.25481777e+03 4.22668727e+03
6.42860403e+03 -4.08451775e+03 2.18682124e+03 -8.91089093e+03
-1.53658553e+03 -5.04682609e+03 -1.56882697e+03 -5.96755083e+03
7.16668950e+02 3.11510936e+02 -3.33709709e+03 5.43541284e+03
-6.77942022e+03 2.10605968e+02 1.66298111e+03 7.79417782e+03
-1.39530503e+02 6.99465154e+03 -3.50757979e+03 -1.33938502e+02
5.10700279e+02 -4.02934286e+03 1.16451922e+03 1.13987374e+04
1.68288777e+04 8.95623678e+02 -1.39569596e+03 -3.12021736e+03
3.80562429e+03 2.48401788e+03 2.46098834e+03 4.34997296e+03
-6.88494036e+03 -4.23278092e+02 1.04971143e+03 -7.32519733e+01
2.30193178e+03 4.16231543e+03 6.80232682e+03 -9.60062002e+03
-1.30777778e+04 6.23700531e+03 -1.56085185e+03 2.24310939e+04
-4.99962325e+03 7.39406234e+02 -8.45503311e+03 2.39773753e+03
6.59558093e+03 -1.17229203e+03 2.75558500e+03 1.84164765e+02
-1.81711063e+04 -6.93611087e+03 4.28305749e+03 -1.22503166e+04
1.82647220e+04 -3.54672042e+04 -7.15054507e+02 -1.67638527e+04
1.28315969e+03 -1.18110110e+03 -2.17810499e+03 8.89420746e+03
3.86094711e+03 1.52533324e+03 -7.64662467e+03 1.43940242e+04
6.85966088e+02 -3.59038988e+03 -2.16623124e+03 -7.33610171e+02
2.01618287e+03 -3.43962120e+03 2.05688862e+04 6.40283527e+00
-6.91568621e+03 -5.28574938e+03 9.57276294e+03 3.05178380e+03
8.48751907e+03 1.74009471e+01 -2.36965980e+03 -1.08677190e+03
1.94974046e+02 2.69794586e+03 -3.87858824e+03 5.41986795e+03
4.96552017e+02 -1.69013789e+04 -1.39530891e+04 5.76851547e+03
1.54474439e+04 -4.58658613e+03 1.28805307e+04 5.63442166e+04
7.13409923e+03 1.07210419e+04 2.37307098e+03 7.05856550e+02
2.27313687e+03 2.68410344e+03 -6.85249880e+03 -3.62251976e+03
-8.07693905e+03 -2.73202012e+03 -1.47581808e+03 -2.09220456e+03
-2.27206732e+03 -1.98714655e+03 1.05686459e+03 -2.17582290e+03
2.85612719e+02 2.70191491e+03 -5.73578783e+03 -3.30621877e+03
-2.35265161e+03 8.39953931e+03 -2.19487391e+03 3.02291581e+03
1.32266314e+03 4.97991783e+03 -8.66046811e+03 -6.45146380e+03
2.59967511e+03 6.35966247e+03 1.53722352e+03 8.50561416e+02
-5.20179707e+03 -5.78315087e+03 -1.19178606e+04 -1.60309428e+03
5.09204057e+03 3.91003925e+03 -2.19660056e+02 3.53319812e+03
-6.88274513e+03 -2.33740088e+03 7.13366621e+02 7.08262004e+03
-8.83582256e+02 2.55027331e+03 2.27309011e+03 4.17114726e+03
-4.73291765e+03 -1.11451224e+04]
number of coefficients 250
Now there are as many dual coefficients as training examples. If we performed multi-target learning (i.e. Y has several columns), “A” would be of size [n_samples, n_targets].
The predictor automatically computes the kernel matrix between the training and test examples (unless the PrecomputedKernel option is used, in which case the caller must supply it).
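The relation K_test * A can be verified by hand: if we build the test kernel matrix ourselves with the GaussianKernel class (the same class used in Tutorial 4) and the same gamma, multiplying it with A should reproduce predictor.predict up to numerical precision. A sketch:

import numpy as np
from rlscore.learner import RLS
from rlscore.kernel import GaussianKernel
from housing_data import load_housing

def check_dual_predictor():
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train, kernel="GaussianKernel", regparam=0.0003, gamma=0.00003)
    predictor = learner.predictor
    P_test = predictor.predict(X_test)
    #Test kernel matrix: rows are test instances, columns are training instances
    kernel = GaussianKernel(X_train, gamma=0.00003)
    K_test = kernel.getKM(X_test)
    #K_test * A should match the predictions
    P_manual = np.dot(K_test, predictor.A)
    print(np.allclose(P_test, P_manual))

if __name__=="__main__":
    check_dual_predictor()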
Tutorial 3: Reduced set approximation¶
The reduced set approximation restricts the predictor to the form K_test * A, where the rows of K_test correspond to test instances, the columns to basis vectors, and the elements to kernel evaluations.
from rlscore.learner.rls import RLS
from housing_data import load_housing
import random
random.seed(55)

def train_rls():
    X_train, Y_train, X_test, Y_test = load_housing()
    #select randomly 20 basis vectors
    indices = range(X_train.shape[0])
    indices = random.sample(indices, 20)
    basis_vectors = X_train[indices]
    learner = RLS(X_train, Y_train, basis_vectors = basis_vectors, kernel="GaussianKernel", regparam=0.0003, gamma=0.00003)
    #Test set predictions
    P_test = learner.predict(X_test)
    #We can separate the predictor from the learner
    predictor = learner.predictor
    #And do the same predictions
    P_test = predictor.predict(X_test)
    #Let's get the coefficients of the predictor
    A = predictor.A
    print("A-coefficients " +str(A))
    print("number of coefficients %d" %len(A))

if __name__=="__main__":
    train_rls()
A-coefficients [ 862.63230538 837.61114212 39.17191328 -15.99028084 -249.15000363
476.52001256 -49.51264111 -5488.71521975 21.71807705 414.03875161
4868.67259333 -1226.65851169 -10.0901432 -404.30643619 1261.68699787
-605.84484937 -658.46617574 -462.07959936 122.59860061 405.20656904]
number of coefficients 20
Now the predictor only needs kernel evaluations between the test instances and the 20 basis vectors when constructing the test kernel matrix. Again, with multi-target learning A would be of size [n_bvectors, n_targets].
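The same check as before works for the reduced set predictor: the test kernel matrix is now computed only against the 20 basis vectors. A sketch, assuming the same random selection of basis vectors as above:

import numpy as np
import random
from rlscore.learner.rls import RLS
from rlscore.kernel import GaussianKernel
from housing_data import load_housing

def check_reduced_set():
    random.seed(55)
    X_train, Y_train, X_test, Y_test = load_housing()
    basis_vectors = X_train[random.sample(range(X_train.shape[0]), 20)]
    learner = RLS(X_train, Y_train, basis_vectors = basis_vectors, kernel="GaussianKernel", regparam=0.0003, gamma=0.00003)
    predictor = learner.predictor
    P_test = predictor.predict(X_test)
    #Kernel evaluations between the test instances and the 20 basis vectors only
    kernel = GaussianKernel(basis_vectors, gamma=0.00003)
    K_test = kernel.getKM(X_test)
    P_manual = np.dot(K_test, predictor.A)
    print(np.allclose(P_test, P_manual))

if __name__=="__main__":
    check_reduced_set()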
Tutorial 4: Pairwise predictors¶
Data set¶
The pairwise predictors take as input either two data matrices, or two kernel matrices, corresponding to two sets of instances. By default, the pairwise models compute predictions for all pairwise combinations of the two sets of instances, in column-major order: [(X1[0], X2[0]), (X1[1], X2[0]), ...]. Alternatively, one can supply to the predictor two lists of indices, the first corresponding to the rows of X1 / K1 and the second to the rows of X2 / K2; in this case, predictions are computed only for these pairs.
For these experiments, we need to download, from the drug-target binding affinity data sets page, the Davis et al. data: the drug-target interaction affinities (Y), the drug-drug 2D similarities (X1), and the WS target-target similarities (X2). In the following we use the similarity scores directly as features for the linear kernel, since the similarity matrices themselves are not valid positive semi-definite kernel matrices.
We can load the data set as follows:
import numpy as np

def load_davis():
    Y = np.loadtxt("drug-target_interaction_affinities_Kd__Davis_et_al.2011.txt")
    XD = np.loadtxt("drug-drug_similarities_2D.txt")
    XT = np.loadtxt("target-target_similarities_WS.txt")
    return XD, XT, Y

def settingB_split():
    np.random.seed(1)
    XD, XT, Y = load_davis()
    drug_ind = list(range(Y.shape[0]))
    np.random.shuffle(drug_ind)
    train_drug_ind = drug_ind[:40]
    test_drug_ind = drug_ind[40:]
    #Setting B: split according to drugs
    Y_train = Y[train_drug_ind]
    Y_test = Y[test_drug_ind]
    Y_train = Y_train.ravel(order='F')
    Y_test = Y_test.ravel(order='F')
    XD_train = XD[train_drug_ind]
    XT_train = XT
    XD_test = XD[test_drug_ind]
    XT_test = XT
    return XD_train, XT_train, Y_train, XD_test, XT_test, Y_test

def settingC_split():
    np.random.seed(1)
    XD, XT, Y = load_davis()
    drug_ind = list(range(Y.shape[0]))
    target_ind = list(range(Y.shape[1]))
    np.random.shuffle(target_ind)
    train_target_ind = target_ind[:300]
    test_target_ind = target_ind[300:]
    #Setting C: split according to targets
    Y_train = Y[:, train_target_ind]
    Y_test = Y[:, test_target_ind]
    Y_train = Y_train.ravel(order='F')
    Y_test = Y_test.ravel(order='F')
    XD_train = XD
    XT_train = XT[train_target_ind]
    XD_test = XD
    XT_test = XT[test_target_ind]
    return XD_train, XT_train, Y_train, XD_test, XT_test, Y_test

def settingD_split():
    np.random.seed(1)
    XD, XT, Y = load_davis()
    drug_ind = list(range(Y.shape[0]))
    target_ind = list(range(Y.shape[1]))
    np.random.shuffle(drug_ind)
    np.random.shuffle(target_ind)
    train_drug_ind = drug_ind[:40]
    test_drug_ind = drug_ind[40:]
    train_target_ind = target_ind[:300]
    test_target_ind = target_ind[300:]
    #Setting D: ensure that (d,t) pairs do not overlap between
    #training and test set
    Y_train = Y[np.ix_(train_drug_ind, train_target_ind)]
    Y_test = Y[np.ix_(test_drug_ind, test_target_ind)]
    Y_train = Y_train.ravel(order='F')
    Y_test = Y_test.ravel(order='F')
    XD_train = XD[train_drug_ind]
    XT_train = XT[train_target_ind]
    XD_test = XD[test_drug_ind]
    XT_test = XT[test_target_ind]
    return XD_train, XT_train, Y_train, XD_test, XT_test, Y_test

if __name__=="__main__":
    XD, XT, Y = load_davis()
    print("Y dimensions %d %d" %Y.shape)
    print("XD dimensions %d %d" %XD.shape)
    print("XT dimensions %d %d" %XT.shape)
    print("drug-target pairs: %d" %(Y.shape[0]*Y.shape[1]))
Y dimensions 68 442
XD dimensions 68 68
XT dimensions 442 442
drug-target pairs: 30056
Linear pairwise model¶
In this example, we compute predictions first for all combinations of rows from X1_test and X2_test, and then only for three pairs. The number of coefficients in the linear model is the product of the number of features in X1 and X2.
from rlscore.learner.kron_rls import KronRLS
import davis_data

def main():
    X1_train, X2_train, Y_train, X1_test, X2_test, Y_test = davis_data.settingD_split()
    learner = KronRLS(X1 = X1_train, X2 = X2_train, Y = Y_train, regparam=2.**30)
    predictor = learner.predictor
    print(predictor.W)
    #Predict labels for all X1_test - X2_test combinations
    #Order: column-major: [(X1[0], X2[0]), (X1[1], X2[0])...]
    P = predictor.predict(X1_test, X2_test)
    print("Number of predictions: %d" %P.shape)
    print("three first predictions: " +str(P[:3]))
    x1_ind = [0,1,2]
    x2_ind = [0,0,0]
    P2 = predictor.predict(X1_test, X2_test, x1_ind, x2_ind)
    print("three first predictions again: " +str(P2))
    print("Number of coefficients %d x %d" %predictor.W.shape)

if __name__=="__main__":
    main()
[[-0.00691324 0.02180716 0.02180716 ..., -0.03287607 0.00731334
0.03048977]
[ 0.00394694 0.01871022 0.01871022 ..., -0.01892131 0.00183264
0.0281447 ]
[-0.00636608 -0.03658065 -0.03658065 ..., -0.05397157 -0.0318122
0.00433355]
...,
[ 0.00762781 -0.04508932 -0.04508932 ..., -0.03190214 -0.00378545
0.02263159]
[-0.00062419 0.00640143 0.00640143 ..., 0.00545435 0.00910089
0.02487114]
[ 0.01135781 0.00607456 0.00607456 ..., -0.01189411 0.00392702
0.03315039]]
Number of predictions: 3976
three first predictions: [ 10040.18223234 5696.9725815 10743.33734885]
three first predictions again: [ 10040.18223234 5696.9725815 10743.33734885]
Number of coefficients 68 x 442
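Since W is an [n_features1, n_features2] matrix, the full prediction vector can also be computed by hand as the column-major ravel of X1_test * W * X2_test.T. A sketch verifying this against predictor.predict (check_pairwise_linear is our own helper name):

import numpy as np
from rlscore.learner.kron_rls import KronRLS
import davis_data

def check_pairwise_linear():
    X1_train, X2_train, Y_train, X1_test, X2_test, Y_test = davis_data.settingD_split()
    learner = KronRLS(X1 = X1_train, X2 = X2_train, Y = Y_train, regparam=2.**30)
    predictor = learner.predictor
    P = predictor.predict(X1_test, X2_test)
    #Manual computation of all pairwise predictions, raveled in column-major order
    P_manual = np.dot(np.dot(X1_test, predictor.W), X2_test.T).ravel(order='F')
    print(np.allclose(P, P_manual))

if __name__=="__main__":
    check_pairwise_linear()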
Kernel pairwise model¶
Next, we compute predictions first for all combinations of rows from K1_test and K2_test, and then only for three pairs. The number of coefficients in the dual model is the product of the number of training examples (rows/columns) in K1_train and K2_train. For sparse data, the number of dual coefficients would instead correspond to the number of training pairs with known labels.
from rlscore.learner import KronRLS
from rlscore.kernel import GaussianKernel
import davis_data

def main():
    X1_train, X2_train, Y_train, X1_test, X2_test, Y_test = davis_data.settingD_split()
    kernel1 = GaussianKernel(X1_train, gamma=0.01)
    kernel2 = GaussianKernel(X2_train, gamma=10**-9)
    K1_train = kernel1.getKM(X1_train)
    K1_test = kernel1.getKM(X1_test)
    K2_train = kernel2.getKM(X2_train)
    K2_test = kernel2.getKM(X2_test)
    learner = KronRLS(K1 = K1_train, K2 = K2_train, Y = Y_train, regparam=2**-5)
    predictor = learner.predictor
    P = predictor.predict(K1_test, K2_test)
    print("Number of predictions: %d" %P.shape)
    print("three first predictions: " +str(P[:3]))
    x1_ind = [0,1,2]
    x2_ind = [0,0,0]
    P2 = predictor.predict(K1_test, K2_test, x1_ind, x2_ind)
    print("three first predictions again: " +str(P2))
    print("Number of coefficients %d" %predictor.A.shape)

if __name__=="__main__":
    main()
Number of predictions: 3976
three first predictions: [ 10807.34978298 6328.36217703 10289.06372652]
three first predictions again: [ 10807.34978298 6328.36217703 10289.06372652]
Number of coefficients 12000
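The column-major ordering and the index-list interface can be cross-checked against each other: reshaping the full prediction vector into an [n_test1, n_test2] matrix in column-major order and picking entries should give the same values as predicting with explicit index lists. A sketch (the particular index values are arbitrary):

import numpy as np
from rlscore.learner import KronRLS
from rlscore.kernel import GaussianKernel
import davis_data

def check_pair_indexing():
    X1_train, X2_train, Y_train, X1_test, X2_test, Y_test = davis_data.settingD_split()
    kernel1 = GaussianKernel(X1_train, gamma=0.01)
    kernel2 = GaussianKernel(X2_train, gamma=10**-9)
    learner = KronRLS(K1 = kernel1.getKM(X1_train), K2 = kernel2.getKM(X2_train), Y = Y_train, regparam=2**-5)
    predictor = learner.predictor
    K1_test = kernel1.getKM(X1_test)
    K2_test = kernel2.getKM(X2_test)
    #All pairwise predictions, reshaped into an [n_test1, n_test2] matrix
    P_all = predictor.predict(K1_test, K2_test)
    P_all = P_all.reshape(K1_test.shape[0], K2_test.shape[0], order='F')
    #Predictions for three explicitly listed pairs
    x1_ind = [5, 6, 7]
    x2_ind = [1, 1, 2]
    P_pairs = predictor.predict(K1_test, K2_test, x1_ind, x2_ind)
    #These should pick out the same entries of the full prediction matrix
    print(np.allclose(P_pairs, P_all[x1_ind, x2_ind]))

if __name__=="__main__":
    check_pair_indexing()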
Saving the predictor¶
When saving a predictor for future use, one can either pickle it or save the model coefficients directly to a text file. Saving the predictor instead of the learner saves disk space and memory, since many of the learners internally store matrix decompositions and other data structures needed by the fast cross-validation and regularization algorithms.
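A minimal pickling sketch (the file name housing_predictor.pkl is arbitrary):

import pickle
from rlscore.learner import RLS
from housing_data import load_housing

def save_and_load_predictor():
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train)
    #Save only the predictor, not the whole learner
    with open("housing_predictor.pkl", "wb") as f:
        pickle.dump(learner.predictor, f)
    #Later: load the predictor and make predictions without retraining
    with open("housing_predictor.pkl", "rb") as f:
        predictor = pickle.load(f)
    print("three first predictions: " + str(predictor.predict(X_test)[:3]))

if __name__=="__main__":
    save_and_load_predictor()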