TwoStepRLS - Two-step training method with RLS for pairwise data with complete training set¶
class rlscore.learner.two_step_rls.TwoStepRLS(**kwargs)¶
Bases: rlscore.predictor.pairwise_predictor.PairwisePredictorInterface
Two-step regularized least-squares regression with paired-input (dyadic) data. Closed-form solution for the complete data setting, where the labels of all pairs are known.
Parameters: - X1 : {array-like}, shape = [n_samples1, n_features1]
Data matrix 1 (for linear TwoStepRLS)
- X2 : {array-like}, shape = [n_samples2, n_features2]
Data matrix 2 (for linear TwoStepRLS)
- K1 : {array-like}, shape = [n_samples1, n_samples1]
Kernel matrix 1 (for kernel TwoStepRLS)
- K2 : {array-like}, shape = [n_samples2, n_samples2]
Kernel matrix 2 (for kernel TwoStepRLS)
- Y : {array-like}, shape = [n_samples1*n_samples2]
Training set labels. Label for (X1[i], X2[j]) maps to Y[i + j*n_samples1] (column order).
- regparam1 : float
regularization parameter 1, regparam1 > 0
- regparam2 : float
regularization parameter 2, regparam2 > 0
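The column-order convention for Y can be illustrated with plain numpy (a sketch, not library code):

```python
import numpy as np

m, n = 3, 2  # n_samples1, n_samples2
# Label matrix: entry (i, j) is the label for the pair (X1[i], X2[j]).
Y_matrix = np.arange(m * n).reshape(m, n)

# Column-major (Fortran-order) flattening gives the vector layout
# Y[i + j*n_samples1] described above.
Y = Y_matrix.ravel(order="F")

i, j = 2, 1
assert Y[i + j * m] == Y_matrix[i, j]
```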
Notes
Computational complexity of training: m = n_samples1, n = n_samples2, d = n_features1, e = n_features2
O(mnd + mne) Linear version (assumption: d < m, e < n)
O(m^3 + n^3) Kernel version
TwoStepRLS implements the closed form solution described in [1].
References
[1] Tapio Pahikkala, Michiel Stock, Antti Airola, Tero Aittokallio, Bernard De Baets, and Willem Waegeman. A two-step learning approach for solving full and almost full cold start problems in dyadic prediction. Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2014). Volume 8725 of Lecture Notes in Computer Science, pages 517–532. Springer, 2014.
Attributes: - predictor : {LinearPairwisePredictor, KernelPairwisePredictor}
trained predictor
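The closed form can be sketched in plain numpy (a hedged reading of the two-step solution in [1]; the variable names are illustrative, not the library's internals): dual coefficients A = (K1 + regparam1*I)^-1 Y (K2 + regparam2*I)^-1, with in-sample predictions F = K1 A K2.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 4
X1 = rng.standard_normal((m, 8))
X2 = rng.standard_normal((n, 6))
K1 = X1 @ X1.T  # full-rank kernel matrix for domain 1
K2 = X2 @ X2.T  # full-rank kernel matrix for domain 2
Y = rng.standard_normal((m, n))
lam1, lam2 = 1.0, 1.0

# Step 1: ridge over domain 1; step 2: ridge over domain 2.
A = np.linalg.solve(K1 + lam1 * np.eye(m), Y)   # (K1 + lam1*I)^{-1} Y
A = A @ np.linalg.inv(K2 + lam2 * np.eye(n))    # ... (K2 + lam2*I)^{-1}
F = K1 @ A @ K2                                 # in-sample predictions, m x n

# Sanity check: with full-rank kernels and vanishing regularization,
# the in-sample predictions reproduce Y.
A0 = np.linalg.solve(K1 + 1e-9 * np.eye(m), Y) @ np.linalg.inv(K2 + 1e-9 * np.eye(n))
assert np.allclose(K1 @ A0 @ K2, Y, atol=1e-5)
```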
in_sample_kfoldcv(folds, maxiter=None)¶
Computes the in-sample k-fold cross-validation predictions. By in-sample we denote the setting where a set of arbitrary entries of Y is left out at a time.
Returns: - F : array, shape = [n_samples1*n_samples2]
Cross-validation predictions. The prediction for (X1[i], X2[j]) maps to F[i + j*n_samples1] (column order).
in_sample_loo()¶
Computes the in-sample leave-one-out cross-validation predictions. By in-sample we denote the setting where one entry of Y is left out at a time.
Returns: - F : array, shape = [n_samples1*n_samples2]
Cross-validation predictions. The prediction for (X1[i], X2[j]) maps to F[i + j*n_samples1] (column order).
Notes
Computational complexity:
m = n_samples1, n = n_samples2, d = n_features1, e = n_features2
O(mne + mnd) Linear version (assumption: d < m, e < n)
O(mn^2 + m^2n) Kernel version
in_sample_loo_symmetric()¶
Computes the in-sample leave-one-out cross-validation predictions. By in-sample we denote the setting where one entry of Y is left out at a time; due to (anti-)symmetry, where X1 = X2 (or K1 = K2) and (a,b) = (b,a) (or (a,b) = -(b,a)) in Y, the entry (b,a) is also removed.
Returns: - F : array, shape = [n_samples1*n_samples2]
Cross-validation predictions. The prediction for (X1[i], X2[j]) maps to F[i + j*n_samples1] (column order).
leave_x1_out()¶
Computes the leave-row-out cross-validation predictions. Here, all instances related to a single object from domain 1 are left out together at a time.
Returns: - F : array, shape = [n_samples1*n_samples2]
Cross-validation predictions. The prediction for (X1[i], X2[j]) maps to F[i + j*n_samples1] (column order).
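For reference, the leave-row-out setting can be brute-forced in numpy (a hypothetical sketch under the setting described above, not the library's efficient algorithm): for each domain-1 object, retrain on the remaining rows and predict the held-out row.

```python
import numpy as np

def leave_row_out_brute_force(K1, K2, Y, lam1, lam2):
    """Hold out each row of Y in turn, retrain the two-step model on the
    remaining rows, and predict the held-out row. Requires m full
    retrainings, so this is only useful as a reference implementation."""
    m, n = Y.shape
    C2 = np.linalg.solve(K2 + lam2 * np.eye(n), K2)  # (K2 + lam2*I)^{-1} K2
    F = np.empty_like(Y, dtype=float)
    for a in range(m):
        rest = np.arange(m) != a
        # Step-1 ridge on the reduced kernel matrix (row/column a removed).
        A = np.linalg.solve(K1[np.ix_(rest, rest)] + lam1 * np.eye(m - 1),
                            Y[rest])
        # Predict the held-out row via its kernel evaluations.
        F[a] = K1[a, rest] @ A @ C2
    return F

rng = np.random.default_rng(1)
X1, X2 = rng.standard_normal((6, 3)), rng.standard_normal((4, 3))
K1, K2 = X1 @ X1.T, X2 @ X2.T
Y = rng.standard_normal((6, 4))
F = leave_row_out_brute_force(K1, K2, Y, 1.0, 1.0)
```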
leave_x2_out()¶
Computes the leave-column-out cross-validation predictions. Here, all instances related to a single object from domain 2 are left out together at a time.
Returns: - F : array, shape = [n_samples1*n_samples2]
Cross-validation predictions. The prediction for (X1[i], X2[j]) maps to F[i + j*n_samples1] (column order).
out_of_sample_kfold_cv(rowfolds, colfolds)¶
Computes the out-of-sample cross-validation predictions with the given subsets of rows and columns. By out-of-sample we denote the setting where, when leaving out an entry (a,b) of Y, all instances of type (a,x) and (x,b) are also removed from the training set.
Returns: - F : array, shape = [n_samples1*n_samples2]
Cross-validation predictions. The prediction for (X1[i], X2[j]) maps to F[i + j*n_samples1] (column order).
Notes
Computational complexity [TODO]
out_of_sample_loo()¶
Computes the out-of-sample leave-one-out cross-validation predictions. By out-of-sample we denote the setting where, when leaving out an entry (a,b) of Y, all instances of type (a,x) and (x,b) are also removed from the training set.
Returns: - F : array, shape = [n_samples1*n_samples2]
Cross-validation predictions. The prediction for (X1[i], X2[j]) maps to F[i + j*n_samples1] (column order).
Notes
Computational complexity [TODO: check]:
m = n_samples1, n = n_samples2, d = n_features1, e = n_features2
O(mne + mnd) Linear version (assumption: d < m, e < n)
O(mn^2 + m^2n) Kernel version
out_of_sample_loo_symmetric()¶
Computes the out-of-sample leave-one-out cross-validation predictions. By out-of-sample we denote the setting where, when leaving out an entry (a,b) of Y, all instances of type (a,x) and (x,b) are also removed from the training set; due to (anti-)symmetry, where X1 = X2 (or K1 = K2) and (a,b) = (b,a) (or (a,b) = -(b,a)), all instances of type (x,a) and (b,x) are removed as well.
Returns: - F : array, shape = [n_samples1*n_samples2]
Cross-validation predictions. The prediction for (X1[i], X2[j]) maps to F[i + j*n_samples1] (column order).
predict(X1=None, X2=None, inds_X1pred=None, inds_X2pred=None, pko=None)¶
Computes predictions for test examples.
Parameters: - X1 : {array-like}, shape = [n_samples1, n_features1]
first test data matrix
- X2 : {array-like}, shape = [n_samples2, n_features2]
second test data matrix
- inds_X1pred : array of indices, optional
rows of X1, for which predictions are needed
- inds_X2pred : array of indices, optional
rows of X2, for which predictions are needed
Notes
If using kernels, give the kernel matrices K1 and K2 as arguments instead of X1 and X2.
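For linear TwoStepRLS, predictions for new pairs reduce to a bilinear form. The numpy sketch below is an illustration under stated assumptions, not the library's API: the name W and all helper variables are hypothetical. It builds a primal weight matrix by applying ridge regression first over domain-1 features and then over domain-2 features, and checks it against the linear-kernel dual solution.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, d, e = 6, 5, 3, 2
X1 = rng.standard_normal((m, d))
X2 = rng.standard_normal((n, e))
Y = rng.standard_normal((m, n))
lam1, lam2 = 0.5, 0.7

# Primal two-step solution: ridge over domain-1 features, then domain-2.
W = np.linalg.solve(X1.T @ X1 + lam1 * np.eye(d), X1.T @ Y)
W = W @ X2 @ np.linalg.inv(X2.T @ X2 + lam2 * np.eye(e))  # d x e

# The prediction for a new pair (x1, x2) is the bilinear form x1^T W x2.
x1_new = rng.standard_normal(d)
x2_new = rng.standard_normal(e)
pred = x1_new @ W @ x2_new

# The linear-kernel dual solution gives identical in-sample predictions.
K1, K2 = X1 @ X1.T, X2 @ X2.T
A = np.linalg.solve(K1 + lam1 * np.eye(m), Y)
A = A @ np.linalg.inv(K2 + lam2 * np.eye(n))
assert np.allclose(X1 @ W @ X2.T, K1 @ A @ K2)
```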
solve(regparam1, regparam2)¶
Re-trains TwoStepRLS with the given regularization parameters.
Parameters: - regparam1: float
regularization parameter 1, regparam1 > 0
- regparam2: float
regularization parameter 2, regparam2 > 0
Notes
Computational complexity of re-training:
m = n_samples1, n = n_samples2, d = n_features1, e = n_features2
O(ed^2 + de^2) Linear version (assumption: d < m, e < n)
O(m^3 + n^3) Kernel version
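One way to see why re-training with new regularization parameters can be cheap (a sketch of the presumed idea, not the library's internals): eigendecompose K1 and K2 once, after which each new (regparam1, regparam2) pair costs only an elementwise rescaling in the eigenbasis.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 4
G1 = rng.standard_normal((m, m))
G2 = rng.standard_normal((n, n))
K1, K2 = G1 @ G1.T, G2 @ G2.T  # positive semi-definite kernel matrices
Y = rng.standard_normal((m, n))

# One-off eigendecompositions: O(m^3 + n^3).
s1, V1 = np.linalg.eigh(K1)
s2, V2 = np.linalg.eigh(K2)
Yt = V1.T @ Y @ V2  # Y rotated into the joint eigenbasis

def dual_coefficients(lam1, lam2):
    # Each new (lam1, lam2) costs only an elementwise rescaling plus
    # two matrix products.
    At = Yt / np.outer(s1 + lam1, s2 + lam2)
    return V1 @ At @ V2.T

# Agrees with the direct solve A = (K1+lam1*I)^{-1} Y (K2+lam2*I)^{-1}.
lam1, lam2 = 2.0, 0.5
A_direct = np.linalg.solve(K1 + lam1 * np.eye(m), Y)
A_direct = A_direct @ np.linalg.inv(K2 + lam2 * np.eye(n))
assert np.allclose(dual_coefficients(lam1, lam2), A_direct)
```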
x1_kfold_cv(folds)¶
Computes the k-fold cross-validation predictions over rows. Here, all instances related to the domain 1 objects in a single fold are left out together at a time.
Returns: - F : array, shape = [n_samples1*n_samples2]
Cross-validation predictions. The prediction for (X1[i], X2[j]) maps to F[i + j*n_samples1] (column order).
x2_kfold_cv(folds)¶
Computes the k-fold cross-validation predictions over columns. Here, all instances related to the domain 2 objects in a single fold are left out together at a time.
Returns: - F : array, shape = [n_samples1*n_samples2]
Cross-validation predictions. The prediction for (X1[i], X2[j]) maps to F[i + j*n_samples1] (column order).