RLS - regularized least-squares regression
class rlscore.learner.rls.RLS(X, Y, regparam=1.0, kernel='LinearKernel', basis_vectors=None, **kwargs)

Bases: rlscore.predictor.predictor.PredictorInterface

Regularized least-squares regression/classification.
Parameters: - X : {array-like, sparse matrix}, shape = [n_samples, n_features]
Data matrix
- Y : {array-like}, shape = [n_samples] or [n_samples, n_labels]
Training set labels
- regparam : float, optional
regularization parameter, regparam > 0 (default=1.0)
- kernel : {‘LinearKernel’, ‘GaussianKernel’, ‘PolynomialKernel’, ‘PrecomputedKernel’, …}
kernel function name, imported dynamically from rlscore.kernel
- basis_vectors : {array-like, sparse matrix}, shape = [n_bvectors, n_features], optional
basis vectors (typically a randomly chosen subset of the training data)
Other Parameters: - bias : float, optional
LinearKernel: the model is w*x + bias*w0 (default=1.0)
- gamma : float, optional
GaussianKernel: k(xi,xj) = e^(-gamma*<xi-xj,xi-xj>) (default=1.0)
PolynomialKernel: k(xi,xj) = (gamma * <xi, xj> + coef0)**degree (default=1.0)
- coef0 : float, optional
PolynomialKernel: k(xi,xj) = (gamma * <xi, xj> + coef0)**degree (default=0.)
- degree : int, optional
PolynomialKernel: k(xi,xj) = (gamma * <xi, xj> + coef0)**degree (default=2)
Notes
Computational complexity of training: m = n_samples, d = n_features, l = n_labels, b = n_bvectors
O(m^3 + dm^2 + lm^2): basic case
O(md^2 + lmd): Linear Kernel, d < m
O(mb^2 + lmb): Sparse approximation with basis vectors
Basic information about RLS, and a description of the fast leave-one-out method, can be found in [1]. The efficient K-fold cross-validation algorithm implemented in the method holdout is based on results in [2] and [3]. The leave-pair-out cross-validation algorithm implemented in leave_pair_out is a modification of the method described in [4]; its use for AUC estimation has been analyzed in [5].
References
[1] Ryan Rifkin, Ross Lippert. Notes on Regularized Least Squares. Technical Report, MIT, 2007.
[2] Tapio Pahikkala, Jorma Boberg, and Tapio Salakoski. Fast n-Fold Cross-Validation for Regularized Least-Squares. Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence, 83-90, Otamedia Oy, 2006.
[3] Tapio Pahikkala, Hanna Suominen, and Jorma Boberg. Efficient cross-validation for kernelized least-squares regression with sparse basis expansions. Machine Learning, 87(3):381–407, June 2012.
[4] Tapio Pahikkala, Antti Airola, Jorma Boberg, and Tapio Salakoski. Exact and efficient leave-pair-out cross-validation for ranking RLS. In Proceedings of the 2nd International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR‘08), pages 1-8, Espoo, Finland, 2008.
[5] Antti Airola, Tapio Pahikkala, Willem Waegeman, Bernard De Baets, and Tapio Salakoski. An experimental comparison of cross-validation techniques for estimating the area under the ROC curve. Computational Statistics & Data Analysis, 55(4):1828–1844, April 2011.
Attributes: - predictor: {LinearPredictor, KernelPredictor}
trained predictor
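A minimal usage sketch, assuming a standard installation of the library (the data below is synthetic and purely illustrative): construct the learner from a data matrix and a label vector, then apply the trained predictor to new inputs:

import numpy as np
from rlscore.learner.rls import RLS

# Synthetic regression data, purely for illustration.
np.random.seed(0)
X = np.random.randn(100, 10)
Y = X.dot(np.random.randn(10)) + 0.1 * np.random.randn(100)

# Train with the default linear kernel and a chosen regularization parameter.
learner = RLS(X, Y, regparam=1.0, kernel='LinearKernel')

# Apply the trained predictor to new inputs.
X_new = np.random.randn(5, 10)
P = learner.predict(X_new)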
holdout(indices)

Computes hold-out predictions.
Parameters: - indices : list of indices, shape = [n_hsamples]
list of indices of training examples belonging to the set for which the hold-out predictions are calculated. The list cannot be empty.
Returns: - F : array, shape = [n_hsamples, n_labels]
holdout predictions
Notes
Computational complexity of holdout: m = n_samples, d = n_features, l = n_labels, b = n_bvectors, h = n_hsamples
O(h^3 + lmh): basic case
O(min(h^3 + lh^2, d^3 + ld^2) + ldh): Linear Kernel, d < m
O(min(h^3 + lh^2, b^3 + lb^2) + lbh): Sparse approximation with basis vectors
The fast holdout algorithm is based on results presented in [1,2]. However, the removal of basis vectors described in [2] is currently not implemented.
References
[1] Tapio Pahikkala, Jorma Boberg, and Tapio Salakoski. Fast n-Fold Cross-Validation for Regularized Least-Squares. Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence, 83-90, Otamedia Oy, 2006.
[2] Tapio Pahikkala, Hanna Suominen, and Jorma Boberg. Efficient cross-validation for kernelized least-squares regression with sparse basis expansions. Machine Learning, 87(3):381–407, June 2012.
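As a sketch of how holdout can implement fast K-fold cross-validation (the fold split and the mean squared error measure below are illustrative choices, not part of the API):

import numpy as np
from rlscore.learner.rls import RLS

# Synthetic data for illustration.
np.random.seed(1)
X = np.random.randn(90, 10)
Y = X.dot(np.random.randn(10)) + 0.1 * np.random.randn(90)
learner = RLS(X, Y, regparam=1.0)

# Split the training indices into three disjoint folds and compute
# hold-out predictions for each fold without retraining from scratch.
fold_errors = []
for fold in np.array_split(np.random.permutation(90), 3):
    fold = list(fold)
    F = learner.holdout(fold)  # predictions for the held-out examples
    fold_errors.append(np.mean((Y[fold] - F) ** 2))
print("3-fold CV MSE:", np.mean(fold_errors))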
leave_one_out()

Computes leave-one-out predictions.
Returns: - F : array, shape = [n_samples, n_labels]
leave-one-out predictions
Notes
Computational complexity of leave-one-out: m = n_samples, d = n_features, l = n_labels, b = n_bvectors
O(lm^2): basic case
O(ldm): Linear Kernel, d < m
O(lbm): Sparse approximation with basis vectors
Implements the classical leave-one-out algorithm described for example in [1].
References
[1] Ryan Rifkin, Ross Lippert. Notes on Regularized Least Squares. Technical Report, MIT, 2007.
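A sketch of leave-one-out evaluation on synthetic data (mean squared error used only as an example measure):

import numpy as np
from rlscore.learner.rls import RLS

# Synthetic data for illustration.
np.random.seed(2)
X = np.random.randn(50, 8)
Y = X.dot(np.random.randn(8)) + 0.1 * np.random.randn(50)
learner = RLS(X, Y, regparam=1.0)

# Leave-one-out predictions for all training examples in a single call.
F = learner.leave_one_out()
print("LOO MSE:", np.mean((Y - F) ** 2))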
leave_pair_out(pairs_start_inds, pairs_end_inds)

Computes leave-pair-out predictions.
Parameters: - pairs_start_inds : list of indices, shape = [n_pairs]
list of indices from range [0, n_samples-1]
- pairs_end_inds : list of indices, shape = [n_pairs]
list of indices from range [0, n_samples-1]
Returns: - P1 : array, shape = [n_pairs, n_labels]
holdout predictions for pairs_start_inds
- P2 : array, shape = [n_pairs, n_labels]
holdout predictions for pairs_end_inds
Notes
Computes the leave-pair-out cross-validation predictions, where each (i,j) pair with i = pairs_start_inds[k] and j = pairs_end_inds[k] is left out in turn.
When estimating the area under the ROC curve with leave-pair-out, one should leave out all positive-negative pairs, while for estimating the general ranking error one should leave out all pairs with different labels.
Computational complexity of leave-pair-out with most pairs left out: m = n_samples, d = n_features, l = n_labels, b = n_bvectors
O(lm^2 + m^3): basic case
O(lm^2 + dm^2): Linear Kernel, d < m
O(lm^2 + bm^2): Sparse approximation with basis vectors
The algorithm is an adaptation of the method published originally in [1]. The use of leave-pair-out cross-validation for AUC estimation has been analyzed in [2].
References
[1] Tapio Pahikkala, Antti Airola, Jorma Boberg, and Tapio Salakoski. Exact and efficient leave-pair-out cross-validation for ranking RLS. In Proceedings of the 2nd International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR‘08), pages 1-8, Espoo, Finland, 2008.
[2] Antti Airola, Tapio Pahikkala, Willem Waegeman, Bernard De Baets, and Tapio Salakoski. An experimental comparison of cross-validation techniques for estimating the area under the ROC curve. Computational Statistics & Data Analysis, 55(4):1828–1844, April 2011.
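A sketch of leave-pair-out AUC estimation for a binary problem: all positive-negative index pairs are left out in turn, and the AUC estimate is the fraction of pairs in which the positive example receives the higher prediction, counting ties as 0.5 (the data, labels, and the AUC computation itself are illustrative, not part of the class):

import numpy as np
from rlscore.learner.rls import RLS

# Synthetic binary classification data with labels in {-1, +1}.
np.random.seed(3)
X = np.random.randn(60, 5)
Y = np.where(X[:, 0] + 0.5 * np.random.randn(60) > 0, 1.0, -1.0)
learner = RLS(X, Y, regparam=1.0)

# Enumerate all positive-negative index pairs.
pos = np.where(Y > 0)[0]
neg = np.where(Y < 0)[0]
start_inds = [int(i) for i in pos for _ in neg]
end_inds = [int(j) for _ in pos for j in neg]

P1, P2 = learner.leave_pair_out(start_inds, end_inds)

# AUC estimate: correctly ordered pairs count 1, ties count 0.5.
P1, P2 = np.asarray(P1).ravel(), np.asarray(P2).ravel()
print("LPO AUC estimate:", np.mean((P1 > P2) + 0.5 * (P1 == P2)))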
predict(X)

Predicts outputs for new inputs.
Parameters: - X : {array-like, sparse matrix}, shape = [n_samples, n_features]
input data matrix
Returns: - P : array, shape = [n_samples, n_labels]
predictions
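The same call works unchanged for kernel models; as a sketch, a Gaussian kernel learner with an illustrative gamma value applied to new synthetic inputs:

import numpy as np
from rlscore.learner.rls import RLS

# Synthetic data with a nonlinear target, for illustration.
np.random.seed(4)
X = np.random.randn(80, 6)
Y = np.sin(X[:, 0]) + 0.1 * np.random.randn(80)

# Kernel parameters such as gamma are passed through **kwargs.
learner = RLS(X, Y, regparam=1.0, kernel='GaussianKernel', gamma=0.5)

P = learner.predict(np.random.randn(10, 6))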
solve(regparam=1.0)

Re-trains RLS for the given regparam.
Parameters: - regparam : float, optional
regularization parameter, regparam > 0 (default=1.0)
Notes
Computational complexity of re-training: m = n_samples, d = n_features, l = n_labels, b = n_bvectors
O(lm^2): basic case
O(lmd): Linear Kernel, d < m
O(lmb): Sparse approximation with basis vectors
References
[1] Ryan Rifkin, Ross Lippert. Notes on Regularized Least Squares. Technical Report, MIT, 2007.
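A sketch of the typical use of solve: once the learner has been initialized, re-training for a new regularization parameter is cheap, so a grid of values can be scanned and compared with leave-one-out error (the grid and the error measure below are illustrative):

import numpy as np
from rlscore.learner.rls import RLS

# Synthetic data for illustration.
np.random.seed(5)
X = np.random.randn(100, 12)
Y = X.dot(np.random.randn(12)) + 0.1 * np.random.randn(100)
learner = RLS(X, Y, regparam=1.0)

best_regparam, best_error = None, np.inf
for log_regparam in range(-5, 6):
    regparam = 2.0 ** log_regparam
    learner.solve(regparam)        # cheap re-training for the new value
    F = learner.leave_one_out()    # leave-one-out predictions for this regparam
    error = np.mean((Y - F) ** 2)
    if error < best_error:
        best_regparam, best_error = regparam, error
print("best regparam:", best_regparam, "LOO MSE:", best_error)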