GreedyRLS - greedy regularized least-squares feature selection

class rlscore.learner.greedy_rls.GreedyRLS(X, Y, subsetsize, regparam=1.0, bias=1.0, callbackfun=None, **kwargs)

Bases: rlscore.predictor.predictor.PredictorInterface

Linear time greedy forward selection for RLS.

Performs greedy forward selection: at each step, the selected feature is the one whose addition leads to the lowest leave-one-out mean squared error.
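The selection criterion can be illustrated with a naive reference sketch. This is not the linear-time algorithm implemented by GreedyRLS; it refits a regularized least-squares model for every candidate feature and is only meant to show what is being optimized. The helper names `loo_mse` and `naive_greedy_selection` are hypothetical, and the sketch assumes the bias is handled as a regularized constant column, which may differ from the library's internals.

```python
import numpy as np


def loo_mse(X, y, regparam=1.0, bias=1.0):
    """Leave-one-out mean squared error of ridge regression (RLS) on X."""
    m = X.shape[0]
    # Assumption: the bias is a constant feature, regularized like any other column.
    Xb = np.hstack([X, np.full((m, 1), bias)])
    G = Xb.T @ Xb + regparam * np.eye(Xb.shape[1])
    # Hat matrix H maps labels to fitted values: y_hat = H @ y.
    H = Xb @ np.linalg.solve(G, Xb.T)
    residuals = y - H @ y
    # Closed-form leave-one-out residuals, so no model is refit m times.
    loo_residuals = residuals / (1.0 - np.diag(H))
    return float(np.mean(loo_residuals ** 2))


def naive_greedy_selection(X, y, subsetsize, regparam=1.0, bias=1.0):
    """Pick subsetsize features one at a time by lowest leave-one-out error."""
    selected, performances = [], []
    remaining = list(range(X.shape[1]))
    for _ in range(subsetsize):
        # Score every remaining feature when added to the current subset.
        scores = [loo_mse(X[:, selected + [j]], y, regparam, bias) for j in remaining]
        best = int(np.argmin(scores))
        performances.append(scores[best])
        selected.append(remaining.pop(best))
    return selected, performances


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
# Synthetic data: only features 2 and 5 carry signal.
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + 0.01 * rng.normal(size=60)
selected, performances = naive_greedy_selection(X, y, subsetsize=3)
```

Here `selected` and `performances` mirror the roles of the attributes of the same names documented below: the chosen indices in selection order, and the leave-one-out error after each addition.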

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Data matrix

Y : {array-like}, shape = [n_samples] or [n_samples, n_labels] (if n_labels > 1)

Training set labels

regparam : float (regparam > 0)

regularization parameter

subsetsize : int (0 < subsetsize <= n_features)

number of features to be selected

bias : float, optional

value of constant feature added to each data point (default 1)

Notes

Computational complexity of training: O(mdkl), where m = n_samples, d = n_features, k = subsetsize, l = n_labels.

Greedy RLS is described in [1,2]. The extension of the method to multi-target learning was considered in [3].

References

[1] Tapio Pahikkala, Antti Airola, and Tapio Salakoski. Speeding up Greedy Forward Selection for Regularized Least-Squares. Proceedings of The Ninth International Conference on Machine Learning and Applications, 325-330, IEEE Computer Society, 2010.

[2] Tapio Pahikkala, Sebastian Okser, Antti Airola, Tapio Salakoski, and Tero Aittokallio. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations. Algorithms for Molecular Biology, 7(1):11, 2012.

[3] Pekka Naula, Antti Airola, Tapio Salakoski, and Tapio Pahikkala. Multi-label learning under feature extraction budgets. Pattern Recognition Letters, 40:56–65, April 2014.

Attributes:
selected : list, shape = [subsetsize]

indices of the selected features, in the same order as they were selected

performances : list, shape = [subsetsize]

leave-one-out (mean squared) error after adding each feature

predict(X)

Predicts outputs for new inputs

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

input data matrix

Returns:
P : array, shape = [n_samples, n_labels]

predictions
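As a rough illustration of what prediction with a selected subset involves, the following sketch fits a regularized least-squares model on the selected columns and applies it to full-width inputs. It is an assumption-laden stand-in, not the library's implementation: `fit_on_selected` and `predict_with_selected` are hypothetical helpers, and the bias is again assumed to be a regularized constant column.

```python
import numpy as np


def fit_on_selected(X, Y, selected, regparam=1.0, bias=1.0):
    """Fit RLS weights on the selected columns plus a constant bias feature."""
    Xs = np.hstack([X[:, selected], np.full((X.shape[0], 1), bias)])
    G = Xs.T @ Xs + regparam * np.eye(Xs.shape[1])
    return np.linalg.solve(G, Xs.T @ Y)  # shape = [len(selected) + 1, n_labels]


def predict_with_selected(X_new, W, selected, bias=1.0):
    """Predict from full-width inputs; columns outside `selected` are ignored."""
    Xs = np.hstack([X_new[:, selected], np.full((X_new.shape[0], 1), bias)])
    return Xs @ W


rng = np.random.default_rng(1)
X_train = rng.normal(size=(50, 6))
# Two labels, each depending on a single feature.
Y_train = np.column_stack([X_train[:, 0] + 1.0, 2.0 * X_train[:, 3]])
selected = [0, 3]  # hypothetical result of a feature selection run
W = fit_on_selected(X_train, Y_train, selected)
X_test = rng.normal(size=(10, 6))
P = predict_with_selected(X_test, W, selected)  # shape = [10, 2]
```

The input keeps all n_features columns, matching the documented shape of X, while only the selected columns influence the output.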