Node List

Full API documentation: nodes

class mdp.nodes.PCANode

Filter the input data through the most significant of its principal components.

avg

Mean of the input data (available after training).

v

Transpose of the projection matrix (available after training).

d

Variance corresponding to the PCA components (eigenvalues of the covariance matrix).

explained_variance

When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
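The relation between a fractional output_dim and explained_variance can be illustrated with a minimal sketch. This is not MDP's implementation: the function name components_for_variance and the selection rule (keep the fewest leading components whose cumulative variance reaches the requested fraction) are assumptions made for illustration.

```python
def components_for_variance(eigenvalues, desired_fraction):
    """Return (number of components kept, fraction of variance explained).

    Hypothetical helper: keeps the fewest leading components whose
    cumulative variance reaches desired_fraction of the total.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= desired_fraction:
            return k, cumulative / total
    return len(ordered), 1.0


# Variances (eigenvalues) of four principal components.
eigenvalues = [4.0, 3.0, 2.0, 1.0]
n_components, explained = components_for_variance(eigenvalues, 0.8)
print(n_components, explained)
```

With these values, 80% is requested, three components are kept, and the fraction actually explained is 90%, which is the kind of number explained_variance reports.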


Reference

More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).

Full API documentation: PCANode

class mdp.nodes.WhiteningNode

Whiten the input data by filtering it through the most significant of its principal components.

All output signals have zero mean, unit variance and are decorrelated.

avg

Mean of the input data (available after training).

v

Transpose of the projection matrix (available after training).

d

Variance corresponding to the PCA components (eigenvalues of the covariance matrix).

explained_variance

When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.

Full API documentation: WhiteningNode

class mdp.nodes.NIPALSNode

Perform Principal Component Analysis using the NIPALS algorithm.

This algorithm is particularly useful if you have more variables than observations, or in general when the number of variables is huge and calculating a full covariance matrix may be infeasible. It is also more efficient than the standard PCANode if you expect the number of significant principal components to be small. In this case, setting output_dim to a certain fraction of the total variance, say 90%, may help.
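To show why no covariance matrix is needed, here is a minimal pure-Python sketch of the NIPALS iteration for the first principal component. It is not the code used by NIPALSNode: the function name, the starting column, the convergence test and the division by n - 1 are all assumptions made for illustration.

```python
import math


def nipals_first_component(data, tol=1e-12, max_iter=1000):
    """Return (mean, unit-length loading vector, variance) of the first PC."""
    n, dim = len(data), len(data[0])
    avg = [sum(row[j] for row in data) / n for j in range(dim)]
    x = [[row[j] - avg[j] for j in range(dim)] for row in data]
    # Start from the scores given by the column with the largest variance.
    col = max(range(dim), key=lambda j: sum(r[j] ** 2 for r in x))
    t = [r[col] for r in x]
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: project the data onto the current scores, then normalize.
        p = [sum(x[i][j] * t[i] for i in range(n)) / tt for j in range(dim)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: project the data onto the loadings.
        t_new = [sum(r[j] * p[j] for j in range(dim)) for r in x]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol:
            break
    variance = sum(v * v for v in t) / (n - 1)
    return avg, p, variance


data = [[-2.0, -2.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
        [0.1, -0.1], [-0.1, 0.1]]
avg, p, variance = nipals_first_component(data)
```

Each iteration only touches the data matrix and two vectors, so memory use grows with the number of variables rather than with its square.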

avg

Mean of the input data (available after training).

d

Variance corresponding to the PCA components.

v

Transpose of the projection matrix (available after training).

explained_variance

When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.


Reference

Reference for NIPALS (Nonlinear Iterative Partial Least Squares): Wold, H. Nonlinear estimation by iterative least squares procedures. in David, F. (Editor), Research Papers in Statistics, Wiley, New York, pp 411-444 (1966).

More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).

Original code contributed by: Michael Schmuker, Susanne Lezius, and Farzad Farkhooi (2008).

Full API documentation: NIPALSNode

class mdp.nodes.FastICANode

Perform Independent Component Analysis using the FastICA algorithm.

Note that FastICA is a batch algorithm. This means that it needs all input data before it can start and compute the ICs. The algorithm is given here as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.

FastICA does not support the telescope mode (the convergence criterion is not robust in telescope mode).

History:

  • 1.4.1998 created for Matlab by Jarmo Hurri, Hugo Gavert, Jaakko Sarela, and Aapo Hyvarinen

  • 7.3.2003 modified for Python by Thomas Wendler

  • 3.6.2004 rewritten and adapted for scipy and MDP by MDP’s authors

  • 25.5.2005 now independent from scipy. Requires Numeric or numarray

  • 26.6.2006 converted to numpy

  • 14.9.2007 updated to Matlab version 2.5

white

The whitening node used for preprocessing.

filters

The ICA filters matrix (this is the transpose of the projection matrix after whitening).

convergence

The value of the convergence threshold.

Reference

Aapo Hyvarinen (1999). Fast and Robust Fixed-Point Algorithms for Independent Component Analysis IEEE Transactions on Neural Networks, 10(3):626-634.

Full API documentation: FastICANode

class mdp.nodes.CuBICANode

Perform Independent Component Analysis using the CuBICA algorithm.

Note that CuBICA is a batch algorithm, which means that it needs all input data before it can start and compute the ICs. The algorithm is given here as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.

As an alternative to this batch mode you might consider the telescope mode (see the docs of the __init__ method).

white

The whitening node used for preprocessing.

filters

The ICA filters matrix (this is the transpose of the projection matrix after whitening).

convergence

The value of the convergence threshold.

Reference

Blaschke, T. and Wiskott, L. (2003). CuBICA: Independent Component Analysis by Simultaneous Third- and Fourth-Order Cumulant Diagonalization. IEEE Transactions on Signal Processing, 52(5), pp. 1250-1256.

Full API documentation: CuBICANode

class mdp.nodes.TDSEPNode

Perform Independent Component Analysis using the TDSEP algorithm.

Note

TDSEP, as implemented in this Node, is an online algorithm, i.e. it is suited to training on huge data sets, provided that the data is sent to the node in small chunks.

white

The whitening node used for preprocessing.

filters

The ICA filters matrix (this is the transpose of the projection matrix after whitening).

convergence

The value of the convergence threshold.

Reference

Ziehe, Andreas and Muller, Klaus-Robert (1998). TDSEP an efficient algorithm for blind separation using time structure. in Niklasson, L, Boden, M, and Ziemke, T (Editors), Proc. 8th Int. Conf. Artificial Neural Networks (ICANN 1998).

Full API documentation: TDSEPNode

class mdp.nodes.JADENode

Perform Independent Component Analysis using the JADE algorithm.

Note that JADE is a batch algorithm. This means that it needs all input data before it can start and compute the ICs. The algorithm is given here as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.

JADE does not support the telescope mode.


Reference

Cardoso, Jean-Francois and Souloumiac, Antoine (1993). Blind beamforming for non Gaussian signals. Radar and Signal Processing, IEE Proceedings F, 140(6): 362-370.

Cardoso, Jean-Francois (1999). High-order contrasts for independent component analysis. Neural Computation, 11(1): 157-192.

Original code contributed by: Gabriel Beckers (2008).


History

  • May 2005 version 1.8 for MATLAB released by Jean-Francois Cardoso

  • Dec 2007 MATLAB version 1.8 ported to Python/NumPy by Gabriel Beckers

  • Feb 15 2008 Python/NumPy version adapted for MDP by Gabriel Beckers

Full API documentation: JADENode

class mdp.nodes.SFANode

Extract the slowly varying components from the input data.

avg

Mean of the input data (available after training)

sf

Matrix of the SFA filters (available after training)

d

Delta values corresponding to the SFA components (generalized eigenvalues). [See the docs of the get_eta_values method for more information]

Reference

More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).

Full API documentation: SFANode

class mdp.nodes.SFA2Node

Get an input signal, expand it in the space of inhomogeneous polynomials of degree 2 and extract its slowly varying components.

The get_quadratic_form method returns the input-output function of one of the learned units as a QuadraticForm object. See the documentation of mdp.utils.QuadraticForm for additional information.

Reference:

More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).

Full API documentation: SFA2Node

class mdp.nodes.ISFANode

Perform Independent Slow Feature Analysis on the input data.

RP

The global rotation-permutation matrix. This is the filter applied on input_data to get output_data

RPC

The complete global rotation-permutation matrix. This is a matrix of dimension input_dim x input_dim (the ‘outer space’ is retained)

covs

An mdp.utils.MultipleCovarianceMatrices instance computed from the input data. After convergence, the uppermost output_dim x output_dim submatrices should be almost diagonal. self.covs[n-1] is the covariance matrix relative to the n-th time-lag.

Note

They are not cleared after convergence. If you need to free some memory, you can safely delete them with:

>>> del self.covs

initial_contrast

A dictionary with the starting contrast and the SFA and ICA parts of it.

final_contrast

Like the above but after convergence.

Note

If you intend to use this node for large datasets please have a look at the stop_training method documentation for speeding things up.

Reference

Blaschke, T., Zito, T., and Wiskott, L. (2007). Independent Slow Feature Analysis and Nonlinear Blind Source Separation. Neural Computation 19(4):994-1021. http://itb.biologie.hu-berlin.de/~wiskott/Publications/BlasZitoWisk2007-ISFA-NeurComp.pdf

Full API documentation: ISFANode

class mdp.nodes.XSFANode

Perform Non-linear Blind Source Separation using Slow Feature Analysis. This node is designed to iteratively extract statistically independent sources from (in principle) arbitrary invertible nonlinear mixtures. The method relies on temporal correlations in the sources and consists of a combination of nonlinear SFA and a projection algorithm. More details can be found in the reference given below (once it’s published).

The node has multiple training phases. The number of training phases depends on the number of sources that must be extracted. The recommended way of training this node is through a container flow:

>>> flow = mdp.Flow([XSFANode()])
>>> flow.train(x)

doing so will automatically train all training phases. The argument x to the Flow.train method can be an array or a list of iterables (see the section about Iterators in the MDP tutorial for more info). If the number of training samples is large, you may run into memory problems: use data iterators and chunk training to reduce memory usage.

If you need to debug training and/or execution of this node, the suggested approach is to use the capabilities of BiMDP. For example:

>>> flow = mdp.Flow([XSFANode()])
>>> tr_filename = bimdp.show_training(flow=flow, data_iterators=x)
>>> ex_filename, out = bimdp.show_execution(flow, x=x)

This will run training and execution with bimdp inspection. Snapshots of the internal flow state for each training phase and execution step will be opened in a web browser and presented as a slideshow.


Reference

Sprekeler, H., Zito, T., and Wiskott, L. (2009). An Extension of Slow Feature Analysis for Nonlinear Blind Source Separation. Journal of Machine Learning Research. http://cogprints.org/7056/1/SprekelerZitoWiskott-Cogprints-2010.pdf

Full API documentation: XSFANode

class mdp.nodes.GSFANode

This node implements “Graph-Based SFA (GSFA)”, which is the main component of hierarchical GSFA (HGSFA).

For further information, see: Escalante-B A.-N., Wiskott L, “How to solve classification and regression problems on high-dimensional data with a supervised extension of Slow Feature Analysis”. Journal of Machine Learning Research 14:3683-3719, 2013

Full API documentation: GSFANode

class mdp.nodes.iGSFANode

This node implements “information-preserving graph-based SFA (iGSFA)”, which is the main component of hierarchical iGSFA (HiGSFA).

For further information, see: Escalante-B., A.-N. and Wiskott, L., “Improved graph-based SFA: Information preservation complements the slowness principle”, e-print arXiv:1601.03945, http://arxiv.org/abs/1601.03945, 2017.

Full API documentation: iGSFANode

class mdp.nodes.FDANode

Perform a (generalized) Fisher Discriminant Analysis of its input. It is a supervised node that implements FDA using a generalized eigenvalue approach.

Note

FDANode has two training phases and is supervised so make sure to pay attention to the following points when you train it:

  • call the train method with two arguments: the input data and the labels (see the doc string of the train method for details).

  • if you are training the node by hand, call the train method twice.

  • if you are training the node using a flow (recommended), the only argument to Flow.train must be a list of (data_point, label) tuples or an iterator returning lists of such tuples, not a generator. The Flow.train function can be called just once as usual, since it takes care of rewinding the iterator to perform the second training step.

avg

Mean of the input data (available after training)

v

Transpose of the projection matrix, so that output = dot(input-self.avg, self.v) (available after training).

Reference

More information on Fisher Discriminant Analysis can be found for example in C. Bishop, Neural Networks for Pattern Recognition, Oxford Press, pp. 105-112.

Full API documentation: FDANode

class mdp.nodes.FANode

Perform Factor Analysis.

The current implementation should be most efficient for long data sets: the sufficient statistics are collected in the training phase, and all EM-cycles are performed at its end.

The execute method returns the Maximum A Posteriori estimate of the latent variables. The generate_input method generates observations from the prior distribution.

mu

Mean of the input data (available after training)

A

Generating weights (available after training)

E_y_mtx

Weights for Maximum A Posteriori inference

sigma

Vector of estimated variance of the noise for all input components


Reference

More information about Factor Analysis can be found in Max Welling’s classnotes: http://www.ics.uci.edu/~welling/classnotes/classnotes.html , in the chapter ‘Linear Models’.

Full API documentation: FANode

class mdp.nodes.RBMNode

Restricted Boltzmann Machine node. An RBM is an undirected probabilistic network with binary variables. The graph is bipartite, split into observed (visible) and hidden (latent) variables. By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input. Use the sample_v method to sample from the observed variables given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.


Reference

For more information on RBMs, see Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668

The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1711-1800

w

Generative weights between hidden and observed variables.

bv

Bias vector of the observed variables.

bh

Bias vector of the hidden variables.
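As an illustration of how w, bv and bh enter the model, the following sketch evaluates the textbook RBM formulas for the hidden-unit probabilities and the energy. It is a sketch under assumptions, not the node's code: the function names are hypothetical, and the weight layout (w[i][j] connecting visible unit i to hidden unit j) is assumed.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def hidden_probabilities(v, w, bh):
    """P(h_j = 1 | v) for every hidden unit j (standard RBM formula)."""
    return [sigmoid(bh[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(bh))]


def energy(v, h, w, bv, bh):
    """E(v, h) = -(v.bv + h.bh + v^T w h) for binary vectors v and h."""
    interaction = sum(v[i] * w[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    visible_term = sum(a * b for a, b in zip(v, bv))
    hidden_term = sum(a * b for a, b in zip(h, bh))
    return -(visible_term + hidden_term + interaction)


# Toy model with 3 visible and 2 hidden units (values chosen arbitrarily).
w = [[0.5, -1.0], [0.0, 2.0], [1.5, 0.0]]
bv = [0.0, 0.1, -0.1]
bh = [0.0, -1.0]
v = [1, 0, 1]
probs = hidden_probabilities(v, w, bh)
e = energy(v, [1, 0], w, bv, bh)
```

Configurations with lower energy are more probable under the model, which is why the first hidden unit (strongly excited by this input) has a probability well above 0.5.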

Full API documentation: RBMNode

class mdp.nodes.RBMWithLabelsNode

Restricted Boltzmann Machine with softmax labels. An RBM is an undirected probabilistic network with binary variables. In this case, the node is partitioned into a set of observed (visible) variables, a set of hidden (latent) variables, and a set of label variables (also observed), only one of which is active at any time. The node is able to learn associations between the visible variables and the labels. By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input. Use the sample_v method to sample from the observed variables (visible and labels) given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.


Reference

The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1711-1800

For more information on RBMs with labels, see:

  • Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668.

  • Hinton, G. E, Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.

w

Generative weights between hidden and observed variables.

bv

Bias vector of the observed variables.

bh

Bias vector of the hidden variables.

Full API documentation: RBMWithLabelsNode

class mdp.nodes.GrowingNeuralGasNode

Learn the topological structure of the input data by building a corresponding graph approximation.

The algorithm expands on the original Neural Gas algorithm (see mdp.nodes.NeuralGasNode) in that new nodes are added to the graph as more data becomes available. In this way, if the growth rate is appropriate, one can avoid overfitting or underfitting the data.

graph

The corresponding mdp.graph.Graph object.


Reference

More information about the Growing Neural Gas algorithm can be found in B. Fritzke, A Growing Neural Gas Network Learns Topologies, in G. Tesauro, D. S. Touretzky, and T. K. Leen (editors), Advances in Neural Information Processing Systems 7, pages 625-632. MIT Press, Cambridge MA, 1995.

Full API documentation: GrowingNeuralGasNode

class mdp.nodes.LLENode

Perform a Locally Linear Embedding analysis on the data.

training_projection

The LLE projection of the training data (defined when training finishes).

desired_variance

Variance limit used to compute intrinsic dimensionality.

Based on the algorithm outlined in An Introduction to Locally Linear Embedding by L. Saul and S. Roweis, using improvements suggested in Locally Linear Embedding for Classification by D. deRidder and R.P.W. Duin.


Reference

Roweis, S. and Saul, L., Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500), pp. 2323-2326, 2000.

Original code contributed by: Jake VanderPlas, University of Washington.

Full API documentation: LLENode

class mdp.nodes.HLLENode

Perform a Hessian Locally Linear Embedding analysis on the data.

training_projection

The HLLE projection of the training data (defined when training finishes)

desired_variance

Variance limit used to compute intrinsic dimensionality.


Note

Many methods are inherited from LLENode, including _execute(), _adjust_output_dim(), etc. The main advantage of the Hessian estimator is to limit distortions of the input manifold. Once the model has been trained, it is sufficient (and much less computationally intensive) to determine projections for new points using the LLE framework.


Reference

Implementation based on algorithm outlined in Donoho, D. L., and Grimes, C., Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences 100(10): 5591-5596, 2003.

Original code contributed by: Jake VanderPlas, University of Washington.

Full API documentation: HLLENode

class mdp.nodes.LinearRegressionNode

Compute least-square, multivariate linear regression on the input data, i.e., learn coefficients b_j so that the linear combination y_i = b_0 + b_1 x_1 + ... + b_N x_N, for i = 1 ... M, minimizes the sum of squared errors given the training x’s and y’s.

This is a supervised learning node, and requires input data x and target data y to be supplied during training (see train docstring).
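For a single input variable (N = 1) the least-squares coefficients have a closed form, which the sketch below computes in plain Python. It only illustrates the criterion being minimized; the function name fit_line is hypothetical and the node itself handles the general multivariate case.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b_0 + b_1 * x for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Noise-free data generated from y = 1 + 2 * x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(xs, ys)
```

On noise-free data the fit recovers the generating coefficients, here b_0 = 1 and b_1 = 2.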

beta

The coefficients of the linear regression

Full API documentation: LinearRegressionNode

class mdp.nodes.QuadraticExpansionNode

Perform expansion in the space formed by all linear and quadratic monomials. QuadraticExpansionNode() is equivalent to a PolynomialExpansionNode(2)
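A minimal sketch of what "all linear and quadratic monomials" means for a single sample is given below. The function name is hypothetical, and the ordering of the output monomials is an assumption that may differ from the ordering MDP uses.

```python
from itertools import combinations_with_replacement


def quadratic_expansion(sample):
    """Return all linear and quadratic monomials of one input sample."""
    linear = list(sample)
    # All products x_i * x_j with i <= j, including the squares.
    quadratic = [a * b for a, b in combinations_with_replacement(sample, 2)]
    return linear + quadratic


expanded = quadratic_expansion([2.0, 3.0])
print(expanded)
```

For an n-dimensional input the expansion has n + n(n+1)/2 components, so the output dimension grows quadratically with the input dimension.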

Full API documentation: QuadraticExpansionNode

class mdp.nodes.PolynomialExpansionNode

Perform expansion in a polynomial space.

Full API documentation: PolynomialExpansionNode

class mdp.nodes.RBFExpansionNode

Expand input space with Gaussian Radial Basis Functions (RBFs).

The input data is filtered through a set of unnormalized Gaussian filters, i.e.:

y_j = exp(-0.5/s_j * ||x - c_j||^2)

for isotropic RBFs, or more in general:

y_j = exp(-0.5 * (x-c_j)^T S^-1 (x-c_j))

for anisotropic RBFs.
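The isotropic formula above can be evaluated directly, as in this small sketch. The function name and the argument layout (one center and one size s_j per basis function) are assumptions for illustration and do not mirror the node's constructor.

```python
import math


def isotropic_rbf(x, centers, sizes):
    """Evaluate y_j = exp(-0.5/s_j * ||x - c_j||^2) for every RBF j."""
    outputs = []
    for c, s in zip(centers, sizes):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        outputs.append(math.exp(-0.5 / s * sq_dist))
    return outputs


centers = [[1.0, 0.0], [0.0, 0.0]]
sizes = [1.0, 0.5]
y = isotropic_rbf([1.0, 0.0], centers, sizes)
```

A sample that coincides with a center yields an output of exactly 1 for that RBF, and the output decays with squared distance at a rate set by s_j.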

Full API documentation: RBFExpansionNode

class mdp.nodes.GeneralExpansionNode

Expands the input samples by applying to them one or more functions provided.

The functions to be applied are specified by a list [f_0, …, f_k], where f_i, for 0 <= i <= k, denotes a particular function. The input data given to these functions is a two-dimensional array and the output is another two-dimensional array. The dimensionality of the output should depend only on the dimensionality of the input. Given a two-dimensional input array x, the output of the node is then [f_0(x), …, f_k(x)], that is, the concatenation of each one of the computed arrays f_i(x).

This node has been designed to facilitate nonlinear, fixed but arbitrary transformations of the data samples within MDP flows.

Original code contributed by Alberto Escalante.


Example

>>> import mdp
>>> from mdp import numx
>>> def identity(x): return x
>>> def u3(x): return numx.absolute(x)**3  # a simple nonlinear transformation
>>> def norm2(x):  # computes the norm of each sample, returning an Nx1 array
...     return ((x**2).sum(axis=1)**0.5).reshape((-1,1))
>>> x = numx.array([[-2., 2.], [0.2, 0.3], [0.6, 1.2]])
>>> gen = mdp.nodes.GeneralExpansionNode(funcs=[identity, u3, norm2])
>>> print(gen.execute(x))
[[-2.          2.          8.          8.          2.82842712]
 [ 0.2         0.3         0.008       0.027       0.36055513]
 [ 0.6         1.2         0.216       1.728       1.34164079]]

Full API documentation: GeneralExpansionNode

class mdp.nodes.GrowingNeuralGasExpansionNode

Perform a trainable radial basis expansion, where the centers and sizes of the basis functions are learned through a growing neural gas.

The positions of the RBFs correspond to the positions of the nodes of the neural gas. The sizes of the RBFs correspond to the mean distance to the neighbouring nodes.


Note

Adjust the maximum number of nodes to control the dimension of the expansion.


Reference

More information on this expansion type can be found in: B. Fritzke. Growing cell structures-a self-organizing network for unsupervised and supervised learning. Neural Networks 7, p. 1441–1460 (1994).

Full API documentation: GrowingNeuralGasExpansionNode

class mdp.nodes.NeuralGasNode

Learn the topological structure of the input data by building a corresponding graph approximation (original Neural Gas algorithm).

graph

The corresponding mdp.graph.Graph object.

max_epochs

Maximum number of epochs until which to train.


Reference

The Neural Gas algorithm was originally published in Martinetz, T. and Schulten, K.: A “Neural-Gas” Network Learns Topologies. In Kohonen, T., Maekisara, K., Simula, O., and Kangas, J. (eds.), Artificial Neural Networks. Elsevier, North-Holland., 1991.

Full API documentation: NeuralGasNode

class mdp.nodes.SignumClassifier

This classifier node classifies a data point as 1 if the sum of its components is positive and as -1 if the sum is negative.

Full API documentation: SignumClassifier

class mdp.nodes.PerceptronClassifier

A simple perceptron with input_dim input nodes.

Full API documentation: PerceptronClassifier

class mdp.nodes.SimpleMarkovClassifier

A simple version of a Markov classifier.

It can be trained on a vector of tuples, the label being the next element in the testing data.

Full API documentation: SimpleMarkovClassifier

class mdp.nodes.DiscreteHopfieldClassifier

Node for simulating a simple discrete Hopfield model.

Full API documentation: DiscreteHopfieldClassifier

class mdp.nodes.KMeansClassifier

Employs K-Means Clustering for a given number of centroids.

Full API documentation: KMeansClassifier

class mdp.nodes.NormalizeNode

Make the input signal mean-free with unit variance.

Full API documentation: NormalizeNode

class mdp.nodes.GaussianClassifier

Perform a supervised Gaussian classification.

Given a set of labelled data, the node fits a Gaussian distribution to each class.

Full API documentation: GaussianClassifier

class mdp.nodes.NearestMeanClassifier

Nearest-Mean classifier.

Full API documentation: NearestMeanClassifier

class mdp.nodes.KNNClassifier

K-Nearest-Neighbour Classifier.

Full API documentation: KNNClassifier

class mdp.nodes.EtaComputerNode

Compute the eta values of the normalized training data.

The delta value of a signal is a measure of its temporal variation, and is defined as the mean of the derivative squared, i.e. delta(x) = mean(dx/dt(t)^2). delta(x) is zero if x is a constant signal, and increases if the temporal variation of the signal is bigger.

The eta value is a more intuitive measure of temporal variation, defined as:

eta(x) = T/(2*pi) * sqrt(delta(x))

If x is a signal of length T which consists of a sine function that accomplishes exactly N oscillations, then eta(x)=N.

EtaComputerNode normalizes the training data to have unit variance, such that it is possible to compare the temporal variation of two signals independently from their scaling.
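The definitions above can be checked numerically with a short sketch. It is not the node's implementation: the function name eta_value is hypothetical, time is measured in samples, and the derivative is approximated by a simple finite difference.

```python
import math


def eta_value(signal):
    """Eta value of a 1-D signal, with time measured in samples."""
    t_len = len(signal)
    mean = sum(signal) / t_len
    var = sum((s - mean) ** 2 for s in signal) / t_len
    # Normalize to zero mean and unit variance so that scaling is irrelevant.
    z = [(s - mean) / math.sqrt(var) for s in signal]
    # delta(x): mean of the squared finite-difference derivative.
    diffs = [z[i + 1] - z[i] for i in range(t_len - 1)]
    delta = sum(d * d for d in diffs) / len(diffs)
    return t_len / (2 * math.pi) * math.sqrt(delta)


T, N = 1000, 5
sine = [math.sin(2 * math.pi * N * t / T) for t in range(T)]
eta = eta_value(sine)
eta_scaled = eta_value([10.0 * s for s in sine])
```

For this sine wave with N = 5 oscillations the computed eta is close to 5, and rescaling the signal by a factor of 10 leaves it unchanged, as the normalization intends.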

Note

  • If a data chunk is tlen data points long, this node is going to consider only the first tlen-1 points together with their derivatives. This means in particular that the variance of the signal is not computed on all data points. This behavior is compatible with that of SFANode.

  • This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the method get_eta to access them.


Reference

Wiskott, L. and Sejnowski, T.J. (2002). Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770.

Full API documentation: EtaComputerNode

class mdp.nodes.HitParadeNode

Collect the first n local maxima and minima of the training signal which are separated by a minimum gap d.

This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the get_maxima and get_minima methods to access them.

Full API documentation: HitParadeNode

class mdp.nodes.NoiseNode

Inject multiplicative or additive noise into the input data.

Original code contributed by Mathias Franzius.

Full API documentation: NoiseNode

class mdp.nodes.NormalNoiseNode

Special version of NoiseNode for Gaussian additive noise.

Unlike NoiseNode it does not store a noise function reference but simply uses numx_rand.normal.

Full API documentation: NormalNoiseNode

class mdp.nodes.TimeFramesNode

Copy delayed version of the input signal on the space dimensions.

For example, for time_frames=3 and gap=2:

[ X(1) Y(1)        [ X(1) Y(1) X(3) Y(3) X(5) Y(5)
  X(2) Y(2)          X(2) Y(2) X(4) Y(4) X(6) Y(6)
  X(3) Y(3)   -->    X(3) Y(3) X(5) Y(5) X(7) Y(7)
  X(4) Y(4)          X(4) Y(4) X(6) Y(6) X(8) Y(8)
  X(5) Y(5)          ...  ...  ...  ...  ...  ... ]
  X(6) Y(6)
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]

It is not always possible to invert this transformation (the transformation is not surjective). However, the pseudo_inverse method does the correct thing when it is indeed possible.
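The layout in the diagram can be reproduced with a few lines of plain Python. The helper embed_time_frames is a hypothetical name used only for this sketch; it works on lists of rows rather than on arrays.

```python
def embed_time_frames(x, time_frames, gap=1):
    """Append future samples to each row: [x(t), x(t+gap), x(t+2*gap), ...]."""
    n_rows = len(x) - (time_frames - 1) * gap
    out = []
    for t in range(n_rows):
        row = []
        for k in range(time_frames):
            row.extend(x[t + k * gap])
        out.append(row)
    return out


# Eight samples of a two-dimensional signal, labelled as in the diagram.
x = [["X(%d)" % t, "Y(%d)" % t] for t in range(1, 9)]
frames = embed_time_frames(x, time_frames=3, gap=2)
for row in frames:
    print(" ".join(row))
```

Note that the output has (time_frames - 1) * gap fewer rows than the input, because the last samples have no complete set of future frames.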

Full API documentation: TimeFramesNode

class mdp.nodes.TimeDelayNode

Copy delayed version of the input signal on the space dimensions.

For example, for time_frames=3 and gap=2:

[ X(1) Y(1)        [ X(1) Y(1)   0    0    0    0
  X(2) Y(2)          X(2) Y(2)   0    0    0    0
  X(3) Y(3)   -->    X(3) Y(3) X(1) Y(1)   0    0
  X(4) Y(4)          X(4) Y(4) X(2) Y(2)   0    0
  X(5) Y(5)          X(5) Y(5) X(3) Y(3) X(1) Y(1)
  X(6) Y(6)          ...  ...  ...  ...  ...  ... ]
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]

This node provides functionality similar to that of TimeFramesNode, except that it performs a time embedding into the past rather than into the future.
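The embedding into the past, including the zero padding shown in the diagram, can be sketched as follows. The helper embed_time_delays is a hypothetical name and operates on lists of rows rather than arrays.

```python
def embed_time_delays(x, time_frames, gap=1):
    """Append past samples to each row: [x(t), x(t-gap), x(t-2*gap), ...]."""
    dim = len(x[0])
    out = []
    for t in range(len(x)):
        row = []
        for k in range(time_frames):
            src = t - k * gap
            # Samples before the start of the signal are replaced by zeros.
            row.extend(x[src] if src >= 0 else [0] * dim)
        out.append(row)
    return out


# Two-dimensional signal: X(t) = t, Y(t) = 10 * t for t = 1 ... 8.
x = [[t, 10 * t] for t in range(1, 9)]
delayed = embed_time_delays(x, time_frames=3, gap=2)
```

Unlike the embedding into the future, the output keeps one row per input row, since missing past values are filled with zeros.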

See TimeDelaySlidingWindowNode for a sliding window delay node for application in a non-batch manner.

Original code contributed by Sebastian Hoefer. Dec 31, 2010

Full API documentation: TimeDelayNode

class mdp.nodes.TimeDelaySlidingWindowNode

TimeDelaySlidingWindowNode is an alternative to TimeDelayNode which should be used for online learning/execution. Whereas the TimeDelayNode works in a batch manner, for online application a sliding window is necessary which yields only one row per call.

Applied to the same data the collection of all returned rows of the TimeDelaySlidingWindowNode is equivalent to the result of the TimeDelayNode.

Original code contributed by Sebastian Hoefer. Dec 31, 2010

Full API documentation: TimeDelaySlidingWindowNode

class mdp.nodes.CutoffNode

Node to cut off values at specified bounds.

Works similarly to numpy.clip, but also works when only a lower or upper bound is specified.
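The behaviour with optional bounds can be sketched in plain Python. The function name cutoff and its keyword arguments are hypothetical and do not reflect the node's constructor signature.

```python
def cutoff(values, lower=None, upper=None):
    """Clip values to [lower, upper]; a bound set to None is not applied."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        out.append(v)
    return out


data = [-3.0, -1.0, 0.5, 2.0, 7.0]
only_lower = cutoff(data, lower=0.0)
both = cutoff(data, lower=-1.0, upper=2.0)
```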

Full API documentation: CutoffNode

class mdp.nodes.AdaptiveCutoffNode

Node which uses the data history during training to learn cutoff values.

As opposed to the simple CutoffNode, a different cutoff value is learned for each data coordinate. For example if an upper cutoff fraction of 0.05 is specified, then the upper cutoff bound is set so that the upper 5% of the training data would have been clipped (in each dimension). The cutoff bounds are then applied during execution. This node also works as a HistogramNode, so the histogram data is stored.

When stop_training is called the cutoff values for each coordinate are calculated based on the collected histogram data.
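A minimal sketch of learning per-coordinate upper bounds from stored training data is shown below. The function names and the exact quantile convention (which sample is chosen as the bound) are assumptions for illustration; the node may compute its bounds differently.

```python
def learn_upper_bounds(training_data, upper_fraction):
    """Per-coordinate bound so that about upper_fraction of the data exceeds it."""
    n_samples = len(training_data)
    n_clipped = int(round(n_samples * upper_fraction))
    index = max(n_samples - n_clipped - 1, 0)
    bounds = []
    for column in zip(*training_data):
        bounds.append(sorted(column)[index])
    return bounds


def apply_upper_bounds(sample, bounds):
    """Clip each coordinate of a sample at its learned upper bound."""
    return [min(value, bound) for value, bound in zip(sample, bounds)]


# 100 two-dimensional training samples with different scales per coordinate.
training_data = [[i, 2 * i] for i in range(100)]
bounds = learn_upper_bounds(training_data, 0.05)
clipped = apply_upper_bounds([99, 50], bounds)
```

Because each coordinate gets its own bound, the second coordinate (which has twice the range) is clipped at a value twice as large as the first.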

Full API documentation: AdaptiveCutoffNode

class mdp.nodes.HistogramNode

Node which stores a history of the data during its training phase.

The data history is stored in self.data_hist and can also be deleted to free memory. Alternatively it can be automatically pickled to disk.

Note that data is only stored during training.

Full API documentation: HistogramNode

class mdp.nodes.IdentityNode

Execute returns the input data and the node is not trainable.

This node can be instantiated and is for example useful in complex network layouts.

Full API documentation: IdentityNode

class mdp.nodes.OnlineCenteringNode

OnlineCenteringNode centers the input data, that is, subtracts the arithmetic mean (average) from the input data. This is an online learnable node.

Note

The node’s train method updates the average (avg) according to the update rule:

avg <- (1 / n) * x + (1-1/n) * avg, where n is the total number of samples observed while training.

The node’s execute method subtracts the updated average from the input and returns it.

This node also supports centering via an exponentially weighted moving average that resembles a leaky integrator:

avg <- alpha * x + (1-alpha) * avg, where alpha = 2. / (avg_n + 1).

avg_n intuitively denotes a “window size”. For a large avg_n, ‘avg_n’-samples represent about 86% of the total weight.
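Both update rules, and the 86% figure, can be checked with a short scalar sketch. The function names are hypothetical, and initializing the moving average with the first sample is an assumption of this sketch.

```python
def running_mean(samples):
    """avg <- (1/n) * x + (1 - 1/n) * avg, applied sample by sample."""
    avg = 0.0
    for n, x in enumerate(samples, start=1):
        avg = (1.0 / n) * x + (1.0 - 1.0 / n) * avg
    return avg


def moving_average(samples, avg_n):
    """avg <- alpha * x + (1 - alpha) * avg with alpha = 2 / (avg_n + 1)."""
    alpha = 2.0 / (avg_n + 1)
    avg = samples[0]  # assumption: start from the first sample
    for x in samples[1:]:
        avg = alpha * x + (1.0 - alpha) * avg
    return avg


def recent_weight(avg_n):
    """Total weight carried by the most recent avg_n samples."""
    alpha = 2.0 / (avg_n + 1)
    return 1.0 - (1.0 - alpha) ** avg_n


mean = running_mean([1.0, 2.0, 3.0, 4.0])
smoothed = moving_average([5.0] * 50, avg_n=10)
weight = recent_weight(1000)
```

The first rule reproduces the ordinary arithmetic mean, while the weight of the last avg_n samples approaches 1 - exp(-2), roughly 0.865, for large avg_n.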

avg

The updated average of the input data

Full API documentation: OnlineCenteringNode

class mdp.nodes.OnlineTimeDiffNode

Compute the discrete time derivative of the input using backward difference approximation:

dx(n) = x(n) - x(n-1), where n is the total number of input samples observed during training.

This is an online learnable node that uses a buffer to store the previous input sample = x(n-1). The node’s train method updates the buffer. The node’s execute method returns the time difference using the stored buffer as its previous input sample x(n-1).

This node supports both “incremental” and “batch” training types.


Example

If the training and execute methods are called sample by sample incrementally:

train(x[1]), y[1]=execute(x[1]), train(x[2]), y[2]=execute(x[2]), …,

then:

y[1] = x[1]
y[2] = x[2] - x[1]
y[3] = x[3] - x[2]
…

If training and execute methods are called block by block:

train([x[1], x[2], x[3]]), [y[3], y[4], y[5]] = execute([x[3], x[4], x[5]])

then:

y[3] = x[3] - x[2]
y[4] = x[4] - x[3]
y[5] = x[5] - x[4]

Note that the stored buffer is still x[2]. Only the train() method changes the state of the node. The input data passed to execute is always assumed to start at the get_current_train_iteration() time step.
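The behavior described in this example can be reproduced with a small stand-in class. This is a sketch under stated assumptions, not MDP's implementation: the class name TimeDiffSketch is hypothetical, samples are scalars, and the buffer is assumed to start at zero (which is what y[1] = x[1] implies).

```python
class TimeDiffSketch:
    """Backward-difference node that mirrors the documented behavior."""

    def __init__(self):
        self._last = 0.0    # most recent training sample
        self.buffer = 0.0   # sample before it; used as x(n-1) by execute

    def train(self, block):
        # Only train changes the state of the node.
        for x in block:
            self.buffer = self._last
            self._last = x

    def execute(self, block):
        prev = self.buffer  # local copy: the stored buffer is left untouched
        out = []
        for x in block:
            out.append(x - prev)
            prev = x
        return out

x = [None, 1.0, 4.0, 9.0, 16.0, 25.0]  # x[1]..x[5]; index 0 is unused

# Sample-by-sample use: train and execute alternate.
inc = TimeDiffSketch()
y_inc = []
for n in (1, 2, 3):
    inc.train([x[n]])
    y_inc.extend(inc.execute([x[n]]))

# Block-by-block use: train on x[1..3], then execute on x[3..5].
blk = TimeDiffSketch()
blk.train(x[1:4])
y_blk = blk.execute(x[3:6])
```

After the block-wise call, blk.buffer still holds x[2], matching the note above.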

Full API documentation: OnlineTimeDiffNode

class mdp.nodes.CCIPCANode

Candid Covariance-free Incremental Principal Component Analysis (CCIPCA) extracts the principal components from the input data incrementally.

v

Eigenvectors

d

Eigenvalues


Reference

More information about Candid Covariance-free Incremental Principal Component Analysis can be found in Weng J., Zhang Y. and Hwang W., Candid covariance-free incremental principal component analysis, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, 1034–1040, 2003.

Full API documentation: CCIPCANode

class mdp.nodes.CCIPCAWhiteningNode

Incrementally updates whitening vectors for the input data using CCIPCA.

Candid Covariance-free Incremental Principal Component Analysis (CCIPCA) extracts the principal components from the input data incrementally.

v

Eigenvectors

d

Eigenvalues


Reference

More information about Candid Covariance-free Incremental Principal Component Analysis can be found in Weng J., Zhang Y. and Hwang W., Candid covariance-free incremental principal component analysis, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, 1034–1040, 2003.

Full API documentation: CCIPCAWhiteningNode

class mdp.nodes.MCANode

Minor Component Analysis (MCA) extracts minor components (dual of principal components) from the input data incrementally.

v

Eigenvectors

d

Eigenvalues


Reference

More information about MCA can be found in Peng, D. and Yi, Z, A new algorithm for sequential minor component analysis, International Journal of Computational Intelligence Research, 2(2):207–215, 2006.

Full API documentation: MCANode

class mdp.nodes.IncSFANode

Incremental Slow Feature Analysis (IncSFA) extracts the slowly varying components from the input data incrementally.

sf

Slow feature vectors

wv

Whitening vectors

sf_change

Difference in slow features after update

Reference

More information about IncSFA can be found in Kompella V.R, Luciw M. and Schmidhuber J., Incremental Slow Feature Analysis: Adaptive Low-Complexity Slow Feature Updating from High-Dimensional Input Streams, Neural Computation, 2012.

Full API documentation: IncSFANode

class mdp.nodes.RecursiveExpansionNode

Recursively computable (orthogonal) expansions.

lower

The lower bound of the domain on which the recursion function is defined or orthogonal.

upper

The upper bound of the domain on which the recursion function is defined or orthogonal.
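As an illustration of what "recursively computable" means, the sketch below evaluates Chebyshev polynomials of the first kind with the three-term recursion T[n+1](x) = 2 x T[n](x) - T[n-1](x). The choice of Chebyshev polynomials and the function name chebyshev_expansion are assumptions made for the example; the text above does not state which polynomial families the node provides. These polynomials are orthogonal on [-1, 1], which plays the role of the [lower, upper] domain here.

```python
def chebyshev_expansion(x, degree, lower=-1.0, upper=1.0):
    """Return [T_0(x), ..., T_degree(x)] computed by recursion."""
    if not lower <= x <= upper:
        raise ValueError("x lies outside the domain [lower, upper]")
    values = [1.0, x]  # T_0 and T_1 start the recursion
    for _ in range(2, degree + 1):
        # Each new term needs only the two previous ones.
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]

features = chebyshev_expansion(0.5, 3)
```

Each additional degree costs one multiplication and one subtraction per input value, which is why recursive evaluation is attractive for high-degree expansions.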

Full API documentation: RecursiveExpansionNode

class mdp.nodes.NormalizingRecursiveExpansionNode

Recursively computable (orthogonal) expansions and a trainable transformation to the domain of the expansions.

lower

The lower bound of the domain on which the recursion function is defined or orthogonal.

upper

The upper bound of the domain on which the recursion function is defined or orthogonal.

Full API documentation: NormalizingRecursiveExpansionNode

class mdp.nodes.Convolution2DNode

Convolve input data with filter banks.

Convolution can be selected to be executed by linear filtering of the data, or in the frequency domain using a Discrete Fourier Transform.
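The equivalence of the two approaches can be checked with a short NumPy sketch. It is not the node's implementation: the helper names conv2d_direct and conv2d_fft are hypothetical, a single image and a single filter are used, and the "full" output size is assumed.

```python
import numpy as np

def conv2d_direct(image, kernel):
    """Full 2D convolution by explicit summation (linear filtering)."""
    H, W = image.shape
    h, w = kernel.shape
    out = np.zeros((H + h - 1, W + w - 1))
    for i in range(h):
        for j in range(w):
            # Each kernel entry adds a shifted, weighted copy of the image.
            out[i:i + H, j:j + W] += kernel[i, j] * image
    return out

def conv2d_fft(image, kernel):
    """The same convolution computed in the frequency domain."""
    shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
    # Zero-pad both arrays to the full output size, multiply the spectra,
    # and transform back.
    spectrum = np.fft.rfft2(image, s=shape) * np.fft.rfft2(kernel, s=shape)
    return np.fft.irfft2(spectrum, s=shape)

rng = np.random.RandomState(0)
image = rng.rand(6, 5)
kernel = rng.rand(3, 2)
direct = conv2d_direct(image, kernel)
via_fft = conv2d_fft(image, kernel)
```

Both results agree up to floating-point error. Which route is faster depends on the image and filter sizes; the frequency-domain route tends to pay off for large filters.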

Input data can be given as 3D data, each row being a 2D array to be convolved with the filters, or as 2D data, in which case the input_shape argument must be specified.

This node depends on scipy.

filters

Specifies a set of 2D filters that are convolved with the input data during execution.

Full API documentation: Convolution2DNode

class mdp.nodes.FunctionTransformerScikitsLearnNode

Constructs a transformer from an arbitrary callable. This node has been automatically generated by wrapping the sklearn.preprocessing._function_transformer.FunctionTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. A FunctionTransformer forwards its X (and optionally y) arguments to a user-defined function or function object and returns the result of this function. This is useful for stateless transformations such as taking the log of frequencies, doing custom scaling, etc.

Note: If a lambda is used as the function, then the resulting transformer will not be pickleable.

New in version 0.17.

Read more in the User Guide.

Parameters

func : callable, optional, default=None

The callable to use for the transformation. This will be passed the same arguments as transform, with args and kwargs forwarded. If func is None, then func will be the identity function.

inverse_func : callable, optional, default=None

The callable to use for the inverse transformation. This will be passed the same arguments as inverse transform, with args and kwargs forwarded. If inverse_func is None, then inverse_func will be the identity function.

validate : bool, optional, default=False

Indicate that the input X array should be checked before calling func. The possibilities are:

  • If False, there is no input validation.

  • If True, then X will be converted to a 2-dimensional NumPy array or sparse matrix. If the conversion is not possible an exception is raised.

Changed in version 0.22: The default of validate changed from True to False.

accept_sparse : boolean, optional

Indicate that func accepts a sparse matrix as input. If validate is False, this has no effect. Otherwise, if accept_sparse is false, sparse matrix inputs will cause an exception to be raised.

check_inverse : bool, default=True

Whether to check that func followed by inverse_func leads to the original inputs. It can be used for a sanity check, raising a warning when the condition is not fulfilled.

New in version 0.20.

kw_args : dict, optional

Dictionary of additional keyword arguments to pass to func.

New in version 0.18.

inv_kw_args : dict, optional

Dictionary of additional keyword arguments to pass to inverse_func.

New in version 0.18.

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import FunctionTransformer
>>> transformer = FunctionTransformer(np.log1p)
>>> X = np.array([[0, 1], [2, 3]])
>>> transformer.transform(X)
array([[0.       , 0.6931...],
       [1.0986..., 1.3862...]])

Full API documentation: FunctionTransformerScikitsLearnNode

class mdp.nodes.BinarizerScikitsLearnNode

Binarize data (set feature values to 0 or 1) according to a threshold. This node has been automatically generated by wrapping the sklearn.preprocessing._data.Binarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Values greater than the threshold map to 1, while values less than or equal to the threshold map to 0. With the default threshold of 0, only positive values map to 1.

Binarization is a common operation on text count data where the analyst can decide to only consider the presence or absence of a feature rather than a quantified number of occurrences for instance.

It can also be used as a pre-processing step for estimators that consider boolean random variables (e.g. modelled using the Bernoulli distribution in a Bayesian setting).

Read more in the User Guide.

Parameters

threshold : float, optional (0.0 by default)

Feature values below or equal to this are replaced by 0, above it by 1. Threshold may not be less than 0 for operations on sparse matrices.

copy : boolean, optional, default True

Set to False to perform inplace binarization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).

Examples

>>> from sklearn.preprocessing import Binarizer
>>> X = [[ 1., -1.,  2.],
...      [ 2.,  0.,  0.],
...      [ 0.,  1., -1.]]
>>> transformer = Binarizer().fit(X)  # fit does nothing.
>>> transformer
Binarizer()
>>> transformer.transform(X)
array([[1., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.]])

Notes

If the input is a sparse matrix, only the non-zero values are subject to update by the Binarizer class.

This estimator is stateless (besides constructor parameters), the fit method does nothing but is useful when used in a pipeline.

See also

binarize: Equivalent function without the estimator API.

Full API documentation: BinarizerScikitsLearnNode

class mdp.nodes.KernelCentererScikitsLearnNode

Center a kernel matrix. This node has been automatically generated by wrapping the sklearn.preprocessing._data.KernelCenterer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Let K(x, z) be a kernel defined by phi(x)^T phi(z), where phi is a function mapping x to a Hilbert space. KernelCenterer centers the data (i.e., normalizes it to have zero mean) without explicitly computing phi(x). It is equivalent to centering phi(x) with sklearn.preprocessing.StandardScaler(with_std=False).

Read more in the User Guide.

Attributes

K_fit_rows_ : array, shape (n_samples,)

Average of each column of kernel matrix

K_fit_all_ : float

Average of kernel matrix

Examples

>>> from sklearn.preprocessing import KernelCenterer
>>> from sklearn.metrics.pairwise import pairwise_kernels
>>> X = [[ 1., -2.,  2.],
...      [ -2.,  1.,  3.],
...      [ 4.,  1., -2.]]
>>> K = pairwise_kernels(X, metric='linear')
>>> K
array([[  9.,   2.,  -2.],
       [  2.,  14., -13.],
       [ -2., -13.,  21.]])
>>> transformer = KernelCenterer().fit(K)
>>> transformer
KernelCenterer()
>>> transformer.transform(K)
array([[  5.,   0.,  -5.],
       [  0.,  14., -14.],
       [ -5., -14.,  19.]])
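The centered matrix in the example above can be reproduced with plain NumPy by subtracting the column means and the row means and adding back the overall mean. This is a minimal sketch for the square training kernel only; the function name center_kernel is hypothetical, and centering a kernel computed on new data would reuse the column means and the overall mean stored at fit time.

```python
import numpy as np

def center_kernel(K):
    """Center a square kernel matrix without computing phi(x) explicitly."""
    K = np.asarray(K, dtype=float)
    col_mean = K.mean(axis=0)                   # average of each column
    row_mean = K.mean(axis=1)[:, np.newaxis]    # average of each row
    all_mean = K.mean()                         # average of the whole matrix
    return K - col_mean - row_mean + all_mean

K = np.array([[9., 2., -2.],
              [2., 14., -13.],
              [-2., -13., 21.]])
K_centered = center_kernel(K)
```

Every row and every column of the centered matrix sums to zero, which is the kernel-space counterpart of zero-mean features.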

Full API documentation: KernelCentererScikitsLearnNode

class mdp.nodes.MinMaxScalerScikitsLearnNode

Transform features by scaling each feature to a given range. This node has been automatically generated by wrapping the sklearn.preprocessing._data.MinMaxScaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.

The transformation is given by:

X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min

where min, max = feature_range.
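The formula can be written out directly in NumPy. The sketch below is an illustration with hypothetical helper names (min_max_fit, min_max_transform), not the wrapped estimator; it assumes that no feature is constant, since a constant feature would lead to a division by zero.

```python
import numpy as np

def min_max_fit(X, feature_range=(0.0, 1.0)):
    """Compute the per-feature scale and offset from the training data."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    data_min = X.min(axis=0)
    data_max = X.max(axis=0)
    scale = (hi - lo) / (data_max - data_min)
    offset = lo - data_min * scale
    return scale, offset

def min_max_transform(X, scale, offset):
    """Apply the scaling learned on the training data."""
    return np.asarray(X, dtype=float) * scale + offset

data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scale, offset = min_max_fit(data)
scaled = min_max_transform(data, scale, offset)
outside = min_max_transform([[2, 2]], scale, offset)
```

Because the minimum and maximum come from the training set, new data can fall outside the target range, as the value 1.5 in outside shows.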

This transformation is often used as an alternative to zero mean, unit variance scaling.

Read more in the User Guide.

Parameters

feature_range : tuple (min, max), default=(0, 1)

Desired range of transformed data.

copy : bool, default=True

Set to False to perform inplace scaling and avoid a copy (if the input is already a numpy array).

Attributes

min_ : ndarray of shape (n_features,)

Per feature adjustment for minimum. Equivalent to min - X.min(axis=0) * self.scale_

scale_ : ndarray of shape (n_features,)

Per feature relative scaling of the data. Equivalent to (max - min) / (X.max(axis=0) - X.min(axis=0))

New in version 0.17: scale_ attribute.

data_min_ : ndarray of shape (n_features,)

Per feature minimum seen in the data

New in version 0.17: data_min_

data_max_ : ndarray of shape (n_features,)

Per feature maximum seen in the data

New in version 0.17: data_max_

data_range_ : ndarray of shape (n_features,)

Per feature range (data_max_ - data_min_) seen in the data

New in version 0.17: data_range_

n_samples_seen_ : int

The number of samples processed by the estimator. It will be reset on new calls to fit, but increments across partial_fit calls.

Examples

>>> from sklearn.preprocessing import MinMaxScaler
>>> data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
>>> scaler = MinMaxScaler()
>>> print(scaler.fit(data))
MinMaxScaler()
>>> print(scaler.data_max_)
[ 1. 18.]
>>> print(scaler.transform(data))
[[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [1.   1.  ]]
>>> print(scaler.transform([[2, 2]]))
[[1.5 0. ]]

See also

minmax_scale: Equivalent function without the estimator API.

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.

For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.

Full API documentation: MinMaxScalerScikitsLearnNode

class mdp.nodes.MaxAbsScalerScikitsLearnNode

Scale each feature by its maximum absolute value. This node has been automatically generated by wrapping the sklearn.preprocessing._data.MaxAbsScaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This estimator scales and translates each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity.

This scaler can also be applied to sparse CSR or CSC matrices.

New in version 0.17.

Parameters

copy : boolean, optional, default is True

Set to False to perform inplace scaling and avoid a copy (if the input is already a numpy array).

Attributes

scale_ : ndarray, shape (n_features,)

Per feature relative scaling of the data.

New in version 0.17: scale_ attribute.

max_abs_ : ndarray, shape (n_features,)

Per feature maximum absolute value.

n_samples_seen_ : int

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.

Examples

>>> from sklearn.preprocessing import MaxAbsScaler
>>> X = [[ 1., -1.,  2.],
...      [ 2.,  0.,  0.],
...      [ 0.,  1., -1.]]
>>> transformer = MaxAbsScaler().fit(X)
>>> transformer
MaxAbsScaler()
>>> transformer.transform(X)
array([[ 0.5, -1. ,  1. ],
       [ 1. ,  0. ,  0. ],
       [ 0. ,  1. , -0.5]])

See also

maxabs_scale: Equivalent function without the estimator API.

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.

For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.

Full API documentation: MaxAbsScalerScikitsLearnNode

class mdp.nodes.NormalizerScikitsLearnNode

Normalize samples individually to unit norm. This node has been automatically generated by wrapping the sklearn.preprocessing._data.Normalizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1, l2 or inf) equals one.

This transformer is able to work both with dense numpy arrays and scipy.sparse matrix (use CSR format if you want to avoid the burden of a copy / conversion).

Scaling inputs to unit norms is a common operation for text classification or clustering for instance. For instance the dot product of two l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is the base similarity metric for the Vector Space Model commonly used by the Information Retrieval community.
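The cosine-similarity statement is easy to verify numerically. The following plain-Python sketch uses hypothetical helper names (l2_normalize, dot) and two arbitrary example vectors; it is not part of the wrapped estimator.

```python
import math

def dot(a, b):
    """Dot product of two equally long sequences."""
    return sum(p * q for p, q in zip(a, b))

def l2_normalize(row):
    """Rescale a sample to unit l2 norm; all-zero samples are left as they are."""
    norm = math.sqrt(dot(row, row))
    return [v / norm for v in row] if norm > 0 else list(row)

a = [4.0, 1.0, 2.0, 2.0]
b = [1.0, 3.0, 9.0, 3.0]

# Cosine similarity computed from the raw vectors ...
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
# ... equals the plain dot product of the l2-normalized vectors.
normalized_dot = dot(l2_normalize(a), l2_normalize(b))
```

Normalizing once up front therefore turns every later similarity computation into a simple dot product.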

Read more in the User Guide.

Parameters

norm : ‘l1’, ‘l2’, or ‘max’, optional (‘l2’ by default)

The norm to use to normalize each non zero sample. If norm=’max’ is used, values will be rescaled by the maximum of the absolute values.

copy : boolean, optional, default True

Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).

Examples

>>> from sklearn.preprocessing import Normalizer
>>> X = [[4, 1, 2, 2],
...      [1, 3, 9, 3],
...      [5, 7, 5, 1]]
>>> transformer = Normalizer().fit(X)  # fit does nothing.
>>> transformer
Normalizer()
>>> transformer.transform(X)
array([[0.8, 0.2, 0.4, 0.4],
       [0.1, 0.3, 0.9, 0.3],
       [0.5, 0.7, 0.5, 0.1]])

Notes

This estimator is stateless (besides constructor parameters), the fit method does nothing but is useful when used in a pipeline.

For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.

See also

normalize: Equivalent function without the estimator API.

Full API documentation: NormalizerScikitsLearnNode

class mdp.nodes.RobustScalerScikitsLearnNode

Scale features using statistics that are robust to outliers. This node has been automatically generated by wrapping the sklearn.preprocessing._data.RobustScaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This scaler removes the median and scales the data according to the quantile range (defaults to IQR: interquartile range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Median and interquartile range are then stored to be used on later data using the transform method.
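A minimal NumPy sketch of this per-feature centering and scaling is shown below. The helper names robust_fit and robust_transform are hypothetical, and the sketch assumes dense input with a non-zero quantile range in every feature.

```python
import numpy as np

def robust_fit(X, quantile_range=(25.0, 75.0)):
    """Return the per-feature median and quantile range of the training set."""
    X = np.asarray(X, dtype=float)
    center = np.median(X, axis=0)
    q_min, q_max = np.percentile(X, quantile_range, axis=0)
    return center, q_max - q_min

def robust_transform(X, center, scale):
    """Apply the stored statistics to (possibly new) data."""
    return (np.asarray(X, dtype=float) - center) / scale

X = [[1., -2., 2.],
     [-2., 1., 3.],
     [4., 1., -2.]]
center, scale = robust_fit(X)
X_scaled = robust_transform(X, center, scale)
```

Since the median and the quantile range depend only on the middle of the distribution, a single extreme value in a feature barely changes center and scale.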

Standardization of a dataset is a common requirement for many machine learning estimators. Typically this is done by removing the mean and scaling to unit variance. However, outliers can often influence the sample mean / variance in a negative way. In such cases, the median and the interquartile range often give better results.

New in version 0.17.

Read more in the User Guide.

Parameters

with_centering : boolean, True by default

If True, center the data before scaling. This will cause transform to raise an exception when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

with_scaling : boolean, True by default

If True, scale the data to interquartile range.

quantile_range : tuple (q_min, q_max), 0.0 < q_min < q_max < 100.0

Default: (25.0, 75.0) = (1st quartile, 3rd quartile) = IQR. Quantile range used to calculate scale_.

New in version 0.18.

copy : boolean, optional, default is True

If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.

Attributes

center_ : array of floats

The median value for each feature in the training set.

scale_ : array of floats

The (scaled) interquartile range for each feature in the training set.

New in version 0.17: scale_ attribute.

Examples

>>> from sklearn.preprocessing import RobustScaler
>>> X = [[ 1., -2.,  2.],
...      [ -2.,  1.,  3.],
...      [ 4.,  1., -2.]]
>>> transformer = RobustScaler().fit(X)
>>> transformer
RobustScaler()
>>> transformer.transform(X)
array([[ 0. , -2. ,  0. ],
       [-1. ,  0. ,  0.4],
       [ 1. ,  0. , -1.6]])

See also

robust_scale: Equivalent function without the estimator API.

sklearn.decomposition.PCA

Further removes the linear correlation across features with ‘whiten=True’.

Notes

For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.

https://en.wikipedia.org/wiki/Median

https://en.wikipedia.org/wiki/Interquartile_range

Full API documentation: RobustScalerScikitsLearnNode

class mdp.nodes.StandardScalerScikitsLearnNode

Standardize features by removing the mean and scaling to unit variance. This node has been automatically generated by wrapping the sklearn.preprocessing._data.StandardScaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The standard score of a sample x is calculated as:

z = (x - u) / s

where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
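The standard score can be computed by hand in a few lines of NumPy. The sketch below uses hypothetical helper names (standard_fit, standard_transform) and the population standard deviation (ddof=0); it assumes that every feature has non-zero variance.

```python
import numpy as np

def standard_fit(X):
    """Return the per-feature mean u and standard deviation s."""
    X = np.asarray(X, dtype=float)
    return X.mean(axis=0), X.std(axis=0, ddof=0)

def standard_transform(X, mean, std):
    """Compute z = (x - u) / s with the statistics of the training set."""
    return (np.asarray(X, dtype=float) - mean) / std

data = [[0, 0], [0, 0], [1, 1], [1, 1]]
mean, std = standard_fit(data)
z = standard_transform(data, mean, std)
z_new = standard_transform([[2, 2]], mean, std)
```

The new sample [2, 2] lies three training standard deviations above the training mean, hence the score of 3 in each feature.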

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using transform().

Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).

For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.

This scaler can also be applied to sparse CSR or CSC matrices by passing with_mean=False to avoid breaking the sparsity structure of the data.

Read more in the User Guide.

Parameters

copy : boolean, optional, default True

If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.

with_mean : boolean, True by default

If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

with_std : boolean, True by default

If True, scale the data to unit variance (or equivalently, unit standard deviation).

Attributes

scale_ : ndarray or None, shape (n_features,)

Per feature relative scaling of the data. This is calculated using np.sqrt(var_). Equal to None when with_std=False.

New in version 0.17: scale_

mean_ : ndarray or None, shape (n_features,)

The mean value for each feature in the training set. Equal to None when with_mean=False.

var_ : ndarray or None, shape (n_features,)

The variance for each feature in the training set. Used to compute scale_. Equal to None when with_std=False.

n_samples_seen_ : int or array, shape (n_features,)

The number of samples processed by the estimator for each feature. If there are no missing samples, n_samples_seen_ will be an integer; otherwise it will be an array. Will be reset on new calls to fit, but increments across partial_fit calls.

Examples

>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler()
>>> print(scaler.mean_)
[0.5 0.5]
>>> print(scaler.transform(data))
[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]
>>> print(scaler.transform([[2, 2]]))
[[3. 3.]]

See also

scale: Equivalent function without the estimator API.

sklearn.decomposition.PCA

Further removes the linear correlation across features with ‘whiten=True’.

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.

We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to affect model performance.

For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.

Full API documentation: StandardScalerScikitsLearnNode

class mdp.nodes.QuantileTransformerScikitsLearnNode

Transform features using quantiles information. This node has been automatically generated by wrapping the sklearn.preprocessing._data.QuantileTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This method transforms the features to follow a uniform or a normal distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers: this is therefore a robust preprocessing scheme.

The transformation is applied on each feature independently. First an estimate of the cumulative distribution function of a feature is used to map the original values to a uniform distribution. The obtained values are then mapped to the desired output distribution using the associated quantile function. Feature values of new/unseen data that fall below or above the fitted range will be mapped to the bounds of the output distribution. Note that this transform is non-linear. It may distort linear correlations between variables measured at the same scale but renders variables measured at different scales more directly comparable.
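The first step, mapping a feature to a uniform distribution through its estimated cumulative distribution function, can be sketched for a single feature with NumPy. This is a simplified illustration, not the wrapped estimator: the helper names fit_quantiles and to_uniform are hypothetical, the input is assumed to have no repeated values, and only the uniform output distribution is covered.

```python
import numpy as np

def fit_quantiles(x, n_quantiles=10):
    """Estimate the CDF at n_quantiles evenly spaced landmarks."""
    references = np.linspace(0.0, 1.0, n_quantiles)
    quantiles = np.percentile(x, references * 100.0)
    return quantiles, references

def to_uniform(x, quantiles, references):
    """Map values to [0, 1] by linear interpolation between the landmarks."""
    # np.interp clips values outside the fitted range to the end points.
    return np.interp(x, quantiles, references)

rng = np.random.RandomState(0)
x_train = rng.normal(loc=0.5, scale=0.25, size=200)
quantiles, references = fit_quantiles(x_train)
u = to_uniform(x_train, quantiles, references)
u_out = to_uniform(np.array([-100.0, 100.0]), quantiles, references)
```

Values far outside the fitted range end up at the bounds 0 and 1, which is the behavior described above for unseen data. A normal output distribution would additionally apply the inverse normal CDF to u.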

Read more in the User Guide.

New in version 0.19.

Parameters

n_quantiles : int, optional (default=1000 or n_samples)

Number of quantiles to be computed. It corresponds to the number of landmarks used to discretize the cumulative distribution function. If n_quantiles is larger than the number of samples, n_quantiles is set to the number of samples as a larger number of quantiles does not give a better approximation of the cumulative distribution function estimator.

output_distribution : str, optional (default=’uniform’)

Marginal distribution for the transformed data. The choices are ‘uniform’ (default) or ‘normal’.

ignore_implicit_zeros : bool, optional (default=False)

Only applies to sparse matrices. If True, the sparse entries of the matrix are discarded to compute the quantile statistics. If False, these entries are treated as zeros.

subsample : int, optional (default=1e5)

Maximum number of samples used to estimate the quantiles for computational efficiency. Note that the subsampling procedure may differ for value-identical sparse and dense matrices.

random_state : int, RandomState instance or None, optional (default=None)

Determines random number generation for subsampling and smoothing noise. Please see subsample for more details. Pass an int for reproducible results across multiple function calls. See Glossary

copy : boolean, optional, (default=True)

Set to False to perform inplace transformation and avoid a copy (if the input is already a numpy array).

Attributes

n_quantiles_ : integer

The actual number of quantiles used to discretize the cumulative distribution function.

quantiles_ : ndarray, shape (n_quantiles, n_features)

The values corresponding to the quantiles of reference.

references_ : ndarray, shape (n_quantiles,)

Quantiles of references.

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import QuantileTransformer
>>> rng = np.random.RandomState(0)
>>> X = np.sort(rng.normal(loc=0.5, scale=0.25, size=(25, 1)), axis=0)
>>> qt = QuantileTransformer(n_quantiles=10, random_state=0)
>>> qt.fit_transform(X)
array([...])

See also

quantile_transform: Equivalent function without the estimator API.

PowerTransformer: Perform mapping to a normal distribution using a power transform.

StandardScaler: Perform standardization that is faster, but less robust to outliers.

RobustScaler: Perform robust standardization that removes the influence of outliers but does not put outliers and inliers on the same scale.

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.

For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.

Full API documentation: QuantileTransformerScikitsLearnNode

class mdp.nodes.PowerTransformerScikitsLearnNode

Apply a power transform featurewise to make data more Gaussian-like. This node has been automatically generated by wrapping the sklearn.preprocessing._data.PowerTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like. This is useful for modeling issues related to heteroscedasticity (non-constant variance), or other situations where normality is desired.

Currently, PowerTransformer supports the Box-Cox transform and the Yeo-Johnson transform. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood.

Box-Cox requires input data to be strictly positive, while Yeo-Johnson supports both positive and negative data.
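For reference, the two transforms can be written out for a single value and a given parameter lambda. This plain-Python sketch only evaluates the formulas: the function names box_cox and yeo_johnson are hypothetical, lambda is supplied by hand rather than estimated through maximum likelihood, and the final zero-mean, unit-variance normalization is left out.

```python
import math

def box_cox(x, lmbda):
    """Box-Cox transform; defined only for strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1.0) / lmbda

def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform; defined for positive and negative x."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -(((-x + 1.0) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)

values = [-2.0, -0.5, 0.0, 1.5, 3.0]
yj_identity = [yeo_johnson(v, 1.0) for v in values]  # lambda = 1 leaves values unchanged
bc_log = box_cox(math.e, 0.0)                        # lambda = 0 is the natural logarithm
```

The special cases show how lambda interpolates between familiar transforms: the identity at lambda = 1 for Yeo-Johnson and the logarithm at lambda = 0 for Box-Cox.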

By default, zero-mean, unit-variance normalization is applied to the transformed data.

Read more in the User Guide.

New in version 0.20.

Parameters

method : str, (default=’yeo-johnson’)

The power transform method. Available methods are:

  • ‘yeo-johnson’ [1], works with positive and negative values

  • ‘box-cox’ [2], only works with strictly positive values

standardize : boolean, default=True

Set to True to apply zero-mean, unit-variance normalization to the transformed output.

copy : boolean, optional, default=True

Set to False to perform inplace computation during transformation.

Attributes

lambdas_ : array of float, shape (n_features,)

The parameters of the power transformation for the selected features.

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import PowerTransformer
>>> pt = PowerTransformer()
>>> data = [[1, 2], [3, 2], [4, 5]]
>>> print(pt.fit(data))
PowerTransformer()
>>> print(pt.lambdas_)
[ 1.386... -3.100...]
>>> print(pt.transform(data))
[[-1.316... -0.707...]
 [ 0.209... -0.707...]
 [ 1.106...  1.414...]]

See also

power_transform: Equivalent function without the estimator API.

QuantileTransformer: Maps data to a standard normal distribution with the parameter output_distribution=’normal’.

Notes

NaNs are treated as missing values: disregarded in fit, and maintained in transform.

For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.

References

[1] I.K. Yeo and R.A. Johnson, “A new family of power transformations to improve normality or symmetry.” Biometrika, 87(4), pp.954-959, (2000).

[2] G.E.P. Box and D.R. Cox, “An Analysis of Transformations”, Journal of the Royal Statistical Society B, 26, 211-252 (1964).

Full API documentation: PowerTransformerScikitsLearnNode

class mdp.nodes.PolynomialFeaturesScikitsLearnNode

Generate polynomial and interaction features. This node has been automatically generated by wrapping the sklearn.preprocessing._data.PolynomialFeatures class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].
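The feature ordering [1, a, b, a^2, ab, b^2] can be reproduced for a single sample with the standard library. This is an illustrative sketch with a hypothetical function name (polynomial_features), not the wrapped estimator; it enumerates index combinations degree by degree and multiplies the selected inputs.

```python
from itertools import combinations, combinations_with_replacement
from math import prod

def polynomial_features(sample, degree=2, interaction_only=False, include_bias=True):
    """Return all monomials of the sample up to the given degree."""
    # Interaction features never repeat an input, so plain combinations suffice.
    pick = combinations if interaction_only else combinations_with_replacement
    features = []
    for d in range(0 if include_bias else 1, degree + 1):
        for idx in pick(range(len(sample)), d):
            # The empty combination (d = 0) yields the bias term 1.
            features.append(float(prod(sample[i] for i in idx)))
    return features

full = polynomial_features([2, 3])
interactions = polynomial_features([2, 3], interaction_only=True)
no_bias = polynomial_features([2, 3], include_bias=False)
```

For the sample [2, 3] the full expansion is [1, 2, 3, 4, 6, 9]; with interaction_only=True the squared terms drop out.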

Parameters

degree : integer

The degree of the polynomial features. Default = 2.

interaction_only : boolean, default = False

If true, only interaction features are produced: features that are products of at most degree distinct input features (so not x[1] ** 2, x[0] * x[2] ** 3, etc.).

include_bias : boolean

If True (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones - acts as an intercept term in a linear model).

order : str in {‘C’, ‘F’}, default ‘C’

Order of output array in the dense case. ‘F’ order is faster to compute, but may slow down subsequent estimators.

New in version 0.21.

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
>>> poly = PolynomialFeatures(interaction_only=True)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.],
       [ 1.,  2.,  3.,  6.],
       [ 1.,  4.,  5., 20.]])

Attributes

powers_ : array, shape (n_output_features, n_input_features)

powers_[i, j] is the exponent of the jth input in the ith output.

n_input_features_ : int

The total number of input features.

n_output_features_ : int

The total number of polynomial output features. The number of output features is computed by iterating over all suitably sized combinations of input features.

Notes

Be aware that the number of features in the output array scales polynomially in the number of features of the input array, and exponentially in the degree. High degrees can cause overfitting.

See examples/linear_model/plot_polynomial_interpolation.py

Full API documentation: PolynomialFeaturesScikitsLearnNode

class mdp.nodes.OneHotEncoderScikitsLearnNode

Encode categorical features as a one-hot numeric array. This node has been automatically generated by wrapping the sklearn.preprocessing._encoders.OneHotEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka ‘one-of-K’ or ‘dummy’) encoding scheme. This creates a binary column for each category and returns a sparse matrix or dense array (depending on the sparse parameter).

By default, the encoder derives the categories based on the unique values in each feature. Alternatively, you can also specify the categories manually.

This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels.

Note: a one-hot encoding of y labels should use a LabelBinarizer instead.

Read more in the User Guide.

Changed in version 0.20.

Parameters

categories : ‘auto’ or a list of array-like, default=‘auto’

Categories (unique values) per feature:

  • ‘auto’ : Determine categories automatically from the training data.

  • list : categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values within a single feature, and should be sorted in case of numeric values.

The used categories can be found in the categories_ attribute.

New in version 0.20.

drop : {‘first’, ‘if_binary’} or array-like of shape (n_features,), default=None

Specifies a methodology to use to drop one of the categories per feature. This is useful in situations where perfectly collinear features cause problems, such as when feeding the resulting data into a neural network or an unregularized regression.

However, dropping one category breaks the symmetry of the original representation and can therefore induce a bias in downstream models, for instance for penalized linear classification or regression models.

  • None : retain all features (the default).

  • ‘first’ : drop the first category in each feature. If only one category is present, the feature will be dropped entirely.

  • ‘if_binary’ : drop the first category in each feature with two categories. Features with 1 or more than 2 categories are left intact.

  • array : drop[i] is the category in feature X[:, i] that should be dropped.

sparse : bool, default=True

If True, return a sparse matrix; otherwise return a dense array.

dtype : number type, default=np.float

Desired dtype of output.

handle_unknown : {‘error’, ‘ignore’}, default=‘error’

Whether to raise an error or ignore if an unknown categorical feature is present during transform (default is to raise). When this parameter is set to ‘ignore’ and an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. In the inverse transform, an unknown category will be denoted as None.
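The effect of handle_unknown can be sketched in plain Python. This is only an illustration of the encoding rule described above, not the wrapped scikit-learn code; fit_categories and one_hot are hypothetical helper names, and the output is a list of dense rows rather than a sparse matrix.

```python
def fit_categories(rows):
    """Collect the sorted unique values of every column."""
    return [sorted(set(column)) for column in zip(*rows)]


def one_hot(rows, categories, handle_unknown="error"):
    """Encode each row as concatenated indicator blocks, one block per feature."""
    encoded_rows = []
    for row in rows:
        encoded = []
        for value, cats in zip(row, categories):
            block = [0.0] * len(cats)
            if value in cats:
                block[cats.index(value)] = 1.0
            elif handle_unknown == "error":
                raise ValueError(f"unknown category {value!r}")
            # With handle_unknown="ignore" the block simply stays all zeros.
            encoded.extend(block)
        encoded_rows.append(encoded)
    return encoded_rows


X = [["Male", 1], ["Female", 3], ["Female", 2]]
categories = fit_categories(X)
encoded = one_hot([["Female", 1], ["Male", 4]], categories, handle_unknown="ignore")
```

The unseen value 4 in the second row leaves the three columns of the second feature at zero, which is the behaviour described for ‘ignore’.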

Attributes

categories_ : list of arrays

The categories of each feature determined during fitting (in order of the features in X and corresponding with the output of transform). This includes the category specified in drop (if any).

drop_idx_ : array of shape (n_features,)

  • drop_idx_[i] is the index in categories_[i] of the category to be dropped for each feature.

  • drop_idx_[i] = None if no category is to be dropped from the feature with index i, e.g. when drop=’if_binary’ and the feature isn’t binary.

  • drop_idx_ = None if all the transformed features will be retained.
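How the drop setting could translate into drop_idx_ and into the encoded columns is sketched below. The functions compute_drop_idx and one_hot_with_drop are hypothetical and written only for illustration; they assume categories are already known and ignore sparse output.

```python
def compute_drop_idx(categories, drop=None):
    """Return the per-feature index to drop, or None if nothing is dropped."""
    if drop is None:
        return None
    if drop == "first":
        return [0] * len(categories)
    if drop == "if_binary":
        # Only features with exactly two categories lose a column.
        return [0 if len(cats) == 2 else None for cats in categories]
    # Otherwise treat `drop` as one explicit category per feature.
    return [cats.index(value) for cats, value in zip(categories, drop)]


def one_hot_with_drop(row, categories, drop_idx):
    """One-hot encode a single row, removing the dropped column of each feature."""
    encoded = []
    for i, (value, cats) in enumerate(zip(row, categories)):
        block = [1.0 if value == cat else 0.0 for cat in cats]
        if drop_idx is not None and drop_idx[i] is not None:
            del block[drop_idx[i]]
        encoded.extend(block)
    return encoded


categories = [["Female", "Male"], [1, 2, 3]]
first_idx = compute_drop_idx(categories, "first")
binary_idx = compute_drop_idx(categories, "if_binary")
```

With ‘first’, a sample that takes the first category of every feature is encoded as all zeros, so the dropped category acts as the reference level.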

See Also

sklearn.preprocessing.OrdinalEncoder : Performs an ordinal (integer) encoding of the categorical features.

sklearn.feature_extraction.DictVectorizer : Performs a one-hot encoding of dictionary items (also handles string-valued features).

sklearn.feature_extraction.FeatureHasher : Performs an approximate one-hot encoding of dictionary items or strings.

sklearn.preprocessing.LabelBinarizer : Binarizes labels in a one-vs-all fashion.

sklearn.preprocessing.MultiLabelBinarizer : Transforms between iterable of iterables and a multilabel format, e.g. a (samples x classes) binary matrix indicating the presence of a class label.

Examples

Given a dataset with two features, we let the encoder find the unique values per feature and transform the data to a binary one-hot encoding.

>>> from sklearn.preprocessing import OneHotEncoder

One can discard categories not seen during fit:

>>> enc = OneHotEncoder(handle_unknown='ignore')
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> enc.fit(X)
OneHotEncoder(handle_unknown='ignore')
>>> enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> enc.transform([['Female', 1], ['Male', 4]]).toarray()
array([[1., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.]])
>>> enc.inverse_transform([[0, 1, 1, 0, 0], [0, 0, 0, 1, 0]])
array([['Male', 1],
       [None, 2]], dtype=object)
>>> enc.get_feature_names(['gender', 'group'])
array(['gender_Female', 'gender_Male', 'group_1', 'group_2', 'group_3'],
  dtype=object)

One can always drop the first column for each feature:

>>> drop_enc = OneHotEncoder(drop='first').fit(X)
>>> drop_enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> drop_enc.transform([['Female', 1], ['Male', 2]]).toarray()
array([[0., 0., 0.],
       [1., 1., 0.]])

Or drop a column only for features that have exactly 2 categories:

>>> drop_binary_enc = OneHotEncoder(drop='if_binary').fit(X)
>>> drop_binary_enc.transform([['Female', 1], ['Male', 2]]).toarray()
array([[0., 1., 0., 0.],
       [1., 0., 1., 0.]])

Full API documentation: OneHotEncoderScikitsLearnNode

class mdp.nodes.OrdinalEncoderScikitsLearnNode

Encode categorical features as an integer array. This node has been automatically generated by wrapping the sklearn.preprocessing._encoders.OrdinalEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are converted to ordinal integers. This results in a single column of integers (0 to n_categories - 1) per feature.

Read more in the User Guide.

New in version 0.20.

Parameters

categories : ‘auto’ or a list of array-like, default=‘auto’

Categories (unique values) per feature:

  • ‘auto’ : Determine categories automatically from the training data.

  • list : categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values, and should be sorted in case of numeric values.

The used categories can be found in the categories_ attribute.

dtype : number type, default np.float64

Desired dtype of output.

Attributes

categories_ : list of arrays

The categories of each feature determined during fitting (in order of the features in X and corresponding with the output of transform).

See Also

sklearn.preprocessing.OneHotEncoder : Performs a one-hot encoding of categorical features.

sklearn.preprocessing.LabelEncoder : Encodes target labels with values between 0 and n_classes-1.

Examples

Given a dataset with two features, we let the encoder find the unique values per feature and transform the data to an ordinal encoding.

>>> from sklearn.preprocessing import OrdinalEncoder
>>> enc = OrdinalEncoder()
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> enc.fit(X)
OrdinalEncoder()
>>> enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> enc.transform([['Female', 3], ['Male', 1]])
array([[0., 2.],
       [1., 0.]])
>>> enc.inverse_transform([[1, 0], [0, 1]])
array([['Male', 1],
       ['Female', 2]], dtype=object)
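The mapping used in the example above amounts to an index lookup per feature. A minimal pure-Python sketch follows, assuming the categories have already been determined; ordinal_encode and ordinal_decode are illustrative names, not part of the wrapped API.

```python
def ordinal_encode(rows, categories):
    """Replace each value by its position in the feature's category list."""
    return [
        [float(cats.index(value)) for value, cats in zip(row, categories)]
        for row in rows
    ]


def ordinal_decode(codes, categories):
    """Map integer codes back to the original category values."""
    return [
        [cats[int(code)] for code, cats in zip(row, categories)]
        for row in codes
    ]


categories = [["Female", "Male"], [1, 2, 3]]
codes = ordinal_encode([["Female", 3], ["Male", 1]], categories)
decoded = ordinal_decode([[1, 0], [0, 1]], categories)
```

Because the codes follow the sorted category order, they carry an ordering that downstream models may interpret as meaningful.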

Full API documentation: OrdinalEncoderScikitsLearnNode

class mdp.nodes.LabelBinarizerScikitsLearnNode

Binarize labels in a one-vs-all fashion. This node has been automatically generated by wrapping the sklearn.preprocessing._label.LabelBinarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.

At learning time, this simply consists in learning one regressor or binary classifier per class. In doing so, one needs to convert multi-class labels to binary labels (belongs or does not belong to the class). LabelBinarizer makes this process easy with the transform method.

At prediction time, one assigns the class for which the corresponding model gave the greatest confidence. LabelBinarizer makes this easy with the inverse_transform method.
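The two directions of the one-vs-all scheme can be sketched in plain Python. The helpers binarize and predict_from_scores are hypothetical; the sketch always emits one column per class and so ignores the special case in which binary targets are reduced to a single column.

```python
def binarize(labels, classes, neg_label=0, pos_label=1):
    """One indicator column per class: pos_label where the label matches."""
    return [
        [pos_label if label == cls else neg_label for cls in classes]
        for label in labels
    ]


def predict_from_scores(scores, classes):
    """Pick, for each row of per-class confidences, the class with the largest score."""
    predictions = []
    for row in scores:
        best = max(range(len(classes)), key=lambda k: row[k])
        predictions.append(classes[best])
    return predictions


classes = sorted(set([1, 2, 6, 4, 2]))
targets = binarize([1, 6], classes)
predicted = predict_from_scores([[0.1, 0.7, 0.2, 0.0], [0.0, 0.1, 0.3, 0.6]], classes)
```

Each column of targets is what one binary model would be trained on; predict_from_scores plays the role of the inverse step at prediction time.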

Read more in the User Guide.

Parameters

neg_label : int (default: 0)

Value with which negative labels must be encoded.

pos_label : int (default: 1)

Value with which positive labels must be encoded.

sparse_output : boolean (default: False)

True if the returned array from transform is desired to be in sparse CSR format.

Attributes

classes_ : array of shape [n_class]

Holds the label for each class.

y_type_ : str

Represents the type of the target data as evaluated by utils.multiclass.type_of_target. Possible types are ‘continuous’, ‘continuous-multioutput’, ‘binary’, ‘multiclass’, ‘multiclass-multioutput’, ‘multilabel-indicator’, and ‘unknown’.

sparse_input_ : boolean

True if the input data to transform is given as a sparse matrix, False otherwise.

Examples

>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer()
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1]])

Binary targets transform to a column vector

>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
array([[1],
       [0],
       [0],
       [1]])

Passing a 2D matrix for multilabel classification

>>> import numpy as np
>>> lb.fit(np.array([[0, 1, 1], [1, 0, 0]]))
LabelBinarizer()
>>> lb.classes_
array([0, 1, 2])
>>> lb.transform([0, 1, 2, 1])
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]])

See also

label_binarize : function to perform the transform operation of LabelBinarizer with fixed classes.

sklearn.preprocessing.OneHotEncoder : encode categorical features using a one-hot aka one-of-K scheme.

Full API documentation: LabelBinarizerScikitsLearnNode

class mdp.nodes.LabelEncoderScikitsLearnNode

Encode target labels with values between 0 and n_classes-1. This node has been automatically generated by wrapping the sklearn.preprocessing._label.LabelEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This transformer should be used to encode target values, i.e. y, and not the input X.

Read more in the User Guide.

New in version 0.12.

Attributes

classes_ : array of shape (n_class,)

Holds the label for each class.

Examples

LabelEncoder can be used to normalize labels.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])

It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']

See also

sklearn.preprocessing.OrdinalEncoder : Encode categorical features using an ordinal encoding scheme.

sklearn.preprocessing.OneHotEncoder : Encode categorical features as a one-hot numeric array.

Full API documentation: LabelEncoderScikitsLearnNode

class mdp.nodes.MultiLabelBinarizerScikitsLearnNode

Transform between iterable of iterables and a multilabel format. This node has been automatically generated by wrapping the sklearn.preprocessing._label.MultiLabelBinarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.

Parameters

classes : array-like of shape [n_classes] (optional)

Indicates an ordering for the class labels. All entries should be unique (cannot contain duplicate classes).

sparse_output : boolean (default: False)

Set to True if the output binary array is desired in CSR sparse format.

Attributes

classes_ : array of labels

A copy of the classes parameter where provided, or otherwise, the sorted set of classes found when fitting.

Examples

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])
>>> mlb.fit_transform([{'sci-fi', 'thriller'}, {'comedy'}])
array([[0, 1, 1],
       [1, 0, 0]])
>>> list(mlb.classes_)
['comedy', 'sci-fi', 'thriller']

A common mistake is to pass in a list, which leads to the following issue:

>>> mlb = MultiLabelBinarizer()
>>> mlb.fit(['sci-fi', 'thriller', 'comedy'])
MultiLabelBinarizer()
>>> mlb.classes_
array(['-', 'c', 'd', 'e', 'f', 'h', 'i', 'l', 'm', 'o', 'r', 's', 't',
    'y'], dtype=object)

To correct this, the list of labels should be passed in as:

>>> mlb = MultiLabelBinarizer()
>>> mlb.fit([['sci-fi', 'thriller', 'comedy']])
MultiLabelBinarizer()
>>> mlb.classes_
array(['comedy', 'sci-fi', 'thriller'], dtype=object)
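The reason a flat list of strings misbehaves is that every sample is itself iterated to find its labels, and iterating a string yields characters. The following pure-Python sketch reproduces the effect; fit_classes and to_indicator are hypothetical helpers, not the wrapped implementation.

```python
def fit_classes(samples):
    """Sorted set of all labels found by iterating over every sample."""
    return sorted({label for sample in samples for label in sample})


def to_indicator(samples, classes):
    """Binary (samples x classes) matrix marking which labels each sample has."""
    return [[1 if cls in sample else 0 for cls in classes] for sample in samples]


classes = fit_classes([(1, 2), (3,)])
matrix = to_indicator([(1, 2), (3,)], classes)

# A flat list of strings: each string is treated as a sample of characters.
char_classes = fit_classes(["sci-fi", "thriller", "comedy"])

# Wrapping the labels in one more list makes them a single sample.
label_classes = fit_classes([["sci-fi", "thriller", "comedy"]])
```

Here char_classes contains single characters while label_classes contains the three intended labels.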

See also

sklearn.preprocessing.OneHotEncoder : encode categorical features using a one-hot aka one-of-K scheme.

Full API documentation: MultiLabelBinarizerScikitsLearnNode

class mdp.nodes.KBinsDiscretizerScikitsLearnNode

Bin continuous data into intervals. This node has been automatically generated by wrapping the sklearn.preprocessing._discretization.KBinsDiscretizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

New in version 0.20.

Parameters

n_bins : int or array-like, shape (n_features,) (default=5)

The number of bins to produce. Raises ValueError if n_bins < 2.

encode : {‘onehot’, ‘onehot-dense’, ‘ordinal’}, (default=‘onehot’)

Method used to encode the transformed result.

onehot

Encode the transformed result with one-hot encoding and return a sparse matrix. Ignored features are always stacked to the right.

onehot-dense

Encode the transformed result with one-hot encoding and return a dense array. Ignored features are always stacked to the right.

ordinal

Return the bin identifier encoded as an integer value.

strategy : {‘uniform’, ‘quantile’, ‘kmeans’}, (default=‘quantile’)

Strategy used to define the widths of the bins.

uniform

All bins in each feature have identical widths.

quantile

All bins in each feature have the same number of points.

kmeans

Values in each bin have the same nearest center of a 1D k-means cluster.

Attributes

n_bins_ : int array, shape (n_features,)

Number of bins per feature. Bins whose widths are too small (i.e., <= 1e-8) are removed with a warning.

bin_edges_ : array of arrays, shape (n_features, )

The edges of each bin. Contains arrays of varying shapes (n_bins_, ). Ignored features will have empty arrays.

See Also

sklearn.preprocessing.Binarizer : Class used to bin values as 0 or 1 based on a parameter threshold.

Notes

In bin edges for feature i, the first and last values are used only for inverse_transform. During transform, bin edges are extended to:

np.concatenate([[-np.inf], bin_edges_[i][1:-1], [np.inf]])

You can combine KBinsDiscretizer with sklearn.compose.ColumnTransformer if you only want to preprocess part of the features.

KBinsDiscretizer might produce constant features (e.g., when encode = 'onehot' and certain bins do not contain any data). These features can be removed with feature selection algorithms (e.g., sklearn.feature_selection.VarianceThreshold).

Examples

>>> from sklearn.preprocessing import KBinsDiscretizer
>>> X = [[-2, 1, -4,   -1],
...      [-1, 2, -3, -0.5],
...      [ 0, 3, -2,  0.5],
...      [ 1, 4, -1,    2]]
>>> est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
>>> est.fit(X)
KBinsDiscretizer(...)
>>> Xt = est.transform(X)
>>> Xt
array([[ 0., 0., 0., 0.],
       [ 1., 1., 1., 0.],
       [ 2., 2., 2., 1.],
       [ 2., 2., 2., 2.]])

Sometimes it may be useful to convert the data back into the original feature space. The inverse_transform function converts the binned data into the original feature space. Each value will be equal to the mean of the two bin edges.

>>> est.bin_edges_[0]
array([-2., -1.,  0.,  1.])
>>> est.inverse_transform(Xt)
array([[-1.5,  1.5, -3.5, -0.5],
       [-0.5,  2.5, -2.5, -0.5],
       [ 0.5,  3.5, -1.5,  0.5],
       [ 0.5,  3.5, -1.5,  1.5]])
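For a single feature, the ‘uniform’ strategy with ordinal encoding and the bin-centre inverse can be sketched with the standard library alone. This is an illustrative approximation under stated assumptions, not the wrapped implementation: uniform_edges, to_bin and bin_center are hypothetical names, and a lookup on the interior edges stands in for the extended-edge rule described in the Notes.

```python
from bisect import bisect_right


def uniform_edges(values, n_bins):
    """Equal-width bin edges spanning the observed range of one feature."""
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * k / n_bins for k in range(n_bins + 1)]


def to_bin(value, edges):
    """Bin index of a value; only interior edges are used, so the outer
    bins effectively extend to -inf and +inf."""
    return bisect_right(edges[1:-1], value)


def bin_center(bin_index, edges):
    """Midpoint of the two edges that delimit a bin."""
    return (edges[bin_index] + edges[bin_index + 1]) / 2


column = [-2, -1, 0, 1]
edges = uniform_edges(column, n_bins=3)
bins = [to_bin(v, edges) for v in column]
centers = [bin_center(b, edges) for b in bins]
```

The values of bins and centers correspond to the first column of Xt and of the inverse-transformed array in the example above; values outside the training range fall into the first or last bin.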

Full API documentation: KBinsDiscretizerScikitsLearnNode

class mdp.nodes.IsotonicRegressionScikitsLearnNode

Isotonic regression model. This node has been automatically generated by wrapping the sklearn.isotonic.IsotonicRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

New in version 0.13.

Parameters

y_min : float, default=None

Lower bound on the lowest predicted value (the minimum value may still be higher). If not set, defaults to -inf.

y_max : float, default=None

Upper bound on the highest predicted value (the maximum may still be lower). If not set, defaults to +inf.

increasing : bool or ‘auto’, default=True

Determines whether the predictions should be constrained to increase or decrease with X. ‘auto’ will decide based on the Spearman correlation estimate’s sign.

out_of_bounds : str, default=“nan”

The out_of_bounds parameter handles how X values outside of the training domain are handled. When set to “nan”, predictions will be NaN. When set to “clip”, predictions will be set to the value corresponding to the nearest train interval endpoint. When set to “raise” a ValueError is raised.

Attributes

X_min_ : float

Minimum value of input array X_ for left bound.

X_max_ : float

Maximum value of input array X_ for right bound.

f_ : function

The stepwise interpolating function that covers the input domain X.

increasing_ : bool

Inferred value for increasing.

Notes

Ties are broken using the secondary method from Leeuw, 1977.
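For orientation, the basic monotone (non-decreasing) fit can be sketched with the pool-adjacent-violators idea named in the references below. This is a minimal illustration that assumes the targets are already sorted by X, uses equal weights, and applies no special tie handling or y_min/y_max clipping; the function name pava is made up for this example and this is not the wrapped implementation.

```python
def pava(y):
    """Non-decreasing least-squares fit to y by pooling adjacent violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


fitted = pava([1, 3, 2, 4])
```

The out-of-order pair (3, 2) is pooled into its mean, which is the smallest change that restores monotonicity in the least-squares sense.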

References

Isotonic Median Regression: A Linear Programming Approach. Nilotpal Chakravarti. Mathematics of Operations Research, Vol. 14, No. 2 (May 1989), pp. 303-308.

Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods. Leeuw, Hornik, Mair. Journal of Statistical Software, 2009.

Correctness of Kruskal’s algorithms for monotone regression with ties. Leeuw. Psychometrika, 1977.

Examples

>>> from sklearn.datasets import make_regression
>>> from sklearn.isotonic import IsotonicRegression
>>> X, y = make_regression(n_samples=10, n_features=1, random_state=41)
>>> iso_reg = IsotonicRegression().fit(X.flatten(), y)
>>> iso_reg.predict([.1, .2])
array([1.8628..., 3.7256...])

Full API documentation: IsotonicRegressionScikitsLearnNode

class mdp.nodes.GridSearchCVScikitsLearnNode

Exhaustive search over specified parameter values for an estimator. This node has been automatically generated by wrapping the sklearn.model_selection._search.GridSearchCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Important members are fit, predict.

GridSearchCV implements a “fit” and a “score” method. It also implements “predict”, “predict_proba”, “decision_function”, “transform” and “inverse_transform” if they are implemented in the estimator used.

The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.

Read more in the User Guide.

Parameters

estimator : estimator object

This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed.

param_grid : dict or list of dictionaries

Dictionary with parameters names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
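The set of candidates spanned by a param_grid can be sketched with itertools.product. The function expand_grid is a hypothetical helper written for illustration (the See Also section names ParameterGrid for the real thing); sorting the parameter names is a choice made here to get a reproducible order.

```python
from itertools import product


def expand_grid(param_grid):
    """List every parameter setting spanned by a dict or a list of dicts."""
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    candidates = []
    for grid in grids:
        names = sorted(grid)
        # Cartesian product of the value lists of this one dictionary.
        for values in product(*(grid[name] for name in names)):
            candidates.append(dict(zip(names, values)))
    return candidates


single = expand_grid({"kernel": ("linear", "rbf"), "C": [1, 10]})
several = expand_grid([
    {"kernel": ["poly"], "degree": [2, 3]},
    {"kernel": ["rbf"], "gamma": [0.1, 0.2]},
])
```

A list of dictionaries avoids combining parameters that do not belong together: in the second call, degree is only paired with the ‘poly’ kernel and gamma only with ‘rbf’.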

scoring : str, callable, list/tuple or dict, default=None

A single str (see the scoring parameter section of the User Guide) or a callable (see the User Guide section on defining a scoring strategy) to evaluate the predictions on the test set.

For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.

NOTE that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each.

See multimetric_grid_search for an example.

If None, the estimator’s score method is used.

n_jobs : int, default=None

Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Changed in version v0.20: n_jobs default changed from 1 to None

pre_dispatch : int, or str, default=n_jobs

Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:

  • None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs

  • An int, giving the exact number of total jobs that are spawned

  • A str, giving an expression as a function of n_jobs, as in ‘2*n_jobs’

iid : bool, default=False

If True, return the average score across folds, weighted by the number of samples in each test set. In this case, the data is assumed to be identically distributed across the folds, and the loss minimized is the total loss per sample, and not the mean loss across the folds.

Deprecated since version 0.22: Parameter iid is deprecated in 0.22 and will be removed in 0.24
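The difference between the two averaging rules can be made concrete with a small sketch. The scores and test-set sizes below are invented for illustration, and the helper names are hypothetical.

```python
def mean_over_folds(scores):
    """Plain mean of the per-fold scores (the iid=False behaviour)."""
    return sum(scores) / len(scores)


def weighted_by_test_size(scores, test_sizes):
    """Mean of the per-fold scores weighted by test-set size (iid=True)."""
    total = sum(test_sizes)
    return sum(score * size for score, size in zip(scores, test_sizes)) / total


fold_scores = [0.8, 0.6]        # hypothetical per-fold test scores
fold_test_sizes = [30, 10]      # hypothetical number of test samples per fold

unweighted = mean_over_folds(fold_scores)
weighted = weighted_by_test_size(fold_scores, fold_test_sizes)
```

With equally sized test sets the two results coincide; they differ only when the folds are unbalanced, as in this example.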

cv : int, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross validation,

  • integer, to specify the number of folds in a (Stratified)KFold,

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
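An iterable of (train, test) index splits, the last form of cv listed above, can be sketched as follows. This hypothetical kfold_indices helper produces contiguous folds without shuffling or stratification, so it only approximates what a plain K-fold splitter does.

```python
def kfold_indices(n_samples, n_splits=5):
    """Return a list of (train, test) index lists with contiguous test folds."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1  # spread the remainder over the first folds
    splits, start = [], 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        splits.append((train, test))
        start = stop
    return splits


splits = kfold_indices(10, n_splits=3)
```

Every sample appears in exactly one test fold, and each training set is the complement of its test fold.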

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.

refit : bool, str, or callable, default=True

Refit an estimator using the best found parameters on the whole dataset.

For multiple metric evaluation, this needs to be a str denoting the scorer that would be used to find the best parameters for refitting the estimator at the end.

Where there are considerations other than maximum score in choosing a best estimator, refit can be set to a function which returns the selected best_index_ given cv_results_. In that case, the best_estimator_ and best_params_ will be set according to the returned best_index_ while the best_score_ attribute will not be available.
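A function suitable for this use takes cv_results_ and returns an index. The sketch below picks the first candidate whose mean test score lies within one standard deviation of the best one; it assumes, purely for illustration, that candidates are ordered from simplest to most complex. The function name and the selection rule are hypothetical, and the scores are taken from the cv_results_ illustration in the Attributes section.

```python
def simplest_within_one_std(cv_results):
    """Return the index of the first candidate close enough to the best score."""
    means = cv_results["mean_test_score"]
    stds = cv_results["std_test_score"]
    best = max(range(len(means)), key=lambda i: means[i])
    threshold = means[best] - stds[best]
    for index, mean in enumerate(means):
        if mean >= threshold:
            return index  # earliest (assumed simplest) acceptable candidate
    return best


example_results = {
    "mean_test_score": [0.81, 0.60, 0.75, 0.85],
    "std_test_score": [0.01, 0.10, 0.05, 0.08],
}
chosen = simplest_within_one_std(example_results)
```

Such a function would be passed as the refit argument; the highest-scoring candidate is index 3, but the rule selects index 0 because its score is within the tolerance.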

The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this GridSearchCV instance.

Also for multiple metric evaluation, the attributes best_index_, best_score_ and best_params_ will only be available if refit is set and all of them will be determined w.r.t this specific scorer.

See scoring parameter to know more about multiple metric evaluation.

Changed in version 0.20: Support for callable added.

verbose : integer

Controls the verbosity: the higher, the more messages.

error_score : ‘raise’ or numeric, default=np.nan

Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.

return_train_score : bool, default=False

If False, the cv_results_ attribute will not include training scores. Computing training scores is used to get insights on how different parameter settings impact the overfitting/underfitting trade-off. However computing the scores on the training set can be computationally expensive and is not strictly required to select the parameters that yield the best generalization performance.

New in version 0.19.

Changed in version 0.21: Default value was changed from True to False

Examples

>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
>>> svc = svm.SVC()
>>> clf = GridSearchCV(svc, parameters)
>>> clf.fit(iris.data, iris.target)
GridSearchCV(estimator=SVC(),
             param_grid={'C': [1, 10], 'kernel': ('linear', 'rbf')})
>>> sorted(clf.cv_results_.keys())
['mean_fit_time', 'mean_score_time', 'mean_test_score',...
 'param_C', 'param_kernel', 'params',...
 'rank_test_score', 'split0_test_score',...
 'split2_test_score', ...
 'std_fit_time', 'std_score_time', 'std_test_score']

Attributes

cv_results_ : dict of numpy (masked) ndarrays

A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.

For instance, the table below

param_kernel  param_gamma  param_degree  split0_test_score  rank_t…
‘poly’        --           2             0.80               2
‘poly’        --           3             0.70               4
‘rbf’         0.1          --            0.80               3
‘rbf’         0.2          --            0.93               1

will be represented by a cv_results_ dict of:

{
'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                             mask = [False False False False]...)
'param_gamma': masked_array(data = [-- -- 0.1 0.2],
                            mask = [ True  True False False]...),
'param_degree': masked_array(data = [2.0 3.0 -- --],
                             mask = [False False  True  True]...),
'split0_test_score'  : [0.80, 0.70, 0.80, 0.93],
'split1_test_score'  : [0.82, 0.50, 0.70, 0.78],
'mean_test_score'    : [0.81, 0.60, 0.75, 0.85],
'std_test_score'     : [0.01, 0.10, 0.05, 0.08],
'rank_test_score'    : [2, 4, 3, 1],
'split0_train_score' : [0.80, 0.92, 0.70, 0.93],
'split1_train_score' : [0.82, 0.55, 0.70, 0.87],
'mean_train_score'   : [0.81, 0.74, 0.70, 0.90],
'std_train_score'    : [0.01, 0.19, 0.00, 0.03],
'mean_fit_time'      : [0.73, 0.63, 0.43, 0.49],
'std_fit_time'       : [0.01, 0.02, 0.01, 0.01],
'mean_score_time'    : [0.01, 0.06, 0.04, 0.04],
'std_score_time'     : [0.00, 0.00, 0.00, 0.01],
'params'             : [{'kernel': 'poly', 'degree': 2}, ...],
}

NOTE

The key 'params' is used to store a list of parameter settings dicts for all the parameter candidates.

The mean_fit_time, std_fit_time, mean_score_time and std_score_time are all in seconds.

For multi-metric evaluation, the scores for all the scorers are available in the cv_results_ dict at the keys ending with that scorer’s name ('_<scorer_name>') instead of '_score' shown above. (‘split0_test_precision’, ‘mean_train_precision’ etc.)

best_estimator_ : estimator

Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data. Not available if refit=False.

See refit parameter for more information on allowed values.

best_score_ : float

Mean cross-validated score of the best_estimator.

For multi-metric evaluation, this is present only if refit is specified.

This attribute is not available if refit is a function.

best_params_ : dict

Parameter setting that gave the best results on the hold out data.

For multi-metric evaluation, this is present only if refit is specified.

best_index_ : int

The index (of the cv_results_ arrays) which corresponds to the best candidate parameter setting.

The dict at search.cv_results_['params'][search.best_index_] gives the parameter setting for the best model, that gives the highest mean score (search.best_score_).

For multi-metric evaluation, this is present only if refit is specified.
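The relation between mean_test_score, rank_test_score and best_index_ can be sketched on the illustrative numbers from the cv_results_ example above. The helper rank_scores is hypothetical and does not treat tied scores specially.

```python
def rank_scores(mean_test_score):
    """Rank 1 goes to the highest mean score, rank 2 to the next, and so on."""
    order = sorted(range(len(mean_test_score)), key=lambda i: -mean_test_score[i])
    ranks = [0] * len(mean_test_score)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


mean_test_score = [0.81, 0.60, 0.75, 0.85]
params = [
    {"kernel": "poly", "degree": 2},
    {"kernel": "poly", "degree": 3},
    {"kernel": "rbf", "gamma": 0.1},
    {"kernel": "rbf", "gamma": 0.2},
]

ranks = rank_scores(mean_test_score)
best_index = ranks.index(1)          # position of the top-ranked candidate
best_params = params[best_index]
best_score = mean_test_score[best_index]
```

The same index selects the matching entry in every cv_results_ array, which is why a single integer is enough to describe the winning candidate.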

scorer_ : function or a dict

Scorer function used on the held out data to choose the best parameters for the model.

For multi-metric evaluation, this attribute holds the validated scoring dict which maps the scorer key to the scorer callable.

n_splits_ : int

The number of cross-validation splits (folds/iterations).

refit_time_ : float

Seconds used for refitting the best model on the whole dataset.

This is present only if refit is not False.

New in version 0.20.

Notes

The parameters selected are those that maximize the score of the left out data, unless an explicit score is passed, in which case it is used instead.

If n_jobs was set to a value higher than one, the data is copied for each point in the grid (and not n_jobs times). This is done for efficiency reasons if individual jobs take very little time, but may raise errors if the dataset is large and not enough memory is available. A workaround in this case is to set pre_dispatch. Then, the memory is copied only pre_dispatch many times. A reasonable value for pre_dispatch is 2 * n_jobs.

See Also

ParameterGrid : generates all the combinations of a hyperparameter grid.

sklearn.model_selection.train_test_split() : utility function to split the data into a development set usable for fitting a GridSearchCV instance and an evaluation set for its final evaluation.

sklearn.metrics.make_scorer() : Make a scorer from a performance metric or loss function.

Full API documentation: GridSearchCVScikitsLearnNode

class mdp.nodes.RandomizedSearchCVScikitsLearnNode

Randomized search on hyper parameters. This node has been automatically generated by wrapping the sklearn.model_selection._search.RandomizedSearchCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. RandomizedSearchCV implements a “fit” and a “score” method. It also implements “predict”, “predict_proba”, “decision_function”, “transform” and “inverse_transform” if they are implemented in the estimator used.

The parameters of the estimator used to apply these methods are optimized by cross-validated search over parameter settings.

In contrast to GridSearchCV, not all parameter values are tried out, but rather a fixed number of parameter settings is sampled from the specified distributions. The number of parameter settings that are tried is given by n_iter.

If all parameters are presented as a list, sampling without replacement is performed. If at least one parameter is given as a distribution, sampling with replacement is used. It is highly recommended to use continuous distributions for continuous parameters.
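The two sampling modes can be sketched with the standard library. Everything here is illustrative: sample_settings is a hypothetical helper, and LogUniform is a stand-in for a distribution object whose rvs method here takes a random.Random instance, which differs from the scipy.stats calling convention.

```python
import math
import random
from itertools import product


class LogUniform:
    """Hypothetical continuous distribution exposing an rvs method."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def rvs(self, rng):
        return math.exp(rng.uniform(math.log(self.low), math.log(self.high)))


def sample_settings(param_distributions, n_iter, seed=0):
    """Draw n_iter parameter settings from lists and/or distributions."""
    rng = random.Random(seed)
    names = sorted(param_distributions)
    only_lists = all(not hasattr(param_distributions[n], "rvs") for n in names)
    if only_lists:
        # All lists: sample distinct points of the finite grid (no replacement).
        grid = [
            dict(zip(names, values))
            for values in product(*(param_distributions[n] for n in names))
        ]
        return rng.sample(grid, min(n_iter, len(grid)))
    settings = []
    for _ in range(n_iter):
        # At least one distribution: every draw is independent (with replacement).
        setting = {}
        for name in names:
            spec = param_distributions[name]
            setting[name] = spec.rvs(rng) if hasattr(spec, "rvs") else rng.choice(list(spec))
        settings.append(setting)
    return settings


from_lists = sample_settings({"C": [1, 10, 100], "kernel": ["linear", "rbf"]}, n_iter=4)
from_dist = sample_settings({"C": LogUniform(1e-3, 1e3), "kernel": ["linear", "rbf"]}, n_iter=5)
```

When every parameter is a list, the number of distinct settings is bounded by the grid size, so asking for more than that cannot yield additional candidates in this sketch.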

Read more in the User Guide.

New in version 0.14.

Parameters

estimator : estimator object

An object of that type is instantiated for each grid point. This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed.

param_distributions : dict or list of dicts

Dictionary with parameters names (str) as keys and distributions or lists of parameters to try. Distributions must provide a rvs method for sampling (such as those from scipy.stats.distributions). If a list is given, it is sampled uniformly. If a list of dicts is given, first a dict is sampled uniformly, and then a parameter is sampled using that dict as above.

n_iterint, default=10

Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution.

scoringstr, callable, list/tuple or dict, default=None

A single str (see scoring_parameter) or a callable (see scoring) to evaluate the predictions on the test set.

For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.

NOTE that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each.

See multimetric_grid_search for an example.

If None, the estimator’s score method is used.
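The note on custom scorers can be sketched as follows. The metric per_class_recall, the factory make_class_scorer and the toy estimator AlwaysZero are hypothetical names introduced for illustration; the only assumption about the library is that a callable scorer is called as scorer(estimator, X, y) and returns a single number.

```python
def per_class_recall(y_true, y_pred):
    """Hypothetical metric returning one recall value per class (a list)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        total = sum(1 for t in y_true if t == c)
        recalls.append(hits / total)
    return recalls


def make_class_scorer(class_index):
    """Build a scorer(estimator, X, y) that returns a single float."""
    def scorer(estimator, X, y):
        return per_class_recall(y, estimator.predict(X))[class_index]
    return scorer


# Dict of names -> callables, the form described for multi-metric scoring.
scoring = {
    "recall_class0": make_class_scorer(0),
    "recall_class1": make_class_scorer(1),
}


class AlwaysZero:
    """Toy estimator used only to exercise the scorers."""

    def predict(self, X):
        return [0 for _ in X]


X, y = [[0], [1], [2], [3]], [0, 0, 1, 1]
scores = {name: fn(AlwaysZero(), X, y) for name, fn in scoring.items()}
```

Each entry of scoring yields one value, so a metric that naturally returns an array is split into as many scorers as it has components.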

n_jobsint, default=None

Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Changed in version 0.20: n_jobs default changed from 1 to None

pre_dispatchint or str, default=’2*n_jobs’

Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:

  • None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs

  • An int, giving the exact number of total jobs that are spawned

  • A str, giving an expression as a function of n_jobs, as in ‘2*n_jobs’

iidbool, default=False

If True, return the average score across folds, weighted by the number of samples in each test set. In this case, the data is assumed to be identically distributed across the folds, and the loss minimized is the total loss per sample, and not the mean loss across the folds.

Deprecated since version 0.22: Parameter iid is deprecated in 0.22 and will be removed in 0.24
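The difference between the two averaging modes described for iid can be shown with a short sketch. The function mean_score and the fold sizes are made up for illustration; this is not the library's code.

```python
def mean_score(fold_scores, fold_sizes, iid):
    """Average per-fold scores, optionally weighted by test-set size."""
    if iid:
        # Weighted by the number of samples in each test fold.
        total = sum(fold_sizes)
        return sum(s * n for s, n in zip(fold_scores, fold_sizes)) / total
    # Plain mean across folds.
    return sum(fold_scores) / len(fold_scores)


fold_scores = [0.90, 0.60]
fold_sizes = [30, 10]  # number of samples in each test fold

weighted = mean_score(fold_scores, fold_sizes, iid=True)
unweighted = mean_score(fold_scores, fold_sizes, iid=False)
```

The two results differ only when the test folds have unequal sizes, which is why the distinction rarely matters for balanced K-fold splits.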

cvint, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross validation,

  • integer, to specify the number of folds in a (Stratified)KFold,

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.

refitbool, str, or callable, default=True

Refit an estimator using the best found parameters on the whole dataset.

For multiple metric evaluation, this needs to be a str denoting the scorer that would be used to find the best parameters for refitting the estimator at the end.

Where there are considerations other than maximum score in choosing a best estimator, refit can be set to a function which returns the selected best_index_ given the cv_results. In that case, the best_estimator_ and best_params_ will be set according to the returned best_index_ while the best_score_ attribute will not be available.

The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this RandomizedSearchCV instance.

Also for multiple metric evaluation, the attributes best_index_, best_score_ and best_params_ will only be available if refit is set and all of them will be determined w.r.t this specific scorer.

See scoring parameter to know more about multiple metric evaluation.

Changed in version 0.20: Support for callable added.
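A callable for refit, as described above, receives the cv_results dict and returns the index to use as best_index_. The sketch below is one possible selection rule, not a recommended default; the function name and the parameter key 'param_C' are assumptions, and plain lists stand in for the arrays found in a real cv_results_.

```python
def simplest_within_one_std(cv_results):
    """Hypothetical refit callable: return the index to use as best_index_.

    Picks the candidate with the smallest 'param_C' whose mean test score
    is within one standard deviation of the top score.
    """
    means = cv_results["mean_test_score"]
    stds = cv_results["std_test_score"]
    best = max(range(len(means)), key=lambda i: means[i])
    threshold = means[best] - stds[best]
    candidates = [i for i in range(len(means)) if means[i] >= threshold]
    return min(candidates, key=lambda i: cv_results["param_C"][i])


# Hand-written stand-in for a cv_results_ dict.
cv_results = {
    "param_C": [0.1, 1.0, 10.0],
    "mean_test_score": [0.88, 0.90, 0.91],
    "std_test_score": [0.02, 0.02, 0.02],
}
best_index = simplest_within_one_std(cv_results)
```

Such a function would be passed as refit=simplest_within_one_std; as stated above, best_score_ is then not available.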

verboseinteger

Controls the verbosity: the higher, the more messages.

random_stateint or RandomState instance, default=None

Pseudo random number generator state used for random uniform sampling from lists of possible values instead of scipy.stats distributions. Pass an int for reproducible output across multiple function calls. See Glossary.

error_score‘raise’ or numeric, default=np.nan

Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.

return_train_scorebool, default=False

If False, the cv_results_ attribute will not include training scores. Computing training scores is used to get insights on how different parameter settings impact the overfitting/underfitting trade-off. However computing the scores on the training set can be computationally expensive and is not strictly required to select the parameters that yield the best generalization performance.

New in version 0.19.

Changed in version 0.21: Default value was changed from True to False

Attributes

cv_results_dict of numpy (masked) ndarrays

A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.

For instance, the table below

param_kernel   param_gamma   split0_test_score   rank_test_score
‘rbf’          0.1           0.80                2
‘rbf’          0.2           0.90                1
‘rbf’          0.3           0.70                1

will be represented by a cv_results_ dict of:

{
'param_kernel' : masked_array(data = ['rbf', 'rbf', 'rbf'],
                              mask = False),
'param_gamma'  : masked_array(data = [0.1 0.2 0.3], mask = False),
'split0_test_score'  : [0.80, 0.90, 0.70],
'split1_test_score'  : [0.82, 0.50, 0.70],
'mean_test_score'    : [0.81, 0.70, 0.70],
'std_test_score'     : [0.01, 0.20, 0.00],
'rank_test_score'    : [3, 1, 1],
'split0_train_score' : [0.80, 0.92, 0.70],
'split1_train_score' : [0.82, 0.55, 0.70],
'mean_train_score'   : [0.81, 0.74, 0.70],
'std_train_score'    : [0.01, 0.19, 0.00],
'mean_fit_time'      : [0.73, 0.63, 0.43],
'std_fit_time'       : [0.01, 0.02, 0.01],
'mean_score_time'    : [0.01, 0.06, 0.04],
'std_score_time'     : [0.00, 0.00, 0.00],
'params'             : [{'kernel' : 'rbf', 'gamma' : 0.1}, ...],
}

NOTE

The key 'params' is used to store a list of parameter settings dicts for all the parameter candidates.

The mean_fit_time, std_fit_time, mean_score_time and std_score_time are all in seconds.

For multi-metric evaluation, the scores for all the scorers are available in the cv_results_ dict at the keys ending with that scorer’s name ('_<scorer_name>') instead of '_score' shown above. (‘split0_test_precision’, ‘mean_train_precision’ etc.)
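The relation between the per-split columns and the aggregated columns of the dict above can be checked by hand. The sketch below uses only the standard library and assumes the standard deviation is the population one, which is consistent with the numbers shown.

```python
from statistics import mean, pstdev

# Per-split test scores for three candidates, copied from the dict above.
split_scores = {
    "split0_test_score": [0.80, 0.90, 0.70],
    "split1_test_score": [0.82, 0.50, 0.70],
}

# Regroup so that each entry holds one candidate's scores across splits.
per_candidate = list(zip(*split_scores.values()))

mean_test_score = [round(mean(s), 2) for s in per_candidate]
std_test_score = [round(pstdev(s), 2) for s in per_candidate]

# One row per candidate, e.g. for tabular display.
rows = [{"mean_test_score": m, "std_test_score": s}
        for m, s in zip(mean_test_score, std_test_score)]
```

Each list in the dict is a column indexed by candidate, so position i in every list refers to the same parameter setting as cv_results_['params'][i].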

best_estimator_estimator

Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data. Not available if refit=False.

For multi-metric evaluation, this attribute is present only if refit is specified.

See refit parameter for more information on allowed values.

best_score_float

Mean cross-validated score of the best_estimator.

For multi-metric evaluation, this is not available if refit is False. See refit parameter for more information.

This attribute is not available if refit is a function.

best_params_dict

Parameter setting that gave the best results on the hold out data.

For multi-metric evaluation, this is not available if refit is False. See refit parameter for more information.

best_index_int

The index (of the cv_results_ arrays) which corresponds to the best candidate parameter setting.

The dict at search.cv_results_['params'][search.best_index_] gives the parameter setting for the best model, that gives the highest mean score (search.best_score_).

For multi-metric evaluation, this is not available if refit is False. See refit parameter for more information.

scorer_function or a dict

Scorer function used on the held out data to choose the best parameters for the model.

For multi-metric evaluation, this attribute holds the validated scoring dict which maps the scorer key to the scorer callable.

n_splits_int

The number of cross-validation splits (folds/iterations).

refit_time_float

Seconds used for refitting the best model on the whole dataset.

This is present only if refit is not False.

New in version 0.20.

Notes

The parameters selected are those that maximize the score of the held-out data, according to the scoring parameter.

If n_jobs was set to a value higher than one, the data is copied for each parameter setting (and not n_jobs times). This is done for efficiency reasons if individual jobs take very little time, but may raise errors if the dataset is large and not enough memory is available. A workaround in this case is to set pre_dispatch. Then, the data is copied only pre_dispatch many times. A reasonable value for pre_dispatch is 2 * n_jobs.

See Also

GridSearchCV:

  • Does exhaustive search over a grid of parameters.

ParameterSampler:

  • A generator over parameter settings, constructed from param_distributions.

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import RandomizedSearchCV
>>> from scipy.stats import uniform
>>> iris = load_iris()
>>> logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200,
...                               random_state=0)
>>> distributions = dict(C=uniform(loc=0, scale=4),
...                      penalty=['l2', 'l1'])
>>> clf = RandomizedSearchCV(logistic, distributions, random_state=0)
>>> search = clf.fit(iris.data, iris.target)
>>> search.best_params_
{'C': 2..., 'penalty': 'l1'}

Full API documentation: RandomizedSearchCVScikitsLearnNode

class mdp.nodes.LinearRegressionScikitsLearnNode

Ordinary least squares Linear Regression. This node has been automatically generated by wrapping the sklearn.linear_model._base.LinearRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. LinearRegression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.

Parameters

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
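The distinction drawn here between normalizing and standardizing can be made concrete for a single feature column. This is a sketch under the assumption that the l2-norm is taken after centering and that standardization uses the population standard deviation; both helper names are hypothetical.

```python
import math


def center_and_l2_normalize(column):
    """What the text describes: subtract the mean, divide by the l2-norm."""
    m = sum(column) / len(column)
    centered = [v - m for v in column]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def standardize(column):
    """Standardization: subtract the mean, divide by the standard deviation."""
    m = sum(column) / len(column)
    centered = [v - m for v in column]
    std = math.sqrt(sum(v * v for v in centered) / len(column))
    return [v / std for v in centered]


feature = [1.0, 2.0, 3.0, 4.0]
normalized = center_and_l2_normalize(feature)
standardized = standardize(feature)
```

The two results differ by a constant factor of sqrt(n_samples): the normalized column has unit l2-norm, the standardized column has unit variance.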

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

n_jobsint, default=None

The number of jobs to use for the computation. This will only provide speedup for n_targets > 1 and sufficiently large problems. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

coef_array of shape (n_features, ) or (n_targets, n_features)

Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

rank_int

Rank of matrix X. Only available when X is dense.

singular_array of shape (min(X, y),)

Singular values of X. Only available when X is dense.

intercept_float or array of shape (n_targets,)

Independent term in the linear model. Set to 0.0 if fit_intercept = False.

See Also

sklearn.linear_model.Ridge:

  • Ridge regression addresses some of the problems of Ordinary Least Squares by imposing a penalty on the size of the coefficients with l2 regularization.

sklearn.linear_model.Lasso:

  • The Lasso is a linear model that estimates sparse coefficients with l1 regularization.

sklearn.linear_model.ElasticNet:

  • Elastic-Net is a linear regression model trained with both l1 and l2-norm regularization of the coefficients.

Notes

From the implementation point of view, this is just plain Ordinary Least Squares (scipy.linalg.lstsq) wrapped as a predictor object.
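For intuition, the least-squares criterion can be worked out in closed form for a single feature. The sketch below is a pure-Python illustration of minimizing the residual sum of squares, not the lstsq-based routine mentioned above; the function names and data are made up.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_sum_of_squares(x, y, slope, intercept):
    """Sum of squared differences between targets and predictions."""
    return sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0]
y = [5.1, 6.9, 9.2, 10.8]
slope, intercept = fit_simple_ols(x, y)
rss_fit = residual_sum_of_squares(x, y, slope, intercept)
# Any other line, e.g. a slightly steeper one, has a larger residual sum.
rss_other = residual_sum_of_squares(x, y, slope + 0.1, intercept)
```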

Examples

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> # y = 1 * x_0 + 2 * x_1 + 3
>>> y = np.dot(X, np.array([1, 2])) + 3
>>> reg = LinearRegression().fit(X, y)
>>> reg.score(X, y)
1.0
>>> reg.coef_
array([1., 2.])
>>> reg.intercept_
3.0000...
>>> reg.predict(np.array([[3, 5]]))
array([16.])

Full API documentation: LinearRegressionScikitsLearnNode

class mdp.nodes.BayesianRidgeScikitsLearnNode

Bayesian ridge regression. This node has been automatically generated by wrapping the sklearn.linear_model._bayes.BayesianRidge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Fit a Bayesian ridge model. See the Notes section for details on this implementation and the optimization of the regularization parameters lambda (precision of the weights) and alpha (precision of the noise).

Read more in the User Guide.

Parameters

n_iterint, default=300

Maximum number of iterations. Should be greater than or equal to 1.

tolfloat, default=1e-3

Stop the algorithm if w has converged.

alpha_1float, default=1e-6

Hyper-parameter : shape parameter for the Gamma distribution prior over the alpha parameter.

alpha_2float, default=1e-6

Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the alpha parameter.

lambda_1float, default=1e-6

Hyper-parameter : shape parameter for the Gamma distribution prior over the lambda parameter.

lambda_2float, default=1e-6

Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the lambda parameter.

alpha_initfloat, default=None

Initial value for alpha (precision of the noise). If not set, alpha_init is 1/Var(y).

New in version 0.22.

lambda_initfloat, default=None

Initial value for lambda (precision of the weights). If not set, lambda_init is 1.

New in version 0.22.
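The documented defaults for the two initial values amount to the following. This is a sketch only: the helper name is hypothetical, and the population variance is assumed for Var(y).

```python
from statistics import pvariance


def default_initial_precisions(y, alpha_init=None, lambda_init=None):
    """Fill in the documented defaults when no initial value is given."""
    if alpha_init is None:
        alpha_init = 1.0 / pvariance(y)  # 1 / Var(y); assumes Var(y) > 0
    if lambda_init is None:
        lambda_init = 1.0
    return alpha_init, lambda_init


y = [0.0, 1.0, 2.0, 3.0]
alpha0, lambda0 = default_initial_precisions(y)
```

Targets with a large spread therefore start from a small noise precision, that is, a large assumed noise variance.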

compute_scorebool, default=False

If True, compute the log marginal likelihood at each iteration of the optimization.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. The intercept is not treated as a probabilistic parameter and thus has no associated variance. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

verbosebool, default=False

Verbose mode when fitting the model.

Attributes

coef_array-like of shape (n_features,)

Coefficients of the regression model (mean of distribution)

intercept_float

Independent term in decision function. Set to 0.0 if fit_intercept = False.

alpha_float

Estimated precision of the noise.

lambda_float

Estimated precision of the weights.

sigma_array-like of shape (n_features, n_features)

Estimated variance-covariance matrix of the weights

scores_array-like of shape (n_iter_+1,)

If compute_score is True, value of the log marginal likelihood (to be maximized) at each iteration of the optimization. The array starts with the value of the log marginal likelihood obtained for the initial values of alpha and lambda and ends with the value obtained for the estimated alpha and lambda.

n_iter_int

The actual number of iterations to reach the stopping criterion.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.BayesianRidge()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
BayesianRidge()
>>> clf.predict([[1, 1]])
array([1.])

Notes

There exist several strategies to perform Bayesian ridge regression. This implementation is based on the algorithm described in Appendix A of (Tipping, 2001) where updates of the regularization parameters are done as suggested in (MacKay, 1992). Note that according to A New View of Automatic Relevance Determination (Wipf and Nagarajan, 2008) these update rules do not guarantee that the marginal likelihood is increasing between two consecutive iterations of the optimization.

References

D. J. C. MacKay, Bayesian Interpolation, Computation and Neural Systems, Vol. 4, No. 3, 1992.

M. E. Tipping, Sparse Bayesian Learning and the Relevance Vector Machine, Journal of Machine Learning Research, Vol. 1, 2001.

Full API documentation: BayesianRidgeScikitsLearnNode

class mdp.nodes.ARDRegressionScikitsLearnNode

Bayesian ARD regression. This node has been automatically generated by wrapping the sklearn.linear_model._bayes.ARDRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Fit the weights of a regression model, using an ARD prior. The weights of the regression model are assumed to follow Gaussian distributions. Also estimate the parameters lambda (precisions of the distributions of the weights) and alpha (precision of the distribution of the noise). The estimation is done by an iterative procedure (Evidence Maximization).

Read more in the User Guide.

Parameters

n_iterint, default=300

Maximum number of iterations.

tolfloat, default=1e-3

Stop the algorithm if w has converged.

alpha_1float, default=1e-6

Hyper-parameter : shape parameter for the Gamma distribution prior over the alpha parameter.

alpha_2float, default=1e-6

Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the alpha parameter.

lambda_1float, default=1e-6

Hyper-parameter : shape parameter for the Gamma distribution prior over the lambda parameter.

lambda_2float, default=1e-6

Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the lambda parameter.

compute_scorebool, default=False

If True, compute the objective function at each step of the model.

threshold_lambdafloat, default=10 000

threshold for removing (pruning) weights with high precision from the computation.
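The pruning rule can be sketched as a simple filter: a weight is kept only while its precision stays below the threshold. The helper name and the numbers below are made up for illustration.

```python
def prune_by_precision(lambdas, threshold_lambda=10000.0):
    """Return indices of weights kept, i.e. those with lambda < threshold."""
    return [i for i, lam in enumerate(lambdas) if lam < threshold_lambda]


# Hypothetical precisions estimated for four weights.
lambdas = [0.5, 2.0e6, 35.0, 1.0e4]
kept = prune_by_precision(lambdas)

# Weights that were pruned contribute nothing to the prediction.
coef = [1.2, 0.0003, -0.7, 0.01]
pruned_coef = [c if i in kept else 0.0 for i, c in enumerate(coef)]
```

A very high precision means the weight is tightly concentrated around zero, so dropping it changes the model very little while shrinking the computation.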

fit_interceptbool, default=True

whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

verbosebool, default=False

Verbose mode when fitting the model.

Attributes

coef_array-like of shape (n_features,)

Coefficients of the regression model (mean of distribution)

alpha_float

Estimated precision of the noise.

lambda_array-like of shape (n_features,)

Estimated precisions of the weights.

sigma_array-like of shape (n_features, n_features)

Estimated variance-covariance matrix of the weights.

scores_float

If computed, value of the objective function (to be maximized).

intercept_float

Independent term in decision function. Set to 0.0 if fit_intercept = False.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.ARDRegression()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
ARDRegression()
>>> clf.predict([[1, 1]])
array([1.])

Notes

For an example, see examples/linear_model/plot_ard.py.

References

D. J. C. MacKay, Bayesian nonlinear modeling for the prediction competition, ASHRAE Transactions, 1994.

R. Salakhutdinov, Lecture notes on Statistical Machine Learning, http://www.utstat.toronto.edu/~rsalakhu/sta4273/notes/Lecture2.pdf#page=15 (their beta is our self.alpha_; their alpha is our self.lambda_). ARD differs slightly from the slide: only dimensions/features for which self.lambda_ < self.threshold_lambda are kept, and the rest are discarded.

Full API documentation: ARDRegressionScikitsLearnNode

class mdp.nodes.LarsScikitsLearnNode

Least Angle Regression model, a.k.a. LAR. This node has been automatically generated by wrapping the sklearn.linear_model._least_angle.Lars class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

verbosebool or int, default=False

Sets the verbosity amount

normalizebool, default=True

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

precomputebool, ‘auto’ or array-like , default=’auto’

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument.

n_nonzero_coefsint, default=500

Target number of non-zero coefficients. Use np.inf for no limit.

epsfloat, optional

The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the tol parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization. By default, np.finfo(float).eps is used.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

fit_pathbool, default=True

If True the full path is stored in the coef_path_ attribute. If you compute the solution for a large problem or many targets, setting fit_path to False will lead to a speedup, especially with a small alpha.

jitterfloat, default=None

Upper bound on a uniform noise parameter to be added to the y values, to satisfy the model’s assumption of one-at-a-time computations. Might help with stability.

random_stateint, RandomState instance or None (default)

Determines random number generation for jittering. Pass an int for reproducible output across multiple function calls. See Glossary. Ignored if jitter is None.
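The effect of jitter and random_state can be pictured with a few lines of standard-library code. This is an illustration of the idea (uniform noise bounded above by jitter, added to the targets), not the library's code; add_jitter is a hypothetical helper.

```python
import random


def add_jitter(y, jitter, seed=None):
    """Add uniform noise drawn from [0, jitter] to each target value."""
    if jitter is None:
        return list(y)  # no jitter requested: targets unchanged
    rng = random.Random(seed)  # an int seed gives reproducible noise
    return [v + rng.uniform(0.0, jitter) for v in y]


y = [-1.1111, 0.0, -1.1111]
y_jittered = add_jitter(y, jitter=1e-3, seed=0)
y_again = add_jitter(y, jitter=1e-3, seed=0)
```

The two tied targets in y end up slightly different after jittering, which is the situation the one-at-a-time assumption needs, and the same seed reproduces the same perturbation.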

Attributes

alphas_array-like of shape (n_alphas + 1,) | list of n_targets such arrays

Maximum of covariances (in absolute value) at each iteration. n_alphas is either n_nonzero_coefs or n_features, whichever is smaller.

active_list, length = n_alphas | list of n_targets such lists

Indices of active variables at the end of the path.

coef_path_array-like of shape (n_features, n_alphas + 1) | list of n_targets such arrays

The varying values of the coefficients along the path. It is not present if the fit_path parameter is False.

coef_array-like of shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the formulation formula).

intercept_float or array-like of shape (n_targets,)

Independent term in decision function.

n_iter_array-like or int

The number of iterations taken by lars_path to find the grid of alphas for each target.

Examples

>>> from sklearn import linear_model
>>> reg = linear_model.Lars(n_nonzero_coefs=1)
>>> reg.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
Lars(n_nonzero_coefs=1)
>>> print(reg.coef_)
[ 0. -1.11...]

See also

lars_path, LarsCV, sklearn.decomposition.sparse_encode

Full API documentation: LarsScikitsLearnNode

class mdp.nodes.LassoLarsScikitsLearnNode

Lasso model fit with Least Angle Regression, a.k.a. Lars. This node has been automatically generated by wrapping the sklearn.linear_model._least_angle.LassoLars class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. It is a Linear Model trained with an L1 prior as regularizer.

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
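The objective can be evaluated directly for a given weight vector. The sketch below is a plain-Python transcription of the formula on toy data; it evaluates the objective only and does not perform the optimization.

```python
def lasso_objective(X, y, w, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1"""
    n_samples = len(X)
    residuals = [yi - sum(xij * wj for xij, wj in zip(row, w))
                 for row, yi in zip(X, y)]
    data_term = sum(r * r for r in residuals) / (2 * n_samples)
    penalty = alpha * sum(abs(wj) for wj in w)
    return data_term + penalty


X = [[-1, 1], [0, 0], [1, 1]]
y = [-1, 0, -1]

# All-zero weights: no penalty, but the full squared error remains.
at_zero = lasso_objective(X, y, w=[0.0, 0.0], alpha=0.01)
# w = [0, -1] reproduces y exactly, so only the l1 penalty is left.
at_exact = lasso_objective(X, y, w=[0.0, -1.0], alpha=0.01)
```

Larger values of alpha raise the cost of non-zero weights relative to the data term, which is what drives coefficients to exactly zero.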

Read more in the User Guide.

Parameters

alphafloat, default=1.0

Constant that multiplies the penalty term. alpha = 0 is equivalent to ordinary least squares, solved by LinearRegression. For numerical reasons, using alpha = 0 with the LassoLars object is not advised; prefer the LinearRegression object.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

verbosebool or int, default=False

Sets the verbosity amount

normalizebool, default=True

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

precomputebool, ‘auto’ or array-like, default=’auto’

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument.

max_iterint, default=500

Maximum number of iterations to perform.

epsfloat, optional

The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the tol parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization. By default, np.finfo(float).eps is used.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

fit_pathbool, default=True

If True the full path is stored in the coef_path_ attribute. If you compute the solution for a large problem or many targets, setting fit_path to False will lead to a speedup, especially with a small alpha.

positivebool, default=False

Restrict coefficients to be >= 0. Be aware that you might want to remove fit_intercept which is set True by default. Under the positive restriction the model coefficients will not converge to the ordinary-least-squares solution for small values of alpha. Only coefficients up to the smallest alpha value (alphas_[alphas_ > 0.].min() when fit_path=True) reached by the stepwise Lars-Lasso algorithm are typically in congruence with the solution of the coordinate descent Lasso estimator.

jitterfloat, default=None

Upper bound on a uniform noise parameter to be added to the y values, to satisfy the model’s assumption of one-at-a-time computations. Might help with stability.

random_stateint, RandomState instance or None (default)

Determines random number generation for jittering. Pass an int for reproducible output across multiple function calls. See Glossary. Ignored if jitter is None.

Attributes

alphas_array-like of shape (n_alphas + 1,) | list of n_targets such arrays

Maximum of covariances (in absolute value) at each iteration. n_alphas is either max_iter, n_features, or the number of nodes in the path with correlation greater than alpha, whichever is smaller.

active_list, length = n_alphas | list of n_targets such lists

Indices of active variables at the end of the path.

coef_path_array-like of shape (n_features, n_alphas + 1) or list

If a list is passed it’s expected to be one of n_targets such arrays. The varying values of the coefficients along the path. It is not present if the fit_path parameter is False.

coef_array-like of shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the formulation formula).

intercept_float or array-like of shape (n_targets,)

Independent term in decision function.

n_iter_array-like or int.

The number of iterations taken by lars_path to find the grid of alphas for each target.

Examples

>>> from sklearn import linear_model
>>> reg = linear_model.LassoLars(alpha=0.01)
>>> reg.fit([[-1, 1], [0, 0], [1, 1]], [-1, 0, -1])
LassoLars(alpha=0.01)
>>> print(reg.coef_)
[ 0.         -0.963257...]

See also

lars_path, lasso_path, Lasso, LassoCV, LassoLarsCV, LassoLarsIC, sklearn.decomposition.sparse_encode

Full API documentation: LassoLarsScikitsLearnNode

class mdp.nodes.LarsCVScikitsLearnNode

Cross-validated Least Angle Regression model. This node has been automatically generated by wrapping the sklearn.linear_model._least_angle.LarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

Read more in the User Guide.

Parameters

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

verbosebool or int, default=False

Sets the verbosity amount

max_iterint, default=500

Maximum number of iterations to perform.

normalizebool, default=True

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

precomputebool, ‘auto’ or array-like , default=’auto’

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix cannot be passed as argument since we will use only subsets of X.

cvint, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • integer, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.
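To make the splitting strategy concrete, here is a standard-library sketch of unshuffled K-fold splitting. It mirrors the usual convention that the first n_samples % n_splits folds receive one extra sample; kfold_indices is a hypothetical helper, not the library's KFold class.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    # The first n_samples % n_splits folds get one extra sample.
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples)
                 if i < start or i >= start + size]
        yield train, test
        start += size


splits = list(kfold_indices(n_samples=7, n_splits=3))
fold_sizes = [len(test) for _, test in splits]
```

The result is itself an iterable of (train, test) index pairs, which is the form described in the last option of the list above.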

max_n_alphasint, default=1000

The maximum number of points on the path used to compute the residuals in the cross-validation

n_jobsint or None, default=None

Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

epsfloat, optional

The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. By default, np.finfo(float).eps is used.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

Attributes

coef_array-like of shape (n_features,)

Parameter vector (w in the formulation formula).

intercept_float

Independent term in decision function.

coef_path_array-like of shape (n_features, n_alphas)

The varying values of the coefficients along the path.

alpha_float

The estimated regularization parameter alpha.

alphas_array-like of shape (n_alphas,)

The different values of alpha along the path.

cv_alphas_array-like of shape (n_cv_alphas,)

All the values of alpha along the path for the different folds.

mse_path_array-like of shape (n_folds, n_cv_alphas)

The mean squared error on the left-out data for each fold along the path (alpha values given by cv_alphas_).

n_iter_array-like or int

The number of iterations run by Lars with the optimal alpha.
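How cv_alphas_ and mse_path_ relate to the selected alpha_ can be sketched as follows: average the error over folds for each alpha and keep the alpha with the lowest average. The helper select_alpha and the numbers are made up; the layout (n_folds, n_cv_alphas) is taken from the description above and is an assumption about the actual array.

```python
def select_alpha(cv_alphas, mse_path):
    """Pick the alpha whose error, averaged over folds, is lowest.

    mse_path is laid out as (n_folds, n_cv_alphas), as documented above.
    """
    n_folds = len(mse_path)
    mean_mse = [sum(fold[j] for fold in mse_path) / n_folds
                for j in range(len(cv_alphas))]
    best = min(range(len(cv_alphas)), key=lambda j: mean_mse[j])
    return cv_alphas[best], mean_mse


cv_alphas = [0.5, 0.1, 0.02]
mse_path = [
    [4.0, 2.0, 2.5],  # fold 0
    [3.6, 2.2, 2.1],  # fold 1
]
alpha, mean_mse = select_alpha(cv_alphas, mse_path)
```

Here the middle alpha wins even though fold 1 alone would have preferred the smallest one, which is the point of averaging across folds.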

Examples

>>> from sklearn.linear_model import LarsCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=200, noise=4.0, random_state=0)
>>> reg = LarsCV(cv=5).fit(X, y)
>>> reg.score(X, y)
0.9996...
>>> reg.alpha_
0.0254...
>>> reg.predict(X[:1,])
array([154.0842...])

See also

lars_path, LassoLars, LassoLarsCV

Full API documentation: LarsCVScikitsLearnNode

class mdp.nodes.LassoLarsCVScikitsLearnNode

Cross-validated Lasso, using the LARS algorithm. This node has been automatically generated by wrapping the sklearn.linear_model._least_angle.LassoLarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

Read more in the User Guide.

Parameters

fit_interceptbool, default=True

whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

verbosebool or int, default=False

Sets the verbosity amount

max_iterint, default=500

Maximum number of iterations to perform.

normalizebool, default=True

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
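The difference between normalizing as described above and standardizing can be sketched in plain Python. The helper names below are hypothetical and not part of sklearn or MDP, and the standardization uses the population standard deviation, which is an assumption of this sketch.

```python
import math

def center(column):
    """Subtract the column mean from every entry."""
    mean = sum(column) / len(column)
    return [x - mean for x in column]

def l2_normalize_column(column):
    """Hypothetical helper: center, then divide by the l2-norm of the centered column."""
    centered = center(column)
    norm = math.sqrt(sum(x * x for x in centered))
    return [x / norm for x in centered] if norm > 0 else centered

def standardize_column(column):
    """Hypothetical helper: center, then divide by the (population) standard deviation."""
    centered = center(column)
    std = math.sqrt(sum(x * x for x in centered) / len(centered))
    return [x / std for x in centered] if std > 0 else centered

col = [1.0, 2.0, 3.0]
normalized = l2_normalize_column(col)    # centered column with unit l2-norm
standardized = standardize_column(col)   # centered column with unit variance
```

Under these definitions the two results differ only by a factor of sqrt(n_samples), so the choice mainly changes the effective scale of alpha.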

precomputebool or ‘auto’ , default=’auto’

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix cannot be passed as argument since we will use only subsets of X.

cvint, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • integer, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.
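As an illustration of the last input option, an iterable of (train, test) index splits, here is a minimal pure-Python sketch of a k-fold splitter. The function name kfold_indices is hypothetical, and the choice of contiguous, unshuffled folds with the first folds taking any leftover samples is an assumption of this sketch rather than something this document specifies.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train, test) index lists for contiguous, unshuffled folds.

    Assumption: when n_samples is not divisible by n_splits, the first
    (n_samples % n_splits) folds get one extra sample each.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Seven samples split into three folds.
splits = list(kfold_indices(7, n_splits=3))
```

Every sample lands in exactly one test fold, which is what makes the per-fold errors comparable when they are averaged.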

max_n_alphasint, default=1000

The maximum number of points on the path used to compute the residuals in the cross-validation.

n_jobsint or None, default=None

Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

epsfloat, optional

The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. By default, np.finfo(float).eps is used.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

positivebool, default=False

Restrict coefficients to be >= 0. Be aware that you might want to remove fit_intercept which is set True by default. Under the positive restriction the model coefficients do not converge to the ordinary-least-squares solution for small values of alpha. Only coefficients up to the smallest alpha value (alphas_[alphas_ > 0.].min() when fit_path=True) reached by the stepwise Lars-Lasso algorithm are typically in congruence with the solution of the coordinate descent Lasso estimator. As a consequence using LassoLarsCV only makes sense for problems where a sparse solution is expected and/or reached.

Attributes

coef_array-like of shape (n_features,)

parameter vector (w in the formulation formula)

intercept_float

independent term in decision function.

coef_path_array-like of shape (n_features, n_alphas)

the varying values of the coefficients along the path

alpha_float

the estimated regularization parameter alpha

alphas_array-like of shape (n_alphas,)

the different values of alpha along the path

cv_alphas_array-like of shape (n_cv_alphas,)

all the values of alpha along the path for the different folds

mse_path_array-like of shape (n_folds, n_cv_alphas)

the mean square error on left-out for each fold along the path (alpha values given by cv_alphas)

n_iter_array-like or int

the number of iterations run by Lars with the optimal alpha.

Examples

>>> from sklearn.linear_model import LassoLarsCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(noise=4.0, random_state=0)
>>> reg = LassoLarsCV(cv=5).fit(X, y)
>>> reg.score(X, y)
0.9992...
>>> reg.alpha_
0.0484...
>>> reg.predict(X[:1,])
array([-77.8723...])

Notes

The object solves the same problem as the LassoCV object. However, unlike LassoCV, it finds the relevant alpha values by itself. In general, because of this property, it will be more stable. However, it is more fragile on heavily multicollinear datasets.

It is more efficient than the LassoCV if only a small number of features are selected compared to the total number, for instance if there are very few samples compared to the number of features.

See also

lars_path, LassoLars, LarsCV, LassoCV

Full API documentation: LassoLarsCVScikitsLearnNode

class mdp.nodes.LassoLarsICScikitsLearnNode

Lasso model fit with Lars using BIC or AIC for model selection. This node has been automatically generated by wrapping the sklearn.linear_model._least_angle.LassoLarsIC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

AIC is the Akaike information criterion and BIC is the Bayesian information criterion. Such criteria are useful for selecting the value of the regularization parameter by making a trade-off between the goodness of fit and the complexity of the model. A good model should explain the data well while being simple.
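The trade-off can be sketched with the textbook definitions of the two criteria. This is only an illustration: the wrapped class may compute a differently scaled variant, and the candidate values below (alphas, log-likelihoods, degrees of freedom) are made up for the example.

```python
import math

def aic(log_likelihood, n_params):
    """Textbook Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_samples):
    """Textbook Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return math.log(n_samples) * n_params - 2 * log_likelihood

# Hypothetical candidates: (alpha, maximized log-likelihood, degrees of freedom).
candidates = [(1.0, -120.0, 1), (0.1, -100.0, 3), (0.01, -97.5, 5)]
n_samples = 50

# The candidate with the smallest criterion value is selected.
best_aic = min(candidates, key=lambda c: aic(c[1], c[2]))
best_bic = min(candidates, key=lambda c: bic(c[1], c[2], n_samples))
```

With these made-up numbers AIC prefers the less regularized candidate (alpha = 0.01) while BIC prefers alpha = 0.1, because BIC charges ln(n_samples) per degree of freedom instead of 2.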

Read more in the User Guide.

Parameters

criterion{‘bic’ , ‘aic’}, default=’aic’

The type of criterion to use.

fit_interceptbool, default=True

whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

verbosebool or int, default=False

Sets the verbosity amount

normalizebool, default=True

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

precomputebool, ‘auto’ or array-like, default=’auto’

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument.

max_iterint, default=500

Maximum number of iterations to perform. Can be used for early stopping.

epsfloat, optional

The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the tol parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization. By default, np.finfo(float).eps is used.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

positivebool, default=False

Restrict coefficients to be >= 0. Be aware that you might want to remove fit_intercept which is set True by default. Under the positive restriction the model coefficients do not converge to the ordinary-least-squares solution for small values of alpha. Only coefficients up to the smallest alpha value (alphas_[alphas_ > 0.].min() when fit_path=True) reached by the stepwise Lars-Lasso algorithm are typically in congruence with the solution of the coordinate descent Lasso estimator. As a consequence using LassoLarsIC only makes sense for problems where a sparse solution is expected and/or reached.

Attributes

coef_array-like of shape (n_features,)

parameter vector (w in the formulation formula)

intercept_float

independent term in decision function.

alpha_float

the alpha parameter chosen by the information criterion

n_iter_int

number of iterations run by lars_path to find the grid of alphas.

criterion_array-like of shape (n_alphas,)

The value of the information criteria (‘aic’, ‘bic’) across all alphas. The alpha which has the smallest information criterion is chosen. This value is larger by a factor of n_samples compared to Eqns. 2.15 and 2.16 in (Zou et al, 2007).

Examples

>>> from sklearn import linear_model
>>> reg = linear_model.LassoLarsIC(criterion='bic')
>>> reg.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
LassoLarsIC(criterion='bic')
>>> print(reg.coef_)
[ 0.  -1.11...]

Notes

The estimation of the number of degrees of freedom is given by:

“On the degrees of freedom of the lasso” Hui Zou, Trevor Hastie, and Robert Tibshirani Ann. Statist. Volume 35, Number 5 (2007), 2173-2192.

https://en.wikipedia.org/wiki/Akaike_information_criterion

https://en.wikipedia.org/wiki/Bayesian_information_criterion

See also

lars_path, LassoLars, LassoLarsCV

Full API documentation: LassoLarsICScikitsLearnNode

class mdp.nodes.LassoScikitsLearnNode

Linear Model trained with L1 prior as regularizer (aka the Lasso). This node has been automatically generated by wrapping the sklearn.linear_model._coordinate_descent.Lasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

Technically the Lasso model is optimizing the same objective function as the Elastic Net with l1_ratio=1.0 (no L2 penalty).
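The relation between the two objectives can be checked numerically with a small pure-Python sketch. The function names are hypothetical, the intercept is left out for brevity, and the Elastic Net form used here is the one given in the ElasticNet entry of this document.

```python
def _data_fit(X, y, w):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2, without an intercept."""
    n_samples = len(X)
    residuals = [yi - sum(xij * wj for xij, wj in zip(row, w))
                 for row, yi in zip(X, y)]
    return sum(r * r for r in residuals) / (2 * n_samples)

def lasso_objective(X, y, w, alpha):
    """Data fit plus alpha * ||w||_1."""
    return _data_fit(X, y, w) + alpha * sum(abs(wj) for wj in w)

def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """Data fit plus the mixed L1 / squared-L2 penalty."""
    l1 = sum(abs(wj) for wj in w)
    l2_squared = sum(wj * wj for wj in w)
    return (_data_fit(X, y, w)
            + alpha * l1_ratio * l1
            + 0.5 * alpha * (1 - l1_ratio) * l2_squared)

X = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 2]
w = [0.85, 0.0]
lasso_value = lasso_objective(X, y, w, alpha=0.1)
enet_value = elastic_net_objective(X, y, w, alpha=0.1, l1_ratio=1.0)
```

With l1_ratio=1.0 the squared-L2 term has weight zero, so both functions return the same value for any w.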

Read more in the User Guide.

Parameters

alphafloat, default=1.0

Constant that multiplies the L1 term. Defaults to 1.0. alpha = 0 is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised. Given this, you should use the LinearRegression object.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

precompute‘auto’, bool or array-like of shape (n_features, n_features), default=False

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

max_iterint, default=1000

The maximum number of iterations

tolfloat, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

positivebool, default=False

When set to True, forces the coefficients to be positive.

random_stateint, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection{‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Attributes

coef_ndarray of shape (n_features,) or (n_targets, n_features)

parameter vector (w in the cost function formula)

sparse_coef_sparse matrix of shape (n_features, 1) or (n_targets, n_features)

sparse_coef_ is a readonly property derived from coef_

intercept_float or ndarray of shape (n_targets,)

independent term in decision function.

n_iter_int or list of int

number of iterations run by the coordinate descent solver to reach the specified tolerance.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.Lasso(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
Lasso(alpha=0.1)
>>> print(clf.coef_)
[0.85 0.  ]
>>> print(clf.intercept_)
0.15...

See also

lars_path, lasso_path, LassoLars, LassoCV, LassoLarsCV, sklearn.decomposition.sparse_encode

Notes

The algorithm used to fit the model is coordinate descent.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.
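To give an idea of what coordinate descent does for the Lasso objective, here is a minimal pure-Python sketch with cyclic updates and soft-thresholding. It is not the wrapped implementation: the names soft_threshold and lasso_cd are hypothetical, and the stopping rule only looks at the size of the updates, without the dual gap check described for tol.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, alpha, max_iter=1000, tol=1e-4):
    """Cyclic coordinate descent for
    (1 / (2 * n)) * ||y - Xw - b||^2_2 + alpha * ||w||_1 (illustrative sketch)."""
    n, p = len(X), len(X[0])
    # Fit the intercept by centering X and y first.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [yi - y_mean for yi in y]  # residual for w = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) for j in range(p)]
    w = [0.0] * p
    for _ in range(max_iter):
        max_update = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature, coefficient stays at 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) + col_sq[j] * w[j]
            w_new = soft_threshold(rho / n, alpha) / (col_sq[j] / n)
            delta = w_new - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                w[j] = w_new
            max_update = max(max_update, abs(delta))
        if max_update < tol:  # simplified stopping rule
            break
    intercept = y_mean - sum(x_mean[j] * w[j] for j in range(p))
    return w, intercept

# Same toy data as in the example above.
coef, intercept = lasso_cd([[0, 0], [1, 1], [2, 2]], [0, 1, 2], alpha=0.1)
```

On this toy data the sketch ends up close to the coefficients [0.85, 0.] and intercept 0.15 shown in the example above; the second, perfectly correlated feature is thresholded to (numerically) zero.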

Full API documentation: LassoScikitsLearnNode

class mdp.nodes.ElasticNetScikitsLearnNode

Linear regression with combined L1 and L2 priors as regularizer. This node has been automatically generated by wrapping the sklearn.linear_model._coordinate_descent.ElasticNet class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Minimizes the objective function:

1 / (2 * n_samples) * ||y - Xw||^2_2
+ alpha * l1_ratio * ||w||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2

If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:

a * L1 + b * L2

where:

alpha = a + b and l1_ratio = a / (a + b)
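The conversion between the two parametrizations can be sketched as follows. The function names are hypothetical, and the sketch reads L2 as 0.5 * ||w||^2_2 so that b lines up with the last term of the objective above; that reading is an assumption.

```python
def to_alpha_l1_ratio(a, b):
    """Map separate penalty weights (a for L1, b for L2) to (alpha, l1_ratio)."""
    if a < 0 or b < 0 or a + b == 0:
        raise ValueError("a and b must be non-negative and not both zero")
    alpha = a + b
    return alpha, a / alpha

def to_separate_weights(alpha, l1_ratio):
    """Inverse mapping: a = alpha * l1_ratio, b = alpha * (1 - l1_ratio)."""
    return alpha * l1_ratio, alpha * (1.0 - l1_ratio)

alpha, l1_ratio = to_alpha_l1_ratio(0.3, 0.1)   # roughly (0.4, 0.75)
a, b = to_separate_weights(alpha, l1_ratio)     # back to roughly (0.3, 0.1)
```

A pure L1 penalty (b = 0) maps to l1_ratio = 1 and a pure L2 penalty (a = 0) to l1_ratio = 0, matching the description of l1_ratio below.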

The parameter l1_ratio corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. Specifically, l1_ratio = 1 is the lasso penalty. Currently, l1_ratio <= 0.01 is not reliable, unless you supply your own sequence of alpha.

Read more in the User Guide.

Parameters

alphafloat, default=1.0

Constant that multiplies the penalty terms. Defaults to 1.0. See the notes for the exact mathematical meaning of this parameter. alpha = 0 is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised. Given this, you should use the LinearRegression object.

l1_ratiofloat, default=0.5

The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

fit_interceptbool, default=True

Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

precomputebool or array-like of shape (n_features, n_features), default=False

Whether to use a precomputed Gram matrix to speed up calculations. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.

max_iterint, default=1000

The maximum number of iterations

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

tolfloat, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

positivebool, default=False

When set to True, forces the coefficients to be positive.

random_stateint, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection{‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Attributes

coef_ndarray of shape (n_features,) or (n_targets, n_features)

parameter vector (w in the cost function formula)

sparse_coef_sparse matrix of shape (n_features, 1) or (n_targets, n_features)

sparse_coef_ is a readonly property derived from coef_

intercept_float or ndarray of shape (n_targets,)

independent term in decision function.

n_iter_list of int

number of iterations run by the coordinate descent solver to reach the specified tolerance.

Examples

>>> from sklearn.linear_model import ElasticNet
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=2, random_state=0)
>>> regr = ElasticNet(random_state=0)
>>> regr.fit(X, y)
ElasticNet(random_state=0)
>>> print(regr.coef_)
[18.83816048 64.55968825]
>>> print(regr.intercept_)
1.451...
>>> print(regr.predict([[0, 0]]))
[1.451...]

Notes

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.

See also

ElasticNetCV: Elastic net model with best model selection by cross-validation.

SGDRegressor: implements elastic net regression with incremental training.

SGDClassifier: implements logistic regression with elastic net penalty (SGDClassifier(loss="log", penalty="elasticnet")).

Full API documentation: ElasticNetScikitsLearnNode

class mdp.nodes.LassoCVScikitsLearnNode

Lasso linear model with iterative fitting along a regularization path. This node has been automatically generated by wrapping the sklearn.linear_model._coordinate_descent.LassoCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

The best model is selected by cross-validation.

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

Read more in the User Guide.

Parameters

epsfloat, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphasint, default=100

Number of alphas along the regularization path

alphasndarray, default=None

List of alphas where to compute the models. If None alphas are set automatically
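How eps and n_alphas could determine an automatic grid is sketched below. The function name alpha_grid is hypothetical, alpha_max is taken as a given input, and the geometric (log-spaced) spacing is an assumption of this sketch; only the ratio alpha_min / alpha_max = eps comes from the description above.

```python
def alpha_grid(alpha_max, eps=1e-3, n_alphas=100):
    """Return n_alphas values decreasing geometrically from alpha_max
    to eps * alpha_max (assumed spacing, for illustration only)."""
    if n_alphas < 1:
        raise ValueError("n_alphas must be at least 1")
    if n_alphas == 1:
        return [alpha_max]
    return [alpha_max * eps ** (i / (n_alphas - 1)) for i in range(n_alphas)]

grid = alpha_grid(2.0, eps=1e-3, n_alphas=4)
# The endpoints satisfy grid[-1] / grid[0] == eps up to rounding.
```

A smaller eps makes the path longer, so more of the grid is spent on weakly regularized models.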

fit_interceptbool, default=True

whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

precompute‘auto’, bool or array-like of shape (n_features, n_features), default=’auto’

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument.

max_iterint, default=1000

The maximum number of iterations

tolfloat, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

cvint, cross-validation generator or iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • int, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.

verbosebool or int, default=False

Amount of verbosity.

n_jobsint, default=None

Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

positivebool, default=False

If positive, restrict regression coefficients to be positive

random_stateint, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection{‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Attributes

alpha_float

The amount of penalization chosen by cross validation

coef_ndarray of shape (n_features,) or (n_targets, n_features)

parameter vector (w in the cost function formula)

intercept_float or ndarray of shape (n_targets,)

independent term in decision function.

mse_path_ndarray of shape (n_alphas, n_folds)

mean square error for the test set on each fold, varying alpha

alphas_ndarray of shape (n_alphas,)

The grid of alphas used for fitting

dual_gap_float or ndarray of shape (n_targets,)

The dual gap at the end of the optimization for the optimal alpha (alpha_).

n_iter_int

number of iterations run by the coordinate descent solver to reach the specified tolerance for the optimal alpha.
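The relation between mse_path_, alphas_ and alpha_ can be sketched as follows. This assumes that the selected alpha is the one with the smallest mean squared error averaged over folds; the function name select_alpha and the numbers are made up for illustration.

```python
def select_alpha(alphas, mse_path):
    """Pick the alpha with the lowest fold-averaged error.

    mse_path[i][k] is the test error for alphas[i] on fold k,
    i.e. the (n_alphas, n_folds) layout described above.
    """
    mean_mse = [sum(row) / len(row) for row in mse_path]
    best = min(range(len(alphas)), key=mean_mse.__getitem__)
    return alphas[best], mean_mse[best]

# Hypothetical values: three alphas, three folds.
alphas = [1.0, 0.1, 0.01]
mse_path = [
    [4.0, 5.0, 6.0],
    [1.0, 2.0, 1.5],
    [1.2, 2.4, 1.8],
]
best_alpha, best_mse = select_alpha(alphas, mse_path)
```

Here the middle alpha wins with an average error of 1.5, even though the smallest alpha fits the training folds more closely.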

Examples

>>> from sklearn.linear_model import LassoCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(noise=4, random_state=0)
>>> reg = LassoCV(cv=5, random_state=0).fit(X, y)
>>> reg.score(X, y)
0.9993...
>>> reg.predict(X[:1,])
array([-78.4951...])

Notes

For an example, see examples/linear_model/plot_lasso_model_selection.py.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.

See also

lars_path, lasso_path, LassoLars, Lasso, LassoLarsCV

Full API documentation: LassoCVScikitsLearnNode

class mdp.nodes.ElasticNetCVScikitsLearnNode

Elastic Net model with iterative fitting along a regularization path. This node has been automatically generated by wrapping the sklearn.linear_model._coordinate_descent.ElasticNetCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

Read more in the User Guide.

Parameters

l1_ratiofloat or list of float, default=0.5

Float between 0 and 1 passed to ElasticNet (scaling between l1 and l2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1].

epsfloat, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphasint, default=100

Number of alphas along the regularization path, used for each l1_ratio.

alphasndarray, default=None

List of alphas where to compute the models. If None alphas are set automatically

fit_interceptbool, default=True

whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

precompute‘auto’, bool or array-like of shape (n_features, n_features), default=’auto’

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument.

max_iterint, default=1000

The maximum number of iterations

tolfloat, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

cvint, cross-validation generator or iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • int, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

verbosebool or int, default=0

Amount of verbosity.

n_jobsint, default=None

Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

positivebool, default=False

When set to True, forces the coefficients to be positive.

random_stateint, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection{‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Attributes

alpha_float

The amount of penalization chosen by cross validation

l1_ratio_float

The compromise between l1 and l2 penalization chosen by cross validation

coef_ndarray of shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the cost function formula).

intercept_float or ndarray of shape (n_targets,)

Independent term in the decision function.

mse_path_ndarray of shape (n_l1_ratio, n_alpha, n_folds)

Mean square error for the test set on each fold, varying l1_ratio and alpha.

alphas_ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)

The grid of alphas used for fitting, for each l1_ratio.

n_iter_int

number of iterations run by the coordinate descent solver to reach the specified tolerance for the optimal alpha.

Examples

>>> from sklearn.linear_model import ElasticNetCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=2, random_state=0)
>>> regr = ElasticNetCV(cv=5, random_state=0)
>>> regr.fit(X, y)
ElasticNetCV(cv=5, random_state=0)
>>> print(regr.alpha_)
0.199...
>>> print(regr.intercept_)
0.398...
>>> print(regr.predict([[0, 0]]))
[0.398...]

Notes

For an example, see examples/linear_model/plot_lasso_model_selection.py.

To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.

The parameter l1_ratio corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. More specifically, the optimization objective is:

1 / (2 * n_samples) * ||y - Xw||^2_2
+ alpha * l1_ratio * ||w||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2

If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:

a * L1 + b * L2

for:

alpha = a + b and l1_ratio = a / (a + b).

See also

enet_path, ElasticNet

Full API documentation: ElasticNetCVScikitsLearnNode

class mdp.nodes.MultiTaskLassoScikitsLearnNode

Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer. This node has been automatically generated by wrapping the sklearn.linear_model._coordinate_descent.MultiTaskLasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||Y - XW||^2_Fro + alpha * ||W||_21

Where:

||W||_21 = \sum_i \sqrt{\sum_j w_{ij}^2}

i.e. the sum of the norms of the rows.
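The mixed norm can be computed with a few lines of pure Python. The function name l21_norm is hypothetical; W is given as a list of rows, in the orientation of the formula above (coef_ stores the transpose, so it would have to be transposed first).

```python
import math

def l21_norm(W):
    """Sum over rows of the Euclidean norm of each row."""
    return sum(math.sqrt(sum(w_ij * w_ij for w_ij in row)) for row in W)

W = [
    [3.0, 4.0],    # row norm 5
    [0.0, 0.0],    # row norm 0: a row that is zero for every task
    [5.0, 12.0],   # row norm 13
]
value = l21_norm(W)  # 5 + 0 + 13 = 18
```

Because whole rows are penalized through their norm, the penalty favours solutions where a row of W is zero for all tasks at once.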

Read more in the User Guide.

Parameters

alphafloat, default=1.0

Constant that multiplies the L1/L2 term. Defaults to 1.0.

fit_interceptbool, default=True

whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

max_iterint, default=1000

The maximum number of iterations

tolfloat, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

random_stateint, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection{‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Attributes

coef_ndarray of shape (n_tasks, n_features)

Parameter vector (W in the cost function formula). Note that coef_ stores the transpose of W, W.T.

intercept_ndarray of shape (n_tasks,)

independent term in decision function.

n_iter_int

number of iterations run by the coordinate descent solver to reach the specified tolerance.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.MultiTaskLasso(alpha=0.1)
>>> clf.fit([[0, 1], [1, 2], [2, 4]], [[0, 0], [1, 1], [2, 3]])
MultiTaskLasso(alpha=0.1)
>>> print(clf.coef_)
[[0.         0.60809415]
 [0.         0.94592424]]
>>> print(clf.intercept_)
[-0.41888636 -0.87382323]

See also

MultiTaskLassoCV: Multi-task L1/L2 Lasso with built-in cross-validation, Lasso, MultiTaskElasticNet

Notes

The algorithm used to fit the model is coordinate descent.

To avoid unnecessary memory duplication the X and y arguments of the fit method should be directly passed as Fortran-contiguous numpy arrays.

Full API documentation: MultiTaskLassoScikitsLearnNode

class mdp.nodes.MultiTaskElasticNetScikitsLearnNode

Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer. This node has been automatically generated by wrapping the sklearn.linear_model._coordinate_descent.MultiTaskElasticNet class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The optimization objective for MultiTaskElasticNet is:

(1 / (2 * n_samples)) * ||Y - XW||_Fro^2
+ alpha * l1_ratio * ||W||_21
+ 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2

Where:

||W||_21 = sum_i sqrt(sum_j W_ij ^ 2)

i.e. the sum of norms of each row.

Read more in the User Guide.

Parameters

alphafloat, default=1.0

Constant that multiplies the L1/L2 term. Defaults to 1.0.

l1_ratiofloat, default=0.5

The ElasticNet mixing parameter, with 0 < l1_ratio <= 1. For l1_ratio = 1 the penalty is an L1/L2 penalty. For l1_ratio = 0 it is an L2 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1/L2 and L2.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

max_iterint, default=1000

The maximum number of iterations.

tolfloat, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

random_stateint, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection{‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Attributes

intercept_ndarray of shape (n_tasks,)

Independent term in decision function.

coef_ndarray of shape (n_tasks, n_features)

Parameter vector (W in the cost function formula). If a 1D y is passed in at fit (non multi-task usage), coef_ is then a 1D array. Note that coef_ stores the transpose of W, W.T.

n_iter_int

Number of iterations run by the coordinate descent solver to reach the specified tolerance.
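The note that coef_ stores W.T can be checked with shapes alone. The sketch below uses made-up arrays in place of fitted attributes and assumes the usual linear prediction rule Y = X W + intercept; it does not call sklearn or MDP.

```python
import numpy as np

n_samples, n_features, n_tasks = 4, 3, 2
rng = np.random.RandomState(0)
X = rng.rand(n_samples, n_features)

# Hypothetical fitted values, shaped as the attributes above describe.
coef_ = rng.rand(n_tasks, n_features)      # this is W.T
intercept_ = rng.rand(n_tasks)

W = coef_.T                                 # (n_features, n_tasks), as in the cost function
Y_pred = X @ W + intercept_                 # (n_samples, n_tasks)
print(Y_pred.shape)                         # (4, 2)
```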

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.MultiTaskElasticNet(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [[0, 0], [1, 1], [2, 2]])
MultiTaskElasticNet(alpha=0.1)
>>> print(clf.coef_)
[[0.45663524 0.45612256]
 [0.45663524 0.45612256]]
>>> print(clf.intercept_)
[0.0872422 0.0872422]

See also

MultiTaskElasticNetCV : Multi-task L1/L2 ElasticNet with built-in cross-validation.

ElasticNet MultiTaskLasso

Notes

The algorithm used to fit the model is coordinate descent.

To avoid unnecessary memory duplication the X and y arguments of the fit method should be directly passed as Fortran-contiguous numpy arrays.
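A minimal sketch of the memory-layout note above, using only NumPy. It shows how to obtain Fortran-contiguous (column-major) copies before fitting; whether the layout is preserved on the way through an MDP node is not stated in this section, so treat that part as an open assumption.

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])    # C-contiguous by default
Y = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])

X_f = np.asfortranarray(X)    # column-major copy of X
Y_f = np.asfortranarray(Y)

# The values are unchanged; only the memory layout differs.
print(X.flags['F_CONTIGUOUS'], X_f.flags['F_CONTIGUOUS'])    # False True
```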

Full API documentation: MultiTaskElasticNetScikitsLearnNode

class mdp.nodes.MultiTaskElasticNetCVScikitsLearnNode

Multi-task L1/L2 ElasticNet with built-in cross-validation. This node has been automatically generated by wrapping the sklearn.linear_model._coordinate_descent.MultiTaskElasticNetCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

The optimization objective for MultiTaskElasticNet is:

(1 / (2 * n_samples)) * ||Y - XW||_Fro^2
+ alpha * l1_ratio * ||W||_21
+ 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2

Where:

||W||_21 = sum_i sqrt(sum_j W_ij ^ 2)

i.e. the sum of norms of each row.

Read more in the User Guide.

New in version 0.15.

Parameters

l1_ratiofloat or list of float, default=0.5

The ElasticNet mixing parameter, with 0 < l1_ratio <= 1. For l1_ratio = 1 the penalty is an L1/L2 penalty. For l1_ratio = 0 it is an L2 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1/L2 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1].

epsfloat, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
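To make the relation alpha_min / alpha_max = eps concrete, here is a small NumPy sketch that builds such a grid. The helper name alpha_grid and the value of alpha_max are made up for illustration, and the logarithmic spacing is an assumption: the text above only fixes the ratio of the two end points.

```python
import numpy as np

def alpha_grid(alpha_max, eps=1e-3, n_alphas=100):
    """Log-spaced alphas running from alpha_max down to alpha_max * eps."""
    return np.logspace(np.log10(alpha_max), np.log10(alpha_max * eps), num=n_alphas)

alphas = alpha_grid(alpha_max=2.0, eps=1e-3, n_alphas=5)
print(alphas)
print(alphas.min() / alphas.max())    # approximately 1e-3
```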

n_alphasint, default=100

Number of alphas along the regularization path.

alphasarray-like, default=None

List of alphas where to compute the models. If not provided, set automatically.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

max_iterint, default=1000

The maximum number of iterations.

tolfloat, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

cvint, cross-validation generator or iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • int, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.
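The last option for cv, an iterable yielding (train, test) index arrays, can be sketched with NumPy alone. The generator name contiguous_folds is hypothetical, and the folds are contiguous and unshuffled purely to keep the example short.

```python
import numpy as np

def contiguous_folds(n_samples, n_folds):
    """Yield (train, test) index arrays for contiguous, unshuffled folds."""
    indices = np.arange(n_samples)
    for test in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, test)    # every index not in the test fold
        yield train, test

splits = list(contiguous_folds(n_samples=6, n_folds=3))
for train, test in splits:
    print(train, test)
```

A list such as splits has the form described by the last bullet above, so it is the kind of object that could be supplied as cv.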

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

verbosebool or int, default=0

Amount of verbosity.

n_jobsint, default=None

Number of CPUs to use during the cross validation. Note that this is used only if multiple values for l1_ratio are given. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection{‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Attributes

intercept_ndarray of shape (n_tasks,)

Independent term in decision function.

coef_ndarray of shape (n_tasks, n_features)

Parameter vector (W in the cost function formula). Note that coef_ stores the transpose of W, W.T.

alpha_float

The amount of penalization chosen by cross-validation.

mse_path_ndarray of shape (n_alphas, n_folds) or (n_l1_ratio, n_alphas, n_folds)

Mean square error for the test set on each fold, varying alpha.

alphas_ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)

The grid of alphas used for fitting, for each l1_ratio.

l1_ratio_float

Best l1_ratio obtained by cross-validation.

n_iter_int

Number of iterations run by the coordinate descent solver to reach the specified tolerance for the optimal alpha.

Examples

>>> from sklearn import linear_model
>>> clf = linear_model.MultiTaskElasticNetCV(cv=3)
>>> clf.fit([[0,0], [1, 1], [2, 2]],
...         [[0, 0], [1, 1], [2, 2]])
MultiTaskElasticNetCV(cv=3)
>>> print(clf.coef_)
[[0.52875032 0.46958558]
 [0.52875032 0.46958558]]
>>> print(clf.intercept_)
[0.00166409 0.00166409]

See also

MultiTaskElasticNet ElasticNetCV MultiTaskLassoCV

Notes

The algorithm used to fit the model is coordinate descent.

To avoid unnecessary memory duplication the X and y arguments of the fit method should be directly passed as Fortran-contiguous numpy arrays.

Full API documentation: MultiTaskElasticNetCVScikitsLearnNode

class mdp.nodes.MultiTaskLassoCVScikitsLearnNode

Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer. This node has been automatically generated by wrapping the sklearn.linear_model._coordinate_descent.MultiTaskLassoCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

The optimization objective for MultiTaskLasso is:

(1 / (2 * n_samples)) * ||Y - XW||_Fro^2 + alpha * ||W||_21

Where:

||W||_21 = sum_i sqrt(sum_j W_ij ^ 2)

i.e. the sum of norms of each row.

Read more in the User Guide.

New in version 0.15.

Parameters

epsfloat, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphasint, default=100

Number of alphas along the regularization path.

alphasarray-like, default=None

List of alphas where to compute the models. If not provided, set automatically.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

max_iterint, default=1000

The maximum number of iterations.

tolfloat, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

cvint, cross-validation generator or iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • int, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.

verbosebool or int, default=False

Amount of verbosity.

n_jobsint, default=None

Number of CPUs to use during the cross validation. Note that this is used only if multiple values for l1_ratio are given. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection{‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Attributes

intercept_ndarray of shape (n_tasks,)

Independent term in decision function.

coef_ndarray of shape (n_tasks, n_features)

Parameter vector (W in the cost function formula). Note that coef_ stores the transpose of W, W.T.

alpha_float

The amount of penalization chosen by cross-validation.

mse_path_ndarray of shape (n_alphas, n_folds)

Mean square error for the test set on each fold, varying alpha.

alphas_ndarray of shape (n_alphas,)

The grid of alphas used for fitting.

n_iter_int

Number of iterations run by the coordinate descent solver to reach the specified tolerance for the optimal alpha.

Examples

>>> from sklearn.linear_model import MultiTaskLassoCV
>>> from sklearn.datasets import make_regression
>>> from sklearn.metrics import r2_score
>>> X, y = make_regression(n_targets=2, noise=4, random_state=0)
>>> reg = MultiTaskLassoCV(cv=5, random_state=0).fit(X, y)
>>> r2_score(y, reg.predict(X))
0.9994...
>>> reg.alpha_
0.5713...
>>> reg.predict(X[:1,])
array([[153.7971...,  94.9015...]])

See also

MultiTaskElasticNet ElasticNetCV MultiTaskElasticNetCV

Notes

The algorithm used to fit the model is coordinate descent.

To avoid unnecessary memory duplication the X and y arguments of the fit method should be directly passed as Fortran-contiguous numpy arrays.

Full API documentation: MultiTaskLassoCVScikitsLearnNode

class mdp.nodes.PoissonRegressorScikitsLearnNode

Generalized Linear Model with a Poisson distribution. This node has been automatically generated by wrapping the sklearn.linear_model._glm.glm.PoissonRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

alphafloat, default=1

Constant that multiplies the penalty term and thus determines the regularization strength. alpha = 0 is equivalent to unpenalized GLMs. In this case, the design matrix X must have full column rank (no collinearities).

fit_interceptbool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).

max_iterint, default=100

The maximal number of iterations for the solver.

tolfloat, default=1e-4

Stopping criterion. For the lbfgs solver, the iteration will stop when max{|g_j|, j = 1, ..., d} <= tol where g_j is the j-th component of the gradient (derivative) of the objective function.

warm_startbool, default=False

If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_ .

verboseint, default=0

For the lbfgs solver set verbose to any positive number for verbosity.

Attributes

coef_array of shape (n_features,)

Estimated coefficients for the linear predictor (X @ coef_ + intercept_) in the GLM.

intercept_float

Intercept (a.k.a. bias) added to linear predictor.

n_iter_int

Actual number of iterations used in the solver.

Full API documentation: PoissonRegressorScikitsLearnNode

class mdp.nodes.GammaRegressorScikitsLearnNode

Generalized Linear Model with a Gamma distribution. This node has been automatically generated by wrapping the sklearn.linear_model._glm.glm.GammaRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

alphafloat, default=1

Constant that multiplies the penalty term and thus determines the regularization strength. alpha = 0 is equivalent to unpenalized GLMs. In this case, the design matrix X must have full column rank (no collinearities).

fit_interceptbool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).

max_iterint, default=100

The maximal number of iterations for the solver.

tolfloat, default=1e-4

Stopping criterion. For the lbfgs solver, the iteration will stop when max{|g_j|, j = 1, ..., d} <= tol where g_j is the j-th component of the gradient (derivative) of the objective function.

warm_startbool, default=False

If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_ .

verboseint, default=0

For the lbfgs solver set verbose to any positive number for verbosity.

Attributes

coef_array of shape (n_features,)

Estimated coefficients for the linear predictor (X @ coef_ + intercept_) in the GLM.

intercept_float

Intercept (a.k.a. bias) added to linear predictor.

n_iter_int

Actual number of iterations used in the solver.

Full API documentation: GammaRegressorScikitsLearnNode

class mdp.nodes.TweedieRegressorScikitsLearnNode

Generalized Linear Model with a Tweedie distribution. This node has been automatically generated by wrapping the sklearn.linear_model._glm.glm.TweedieRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This estimator can be used to model different GLMs depending on the power parameter, which determines the underlying distribution.

Read more in the User Guide.

Parameters

powerfloat, default=0

The power determines the underlying target distribution according to the following table:

Power    Distribution
0        Normal
1        Poisson
(1,2)    Compound Poisson Gamma
2        Gamma
3        Inverse Gaussian

For 0 < power < 1, no distribution exists.
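The table above can be read as a simple lookup. The following pure-Python sketch encodes it; the function name tweedie_distribution is hypothetical, and values of power that the table does not list are rejected here only because this section gives no information about them.

```python
def tweedie_distribution(power):
    """Return the distribution name the table associates with `power`."""
    if power == 0:
        return "Normal"
    if 0 < power < 1:
        raise ValueError("no distribution exists for 0 < power < 1")
    if power == 1:
        return "Poisson"
    if 1 < power < 2:
        return "Compound Poisson Gamma"
    if power == 2:
        return "Gamma"
    if power == 3:
        return "Inverse Gaussian"
    raise ValueError("power=%r is not covered by the table" % (power,))

for p in (0, 1, 1.5, 2, 3):
    print(p, tweedie_distribution(p))
```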

alphafloat, default=1

Constant that multiplies the penalty term and thus determines the regularization strength. alpha = 0 is equivalent to unpenalized GLMs. In this case, the design matrix X must have full column rank (no collinearities).

link{‘auto’, ‘identity’, ‘log’}, default=’auto’

The link function of the GLM, i.e. mapping from linear predictor X @ coef + intercept to prediction y_pred. Option ‘auto’ sets the link depending on the chosen family as follows:

  • ‘identity’ for Normal distribution

  • ‘log’ for Poisson, Gamma and Inverse Gaussian distributions
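The two link options can be illustrated without fitting anything. The sketch below takes made-up coefficients and maps the linear predictor to a prediction through the inverse of the chosen link; predict_with_link is a hypothetical helper, not an sklearn or MDP function.

```python
import numpy as np

def predict_with_link(X, coef, intercept, link):
    """Map the linear predictor to a prediction through the inverse link."""
    eta = X @ coef + intercept            # linear predictor
    if link == "identity":
        return eta
    if link == "log":
        return np.exp(eta)                # inverse of the log link
    raise ValueError("unknown link: %r" % (link,))

X = np.array([[0.0, 1.0], [1.0, 0.0]])
coef = np.array([0.5, -0.5])              # made-up coefficients
intercept = 0.0

y_identity = predict_with_link(X, coef, intercept, "identity")
y_log = predict_with_link(X, coef, intercept, "log")
print(y_identity)                         # the values -0.5 and 0.5
print(y_log)                              # always positive
```

With the log link the predictions stay positive whatever the sign of the linear predictor, which matches its use for the Poisson, Gamma and Inverse Gaussian cases listed above.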

fit_interceptbool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).

max_iterint, default=100

The maximal number of iterations for the solver.

tolfloat, default=1e-4

Stopping criterion. For the lbfgs solver, the iteration will stop when max{|g_j|, j = 1, ..., d} <= tol where g_j is the j-th component of the gradient (derivative) of the objective function.

warm_startbool, default=False

If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_ .

verboseint, default=0

For the lbfgs solver set verbose to any positive number for verbosity.

Attributes

coef_array of shape (n_features,)

Estimated coefficients for the linear predictor (X @ coef_ + intercept_) in the GLM.

intercept_float

Intercept (a.k.a. bias) added to linear predictor.

n_iter_int

Actual number of iterations used in the solver.

Full API documentation: TweedieRegressorScikitsLearnNode

class mdp.nodes.HuberRegressorScikitsLearnNode

Linear regression model that is robust to outliers. This node has been automatically generated by wrapping the sklearn.linear_model._huber.HuberRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The Huber Regressor optimizes the squared loss for the samples where |(y - X'w) / sigma| < epsilon and the absolute loss for the samples where |(y - X'w) / sigma| > epsilon, where w and sigma are parameters to be optimized. The parameter sigma makes sure that if y is scaled up or down by a certain factor, one does not need to rescale epsilon to achieve the same robustness. Note that this does not take into account the fact that the different features of X may be of different scales.

This makes sure that the loss function is not heavily influenced by the outliers while not completely ignoring their effect.
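The switch between squared and absolute loss can be sketched per sample as follows. This is an illustration under assumptions: sigma is held fixed instead of being optimized, and the constants of the linear branch (2 * epsilon * |z| - epsilon^2) are taken from the standard Huber function so that the two pieces join continuously; the text above does not spell them out.

```python
import numpy as np

def huber_per_sample(residual, sigma=1.0, epsilon=1.35):
    """Squared loss for small scaled residuals, linear loss for large ones."""
    z = np.abs(residual / sigma)
    quadratic = z ** 2
    linear = 2 * epsilon * z - epsilon ** 2
    return np.where(z < epsilon, quadratic, linear)

residual = np.array([0.0, 1.0, 10.0])     # y - X'w for three made-up samples
losses = huber_per_sample(residual)
print(losses)                             # the outlier costs about 25.2, not 100
```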

Read more in the User Guide.

New in version 0.18.

Parameters

epsilonfloat, greater than 1.0, default 1.35

The parameter epsilon controls the number of samples that should be classified as outliers. The smaller the epsilon, the more robust it is to outliers.

max_iterint, default 100

Maximum number of iterations that scipy.optimize.minimize(method="L-BFGS-B") should run for.

alphafloat, default 0.0001

Regularization parameter.

warm_startbool, default False

This is useful if the stored attributes of a previously used model have to be reused. If set to False, then the coefficients will be rewritten for every call to fit. See the Glossary.

fit_interceptbool, default True

Whether or not to fit the intercept. This can be set to False if the data is already centered around the origin.

tolfloat, default 1e-5

The iteration will stop when max{|proj g_i| : i = 1, ..., n} <= tol, where proj g_i is the i-th component of the projected gradient.

Attributes

coef_array, shape (n_features,)

Coefficients obtained by optimizing the Huber loss.

intercept_float

Bias.

scale_float

The value by which |y - X'w - c| is scaled down.

n_iter_int

Number of iterations that scipy.optimize.minimize(method="L-BFGS-B") has run for.

Changed in version 0.20: In SciPy <= 1.0.0 the number of lbfgs iterations may exceed max_iter. n_iter_ will now report at most max_iter.

outliers_array, shape (n_samples,)

A boolean mask which is set to True where the samples are identified as outliers.

Examples

>>> import numpy as np
>>> from sklearn.linear_model import HuberRegressor, LinearRegression
>>> from sklearn.datasets import make_regression
>>> rng = np.random.RandomState(0)
>>> X, y, coef = make_regression(
...     n_samples=200, n_features=2, noise=4.0, coef=True, random_state=0)
>>> X[:4] = rng.uniform(10, 20, (4, 2))
>>> y[:4] = rng.uniform(10, 20, 4)
>>> huber = HuberRegressor().fit(X, y)
>>> huber.score(X, y)
-7.284...
>>> huber.predict(X[:1,])
array([806.7200...])
>>> linear = LinearRegression().fit(X, y)
>>> print("True coefficients:", coef)
True coefficients: [20.4923...  34.1698...]
>>> print("Huber coefficients:", huber.coef_)
Huber coefficients: [17.7906... 31.0106...]
>>> print("Linear Regression coefficients:", linear.coef_)
Linear Regression coefficients: [-1.9221...  7.0226...]

References

[1] Peter J. Huber, Elvezio M. Ronchetti, Robust Statistics, Concomitant scale estimates, pg 172

[2] Art B. Owen (2006), A robust hybrid of lasso and ridge regression. https://statweb.stanford.edu/~owen/reports/hhu.pdf

Full API documentation: HuberRegressorScikitsLearnNode

class mdp.nodes.SGDClassifierScikitsLearnNode

Linear classifiers (SVM, logistic regression, etc.) with SGD training. This node has been automatically generated by wrapping the sklearn.linear_model._stochastic_gradient.SGDClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This estimator implements regularized linear models with stochastic gradient descent (SGD) learning: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate). SGD allows minibatch (online/out-of-core) learning via the partial_fit method. For best results using the default learning rate schedule, the data should have zero mean and unit variance.

This implementation works with data represented as dense or sparse arrays of floating point values for the features. The model it fits can be controlled with the loss parameter; by default, it fits a linear support vector machine (SVM).

The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.

Read more in the User Guide.

Parameters

lossstr, default=’hinge’

The loss function to be used. Defaults to ‘hinge’, which gives a linear SVM.

The possible options are ‘hinge’, ‘log’, ‘modified_huber’, ‘squared_hinge’, ‘perceptron’, or a regression loss: ‘squared_loss’, ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’.

The ‘log’ loss gives logistic regression, a probabilistic classifier. ‘modified_huber’ is another smooth loss that brings tolerance to outliers as well as probability estimates. ‘squared_hinge’ is like hinge but is quadratically penalized. ‘perceptron’ is the linear loss used by the perceptron algorithm. The other losses are designed for regression but can be useful in classification as well; see SGDRegressor for a description.

More details about the loss formulas can be found in the User Guide.

penalty{‘l2’, ‘l1’, ‘elasticnet’}, default=’l2’

The penalty (aka regularization term) to be used. Defaults to ‘l2’ which is the standard regularizer for linear SVM models. ‘l1’ and ‘elasticnet’ might bring sparsity to the model (feature selection) not achievable with ‘l2’.

alphafloat, default=0.0001

Constant that multiplies the regularization term. The higher the value, the stronger the regularization. Also used to compute the learning rate when learning_rate is set to ‘optimal’.

l1_ratiofloat, default=0.15

The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1. Only used if penalty is ‘elasticnet’.

fit_interceptbool, default=True

Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

max_iterint, default=1000

The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit() method.

New in version 0.19.

tolfloat, default=1e-3

The stopping criterion. If it is not None, training will stop when (loss > best_loss - tol) for n_iter_no_change consecutive epochs.

New in version 0.19.
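The stopping rule for tol can be traced on a list of per-epoch losses. The sketch below is a plain-Python reading of that rule; the function name epochs_until_stop is hypothetical, and the way best_loss is tracked (running minimum) is an assumption rather than something stated above.

```python
def epochs_until_stop(losses, tol=1e-3, n_iter_no_change=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float("inf")
    no_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss > best_loss - tol:
            no_improvement += 1           # did not beat the best loss by tol
        else:
            no_improvement = 0
        best_loss = min(best_loss, loss)
        if no_improvement >= n_iter_no_change:
            return epoch
    return None

losses = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4]   # made-up training losses
stop_epoch = epochs_until_stop(losses, tol=1e-3, n_iter_no_change=3)
print(stop_epoch)                          # 6
```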

shufflebool, default=True

Whether or not the training data should be shuffled after each epoch.

verboseint, default=0

The verbosity level.

epsilonfloat, default=0.1

Epsilon in the epsilon-insensitive loss functions; only if loss is ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’. For ‘huber’, determines the threshold at which it becomes less important to get the prediction exactly right. For epsilon-insensitive, any differences between the current prediction and the correct label are ignored if they are less than this threshold.

n_jobsint, default=None

The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint, RandomState instance, default=None

Used for shuffling the data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary.

learning_ratestr, default=’optimal’

The learning rate schedule:

  • ‘constant’: eta = eta0

  • ‘optimal’: eta = 1.0 / (alpha * (t + t0)) where t0 is chosen by a heuristic proposed by Leon Bottou.

  • ‘invscaling’: eta = eta0 / pow(t, power_t)

  • ‘adaptive’: eta = eta0, as long as the training keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol or fail to increase validation score by tol if early_stopping is True, the current learning rate is divided by 5.

    New in version 0.20: Added ‘adaptive’ option
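The three closed-form schedules above translate directly into code. In this sketch the function name learning_rate is made up, and t0 is a placeholder argument: the heuristic that actually chooses it is not reproduced here.

```python
def learning_rate(schedule, t, eta0=0.01, alpha=0.0001, power_t=0.5, t0=1.0):
    """Step size at update t (t >= 1) for the non-adaptive schedules."""
    if schedule == "constant":
        return eta0
    if schedule == "optimal":
        return 1.0 / (alpha * (t + t0))   # t0 is a placeholder value here
    if schedule == "invscaling":
        return eta0 / pow(t, power_t)
    raise ValueError("unsupported schedule: %r" % (schedule,))

for t in (1, 100, 10000):
    print(t, learning_rate("invscaling", t), learning_rate("optimal", t))
```

The ‘adaptive’ schedule is left out because it depends on the history of the training loss: it keeps eta0 and divides the current rate by 5 whenever progress stalls.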

eta0double, default=0.0

The initial learning rate for the ‘constant’, ‘invscaling’ or ‘adaptive’ schedules. The default value is 0.0 as eta0 is not used by the default schedule ‘optimal’.

power_tdouble, default=0.5

The exponent for inverse scaling learning rate [default 0.5].

early_stoppingbool, default=False

Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when validation score returned by the score method is not improving by at least tol for n_iter_no_change consecutive epochs.

New in version 0.20: Added ‘early_stopping’ option

validation_fractionfloat, default=0.1

The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

New in version 0.20: Added ‘validation_fraction’ option

n_iter_no_changeint, default=5

Number of iterations with no improvement to wait before early stopping.

New in version 0.20: Added ‘n_iter_no_change’ option

class_weightdict, {class_label: weight} or “balanced”, default=None

Preset for the class_weight fit parameter.

Weights associated with classes. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
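The “balanced” formula can be evaluated by hand on a small label vector. The labels below are made up, and np.bincount requires them to be non-negative integers.

```python
import numpy as np

y = np.array([0, 0, 0, 1])                 # class 0 is three times as frequent as class 1
n_samples = y.shape[0]
n_classes = np.unique(y).shape[0]

weights = n_samples / (n_classes * np.bincount(y))
print(weights)                             # about 0.667 for class 0 and 2.0 for class 1
```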

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled. If a dynamic learning rate is used, the learning rate is adapted depending on the number of samples already seen. Calling fit resets this counter, while partial_fit will result in increasing the existing counter.

averagebool or int, default=False

When set to True, computes the averaged SGD weights across all updates and stores the result in the coef_ attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.

Attributes

coef_ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features)

Weights assigned to the features.

intercept_ndarray of shape (1,) if n_classes == 2 else (n_classes,)

Constants in decision function.

n_iter_int

The actual number of iterations before reaching the stopping criterion. For multiclass fits, it is the maximum over every binary fit.

loss_function_ : concrete LossFunction

classes_ : array of shape (n_classes,)

t_int

Number of weight updates performed during training. Same as (n_iter_ * n_samples).

See Also

sklearn.svm.LinearSVC : Linear support vector classification.

LogisticRegression : Logistic regression.

Perceptron : Inherits from SGDClassifier. Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).

Examples

>>> import numpy as np
>>> from sklearn.linear_model import SGDClassifier
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.pipeline import make_pipeline
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> Y = np.array([1, 1, 2, 2])
>>> # Always scale the input. The most convenient way is to use a pipeline.
>>> clf = make_pipeline(StandardScaler(),
...                     SGDClassifier(max_iter=1000, tol=1e-3))
>>> clf.fit(X, Y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('sgdclassifier', SGDClassifier())])
>>> print(clf.predict([[-0.8, -1]]))
[1]

Full API documentation: SGDClassifierScikitsLearnNode

class mdp.nodes.SGDRegressorScikitsLearnNode

Linear model fitted by minimizing a regularized empirical loss with SGD. This node has been automatically generated by wrapping the sklearn.linear_model._stochastic_gradient.SGDRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).

The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.

This implementation works with data represented as dense numpy arrays of floating point values for the features.

Read more in the User Guide.

Parameters

lossstr, default=’squared_loss’

The loss function to be used. The possible values are ‘squared_loss’, ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’.

The ‘squared_loss’ refers to the ordinary least squares fit. ‘huber’ modifies ‘squared_loss’ to focus less on getting outliers correct by switching from squared to linear loss past a distance of epsilon. ‘epsilon_insensitive’ ignores errors less than epsilon and is linear past that; this is the loss function used in SVR. ‘squared_epsilon_insensitive’ is the same but becomes squared loss past a tolerance of epsilon.
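The four regression losses described above can be compared on the same residuals. The sketch below follows the verbal description; the scaling constants (for instance the factor 0.5 on the squared loss) are assumptions chosen so that the Huber branches join continuously, and the function names are made up.

```python
import numpy as np

def squared_loss(r):
    return 0.5 * r ** 2

def huber(r, epsilon=0.1):
    a = np.abs(r)
    # squared inside the epsilon band, linear outside it
    return np.where(a <= epsilon, 0.5 * r ** 2, epsilon * a - 0.5 * epsilon ** 2)

def epsilon_insensitive(r, epsilon=0.1):
    # errors smaller than epsilon are ignored, larger ones grow linearly
    return np.maximum(0.0, np.abs(r) - epsilon)

def squared_epsilon_insensitive(r, epsilon=0.1):
    return np.maximum(0.0, np.abs(r) - epsilon) ** 2

r = np.array([0.05, 1.0])                  # residuals: prediction minus target
for loss in (squared_loss, huber, epsilon_insensitive, squared_epsilon_insensitive):
    print(loss.__name__, loss(r))
```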

More details about the loss formulas can be found in the User Guide.
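As a rough illustration of the four losses described above, the following sketch evaluates each one for a single target `y` and prediction `p`. The 0.5 scaling of the squared terms is an assumption of this sketch; refer to the User Guide for the exact formulas used by the library.

```python
def squared_loss(y, p):
    # Ordinary least squares loss (0.5 factor assumed here).
    return 0.5 * (p - y) ** 2


def huber(y, p, epsilon=0.1):
    # Squared loss near the target, linear loss past a distance of epsilon.
    r = abs(p - y)
    if r <= epsilon:
        return 0.5 * r * r
    return epsilon * r - 0.5 * epsilon * epsilon


def epsilon_insensitive(y, p, epsilon=0.1):
    # Errors smaller than epsilon are ignored; linear past that.
    return max(0.0, abs(p - y) - epsilon)


def squared_epsilon_insensitive(y, p, epsilon=0.1):
    # Same tolerance, but squared past epsilon.
    return max(0.0, abs(p - y) - epsilon) ** 2


for loss in (squared_loss, huber, epsilon_insensitive,
             squared_epsilon_insensitive):
    print(loss.__name__, loss(0.0, 1.0))
```

For a large residual the Huber loss grows linearly while the squared loss grows quadratically, which is why it is less sensitive to outliers.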

penalty{‘l2’, ‘l1’, ‘elasticnet’}, default=’l2’

The penalty (aka regularization term) to be used. Defaults to ‘l2’ which is the standard regularizer for linear SVM models. ‘l1’ and ‘elasticnet’ might bring sparsity to the model (feature selection) not achievable with ‘l2’.

alphafloat, default=0.0001

Constant that multiplies the regularization term. The higher the value, the stronger the regularization. Also used to compute the learning rate when learning_rate is set to ‘optimal’.

l1_ratiofloat, default=0.15

The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1. Only used if penalty is ‘elasticnet’.
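The mixing controlled by l1_ratio can be sketched as follows. This is an illustration only: the function name is hypothetical and the 0.5 factor on the squared L2 term is an assumption of this sketch rather than something stated above.

```python
def elastic_net_penalty(weights, alpha=0.0001, l1_ratio=0.15):
    """Return alpha times a convex mix of the L1 and squared L2 norms."""
    l1 = sum(abs(w) for w in weights)
    l2 = 0.5 * sum(w * w for w in weights)  # 0.5 factor is an assumption
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)


w = [3.0, -4.0]
print(elastic_net_penalty(w, alpha=1.0, l1_ratio=0.0))  # pure L2 term
print(elastic_net_penalty(w, alpha=1.0, l1_ratio=1.0))  # pure L1 term
print(elastic_net_penalty(w, alpha=1.0, l1_ratio=0.5))  # equal mix
```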

fit_interceptbool, default=True

Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

max_iterint, default=1000

The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit() method.

New in version 0.19.

tolfloat, default=1e-3

The stopping criterion. If it is not None, training will stop when (loss > best_loss - tol) for n_iter_no_change consecutive epochs.

New in version 0.19.
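The stopping criterion can be sketched on a plain list of per-epoch losses. The helper below is hypothetical and only mirrors the rule stated above; in particular, the way `best_loss` is tracked is an assumption of this sketch.

```python
import math


def stopping_epoch(losses, tol=1e-3, n_iter_no_change=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = math.inf
    stalled = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss > best_loss - tol:
            stalled += 1   # no improvement of at least tol
        else:
            stalled = 0    # sufficient improvement resets the counter
        if loss < best_loss:
            best_loss = loss
        if stalled >= n_iter_no_change:
            return epoch
    return None


losses = [1.0, 0.5, 0.4999, 0.4998, 0.4997, 0.4996, 0.4995]
print(stopping_epoch(losses))
```

Here the loss keeps decreasing, but by less than tol per epoch, so the counter reaches n_iter_no_change and training stops.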

shufflebool, default=True

Whether or not the training data should be shuffled after each epoch.

verboseint, default=0

The verbosity level.

epsilonfloat, default=0.1

Epsilon in the epsilon-insensitive loss functions; only if loss is ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’. For ‘huber’, determines the threshold at which it becomes less important to get the prediction exactly right. For epsilon-insensitive, any differences between the current prediction and the correct label are ignored if they are less than this threshold.

random_stateint, RandomState instance, default=None

Used for shuffling the data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary.

learning_ratestring, default=’invscaling’

The learning rate schedule:

  • ‘constant’: eta = eta0

  • ‘optimal’: eta = 1.0 / (alpha * (t + t0)) where t0 is chosen by a heuristic proposed by Leon Bottou.

  • ‘invscaling’: eta = eta0 / pow(t, power_t)

  • ‘adaptive’: eta = eta0, as long as the training loss keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol or fail to increase validation score by tol if early_stopping is True, the current learning rate is divided by 5.

    New in version 0.20: Added ‘adaptive’ option
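The three closed-form schedules listed above can be written down directly. In this sketch the function name is hypothetical and `t0` is passed in explicitly, because the heuristic used to choose it is not reproduced here; the ‘adaptive’ schedule is left out since it depends on the training history.

```python
def learning_rate(schedule, t, eta0=0.01, power_t=0.25, alpha=0.0001, t0=1.0):
    """Return eta at update step t for a given schedule name."""
    if schedule == "constant":
        return eta0
    if schedule == "optimal":
        # t0 is a placeholder value, not the heuristic's result.
        return 1.0 / (alpha * (t + t0))
    if schedule == "invscaling":
        return eta0 / pow(t, power_t)
    raise ValueError("unsupported schedule: %r" % (schedule,))


for step in (1, 16, 256):
    print(step, learning_rate("invscaling", step))
```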

eta0double, default=0.01

The initial learning rate for the ‘constant’, ‘invscaling’ or ‘adaptive’ schedules. The default value is 0.01.

power_tdouble, default=0.25

The exponent for inverse scaling learning rate.

early_stoppingbool, default=False

Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a fraction of training data as validation and terminate training when validation score returned by the score method is not improving by at least tol for n_iter_no_change consecutive epochs.

New in version 0.20: Added ‘early_stopping’ option

validation_fractionfloat, default=0.1

The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

New in version 0.20: Added ‘validation_fraction’ option

n_iter_no_changeint, default=5

Number of iterations with no improvement to wait before early stopping.

New in version 0.20: Added ‘n_iter_no_change’ option

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled. If a dynamic learning rate is used, the learning rate is adapted depending on the number of samples already seen. Calling fit resets this counter, while partial_fit will result in increasing the existing counter.

averagebool or int, default=False

When set to True, computes the averaged SGD weights across all updates and stores the result in the coef_ attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.

Attributes

coef_ndarray of shape (n_features,)

Weights assigned to the features.

intercept_ndarray of shape (1,)

The intercept term.

average_coef_ndarray of shape (n_features,)

Averaged weights assigned to the features. Only available if average=True.

Deprecated since version 0.23: Attribute average_coef_ was deprecated in version 0.23 and will be removed in 0.25.

average_intercept_ndarray of shape (1,)

The averaged intercept term. Only available if average=True.

Deprecated since version 0.23: Attribute average_intercept_ was deprecated in version 0.23 and will be removed in 0.25.

n_iter_int

The actual number of iterations before reaching the stopping criterion.

t_int

Number of weight updates performed during training. Same as (n_iter_ * n_samples).

Examples

>>> import numpy as np
>>> from sklearn.linear_model import SGDRegressor
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> n_samples, n_features = 10, 5
>>> rng = np.random.RandomState(0)
>>> y = rng.randn(n_samples)
>>> X = rng.randn(n_samples, n_features)
>>> # Always scale the input. The most convenient way is to use a pipeline.
>>> reg = make_pipeline(StandardScaler(),
...                     SGDRegressor(max_iter=1000, tol=1e-3))
>>> reg.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('sgdregressor', SGDRegressor())])

See also

Ridge, ElasticNet, Lasso, sklearn.svm.SVR

Full API documentation: SGDRegressorScikitsLearnNode

class mdp.nodes.RidgeScikitsLearnNode

Linear least squares with l2 regularization. This node has been automatically generated by wrapping the sklearn.linear_model._ridge.Ridge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Minimizes the objective function:

||y - Xw||^2_2 + alpha * ||w||^2_2

This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape (n_samples, n_targets)).

Read more in the User Guide.
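To make the objective above concrete, consider a single feature and no intercept. Setting the derivative of ||y - xw||^2_2 + alpha * w^2 to zero gives w = sum(x*y) / (sum(x*x) + alpha). The sketch below (hypothetical helper name, plain Python lists) evaluates that expression.

```python
def ridge_1d(x, y, alpha=1.0):
    """Closed-form ridge weight for one feature and no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(ridge_1d(x, y, alpha=0.0))   # unregularized least squares fit
print(ridge_1d(x, y, alpha=14.0))  # stronger regularization shrinks w
```

Increasing alpha enlarges the denominator, so the weight is pulled towards zero.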

Parameters

alpha{float, ndarray of shape (n_targets,)}, default=1.0

Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or sklearn.svm.LinearSVC. If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number.
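The correspondence between alpha and C stated above is simple arithmetic. The helper below is a hypothetical convenience, not part of the wrapped class.

```python
def alpha_from_C(C):
    """Convert an inverse regularization strength C to alpha = 1 / (2C)."""
    if C <= 0:
        raise ValueError("C must be positive")
    return 1.0 / (2.0 * C)


print(alpha_from_C(0.5))
print(alpha_from_C(1.0))
```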

fit_interceptbool, default=True

Whether to fit the intercept for this model. If set to false, no intercept will be used in calculations (i.e. X and y are expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

max_iterint, default=None

Maximum number of iterations for conjugate gradient solver. For ‘sparse_cg’ and ‘lsqr’ solvers, the default value is determined by scipy.sparse.linalg. For ‘sag’ solver, the default value is 1000.

tolfloat, default=1e-3

Precision of the solution.

solver{‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’}, default=’auto’

Solver to use in the computational routines:

  • ‘auto’ chooses the solver automatically based on the type of data.

  • ‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. More stable for singular matrices than ‘cholesky’.

  • ‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.

  • ‘sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).

  • ‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.

  • ‘sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

The last five solvers all support both dense and sparse data. However, only ‘sag’ and ‘sparse_cg’ support sparse input when fit_intercept is True.

New in version 0.17: Stochastic Average Gradient descent solver.

New in version 0.19: SAGA solver.

random_stateint, RandomState instance, default=None

Used when solver == ‘sag’ or ‘saga’ to shuffle the data. See Glossary for details.

New in version 0.17: random_state to support Stochastic Average Gradient.

Attributes

coef_ndarray of shape (n_features,) or (n_targets, n_features)

Weight vector(s).

intercept_float or ndarray of shape (n_targets,)

Independent term in decision function. Set to 0.0 if fit_intercept = False.

n_iter_None or ndarray of shape (n_targets,)

Actual number of iterations for each target. Available only for sag and lsqr solvers. Other solvers will return None.

New in version 0.17.

See also

RidgeClassifier : Ridge classifier

RidgeCV : Ridge regression with built-in cross validation

sklearn.kernel_ridge.KernelRidge : Kernel ridge regression combines ridge regression with the kernel trick

Examples

>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> rng = np.random.RandomState(0)
>>> y = rng.randn(n_samples)
>>> X = rng.randn(n_samples, n_features)
>>> clf = Ridge(alpha=1.0)
>>> clf.fit(X, y)
Ridge()

Full API documentation: RidgeScikitsLearnNode

class mdp.nodes.RidgeCVScikitsLearnNode

Ridge regression with built-in cross-validation. This node has been automatically generated by wrapping the sklearn.linear_model._ridge.RidgeCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation.

Read more in the User Guide.

Parameters

alphasndarray of shape (n_alphas,), default=(0.1, 1.0, 10.0)

Array of alpha values to try. Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or sklearn.svm.LinearSVC. If using generalized cross-validation, alphas must be positive.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

scoringstring, callable, default=None

A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). If None, the negative mean squared error is used if cv is ‘auto’ or None (i.e. when using generalized cross-validation), and the r2 score is used otherwise.

cvint, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the efficient Leave-One-Out cross-validation (also known as Generalized Cross-Validation).

  • integer, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, if y is binary or multiclass, sklearn.model_selection.StratifiedKFold is used, else, sklearn.model_selection.KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

gcv_mode{‘auto’, ‘svd’, ‘eigen’}, default=’auto’

Flag indicating which strategy to use when performing Generalized Cross-Validation. Options are:

'auto' : use 'svd' if n_samples > n_features, otherwise use 'eigen'
'svd' : force use of singular value decomposition of X when X is
    dense, eigenvalue decomposition of X^T.X when X is sparse.
'eigen' : force computation via eigendecomposition of X.X^T

The ‘auto’ mode is the default and is intended to pick the cheaper option of the two depending on the shape of the training data.
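The rule applied by the ‘auto’ mode can be sketched as a small helper. The function name is hypothetical; only the comparison between n_samples and n_features comes from the description above.

```python
def resolve_gcv_mode(gcv_mode, n_samples, n_features):
    """Return the strategy that would be used for a given data shape."""
    if gcv_mode == "auto":
        return "svd" if n_samples > n_features else "eigen"
    if gcv_mode in ("svd", "eigen"):
        return gcv_mode
    raise ValueError("unknown gcv_mode: %r" % (gcv_mode,))


print(resolve_gcv_mode("auto", n_samples=442, n_features=10))
print(resolve_gcv_mode("auto", n_samples=10, n_features=442))
```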

store_cv_valuesbool, default=False

Flag indicating if the cross-validation values corresponding to each alpha should be stored in the cv_values_ attribute (see below). This flag is only compatible with cv=None (i.e. using Generalized Cross-Validation).

Attributes

cv_values_ndarray of shape (n_samples, n_alphas) or shape (n_samples, n_targets, n_alphas), optional

Cross-validation values for each alpha (only available if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor).

coef_ndarray of shape (n_features,) or (n_targets, n_features)

Weight vector(s).

intercept_float or ndarray of shape (n_targets,)

Independent term in decision function. Set to 0.0 if fit_intercept = False.

alpha_float

Estimated regularization parameter.

best_score_float

Score of base estimator with best alpha.

Examples

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import RidgeCV
>>> X, y = load_diabetes(return_X_y=True)
>>> clf = RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
>>> clf.score(X, y)
0.5166...

See also

Ridge : Ridge regression

RidgeClassifier : Ridge classifier

RidgeClassifierCV : Ridge classifier with built-in cross validation

Full API documentation: RidgeCVScikitsLearnNode

class mdp.nodes.RidgeClassifierScikitsLearnNode

Classifier using Ridge regression. This node has been automatically generated by wrapping the sklearn.linear_model._ridge.RidgeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This classifier first converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case).

Read more in the User Guide.

Parameters

alphafloat, default=1.0

Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or sklearn.svm.LinearSVC.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

copy_Xbool, default=True

If True, X will be copied; else, it may be overwritten.

max_iterint, default=None

Maximum number of iterations for conjugate gradient solver. The default value is determined by scipy.sparse.linalg.

tolfloat, default=1e-3

Precision of the solution.

class_weightdict or ‘balanced’, default=None

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
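The “balanced” formula can be reproduced without numpy. The sketch below uses collections.Counter in place of np.bincount so that it also works for non-integer labels; the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(y):
    """Compute n_samples / (n_classes * count) for every class label."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


y = [0, 0, 0, 1]
print(balanced_class_weights(y))
```

The rare class receives the larger weight, which compensates for its lower frequency in the training data.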

solver{‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’}, default=’auto’

Solver to use in the computational routines:

  • ‘auto’ chooses the solver automatically based on the type of data.

  • ‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. More stable for singular matrices than ‘cholesky’.

  • ‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.

  • ‘sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).

  • ‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.

  • ‘sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its unbiased and more flexible version named SAGA. Both methods use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

    New in version 0.17: Stochastic Average Gradient descent solver.

    New in version 0.19: SAGA solver.

random_stateint, RandomState instance, default=None

Used when solver == ‘sag’ or ‘saga’ to shuffle the data. See Glossary for details.

Attributes

coef_ndarray of shape (1, n_features) or (n_classes, n_features)

Coefficient of the features in the decision function.

coef_ is of shape (1, n_features) when the given problem is binary.

intercept_float or ndarray of shape (n_targets,)

Independent term in decision function. Set to 0.0 if fit_intercept = False.

n_iter_None or ndarray of shape (n_targets,)

Actual number of iterations for each target. Available only for sag and lsqr solvers. Other solvers will return None.

classes_ndarray of shape (n_classes,)

The class labels.

See Also

Ridge : Ridge regression.

RidgeClassifierCV : Ridge classifier with built-in cross validation.

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.
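The one-versus-all setup can be sketched by encoding the labels as one {-1, 1} target column per class and decoding predictions by taking the column with the highest regression score. Both helper names are hypothetical, and the scores in the example are made up for illustration; no model is fitted here.

```python
def encode_one_vs_all(y):
    """Return the sorted class labels and a {-1, 1} target row per sample."""
    classes = sorted(set(y))
    targets = [[1 if label == c else -1 for c in classes] for label in y]
    return classes, targets


def decode(classes, scores):
    """Pick, for each row of scores, the class with the largest score."""
    return [classes[max(range(len(classes)), key=row.__getitem__)]
            for row in scores]


classes, targets = encode_one_vs_all(["b", "a", "c"])
print(classes)
print(targets)
print(decode(classes, [[0.2, 0.9, -0.5]]))
```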

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifier
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifier().fit(X, y)
>>> clf.score(X, y)
0.9595...

Full API documentation: RidgeClassifierScikitsLearnNode

class mdp.nodes.RidgeClassifierCVScikitsLearnNode

Ridge classifier with built-in cross-validation. This node has been automatically generated by wrapping the sklearn.linear_model._ridge.RidgeClassifierCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation. Currently, only the n_features > n_samples case is handled efficiently.

Read more in the User Guide.

Parameters

alphasndarray of shape (n_alphas,), default=(0.1, 1.0, 10.0)

Array of alpha values to try. Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or sklearn.svm.LinearSVC.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizebool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

scoringstring, callable, default=None

A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).

cvint, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the efficient Leave-One-Out cross-validation

  • integer, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

Refer to the User Guide for the various cross-validation strategies that can be used here.

class_weightdict or ‘balanced’, default=None

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

store_cv_valuesbool, default=False

Flag indicating if the cross-validation values corresponding to each alpha should be stored in the cv_values_ attribute (see below). This flag is only compatible with cv=None (i.e. using Generalized Cross-Validation).

Attributes

cv_values_ndarray of shape (n_samples, n_targets, n_alphas), optional

Cross-validation values for each alpha (if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor). This attribute exists only when store_cv_values is True.

coef_ndarray of shape (1, n_features) or (n_targets, n_features)

Coefficient of the features in the decision function.

coef_ is of shape (1, n_features) when the given problem is binary.

intercept_float or ndarray of shape (n_targets,)

Independent term in decision function. Set to 0.0 if fit_intercept = False.

alpha_float

Estimated regularization parameter.

best_score_float

Score of base estimator with best alpha.

classes_ndarray of shape (n_classes,)

The class labels.

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifierCV
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifierCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
>>> clf.score(X, y)
0.9630...

See also

Ridge : Ridge regression

RidgeClassifier : Ridge classifier

RidgeCV : Ridge regression with built-in cross validation

Notes

For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.

Full API documentation: RidgeClassifierCVScikitsLearnNode

class mdp.nodes.LogisticRegressionScikitsLearnNode

Logistic Regression (aka logit, MaxEnt) classifier. This node has been automatically generated by wrapping the sklearn.linear_model._logistic.LogisticRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the ‘multi_class’ option is set to ‘ovr’, and uses the cross-entropy loss if the ‘multi_class’ option is set to ‘multinomial’. (Currently the ‘multinomial’ option is supported only by the ‘lbfgs’, ‘sag’, ‘saga’ and ‘newton-cg’ solvers.)

This class implements regularized logistic regression using the ‘liblinear’ library, ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ solvers. Note that regularization is applied by default. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).

The ‘newton-cg’, ‘sag’, and ‘lbfgs’ solvers support only L2 regularization with primal formulation, or no regularization. The ‘liblinear’ solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. The Elastic-Net regularization is only supported by the ‘saga’ solver.

Read more in the User Guide.

Parameters

penalty{‘l1’, ‘l2’, ‘elasticnet’, ‘none’}, default=’l2’

Used to specify the norm used in the penalization. The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties. ‘elasticnet’ is only supported by the ‘saga’ solver. If ‘none’ (not supported by the liblinear solver), no regularization is applied.

New in version 0.19: l1 penalty with SAGA solver (allowing ‘multinomial’ + L1)

dualbool, default=False

Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

tolfloat, default=1e-4

Tolerance for stopping criteria.

Cfloat, default=1.0

Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

fit_interceptbool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scalingfloat, default=1

Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.

Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.
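The effect of intercept_scaling can be sketched with two hypothetical helpers: one appends the synthetic feature to an instance vector, the other recovers the intercept from the weight learned for that feature.

```python
def append_synthetic_feature(x, intercept_scaling=1.0):
    """Return [x, intercept_scaling] as a new list."""
    return list(x) + [intercept_scaling]


def intercept_from_weight(synthetic_feature_weight, intercept_scaling=1.0):
    """The intercept is intercept_scaling * synthetic_feature_weight."""
    return intercept_scaling * synthetic_feature_weight


print(append_synthetic_feature([1.0, 2.0], intercept_scaling=10.0))
print(intercept_from_weight(0.3, intercept_scaling=10.0))
```

With a larger intercept_scaling, a smaller (and therefore less penalized) synthetic weight yields the same intercept.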

class_weightdict or ‘balanced’, default=None

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

New in version 0.17: class_weight=’balanced’
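How class weights combine with per-sample weights can be sketched as an element-wise product. The helper name is hypothetical; classes missing from the dictionary are given weight one, following the description above.

```python
def effective_sample_weights(y, class_weight, sample_weight=None):
    """Multiply each sample's weight by the weight of its class."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    return [class_weight.get(label, 1.0) * w
            for label, w in zip(y, sample_weight)]


y = [0, 1, 1]
print(effective_sample_weights(y, {1: 2.0}, sample_weight=[1.0, 0.5, 3.0]))
```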

random_stateint, RandomState instance, default=None

Used when solver == ‘sag’, ‘saga’ or ‘liblinear’ to shuffle the data. See Glossary for details.

solver : {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, default=’lbfgs’

Algorithm to use in the optimization problem.

  • For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones.

  • For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss; ‘liblinear’ is limited to one-versus-rest schemes.

  • ‘newton-cg’, ‘lbfgs’, ‘sag’ and ‘saga’ handle L2 or no penalty

  • ‘liblinear’ and ‘saga’ also handle L1 penalty

  • ‘saga’ also supports ‘elasticnet’ penalty

  • ‘liblinear’ does not support setting penalty='none'

Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

New in version 0.17: Stochastic Average Gradient descent solver.

New in version 0.19: SAGA solver.

Changed in version 0.22: The default solver changed from ‘liblinear’ to ‘lbfgs’ in 0.22.
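The solver and penalty combinations listed above can be collected in a lookup table. This is a sketch for checking a configuration before building a node; the table and function names are hypothetical and the wrapped class performs its own validation.

```python
# Penalties each solver accepts, transcribed from the list above.
SUPPORTED_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "sag": {"l2", "none"},
    "saga": {"l1", "l2", "elasticnet", "none"},
    "liblinear": {"l1", "l2"},
}


def is_supported(solver, penalty):
    """Return True if the solver accepts the given penalty."""
    if solver not in SUPPORTED_PENALTIES:
        raise ValueError("unknown solver: %r" % (solver,))
    return penalty in SUPPORTED_PENALTIES[solver]


print(is_supported("lbfgs", "l1"))
print(is_supported("saga", "elasticnet"))
```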

max_iterint, default=100

Maximum number of iterations taken for the solvers to converge.

multi_class{‘auto’, ‘ovr’, ‘multinomial’}, default=’auto’

If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=’liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=’liblinear’, and otherwise selects ‘multinomial’.

New in version 0.18: Stochastic Average Gradient descent solver for ‘multinomial’ case.

Changed in version 0.22: Default changed from ‘ovr’ to ‘auto’ in 0.22.
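The way ‘auto’ is resolved can be sketched as follows. The helper is hypothetical and takes the number of classes as an explicit argument; it only restates the rule given above.

```python
def resolve_multi_class(multi_class, solver, n_classes):
    """Return 'ovr' or 'multinomial' for the given configuration."""
    if multi_class == "auto":
        if solver == "liblinear" or n_classes <= 2:
            return "ovr"
        return "multinomial"
    if multi_class == "multinomial" and solver == "liblinear":
        raise ValueError("'multinomial' is unavailable with 'liblinear'")
    return multi_class


print(resolve_multi_class("auto", "lbfgs", n_classes=3))
print(resolve_multi_class("auto", "lbfgs", n_classes=2))
print(resolve_multi_class("auto", "liblinear", n_classes=3))
```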

verboseint, default=0

For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See the Glossary.

New in version 0.17: warm_start to support lbfgs, newton-cg, sag, saga solvers.

n_jobsint, default=None

Number of CPU cores used when parallelizing over classes if multi_class=’ovr’. This parameter is ignored when the solver is set to ‘liblinear’ regardless of whether ‘multi_class’ is specified or not. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

l1_ratiofloat, default=None

The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty='elasticnet'. Setting l1_ratio=0 is equivalent to using penalty='l2', while setting l1_ratio=1 is equivalent to using penalty='l1'. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

Attributes

classes_ndarray of shape (n_classes, )

A list of class labels known to the classifier.

coef_ndarray of shape (1, n_features) or (n_classes, n_features)

Coefficient of the features in the decision function.

coef_ is of shape (1, n_features) when the given problem is binary. In particular, when multi_class=’multinomial’, coef_ corresponds to outcome 1 (True) and -coef_ corresponds to outcome 0 (False).

intercept_ndarray of shape (1,) or (n_classes,)

Intercept (a.k.a. bias) added to the decision function.

If fit_intercept is set to False, the intercept is set to zero. intercept_ is of shape (1,) when the given problem is binary. In particular, when multi_class=’multinomial’, intercept_ corresponds to outcome 1 (True) and -intercept_ corresponds to outcome 0 (False).

n_iter_ndarray of shape (n_classes,) or (1, )

Actual number of iterations for all classes. If binary or multinomial, it returns only 1 element. For the liblinear solver, only the maximum number of iterations across all classes is given.

Changed in version 0.20: In SciPy <= 1.0.0 the number of lbfgs iterations may exceed max_iter. n_iter_ will now report at most max_iter.

See Also

SGDClassifier : Incrementally trained logistic regression (when given the parameter loss="log").

LogisticRegressionCV : Logistic regression with built-in cross validation.

Notes

The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to have slightly different results for the same input data. If that happens, try with a smaller tol parameter.

Predict output may not match that of standalone liblinear in certain cases. See differences from liblinear in the narrative documentation.

References

L-BFGS-B – Software for Large-scale Bound-constrained Optimization

Ciyou Zhu, Richard Byrd, Jorge Nocedal and Jose Luis Morales. http://users.iems.northwestern.edu/~nocedal/lbfgsb.html

LIBLINEAR – A Library for Large Linear Classification

https://www.csie.ntu.edu.tw/~cjlin/liblinear/

SAG – Mark Schmidt, Nicolas Le Roux, and Francis Bach

Minimizing Finite Sums with the Stochastic Average Gradient https://hal.inria.fr/hal-00860051/document

SAGA – Defazio, A., Bach F. & Lacoste-Julien S. (2014).

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives https://arxiv.org/abs/1407.0202

Hsiang-Fu Yu, Fang-Lan Huang, Chih-Jen Lin (2011). Dual coordinate descent methods for logistic regression and maximum entropy models. Machine Learning 85(1-2):41-75. https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegression(random_state=0).fit(X, y)
>>> clf.predict(X[:2, :])
array([0, 0])
>>> clf.predict_proba(X[:2, :])
array([[9.8...e-01, 1.8...e-02, 1.4...e-08],
       [9.7...e-01, 2.8...e-02, ...e-08]])
>>> clf.score(X, y)
0.97...

Full API documentation: LogisticRegressionScikitsLearnNode

class mdp.nodes.LogisticRegressionCVScikitsLearnNode

Logistic Regression CV (aka logit, MaxEnt) classifier. This node has been automatically generated by wrapping the sklearn.linear_model._logistic.LogisticRegressionCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

This class implements logistic regression using the liblinear, newton-cg, sag or lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2 regularization with primal formulation. The liblinear solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. Elastic-Net penalty is only supported by the saga solver.

For the grid of Cs values and l1_ratios values, the best hyperparameter is selected by the cross-validator StratifiedKFold, but it can be changed using the cv parameter. The ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ solvers can warm-start the coefficients (see Glossary).

Read more in the User Guide.

Parameters

Csint or list of floats, default=10

Each of the values in Cs describes the inverse of regularization strength. If Cs is an int, then a grid of Cs values is chosen on a logarithmic scale between 1e-4 and 1e4. As in support vector machines, smaller values specify stronger regularization.
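As a rough illustration of the integer case, the grid can be reproduced in plain Python. This sketch assumes the points are evenly spaced in log10 with both endpoints included (the behaviour of numpy.logspace(-4, 4, Cs)); the helper name log_grid is hypothetical and not part of the wrapped class.

```python
def log_grid(n_values, low_exp=-4.0, high_exp=4.0):
    """Return n_values points spaced evenly on a log10 scale, endpoints included."""
    if n_values == 1:
        return [10.0 ** low_exp]
    step = (high_exp - low_exp) / (n_values - 1)
    return [10.0 ** (low_exp + i * step) for i in range(n_values)]


# Default Cs=10: ten candidate values from strong (1e-4) to weak (1e4) regularization.
cs_grid = log_grid(10)
print(cs_grid[0], cs_grid[-1])
```

Passing an explicit list of floats as Cs bypasses this grid entirely.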

fit_interceptbool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

cvint or cross-validation generator, default=None

The default cross-validation generator used is Stratified K-Folds. If an integer is provided, then it is the number of folds used. See the sklearn.model_selection module for the list of possible cross-validation objects.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.

dualbool, default=False

Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

penalty{‘l1’, ‘l2’, ‘elasticnet’}, default=’l2’

Used to specify the norm used in the penalization. The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties. ‘elasticnet’ is only supported by the ‘saga’ solver.
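The solver and penalty constraints described in this section can be written down as a small lookup table. This is only a sketch transcribed from the text; the names SUPPORTED_PENALTIES and check_solver_penalty are hypothetical, and the wrapped estimator performs its own validation.

```python
# Penalties each solver accepts, transcribed from the descriptions in this section.
SUPPORTED_PENALTIES = {
    "newton-cg": {"l2"},
    "lbfgs": {"l2"},
    "sag": {"l2"},
    "liblinear": {"l1", "l2"},
    "saga": {"l1", "l2", "elasticnet"},
}


def check_solver_penalty(solver, penalty):
    """Return True if the solver/penalty pair appears in the table above."""
    return penalty in SUPPORTED_PENALTIES[solver]


print(check_solver_penalty("saga", "elasticnet"))
print(check_solver_penalty("lbfgs", "l1"))
```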

scoringstr or callable, default=None

A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). For a list of scoring functions that can be used, look at sklearn.metrics. The default scoring option used is ‘accuracy’.

solver : {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, default=’lbfgs’

Algorithm to use in the optimization problem.

  • For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones.

  • For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss; ‘liblinear’ is limited to one-versus-rest schemes.

  • ‘newton-cg’, ‘lbfgs’ and ‘sag’ only handle L2 penalty, whereas ‘liblinear’ and ‘saga’ handle L1 penalty.

  • ‘liblinear’ might be slower in LogisticRegressionCV because it does not handle warm-starting.

Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

New in version 0.17: Stochastic Average Gradient descent solver.

New in version 0.19: SAGA solver.

tolfloat, default=1e-4

Tolerance for stopping criteria.

max_iterint, default=100

Maximum number of iterations of the optimization algorithm.

class_weightdict or ‘balanced’, default=None

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

New in version 0.17: class_weight == ‘balanced’
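The "balanced" formula can be checked by hand on a small label vector. The sketch below uses collections.Counter in place of np.bincount so that it also works for non-integer labels; the function name balanced_class_weights is hypothetical.

```python
from collections import Counter


def balanced_class_weights(y):
    """Compute n_samples / (n_classes * count) for each class label in y."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Three samples of class 0 and one of class 1: the rare class gets the larger weight.
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)
```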

n_jobsint, default=None

Number of CPU cores used during the cross-validation loop. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

verboseint, default=0

For the ‘liblinear’, ‘sag’ and ‘lbfgs’ solvers set verbose to any positive number for verbosity.

refitbool, default=True

If set to True, the scores are averaged across all folds, the coefs and the C that correspond to the best score are taken, and a final refit is done using these parameters. Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.

intercept_scalingfloat, default=1

Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.

Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.
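The synthetic-feature construction can be sketched for a single instance as follows. The helper names augment and decision_value are hypothetical; the point is only that the dot product over the augmented vector equals the ordinary linear term plus intercept_scaling * synthetic_feature_weight.

```python
def augment(x, intercept_scaling=1.0):
    """Append the constant synthetic feature to one instance vector."""
    return list(x) + [intercept_scaling]


def decision_value(x, weights, synthetic_feature_weight, intercept_scaling=1.0):
    """Linear decision value where the intercept is carried by the synthetic feature."""
    intercept = intercept_scaling * synthetic_feature_weight
    return sum(w * xi for w, xi in zip(weights, x)) + intercept


x = [1.0, 2.0]
x_aug = augment(x, intercept_scaling=5.0)
print(x_aug)
print(decision_value(x, [0.5, -1.0], 0.2, intercept_scaling=5.0))
```

Because the synthetic weight is regularized like any other weight, a larger intercept_scaling lets a small weight represent a large intercept.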

multi_class{‘auto’, ‘ovr’, ‘multinomial’}, default=’auto’

If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=’liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=’liblinear’, and otherwise selects ‘multinomial’.

New in version 0.18: Stochastic Average Gradient descent solver for ‘multinomial’ case.

Changed in version 0.22: Default changed from ‘ovr’ to ‘auto’ in 0.22.
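The ‘auto’ rule for multi_class can be summarized as a small decision function. This is a sketch of the rule as stated in the text, with a hypothetical function name; it is not the library's own code.

```python
def resolve_multi_class(multi_class, solver, n_classes):
    """Resolve 'auto' to 'ovr' or 'multinomial' following the documented rule."""
    if multi_class != "auto":
        return multi_class
    if solver == "liblinear" or n_classes <= 2:
        return "ovr"
    return "multinomial"


print(resolve_multi_class("auto", "lbfgs", 3))
print(resolve_multi_class("auto", "liblinear", 3))
```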

random_stateint, RandomState instance, default=None

Used when solver=’sag’, ‘saga’ or ‘liblinear’ to shuffle the data. Note that this only applies to the solver and not the cross-validation generator. See Glossary for details.

l1_ratioslist of float, default=None

The list of Elastic-Net mixing parameters, with 0 <= l1_ratio <= 1. Only used if penalty='elasticnet'. A value of 0 is equivalent to using penalty='l2', while 1 is equivalent to using penalty='l1'. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
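To make the mixing concrete, the sketch below evaluates one common form of the Elastic-Net penalty for a weight vector. The weighting l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2 is an assumption taken from the scikit-learn user guide rather than from this page, and elastic_net_penalty is a hypothetical name.

```python
def elastic_net_penalty(weights, l1_ratio):
    """Return l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2."""
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must lie in [0, 1]")
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return l1_ratio * l1 + (1.0 - l1_ratio) / 2.0 * l2_squared


w = [3.0, -4.0]
for ratio in (0.0, 0.5, 1.0):
    print(ratio, elastic_net_penalty(w, ratio))
```

At the two endpoints only one of the terms survives, which is why 0 and 1 reduce to the pure L2 and L1 penalties.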

Attributes

classes_ndarray of shape (n_classes, )

A list of class labels known to the classifier.

coef_ndarray of shape (1, n_features) or (n_classes, n_features)

Coefficient of the features in the decision function.

coef_ is of shape (1, n_features) when the given problem is binary.

intercept_ndarray of shape (1,) or (n_classes,)

Intercept (a.k.a. bias) added to the decision function.

If fit_intercept is set to False, the intercept is set to zero. intercept_ is of shape (1,) when the problem is binary.

Cs_ndarray of shape (n_cs)

Array of C i.e. inverse of regularization parameter values used for cross-validation.

l1_ratios_ndarray of shape (n_l1_ratios)

Array of l1_ratios used for cross-validation. If no l1_ratio is used (i.e. penalty is not ‘elasticnet’), this is set to [None]

coefs_paths_ndarray of shape (n_folds, n_cs, n_features) or (n_folds, n_cs, n_features + 1)

dict with classes as the keys, and the path of coefficients obtained during cross-validating across each fold and then across each Cs after doing an OvR for the corresponding class as values. If the ‘multi_class’ option is set to ‘multinomial’, then the coefs_paths are the coefficients corresponding to each class. Each dict value has shape (n_folds, n_cs, n_features) or (n_folds, n_cs, n_features + 1) depending on whether the intercept is fit or not. If penalty='elasticnet', the shape is (n_folds, n_cs, n_l1_ratios_, n_features) or (n_folds, n_cs, n_l1_ratios_, n_features + 1).

scores_dict

dict with classes as the keys, and the values as the grid of scores obtained during cross-validating each fold, after doing an OvR for the corresponding class. If the ‘multi_class’ option given is ‘multinomial’ then the same scores are repeated across all classes, since this is the multinomial class. Each dict value has shape (n_folds, n_cs) or (n_folds, n_cs, n_l1_ratios) if penalty='elasticnet'.

C_ndarray of shape (n_classes,) or (n_classes - 1,)

Array of C that maps to the best scores across every class. If refit is set to False, then for each class, the best C is the average of the C’s that correspond to the best scores for each fold. C_ is of shape(n_classes,) when the problem is binary.

l1_ratio_ndarray of shape (n_classes,) or (n_classes - 1,)

Array of l1_ratio that maps to the best scores across every class. If refit is set to False, then for each class, the best l1_ratio is the average of the l1_ratio’s that correspond to the best scores for each fold. l1_ratio_ is of shape(n_classes,) when the problem is binary.

n_iter_ndarray of shape (n_classes, n_folds, n_cs) or (1, n_folds, n_cs)

Actual number of iterations for all classes, folds and Cs. In the binary or multinomial cases, the first dimension is equal to 1. If penalty='elasticnet', the shape is (n_classes, n_folds, n_cs, n_l1_ratios) or (1, n_folds, n_cs, n_l1_ratios).

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegressionCV
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegressionCV(cv=5, random_state=0).fit(X, y)
>>> clf.predict(X[:2, :])
array([0, 0])
>>> clf.predict_proba(X[:2, :]).shape
(2, 3)
>>> clf.score(X, y)
0.98...

See also

LogisticRegression

Full API documentation: LogisticRegressionCVScikitsLearnNode

class mdp.nodes.OrthogonalMatchingPursuitScikitsLearnNode

Orthogonal Matching Pursuit model (OMP). This node has been automatically generated by wrapping the sklearn.linear_model._omp.OrthogonalMatchingPursuit class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

n_nonzero_coefsint, optional

Desired number of non-zero entries in the solution. If None (by default) this value is set to 10% of n_features.

tolfloat, optional

Maximum norm of the residual. If not None, overrides n_nonzero_coefs.

fit_interceptboolean, optional

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizeboolean, optional, default True

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

precompute{True, False, ‘auto’}, default ‘auto’

Whether to use a precomputed Gram and Xy matrix to speed up calculations. Improves performance when n_targets or n_samples is very large. Note that if you already have such matrices, you can pass them directly to the fit method.

Attributes

coef_array, shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the formula).

intercept_float or array, shape (n_targets,)

Independent term in decision function.

n_iter_int or array-like

Number of active features across every target.

Examples

>>> from sklearn.linear_model import OrthogonalMatchingPursuit
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(noise=4, random_state=0)
>>> reg = OrthogonalMatchingPursuit().fit(X, y)
>>> reg.score(X, y)
0.9991...
>>> reg.predict(X[:1,])
array([-78.3854...])

Notes

Orthogonal matching pursuit was introduced in G. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, Vol. 41, No. 12. (December 1993), pp. 3397-3415. (http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf)

This implementation is based on Rubinstein, R., Zibulevsky, M. and Elad, M., Efficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit Technical Report - CS Technion, April 2008. https://www.cs.technion.ac.il/~ronrubin/Publications/KSVD-OMP-v2.pdf

See also

orthogonal_mp, orthogonal_mp_gram, lars_path, Lars, LassoLars, decomposition.sparse_encode, OrthogonalMatchingPursuitCV

Full API documentation: OrthogonalMatchingPursuitScikitsLearnNode

class mdp.nodes.OrthogonalMatchingPursuitCVScikitsLearnNode

Cross-validated Orthogonal Matching Pursuit model (OMP). This node has been automatically generated by wrapping the sklearn.linear_model._omp.OrthogonalMatchingPursuitCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

Read more in the User Guide.

Parameters

copybool, optional

Whether the design matrix X must be copied by the algorithm. A false value is only helpful if X is already Fortran-ordered, otherwise a copy is made anyway.

fit_interceptboolean, optional

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalizeboolean, optional, default True

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

max_iterinteger, optional

Maximum number of iterations to perform, and therefore the maximum number of features to include. Defaults to 10% of n_features, but at least 5 if available.
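One way to read the default for max_iter is sketched below. It assumes that "if available" means the value is capped at n_features; that interpretation and the function name default_max_iter are assumptions, not statements from this page.

```python
def default_max_iter(n_features):
    """10% of n_features, at least 5, and never more than n_features."""
    return min(max(int(0.1 * n_features), 5), n_features)


for n in (3, 20, 100, 1000):
    print(n, default_max_iter(n))
```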

cvint, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • integer, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.

n_jobsint or None, optional (default=None)

Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

verboseboolean or integer, optional

Sets the verbosity amount

Attributes

intercept_float or array, shape (n_targets,)

Independent term in decision function.

coef_array, shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the problem formulation).

n_nonzero_coefs_int

Estimated number of non-zero coefficients giving the best mean squared error over the cross-validation folds.

n_iter_int or array-like

Number of active features across every target for the model refit with the best hyperparameters obtained by cross-validating across all folds.

Examples

>>> from sklearn.linear_model import OrthogonalMatchingPursuitCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=100, n_informative=10,
...                        noise=4, random_state=0)
>>> reg = OrthogonalMatchingPursuitCV(cv=5).fit(X, y)
>>> reg.score(X, y)
0.9991...
>>> reg.n_nonzero_coefs_
10
>>> reg.predict(X[:1,])
array([-78.3854...])

See also

orthogonal_mp, orthogonal_mp_gram, lars_path, Lars, LassoLars, OrthogonalMatchingPursuit, LarsCV, LassoLarsCV, decomposition.sparse_encode

Full API documentation: OrthogonalMatchingPursuitCVScikitsLearnNode

class mdp.nodes.PassiveAggressiveClassifierScikitsLearnNode

Passive Aggressive Classifier. This node has been automatically generated by wrapping the sklearn.linear_model._passive_aggressive.PassiveAggressiveClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

Cfloat

Maximum step size (regularization). Defaults to 1.0.

fit_interceptbool, default=False

Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

max_iterint, optional (default=1000)

The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit() method.

New in version 0.19.

tolfloat or None, optional (default=1e-3)

The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol).

New in version 0.19.
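The tol criterion can be illustrated on a recorded loss history. This is a simplified reading of the rule as written (stop as soon as loss > previous_loss - tol); it deliberately ignores n_iter_no_change and early stopping, and the function name epochs_until_stop is hypothetical.

```python
def epochs_until_stop(losses, tol=1e-3):
    """Return the 1-based epoch at which loss > previous_loss - tol first holds.

    Returns None if the criterion never triggers on the given loss history.
    """
    for epoch in range(1, len(losses)):
        if losses[epoch] > losses[epoch - 1] - tol:
            return epoch + 1
    return None


# The third epoch improves the loss by only 0.0005, which is less than tol.
print(epochs_until_stop([1.0, 0.5, 0.4995, 0.3]))
```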

early_stoppingbool, default=False

Whether to use early stopping to terminate training when the validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when the validation score is not improving by at least tol for n_iter_no_change consecutive epochs.

New in version 0.20.

validation_fractionfloat, default=0.1

The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

New in version 0.20.

n_iter_no_changeint, default=5

Number of iterations with no improvement to wait before early stopping.

New in version 0.20.

shufflebool, default=True

Whether or not the training data should be shuffled after each epoch.

verboseinteger, optional

The verbosity level

lossstring, optional

The loss function to be used:

  • hinge: equivalent to PA-I in the reference paper.

  • squared_hinge: equivalent to PA-II in the reference paper.
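For orientation, the PA-I and PA-II step sizes from the reference paper (Crammer et al., 2006) can be sketched for a single binary example with label y in {-1, +1}. The formulas come from the paper, not from this page, and the wrapped implementation may differ in details such as the intercept and multi-class handling; pa_update is a hypothetical name.

```python
def pa_update(w, x, y, C=1.0, variant="PA-I"):
    """Apply one passive-aggressive update and return the new weight vector."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss on this example
    squared_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or squared_norm == 0.0:
        return list(w)  # passive step: nothing to correct
    if variant == "PA-I":
        tau = min(C, loss / squared_norm)  # step size capped at C
    elif variant == "PA-II":
        tau = loss / (squared_norm + 1.0 / (2.0 * C))  # softened step size
    else:
        raise ValueError("variant must be 'PA-I' or 'PA-II'")
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


print(pa_update([0.0, 0.0], [1.0, 0.0], 1, C=0.5, variant="PA-I"))
print(pa_update([0.0, 0.0], [1.0, 0.0], 1, C=1.0, variant="PA-II"))
```

In PA-I the parameter C acts directly as the maximum step size, which matches the description of C above.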

n_jobsint or None, optional (default=None)

The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint, RandomState instance, default=None

Used to shuffle the training data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary.

warm_startbool, optional

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled.

class_weightdict, {class_label: weight} or “balanced” or None, optional

Preset for the class_weight fit parameter.

Weights associated with classes. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

New in version 0.17: parameter class_weight to automatically weight samples.

averagebool or int, optional

When set to True, computes the averaged SGD weights and stores the result in the coef_ attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.

New in version 0.19: parameter average to use weights averaging in SGD

Attributes

coef_array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]

Weights assigned to the features.

intercept_array, shape = [1] if n_classes == 2 else [n_classes]

Constants in decision function.

n_iter_int

The actual number of iterations to reach the stopping criterion. For multiclass fits, it is the maximum over every binary fit.

classes_array of shape (n_classes,)

The unique class labels.

t_int

Number of weight updates performed during training. Same as (n_iter_ * n_samples).

loss_function_callable

Loss function used by the algorithm.

Examples

>>> from sklearn.linear_model import PassiveAggressiveClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_features=4, random_state=0)
>>> clf = PassiveAggressiveClassifier(max_iter=1000, random_state=0,
... tol=1e-3)
>>> clf.fit(X, y)
PassiveAggressiveClassifier(random_state=0)
>>> print(clf.coef_)
[[0.26642044 0.45070924 0.67251877 0.64185414]]
>>> print(clf.intercept_)
[1.84127814]
>>> print(clf.predict([[0, 0, 0, 0]]))
[1]

See also

SGDClassifier, Perceptron

References

Online Passive-Aggressive Algorithms <http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf> K. Crammer, O. Dekel, J. Keshat, S. Shalev-Shwartz, Y. Singer - JMLR (2006)

Full API documentation: PassiveAggressiveClassifierScikitsLearnNode

class mdp.nodes.PassiveAggressiveRegressorScikitsLearnNode

Passive Aggressive Regressor. This node has been automatically generated by wrapping the sklearn.linear_model._passive_aggressive.PassiveAggressiveRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

Cfloat

Maximum step size (regularization). Defaults to 1.0.

fit_interceptbool

Whether the intercept should be estimated or not. If False, the data is assumed to be already centered. Defaults to True.

max_iterint, optional (default=1000)

The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit() method.

New in version 0.19.

tolfloat or None, optional (default=1e-3)

The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol).

New in version 0.19.

early_stoppingbool, default=False

Whether to use early stopping to terminate training when the validation score is not improving. If set to True, it will automatically set aside a fraction of training data as validation and terminate training when the validation score is not improving by at least tol for n_iter_no_change consecutive epochs.

New in version 0.20.

validation_fractionfloat, default=0.1

The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

New in version 0.20.

n_iter_no_changeint, default=5

Number of iterations with no improvement to wait before early stopping.

New in version 0.20.

shufflebool, default=True

Whether or not the training data should be shuffled after each epoch.

verboseinteger, optional

The verbosity level

lossstring, optional

The loss function to be used:

  • epsilon_insensitive: equivalent to PA-I in the reference paper.

  • squared_epsilon_insensitive: equivalent to PA-II in the reference paper.

epsilonfloat

If the difference between the current prediction and the correct label is below this threshold, the model is not updated.
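The role of epsilon can be shown with the epsilon-insensitive loss, which is zero inside a band of half-width epsilon around the target. The sketch below is illustrative only; the function names are hypothetical and epsilon is passed explicitly because this page does not state its default.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Return max(0, |y_true - y_pred| - epsilon)."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def needs_update(y_true, y_pred, epsilon):
    """The model is only updated when the loss is strictly positive."""
    return epsilon_insensitive_loss(y_true, y_pred, epsilon) > 0.0


print(needs_update(1.0, 1.05, epsilon=0.1))  # inside the band
print(needs_update(1.0, 1.5, epsilon=0.1))   # outside the band
```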

random_stateint, RandomState instance, default=None

Used to shuffle the training data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary.

warm_startbool, optional

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled.

averagebool or int, optional

When set to True, computes the averaged SGD weights and stores the result in the coef_ attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.

New in version 0.19: parameter average to use weights averaging in SGD

Attributes

coef_array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]

Weights assigned to the features.

intercept_array, shape = [1] if n_classes == 2 else [n_classes]

Constants in decision function.

n_iter_int

The actual number of iterations to reach the stopping criterion.

t_int

Number of weight updates performed during training. Same as (n_iter_ * n_samples).

Examples

>>> from sklearn.linear_model import PassiveAggressiveRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=4, random_state=0)
>>> regr = PassiveAggressiveRegressor(max_iter=100, random_state=0,
... tol=1e-3)
>>> regr.fit(X, y)
PassiveAggressiveRegressor(max_iter=100, random_state=0)
>>> print(regr.coef_)
[20.48736655 34.18818427 67.59122734 87.94731329]
>>> print(regr.intercept_)
[-0.02306214]
>>> print(regr.predict([[0, 0, 0, 0]]))
[-0.02306214]

See also

SGDRegressor

References

Online Passive-Aggressive Algorithms <http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf> K. Crammer, O. Dekel, J. Keshat, S. Shalev-Shwartz, Y. Singer - JMLR (2006)

Full API documentation: PassiveAggressiveRegressorScikitsLearnNode

class mdp.nodes.PerceptronScikitsLearnNode

Perceptron. This node has been automatically generated by wrapping the sklearn.linear_model._perceptron.Perceptron class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

penalty{‘l2’,’l1’,’elasticnet’}, default=None

The penalty (aka regularization term) to be used.

alphafloat, default=0.0001

Constant that multiplies the regularization term if regularization is used.

fit_interceptbool, default=True

Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.

max_iterint, default=1000

The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit() method.

New in version 0.19.

tolfloat, default=1e-3

The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol).

New in version 0.19.

shufflebool, default=True

Whether or not the training data should be shuffled after each epoch.

verboseint, default=0

The verbosity level

eta0double, default=1

Constant by which the updates are multiplied.

n_jobsint, default=None

The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint, RandomState instance, default=None

Used to shuffle the training data, when shuffle is set to True. Pass an int for reproducible output across multiple function calls. See Glossary.

early_stoppingbool, default=False

Whether to use early stopping to terminate training when the validation score is not improving. If set to True, it will automatically set aside a stratified fraction of training data as validation and terminate training when the validation score is not improving by at least tol for n_iter_no_change consecutive epochs.

New in version 0.20.

validation_fractionfloat, default=0.1

The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.

New in version 0.20.

n_iter_no_changeint, default=5

Number of iterations with no improvement to wait before early stopping.

New in version 0.20.

class_weightdict, {class_label: weight} or “balanced”, default=None

Preset for the class_weight fit parameter.

Weights associated with classes. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

Attributes

coef_ndarray of shape = [1, n_features] if n_classes == 2 else [n_classes, n_features]

Weights assigned to the features.

intercept_ndarray of shape = [1] if n_classes == 2 else [n_classes]

Constants in decision function.

n_iter_int

The actual number of iterations to reach the stopping criterion. For multiclass fits, it is the maximum over every binary fit.

classes_ndarray of shape (n_classes,)

The unique class labels.

t_int

Number of weight updates performed during training. Same as (n_iter_ * n_samples).

Notes

Perceptron is a classification algorithm which shares the same underlying implementation with SGDClassifier. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
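The settings named in this note (perceptron loss, constant learning rate eta0, no penalty) correspond to the classic perceptron update. The following is a minimal pure-Python sketch for binary labels in {-1, +1}; it omits shuffling, tol and multi-class handling, and perceptron_fit is a hypothetical name rather than the wrapped implementation.

```python
def perceptron_fit(X, y, eta0=1.0, max_iter=1000):
    """Train a binary perceptron; labels must be -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_iter):
        mistakes = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wj + eta0 * yi * xj for wj, xj in zip(w, xi)]
                b += eta0 * yi
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return w, b


# Linearly separable toy data (logical AND with -1/+1 labels).
X_demo = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_demo = [-1, -1, -1, 1]
w, b = perceptron_fit(X_demo, y_demo)
print(w, b)
```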

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.linear_model import Perceptron
>>> X, y = load_digits(return_X_y=True)
>>> clf = Perceptron(tol=1e-3, random_state=0)
>>> clf.fit(X, y)
Perceptron()
>>> clf.score(X, y)
0.939...

See also

SGDClassifier

References

https://en.wikipedia.org/wiki/Perceptron and references therein.

Full API documentation: PerceptronScikitsLearnNode

class mdp.nodes.RANSACRegressorScikitsLearnNode

RANSAC (RANdom SAmple Consensus) algorithm. This node has been automatically generated by wrapping the sklearn.linear_model._ransac.RANSACRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. RANSAC is an iterative algorithm for the robust estimation of parameters from a subset of inliers from the complete data set.

Read more in the User Guide.

Parameters

base_estimatorobject, optional

Base estimator object which implements the following methods:

  • fit(X, y): Fit model to given training data and target values.

  • score(X, y): Returns the mean accuracy on the given test data, which is used for the stop criterion defined by stop_score. Additionally, the score is used to decide which of two equally large consensus sets is chosen as the better one.

  • predict(X): Returns predicted values using the linear model, which is used to compute residual error using loss function.

If base_estimator is None, then base_estimator=sklearn.linear_model.LinearRegression() is used for target values of dtype float.

Note that the current implementation only supports regression estimators.

min_samplesint (>= 1) or float ([0, 1]), optional

Minimum number of samples chosen randomly from original data. Treated as an absolute number of samples for min_samples >= 1, treated as a relative number ceil(min_samples * X.shape[0]) for min_samples < 1. This is typically chosen as the minimal number of samples necessary to estimate the given base_estimator. By default a sklearn.linear_model.LinearRegression() estimator is assumed and min_samples is chosen as X.shape[1] + 1.
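The two interpretations of min_samples can be sketched as a small resolver. The function name resolve_min_samples is hypothetical, and the None branch simply encodes the default X.shape[1] + 1 described above.

```python
import math


def resolve_min_samples(min_samples, n_samples, n_features):
    """Translate min_samples into an absolute number of samples."""
    if min_samples is None:
        return n_features + 1  # default for a linear regression base estimator
    if min_samples >= 1:
        return int(min_samples)  # absolute count
    return math.ceil(min_samples * n_samples)  # fraction of the data set


print(resolve_min_samples(None, 100, 3))
print(resolve_min_samples(0.25, 10, 3))
```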

residual_thresholdfloat, optional

Maximum residual for a data sample to be classified as an inlier. By default the threshold is chosen as the MAD (median absolute deviation) of the target values y.
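The default threshold can be computed by hand with the standard library. This sketch assumes the unscaled definition of MAD, median(|y - median(y)|); the function name is hypothetical.

```python
from statistics import median


def median_absolute_deviation(y):
    """Median of the absolute deviations from the median of y."""
    center = median(y)
    return median(abs(value - center) for value in y)


# The single extreme target (100) barely affects the threshold.
print(median_absolute_deviation([1, 2, 3, 4, 100]))
```

Because the MAD is robust to outliers in y, it gives a usable scale for the residuals even before any model has been fitted.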

is_data_validcallable, optional

This function is called with the randomly selected data before the model is fitted to it: is_data_valid(X, y). If its return value is False the current randomly chosen sub-sample is skipped.

is_model_validcallable, optional

This function is called with the estimated model and the randomly selected data: is_model_valid(model, X, y). If its return value is False the current randomly chosen sub-sample is skipped. Rejecting samples with this function is computationally costlier than with is_data_valid. is_model_valid should therefore only be used if the estimated model is needed for making the rejection decision.

max_trialsint, optional

Maximum number of iterations for random sample selection.

max_skipsint, optional

Maximum number of iterations that can be skipped due to finding zero inliers or invalid data defined by is_data_valid or invalid models defined by is_model_valid.

New in version 0.19.

stop_n_inliersint, optional

Stop iteration if at least this number of inliers are found.

stop_scorefloat, optional

Stop iteration if score is greater than or equal to this threshold.

stop_probabilityfloat in range [0, 1], optional

RANSAC iteration stops if at least one outlier-free set of the training data is sampled in RANSAC. This requires generating at least N samples (iterations):

N >= log(1 - probability) / log(1 - e**m)

where the probability (confidence) is typically set to a high value such as 0.99 (the default), e is the current fraction of inliers w.r.t. the total number of samples, and m is the number of samples drawn per trial (min_samples).
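A worked instance of the bound above may help. The sketch assumes that m stands for the number of samples drawn per trial (min_samples); the function name required_trials and the handling of the two degenerate cases are choices made for this illustration only.

```python
import math

def required_trials(inlier_fraction, m, probability=0.99):
    """Smallest integer N with N >= log(1 - probability) / log(1 - e**m)."""
    denom = 1.0 - inlier_fraction ** m
    if denom <= 0.0:
        return 1          # every sample is an inlier; one trial suffices
    if denom >= 1.0:
        return math.inf   # no inliers: an outlier-free draw never occurs
    return math.ceil(math.log(1.0 - probability) / math.log(denom))

print(required_trials(0.5, 2))   # 17
print(required_trials(0.9, 3))   # 4
```

With half of the data being inliers and two samples per trial, about 17 trials are needed for 99% confidence; a cleaner data set needs far fewer.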

lossstring, callable, optional, default “absolute_loss”

String inputs, “absolute_loss” and “squared_loss” are supported which find the absolute loss and squared loss per sample respectively.

If loss is a callable, then it should be a function that takes two arrays as inputs, the true and predicted value and returns a 1-D array with the i-th value of the array corresponding to the loss on X[i].

If the loss on a sample is greater than the residual_threshold, then this sample is classified as an outlier.
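The per-sample losses and the outlier rule above can be sketched in plain Python. The function names and the data are hypothetical; a callable actually passed as loss would operate on arrays rather than lists.

```python
def absolute_loss(y_true, y_pred):
    """Absolute loss per sample."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]

def squared_loss(y_true, y_pred):
    """Squared loss per sample."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

def inlier_mask(losses, residual_threshold):
    # A sample is an outlier when its loss exceeds the threshold.
    return [loss <= residual_threshold for loss in losses]

y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 6.0]
print(absolute_loss(y_true, y_pred))                     # [0.5, 0.0, 3.0]
print(inlier_mask(absolute_loss(y_true, y_pred), 1.0))   # [True, True, False]
print(inlier_mask(squared_loss(y_true, y_pred), 1.0))    # [True, True, False]
```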

New in version 0.18.

random_stateint, RandomState instance, default=None

The generator used to draw the random sub-samples. Pass an int for reproducible output across multiple function calls. See Glossary.

Attributes

estimator_object

Best fitted model (copy of the base_estimator object).

n_trials_int

Number of random selection trials until one of the stop criteria is met. It is always <= max_trials.

inlier_mask_bool array of shape [n_samples]

Boolean mask of inliers classified as True.

n_skips_no_inliers_int

Number of iterations skipped due to finding zero inliers.

New in version 0.19.

n_skips_invalid_data_int

Number of iterations skipped due to invalid data defined by is_data_valid.

New in version 0.19.

n_skips_invalid_model_int

Number of iterations skipped due to an invalid model defined by is_model_valid.

New in version 0.19.

Examples

>>> from sklearn.linear_model import RANSACRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(
...     n_samples=200, n_features=2, noise=4.0, random_state=0)
>>> reg = RANSACRegressor(random_state=0).fit(X, y)
>>> reg.score(X, y)
0.9885...
>>> reg.predict(X[:1,])
array([-31.9417...])

References

1

https://en.wikipedia.org/wiki/RANSAC

2

https://www.sri.com/sites/default/files/publications/ransac-publication.pdf

3

http://www.bmva.org/bmvc/2009/Papers/Paper355/Paper355.pdf

Full API documentation: RANSACRegressorScikitsLearnNode

class mdp.nodes.TheilSenRegressorScikitsLearnNode

Theil-Sen Estimator: robust multivariate regression model. This node has been automatically generated by wrapping the sklearn.linear_model._theil_sen.TheilSenRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The algorithm calculates least-squares solutions on subsets of size n_subsamples of the samples in X. Any value of n_subsamples between the number of features and the number of samples leads to an estimator with a compromise between robustness and efficiency. Since the number of least-squares solutions is “n_samples choose n_subsamples”, it can be extremely large and can therefore be limited with max_subpopulation. If this limit is reached, the subsets are chosen randomly. In a final step, the spatial median (or L1 median) of all least-squares solutions is calculated.

Read more in the User Guide.

Parameters

fit_interceptboolean, optional, default True

Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.

copy_Xboolean, optional, default True

If True, X will be copied; else, it may be overwritten.

max_subpopulationint, optional, default 1e4

Instead of computing with a set of cardinality ‘n choose k’, where n is the number of samples and k is the number of subsamples (at least number of features), consider only a stochastic subpopulation of a given maximal size if ‘n choose k’ is larger than max_subpopulation. For other than small problem sizes this parameter will determine memory usage and runtime if n_subsamples is not changed.
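To get a feeling for how quickly ‘n choose k’ grows, and when max_subpopulation takes effect, consider the following sketch. The helper name subpopulation_size is hypothetical, and the simple min() rule is an assumption based on the description above rather than the wrapped implementation.

```python
import math

def subpopulation_size(n_samples, n_subsamples, max_subpopulation=10_000):
    """Number of subsets considered: all of them, unless the cap is lower."""
    total = math.comb(n_samples, n_subsamples)
    return min(total, max_subpopulation)

print(math.comb(200, 3))            # 1313400
print(subpopulation_size(200, 3))   # 10000
print(subpopulation_size(10, 3))    # 120
```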

n_subsamplesint, optional, default None

Number of samples to calculate the parameters. This must be at least the number of features (plus 1 if fit_intercept=True) and at most the number of samples. A lower number leads to a higher breakdown point and a low efficiency while a high number leads to a low breakdown point and a high efficiency. If None, take the minimum number of subsamples leading to maximal robustness. If n_subsamples is set to n_samples, Theil-Sen is identical to least squares.

max_iterint, optional, default 300

Maximum number of iterations for the calculation of spatial median.

tolfloat, optional, default 1.e-3

Tolerance when calculating spatial median.

random_stateint, RandomState instance, default=None

A random number generator instance to define the state of the random permutations generator. Pass an int for reproducible output across multiple function calls. See Glossary.

n_jobsint or None, optional (default=None)

Number of CPUs to use during the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

verboseboolean, optional, default False

Verbose mode when fitting the model.

Attributes

coef_array, shape = (n_features)

Coefficients of the regression model (median of distribution).

intercept_float

Estimated intercept of regression model.

breakdown_float

Approximated breakdown point.

n_iter_int

Number of iterations needed for the spatial median.

n_subpopulation_int

Number of combinations taken into account from ‘n choose k’, where n is the number of samples and k is the number of subsamples.

Examples

>>> from sklearn.linear_model import TheilSenRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(
...     n_samples=200, n_features=2, noise=4.0, random_state=0)
>>> reg = TheilSenRegressor(random_state=0).fit(X, y)
>>> reg.score(X, y)
0.9884...
>>> reg.predict(X[:1,])
array([-31.5871...])

Full API documentation: TheilSenRegressorScikitsLearnNode

class mdp.nodes.SVCScikitsLearnNode

C-Support Vector Classification. This node has been automatically generated by wrapping the sklearn.svm._classes.SVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The implementation is based on libsvm. The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples. For large datasets consider using sklearn.svm.LinearSVC or sklearn.linear_model.SGDClassifier instead, possibly after a sklearn.kernel_approximation.Nystroem transformer.

The multiclass support is handled according to a one-vs-one scheme.

For details on the precise mathematical formulation of the provided kernel functions and how gamma, coef0 and degree affect each other, see the corresponding section in the narrative documentation: svm_kernels.

Read more in the User Guide.

Parameters

Cfloat, default=1.0

Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty.

kernel{‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}, default=’rbf’

Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples).
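The shape requirement for a callable kernel can be illustrated with a hand-written linear kernel. This plain-list version is a sketch of the idea only; the name linear_kernel is hypothetical, and a callable passed to the wrapped class would receive and return arrays.

```python
def linear_kernel(X, Y):
    """Gram matrix of dot products between the rows of X and the rows of Y."""
    return [[sum(a * b for a, b in zip(x, y)) for y in Y] for x in X]

X = [[-1, -1], [-2, -1], [1, 1], [2, 1]]
K = linear_kernel(X, X)
print(len(K), len(K[0]))  # 4 4, i.e. (n_samples, n_samples)
print(K[0])               # [2, 3, -2, -3]
```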

degreeint, default=3

Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.

gamma{‘scale’, ‘auto’} or float, default=’scale’

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

  • if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,

  • if ‘auto’, uses 1 / n_features.

Changed in version 0.22: The default value of gamma changed from ‘auto’ to ‘scale’.
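The two named settings of gamma can be reproduced by hand. In this sketch X.var() is taken to be the population variance over all entries of X, which is NumPy's default behaviour; the function name gamma_value is hypothetical.

```python
from statistics import pvariance

def gamma_value(X, gamma="scale"):
    """Numeric kernel coefficient for the 'scale' and 'auto' settings."""
    n_features = len(X[0])
    if gamma == "scale":
        flat = [v for row in X for v in row]   # variance over all entries
        return 1.0 / (n_features * pvariance(flat))
    if gamma == "auto":
        return 1.0 / n_features
    return float(gamma)                        # explicit numeric value

X = [[-1, -1], [-2, -1], [1, 1], [2, 1]]
print(gamma_value(X, "scale"))  # 1 / (2 * 1.75), about 0.2857
print(gamma_value(X, "auto"))   # 0.5
```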

coef0float, default=0.0

Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.

shrinkingbool, default=True

Whether to use the shrinking heuristic. See the User Guide.

probabilitybool, default=False

Whether to enable probability estimates. This must be enabled prior to calling fit, will slow down that method as it internally uses 5-fold cross-validation, and predict_proba may be inconsistent with predict. Read more in the User Guide.

tolfloat, default=1e-3

Tolerance for stopping criterion.

cache_sizefloat, default=200

Specify the size of the kernel cache (in MB).

class_weightdict or ‘balanced’, default=None

Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
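The “balanced” formula can be evaluated without NumPy. The following sketch uses a Counter in place of np.bincount; the helper name and the labels are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weights n_samples / (n_classes * count) for each class label."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# The rarer class 1 receives the larger weight.
print(balanced_class_weights([1, 1, 2, 2, 2, 2]))  # {1: 1.5, 2: 0.75}
```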

verbosebool, default=False

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

max_iterint, default=-1

Hard limit on iterations within solver, or -1 for no limit.

decision_function_shape{‘ovo’, ‘ovr’}, default=’ovr’

Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one (‘ovo’) is always used as multi-class strategy. The parameter is ignored for binary classification.

Changed in version 0.19: decision_function_shape is ‘ovr’ by default.

New in version 0.17: decision_function_shape=’ovr’ is recommended.

Changed in version 0.17: Deprecated decision_function_shape=’ovo’ and None.
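The difference between the two decision function shapes is easiest to see by counting columns. This sketch applies to problems with more than two classes, since the parameter is ignored for binary classification; the function name is hypothetical.

```python
def decision_function_columns(n_classes, shape="ovr"):
    """Second dimension of the decision function for a multiclass problem."""
    if shape == "ovr":
        return n_classes
    if shape == "ovo":
        return n_classes * (n_classes - 1) // 2
    raise ValueError("shape must be 'ovr' or 'ovo'")

for k in (3, 4, 10):
    print(k, decision_function_columns(k, "ovr"), decision_function_columns(k, "ovo"))
# 3 3 3
# 4 4 6
# 10 10 45
```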

break_tiesbool, default=False

If true, decision_function_shape='ovr', and number of classes > 2, predict will break ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned. Please note that breaking ties comes at a relatively high computational cost compared to a simple predict.

New in version 0.22.

random_stateint or RandomState instance, default=None

Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See Glossary.

Attributes

support_ndarray of shape (n_SV,)

Indices of support vectors.

support_vectors_ndarray of shape (n_SV, n_features)

Support vectors.

n_support_ndarray of shape (n_class,), dtype=int32

Number of support vectors for each class.

dual_coef_ndarray of shape (n_class-1, n_SV)

Dual coefficients of the support vector in the decision function (see svm_mathematical_formulation), multiplied by their targets. For multiclass, coefficients for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the multi-class section of the User Guide for details.

coef_ndarray of shape (n_class * (n_class-1) / 2, n_features)

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a readonly property derived from dual_coef_ and support_vectors_.

intercept_ndarray of shape (n_class * (n_class-1) / 2,)

Constants in decision function.

fit_status_int

0 if correctly fitted, 1 otherwise (will raise warning)

classes_ndarray of shape (n_classes,)

The class labels.

probA_ : ndarray of shape (n_class * (n_class-1) / 2)

probB_ : ndarray of shape (n_class * (n_class-1) / 2)

If probability=True, it corresponds to the parameters learned in Platt scaling to produce probability estimates from decision values. If probability=False, it’s an empty array. Platt scaling uses the logistic function 1 / (1 + exp(decision_value * probA_ + probB_)) where probA_ and probB_ are learned from the dataset [2]. For more information on the multiclass case and training procedure see section 8 of [1].
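The logistic function used in Platt scaling can be written out directly. The parameter values below are invented for the example and do not come from a fitted model.

```python
import math

def platt_probability(decision_value, prob_a, prob_b):
    """Logistic function 1 / (1 + exp(decision_value * prob_a + prob_b))."""
    return 1.0 / (1.0 + math.exp(decision_value * prob_a + prob_b))

# Hypothetical learned parameters: prob_a = -1.0, prob_b = 0.0.
print(platt_probability(0.0, -1.0, 0.0))            # 0.5
print(round(platt_probability(2.0, -1.0, 0.0), 4))  # 0.8808
```

With a negative prob_a, larger decision values map to probabilities closer to 1.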

class_weight_ndarray of shape (n_class,)

Multipliers of parameter C for each class. Computed based on the class_weight parameter.

shape_fit_tuple of int of shape (n_dimensions_of_X,)

Array dimensions of training vector X.

Examples

>>> import numpy as np
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm import SVC
>>> clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
>>> clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('svc', SVC(gamma='auto'))])
>>> print(clf.predict([[-0.8, -1]]))
[1]

See also

SVR

Support Vector Machine for Regression implemented using libsvm.

LinearSVC

Scalable Linear Support Vector Machine for classification implemented using liblinear. See the See also section of LinearSVC for further comparison.

References

1

LIBSVM: A Library for Support Vector Machines

2

Platt, John (1999). “Probabilistic outputs for support vector machines and comparison to regularized likelihood methods.”

Full API documentation: SVCScikitsLearnNode

class mdp.nodes.NuSVCScikitsLearnNode

Nu-Support Vector Classification. This node has been automatically generated by wrapping the sklearn.svm._classes.NuSVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Similar to SVC but uses a parameter to control the number of support vectors.

The implementation is based on libsvm.

Read more in the User Guide.

Parameters

nufloat, default=0.5

An upper bound on the fraction of margin errors (see User Guide) and a lower bound of the fraction of support vectors. Should be in the interval (0, 1].

kernel{‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}, default=’rbf’

Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to precompute the kernel matrix.

degreeint, default=3

Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.

gamma{‘scale’, ‘auto’} or float, default=’scale’

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

  • if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,

  • if ‘auto’, uses 1 / n_features.

Changed in version 0.22: The default value of gamma changed from ‘auto’ to ‘scale’.

coef0float, default=0.0

Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.

shrinkingbool, default=True

Whether to use the shrinking heuristic. See the User Guide.

probabilitybool, default=False

Whether to enable probability estimates. This must be enabled prior to calling fit, will slow down that method as it internally uses 5-fold cross-validation, and predict_proba may be inconsistent with predict. Read more in the User Guide.

tolfloat, default=1e-3

Tolerance for stopping criterion.

cache_sizefloat, default=200

Specify the size of the kernel cache (in MB).

class_weight{dict, ‘balanced’}, default=None

Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies as n_samples / (n_classes * np.bincount(y))

verbosebool, default=False

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

max_iterint, default=-1

Hard limit on iterations within solver, or -1 for no limit.

decision_function_shape{‘ovo’, ‘ovr’}, default=’ovr’

Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one (‘ovo’) is always used as multi-class strategy. The parameter is ignored for binary classification.

Changed in version 0.19: decision_function_shape is ‘ovr’ by default.

New in version 0.17: decision_function_shape=’ovr’ is recommended.

Changed in version 0.17: Deprecated decision_function_shape=’ovo’ and None.

break_tiesbool, default=False

If true, decision_function_shape='ovr', and number of classes > 2, predict will break ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned. Please note that breaking ties comes at a relatively high computational cost compared to a simple predict.

New in version 0.22.

random_stateint or RandomState instance, default=None

Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when probability is False. Pass an int for reproducible output across multiple function calls. See Glossary.

Attributes

support_ndarray of shape (n_SV,)

Indices of support vectors.

support_vectors_ndarray of shape (n_SV, n_features)

Support vectors.

n_support_ndarray of shape (n_class,), dtype=int32

Number of support vectors for each class.

dual_coef_ndarray of shape (n_class-1, n_SV)

Dual coefficients of the support vector in the decision function (see svm_mathematical_formulation), multiplied by their targets. For multiclass, coefficients for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the multi-class section of the User Guide for details.

coef_ndarray of shape (n_class * (n_class-1) / 2, n_features)

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a readonly property derived from dual_coef_ and support_vectors_.

intercept_ndarray of shape (n_class * (n_class-1) / 2,)

Constants in decision function.

classes_ndarray of shape (n_classes,)

The unique class labels.

fit_status_int

0 if correctly fitted, 1 if the algorithm did not converge.

probA_ : ndarray of shape (n_class * (n_class-1) / 2,)

probB_ : ndarray of shape (n_class * (n_class-1) / 2,)

If probability=True, it corresponds to the parameters learned in Platt scaling to produce probability estimates from decision values. If probability=False, it’s an empty array. Platt scaling uses the logistic function 1 / (1 + exp(decision_value * probA_ + probB_)) where probA_ and probB_ are learned from the dataset [2]. For more information on the multiclass case and training procedure see section 8 of [1].

class_weight_ndarray of shape (n_class,)

Multipliers of parameter C of each class. Computed based on the class_weight parameter.

shape_fit_tuple of int of shape (n_dimensions_of_X,)

Array dimensions of training vector X.

Examples

>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.svm import NuSVC
>>> clf = make_pipeline(StandardScaler(), NuSVC())
>>> clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()), ('nusvc', NuSVC())])
>>> print(clf.predict([[-0.8, -1]]))
[1]

See also

SVC

Support Vector Machine for classification using libsvm.

LinearSVC

Scalable linear Support Vector Machine for classification using liblinear.

References

1

LIBSVM: A Library for Support Vector Machines

2

Platt, John (1999). “Probabilistic outputs for support vector machines and comparison to regularized likelihood methods.”

Full API documentation: NuSVCScikitsLearnNode

class mdp.nodes.SVRScikitsLearnNode

Epsilon-Support Vector Regression. This node has been automatically generated by wrapping the sklearn.svm._classes.SVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The free parameters in the model are C and epsilon.

The implementation is based on libsvm. The fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to datasets with more than a few tens of thousands of samples. For large datasets consider using sklearn.svm.LinearSVR or sklearn.linear_model.SGDRegressor instead, possibly after a sklearn.kernel_approximation.Nystroem transformer.

Read more in the User Guide.

Parameters

kernel{‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}, default=’rbf’

Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to precompute the kernel matrix.

degreeint, default=3

Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.

gamma{‘scale’, ‘auto’} or float, default=’scale’

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

  • if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,

  • if ‘auto’, uses 1 / n_features.

Changed in version 0.22: The default value of gamma changed from ‘auto’ to ‘scale’.

coef0float, default=0.0

Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.

tolfloat, default=1e-3

Tolerance for stopping criterion.

Cfloat, default=1.0

Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty.

epsilonfloat, default=0.1

Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.
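The epsilon-tube can be illustrated with the standard epsilon-insensitive loss, which is zero inside the tube and grows linearly outside it. The function name and the sample values are assumptions of this sketch.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero within distance epsilon of the target, linear beyond it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

print(epsilon_insensitive_loss(1.0, 1.05))            # 0.0 (inside the tube)
print(round(epsilon_insensitive_loss(1.0, 1.5), 6))   # 0.4 (outside the tube)
```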

shrinkingbool, default=True

Whether to use the shrinking heuristic. See the User Guide.

cache_sizefloat, default=200

Specify the size of the kernel cache (in MB).

verbosebool, default=False

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

max_iterint, default=-1

Hard limit on iterations within solver, or -1 for no limit.

Attributes

support_ndarray of shape (n_SV,)

Indices of support vectors.

support_vectors_ndarray of shape (n_SV, n_features)

Support vectors.

dual_coef_ndarray of shape (1, n_SV)

Coefficients of the support vector in the decision function.

coef_ndarray of shape (1, n_features)

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a readonly property derived from dual_coef_ and support_vectors_.

fit_status_int

0 if correctly fitted, 1 otherwise (will raise warning)

intercept_ndarray of shape (1,)

Constants in decision function.

Examples

>>> from sklearn.svm import SVR
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> rng = np.random.RandomState(0)
>>> y = rng.randn(n_samples)
>>> X = rng.randn(n_samples, n_features)
>>> regr = make_pipeline(StandardScaler(), SVR(C=1.0, epsilon=0.2))
>>> regr.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('svr', SVR(epsilon=0.2))])

See also

NuSVR

Support Vector Machine for regression implemented using libsvm, with a parameter to control the number of support vectors.

LinearSVR

Scalable Linear Support Vector Machine for regression implemented using liblinear.

Notes

References: LIBSVM: A Library for Support Vector Machines

Full API documentation: SVRScikitsLearnNode

class mdp.nodes.NuSVRScikitsLearnNode

Nu Support Vector Regression. This node has been automatically generated by wrapping the sklearn.svm._classes.NuSVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Similar to NuSVC but for regression: it uses a parameter nu to control the number of support vectors. However, unlike NuSVC, where nu replaces C, here nu replaces the parameter epsilon of epsilon-SVR.

The implementation is based on libsvm.

Read more in the User Guide.

Parameters

nufloat, default=0.5

An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken.

Cfloat, default=1.0

Penalty parameter C of the error term.

kernel{‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}, default=’rbf’

Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to precompute the kernel matrix.

degreeint, default=3

Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.

gamma{‘scale’, ‘auto’} or float, default=’scale’

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

  • if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,

  • if ‘auto’, uses 1 / n_features.

Changed in version 0.22: The default value of gamma changed from ‘auto’ to ‘scale’.

coef0float, default=0.0

Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.

shrinkingbool, default=True

Whether to use the shrinking heuristic. See the User Guide.

tolfloat, default=1e-3

Tolerance for stopping criterion.

cache_sizefloat, default=200

Specify the size of the kernel cache (in MB).

verbosebool, default=False

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

max_iterint, default=-1

Hard limit on iterations within solver, or -1 for no limit.

Attributes

support_ndarray of shape (n_SV,)

Indices of support vectors.

support_vectors_ndarray of shape (n_SV, n_features)

Support vectors.

dual_coef_ndarray of shape (1, n_SV)

Coefficients of the support vector in the decision function.

coef_ndarray of shape (1, n_features)

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a readonly property derived from dual_coef_ and support_vectors_.

intercept_ndarray of shape (1,)

Constants in decision function.

Examples

>>> from sklearn.svm import NuSVR
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> regr = make_pipeline(StandardScaler(), NuSVR(C=1.0, nu=0.1))
>>> regr.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('nusvr', NuSVR(nu=0.1))])

See also

NuSVC

Support Vector Machine for classification implemented with libsvm with a parameter to control the number of support vectors.

SVR

epsilon Support Vector Machine for regression implemented with libsvm.

Notes

References: LIBSVM: A Library for Support Vector Machines

Full API documentation: NuSVRScikitsLearnNode

class mdp.nodes.OneClassSVMScikitsLearnNode

Unsupervised Outlier Detection. This node has been automatically generated by wrapping the sklearn.svm._classes.OneClassSVM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Estimate the support of a high-dimensional distribution.

The implementation is based on libsvm.

Read more in the User Guide.

Parameters

kernel{‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’}, default=’rbf’

Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to precompute the kernel matrix.

degreeint, default=3

Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.

gamma{‘scale’, ‘auto’} or float, default=’scale’

Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

  • if gamma='scale' (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma,

  • if ‘auto’, uses 1 / n_features.

Changed in version 0.22: The default value of gamma changed from ‘auto’ to ‘scale’.

coef0float, default=0.0

Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.

tolfloat, default=1e-3

Tolerance for stopping criterion.

nufloat, default=0.5

An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken.

shrinkingbool, default=True

Whether to use the shrinking heuristic. See the User Guide.

cache_sizefloat, default=200

Specify the size of the kernel cache (in MB).

verbosebool, default=False

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.

max_iterint, default=-1

Hard limit on iterations within solver, or -1 for no limit.

Attributes

support_ndarray of shape (n_SV,)

Indices of support vectors.

support_vectors_ndarray of shape (n_SV, n_features)

Support vectors.

dual_coef_ndarray of shape (1, n_SV)

Coefficients of the support vectors in the decision function.

coef_ndarray of shape (1, n_features)

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a readonly property derived from dual_coef_ and support_vectors_.

intercept_ndarray of shape (1,)

Constant in the decision function.

offset_float

Offset used to define the decision function from the raw scores. We have the relation: decision_function = score_samples - offset_. The offset is the opposite of intercept_ and is provided for consistency with other outlier detection algorithms.

New in version 0.20.
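The relation decision_function = score_samples - offset_ can be traced by hand. The scores below are the rounded values from the Examples section of this class; the offset of 2.0 is a hypothetical value, chosen only because it is consistent with the predictions shown there, and is not the fitted offset_.

```python
def decision_function(scores, offset):
    """Raw scores shifted so that the decision boundary lies at zero."""
    return [s - offset for s in scores]

def predict(scores, offset):
    """+1 for inliers (non-negative decision value), -1 for outliers."""
    return [1 if d >= 0 else -1 for d in decision_function(scores, offset)]

scores = [1.7798, 2.0547, 2.0556, 2.0561, 1.7332]
offset = 2.0  # hypothetical value, for illustration only
print(predict(scores, offset))  # [-1, 1, 1, 1, -1]
```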

fit_status_int

0 if correctly fitted, 1 otherwise (will raise warning)

Examples

>>> from sklearn.svm import OneClassSVM
>>> X = [[0], [0.44], [0.45], [0.46], [1]]
>>> clf = OneClassSVM(gamma='auto').fit(X)
>>> clf.predict(X)
array([-1,  1,  1,  1, -1])
>>> clf.score_samples(X)
array([1.7798..., 2.0547..., 2.0556..., 2.0561..., 1.7332...])

Full API documentation: OneClassSVMScikitsLearnNode

class mdp.nodes.LinearSVCScikitsLearnNode

Linear Support Vector Classification. This node has been automatically generated by wrapping the sklearn.svm._classes.LinearSVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Similar to SVC with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.

This class supports both dense and sparse input and the multiclass support is handled according to a one-vs-the-rest scheme.

Read more in the User Guide.

Parameters

penalty{‘l1’, ‘l2’}, default=’l2’

Specifies the norm used in the penalization. The ‘l2’ penalty is the standard used in SVC. The ‘l1’ leads to coef_ vectors that are sparse.

loss{‘hinge’, ‘squared_hinge’}, default=’squared_hinge’

Specifies the loss function. ‘hinge’ is the standard SVM loss (used e.g. by the SVC class) while ‘squared_hinge’ is the square of the hinge loss.
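As an illustration of the two loss options, the sketch below evaluates both for a single sample. The function names are hypothetical helpers written for this example, not part of sklearn; labels are assumed to be encoded as -1 or +1.

```python
def hinge(y, score):
    """Hinge loss for a label y in {-1, +1} and a raw decision score."""
    return max(0.0, 1.0 - y * score)


def squared_hinge(y, score):
    """Square of the hinge loss."""
    return hinge(y, score) ** 2


margin_ok = hinge(+1, 2.0)           # correct side, margin >= 1: no loss
inside_margin = hinge(+1, 0.5)       # correct side, but inside the margin
wrong_side = squared_hinge(-1, 0.5)  # wrong side, penalized quadratically
```

The squared variant penalizes large violations more strongly and is differentiable at the margin boundary.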

dualbool, default=True

Select the algorithm to either solve the dual or primal optimization problem. Prefer dual=False when n_samples > n_features.

tolfloat, default=1e-4

Tolerance for stopping criteria.

Cfloat, default=1.0

Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive.

multi_class{‘ovr’, ‘crammer_singer’}, default=’ovr’

Determines the multi-class strategy if y contains more than two classes. "ovr" trains n_classes one-vs-rest classifiers, while "crammer_singer" optimizes a joint objective over all classes. While crammer_singer is interesting from a theoretical perspective as it is consistent, it is seldom used in practice as it rarely leads to better accuracy and is more expensive to compute. If "crammer_singer" is chosen, the options loss, penalty and dual will be ignored.

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be already centered).

intercept_scalingfloat, default=1

When self.fit_intercept is True, the instance vector x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with a constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note that the synthetic feature weight is subject to l1/l2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.
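The synthetic-feature construction can be written out by hand. The sketch below is only an illustration in plain Python; the weights and the helper name decision_value are made up for the example and do not come from a fitted model.

```python
def decision_value(x, weights, synthetic_weight, intercept_scaling=1.0):
    """Dot product of the augmented vector [x, intercept_scaling]
    with the augmented weights [weights, synthetic_weight]."""
    augmented_x = list(x) + [intercept_scaling]
    augmented_w = list(weights) + [synthetic_weight]
    return sum(xi * wi for xi, wi in zip(augmented_x, augmented_w))


x = [2.0, -1.0]
weights = [0.5, 0.25]      # hypothetical feature weights
synthetic_weight = 0.1     # hypothetical weight of the synthetic feature
intercept_scaling = 10.0

# The effective intercept is intercept_scaling * synthetic feature weight.
intercept = intercept_scaling * synthetic_weight

value = decision_value(x, weights, synthetic_weight, intercept_scaling)
explicit = sum(xi * wi for xi, wi in zip(x, weights)) + intercept
```

With a larger intercept_scaling, a smaller synthetic weight yields the same intercept, so the regularization penalty on that weight matters less.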

class_weightdict or ‘balanced’, default=None

Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
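The “balanced” formula can be evaluated directly with NumPy. The labels below are made up for illustration, and C = 1.0 is assumed.

```python
import numpy as np

y = np.array([0, 0, 0, 1])   # three samples of class 0, one of class 1
n_samples = y.shape[0]
n_classes = np.unique(y).shape[0]

# n_samples / (n_classes * np.bincount(y))
balanced = n_samples / (n_classes * np.bincount(y))

# Effective per-class regularization parameter: class_weight[i] * C
C = 1.0
effective_C = balanced * C
```

The rare class receives the larger weight (2.0 versus about 0.67 here), so errors on it are penalized more heavily.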

verboseint, default=0

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context.

random_stateint or RandomState instance, default=None

Controls the pseudo random number generation for shuffling the data for the dual coordinate descent (if dual=True). When dual=False the underlying implementation of LinearSVC is not random and random_state has no effect on the results. Pass an int for reproducible output across multiple function calls. See Glossary.

max_iterint, default=1000

The maximum number of iterations to be run.

Attributes

coef_ndarray of shape (1, n_features) if n_classes == 2 else (n_classes, n_features)

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a readonly property derived from raw_coef_ that follows the internal memory layout of liblinear.

intercept_ndarray of shape (1,) if n_classes == 2 else (n_classes,)

Constants in decision function.

classes_ndarray of shape (n_classes,)

The unique class labels.

n_iter_int

Maximum number of iterations run across all classes.

See Also

SVC

Implementation of Support Vector Machine classifier using libsvm: the kernel can be non-linear, but its SMO algorithm does not scale to a large number of samples as LinearSVC does.

Furthermore SVC multi-class mode is implemented using one vs one scheme while LinearSVC uses one vs the rest. It is possible to implement one vs the rest with SVC by using the sklearn.multiclass.OneVsRestClassifier wrapper.

Finally SVC can fit dense data without memory copy if the input is C-contiguous. Sparse data will still incur memory copy though.

sklearn.linear_model.SGDClassifier

SGDClassifier can optimize the same cost function as LinearSVC by adjusting the penalty and loss parameters. In addition it requires less memory, allows incremental (online) learning, and implements various loss functions and regularization regimes.

Notes

The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to have slightly different results for the same input data. If that happens, try with a smaller tol parameter.

The underlying implementation, liblinear, uses a sparse internal representation for the data that will incur a memory copy.

Predict output may not match that of standalone liblinear in certain cases. See differences from liblinear in the narrative documentation.

References

LIBLINEAR: A Library for Large Linear Classification

Examples

>>> from sklearn.svm import LinearSVC
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_features=4, random_state=0)
>>> clf = make_pipeline(StandardScaler(),
...                     LinearSVC(random_state=0, tol=1e-5))
>>> clf.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('linearsvc', LinearSVC(random_state=0, tol=1e-05))])
>>> print(clf.named_steps['linearsvc'].coef_)
[[0.141...   0.526... 0.679... 0.493...]]
>>> print(clf.named_steps['linearsvc'].intercept_)
[0.1693...]
>>> print(clf.predict([[0, 0, 0, 0]]))
[1]

Full API documentation: LinearSVCScikitsLearnNode

class mdp.nodes.LinearSVRScikitsLearnNode

Linear Support Vector Regression. This node has been automatically generated by wrapping the sklearn.svm._classes.LinearSVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Similar to SVR with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.

This class supports both dense and sparse input.

Read more in the User Guide.

New in version 0.16.

Parameters

epsilonfloat, default=0.0

Epsilon parameter in the epsilon-insensitive loss function. Note that the value of this parameter depends on the scale of the target variable y. If unsure, set epsilon=0.

tolfloat, default=1e-4

Tolerance for stopping criteria.

Cfloat, default=1.0

Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive.

loss{‘epsilon_insensitive’, ‘squared_epsilon_insensitive’}, default=’epsilon_insensitive’

Specifies the loss function. The epsilon-insensitive loss (standard SVR) is the L1 loss, while the squared epsilon-insensitive loss (‘squared_epsilon_insensitive’) is the L2 loss.
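A small sketch of the two regression losses for a single prediction. The helper names are hypothetical and only illustrate the definitions; they are not sklearn functions.

```python
def epsilon_insensitive(y_true, y_pred, epsilon=0.0):
    """L1-type loss: errors smaller than epsilon cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def squared_epsilon_insensitive(y_true, y_pred, epsilon=0.0):
    """L2-type loss: square of the epsilon-insensitive loss."""
    return epsilon_insensitive(y_true, y_pred, epsilon) ** 2


inside_tube = epsilon_insensitive(3.0, 2.5, epsilon=1.0)
outside_tube = epsilon_insensitive(3.0, 1.0, epsilon=0.5)
outside_tube_sq = squared_epsilon_insensitive(3.0, 1.0, epsilon=0.5)
```

Because the tube width is measured in the units of y, a sensible epsilon depends on the scale of the target.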

fit_interceptbool, default=True

Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be already centered).

intercept_scalingfloat, default=1.

When self.fit_intercept is True, the instance vector x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with a constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note that the synthetic feature weight is subject to l1/l2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.

dualbool, default=True

Select the algorithm to either solve the dual or primal optimization problem. Prefer dual=False when n_samples > n_features.

verboseint, default=0

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context.

random_stateint or RandomState instance, default=None

Controls the pseudo random number generation for shuffling the data. Pass an int for reproducible output across multiple function calls. See Glossary.

max_iterint, default=1000

The maximum number of iterations to be run.

Attributes

coef_ndarray of shape (n_features) if n_classes == 2 else (n_classes, n_features)

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a readonly property derived from raw_coef_ that follows the internal memory layout of liblinear.

intercept_ndarray of shape (1) if n_classes == 2 else (n_classes)

Constants in decision function.

n_iter_int

Maximum number of iterations run across all classes.

Examples

>>> from sklearn.svm import LinearSVR
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=4, random_state=0)
>>> regr = make_pipeline(StandardScaler(),
...                      LinearSVR(random_state=0, tol=1e-5))
>>> regr.fit(X, y)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('linearsvr', LinearSVR(random_state=0, tol=1e-05))])
>>> print(regr.named_steps['linearsvr'].coef_)
[18.582... 27.023... 44.357... 64.522...]
>>> print(regr.named_steps['linearsvr'].intercept_)
[-4...]
>>> print(regr.predict([[0, 0, 0, 0]]))
[-2.384...]

See also

LinearSVC

Implementation of Support Vector Machine classifier using the same library as this class (liblinear).

SVR

Implementation of Support Vector Machine regression using libsvm: the kernel can be non-linear, but its SMO algorithm does not scale to a large number of samples as LinearSVC does.

sklearn.linear_model.SGDRegressor

SGDRegressor can optimize the same cost function as LinearSVR by adjusting the penalty and loss parameters. In addition it requires less memory, allows incremental (online) learning, and implements various loss functions and regularization regimes.

Full API documentation: LinearSVRScikitsLearnNode

class mdp.nodes.CalibratedClassifierCVScikitsLearnNode

Probability calibration with isotonic regression or logistic regression. This node has been automatically generated by wrapping the sklearn.calibration.CalibratedClassifierCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The calibration is based on the decision_function method of the base_estimator if it exists, else on predict_proba.

Read more in the User Guide.

Parameters

base_estimatorinstance BaseEstimator

The classifier whose output needs to be calibrated to provide more accurate predict_proba outputs.

method‘sigmoid’ or ‘isotonic’

The method to use for calibration. Can be ‘sigmoid’ which corresponds to Platt’s method (i.e. a logistic regression model) or ‘isotonic’ which is a non-parametric approach. It is not advised to use isotonic calibration with too few calibration samples (<<1000) since it tends to overfit.
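For intuition, the ‘sigmoid’ option maps a raw decision score f to a probability with a two-parameter logistic function, p = 1 / (1 + exp(A * f + B)). The sketch below uses made-up values for A and B; in practice they are estimated from the calibration data, and this helper is not the library's implementation.

```python
import math


def platt_sigmoid(score, a, b):
    """Map a raw decision score to a probability with Platt's sigmoid."""
    return 1.0 / (1.0 + math.exp(a * score + b))


a, b = -2.0, 0.0   # hypothetical fitted parameters (a < 0: higher score, higher p)
scores = (-1.0, 0.0, 1.0)
probabilities = [platt_sigmoid(s, a, b) for s in scores]
```

Only two parameters are fitted, which is why this method is less prone to overfitting on small calibration sets than the non-parametric isotonic option.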

cvinteger, cross-validation generator, iterable or “prefit”, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • integer, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, if y is binary or multiclass, sklearn.model_selection.StratifiedKFold is used. If y is neither binary nor multiclass, sklearn.model_selection.KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

If “prefit” is passed, it is assumed that base_estimator has been fitted already and all data is used for calibration.

Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.

Attributes

classes_array, shape (n_classes)

The class labels.

calibrated_classifiers_list (len() equal to cv or 1 if cv == “prefit”)

The list of calibrated classifiers, one for each cross-validation fold, which has been fitted on all but the validation fold and calibrated on the validation fold.

References

[1] Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers, B. Zadrozny & C. Elkan, ICML 2001

[2] Transforming Classifier Scores into Accurate Multiclass Probability Estimates, B. Zadrozny & C. Elkan, (KDD 2002)

[3] Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, J. Platt, (1999)

[4] Predicting Good Probabilities with Supervised Learning, A. Niculescu-Mizil & R. Caruana, ICML 2005

Full API documentation: CalibratedClassifierCVScikitsLearnNode

class mdp.nodes.NMFScikitsLearnNode

Non-Negative Matrix Factorization (NMF). This node has been automatically generated by wrapping the sklearn.decomposition._nmf.NMF class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Find two non-negative matrices (W, H) whose product approximates the non-negative matrix X. This factorization can be used, for example, for dimensionality reduction, source separation or topic extraction.

The objective function is:

0.5 * ||X - WH||_Fro^2
+ alpha * l1_ratio * ||vec(W)||_1
+ alpha * l1_ratio * ||vec(H)||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2
+ 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2

Where:

||A||_Fro^2 = \sum_{i,j} A_{ij}^2 (Frobenius norm)
||vec(A)||_1 = \sum_{i,j} abs(A_{ij}) (Elementwise L1 norm)

For multiplicative-update (‘mu’) solver, the Frobenius norm (0.5 * ||X - WH||_Fro^2) can be changed into another beta-divergence loss, by changing the beta_loss parameter.

The objective function is minimized with an alternating minimization of W and H.
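The objective above can be evaluated term by term with NumPy. This is a sketch of the formula as written in this section (Frobenius loss only), using a hypothetical helper nmf_objective and a small rank-1 example; it is not the solver itself.

```python
import numpy as np


def nmf_objective(X, W, H, alpha=0.0, l1_ratio=0.0):
    """Evaluate the regularized Frobenius objective for given factors."""
    fit = 0.5 * np.sum((X - W @ H) ** 2)
    l1 = alpha * l1_ratio * (np.abs(W).sum() + np.abs(H).sum())
    l2 = 0.5 * alpha * (1.0 - l1_ratio) * (np.sum(W ** 2) + np.sum(H ** 2))
    return fit + l1 + l2


# X is exactly W @ H, so the data-fit term vanishes.
X = np.array([[1.0, 2.0], [2.0, 4.0]])
W = np.array([[1.0], [2.0]])
H = np.array([[1.0, 2.0]])

unregularized = nmf_objective(X, W, H)
regularized = nmf_objective(X, W, H, alpha=1.0, l1_ratio=0.5)
```

With alpha=1.0 and l1_ratio=0.5 the L1 terms contribute 3.0 and the L2 terms 2.5, so only the penalties remain in the second value.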

Read more in the User Guide.

Parameters

n_componentsint or None

Number of components. If n_components is not set, all features are kept.

initNone | ‘random’ | ‘nndsvd’ | ‘nndsvda’ | ‘nndsvdar’ | ‘custom’

Method used to initialize the procedure. Default: None. Valid options:

  • None: ‘nndsvd’ if n_components <= min(n_samples, n_features), otherwise random.

  • ‘random’: non-negative random matrices, scaled with sqrt(X.mean() / n_components)

  • ‘nndsvd’: Nonnegative Double Singular Value Decomposition (NNDSVD) initialization (better for sparseness)

  • ‘nndsvda’: NNDSVD with zeros filled with the average of X (better when sparsity is not desired)

  • ‘nndsvdar’: NNDSVD with zeros filled with small random values (generally faster, less accurate alternative to NNDSVDa for when sparsity is not desired)

  • ‘custom’: use custom matrices W and H

solver‘cd’ | ‘mu’

Numerical solver to use:

  • ‘cd’ is a Coordinate Descent solver.

  • ‘mu’ is a Multiplicative Update solver.

New in version 0.17: Coordinate Descent solver.

New in version 0.19: Multiplicative Update solver.

beta_lossfloat or string, default ‘frobenius’

String must be in {‘frobenius’, ‘kullback-leibler’, ‘itakura-saito’}. Beta divergence to be minimized, measuring the distance between X and the dot product WH. Note that values different from ‘frobenius’ (or 2) and ‘kullback-leibler’ (or 1) lead to significantly slower fits. Note that for beta_loss <= 0 (or ‘itakura-saito’), the input matrix X cannot contain zeros. Used only in ‘mu’ solver.

New in version 0.19.

tolfloat, default: 1e-4

Tolerance of the stopping condition.

max_iterinteger, default: 200

Maximum number of iterations before timing out.

random_stateint, RandomState instance, default=None

Used for initialisation (when init == ‘nndsvdar’ or ‘random’), and in Coordinate Descent. Pass an int for reproducible results across multiple function calls. See Glossary.

alphadouble, default: 0.

Constant that multiplies the regularization terms. Set it to zero to have no regularization.

New in version 0.17: alpha used in the Coordinate Descent solver.

l1_ratiodouble, default: 0.

The regularization mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an elementwise L2 penalty (aka Frobenius Norm). For l1_ratio = 1 it is an elementwise L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.

New in version 0.17: Regularization parameter l1_ratio used in the Coordinate Descent solver.

verbosebool, default=False

Whether to be verbose.

shuffleboolean, default: False

If true, randomize the order of coordinates in the CD solver.

New in version 0.17: shuffle parameter used in the Coordinate Descent solver.

Attributes

components_array, [n_components, n_features]

Factorization matrix, sometimes called ‘dictionary’.

n_components_integer

The number of components. It is the same as the n_components parameter if it was given. Otherwise, it is the same as the number of features.

reconstruction_err_number

Frobenius norm of the matrix difference, or beta-divergence, between the training data X and the reconstructed data WH from the fitted model.

n_iter_int

Actual number of iterations.

Examples

>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import NMF
>>> model = NMF(n_components=2, init='random', random_state=0)
>>> W = model.fit_transform(X)
>>> H = model.components_

References

Cichocki, Andrzej, and P. H. A. N. Anh-Huy. “Fast local algorithms for large scale nonnegative matrix and tensor factorizations.” IEICE transactions on fundamentals of electronics, communications and computer sciences 92.3: 708-721, 2009.

Fevotte, C., & Idier, J. (2011). Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Computation, 23(9).

Full API documentation: NMFScikitsLearnNode

class mdp.nodes.PCAScikitsLearnNode

Principal component analysis (PCA). This node has been automatically generated by wrapping the sklearn.decomposition._pca.PCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space. The input data is centered but not scaled for each feature before applying the SVD.

It uses the LAPACK implementation of the full SVD or a randomized truncated SVD by the method of Halko et al. 2009, depending on the shape of the input data and the number of components to extract.

It can also use the scipy.sparse.linalg ARPACK implementation of the truncated SVD.

Notice that this class does not support sparse input. See TruncatedSVD for an alternative with sparse data.

Read more in the User Guide.

Parameters

n_componentsint, float, None or str

Number of components to keep. If n_components is not set, all components are kept:

n_components == min(n_samples, n_features)

If n_components == 'mle' and svd_solver == 'full', Minka’s MLE is used to guess the dimension. Use of n_components == 'mle' will interpret svd_solver == 'auto' as svd_solver == 'full'.

If 0 < n_components < 1 and svd_solver == 'full', select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
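The fractional case can be pictured as a cumulative sum over the explained variance ratios. The helper below is a sketch of that rule as described here, not necessarily the library's exact code, and the ratios are made up.

```python
import numpy as np


def n_components_for_fraction(explained_variance_ratio, fraction):
    """Smallest number of leading components whose cumulative
    explained variance ratio is greater than `fraction`."""
    cumulative = np.cumsum(explained_variance_ratio)
    return int(np.searchsorted(cumulative, fraction, side='right')) + 1


ratios = np.array([0.6, 0.3, 0.1])   # hypothetical, sorted in decreasing order
k_80 = n_components_for_fraction(ratios, 0.8)
k_50 = n_components_for_fraction(ratios, 0.5)
k_95 = n_components_for_fraction(ratios, 0.95)
```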

If svd_solver == 'arpack', the number of components must be strictly less than the minimum of n_features and n_samples.

Hence, the None case results in:

n_components == min(n_samples, n_features) - 1

copybool, default=True

If False, data passed to fit are overwritten and running fit(X).transform(X) will not yield the expected results; use fit_transform(X) instead.

whitenbool, optional (default False)

When True (False by default) the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
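The effect of whitening can be reproduced with a plain SVD in NumPy. This sketch is an illustration on synthetic data, not the wrapped class; it assumes a scaling by sqrt(n_samples - 1) so that the result matches the unbiased covariance computed by np.cov.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with very different variances per feature.
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

Xc = X - X.mean(axis=0)                      # center, do not scale
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
n_samples = X.shape[0]

projected = Xc @ Vt.T                        # ordinary PCA projection
whitened = projected * np.sqrt(n_samples - 1) / S

# Whitened outputs are uncorrelated with unit variance.
cov = np.cov(whitened, rowvar=False)
```

The relative variance scales of the components are lost in whitened, which is the information removal mentioned above.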

svd_solverstr {‘auto’, ‘full’, ‘arpack’, ‘randomized’}

If auto :

The solver is selected by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient ‘randomized’ method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.

If full :

Run exact full SVD calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing.

If arpack :

Run SVD truncated to n_components calling ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < min(X.shape).

If randomized :

Run randomized SVD by the method of Halko et al.

New in version 0.18.0.
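The ‘auto’ policy can be paraphrased as a small decision function. This is a sketch of the rule as worded in this section; the helper choose_svd_solver is hypothetical, and reading “larger than 500x500” as “the largest dimension exceeds 500” is an assumption.

```python
def choose_svd_solver(shape, n_components):
    """Paraphrase of the 'auto' policy for an integer n_components."""
    n_samples, n_features = shape
    smallest = min(n_samples, n_features)
    # Assumption: "larger than 500x500" means the largest dimension is > 500.
    is_large = max(n_samples, n_features) > 500
    if is_large and 1 <= n_components < 0.8 * smallest:
        return 'randomized'
    return 'full'


large_few = choose_svd_solver((1000, 600), 10)    # large data, few components
small_data = choose_svd_solver((100, 50), 10)     # small data
large_many = choose_svd_solver((1000, 600), 550)  # most components requested
```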

tolfloat >= 0, optional (default .0)

Tolerance for singular values computed by svd_solver == ‘arpack’.

New in version 0.18.0.

iterated_powerint >= 0, or ‘auto’, (default ‘auto’)

Number of iterations for the power method computed by svd_solver == ‘randomized’.

New in version 0.18.0.

random_stateint, RandomState instance, default=None

Used when svd_solver == ‘arpack’ or ‘randomized’. Pass an int for reproducible results across multiple function calls. See Glossary.

New in version 0.18.0.

Attributes

components_array, shape (n_components, n_features)

Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by explained_variance_.

explained_variance_array, shape (n_components,)

The amount of variance explained by each of the selected components.

Equal to n_components largest eigenvalues of the covariance matrix of X.

New in version 0.18.

explained_variance_ratio_array, shape (n_components,)

Percentage of variance explained by each of the selected components.

If n_components is not set then all components are stored and the sum of the ratios is equal to 1.0.

singular_values_array, shape (n_components,)

The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.

New in version 0.19.
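The statements about explained_variance_ and singular_values_ can be verified with NumPy alone. The sketch below uses the toy data of the Examples section and a plain SVD instead of the wrapped class.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]], dtype=float)
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Singular values equal the 2-norms of the projected variables.
projected = Xc @ Vt.T
column_norms = np.linalg.norm(projected, axis=0)

# Explained variance equals the eigenvalues of the covariance matrix.
explained_variance = S ** 2 / (X.shape[0] - 1)
eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
```

S here agrees with the singular values printed in the Examples section (about 6.30061 and 0.54980).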

mean_array, shape (n_features,)

Per-feature empirical mean, estimated from the training set.

Equal to X.mean(axis=0).

n_components_int

The estimated number of components. When n_components is set to ‘mle’ or a number between 0 and 1 (with svd_solver == ‘full’) this number is estimated from input data. Otherwise it equals the parameter n_components, or the lesser value of n_features and n_samples if n_components is None.

n_features_int

Number of features in the training data.

n_samples_int

Number of samples in the training data.

noise_variance_float

The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See “Pattern Recognition and Machine Learning” by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf. It is required to compute the estimated data covariance and score samples.

Equal to the average of (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.
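The averaging rule for the noise variance is easy to state in code. The helper below is a hypothetical sketch with made-up eigenvalues; returning 0.0 when no eigenvalues are discarded is an assumption of this example.

```python
import numpy as np


def noise_variance(eigenvalues, n_components):
    """Average of the eigenvalues not retained by the model."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    discarded = eigenvalues[n_components:]
    # Assumption: no discarded eigenvalues means no estimated noise.
    return float(discarded.mean()) if discarded.size else 0.0


eigs = [4.0, 2.0, 0.5, 0.3]          # hypothetical covariance eigenvalues
sigma2 = noise_variance(eigs, n_components=2)   # mean of 0.5 and 0.3
sigma2_full = noise_variance(eigs, n_components=4)
```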

See Also

KernelPCA

Kernel Principal Component Analysis.

SparsePCA

Sparse Principal Component Analysis.

TruncatedSVD

Dimensionality reduction using truncated SVD.

IncrementalPCA

Incremental Principal Component Analysis.

References

For n_components == ‘mle’, this class uses the method of Minka, T. P. “Automatic choice of dimensionality for PCA”. In NIPS, pp. 598-604.

Implements the probabilistic PCA model, via the score and score_samples methods, from:

Tipping, M. E., and Bishop, C. M. (1999). “Probabilistic principal component analysis”. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3), 611-622. See http://www.miketipping.com/papers/met-mppca.pdf

For svd_solver == ‘arpack’, refer to scipy.sparse.linalg.svds.

For svd_solver == ‘randomized’, see:

Halko, N., Martinsson, P. G., and Tropp, J. A. (2011). “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions”. SIAM review, 53(2), 217-288. and also Martinsson, P. G., Rokhlin, V., and Tygert, M. (2011). “A randomized algorithm for the decomposition of matrices”. Applied and Computational Harmonic Analysis, 30(1), 47-68.

Examples

>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(n_components=2)
>>> print(pca.explained_variance_ratio_)
[0.9924... 0.0075...]
>>> print(pca.singular_values_)
[6.30061... 0.54980...]
>>> pca = PCA(n_components=2, svd_solver='full')
>>> pca.fit(X)
PCA(n_components=2, svd_solver='full')
>>> print(pca.explained_variance_ratio_)
[0.9924... 0.00755...]
>>> print(pca.singular_values_)
[6.30061... 0.54980...]
>>> pca = PCA(n_components=1, svd_solver='arpack')
>>> pca.fit(X)
PCA(n_components=1, svd_solver='arpack')
>>> print(pca.explained_variance_ratio_)
[0.99244...]
>>> print(pca.singular_values_)
[6.30061...]

Full API documentation: PCAScikitsLearnNode

class mdp.nodes.IncrementalPCAScikitsLearnNode

Incremental principal components analysis (IPCA). This node has been automatically generated by wrapping the sklearn.decomposition._incremental_pca.IncrementalPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Linear dimensionality reduction using Singular Value Decomposition of the data, keeping only the most significant singular vectors to project the data to a lower dimensional space. The input data is centered but not scaled for each feature before applying the SVD.

Depending on the size of the input data, this algorithm can be much more memory efficient than a PCA, and allows sparse input.

This algorithm has constant memory complexity, on the order of batch_size * n_features, enabling use of np.memmap files without loading the entire file into memory. For sparse matrices, the input is converted to dense in batches (in order to be able to subtract the mean) which avoids storing the entire dense matrix at any one time.

The computational overhead of each SVD is O(batch_size * n_features ** 2), but only 2 * batch_size samples remain in memory at a time. There will be n_samples / batch_size SVD computations to get the principal components, versus 1 large SVD of complexity O(n_samples * n_features ** 2) for PCA.

Read more in the User Guide.

New in version 0.16.

Parameters

n_componentsint or None, (default=None)

Number of components to keep. If n_components is None, then n_components is set to min(n_samples, n_features).

whitenbool, optional

When True (False by default) the components_ vectors are divided by n_samples times components_ to ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making data respect some hard-wired assumptions.

copybool, (default=True)

If False, X will be overwritten. copy=False can be used to save memory but is unsafe for general use.

batch_sizeint or None, (default=None)

The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features, to provide a balance between approximation accuracy and memory consumption.
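The batching can be pictured as follows. This is a simplified sketch with a hypothetical helper; the library may handle a short final batch differently (for example by merging it into the previous one).

```python
def batch_slices(n_samples, n_features, batch_size=None):
    """Split range(n_samples) into consecutive slices of batch_size rows."""
    if batch_size is None:
        batch_size = 5 * n_features   # default described above
    return [slice(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]


# Shape of the digits data used in the Examples section: 1797 x 64.
slices = batch_slices(n_samples=1797, n_features=64)
n_batches = len(slices)
```

Each slice would be passed to one partial_fit call, so only one batch of rows needs to be held in memory at a time.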

Attributes

components_array, shape (n_components, n_features)

Components with maximum variance.

explained_variance_array, shape (n_components,)

Variance explained by each of the selected components.

explained_variance_ratio_array, shape (n_components,)

Percentage of variance explained by each of the selected components. If all components are stored, the sum of explained variances is equal to 1.0.

singular_values_array, shape (n_components,)

The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.

mean_array, shape (n_features,)

Per-feature empirical mean, aggregate over calls to partial_fit.

var_array, shape (n_features,)

Per-feature empirical variance, aggregate over calls to partial_fit.

noise_variance_float

The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See “Pattern Recognition and Machine Learning” by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf.

n_components_int

The estimated number of components. Relevant when n_components=None.

n_samples_seen_int

The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.

batch_size_int

Inferred batch size from batch_size.

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import IncrementalPCA
>>> from scipy import sparse
>>> X, _ = load_digits(return_X_y=True)
>>> transformer = IncrementalPCA(n_components=7, batch_size=200)
>>> # either partially fit on smaller batches of data
>>> transformer.partial_fit(X[:100, :])
IncrementalPCA(batch_size=200, n_components=7)
>>> # or let the fit function itself divide the data into batches
>>> X_sparse = sparse.csr_matrix(X)
>>> X_transformed = transformer.fit_transform(X_sparse)
>>> X_transformed.shape
(1797, 7)

Notes

Implements the incremental PCA model from:

D. Ross, J. Lim, R. Lin, M. Yang, Incremental Learning for Robust Visual Tracking, International Journal of Computer Vision, Volume 77, Issue 1-3, pp. 125-141, May 2008. See https://www.cs.toronto.edu/~dross/ivt/RossLimLinYang_ijcv.pdf

This model is an extension of the Sequential Karhunen-Loeve Transform from:

A. Levy and M. Lindenbaum, Sequential Karhunen-Loeve Basis Extraction and its Application to Images, IEEE Transactions on Image Processing, Volume 9, Number 8, pp. 1371-1374, August 2000. See https://www.cs.technion.ac.il/~mic/doc/skl-ip.pdf

We have specifically abstained from an optimization used by the authors of both papers, a QR decomposition used in specific situations to reduce the algorithmic complexity of the SVD. The source for this technique is Matrix Computations, Third Edition, G. Golub and C. Van Loan, Chapter 5, Section 5.4.4, pp. 252-253. This technique has been omitted because it is advantageous only when decomposing a matrix with n_samples (rows) >= 5/3 * n_features (columns), and it hurts the readability of the implemented algorithm. This would be a good opportunity for future optimization, if it is deemed necessary.

References

D. Ross, J. Lim, R. Lin, M. Yang. Incremental Learning for Robust Visual Tracking, International Journal of Computer Vision, Volume 77, Issue 1-3, pp. 125-141, May 2008.

G. Golub and C. Van Loan. Matrix Computations, Third Edition, Chapter 5, Section 5.4.4, pp. 252-253.

See also

PCA, KernelPCA, SparsePCA, TruncatedSVD

Full API documentation: IncrementalPCAScikitsLearnNode

class mdp.nodes.KernelPCAScikitsLearnNode

Kernel Principal component analysis (KPCA). This node has been automatically generated by wrapping the sklearn.decomposition._kernel_pca.KernelPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Non-linear dimensionality reduction through the use of kernels (see metrics).

Read more in the User Guide.

Parameters

n_componentsint, default=None

Number of components. If None, all non-zero components are kept.

kernel“linear” | “poly” | “rbf” | “sigmoid” | “cosine” | “precomputed”

Kernel. Default=”linear”.

gammafloat, default=1/n_features

Kernel coefficient for rbf, poly and sigmoid kernels. Ignored by other kernels.

degreeint, default=3

Degree for poly kernels. Ignored by other kernels.

coef0float, default=1

Independent term in poly and sigmoid kernels. Ignored by other kernels.

kernel_paramsmapping of string to any, default=None

Parameters (keyword arguments) and values for kernel passed as callable object. Ignored by other kernels.

alphafloat, default=1.0

Hyperparameter of the ridge regression that learns the inverse transform (when fit_inverse_transform=True).

fit_inverse_transformbool, default=False

Learn the inverse transform for non-precomputed kernels. (i.e. learn to find the pre-image of a point)

eigen_solverstring [‘auto’|’dense’|’arpack’], default=’auto’

Select eigensolver to use. If n_components is much less than the number of training samples, arpack may be more efficient than the dense eigensolver.

tolfloat, default=0

Convergence tolerance for arpack. If 0, optimal value will be chosen by arpack.

max_iterint, default=None

Maximum number of iterations for arpack. If None, optimal value will be chosen by arpack.

remove_zero_eigboolean, default=False

If True, then all components with zero eigenvalues are removed, so that the number of components in the output may be < n_components (and sometimes even zero due to numerical instability). When n_components is None, this parameter is ignored and components with zero eigenvalues are removed regardless.

random_stateint, RandomState instance, default=None

Used when eigen_solver == ‘arpack’. Pass an int for reproducible results across multiple function calls. See Glossary.

New in version 0.18.

copy_Xboolean, default=True

If True, input X is copied and stored by the model in the X_fit_ attribute. If no further changes will be done to X, setting copy_X=False saves memory by storing a reference.

New in version 0.18.

n_jobsint or None, optional (default=None)

The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

New in version 0.18.

Attributes

lambdas_array, (n_components,)

Eigenvalues of the centered kernel matrix in decreasing order. If n_components and remove_zero_eig are not set, then all values are stored.

alphas_array, (n_samples, n_components)

Eigenvectors of the centered kernel matrix. If n_components and remove_zero_eig are not set, then all components are stored.

dual_coef_array, (n_samples, n_features)

Inverse transform matrix. Only available when fit_inverse_transform is True.

X_transformed_fit_array, (n_samples, n_components)

Projection of the fitted data on the kernel principal components. Only available when fit_inverse_transform is True.

X_fit_(n_samples, n_features)

The data used to fit the model. If copy_X=False, then X_fit_ is a reference. This attribute is used for the calls to transform.
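
To make the attribute descriptions above concrete, the following NumPy sketch builds a linear kernel matrix, centers it, and extracts its leading eigenpairs. It is an illustration under simplifying assumptions (linear kernel, dense eigensolver), not the wrapped class's code; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
n_samples = X.shape[0]

# Linear kernel matrix, shape (n_samples, n_samples).
K = X @ X.T

# Center the kernel matrix: K_c = H K H with H = I - (1/n) * ones.
H = np.eye(n_samples) - np.full((n_samples, n_samples), 1.0 / n_samples)
K_centered = H @ K @ H

# eigh returns eigenvalues in increasing order; reorder to decreasing.
eigvals, eigvecs = np.linalg.eigh(K_centered)
order = np.argsort(eigvals)[::-1]
n_components = 2
top_eigvals = eigvals[order][:n_components]        # plays the role of lambdas_
top_eigvecs = eigvecs[:, order][:, :n_components]  # plays the role of alphas_

# Projection of the training data onto the kernel principal components.
X_projected = top_eigvecs * np.sqrt(top_eigvals)

# For a linear kernel, the eigenvalues equal the squared singular values
# of the centered data matrix.
singular_values = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
```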

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import KernelPCA
>>> X, _ = load_digits(return_X_y=True)
>>> transformer = KernelPCA(n_components=7, kernel='linear')
>>> X_transformed = transformer.fit_transform(X)
>>> X_transformed.shape
(1797, 7)
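
As a complement, here is a small sketch of a non-linear kernel together with fit_inverse_transform, which the parameter list describes but the example does not exercise. The data set and the values of gamma and alpha are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the input space.
X, _ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(
    n_components=2,
    kernel="rbf",
    gamma=10.0,                  # arbitrary illustrative value
    fit_inverse_transform=True,  # also learn the approximate pre-image map
    alpha=0.1,                   # ridge penalty of the inverse transform
)
X_kpca = kpca.fit_transform(X)

# Approximate pre-images of the projected points in the input space.
X_back = kpca.inverse_transform(X_kpca)
```

The reconstruction X_back is only approximate, because the pre-image is learned by ridge regression rather than computed exactly.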

References

Kernel PCA was introduced in:

Bernhard Schoelkopf, Alexander J. Smola, and Klaus-Robert Mueller. 1999. Kernel principal component analysis. In Advances in kernel methods, MIT Press, Cambridge, MA, USA 327-352.

Full API documentation: KernelPCAScikitsLearnNode

class mdp.nodes.SparsePCAScikitsLearnNode

Sparse Principal Components Analysis (SparsePCA). This node has been automatically generated by wrapping the sklearn.decomposition._sparse_pca.SparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.

Read more in the User Guide.

Parameters

n_componentsint,

Number of sparse atoms to extract.

alphafloat,

Sparsity controlling parameter. Higher values lead to sparser components.

ridge_alphafloat,

Amount of ridge shrinkage to apply in order to improve conditioning when calling the transform method.

max_iterint,

Maximum number of iterations to perform.

tolfloat,

Tolerance for the stopping condition.

method{‘lars’, ‘cd’}

  • lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path)

  • cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.

n_jobsint or None, optional (default=None)

Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

U_initarray of shape (n_samples, n_components),

Initial values for the loadings for warm restart scenarios.

V_initarray of shape (n_components, n_features),

Initial values for the components for warm restart scenarios.

verboseint

Controls the verbosity; the higher, the more messages. Defaults to 0.

random_stateint, RandomState instance, default=None

Used during dictionary learning. Pass an int for reproducible results across multiple function calls. See Glossary.

normalize_components‘deprecated’

This parameter does not have any effect. The components are always normalized.

New in version 0.20.

Deprecated since version 0.22: normalize_components is deprecated in 0.22 and will be removed in 0.24.

Attributes

components_array, [n_components, n_features]

Sparse components extracted from the data.

error_array

Vector of errors at each iteration.

n_components_int

Estimated number of components.

New in version 0.23.

n_iter_int

Number of iterations run.

mean_array, shape (n_features,)

Per-feature empirical mean, estimated from the training set. Equal to X.mean(axis=0).

Examples

>>> import numpy as np
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.decomposition import SparsePCA
>>> X, _ = make_friedman1(n_samples=200, n_features=30, random_state=0)
>>> transformer = SparsePCA(n_components=5, random_state=0)
>>> transformer.fit(X)
SparsePCA(...)
>>> X_transformed = transformer.transform(X)
>>> X_transformed.shape
(200, 5)
>>> # most values in the ``components_`` are zero (sparsity)
>>> np.mean(transformer.components_ == 0)
0.9666...
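
The effect of alpha on sparsity can be checked empirically. The sketch below refits the same model with two penalty strengths and records the fraction of exactly-zero entries in components_. The alpha values are arbitrary and the exact fractions depend on the library version, so none are quoted here.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.decomposition import SparsePCA

X, _ = make_friedman1(n_samples=200, n_features=30, random_state=0)

# Fraction of exactly-zero entries in components_ for each penalty strength.
zero_fraction = {}
for alpha in (0.01, 1.0):
    model = SparsePCA(n_components=5, alpha=alpha, random_state=0)
    model.fit(X)
    zero_fraction[alpha] = float(np.mean(model.components_ == 0))
```

A larger alpha is expected to produce a larger zero fraction, in line with the parameter description.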

See also

PCA MiniBatchSparsePCA DictionaryLearning

Full API documentation: SparsePCAScikitsLearnNode

class mdp.nodes.MiniBatchSparsePCAScikitsLearnNode

Mini-batch Sparse Principal Components Analysis. This node has been automatically generated by wrapping the sklearn.decomposition._sparse_pca.MiniBatchSparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.

Read more in the User Guide.

Parameters

n_componentsint,

number of sparse atoms to extract

alphafloat,

Sparsity controlling parameter. Higher values lead to sparser components.

ridge_alphafloat,

Amount of ridge shrinkage to apply in order to improve conditioning when calling the transform method.

n_iterint,

number of iterations to perform for each mini batch

callbackcallable or None, optional (default: None)

callable that gets invoked every five iterations

batch_sizeint,

the number of features to take in each mini batch

verboseint

Controls the verbosity; the higher, the more messages. Defaults to 0.

shuffleboolean,

whether to shuffle the data before splitting it in batches

n_jobsint or None, optional (default=None)

Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

method{‘lars’, ‘cd’}

  • lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path)

  • cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.

random_stateint, RandomState instance, default=None

Used for random shuffling when shuffle is set to True, during online dictionary learning. Pass an int for reproducible results across multiple function calls. See Glossary.

normalize_components‘deprecated’

This parameter does not have any effect. The components are always normalized.

New in version 0.20.

Deprecated since version 0.22: normalize_components is deprecated in 0.22 and will be removed in 0.24.

Attributes

components_array, [n_components, n_features]

Sparse components extracted from the data.

n_components_int

Estimated number of components.

New in version 0.23.

n_iter_int

Number of iterations run.

mean_array, shape (n_features,)

Per-feature empirical mean, estimated from the training set. Equal to X.mean(axis=0).

Examples

>>> import numpy as np
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.decomposition import MiniBatchSparsePCA
>>> X, _ = make_friedman1(n_samples=200, n_features=30, random_state=0)
>>> transformer = MiniBatchSparsePCA(n_components=5, batch_size=50,
...                                  random_state=0)
>>> transformer.fit(X)
MiniBatchSparsePCA(...)
>>> X_transformed = transformer.transform(X)
>>> X_transformed.shape
(200, 5)
>>> # most values in the ``components_`` are zero (sparsity)
>>> np.mean(transformer.components_ == 0)
0.94

See also

PCA SparsePCA DictionaryLearning

Full API documentation: MiniBatchSparsePCAScikitsLearnNode

class mdp.nodes.TruncatedSVDScikitsLearnNode

Dimensionality reduction using truncated SVD (aka LSA). This node has been automatically generated by wrapping the sklearn.decomposition._truncated_svd.TruncatedSVD class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This transformer performs linear dimensionality reduction by means of truncated singular value decomposition (SVD). Contrary to PCA, this estimator does not center the data before computing the singular value decomposition. This means it can work with scipy.sparse matrices efficiently.

In particular, truncated SVD works on term count/tf-idf matrices as returned by the vectorizers in sklearn.feature_extraction.text. In that context, it is known as latent semantic analysis (LSA).

This estimator supports two algorithms: a fast randomized SVD solver, and a “naive” algorithm that uses ARPACK as an eigensolver on (X * X.T) or (X.T * X), whichever is more efficient.

Read more in the User Guide.

Parameters

n_componentsint, default = 2

Desired dimensionality of output data. Must be strictly less than the number of features. The default value is useful for visualisation. For LSA, a value of 100 is recommended.

algorithmstring, default = “randomized”

SVD solver to use. Either “arpack” for the ARPACK wrapper in SciPy (scipy.sparse.linalg.svds), or “randomized” for the randomized algorithm due to Halko (2009).

n_iterint, optional (default 5)

Number of iterations for randomized SVD solver. Not used by ARPACK. The default is larger than the default in ~sklearn.utils.extmath.randomized_svd to handle sparse matrices that may have large slowly decaying spectrum.

random_stateint, RandomState instance, default=None

Used during randomized svd. Pass an int for reproducible results across multiple function calls. See Glossary.

tolfloat, optional

Tolerance for ARPACK. 0 means machine precision. Ignored by randomized SVD solver.

Attributes

components_ : array, shape (n_components, n_features)

explained_variance_array, shape (n_components,)

The variance of the training samples transformed by a projection to each component.

explained_variance_ratio_array, shape (n_components,)

Percentage of variance explained by each of the selected components.

singular_values_array, shape (n_components,)

The singular values corresponding to each of the selected components. The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
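
The stated relationship between singular_values_ and the reduced data can be verified with plain NumPy. This sketch uses an exact dense SVD rather than the randomized or ARPACK solvers, so it illustrates the identity, not the estimator's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))

# Exact SVD of the uncentered data, truncated to the leading components.
n_components = 5
U, s, Vt = np.linalg.svd(X, full_matrices=False)
components = Vt[:n_components]      # shape (n_components, n_features)
singular_values = s[:n_components]

# Project the data onto the components, as transform would.
X_reduced = X @ components.T        # shape (n_samples, n_components)

# The 2-norm of each column of the reduced data equals a singular value.
column_norms = np.linalg.norm(X_reduced, axis=0)
```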

Examples

>>> from sklearn.decomposition import TruncatedSVD
>>> from scipy.sparse import random as sparse_random
>>> X = sparse_random(100, 100, density=0.01, format='csr',
...                   random_state=42)
>>> svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)
>>> svd.fit(X)
TruncatedSVD(n_components=5, n_iter=7, random_state=42)
>>> print(svd.explained_variance_ratio_)
[0.0646... 0.0633... 0.0639... 0.0535... 0.0406...]
>>> print(svd.explained_variance_ratio_.sum())
0.286...
>>> print(svd.singular_values_)
[1.553... 1.512...  1.510... 1.370... 1.199...]

See also

PCA

References

Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions. Halko, et al., 2009 (arXiv:0909.4061). https://arxiv.org/pdf/0909.4061.pdf

Notes

SVD suffers from a problem called “sign indeterminacy”, which means the sign of the components_ and the output from transform depend on the algorithm and random state. To work around this, fit instances of this class to data once, then keep the instance around to do transformations.
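
If components from separate fits must be compared, one possible convention (an assumption of this sketch, not something the wrapped class does) is to flip each component so that its largest-magnitude entry is positive. The helper name fix_component_signs is hypothetical.

```python
import numpy as np


def fix_component_signs(components):
    """Flip each row so that its largest-magnitude entry is positive.

    Hypothetical helper. Returns the flipped components and the signs used,
    so the same flips can be applied to the columns of transformed data.
    """
    rows = np.arange(components.shape[0])
    pivot = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[rows, pivot])
    signs[signs == 0] = 1.0  # leave all-zero rows untouched
    return components * signs[:, None], signs


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
_, _, Vt = np.linalg.svd(X, full_matrices=False)

# The same components with opposite signs map to the same canonical form.
fixed_a, _ = fix_component_signs(Vt[:3])
fixed_b, _ = fix_component_signs(-Vt[:3])
```

When applying this to a fitted model, the columns of the transformed data need the same sign flips to stay consistent with the components.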

Full API documentation: TruncatedSVDScikitsLearnNode

class mdp.nodes.FastICAScikitsLearnNode

FastICA: a fast algorithm for Independent Component Analysis. This node has been automatically generated by wrapping the sklearn.decomposition._fastica.FastICA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

n_componentsint, optional

Number of components to use. If none is passed, all are used.

algorithm{‘parallel’, ‘deflation’}

Apply parallel or deflational algorithm for FastICA.

whitenboolean, optional

If whiten is false, the data is already considered to be whitened, and no whitening is performed.

funstring or function, optional. Default: ‘logcosh’

The functional form of the G function used in the approximation to neg-entropy. Could be either ‘logcosh’, ‘exp’, or ‘cube’. You can also provide your own function. It should return a tuple containing the value of the function, and of its derivative, in the point. Example:

def my_g(x):
    return x ** 3, (3 * x ** 2).mean(axis=-1)

fun_argsdictionary, optional

Arguments to send to the functional form. If empty and if fun=’logcosh’, fun_args will take value {‘alpha’ : 1.0}.

max_iterint, optional

Maximum number of iterations during fit.

tolfloat, optional

Tolerance on update at each iteration.

w_initNone or an (n_components, n_components) ndarray

The mixing matrix to be used to initialize the algorithm.

random_stateint, RandomState instance, default=None

Used to initialize w_init when not specified, with a normal distribution. Pass an int, for reproducible results across multiple function calls. See Glossary.

Attributes

components_2D array, shape (n_components, n_features)

The linear operator to apply to the data to get the independent sources. This is equal to the unmixing matrix when whiten is False, and equal to np.dot(unmixing_matrix, self.whitening_) when whiten is True.

mixing_array, shape (n_features, n_components)

The pseudo-inverse of components_. It is the linear operator that maps independent sources to the data.

mean_array, shape(n_features)

The mean over features. Only set if self.whiten is True.

n_iter_int

If the algorithm is “deflation”, n_iter_ is the maximum number of iterations run across all components. Otherwise, it is the number of iterations taken to converge.

whitening_array, shape (n_components, n_features)

Only set if whiten is ‘True’. This is the pre-whitening matrix that projects data onto the first n_components principal components.

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import FastICA
>>> X, _ = load_digits(return_X_y=True)
>>> transformer = FastICA(n_components=7,
...         random_state=0)
>>> X_transformed = transformer.fit_transform(X)
>>> X_transformed.shape
(1797, 7)
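
The relationship between components_ and mixing_ described in the attribute list can be checked on a toy blind-source-separation problem. The signals and the mixing matrix below are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two non-Gaussian sources (sine and square wave) with a little noise.
sources = np.column_stack([np.sin(2 * t), np.sign(np.sin(3 * t))])
sources += 0.05 * rng.standard_normal(sources.shape)

# Observations are linear mixtures of the sources.
mixing_true = np.array([[1.0, 0.5], [0.5, 2.0]])
X = sources @ mixing_true.T

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)

# mixing_ is documented as the pseudo-inverse of components_.
mixing_from_components = np.linalg.pinv(ica.components_)

# inverse_transform maps the estimated sources back to the observations.
X_back = ica.inverse_transform(S_est)
```

The estimated sources are recovered only up to permutation, sign and scale, which is inherent to ICA.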

Notes

Implementation based on A. Hyvarinen and E. Oja, Independent Component Analysis: Algorithms and Applications, Neural Networks, 13(4-5), 2000, pp. 411-430.

Full API documentation: FastICAScikitsLearnNode

class mdp.nodes.DictionaryLearningScikitsLearnNode

Dictionary learning. This node has been automatically generated by wrapping the sklearn.decomposition._dict_learning.DictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.

Solves the optimization problem:

(U^*,V^*) = argmin 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
            (U,V)
            with || V_k ||_2 = 1 for all  0 <= k < n_components
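
The objective above can be evaluated directly for a given code U and dictionary V. The sketch below does so in NumPy, reading the matrix norm in the first term as the squared Frobenius norm (an assumption of this sketch); the function name is hypothetical.

```python
import numpy as np


def dictionary_learning_objective(Y, U, V, alpha):
    """0.5 * ||Y - U V||^2 + alpha * ||U||_1, squared Frobenius norm assumed.

    Y: data, shape (n_samples, n_features)
    U: sparse code, shape (n_samples, n_components)
    V: dictionary, shape (n_components, n_features), rows of unit norm
    """
    residual = Y - U @ V
    reconstruction = 0.5 * np.sum(residual ** 2)
    sparsity_penalty = alpha * np.sum(np.abs(U))
    return reconstruction + sparsity_penalty


rng = np.random.default_rng(0)
V = rng.standard_normal((3, 8))
V /= np.linalg.norm(V, axis=1, keepdims=True)  # enforce ||V_k||_2 = 1

# One active atom with unit coefficient per sample.
U = np.zeros((5, 3))
U[np.arange(5), rng.integers(0, 3, size=5)] = 1.0
Y = U @ V

value_at_truth = dictionary_learning_objective(Y, U, V, alpha=1.0)
value_at_zero = dictionary_learning_objective(Y, np.zeros_like(U), V, alpha=1.0)
```

Comparing the two values shows how alpha trades reconstruction error against the L1 norm of the code.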

Read more in the User Guide.

Parameters

n_componentsint, default=n_features

number of dictionary elements to extract

alphafloat, default=1.0

sparsity controlling parameter

max_iterint, default=1000

maximum number of iterations to perform

tolfloat, default=1e-8

tolerance for numerical error

fit_algorithm{‘lars’, ‘cd’}, default=’lars’

  • lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path)

  • cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.

New in version 0.17: cd coordinate descent method to improve speed.

transform_algorithm{‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}, default=’omp’

Algorithm used to transform the data:

  • lars: uses the least angle regression method (linear_model.lars_path)

  • lasso_lars: uses Lars to compute the Lasso solution

  • lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.

  • omp: uses orthogonal matching pursuit to estimate the sparse solution

  • threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'

New in version 0.17: lasso_cd coordinate descent method to improve speed.

transform_n_nonzero_coefsint, default=0.1*n_features

Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.

transform_alphafloat, default=1.0

If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.

n_jobsint or None, default=None

Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

code_initarray of shape (n_samples, n_components), default=None

initial value for the code, for warm restart

dict_initarray of shape (n_components, n_features), default=None

initial values for the dictionary, for warm restart

verbosebool, default=False

To control the verbosity of the procedure.

split_signbool, default=False

Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.

random_stateint, RandomState instance or None, optional (default=None)

Used for initializing the dictionary when dict_init is not specified, randomly shuffling the data when shuffle is set to True, and updating the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.

positive_codebool, default=False

Whether to enforce positivity when finding the code.

New in version 0.20.

positive_dictbool, default=False

Whether to enforce positivity when finding the dictionary

New in version 0.20.

transform_max_iterint, default=1000

Maximum number of iterations to perform if algorithm=’lasso_cd’ or lasso_lars.

New in version 0.22.

Attributes

components_array, [n_components, n_features]

dictionary atoms extracted from the data

error_array

vector of errors at each iteration

n_iter_int

Number of iterations run.

Notes

References:

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (https://www.di.ens.fr/sierra/pdfs/icml09.pdf)

See also

SparseCoder MiniBatchDictionaryLearning SparsePCA MiniBatchSparsePCA

Full API documentation: DictionaryLearningScikitsLearnNode

class mdp.nodes.MiniBatchDictionaryLearningScikitsLearnNode

Mini-batch dictionary learning. This node has been automatically generated by wrapping the sklearn.decomposition._dict_learning.MiniBatchDictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.

Solves the optimization problem:

(U^*,V^*) = argmin 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
             (U,V)
             with || V_k ||_2 = 1 for all  0 <= k < n_components

Read more in the User Guide.

Parameters

n_componentsint,

number of dictionary elements to extract

alphafloat,

sparsity controlling parameter

n_iterint,

total number of iterations to perform

fit_algorithm{‘lars’, ‘cd’}

  • lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path)

  • cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.

n_jobsint or None, optional (default=None)

Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

batch_sizeint,

number of samples in each mini-batch

shufflebool,

whether to shuffle the samples before forming batches

dict_initarray of shape (n_components, n_features),

initial value of the dictionary for warm restart scenarios

transform_algorithm{‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}

Algorithm used to transform the data:

  • lars: uses the least angle regression method (linear_model.lars_path)

  • lasso_lars: uses Lars to compute the Lasso solution

  • lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.

  • omp: uses orthogonal matching pursuit to estimate the sparse solution

  • threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'

transform_n_nonzero_coefsint, 0.1 * n_features by default

Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.

transform_alphafloat, 1. by default

If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.

verbosebool, optional (default: False)

To control the verbosity of the procedure.

split_signbool, False by default

Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.

random_stateint, RandomState instance or None, optional (default=None)

Used for initializing the dictionary when dict_init is not specified, randomly shuffling the data when shuffle is set to True, and updating the dictionary. Pass an int for reproducible results across multiple function calls. See Glossary.

positive_codebool

Whether to enforce positivity when finding the code.

New in version 0.20.

positive_dictbool

Whether to enforce positivity when finding the dictionary.

New in version 0.20.

transform_max_iterint, optional (default=1000)

Maximum number of iterations to perform if algorithm=’lasso_cd’ or lasso_lars.

New in version 0.22.

Attributes

components_array, [n_components, n_features]

components extracted from the data

inner_stats_tuple of (A, B) ndarrays

Internal sufficient statistics that are kept by the algorithm. Keeping them is useful in online settings, to avoid losing the history of the evolution, but they shouldn’t have any use for the end user. A (n_components, n_components) is the dictionary covariance matrix. B (n_features, n_components) is the data approximation matrix

n_iter_int

Number of iterations run.

iter_offset_int

The number of iterations on data batches that have been performed before.

random_state_RandomState

RandomState instance that is generated either from a seed, the random number generator, or by np.random.

Notes

References:

J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (https://www.di.ens.fr/sierra/pdfs/icml09.pdf)

See also

SparseCoder DictionaryLearning SparsePCA MiniBatchSparsePCA

Full API documentation: MiniBatchDictionaryLearningScikitsLearnNode

class mdp.nodes.SparseCoderScikitsLearnNode

Sparse coding. This node has been automatically generated by wrapping the sklearn.decomposition._dict_learning.SparseCoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Finds a sparse representation of data against a fixed, precomputed dictionary.

Each row of the result is the solution to a sparse coding problem. The goal is to find a sparse array code such that:

X ~= code * dictionary

Read more in the User Guide.

Parameters

dictionaryarray, [n_components, n_features]

The dictionary atoms used for sparse coding. Rows (atoms) are assumed to be normalized to unit norm.

transform_algorithm{‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}, default=’omp’

Algorithm used to transform the data:

  • lars: uses the least angle regression method (linear_model.lars_path)

  • lasso_lars: uses Lars to compute the Lasso solution

  • lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.

  • omp: uses orthogonal matching pursuit to estimate the sparse solution

  • threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'

transform_n_nonzero_coefsint, default=0.1*n_features

Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.

transform_alphafloat, default=1.

If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.

split_signbool, default=False

Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.

n_jobsint or None, default=None

Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

positive_codebool, default=False

Whether to enforce positivity when finding the code.

New in version 0.20.

transform_max_iterint, default=1000

Maximum number of iterations to perform if algorithm=’lasso_cd’ or lasso_lars.

New in version 0.22.

Attributes

components_array, [n_components, n_features]

The unchanged dictionary atoms
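
A minimal usage sketch of the wrapped scikit-learn class follows. It uses a made-up dictionary with orthonormal rows and samples that are each a scaled copy of one atom, so that orthogonal matching pursuit with a single nonzero coefficient can recover the code. All data here are hypothetical.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)

# Dictionary with orthonormal rows (unit-norm atoms), shape (3, 5).
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
dictionary = Q.T

# Each sample is a scaled copy of a single atom.
true_code = np.zeros((4, 3))
true_code[[0, 1, 2, 3], [0, 2, 1, 0]] = [2.0, -1.5, 0.7, 3.0]
X = true_code @ dictionary

coder = SparseCoder(
    dictionary=dictionary,
    transform_algorithm="omp",
    transform_n_nonzero_coefs=1,  # target one active atom per sample
)
code = coder.transform(X)

# The reconstruction follows X ~= code * dictionary.
X_reconstructed = code @ dictionary
```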

See also

DictionaryLearning MiniBatchDictionaryLearning SparsePCA MiniBatchSparsePCA sparse_encode

Full API documentation: SparseCoderScikitsLearnNode

class mdp.nodes.FactorAnalysisScikitsLearnNode

Factor Analysis (FA). This node has been automatically generated by wrapping the sklearn.decomposition._factor_analysis.FactorAnalysis class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. A simple linear generative model with Gaussian latent variables.

The observations are assumed to be caused by a linear transformation of lower dimensional latent factors and added Gaussian noise. Without loss of generality the factors are distributed according to a Gaussian with zero mean and unit covariance. The noise is also zero mean and has an arbitrary diagonal covariance matrix.
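
The generative model described above can be simulated in a few lines. In this sketch the loading matrix W, the mean mu and the noise variances psi are made-up values; the point is that the covariance implied by the model, W W^T + diag(psi), matches the empirical covariance of the samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

# Hypothetical model parameters, for illustration only.
W = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0],
              [-1.0, 0.5]])            # loading matrix, (n_features, n_factors)
mu = np.array([0.0, 1.0, -1.0, 2.0])   # per-feature mean
psi = np.array([0.1, 0.2, 0.3, 0.4])   # diagonal noise variances

# Latent factors: zero mean, unit covariance.
Z = rng.standard_normal((n_samples, W.shape[1]))
# Noise: zero mean, independent per feature, with variance psi.
noise = rng.standard_normal((n_samples, W.shape[0])) * np.sqrt(psi)
X = Z @ W.T + mu + noise

# Covariance implied by the model versus the empirical covariance.
model_cov = W @ W.T + np.diag(psi)
sample_cov = np.cov(X, rowvar=False)
```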

Restricting the model further, by assuming that the Gaussian noise is isotropic (all diagonal entries are the same), yields probabilistic PCA (PPCA).

FactorAnalysis performs a maximum likelihood estimate of the so-called loading matrix, the transformation of the latent variables to the observed ones, using an SVD-based approach.

Read more in the User Guide.

New in version 0.13.

Parameters

n_componentsint | None

Dimensionality of latent space, the number of components of X that are obtained after transform. If None, n_components is set to the number of features.

tolfloat

Stopping tolerance for log-likelihood increase.

copybool

Whether to make a copy of X. If False, the input X gets overwritten during fitting.

max_iterint

Maximum number of iterations.

noise_variance_initNone | array, shape=(n_features,)

The initial guess of the noise variance for each feature. If None, it defaults to np.ones(n_features)

svd_method{‘lapack’, ‘randomized’}

Which SVD method to use. If ‘lapack’ use standard SVD from scipy.linalg, if ‘randomized’ use fast randomized_svd function. Defaults to ‘randomized’. For most applications ‘randomized’ will be sufficiently precise while providing significant speed gains. Accuracy can also be improved by setting higher values for iterated_power. If this is not sufficient, for maximum precision you should choose ‘lapack’.

iterated_powerint, optional

Number of iterations for the power method. 3 by default. Only used if svd_method equals ‘randomized’

random_stateint, RandomState instance, default=0

Only used when svd_method equals ‘randomized’. Pass an int for reproducible results across multiple function calls. See Glossary.

Attributes

components_array, [n_components, n_features]

Components with maximum variance.

loglike_list, [n_iterations]

The log likelihood at each iteration.

noise_variance_array, shape=(n_features,)

The estimated noise variance for each feature.

n_iter_int

Number of iterations run.

mean_array, shape (n_features,)

Per-feature empirical mean, estimated from the training set.

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import FactorAnalysis
>>> X, _ = load_digits(return_X_y=True)
>>> transformer = FactorAnalysis(n_components=7, random_state=0)
>>> X_transformed = transformer.fit_transform(X)
>>> X_transformed.shape
(1797, 7)

See also

PCA: Principal component analysis is also a latent linear variable model which however assumes equal noise variance for each feature. This extra assumption makes probabilistic PCA faster as it can be computed in closed form.

FastICA: Independent component analysis, a latent variable model with non-Gaussian latent variables.

Full API documentation: FactorAnalysisScikitsLearnNode

class mdp.nodes.LatentDirichletAllocationScikitsLearnNode

Latent Dirichlet Allocation with online variational Bayes algorithm. This node has been automatically generated by wrapping the sklearn.decomposition._lda.LatentDirichletAllocation class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

New in version 0.17.

Read more in the User Guide.

Parameters

n_componentsint, optional (default=10)

Number of topics.

Changed in version 0.19: n_topics was renamed to n_components.

doc_topic_priorfloat, optional (default=None)

Prior of document topic distribution theta. If the value is None, defaults to 1 / n_components. In [1], this is called alpha.

topic_word_priorfloat, optional (default=None)

Prior of topic word distribution beta. If the value is None, defaults to 1 / n_components. In [1], this is called eta.

learning_method‘batch’ | ‘online’, default=’batch’

Method used to update components_. Only used in the fit() method. In general, if the data size is large, the online update will be much faster than the batch update.

Valid options:

  • 'batch': Batch variational Bayes method. Use all training data in each EM update. Old components_ will be overwritten in each iteration.

  • 'online': Online variational Bayes method. In each EM update, use a mini-batch of training data to update the components_ variable incrementally. The learning rate is controlled by the learning_decay and learning_offset parameters.

Changed in version 0.20: The default learning method is now "batch".

learning_decayfloat, optional (default=0.7)

A parameter that controls the learning rate in the online learning method. The value should be in the interval (0.5, 1.0] to guarantee asymptotic convergence. When the value is 0.0 and batch_size is n_samples, the update method is the same as batch learning. In the literature, this is called kappa.

learning_offsetfloat, optional (default=10.)

A (positive) parameter that downweights early iterations in online learning. It should be greater than 1.0. In the literature, this is called tau_0.
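
To see how learning_decay (kappa) and learning_offset (tau_0) interact, the sketch below assumes the step-size schedule of the online LDA paper [1], rho_t = (tau_0 + t) ** (-kappa). The helper online_step_size is hypothetical and not part of any library; it only illustrates the formula.

```python
import numpy as np


def online_step_size(t, learning_offset=10.0, learning_decay=0.7):
    """Hypothetical helper: weight given to the t-th mini-batch update,
    assuming rho_t = (tau_0 + t) ** (-kappa)."""
    return np.power(learning_offset + t, -learning_decay)


steps = np.arange(5)
weights = online_step_size(steps)

# A larger offset downweights the early iterations more strongly.
first_weight_small_offset = online_step_size(0, learning_offset=10.0)
first_weight_large_offset = online_step_size(0, learning_offset=100.0)

# With a decay of 0.0 every update gets full weight.
full_weight = online_step_size(3, learning_decay=0.0)

print(weights, first_weight_large_offset, full_weight)
```

Under this assumption the weights shrink monotonically with t, which is why a decay in (0.5, 1.0] leads to ever smaller corrections as more mini-batches are seen.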

max_iterinteger, optional (default=10)

The maximum number of iterations.

batch_sizeint, optional (default=128)

Number of documents to use in each EM iteration. Only used in online learning.

evaluate_everyint, optional (default=0)

How often to evaluate perplexity. Only used in the fit method. Set it to 0 or a negative number to not evaluate perplexity during training at all. Evaluating perplexity can help you check convergence during training, but it will also increase total training time. Evaluating perplexity in every iteration might increase training time up to two-fold.

total_samplesint, optional (default=1e6)

Total number of documents. Only used in the partial_fit() method.
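
A minimal sketch of incremental training with partial_fit() on the wrapped scikit-learn class, feeding the data in chunks of 25 documents. The chunk size and the choice of total_samples equal to the number of rows are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import LatentDirichletAllocation

# Token-count-like feature matrix with 100 documents and 20 words.
X, _ = make_multilabel_classification(random_state=0)

lda = LatentDirichletAllocation(
    n_components=5,
    total_samples=X.shape[0],  # assumed: the full corpus size is known
    random_state=0,
)

# Feed the corpus in chunks, as one would with data arriving in a stream.
for start in range(0, X.shape[0], 25):
    lda.partial_fit(X[start:start + 25])

doc_topics = lda.transform(X[:3])
print(doc_topics.shape)
```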

perp_tolfloat, optional (default=1e-1)

Perplexity tolerance in batch learning. Only used when evaluate_every is greater than 0.

mean_change_tolfloat, optional (default=1e-3)

Stopping tolerance for updating document topic distribution in E-step.

max_doc_update_iterint (default=100)

Max number of iterations for updating document topic distribution in the E-step.

n_jobsint or None, optional (default=None)

The number of jobs to use in the E-step. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

verboseint, optional (default=0)

Verbosity level.

random_stateint, RandomState instance, default=None

Pass an int for reproducible results across multiple function calls. See Glossary.

Attributes

components_array, [n_components, n_features]

Variational parameters for topic word distribution. Since the complete conditional for topic word distribution is a Dirichlet, components_[i, j] can be viewed as pseudocount that represents the number of times word j was assigned to topic i. It can also be viewed as distribution over the words for each topic after normalization:

  • model.components_ / model.components_.sum(axis=1)[:, np.newaxis].
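
The normalization above can be sketched as follows. The data set and the names topic_word and top_words are illustrative choices, not part of the API.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import LatentDirichletAllocation

X, _ = make_multilabel_classification(random_state=0)
model = LatentDirichletAllocation(n_components=5, random_state=0).fit(X)

# Divide each row of pseudocounts by its sum to get a distribution over words.
topic_word = model.components_ / model.components_.sum(axis=1)[:, np.newaxis]

# Indices of the three most probable words for each topic.
top_words = np.argsort(topic_word, axis=1)[:, ::-1][:, :3]
print(top_words)
```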

n_batch_iter_int

Number of iterations of the EM step.

n_iter_int

Number of passes over the dataset.

bound_float

Final perplexity score on training set.

doc_topic_prior_float

Prior of document topic distribution theta. If the value is None, it is 1 / n_components.

topic_word_prior_float

Prior of topic word distribution beta. If the value is None, it is 1 / n_components.

Examples

>>> from sklearn.decomposition import LatentDirichletAllocation
>>> from sklearn.datasets import make_multilabel_classification
>>> # This produces a feature matrix of token counts, similar to what
>>> # CountVectorizer would produce on text.
>>> X, _ = make_multilabel_classification(random_state=0)
>>> lda = LatentDirichletAllocation(n_components=5,
...     random_state=0)
>>> lda.fit(X)
LatentDirichletAllocation(...)
>>> # get topics for some given samples:
>>> lda.transform(X[-2:])
array([[0.00360392, 0.25499205, 0.0036211 , 0.64236448, 0.09541846],
       [0.15297572, 0.00362644, 0.44412786, 0.39568399, 0.003586  ]])

References

[1] “Online Learning for Latent Dirichlet Allocation”, Matthew D. Hoffman, David M. Blei, Francis Bach, 2010

[2] “Stochastic Variational Inference”, Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley, 2013

[3] Matthew D. Hoffman’s onlineldavb code. Link:

Full API documentation: LatentDirichletAllocationScikitsLearnNode

class mdp.nodes.KNeighborsTransformerScikitsLearnNode

Transform X into a (weighted) graph of k nearest neighbors. This node has been automatically generated by wrapping the sklearn.neighbors._graph.KNeighborsTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The transformed data is a sparse graph as returned by kneighbors_graph.

Read more in the User Guide.

New in version 0.22.

Parameters

mode{‘distance’, ‘connectivity’}, default=’distance’

Type of returned matrix: ‘connectivity’ will return the connectivity matrix with ones and zeros, and ‘distance’ will return the distances between neighbors according to the given metric.

n_neighborsint, default=5

Number of neighbors for each sample in the transformed sparse graph. For compatibility reasons, as each sample is considered as its own neighbor, one extra neighbor will be computed when mode == ‘distance’. In this case, the sparse graph contains (n_neighbors + 1) neighbors.
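
The extra neighbor in ‘distance’ mode can be checked by counting the stored entries per row of the returned sparse graph. This is a small sketch on random data; the variable names are illustrative, and it assumes the graph is returned in CSR format so that indptr gives the row boundaries.

```python
import numpy as np
from sklearn.neighbors import KNeighborsTransformer

rng = np.random.RandomState(0)
X = rng.rand(10, 2)

dist_graph = KNeighborsTransformer(n_neighbors=3, mode='distance').fit_transform(X)
conn_graph = KNeighborsTransformer(n_neighbors=3, mode='connectivity').fit_transform(X)

# Number of explicitly stored entries in each row of the CSR matrices.
stored_per_row_dist = np.diff(dist_graph.indptr)
stored_per_row_conn = np.diff(conn_graph.indptr)

print(stored_per_row_dist)  # n_neighbors + 1 entries per row
print(stored_per_row_conn)  # n_neighbors entries per row
```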

algorithm{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree

  • ‘kd_tree’ will use KDTree

  • ‘brute’ will use a brute-force search.

  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_sizeint, default=30

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

metricstr or callable, default=’minkowski’

metric to use for distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used.

If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string.

Distance matrices are not supported.

Valid values for metric are:

  • from scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’]

  • from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]

See the documentation for scipy.spatial.distance for details on these metrics.
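
As an illustration of the callable form, the sketch below passes a user-defined function that reimplements the Chebyshev distance and compares the result with the built-in metric name. The function max_abs_diff is hypothetical and exists only for this example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsTransformer


def max_abs_diff(a, b):
    # Hypothetical user-defined metric: takes two 1-D arrays (two rows of X)
    # and returns a single distance value (here the Chebyshev distance).
    return np.max(np.abs(a - b))


rng = np.random.RandomState(0)
X = rng.rand(8, 3)

by_callable = KNeighborsTransformer(n_neighbors=2, metric=max_abs_diff).fit_transform(X)
by_name = KNeighborsTransformer(n_neighbors=2, metric='chebyshev').fit_transform(X)

print(np.allclose(by_callable.toarray(), by_name.toarray()))
```

Both calls should produce the same graph; the string form is preferable in practice because the callable is invoked from Python for every pair of rows.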

pint, default=2

Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metric_paramsdict, default=None

Additional keyword arguments for the metric function.

n_jobsint, default=1

The number of parallel jobs to run for neighbors search. If -1, then the number of jobs is set to the number of CPU cores.

Examples

>>> from sklearn.manifold import Isomap
>>> from sklearn.neighbors import KNeighborsTransformer
>>> from sklearn.pipeline import make_pipeline
>>> estimator = make_pipeline(
...     KNeighborsTransformer(n_neighbors=5, mode='distance'),
...     Isomap(neighbors_algorithm='precomputed'))

Full API documentation: KNeighborsTransformerScikitsLearnNode

class mdp.nodes.RadiusNeighborsTransformerScikitsLearnNode

Transform X into a (weighted) graph of neighbors nearer than a radius. This node has been automatically generated by wrapping the sklearn.neighbors._graph.RadiusNeighborsTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The transformed data is a sparse graph as returned by radius_neighbors_graph.

Read more in the User Guide.

New in version 0.22.

Parameters

mode{‘distance’, ‘connectivity’}, default=’distance’

Type of returned matrix: ‘connectivity’ will return the connectivity matrix with ones and zeros, and ‘distance’ will return the distances between neighbors according to the given metric.

radiusfloat, default=1.

Radius of neighborhood in the transformed sparse graph.
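
A small sketch of the effect of radius on one-dimensional data: only pairs closer than the radius receive a non-zero distance in the graph. The data values are chosen for illustration.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsTransformer

X = np.array([[0.0], [0.4], [1.0], [3.0]])

graph = RadiusNeighborsTransformer(radius=0.5, mode='distance').fit_transform(X)
dense = graph.toarray()

# Samples 0 and 1 are 0.4 apart, so they are connected.
# Samples 1 and 2 are 0.6 apart, so they are not.
print(dense)
```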

algorithm{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree

  • ‘kd_tree’ will use KDTree

  • ‘brute’ will use a brute-force search.

  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_sizeint, default=30

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

metricstr or callable, default=’minkowski’

metric to use for distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used.

If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string.

Distance matrices are not supported.

Valid values for metric are:

  • from scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’]

  • from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]

See the documentation for scipy.spatial.distance for details on these metrics.

pint, default=2

Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metric_paramsdict, default=None

Additional keyword arguments for the metric function.

n_jobsint, default=1

The number of parallel jobs to run for neighbors search. If -1, then the number of jobs is set to the number of CPU cores.

Examples

>>> from sklearn.cluster import DBSCAN
>>> from sklearn.neighbors import RadiusNeighborsTransformer
>>> from sklearn.pipeline import make_pipeline
>>> estimator = make_pipeline(
...     RadiusNeighborsTransformer(radius=42.0, mode='distance'),
...     DBSCAN(min_samples=30, metric='precomputed'))

Full API documentation: RadiusNeighborsTransformerScikitsLearnNode

class mdp.nodes.KNeighborsClassifierScikitsLearnNode

Classifier implementing the k-nearest neighbors vote. This node has been automatically generated by wrapping the sklearn.neighbors._classification.KNeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

n_neighborsint, default=5

Number of neighbors to use by default for kneighbors() queries.

weights{‘uniform’, ‘distance’} or callable, default=’uniform’

weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.

  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
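
The three weighting options can be compared on a tiny data set in which the closest neighbor disagrees with the majority. The function inverse_square is a hypothetical user-defined weighting, shown only to illustrate the callable form.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0.0], [0.2], [1.0]]
y = [0, 0, 1]
query = [[0.9]]  # closest to the single sample of class 1


def inverse_square(distances):
    # Hypothetical weighting: accepts an array of distances and returns
    # an array of the same shape containing the weights.
    return 1.0 / (distances ** 2)


uniform = KNeighborsClassifier(n_neighbors=3, weights='uniform').fit(X, y)
distance = KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X, y)
custom = KNeighborsClassifier(n_neighbors=3, weights=inverse_square).fit(X, y)

pred_uniform = uniform.predict(query)[0]    # majority vote: two votes for class 0
pred_distance = distance.predict(query)[0]  # the nearby class-1 sample dominates
pred_custom = custom.predict(query)[0]

print(pred_uniform, pred_distance, pred_custom)
```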

algorithm{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree

  • ‘kd_tree’ will use KDTree

  • ‘brute’ will use a brute-force search.

  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_sizeint, default=30

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

pint, default=2

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
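
The equivalences for p = 1 and p = 2 can be verified numerically. This sketch uses sklearn.metrics.pairwise_distances on three hand-picked points.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])

d_p1 = pairwise_distances(X, metric='minkowski', p=1)
d_p2 = pairwise_distances(X, metric='minkowski', p=2)
d_manhattan = pairwise_distances(X, metric='manhattan')
d_euclidean = pairwise_distances(X, metric='euclidean')

# Distance between (0, 0) and (3, 4): 3 + 4 for p = 1, sqrt(9 + 16) for p = 2.
print(d_p1[0, 1], d_p2[0, 1])
```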

metricstr or callable, default=’minkowski’

the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.

metric_paramsdict, default=None

Additional keyword arguments for the metric function.

n_jobsint, default=None

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn’t affect fit() method.

Attributes

classes_array of shape (n_classes,)

Class labels known to the classifier.

effective_metric_str or callable

The distance metric used. It will be the same as the metric parameter or a synonym of it, e.g. ‘euclidean’ if the metric parameter is set to ‘minkowski’ and the p parameter is set to 2.

effective_metric_params_dict

Additional keyword arguments for the metric function. For most metrics this will be the same as the metric_params parameter, but it may also contain the p parameter value if the effective_metric_ attribute is set to ‘minkowski’.
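
How the metric and p parameters resolve into effective_metric_ can be inspected after fitting. This is a minimal sketch; the resolved dictionary is an illustrative container, and the contents of effective_metric_params_ for p = 1 and p = 2 may differ between scikit-learn versions.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

resolved = {}
for p in (1, 2, 3):
    clf = KNeighborsClassifier(n_neighbors=3, metric='minkowski', p=p).fit(X, y)
    # Store the resolved metric name and a copy of its keyword arguments.
    resolved[p] = (clf.effective_metric_, dict(clf.effective_metric_params_))

for p, (name, params) in resolved.items():
    print(p, name, params)
```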

outputs_2d_bool

False when y’s shape is (n_samples, ) or (n_samples, 1) during fit, otherwise True.

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y)
KNeighborsClassifier(...)
>>> print(neigh.predict([[1.1]]))
[0]
>>> print(neigh.predict_proba([[0.9]]))
[[0.66666667 0.33333333]]

See also

RadiusNeighborsClassifier, KNeighborsRegressor, RadiusNeighborsRegressor, NearestNeighbors

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

Warning

Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: KNeighborsClassifierScikitsLearnNode

class mdp.nodes.RadiusNeighborsClassifierScikitsLearnNode

Classifier implementing a vote among neighbors within a given radius. This node has been automatically generated by wrapping the sklearn.neighbors._classification.RadiusNeighborsClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

radiusfloat, default=1.0

Range of parameter space to use by default for radius_neighbors() queries.

weights{‘uniform’, ‘distance’} or callable, default=’uniform’

weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.

  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree

  • ‘kd_tree’ will use KDTree

  • ‘brute’ will use a brute-force search.

  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_sizeint, default=30

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

pint, default=2

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metricstr or callable, default=’minkowski’

the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.

outlier_label{manual label, ‘most_frequent’}, default=None

label for outlier samples (samples with no neighbors in given radius).

  • manual label: str or int label (should be the same type as y) or list of manual labels if multi-output is used.

  • ‘most_frequent’ : assign the most frequent label of y to outliers.

  • None : when any outlier is detected, ValueError will be raised.
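
The difference between the default behaviour and ‘most_frequent’ can be sketched with a query point that has no training sample within the radius. The data values are chosen for illustration.

```python
from sklearn.neighbors import RadiusNeighborsClassifier

X = [[0.0], [0.5], [1.0], [5.0]]
y = [0, 0, 0, 1]
far_query = [[100.0]]  # no training sample lies within radius 1.0

# Default (outlier_label=None): predicting an outlier raises ValueError.
strict = RadiusNeighborsClassifier(radius=1.0).fit(X, y)
try:
    strict.predict(far_query)
    raised = False
except ValueError:
    raised = True

# 'most_frequent': the outlier receives the most frequent training label.
lenient = RadiusNeighborsClassifier(radius=1.0, outlier_label='most_frequent').fit(X, y)
pred_lenient = lenient.predict(far_query)[0]

print(raised, pred_lenient)
```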

metric_paramsdict, default=None

Additional keyword arguments for the metric function.

n_jobsint, default=None

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

classes_ndarray of shape (n_classes,)

Class labels known to the classifier.

effective_metric_str or callable

The distance metric used. It will be the same as the metric parameter or a synonym of it, e.g. ‘euclidean’ if the metric parameter is set to ‘minkowski’ and the p parameter is set to 2.

effective_metric_params_dict

Additional keyword arguments for the metric function. For most metrics this will be the same as the metric_params parameter, but it may also contain the p parameter value if the effective_metric_ attribute is set to ‘minkowski’.

outputs_2d_bool

False when y’s shape is (n_samples, ) or (n_samples, 1) during fit, otherwise True.

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsClassifier
>>> neigh = RadiusNeighborsClassifier(radius=1.0)
>>> neigh.fit(X, y)
RadiusNeighborsClassifier(...)
>>> print(neigh.predict([[1.5]]))
[0]
>>> print(neigh.predict_proba([[1.0]]))
[[0.66666667 0.33333333]]

See also

KNeighborsClassifier, RadiusNeighborsRegressor, KNeighborsRegressor, NearestNeighbors

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: RadiusNeighborsClassifierScikitsLearnNode

class mdp.nodes.KNeighborsRegressorScikitsLearnNode

Regression based on k-nearest neighbors. This node has been automatically generated by wrapping the sklearn.neighbors._regression.KNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.

Read more in the User Guide.

New in version 0.9.

Parameters

n_neighborsint, default=5

Number of neighbors to use by default for kneighbors() queries.

weights{‘uniform’, ‘distance’} or callable, default=’uniform’

weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.

  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree

  • ‘kd_tree’ will use KDTree

  • ‘brute’ will use a brute-force search.

  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_sizeint, default=30

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

pint, default=2

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metricstr or callable, default=’minkowski’

the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.

metric_paramsdict, default=None

Additional keyword arguments for the metric function.

n_jobsint, default=None

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn’t affect fit() method.

Attributes

effective_metric_str or callable

The distance metric to use. It will be the same as the metric parameter or a synonym of it, e.g. ‘euclidean’ if the metric parameter is set to ‘minkowski’ and the p parameter is set to 2.

effective_metric_params_dict

Additional keyword arguments for the metric function. For most metrics this will be the same as the metric_params parameter, but it may also contain the p parameter value if the effective_metric_ attribute is set to ‘minkowski’.

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsRegressor
>>> neigh = KNeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y)
KNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[0.5]
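
The local interpolation can be reproduced by hand from the neighbors returned by kneighbors(). The sketch below assumes the prediction is the (optionally inverse-distance weighted) average of the neighbors' targets; the manual_* names are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
query = np.array([[1.2]])

# Uniform weights: plain average of the two nearest targets.
uniform = KNeighborsRegressor(n_neighbors=2).fit(X, y)
dist, idx = uniform.kneighbors(query)
manual_uniform = y[idx[0]].mean()

# Distance weights: average weighted by the inverse of each distance.
weighted = KNeighborsRegressor(n_neighbors=2, weights='distance').fit(X, y)
w = 1.0 / dist[0]
manual_weighted = np.sum(w * y[idx[0]]) / np.sum(w)

print(uniform.predict(query)[0], manual_uniform)
print(weighted.predict(query)[0], manual_weighted)
```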

See also

NearestNeighbors, RadiusNeighborsRegressor, KNeighborsClassifier, RadiusNeighborsClassifier

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

Warning

Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: KNeighborsRegressorScikitsLearnNode

class mdp.nodes.RadiusNeighborsRegressorScikitsLearnNode

Regression based on neighbors within a fixed radius. This node has been automatically generated by wrapping the sklearn.neighbors._regression.RadiusNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.

Read more in the User Guide.

New in version 0.9.

Parameters

radiusfloat, default=1.0

Range of parameter space to use by default for radius_neighbors() queries.

weights{‘uniform’, ‘distance’} or callable, default=’uniform’

weight function used in prediction. Possible values:

  • ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.

  • ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

Uniform weights are used by default.

algorithm{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree

  • ‘kd_tree’ will use KDTree

  • ‘brute’ will use a brute-force search.

  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_sizeint, default=30

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

pint, default=2

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metricstr or callable, default=’minkowski’

the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.

metric_paramsdict, default=None

Additional keyword arguments for the metric function.

n_jobsint, default=None

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

effective_metric_str or callable

The distance metric to use. It will be the same as the metric parameter or a synonym of it, e.g. ‘euclidean’ if the metric parameter is set to ‘minkowski’ and the p parameter is set to 2.

effective_metric_params_dict

Additional keyword arguments for the metric function. For most metrics this will be the same as the metric_params parameter, but it may also contain the p parameter value if the effective_metric_ attribute is set to ‘minkowski’.

Examples

>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsRegressor
>>> neigh = RadiusNeighborsRegressor(radius=1.0)
>>> neigh.fit(X, y)
RadiusNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[0.5]

See also

NearestNeighbors, KNeighborsRegressor, KNeighborsClassifier, RadiusNeighborsClassifier

Notes

See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

Full API documentation: RadiusNeighborsRegressorScikitsLearnNode

class mdp.nodes.NearestCentroidScikitsLearnNode

Nearest centroid classifier. This node has been automatically generated by wrapping the sklearn.neighbors._nearest_centroid.NearestCentroid class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Each class is represented by its centroid, with test samples classified to the class with the nearest centroid.

Read more in the User Guide.

Parameters

metricstr or callable

The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by metrics.pairwise.pairwise_distances for its metric parameter. The centroid for the samples of each class is the point that minimizes the sum of the distances (according to the metric) to all samples belonging to that class. If the “manhattan” metric is provided, this centroid is the median; for all other metrics, the centroid is the mean.

Changed in version 0.19: metric='precomputed' was deprecated and now raises an error
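
The relation between the metric and the centroid can be checked on a small data set with one extreme sample per class, so that mean and median differ. The names class_means and class_medians are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

X = np.array([[-1, -1], [-2, -1], [-9, -2], [1, 1], [2, 1], [9, 2]], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2])

euclid = NearestCentroid(metric='euclidean').fit(X, y)
manhattan = NearestCentroid(metric='manhattan').fit(X, y)

# Per-class feature-wise mean and median, computed by hand for comparison.
class_means = np.array([X[y == c].mean(axis=0) for c in euclid.classes_])
class_medians = np.array([np.median(X[y == c], axis=0) for c in manhattan.classes_])

print(euclid.centroids_)
print(manhattan.centroids_)
```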

shrink_thresholdfloat, default=None

Threshold for shrinking centroids to remove features.

Attributes

centroids_array-like of shape (n_classes, n_features)

Centroid of each class.

classes_array of shape (n_classes,)

The unique class labels.

Examples

>>> from sklearn.neighbors import NearestCentroid
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = NearestCentroid()
>>> clf.fit(X, y)
NearestCentroid()
>>> print(clf.predict([[-0.8, -1]]))
[1]

See also

sklearn.neighbors.KNeighborsClassifier: nearest neighbors classifier

Notes

When used for text classification with tf-idf vectors, this classifier is also known as the Rocchio classifier.

References

Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99(10), 6567-6572. The National Academy of Sciences.

Full API documentation: NearestCentroidScikitsLearnNode

class mdp.nodes.LocalOutlierFactorScikitsLearnNode

Unsupervised Outlier Detection using Local Outlier Factor (LOF). This node has been automatically generated by wrapping the sklearn.neighbors._lof.LocalOutlierFactor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

The anomaly score of each sample is called Local Outlier Factor. It measures the local deviation of density of a given sample with respect to its neighbors. It is local in that the anomaly score depends on how isolated the object is with respect to the surrounding neighborhood. More precisely, locality is given by k-nearest neighbors, whose distance is used to estimate the local density. By comparing the local density of a sample to the local densities of its neighbors, one can identify samples that have a substantially lower density than their neighbors. These are considered outliers.

New in version 0.19.
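
A minimal sketch of the idea on four one-dimensional samples, one of which lies far from the rest. It calls the wrapped scikit-learn class directly and relies on its fit_predict() method and negative_outlier_factor_ attribute, neither of which is described in this section.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X = np.array([[-1.1], [0.2], [101.1], [0.3]])

lof = LocalOutlierFactor(n_neighbors=2)
labels = lof.fit_predict(X)            # 1 for inliers, -1 for outliers
scores = lof.negative_outlier_factor_  # the lower, the more abnormal

print(labels)
print(scores)
```

The isolated sample at 101.1 has a much lower local density than its two nearest neighbors, so it receives by far the lowest score.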

Parameters

n_neighborsint, default=20

Number of neighbors to use by default for kneighbors() queries. If n_neighbors is larger than the number of samples provided, all samples will be used.

algorithm{‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’

Algorithm used to compute the nearest neighbors:

  • ‘ball_tree’ will use BallTree

  • ‘kd_tree’ will use KDTree

  • ‘brute’ will use a brute-force search.

  • ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, default=30

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

metric : str or callable, default=‘minkowski’

Metric used for the distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used.

If metric is “precomputed”, X is assumed to be a distance matrix and must be square. X may be a sparse matrix, in which case only “nonzero” elements may be considered neighbors.

If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string.

Valid values for metric are:

  • from scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’]

  • from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]

See the documentation for scipy.spatial.distance for details on these metrics.

p : int, default=2

Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances(). When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metric_params : dict, default=None

Additional keyword arguments for the metric function.

contamination : ‘auto’ or float, default=‘auto’

The amount of contamination of the data set, i.e. the proportion of outliers in the data set. When fitting this is used to define the threshold on the scores of the samples.

  • if ‘auto’, the threshold is determined as in the original paper,

  • if a float, the contamination should be in the range [0, 0.5].

Changed in version 0.22: The default value of contamination changed from 0.1 to 'auto'.

novelty : bool, default=False

By default, LocalOutlierFactor is only meant to be used for outlier detection (novelty=False). Set novelty to True if you want to use LocalOutlierFactor for novelty detection. In this case, be aware that you should only use predict, decision_function and score_samples on new unseen data and not on the training set.

New in version 0.20.

n_jobs : int, default=None

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
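The effect of the p parameter described above can be sketched in plain Python. This is only an illustration of the Minkowski (l_p) formula, not the code path scikit-learn uses; the function name minkowski_distance and the sample points are hypothetical.

```python
def minkowski_distance(u, v, p=2):
    """Minkowski (l_p) distance between two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    # Sum the p-th powers of the coordinate differences, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


u, v = [0.0, 0.0], [3.0, 4.0]
d_manhattan = minkowski_distance(u, v, p=1)  # p = 1: Manhattan (l1) distance
d_euclidean = minkowski_distance(u, v, p=2)  # p = 2: Euclidean (l2) distance
```

For these two points the l1 distance is the sum of the coordinate differences and the l2 distance is the length of the straight line between them.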

Attributes

negative_outlier_factor_ : ndarray of shape (n_samples,)

The opposite LOF of the training samples. The higher, the more normal. Inliers tend to have a LOF score close to 1 (negative_outlier_factor_ close to -1), while outliers tend to have a larger LOF score.

The local outlier factor (LOF) of a sample captures its supposed ‘degree of abnormality’. It is the average of the ratio of the local reachability density of a sample and those of its k-nearest neighbors.

n_neighbors_ : int

The actual number of neighbors used for kneighbors() queries.

offset_ : float

Offset used to obtain binary labels from the raw scores. Observations having a negative_outlier_factor_ smaller than offset_ are detected as abnormal. The offset is set to -1.5 (inliers score around -1), except when a contamination parameter different from “auto” is provided. In that case, the offset is defined in such a way that we obtain the expected number of outliers in training.

New in version 0.20.
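How offset_ turns raw scores into labels can be sketched as follows. This is a minimal illustration that assumes contamination='auto' (offset -1.5); the helper name labels_from_scores is hypothetical, and the scores are the rounded values of the four-sample example in this section.

```python
def labels_from_scores(negative_outlier_factor, offset=-1.5):
    """Label a sample -1 (outlier) if its score is below the offset, else 1."""
    return [-1 if score < offset else 1 for score in negative_outlier_factor]


# Rounded negative_outlier_factor_ values taken from the documented example.
scores = [-0.9821, -1.0370, -73.3697, -0.9821]
labels = labels_from_scores(scores)
```

Only the third sample scores far below -1.5, so it is the only one labelled as an outlier.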

Examples

>>> import numpy as np
>>> from sklearn.neighbors import LocalOutlierFactor
>>> X = [[-1.1], [0.2], [101.1], [0.3]]
>>> clf = LocalOutlierFactor(n_neighbors=2)
>>> clf.fit_predict(X)
array([ 1,  1, -1,  1])
>>> clf.negative_outlier_factor_
array([ -0.9821...,  -1.0370..., -73.3697...,  -0.9821...])
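The scores in the example above can be reproduced by hand from the definition of the local outlier factor. The sketch below is a plain-Python illustration for one-dimensional data with distinct points; it is not the scikit-learn implementation, and the function name local_outlier_factors is hypothetical.

```python
def local_outlier_factors(points, k):
    """Return the LOF of each 1-D point, using its k nearest neighbours."""
    n = len(points)
    dist = [[abs(p - q) for q in points] for p in points]

    # Indices of the k nearest neighbours of each point (the point itself excluded).
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(others[:k])

    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_dist[j], dist[i][j]).
    # Assumes distinct points, so the sum below is never zero.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


X = [-1.1, 0.2, 101.1, 0.3]
lof = local_outlier_factors(X, k=2)
negative_outlier_factor = [-score for score in lof]
```

The isolated sample at 101.1 has a much lower local density than its two neighbours, which is why its LOF is far above 1 while the other three stay close to 1.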

References

[1] Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: identifying density-based local outliers. In ACM SIGMOD Record.

Full API documentation: LocalOutlierFactorScikitsLearnNode

class mdp.nodes.NeighborhoodComponentsAnalysisScikitsLearnNode

Neighborhood Components Analysis.

This node has been automatically generated by wrapping the sklearn.neighbors._nca.NeighborhoodComponentsAnalysis class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Neighborhood Component Analysis (NCA) is a machine learning algorithm for metric learning. It learns a linear transformation in a supervised fashion to improve the classification accuracy of a stochastic nearest neighbors rule in the transformed space.

Read more in the User Guide.

Parameters

n_components : int, default=None

Preferred dimensionality of the projected space. If None it will be set to n_features.

init : {‘auto’, ‘pca’, ‘lda’, ‘identity’, ‘random’} or ndarray of shape (n_features_a, n_features_b), default=‘auto’

Initialization of the linear transformation. Possible options are ‘auto’, ‘pca’, ‘lda’, ‘identity’, ‘random’, and a numpy array of shape (n_features_a, n_features_b).

‘auto’

Depending on n_components, the most reasonable initialization will be chosen. If n_components <= n_classes we use ‘lda’, as it uses labels information. If not, but n_components < min(n_features, n_samples), we use ‘pca’, as it projects data in meaningful directions (those of higher variance). Otherwise, we just use ‘identity’.

‘pca’

n_components principal components of the inputs passed to fit() will be used to initialize the transformation. (See PCA)

‘lda’

min(n_components, n_classes) most discriminative components of the inputs passed to fit() will be used to initialize the transformation. (If n_components > n_classes, the rest of the components will be zero.) (See LinearDiscriminantAnalysis)

‘identity’

If n_components is strictly smaller than the dimensionality of the inputs passed to fit(), the identity matrix will be truncated to the first n_components rows.

‘random’

The initial transformation will be a random array of shape (n_components, n_features). Each value is sampled from the standard normal distribution.

numpy array

n_features_b must match the dimensionality of the inputs passed to fit() and n_features_a must be less than or equal to that. If n_components is not None, n_features_a must match it.

warm_start : bool, default=False

If True and fit() has been called before, the solution of the previous call to fit() is used as the initial linear transformation (n_components and init will be ignored).

max_iter : int, default=50

Maximum number of iterations in the optimization.

tol : float, default=1e-5

Convergence tolerance for the optimization.

callback : callable, default=None

If not None, this function is called after every iteration of the optimizer, taking as arguments the current solution (flattened transformation matrix) and the number of iterations. This might be useful in case one wants to examine or store the transformation found after each iteration.

verbose : int, default=0

If 0, no progress messages will be printed. If 1, progress messages will be printed to stdout. If > 1, progress messages will be printed and the disp parameter of scipy.optimize.minimize() will be set to verbose - 2.

random_state : int or numpy.RandomState, default=None

A pseudo random number generator object or a seed for it if int. If init='random', random_state is used to initialize the random transformation. If init='pca', random_state is passed as an argument to PCA when initializing the transformation. Pass an int for reproducible results across multiple function calls. See the Glossary entry for random_state.
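The rule that init='auto' follows, as described under the init parameter above, can be written as a small decision function. This is a sketch of the documented rule only, assuming n_components has already been resolved to an integer (n_features when it was None); the function name resolve_auto_init is hypothetical.

```python
def resolve_auto_init(n_components, n_classes, n_features, n_samples):
    """Pick the initialization that init='auto' is documented to choose."""
    if n_components <= n_classes:
        return "lda"       # few components: LDA can use the label information
    if n_components < min(n_features, n_samples):
        return "pca"       # project onto the directions of highest variance
    return "identity"      # nothing to reduce: start from the identity


# Hypothetical problem sizes, chosen to hit each branch once.
choice_small = resolve_auto_init(n_components=2, n_classes=3, n_features=4, n_samples=45)
choice_mid = resolve_auto_init(n_components=5, n_classes=3, n_features=10, n_samples=100)
choice_full = resolve_auto_init(n_components=4, n_classes=3, n_features=4, n_samples=45)
```

The last call uses problem sizes matching an iris training split with n_components left at n_features, where neither of the first two conditions holds.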

Attributes

components_ : ndarray of shape (n_components, n_features)

The linear transformation learned during fitting.

n_iter_ : int

Counts the number of iterations performed by the optimizer.

random_state_ : numpy.RandomState

Pseudo random number generator object used during initialization.

Examples

>>> from sklearn.neighbors import NeighborhoodComponentsAnalysis
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
... stratify=y, test_size=0.7, random_state=42)
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> nca.fit(X_train, y_train)
NeighborhoodComponentsAnalysis(...)
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> knn.fit(X_train, y_train)
KNeighborsClassifier(...)
>>> print(knn.score(X_test, y_test))
0.933333...
>>> knn.fit(nca.transform(X_train), y_train)
KNeighborsClassifier(...)
>>> print(knn.score(nca.transform(X_test), y_test))
0.961904...

References

[1] J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov. “Neighbourhood Components Analysis”. Advances in Neural Information Processing Systems. 17, 513-520, 2005. http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf

[2] Wikipedia entry on Neighborhood Components Analysis https://en.wikipedia.org/wiki/Neighbourhood_components_analysis

Full API documentation: NeighborhoodComponentsAnalysisScikitsLearnNode

class mdp.nodes.DecisionTreeClassifierScikitsLearnNode

A decision tree classifier. This node has been automatically generated by wrapping the sklearn.tree._classes.DecisionTreeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

criterion : {“gini”, “entropy”}, default=“gini”

The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.

splitter : {“best”, “random”}, default=“best”

The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

max_depth : int, default=None

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split : int or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.

min_samples_leaf : int or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaf : float, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_features : int, float or {“auto”, “sqrt”, “log2”}, default=None

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=sqrt(n_features).

  • If “sqrt”, then max_features=sqrt(n_features).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features.

random_state : int, RandomState instance, default=None

Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_features < n_features, the algorithm will select max_features at random at each split before finding the best split among them. But the best found split may vary across different runs, even if max_features=n_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, random_state has to be fixed to an integer. See Glossary for details.

max_leaf_nodes : int, default=None

Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

min_impurity_decrease : float, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

New in version 0.19.

min_impurity_split : float, default=0

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

class_weight : dict, list of dict or “balanced”, default=None

Weights associated with classes in the form {class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

For multi-output, the weights of each column of y will be multiplied.

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

presort : deprecated, default=‘deprecated’

This parameter is deprecated and will be removed in v0.24.

Deprecated since version 0.22.

ccp_alpha : non-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See minimal_cost_complexity_pruning for details.

New in version 0.22.
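The two split criteria and the weighted impurity decrease formula given above can be checked on a toy split. The sketch below assumes unweighted samples (no sample_weight) and a base-2 logarithm for the entropy; the function names and the toy labels are hypothetical and this is not the scikit-learn implementation.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def weighted_impurity_decrease(parent, left, right, n_total, impurity=gini):
    """N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)."""
    n_t, n_t_l, n_t_r = len(parent), len(left), len(right)
    return n_t / n_total * (
        impurity(parent)
        - n_t_r / n_t * impurity(right)
        - n_t_l / n_t * impurity(left)
    )


# A root node with two balanced classes, split perfectly into two pure children.
parent = [0, 0, 0, 1, 1, 1]
left, right = [0, 0, 0], [1, 1, 1]
decrease = weighted_impurity_decrease(parent, left, right, n_total=len(parent))
```

Because both children are pure, the decrease equals the impurity of the parent node; such a split is kept whenever min_impurity_decrease does not exceed that value.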

Attributes

classes_ : ndarray of shape (n_classes,) or list of ndarray

The class labels (single output problem), or a list of arrays of class labels (multi-output problem).

feature_importances_ : ndarray of shape (n_features,)

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

max_features_ : int

The inferred value of max_features.

n_classes_ : int or list of int

The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).

n_features_ : int

The number of features when fit is performed.

n_outputs_ : int

The number of outputs when fit is performed.

tree_ : Tree

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of the Tree object and to the scikit-learn example plot_unveil_tree_structure.py for basic usage of these attributes.
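How the max_features parameter maps to the inferred max_features_ value can be sketched from the rules listed under Parameters. The function name resolve_max_features is hypothetical, and truncating the square root and logarithm to an integer of at least 1 is an assumption of this sketch rather than something stated in this document.

```python
from math import log2, sqrt


def resolve_max_features(max_features, n_features):
    """Number of features considered per split, following the documented rules."""
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        # For this classifier, "auto" is documented as sqrt(n_features).
        return max(1, int(sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # Remaining case: a float interpreted as a fraction of the features.
    return int(max_features * n_features)


n_features = 16  # hypothetical number of input features
resolved = {
    option: resolve_max_features(option, n_features)
    for option in (None, "auto", "sqrt", "log2", 3, 0.5)
}
```

For the regressor variants documented in this section, "auto" is instead described as n_features, so that branch would differ.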

See Also

DecisionTreeRegressor : A decision tree regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

[1] https://en.wikipedia.org/wiki/Decision_tree_learning

[2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees”, Wadsworth, Belmont, CA, 1984.

[3] T. Hastie, R. Tibshirani and J. Friedman. “Elements of Statistical Learning”, Springer, 2009.

[4] L. Breiman and A. Cutler, “Random Forests”, https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)
array([ 1.     ,  0.93...,  0.86...,  0.93...,  0.93...,
        0.93...,  0.93...,  1.     ,  0.93...,  1.      ])

Full API documentation: DecisionTreeClassifierScikitsLearnNode

class mdp.nodes.DecisionTreeRegressorScikitsLearnNode

A decision tree regressor. This node has been automatically generated by wrapping the sklearn.tree._classes.DecisionTreeRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

criterion : {“mse”, “friedman_mse”, “mae”}, default=“mse”

The function to measure the quality of a split. Supported criteria are “mse” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node, “friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits, and “mae” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node.

New in version 0.18: Mean Absolute Error (MAE) criterion.

splitter : {“best”, “random”}, default=“best”

The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

max_depth : int, default=None

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split : int or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.

min_samples_leaf : int or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaf : float, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_features : int, float or {“auto”, “sqrt”, “log2”}, default=None

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=n_features.

  • If “sqrt”, then max_features=sqrt(n_features).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features.

random_state : int, RandomState instance, default=None

Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_features < n_features, the algorithm will select max_features at random at each split before finding the best split among them. But the best found split may vary across different runs, even if max_features=n_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, random_state has to be fixed to an integer. See Glossary for details.

max_leaf_nodes : int, default=None

Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

min_impurity_decrease : float, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

New in version 0.19.

min_impurity_split : float, default=0

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

presort : deprecated, default=‘deprecated’

This parameter is deprecated and will be removed in v0.24.

Deprecated since version 0.22.

ccp_alpha : non-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See minimal_cost_complexity_pruning for details.

New in version 0.22.
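The criterion description above states that “mse” predicts the mean of each terminal node and “mae” predicts the median. The following sketch checks on hypothetical leaf targets that each choice minimizes its own loss; the helper names are illustrative and not part of scikit-learn.

```python
from statistics import mean, median


def l2_loss(targets, prediction):
    """Sum of squared errors for a constant leaf prediction."""
    return sum((t - prediction) ** 2 for t in targets)


def l1_loss(targets, prediction):
    """Sum of absolute errors for a constant leaf prediction."""
    return sum(abs(t - prediction) for t in targets)


# Hypothetical target values falling into one leaf, with one large value.
leaf_targets = [1.0, 2.0, 3.0, 10.0]
leaf_mean = mean(leaf_targets)      # the value an "mse" leaf would predict
leaf_median = median(leaf_targets)  # the value an "mae" leaf would predict
```

The large target pulls the mean upward but leaves the median unchanged, which is why the “mae” criterion is less sensitive to extreme target values.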

Attributes

feature_importances_ : ndarray of shape (n_features,)

The feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

max_features_ : int

The inferred value of max_features.

n_features_ : int

The number of features when fit is performed.

n_outputs_ : int

The number of outputs when fit is performed.

tree_ : Tree

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of the Tree object and to the scikit-learn example plot_unveil_tree_structure.py for basic usage of these attributes.

See Also

DecisionTreeClassifier : A decision tree classifier.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
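When min_samples_split or min_samples_leaf is used to limit tree size, a float value is converted to a sample count with ceil, as described under Parameters. A minimal sketch of that conversion follows; the function name resolve_min_samples is hypothetical, and any additional lower bound the library may apply is not modelled.

```python
from math import ceil


def resolve_min_samples(value, n_samples):
    """Turn an int count or a float fraction into a minimum sample count."""
    if isinstance(value, int):
        return value                  # already an absolute count
    return ceil(value * n_samples)    # fraction of the training set, rounded up


count_from_int = resolve_min_samples(2, 150)
count_from_fraction = resolve_min_samples(0.5, 150)
count_rounded_up = resolve_min_samples(0.25, 10)
```

Rounding up means that even a small fraction never resolves to fewer samples than the fraction implies.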

References

[1] https://en.wikipedia.org/wiki/Decision_tree_learning

[2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees”, Wadsworth, Belmont, CA, 1984.

[3] T. Hastie, R. Tibshirani and J. Friedman. “Elements of Statistical Learning”, Springer, 2009.

[4] L. Breiman and A. Cutler, “Random Forests”, https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

Examples

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> regressor = DecisionTreeRegressor(random_state=0)
>>> cross_val_score(regressor, X, y, cv=10)
array([-0.39..., -0.46...,  0.02...,  0.06..., -0.50...,
       0.16...,  0.11..., -0.73..., -0.30..., -0.00...])

Full API documentation: DecisionTreeRegressorScikitsLearnNode

class mdp.nodes.ExtraTreeClassifierScikitsLearnNode

An extremely randomized tree classifier.

This node has been automatically generated by wrapping the sklearn.tree._classes.ExtraTreeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.

Warning: Extra-trees should only be used within ensemble methods.

Read more in the User Guide.

Parameters

criterion : {“gini”, “entropy”}, default=“gini”

The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.

splitter : {“random”, “best”}, default=“random”

The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

max_depth : int, default=None

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split : int or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.

min_samples_leaf : int or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaf : float, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_features : int, float, {“auto”, “sqrt”, “log2”} or None, default=“auto”

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=sqrt(n_features).

  • If “sqrt”, then max_features=sqrt(n_features).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features.

random_state : int, RandomState instance, default=None

Used to pick randomly the max_features used at each split. See Glossary for details.

max_leaf_nodes : int, default=None

Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

min_impurity_decrease : float, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

New in version 0.19.

min_impurity_split : float, default=0

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

class_weight : dict, list of dict or “balanced”, default=None

Weights associated with classes in the form {class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

For multi-output, the weights of each column of y will be multiplied.

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

ccp_alpha : non-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See minimal_cost_complexity_pruning for details.

New in version 0.22.
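The “balanced” class_weight formula given above, n_samples / (n_classes * np.bincount(y)), can be worked through without NumPy for a single-output problem. The sketch below uses collections.Counter in place of np.bincount; the function name balanced_class_weights and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical imbalanced labels: six samples of class 0, two of class 1.
y = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(y)
```

The rare class receives the larger weight, so both classes contribute the same total weight (here 4.0 each) to the impurity computation.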

Attributes

classes_ : ndarray of shape (n_classes,) or list of ndarray

The class labels (single output problem), or a list of arrays of class labels (multi-output problem).

max_features_ : int

The inferred value of max_features.

n_classes_ : int or list of int

The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).

feature_importances_ : ndarray of shape (n_features,)

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

n_features_int

The number of features when fit is performed.

n_outputs_int

The number of outputs when fit is performed.

tree_Tree

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object and sphx_glr_auto_examples_tree_plot_unveil_tree_structure.py for basic usage of these attributes.

See Also

ExtraTreeRegressor : An extremely randomized tree regressor.

sklearn.ensemble.ExtraTreesClassifier : An extra-trees classifier.

sklearn.ensemble.ExtraTreesRegressor : An extra-trees regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

1

P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.ensemble import BaggingClassifier
>>> from sklearn.tree import ExtraTreeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(
...    X, y, random_state=0)
>>> extra_tree = ExtraTreeClassifier(random_state=0)
>>> cls = BaggingClassifier(extra_tree, random_state=0).fit(
...    X_train, y_train)
>>> cls.score(X_test, y_test)
0.8947...

Full API documentation: ExtraTreeClassifierScikitsLearnNode

class mdp.nodes.ExtraTreeRegressorScikitsLearnNode

An extremely randomized tree regressor. This node has been automatically generated by wrapping the sklearn.tree._classes.ExtraTreeRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.

Warning: Extra-trees should only be used within ensemble methods.

Read more in the User Guide.

Parameters

criterion{“mse”, “friedman_mse”, “mae”}, default=”mse”

The function to measure the quality of a split. Supported criteria are “mse” for the mean squared error, which is equal to variance reduction as feature selection criterion, and “mae” for the mean absolute error.

New in version 0.18: Mean Absolute Error (MAE) criterion.

splitter{“random”, “best”}, default=”random”

The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

max_depthint, default=None

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_splitint or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.
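
The int/float rule above can be sketched as follows. The helper resolve_min_samples is a hypothetical name used only for illustration; it follows the rule as stated in the text, and sklearn may apply additional lower bounds that are not shown here.

```python
import math


def resolve_min_samples(value, n_samples):
    """Turn an int or float setting into an absolute sample count.

    Hypothetical helper; follows the rule stated in the text above.
    """
    if isinstance(value, int):
        # An int is taken as the minimum number directly.
        return value
    # A float is a fraction of the training set, rounded up.
    return math.ceil(value * n_samples)


print(resolve_min_samples(2, 150))     # int is used as given
print(resolve_min_samples(0.25, 150))  # ceil(0.25 * 150) = ceil(37.5)
```

The same conversion applies to min_samples_leaf when it is given as a float.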

min_samples_leafint or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaffloat, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_featuresint, float, {“auto”, “sqrt”, “log2”} or None, default=”auto”

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=n_features.

  • If “sqrt”, then max_features=sqrt(n_features).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
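
The options listed above can be summarised in a small sketch. The function resolve_max_features is hypothetical, and truncating the “sqrt” and “log2” results to an integer of at least 1 is an assumption of this sketch rather than something the text states.

```python
import math


def resolve_max_features(max_features, n_features):
    """Map a max_features setting to a feature count (illustrative only)."""
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        # Assumption: truncate to an int and keep at least one feature.
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # Remaining case: a float fraction of the features.
    return int(max_features * n_features)


for setting in ("auto", "sqrt", "log2", 4, 0.5, None):
    print(setting, resolve_max_features(setting, n_features=10))
```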

random_stateint, RandomState instance, default=None

Used to randomly pick the max_features features considered at each split. See Glossary for details.

min_impurity_decreasefloat, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
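
A minimal sketch of the weighted impurity decrease, using the symbols from the equation above as argument names. The function name and the sample numbers are made up for illustration.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Evaluate the weighted impurity decrease for one candidate split."""
    return N_t / N * (impurity
                      - N_t_R / N_t * right_impurity
                      - N_t_L / N_t * left_impurity)


# Hypothetical node: 40 of 100 samples, split into 10 (left) and 30 (right).
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4)

# The node is split only if the decrease reaches min_impurity_decrease.
min_impurity_decrease = 0.05
print(decrease, decrease >= min_impurity_decrease)
```

Here the decrease is approximately 0.08, so the split would be accepted for a threshold of 0.05 but rejected for a threshold of 0.1.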

New in version 0.19.

min_impurity_splitfloat, default=0

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

max_leaf_nodesint, default=None

Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

ccp_alphanon-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See minimal_cost_complexity_pruning for details.

New in version 0.22.

Attributes

max_features_int

The inferred value of max_features.

n_features_int

The number of features when fit is performed.

feature_importances_ndarray of shape (n_features,)

Return impurity-based feature importances (the higher, the more important the feature).

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

n_outputs_int

The number of outputs when fit is performed.

tree_Tree

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object and sphx_glr_auto_examples_tree_plot_unveil_tree_structure.py for basic usage of these attributes.

See Also

ExtraTreeClassifier : An extremely randomized tree classifier.

sklearn.ensemble.ExtraTreesClassifier : An extra-trees classifier.

sklearn.ensemble.ExtraTreesRegressor : An extra-trees regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

1

P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

Examples

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.ensemble import BaggingRegressor
>>> from sklearn.tree import ExtraTreeRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> extra_tree = ExtraTreeRegressor(random_state=0)
>>> reg = BaggingRegressor(extra_tree, random_state=0).fit(
...     X_train, y_train)
>>> reg.score(X_test, y_test)
0.33...

Full API documentation: ExtraTreeRegressorScikitsLearnNode

class mdp.nodes.LocallyLinearEmbeddingScikitsLearnNode

Locally Linear Embedding. This node has been automatically generated by wrapping the sklearn.manifold._locally_linear.LocallyLinearEmbedding class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

n_neighborsinteger

number of neighbors to consider for each point.

n_componentsinteger

number of coordinates for the manifold

regfloat

regularization constant, multiplies the trace of the local covariance matrix of the distances.

eigen_solverstring, {‘auto’, ‘arpack’, ‘dense’}

‘auto’ : algorithm will attempt to choose the best method for input data.

‘arpack’ : use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.

‘dense’ : use standard dense matrix operations for the eigenvalue decomposition. For this method, M must be an array or matrix type. This method should be avoided for large problems.

tolfloat, optional

Tolerance for the ‘arpack’ method. Not used if eigen_solver == ‘dense’.

max_iterinteger

maximum number of iterations for the arpack solver. Not used if eigen_solver==’dense’.

methodstring (‘standard’, ‘hessian’, ‘modified’ or ‘ltsa’)

‘standard’ : use the standard locally linear embedding algorithm. See reference [1].

‘hessian’ : use the Hessian eigenmap method. This method requires n_neighbors > n_components * (1 + (n_components + 1) / 2). See reference [2].

‘modified’ : use the modified locally linear embedding algorithm. See reference [3].

‘ltsa’ : use the local tangent space alignment algorithm. See reference [4].
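
To make the neighbor requirement of the Hessian method concrete, the sketch below computes the smallest admissible n_neighbors, assuming the inequality reads n_neighbors > n_components * (1 + (n_components + 1) / 2). The helper name is hypothetical.

```python
def min_neighbors_for_hessian(n_components):
    """Smallest integer n_neighbors satisfying the strict inequality."""
    # n_components * (1 + (n_components + 1) / 2) is always an integer,
    # because n_components * (n_components + 1) is even.
    bound = n_components + n_components * (n_components + 1) // 2
    return bound + 1


for n_components in (1, 2, 3):
    print(n_components, min_neighbors_for_hessian(n_components))
```

For a two-dimensional embedding this gives a minimum of 6 neighbors, so the default-sized neighborhoods may need to be enlarged as n_components grows.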

hessian_tolfloat, optional

Tolerance for Hessian eigenmapping method. Only used if method == 'hessian'

modified_tolfloat, optional

Tolerance for modified LLE method. Only used if method == 'modified'

neighbors_algorithmstring [‘auto’|’brute’|’kd_tree’|’ball_tree’]

algorithm to use for nearest neighbors search, passed to neighbors.NearestNeighbors instance

random_stateint, RandomState instance, default=None

Determines the random number generator when eigen_solver == ‘arpack’. Pass an int for reproducible results across multiple function calls. See Glossary for details.

n_jobsint or None, optional (default=None)

The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

embedding_array-like, shape [n_samples, n_components]

Stores the embedding vectors

reconstruction_error_float

Reconstruction error associated with embedding_

nbrs_NearestNeighbors object

Stores the nearest neighbors instance, including BallTree or KDTree if applicable.

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.manifold import LocallyLinearEmbedding
>>> X, _ = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> embedding = LocallyLinearEmbedding(n_components=2)
>>> X_transformed = embedding.fit_transform(X[:100])
>>> X_transformed.shape
(100, 2)

References

1

Roweis, S. & Saul, L. Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323 (2000).

2

Donoho, D. & Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci U S A. 100:5591 (2003).

3

Zhang, Z. & Wang, J. MLLE: Modified Locally Linear Embedding Using Multiple Weights. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.382

4

Zhang, Z. & Zha, H. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. Journal of Shanghai Univ. 8:406 (2004)

Full API documentation: LocallyLinearEmbeddingScikitsLearnNode

class mdp.nodes.IsomapScikitsLearnNode

Isomap Embedding. This node has been automatically generated by wrapping the sklearn.manifold._isomap.Isomap class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Non-linear dimensionality reduction through Isometric Mapping.

Read more in the User Guide.

Parameters

n_neighborsinteger

number of neighbors to consider for each point.

n_componentsinteger

number of coordinates for the manifold

eigen_solver[‘auto’|’arpack’|’dense’]

‘auto’ : Attempt to choose the most efficient solver for the given problem.

‘arpack’ : Use Arnoldi decomposition to find the eigenvalues and eigenvectors.

‘dense’ : Use a direct solver (i.e. LAPACK) for the eigenvalue decomposition.

tolfloat

Convergence tolerance passed to arpack or lobpcg. Not used if eigen_solver == ‘dense’.

max_iterinteger

Maximum number of iterations for the arpack solver. Not used if eigen_solver == ‘dense’.

path_methodstring [‘auto’|’FW’|’D’]

Method to use in finding shortest path.

‘auto’ : attempt to choose the best algorithm automatically.

‘FW’ : Floyd-Warshall algorithm.

‘D’ : Dijkstra’s algorithm.

neighbors_algorithmstring [‘auto’|’brute’|’kd_tree’|’ball_tree’]

Algorithm to use for nearest neighbors search, passed to neighbors.NearestNeighbors instance.

n_jobsint or None, default=None

The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

metricstring, or callable, default=”minkowski”

The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances() for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix and must be square. X may be a sparse graph (see Glossary).

New in version 0.22.

pint, default=2

Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

New in version 0.22.
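
The role of p in the Minkowski metric can be sketched in a few lines of plain Python. This is an illustration of the formula only, not the routine sklearn uses internally, and the function name is chosen for this example.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski (l_p) distance between two equal-length sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


a, b = (0, 0), (3, 4)
print(minkowski_distance(a, b, p=1))  # Manhattan (l1) distance
print(minkowski_distance(a, b, p=2))  # Euclidean (l2) distance
```

For the points above, p = 1 sums the coordinate differences (3 + 4) and p = 2 gives the familiar straight-line distance of 5.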

metric_paramsdict, default=None

Additional keyword arguments for the metric function.

New in version 0.22.

Attributes

embedding_array-like, shape (n_samples, n_components)

Stores the embedding vectors.

kernel_pca_object

KernelPCA object used to implement the embedding.

nbrs_sklearn.neighbors.NearestNeighbors instance

Stores the nearest neighbors instance, including BallTree or KDTree if applicable.

dist_matrix_array-like, shape (n_samples, n_samples)

Stores the geodesic distance matrix of training data.

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.manifold import Isomap
>>> X, _ = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> embedding = Isomap(n_components=2)
>>> X_transformed = embedding.fit_transform(X[:100])
>>> X_transformed.shape
(100, 2)

References

1

Tenenbaum, J.B.; De Silva, V.; & Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500)

Full API documentation: IsomapScikitsLearnNode

class mdp.nodes.MeanShiftScikitsLearnNode

Mean shift clustering using a flat kernel. This node has been automatically generated by wrapping the sklearn.cluster._mean_shift.MeanShift class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Mean shift clustering aims to discover “blobs” in a smooth density of samples. It is a centroid-based algorithm, which works by updating candidates for centroids to be the mean of the points within a given region. These candidates are then filtered in a post-processing stage to eliminate near-duplicates to form the final set of centroids.

Seeding is performed using a binning technique for scalability.

Read more in the User Guide.

Parameters

bandwidthfloat, default=None

Bandwidth used in the flat kernel.

If not given, the bandwidth is estimated using sklearn.cluster.estimate_bandwidth; see the documentation for that function for hints on scalability (see also the Notes, below).

seedsarray-like of shape (n_samples, n_features), default=None

Seeds used to initialize kernels. If not set, the seeds are calculated by clustering.get_bin_seeds with bandwidth as the grid size and default values for other parameters.

bin_seedingbool, default=False

If true, initial kernel locations are not locations of all points, but rather the location of the discretized version of points, where points are binned onto a grid whose coarseness corresponds to the bandwidth. Setting this option to True will speed up the algorithm because fewer seeds will be initialized. The default value is False. Ignored if seeds argument is not None.

min_bin_freqint, default=1

To speed up the algorithm, accept only those bins with at least min_bin_freq points as seeds.
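
The interaction of bin_seeding and min_bin_freq can be sketched as follows: points are snapped to a grid whose cell size equals the bandwidth, and only sufficiently populated cells become seeds. This is a simplified standard-library illustration with a hypothetical function name; it is not sklearn's get_bin_seeds.

```python
from collections import Counter


def bin_seeds(points, bin_size, min_bin_freq=1):
    """Return grid-aligned seeds for bins holding enough points.

    Illustrative sketch of the binning idea; not sklearn's get_bin_seeds.
    """
    # Count how many points fall into each grid cell.
    counts = Counter(
        tuple(round(coord / bin_size) for coord in point)
        for point in points
    )
    # Keep cells with enough points and map them back to coordinates.
    return [tuple(index * bin_size for index in bin_index)
            for bin_index, freq in counts.items()
            if freq >= min_bin_freq]


X = [(1, 1), (2, 1), (1, 0), (4, 7), (3, 5), (3, 6)]
all_seeds = bin_seeds(X, bin_size=3)
seeds = bin_seeds(X, bin_size=3, min_bin_freq=2)
print(len(X), len(all_seeds), seeds)
```

Six points collapse to three candidate seeds, and raising min_bin_freq to 2 leaves two, which is where the speed-up comes from.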

cluster_allbool, default=True

If true, then all points are clustered, even those orphans that are not within any kernel. Orphans are assigned to the nearest kernel. If false, then orphans are given cluster label -1.

n_jobsint, default=None

The number of jobs to use for the computation. This works by running the hill-climbing optimization for each seed in parallel.

None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

max_iterint, default=300

Maximum number of iterations per seed point before the clustering operation terminates (for that seed point) if it has not converged yet.

New in version 0.22.

Attributes

cluster_centers_array, [n_clusters, n_features]

Coordinates of cluster centers.

labels_array of shape (n_samples,)

Labels of each point.

n_iter_int

Maximum number of iterations performed on each seed.

New in version 0.22.

Examples

>>> from sklearn.cluster import MeanShift
>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [1, 0],
...               [4, 7], [3, 5], [3, 6]])
>>> clustering = MeanShift(bandwidth=2).fit(X)
>>> clustering.labels_
array([1, 1, 1, 0, 0, 0])
>>> clustering.predict([[0, 0], [5, 5]])
array([1, 0])
>>> clustering
MeanShift(bandwidth=2)

Notes

Scalability:

Because this implementation uses a flat kernel and a Ball Tree to look up members of each kernel, the complexity will tend towards O(T*n*log(n)) in lower dimensions, with n the number of samples and T the number of points. In higher dimensions the complexity will tend towards O(T*n^2).

Scalability can be boosted by using fewer seeds, for example by using a higher value of min_bin_freq in the get_bin_seeds function.

Note that the estimate_bandwidth function is much less scalable than the mean shift algorithm and will be the bottleneck if it is used.

References

Dorin Comaniciu and Peter Meer, “Mean Shift: A robust approach toward feature space analysis”. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002. pp. 603-619.

Full API documentation: MeanShiftScikitsLearnNode

class mdp.nodes.AffinityPropagationScikitsLearnNode

Perform Affinity Propagation Clustering of data. This node has been automatically generated by wrapping the sklearn.cluster._affinity_propagation.AffinityPropagation class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

dampingfloat, default=0.5

Damping factor (between 0.5 and 1) is the extent to which the current value is maintained relative to incoming values (weighted 1 - damping). This avoids numerical oscillations when updating these values (messages).
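
The damping rule can be read as a weighted average of the old and the newly computed message. The sketch below shows that update for a single scalar; the function name is hypothetical and the real algorithm applies the same idea to whole message matrices.

```python
def damped_update(old, incoming, damping=0.5):
    """Blend the current value with an incoming one."""
    return damping * old + (1 - damping) * incoming


# Repeatedly receiving the same incoming value moves the message
# toward it gradually instead of jumping there in one step.
message = 0.0
history = []
for _ in range(3):
    message = damped_update(message, incoming=1.0, damping=0.5)
    history.append(message)
print(history)  # [0.5, 0.75, 0.875]
```

A larger damping value makes each step smaller, which trades convergence speed for stability.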

max_iterint, default=200

Maximum number of iterations.

convergence_iterint, default=15

Number of iterations with no change in the number of estimated clusters that stops the convergence.

copybool, default=True

Make a copy of input data.

preferencearray-like of shape (n_samples,) or float, default=None

Preferences for each point. Points with larger preference values are more likely to be chosen as exemplars. The number of exemplars, i.e. of clusters, is influenced by the input preference values. If the preferences are not passed as arguments, they will be set to the median of the input similarities.

affinity{‘euclidean’, ‘precomputed’}, default=’euclidean’

Which affinity to use. At the moment ‘precomputed’ and ‘euclidean’ are supported. ‘euclidean’ uses the negative squared euclidean distance between points.
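
As a small illustration of the ‘euclidean’ affinity and of the default preference, the sketch below builds the negative squared distance matrix for three points and takes its median. Taking the median over the full matrix, including the zero diagonal, is an assumption of this sketch, and the function name is hypothetical.

```python
from statistics import median


def negative_squared_euclidean(a, b):
    """Similarity used by the 'euclidean' affinity: -||a - b||^2."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


X = [(1, 2), (1, 4), (4, 2)]
similarities = [[negative_squared_euclidean(a, b) for b in X] for a in X]

# Assumed default preference: median of all similarity values.
default_preference = median(value for row in similarities for value in row)
print(similarities)
print(default_preference)
```

Identical points have similarity 0, the largest possible value, and more distant pairs become increasingly negative.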

verbosebool, default=False

Whether to be verbose.

random_stateint or np.random.RandomState instance, default=0

Pseudo-random number generator to control the starting state. Use an int for reproducible results across function calls. See the Glossary.

New in version 0.23: this parameter was previously hardcoded as 0.

Attributes

cluster_centers_indices_ndarray of shape (n_clusters,)

Indices of cluster centers

cluster_centers_ndarray of shape (n_clusters, n_features)

Cluster centers (if affinity != precomputed).

labels_ndarray of shape (n_samples,)

Labels of each point

affinity_matrix_ndarray of shape (n_samples, n_samples)

Stores the affinity matrix used in fit.

n_iter_int

Number of iterations taken to converge.

Notes

For an example, see examples/cluster/plot_affinity_propagation.py.

The algorithmic complexity of affinity propagation is quadratic in the number of points.

When fit does not converge, cluster_centers_ becomes an empty array and all training samples will be labelled as -1. In addition, predict will then label every sample as -1.

When all training samples have equal similarities and equal preferences, the assignment of cluster centers and labels depends on the preference. If the preference is smaller than the similarities, fit will result in a single cluster center and label 0 for every sample. Otherwise, every training sample becomes its own cluster center and is assigned a unique label.

References

Brendan J. Frey and Delbert Dueck, “Clustering by Passing Messages Between Data Points”, Science Feb. 2007

Examples

>>> from sklearn.cluster import AffinityPropagation
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [4, 2], [4, 4], [4, 0]])
>>> clustering = AffinityPropagation(random_state=5).fit(X)
>>> clustering
AffinityPropagation(random_state=5)
>>> clustering.labels_
array([0, 0, 0, 1, 1, 1])
>>> clustering.predict([[0, 0], [4, 4]])
array([0, 1])
>>> clustering.cluster_centers_
array([[1, 2],
       [4, 2]])

Full API documentation: AffinityPropagationScikitsLearnNode

class mdp.nodes.FeatureAgglomerationScikitsLearnNode

Agglomerate features. This node has been automatically generated by wrapping the sklearn.cluster._agglomerative.FeatureAgglomeration class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Similar to AgglomerativeClustering, but recursively merges features instead of samples.

Read more in the User Guide.

Parameters

n_clustersint, default=2

The number of clusters to find. It must be None if distance_threshold is not None.

affinitystr or callable, default=’euclidean’

Metric used to compute the linkage. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or ‘precomputed’. If linkage is “ward”, only “euclidean” is accepted.

memorystr or object with the joblib.Memory interface, default=None

Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.

connectivityarray-like or callable, default=None

Connectivity matrix. Defines for each feature the neighboring features following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as one derived from kneighbors_graph. Default is None, i.e., the hierarchical clustering algorithm is unstructured.

compute_full_tree‘auto’ or bool, optional, default=’auto’

Stop the construction of the tree early at n_clusters. This is useful to decrease computation time if the number of clusters is not small compared to the number of features. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be True if distance_threshold is not None. By default compute_full_tree is “auto”, which is equivalent to True when distance_threshold is not None or n_clusters is less than the maximum of 100 and 0.02 * n_samples. Otherwise, “auto” is equivalent to False.

linkage{‘ward’, ‘complete’, ‘average’, ‘single’}, default=’ward’

Which linkage criterion to use. The linkage criterion determines which distance to use between sets of features. The algorithm will merge the pairs of clusters that minimize this criterion.

  • ward minimizes the variance of the clusters being merged.

  • average uses the average of the distances of each feature of the two sets.

  • complete or maximum linkage uses the maximum distances between all features of the two sets.

  • single uses the minimum of the distances between all observations of the two sets.

pooling_funccallable, default=np.mean

This combines the values of agglomerated features into a single value, and should accept an array of shape [M, N] and the keyword argument axis=1, and reduce it to an array of size [M].
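
A custom pooling_func only needs to honour the shape contract described above. The sketch below defines a max-pooling alternative to np.mean and applies it directly to a small array; pool_max is a hypothetical name, and passing it to FeatureAgglomeration is not shown here.

```python
import numpy as np


def pool_max(values, axis=1):
    """Reduce an (M, N) array to shape (M,) by taking the maximum."""
    return np.max(values, axis=axis)


# Two samples (M = 2), three agglomerated features (N = 3).
block = np.array([[1.0, 5.0, 3.0],
                  [4.0, 2.0, 6.0]])
pooled = pool_max(block, axis=1)
print(pooled.shape, pooled)
```

Any reduction with the same signature, such as np.median, fits the same contract.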

distance_thresholdfloat, default=None

The linkage distance threshold above which clusters will not be merged. If not None, n_clusters must be None and compute_full_tree must be True.

New in version 0.21.

Attributes

n_clusters_int

The number of clusters found by the algorithm. If distance_threshold=None, it will be equal to the given n_clusters.

labels_array-like of shape (n_features,)

Cluster labels for each feature.

n_leaves_int

Number of leaves in the hierarchical tree.

n_connected_components_int

The estimated number of connected components in the graph.

New in version 0.21: n_connected_components_ was added to replace n_components_.

children_array-like of shape (n_nodes-1, 2)

The children of each non-leaf node. Values less than n_features correspond to leaves of the tree, which are the original features. A node i greater than or equal to n_features is a non-leaf node and has children children_[i - n_features]. Alternatively, at the i-th iteration, children_[i][0] and children_[i][1] are merged to form node n_features + i.
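
The indexing convention of children_ can be traced with a short sketch. The children list below is a made-up example for four features, and both helper functions are hypothetical.

```python
def describe_merges(children, n_leaves):
    """List (new_node, left_child, right_child) for every merge step."""
    return [(n_leaves + i, left, right)
            for i, (left, right) in enumerate(children)]


def leaves_under(node, children, n_leaves):
    """Collect the original leaf indices below a node of the tree."""
    if node < n_leaves:
        return [node]
    left, right = children[node - n_leaves]
    return (leaves_under(left, children, n_leaves)
            + leaves_under(right, children, n_leaves))


# Hypothetical children_ for 4 features: merge 0+1, then 2+3, then both.
children = [(0, 1), (2, 3), (4, 5)]
merges = describe_merges(children, n_leaves=4)
print(merges)
print(leaves_under(6, children, n_leaves=4))
```

Node 6, created in the last iteration, is the root and therefore contains all four leaves.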

distances_array-like of shape (n_nodes-1,)

Distances between nodes in the corresponding place in children_. Only computed if distance_threshold is not None.

Examples

>>> import numpy as np
>>> from sklearn import datasets, cluster
>>> digits = datasets.load_digits()
>>> images = digits.images
>>> X = np.reshape(images, (len(images), -1))
>>> agglo = cluster.FeatureAgglomeration(n_clusters=32)
>>> agglo.fit(X)
FeatureAgglomeration(n_clusters=32)
>>> X_reduced = agglo.transform(X)
>>> X_reduced.shape
(1797, 32)

Full API documentation: FeatureAgglomerationScikitsLearnNode

class mdp.nodes.KMeansScikitsLearnNode

K-Means clustering. This node has been automatically generated by wrapping the sklearn.cluster._kmeans.KMeans class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

n_clustersint, default=8

The number of clusters to form as well as the number of centroids to generate.

init{‘k-means++’, ‘random’, ndarray, callable}, default=’k-means++’

Method for initialization:

‘k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details.

‘random’: choose n_clusters observations (rows) at random from data for the initial centroids.

If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.

If a callable is passed, it should take arguments X, n_clusters and a random state and return an initialization.

n_initint, default=10

Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.

max_iterint, default=300

Maximum number of iterations of the k-means algorithm for a single run.

tolfloat, default=1e-4

Relative tolerance with regard to the Frobenius norm of the difference in the cluster centers of two consecutive iterations to declare convergence. It’s not advised to set tol=0 since convergence might never be declared due to rounding errors. Use a very small number instead.

precompute_distances{‘auto’, True, False}, default=’auto’

Precompute distances (faster but takes more memory).

‘auto’ : do not precompute distances if n_samples * n_clusters > 12 million. This corresponds to about 100MB overhead per job using double precision.

True : always precompute distances.

False : never precompute distances.

Deprecated since version 0.23: ‘precompute_distances’ was deprecated in version 0.23 and will be removed in 0.25. It has no effect.

verboseint, default=0

Verbosity mode.

random_stateint, RandomState instance, default=None

Determines random number generation for centroid initialization. Use an int to make the randomness deterministic. See Glossary.

copy_xbool, default=True

When pre-computing distances it is more numerically accurate to center the data first. If copy_x is True (default), then the original data is not modified. If False, the original data is modified, and put back before the function returns, but small numerical differences may be introduced by subtracting and then adding the data mean. Note that if the original data is not C-contiguous, a copy will be made even if copy_x is False. If the original data is sparse, but not in CSR format, a copy will be made even if copy_x is False.

n_jobsint, default=None

The number of OpenMP threads to use for the computation. Parallelism is sample-wise on the main cython loop which assigns each sample to its closest center.

None or -1 means using all processors.

Deprecated since version 0.23: n_jobs was deprecated in version 0.23 and will be removed in 0.25.

algorithm{“auto”, “full”, “elkan”}, default=”auto”

K-means algorithm to use. The classical EM-style algorithm is “full”. The “elkan” variation is more efficient on data with well-defined clusters, by using the triangle inequality. However it’s more memory intensive due to the allocation of an extra array of shape (n_samples, n_clusters).

For now “auto” (kept for backward compatibility) chooses “elkan”, but this might change in the future in favor of a better heuristic.

Changed in version 0.18: Added Elkan algorithm

Attributes

cluster_centers_ndarray of shape (n_clusters, n_features)

Coordinates of cluster centers. If the algorithm stops before fully converging (see tol and max_iter), these will not be consistent with labels_.

labels_ndarray of shape (n_samples,)

Labels of each point

inertia_float

Sum of squared distances of samples to their closest cluster center.
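
The definition of inertia_ can be checked by hand on a small data set. The sketch below recomputes it in plain Python for six points and two centers; the function name is chosen for this example and no sklearn call is involved.

```python
def inertia(points, centers):
    """Sum of squared distances of points to their closest center."""
    total = 0.0
    for point in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(point, center))
            for center in centers
        )
    return total


X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
centers = [[10, 2], [1, 2]]
print(inertia(X, centers))  # 16.0
```

Each cluster contributes 0 + 4 + 4 = 8, which gives the total of 16.0.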

n_iter_int

Number of iterations run.

See also

MiniBatchKMeans

Alternative online implementation that does incremental updates of the center positions using mini-batches. For large-scale learning (say n_samples > 10k) MiniBatchKMeans is probably much faster than the default batch implementation.

Notes

The k-means problem is solved using either Lloyd’s or Elkan’s algorithm.

The average complexity is given by O(k n T), where k is the number of clusters, n is the number of samples and T is the number of iterations.

The worst case complexity is given by O(n^(k+2/p)) with n = n_samples, p = n_features. (D. Arthur and S. Vassilvitskii, ‘How slow is the k-means method?’ SoCG2006)

In practice, the k-means algorithm is very fast (one of the fastest clustering algorithms available), but it can converge to a local minimum. That’s why it can be useful to restart it several times.

If the algorithm stops before fully converging (because of tol or max_iter), labels_ and cluster_centers_ will not be consistent, i.e. the cluster_centers_ will not be the means of the points in each cluster. Also, the estimator will reassign labels_ after the last iteration to make labels_ consistent with predict on the training set.
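The Lloyd iterations mentioned in these notes can be sketched in plain NumPy. This is a minimal illustration, not the scikit-learn implementation: the function name lloyd_kmeans, the explicit init_centers argument and the absolute tolerance on the squared center shift are assumptions made for the example. It reuses the data from the example below and shows the final re-assignment that keeps the labels consistent with the returned centers.

```python
import numpy as np


def lloyd_kmeans(X, init_centers, max_iter=300, tol=1e-4):
    """Illustrative Lloyd iterations (not scikit-learn's implementation)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for n_iter in range(1, max_iter + 1):
        # Assignment step: squared distance of every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its members.
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        shift = ((new_centers - centers) ** 2).sum()
        centers = new_centers
        if shift <= tol:  # hypothetical absolute tolerance
            break
    # Re-assign once more so the labels agree with the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Inertia: sum of squared distances to the closest center.
    inertia = d2[np.arange(len(X)), labels].sum()
    return centers, labels, inertia, n_iter


X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
centers, labels, inertia, n_iter = lloyd_kmeans(X, init_centers=X[[0, 3]])
print(centers, labels, inertia, n_iter)
```

Because the result depends on the starting centers, running such a loop from several initializations and keeping the lowest inertia is the usual remedy for poor local minima.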

Examples

>>> from sklearn.cluster import KMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
>>> kmeans.labels_
array([1, 1, 1, 0, 0, 0], dtype=int32)
>>> kmeans.predict([[0, 0], [12, 3]])
array([1, 0], dtype=int32)
>>> kmeans.cluster_centers_
array([[10.,  2.],
       [ 1.,  2.]])

Full API documentation: KMeansScikitsLearnNode

class mdp.nodes.MiniBatchKMeansScikitsLearnNode

Mini-Batch K-Means clustering. This node has been automatically generated by wrapping the sklearn.cluster._kmeans.MiniBatchKMeans class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

n_clustersint, default=8

The number of clusters to form as well as the number of centroids to generate.

init{‘k-means++’, ‘random’} or ndarray of shape (n_clusters, n_features), default=’k-means++’

Method for initialization

‘k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details.

‘random’: choose k observations (rows) at random from data for the initial centroids.

If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.

max_iterint, default=100

Maximum number of iterations over the complete dataset before stopping independently of any early stopping criterion heuristics.

batch_sizeint, default=100

Size of the mini batches.

verboseint, default=0

Verbosity mode.

compute_labelsbool, default=True

Compute label assignment and inertia for the complete dataset once the minibatch optimization has converged in fit.

random_stateint, RandomState instance, default=None

Determines random number generation for centroid initialization and random reassignment. Use an int to make the randomness deterministic. See Glossary.

tolfloat, default=0.0

Control early stopping based on the relative center changes, as measured by a smoothed, variance-normalized mean of the squared center position changes. This early stopping heuristic is closer to the one used for the batch variant of the algorithm but induces a slight computational and memory overhead over the inertia heuristic.

To disable convergence detection based on normalized center change, set tol to 0.0 (default).

max_no_improvementint, default=10

Control early stopping based on the number of consecutive mini batches that do not yield an improvement on the smoothed inertia.

To disable convergence detection based on inertia, set max_no_improvement to None.

init_sizeint, default=None

Number of samples to randomly sample for speeding up the initialization (sometimes at the expense of accuracy): the algorithm is initialized by running a batch KMeans on a random subset of the data. This needs to be larger than n_clusters.

If None, init_size= 3 * batch_size.

n_initint, default=3

Number of random initializations that are tried. In contrast to KMeans, the algorithm is only run once, using the best of the n_init initializations as measured by inertia.

reassignment_ratiofloat, default=0.01

Control the fraction of the maximum number of counts for a center to be reassigned. A higher value means that low count centers are more easily reassigned, which means that the model will take longer to converge, but should converge in a better clustering.

Attributes

cluster_centers_ndarray of shape (n_clusters, n_features)

Coordinates of cluster centers

labels_ndarray of shape (n_samples,)

Labels of each point (if compute_labels is set to True).

inertia_float

The value of the inertia criterion associated with the chosen partition (if compute_labels is set to True). The inertia is defined as the sum of squared distances of samples to their nearest cluster center.

See Also

KMeans

The classic implementation of the clustering method based on the Lloyd’s algorithm. It consumes the whole set of input data at each iteration.

Notes

See https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
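The core idea of mini-batch k-means, a per-center learning rate that shrinks as a center accumulates samples, can be sketched as follows. This is a simplified illustration under stated assumptions: the helper name minibatch_update and the toy data are hypothetical, and the wrapped scikit-learn class adds initialization, reassignment of low-count centers and early stopping on top of this update.

```python
import numpy as np


def minibatch_update(centers, counts, batch):
    """One mini-batch step with per-center learning rates (illustrative)."""
    # Assign every sample in the batch using the centers as they stand
    # at the start of the step.
    d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    for x, k in zip(batch, nearest):
        counts[k] += 1
        eta = 1.0 / counts[k]  # learning rate shrinks as the count grows
        # Move the center towards the sample by a fraction eta.
        centers[k] = (1.0 - eta) * centers[k] + eta * x
    return centers, counts


centers = np.array([[0.0, 0.0], [10.0, 10.0]])
counts = np.zeros(2, dtype=int)
batch = np.array([[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]])
centers, counts = minibatch_update(centers, counts, batch)
print(centers)
print(counts)
```

With this learning rate each center is the running mean of all samples ever assigned to it, which is why only the counts and the centers need to be kept between batches.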

Examples

>>> from sklearn.cluster import MiniBatchKMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [4, 2], [4, 0], [4, 4],
...               [4, 5], [0, 1], [2, 2],
...               [3, 2], [5, 5], [1, -1]])
>>> # manually fit on batches
>>> kmeans = MiniBatchKMeans(n_clusters=2,
...                          random_state=0,
...                          batch_size=6)
>>> kmeans = kmeans.partial_fit(X[0:6,:])
>>> kmeans = kmeans.partial_fit(X[6:12,:])
>>> kmeans.cluster_centers_
array([[2. , 1. ],
       [3.5, 4.5]])
>>> kmeans.predict([[0, 0], [4, 4]])
array([0, 1], dtype=int32)
>>> # fit on the whole data
>>> kmeans = MiniBatchKMeans(n_clusters=2,
...                          random_state=0,
...                          batch_size=6,
...                          max_iter=10).fit(X)
>>> kmeans.cluster_centers_
array([[3.95918367, 2.40816327],
       [1.12195122, 1.3902439 ]])
>>> kmeans.predict([[0, 0], [4, 4]])
array([1, 0], dtype=int32)

Full API documentation: MiniBatchKMeansScikitsLearnNode

class mdp.nodes.BirchScikitsLearnNode

Implements the Birch clustering algorithm. This node has been automatically generated by wrapping the sklearn.cluster._birch.Birch class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. It is a memory-efficient, online-learning algorithm provided as an alternative to MiniBatchKMeans. It constructs a tree data structure with the cluster centroids being read off the leaf. These can be either the final cluster centroids or can be provided as input to another clustering algorithm such as AgglomerativeClustering.

Read more in the User Guide.

New in version 0.16.

Parameters

thresholdfloat, default=0.5

The radius of the subcluster obtained by merging a new sample and the closest subcluster should be less than the threshold. Otherwise a new subcluster is started. Setting this value to be very low promotes splitting and vice versa.

branching_factorint, default=50

Maximum number of CF subclusters in each node. If a new sample enters such that the number of subclusters exceeds the branching_factor, then that node is split into two nodes with the subclusters redistributed in each. The parent subcluster of that node is removed and two new subclusters are added as parents of the 2 split nodes.

n_clustersint, instance of sklearn.cluster model, default=3

Number of clusters after the final clustering step, which treats the subclusters from the leaves as new samples.

  • None : the final clustering step is not performed and the subclusters are returned as they are.

  • sklearn.cluster Estimator : If a model is provided, the model is fit treating the subclusters as new samples and the initial data is mapped to the label of the closest subcluster.

  • int : the model fit is AgglomerativeClustering with n_clusters set to be equal to the int.

compute_labelsbool, default=True

Whether or not to compute labels for each fit.

copybool, default=True

Whether or not to make a copy of the given data. If set to False, the initial data will be overwritten.

Attributes

root__CFNode

Root of the CFTree.

dummy_leaf__CFNode

Start pointer to all the leaves.

subcluster_centers_ndarray

Centroids of all subclusters read directly from the leaves.

subcluster_labels_ndarray

Labels assigned to the centroids of the subclusters after they are clustered globally.

labels_ndarray of shape (n_samples,)

Array of labels assigned to the input data. If partial_fit is used instead of fit, they are assigned to the last batch of data.

See Also

MiniBatchKMeans

Alternative implementation that does incremental updates of the centers’ positions using mini-batches.

Notes

The tree data structure consists of nodes with each node consisting of a number of subclusters. The maximum number of subclusters in a node is determined by the branching factor. Each subcluster maintains a linear sum, squared sum and the number of samples in that subcluster. In addition, each subcluster can also have a node as its child, if the subcluster is not a member of a leaf node.

For a new point entering the root, it is merged with the subcluster closest to it and the linear sum, squared sum and the number of samples of that subcluster are updated. This is done recursively till the properties of the leaf node are updated.
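The bookkeeping described above (linear sum, squared sum and sample count per subcluster) is enough to recover a centroid and a radius without storing the samples. The sketch below illustrates this with a hypothetical ClusteringFeature helper; it uses the standard BIRCH definition of the radius (root mean squared distance to the centroid) and is not the wrapped library's internal class.

```python
import numpy as np


class ClusteringFeature:
    """Hypothetical helper holding (n, linear sum, squared sum) for one subcluster."""

    def __init__(self, n_features):
        self.n = 0
        self.linear_sum = np.zeros(n_features)
        self.squared_sum = 0.0

    def add(self, x):
        # Merging a sample only updates the three summary statistics.
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.linear_sum += x
        self.squared_sum += float(x @ x)

    @property
    def centroid(self):
        return self.linear_sum / self.n

    @property
    def radius(self):
        # Root mean squared distance of the members to the centroid.
        c = self.centroid
        return float(np.sqrt(max(self.squared_sum / self.n - c @ c, 0.0)))


cf = ClusteringFeature(n_features=2)
for point in [[0, 1], [0.3, 1], [-0.3, 1]]:
    cf.add(point)
threshold = 0.5
print(cf.centroid, cf.radius, cf.radius < threshold)
```

Here the three points fit in one subcluster because the merged radius stays below the threshold; a sample that pushed the radius above it would start a new subcluster instead.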

Examples

>>> from sklearn.cluster import Birch
>>> X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
>>> brc = Birch(n_clusters=None)
>>> brc.fit(X)
Birch(n_clusters=None)
>>> brc.predict(X)
array([0, 0, 0, 1, 1, 1])

Full API documentation: BirchScikitsLearnNode

class mdp.nodes.PLSCanonicalScikitsLearnNode

PLSCanonical implements the 2 blocks canonical PLS of the original Wold algorithm [Tenenhaus 1998] p.204, referred to as PLS-C2A in [Wegelin 2000]. This node has been automatically generated by wrapping the sklearn.cross_decomposition._pls.PLSCanonical class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This class inherits from PLS with mode=”A” and deflation_mode=”canonical”, norm_y_weights=True and algorithm=”nipals”, but svd should provide similar results up to numerical errors.

Read more in the User Guide.

New in version 0.8.

Parameters

n_componentsint, (default 2).

Number of components to keep

scaleboolean, (default True)

Option to scale data

algorithmstring, “nipals” or “svd”

The algorithm used to estimate the weights. It will be called n_components times, i.e. once for each iteration of the outer loop.

max_iteran integer, (default 500)

the maximum number of iterations of the NIPALS inner loop (used only if algorithm=”nipals”)

tolnon-negative real, default 1e-06

the tolerance used in the iterative algorithm

copyboolean, default True

Whether the deflation should be done on a copy. Leave the default value at True unless you don’t care about side effects.

Attributes

x_weights_array, shape = [p, n_components]

X block weights vectors.

y_weights_array, shape = [q, n_components]

Y block weights vectors.

x_loadings_array, shape = [p, n_components]

X block loadings vectors.

y_loadings_array, shape = [q, n_components]

Y block loadings vectors.

x_scores_array, shape = [n_samples, n_components]

X scores.

y_scores_array, shape = [n_samples, n_components]

Y scores.

x_rotations_array, shape = [p, n_components]

X block to latents rotations.

y_rotations_array, shape = [q, n_components]

Y block to latents rotations.

coef_array of shape (p, q)

The coefficients of the linear model: Y = X ``coef_ + Err``

n_iter_array-like

Number of iterations of the NIPALS inner loop for each component. Not useful if the algorithm provided is “svd”.

Notes

Matrices:

T: ``x_scores_``
U: ``y_scores_``
W: ``x_weights_``
C: ``y_weights_``
P: ``x_loadings_``
Q: ``y_loadings_``

Are computed such that:

X = T P.T + Err and Y = U Q.T + Err
T[:, k] = Xk W[:, k] for k in range(n_components)
U[:, k] = Yk C[:, k] for k in range(n_components)
``x_rotations_`` = W (P.T W)^(-1)
``y_rotations_`` = C (Q.T C)^(-1)

where Xk and Yk are residual matrices at iteration k.

Slides explaining PLS

For each component k, find weights u, v that optimize:

max corr(Xk u, Yk v) * std(Xk u) std(Yk v), such that ``|u| = |v| = 1``

Note that it maximizes both the correlations between the scores and the intra-block variances.

The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.

The residual matrix of Y (Yk+1) block is obtained by deflation on the current Y score. This performs a canonical symmetric version of the PLS regression, which is slightly different from CCA. This is mostly used for modeling.

This implementation provides the same results as the “plspm” package provided in the R language (R-project), using the function plsca(X, Y). Results are equal or collinear with the function pls(..., mode = "canonical") of the “mixOmics” package. The difference lies in the fact that the mixOmics implementation does not exactly implement the Wold algorithm, since it does not normalize y_weights to one.
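The inner loop that estimates one pair of weight vectors can be sketched as an alternating regression. This is a simplified illustration, assuming mode A with normalized Y weights as described for this class; the function name nipals_first_weights, the stopping rule and the column scaling with ddof=1 are assumptions made for the example, and deflation for further components is left out.

```python
import numpy as np


def nipals_first_weights(X, Y, max_iter=500, tol=1e-06):
    """First pair of weight vectors via a mode-A NIPALS inner loop (sketch)."""
    y_score = Y[:, 0].copy()
    x_weights_old = np.zeros(X.shape[1])
    for n_iter in range(1, max_iter + 1):
        # Regress the columns of X on the current Y score, then normalize.
        x_weights = X.T @ y_score / (y_score @ y_score)
        x_weights /= np.linalg.norm(x_weights)
        x_score = X @ x_weights
        # Regress the columns of Y on the X score; the canonical variant
        # also normalizes the Y weights to unit length.
        y_weights = Y.T @ x_score / (x_score @ x_score)
        y_weights /= np.linalg.norm(y_weights)
        y_score = Y @ y_weights
        if np.linalg.norm(x_weights - x_weights_old) < tol:
            break
        x_weights_old = x_weights
    return x_weights, y_weights, n_iter


X = np.array([[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
Y = np.array([[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]])
# Center and scale each column (assumed equivalent of scale=True).
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
u, v, n_iter = nipals_first_weights(Xs, Ys)
print(u, v, n_iter)
```

With both weight vectors normalized, the loop is a power iteration on the cross-covariance matrix, so for the first component it should agree, up to sign, with the leading singular vectors that the “svd” option would return.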

Examples

>>> from sklearn.cross_decomposition import PLSCanonical
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> plsca = PLSCanonical(n_components=2)
>>> plsca.fit(X, Y)
PLSCanonical()
>>> X_c, Y_c = plsca.transform(X, Y)

References

Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.

Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.

See also

CCA PLSSVD

Full API documentation: PLSCanonicalScikitsLearnNode

class mdp.nodes.PLSRegressionScikitsLearnNode

PLS regression. This node has been automatically generated by wrapping the sklearn.cross_decomposition._pls.PLSRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. PLSRegression implements the PLS 2 blocks regression known as PLS2, or PLS1 in the case of a one-dimensional response. This class inherits from _PLS with mode=”A”, deflation_mode=”regression”, norm_y_weights=False and algorithm=”nipals”.

Read more in the User Guide.

New in version 0.8.

Parameters

n_componentsint, (default 2)

Number of components to keep.

scaleboolean, (default True)

whether to scale the data

max_iteran integer, (default 500)

the maximum number of iterations of the NIPALS inner loop (used only if algorithm=”nipals”)

tolnon-negative real, default 1e-06

Tolerance used in the iterative algorithm.

copyboolean, default True

Whether the deflation should be done on a copy. Leave the default value at True unless you don’t care about side effects.

Attributes

x_weights_array, [p, n_components]

X block weights vectors.

y_weights_array, [q, n_components]

Y block weights vectors.

x_loadings_array, [p, n_components]

X block loadings vectors.

y_loadings_array, [q, n_components]

Y block loadings vectors.

x_scores_array, [n_samples, n_components]

X scores.

y_scores_array, [n_samples, n_components]

Y scores.

x_rotations_array, [p, n_components]

X block to latents rotations.

y_rotations_array, [q, n_components]

Y block to latents rotations.

coef_array, [p, q]

The coefficients of the linear model: Y = X ``coef_ + Err``

n_iter_array-like

Number of iterations of the NIPALS inner loop for each component.

Notes

Matrices:

T: ``x_scores_``
U: ``y_scores_``
W: ``x_weights_``
C: ``y_weights_``
P: ``x_loadings_``
Q: ``y_loadings_``

Are computed such that:

X = T P.T + Err and Y = U Q.T + Err
T[:, k] = Xk W[:, k] for k in range(n_components)
U[:, k] = Yk C[:, k] for k in range(n_components)
``x_rotations_`` = W (P.T W)^(-1)
``y_rotations_`` = C (Q.T C)^(-1)

where Xk and Yk are residual matrices at iteration k.

Slides explaining PLS

For each component k, find weights u, v that optimize:

max corr(Xk u, Yk v) * std(Xk u) std(Yk v), such that |u| = 1

Note that it maximizes both the correlations between the scores and the intra-block variances.

The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.

The residual matrix of Y (Yk+1) block is obtained by deflation on the current X score. This performs the PLS regression known as PLS2. This mode is prediction oriented.

This implementation provides the same results as 3 PLS packages provided in the R language (R-project):

  • “mixOmics” with function pls(X, Y, mode = “regression”)

  • “plspm” with function plsreg2(X, Y)

  • “pls” with function oscorespls.fit(X, Y)

Examples

>>> from sklearn.cross_decomposition import PLSRegression
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> pls2 = PLSRegression(n_components=2)
>>> pls2.fit(X, Y)
PLSRegression()
>>> Y_pred = pls2.predict(X)

References

Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.

In French but still a reference:

Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.

Full API documentation: PLSRegressionScikitsLearnNode

class mdp.nodes.PLSSVDScikitsLearnNode

Partial Least Square SVD. This node has been automatically generated by wrapping the sklearn.cross_decomposition._pls.PLSSVD class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Simply performs an SVD on the cross-covariance matrix X’Y. There is no iterative deflation here.

Read more in the User Guide.

New in version 0.8.

Parameters

n_componentsint, default 2

Number of components to keep.

scaleboolean, default True

Whether to scale X and Y.

copyboolean, default True

Whether to copy X and Y, or perform in-place computations.

Attributes

x_weights_array, [p, n_components]

X block weights vectors.

y_weights_array, [q, n_components]

Y block weights vectors.

x_scores_array, [n_samples, n_components]

X scores.

y_scores_array, [n_samples, n_components]

Y scores.

Examples

>>> import numpy as np
>>> from sklearn.cross_decomposition import PLSSVD
>>> X = np.array([[0., 0., 1.],
...     [1.,0.,0.],
...     [2.,2.,2.],
...     [2.,5.,4.]])
>>> Y = np.array([[0.1, -0.2],
...     [0.9, 1.1],
...     [6.2, 5.9],
...     [11.9, 12.3]])
>>> plsca = PLSSVD(n_components=2)
>>> plsca.fit(X, Y)
PLSSVD()
>>> X_c, Y_c = plsca.transform(X, Y)
>>> X_c.shape, Y_c.shape
((4, 2), (4, 2))
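What “an SVD on the cross-covariance matrix” amounts to can be sketched directly in NumPy. This is an illustration under assumptions, not the wrapped implementation: the helper center_scale and the choice of ddof=1 for the column standard deviations are made up for the example, and the signs of the resulting vectors may differ from what the library returns.

```python
import numpy as np

X = np.array([[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
Y = np.array([[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]])


def center_scale(A):
    """Hypothetical helper: zero mean and unit variance per column."""
    A = A - A.mean(axis=0)
    return A / A.std(axis=0, ddof=1)


Xs, Ys = center_scale(X), center_scale(Y)
n_components = 2

# Cross-covariance matrix X'Y (up to a constant factor) and its SVD.
C = Xs.T @ Ys
U, s, Vt = np.linalg.svd(C, full_matrices=False)

x_weights = U[:, :n_components]      # left singular vectors
y_weights = Vt.T[:, :n_components]   # right singular vectors
x_scores = Xs @ x_weights            # projections of X
y_scores = Ys @ y_weights            # projections of Y
print(x_scores.shape, y_scores.shape)
```

Because the weights are singular vectors of the same matrix, the cross-products between X scores and Y scores form a diagonal matrix holding the singular values, which is the sense in which all components are found in one step rather than by deflation.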

See also

PLSCanonical CCA

Full API documentation: PLSSVDScikitsLearnNode

class mdp.nodes.CCAScikitsLearnNode

CCA Canonical Correlation Analysis. This node has been automatically generated by wrapping the sklearn.cross_decomposition._cca.CCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. CCA inherits from PLS with mode=”B” and deflation_mode=”canonical”.

Read more in the User Guide.

Parameters

n_componentsint, (default 2).

number of components to keep.

scaleboolean, (default True)

Whether to scale the data.

max_iteran integer, (default 500)

the maximum number of iterations of the NIPALS inner loop

tolnon-negative real, default 1e-06.

the tolerance used in the iterative algorithm

copyboolean

Whether the deflation should be done on a copy. Leave the default value at True unless you don’t care about side effects.

Attributes

x_weights_array, [p, n_components]

X block weights vectors.

y_weights_array, [q, n_components]

Y block weights vectors.

x_loadings_array, [p, n_components]

X block loadings vectors.

y_loadings_array, [q, n_components]

Y block loadings vectors.

x_scores_array, [n_samples, n_components]

X scores.

y_scores_array, [n_samples, n_components]

Y scores.

x_rotations_array, [p, n_components]

X block to latents rotations.

y_rotations_array, [q, n_components]

Y block to latents rotations.

coef_array of shape (p, q)

The coefficients of the linear model: Y = X ``coef_ + Err``

n_iter_array-like

Number of iterations of the NIPALS inner loop for each component.

Notes

For each component k, find the weights u, v that maximize corr(Xk u, Yk v), such that |u| = |v| = 1

Note that it maximizes only the correlations between the scores.

The residual matrix of X (Xk+1) block is obtained by the deflation on the current X score: x_score.

The residual matrix of Y (Yk+1) block is obtained by deflation on the current Y score.

Examples

>>> from sklearn.cross_decomposition import CCA
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> cca = CCA(n_components=1)
>>> cca.fit(X, Y)
CCA(n_components=1)
>>> X_c, Y_c = cca.transform(X, Y)

References

Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.

In French but still a reference:

Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.

See also

PLSCanonical PLSSVD

Full API documentation: CCAScikitsLearnNode

class mdp.nodes.LinearDiscriminantAnalysisScikitsLearnNode

Linear Discriminant Analysis. This node has been automatically generated by wrapping the sklearn.discriminant_analysis.LinearDiscriminantAnalysis class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.

The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.

The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions, using the transform method.

New in version 0.17: LinearDiscriminantAnalysis.

Read more in the User Guide.

Parameters

solver{‘svd’, ‘lsqr’, ‘eigen’}, default=’svd’

Solver to use, possible values:

    • ‘svd’: Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features.

    • ‘lsqr’: Least squares solution, can be combined with shrinkage.

    • ‘eigen’: Eigenvalue decomposition, can be combined with shrinkage.

shrinkage‘auto’ or float, default=None

Shrinkage parameter, possible values:

    • None: no shrinkage (default).

    • ‘auto’: automatic shrinkage using the Ledoit-Wolf lemma.

    • float between 0 and 1: fixed shrinkage parameter.

Note that shrinkage works only with ‘lsqr’ and ‘eigen’ solvers.

priorsarray-like of shape (n_classes,), default=None

The class prior probabilities. By default, the class proportions are inferred from the training data.

n_componentsint, default=None

Number of components (<= min(n_classes - 1, n_features)) for dimensionality reduction. If None, will be set to min(n_classes - 1, n_features). This parameter only affects the transform method.

store_covariancebool, default=False

If True, explicitly compute the weighted within-class covariance matrix when solver is ‘svd’. The matrix is always computed and stored for the other solvers.

New in version 0.17.

tolfloat, default=1.0e-4

Absolute threshold for a singular value of X to be considered significant, used to estimate the rank of X. Dimensions whose singular values are non-significant are discarded. Only used if solver is ‘svd’.

New in version 0.17.

Attributes

coef_ndarray of shape (n_features,) or (n_classes, n_features)

Weight vector(s).

intercept_ndarray of shape (n_classes,)

Intercept term.

covariance_array-like of shape (n_features, n_features)

Weighted within-class covariance matrix. It corresponds to sum_k prior_k * C_k where C_k is the covariance matrix of the samples in class k. The C_k are estimated using the (potentially shrunk) biased estimator of covariance. If solver is ‘svd’, only exists when store_covariance is True.

explained_variance_ratio_ndarray of shape (n_components,)

Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variances is equal to 1.0. Only available when eigen or svd solver is used.

means_array-like of shape (n_classes, n_features)

Class-wise means.

priors_array-like of shape (n_classes,)

Class priors (sum to 1).

scalings_array-like of shape (rank, n_classes - 1)

Scaling of the features in the space spanned by the class centroids. Only available for ‘svd’ and ‘eigen’ solvers.

xbar_array-like of shape (n_features,)

Overall mean. Only present if solver is ‘svd’.

classes_array-like of shape (n_classes,)

Unique class labels.

See also

sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis: Quadratic Discriminant Analysis

Examples

>>> import numpy as np
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LinearDiscriminantAnalysis()
>>> clf.fit(X, y)
LinearDiscriminantAnalysis()
>>> print(clf.predict([[-0.8, -1]]))
[1]
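The covariance_ attribute is described above as sum_k prior_k * C_k. A minimal NumPy sketch of that formula on the same toy data, assuming no shrinkage and priors inferred from the class proportions, might look as follows; the variable names are chosen for the example only.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2])

# Class proportions serve as priors when none are given.
classes, counts = np.unique(y, return_counts=True)
priors = counts / counts.sum()

# Weighted within-class covariance: sum_k prior_k * C_k, where C_k is the
# biased (divide by n_k) covariance of the samples in class k.
covariance = sum(
    prior * np.cov(X[y == label].T, bias=True)
    for label, prior in zip(classes, priors)
)
print(covariance)
```

Both classes in this data set have the same spread around their means, so the pooled matrix equals each per-class covariance, which matches the shared-covariance assumption of the model.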

Full API documentation: LinearDiscriminantAnalysisScikitsLearnNode

class mdp.nodes.QuadraticDiscriminantAnalysisScikitsLearnNode

Quadratic Discriminant Analysis. This node has been automatically generated by wrapping the sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.

The model fits a Gaussian density to each class.

New in version 0.17: QuadraticDiscriminantAnalysis

Read more in the User Guide.

Parameters

priorsndarray of shape (n_classes,), default=None

Class priors. By default, the class proportions are inferred from the training data.

reg_paramfloat, default=0.0

Regularizes the per-class covariance estimates by transforming S2 as S2 = (1 - reg_param) * S2 + reg_param * np.eye(n_features), where S2 corresponds to the scaling_ attribute of a given class.

store_covariancebool, default=False

If True, the class covariance matrices are explicitly computed and stored in the self.covariance_ attribute.

New in version 0.17.

tolfloat, default=1.0e-4

Absolute threshold for a singular value to be considered significant, used to estimate the rank of Xk where Xk is the centered matrix of samples in class k. This parameter does not affect the predictions. It only controls a warning that is raised when features are considered to be collinear.

New in version 0.17.

Attributes

covariance_list of len n_classes of ndarray of shape (n_features, n_features)

For each class, gives the covariance matrix estimated using the samples of that class. The estimations are unbiased. Only present if store_covariance is True.

means_array-like of shape (n_classes, n_features)

Class-wise means.

priors_array-like of shape (n_classes,)

Class priors (sum to 1).

rotations_list of len n_classes of ndarray of shape (n_features, n_k)

For each class k an array of shape (n_features, n_k), where n_k = min(n_features, number of elements in class k). It is the rotation of the Gaussian distribution, i.e. its principal axis. It corresponds to V, the matrix of eigenvectors coming from the SVD of Xk = U S Vt where Xk is the centered matrix of samples from class k.

scalings_list of len n_classes of ndarray of shape (n_k,)

For each class, contains the scaling of the Gaussian distributions along its principal axes, i.e. the variance in the rotated coordinate system. It corresponds to S^2 / (n_samples - 1), where S is the diagonal matrix of singular values from the SVD of Xk, where Xk is the centered matrix of samples from class k.

classes_ndarray of shape (n_classes,)

Unique class labels.

Examples

>>> from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = QuadraticDiscriminantAnalysis()
>>> clf.fit(X, y)
QuadraticDiscriminantAnalysis()
>>> print(clf.predict([[-0.8, -1]]))
[1]
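The relation between rotations_, scalings_ and the per-class covariance can be checked by hand for one class. The sketch below is an illustration under assumptions rather than the wrapped code: it takes the SVD of the centered samples of class 1, divides the squared singular values by the number of samples in that class minus one, and applies the reg_param formula quoted above with reg_param left at 0.0.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2])
reg_param = 0.0

# Centered matrix of the samples in class 1.
Xk = X[y == 1]
Xk_centered = Xk - Xk.mean(axis=0)

# SVD of the centered class matrix: Xk = U S Vt.
_, S, Vt = np.linalg.svd(Xk_centered, full_matrices=False)
scalings = S ** 2 / (len(Xk) - 1)                  # variances along principal axes
scalings = (1 - reg_param) * scalings + reg_param  # regularization, a no-op here
rotations = Vt.T                                   # columns are the principal axes

# Rebuild the class covariance from rotations and scalings.
covariance_k = rotations @ np.diag(scalings) @ rotations.T
print(covariance_k)
```

The rebuilt matrix equals the unbiased sample covariance of the class, so storing the rotation and the scalings is an equivalent, factored representation of each class Gaussian.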

See also

sklearn.discriminant_analysis.LinearDiscriminantAnalysis: Linear Discriminant Analysis

Full API documentation: QuadraticDiscriminantAnalysisScikitsLearnNode

class mdp.nodes.DummyClassifierScikitsLearnNode

DummyClassifier is a classifier that makes predictions using simple rules. This node has been automatically generated by wrapping the sklearn.dummy.DummyClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This classifier is useful as a simple baseline to compare with other (real) classifiers. Do not use it for real problems.

Read more in the User Guide.

New in version 0.13.

Parameters

strategystr, default=”stratified”

Strategy to use to generate predictions.

  • “stratified”: generates predictions by respecting the training set’s class distribution.

  • “most_frequent”: always predicts the most frequent label in the training set.

  • “prior”: always predicts the class that maximizes the class prior (like “most_frequent”) and predict_proba returns the class prior.

  • “uniform”: generates predictions uniformly at random.

  • “constant”: always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class

    Changed in version 0.22: The default value of strategy will change to “prior” in version 0.24. Starting from version 0.22, a warning will be raised if strategy is not explicitly set.

    New in version 0.17: Dummy Classifier now supports prior fitting strategy using parameter prior.

random_stateint, RandomState instance or None, optional, default=None

Controls the randomness to generate the predictions when strategy='stratified' or strategy='uniform'. Pass an int for reproducible output across multiple function calls. See Glossary.

constantint or str or array-like of shape (n_outputs,)

The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.

Attributes

classes_array or list of array of shape (n_classes,)

Class labels for each output.

n_classes_array or list of array of shape (n_classes,)

Number of labels for each output.

class_prior_array or list of array of shape (n_classes,)

Probability of each class for each output.

n_outputs_int,

Number of outputs.

sparse_output_bool,

True if the array returned from predict is to be in sparse CSC format. Is automatically set to True if the input y is passed in sparse format.

Examples

>>> import numpy as np
>>> from sklearn.dummy import DummyClassifier
>>> X = np.array([-1, 1, 1, 1])
>>> y = np.array([0, 1, 1, 1])
>>> dummy_clf = DummyClassifier(strategy="most_frequent")
>>> dummy_clf.fit(X, y)
DummyClassifier(strategy='most_frequent')
>>> dummy_clf.predict(X)
array([1, 1, 1, 1])
>>> dummy_clf.score(X, y)
0.75
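The rules behind these strategies are simple enough to reproduce by hand, which helps when interpreting a baseline score. The following sketch recomputes the class prior, the majority-class prediction and its accuracy for the labels above, and draws one “stratified” sample; it is an illustration only, and the random seed is an arbitrary choice for the example.

```python
import numpy as np

y = np.array([0, 1, 1, 1])

# Empirical class distribution of the training labels.
classes, counts = np.unique(y, return_counts=True)
class_prior = counts / counts.sum()
most_frequent = classes[class_prior.argmax()]

# "most_frequent" and "prior" both predict the majority class everywhere.
y_pred = np.full(len(y), most_frequent)
accuracy = (y_pred == y).mean()

# "stratified" instead samples labels from the class distribution.
rng = np.random.default_rng(0)  # seed chosen only for the example
stratified_pred = rng.choice(classes, size=len(y), p=class_prior)

print(class_prior, most_frequent, accuracy)
```

The accuracy of the majority-class rule is simply the largest class prior, so any real classifier should be compared against that number rather than against 0.5.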

Full API documentation: DummyClassifierScikitsLearnNode

class mdp.nodes.DummyRegressorScikitsLearnNode

DummyRegressor is a regressor that makes predictions using simple rules. This node has been automatically generated by wrapping the sklearn.dummy.DummyRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This regressor is useful as a simple baseline to compare with other (real) regressors. Do not use it for real problems.

Read more in the User Guide.

New in version 0.13.

Parameters

strategystr

Strategy to use to generate predictions.

  • “mean”: always predicts the mean of the training set

  • “median”: always predicts the median of the training set

  • “quantile”: always predicts a specified quantile of the training set, provided with the quantile parameter.

  • “constant”: always predicts a constant value that is provided by the user.

constantint or float or array-like of shape (n_outputs,)

The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.

quantilefloat in [0.0, 1.0]

The quantile to predict using the “quantile” strategy. A quantile of 0.5 corresponds to the median, 0.0 to the minimum, and 1.0 to the maximum.
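
To make the three reference points concrete, here is a small pure-Python sketch of a quantile with linear interpolation between order statistics. The interpolation rule and the helper name linear_quantile are assumptions for illustration; the text does not say which interpolation DummyRegressor applies internally.

```python
import math

def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0.0, 1.0]")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)   # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight

y = [2.0, 3.0, 5.0, 10.0]
print(linear_quantile(y, 0.0))  # minimum of y
print(linear_quantile(y, 0.5))  # median of y
print(linear_quantile(y, 1.0))  # maximum of y
```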

Attributes

constant_array, shape (1, n_outputs)

Mean or median or quantile of the training targets or constant value given by the user.

n_outputs_int

Number of outputs.

Examples

>>> import numpy as np
>>> from sklearn.dummy import DummyRegressor
>>> X = np.array([1.0, 2.0, 3.0, 4.0])
>>> y = np.array([2.0, 3.0, 5.0, 10.0])
>>> dummy_regr = DummyRegressor(strategy="mean")
>>> dummy_regr.fit(X, y)
DummyRegressor()
>>> dummy_regr.predict(X)
array([5., 5., 5., 5.])
>>> dummy_regr.score(X, y)
0.0
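
The score of 0.0 is not a coincidence: for a regressor, score is the coefficient of determination R^2, and a model that always predicts the training mean has R^2 = 0 on the training targets. A pure-Python sketch of that arithmetic (the helper name r_squared is illustrative):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y = [2.0, 3.0, 5.0, 10.0]
mean_prediction = [sum(y) / len(y)] * len(y)   # what strategy="mean" predicts
print(mean_prediction, r_squared(y, mean_prediction))
```

A real regressor is only informative if its score is clearly above this baseline.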

Full API documentation: DummyRegressorScikitsLearnNode

class mdp.nodes.RandomForestClassifierScikitsLearnNode

A random forest classifier. This node has been automatically generated by wrapping the sklearn.ensemble._forest.RandomForestClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default), otherwise the whole dataset is used to build each tree.

Read more in the User Guide.

Parameters

n_estimatorsint, default=100

The number of trees in the forest.

Changed in version 0.22: The default value of n_estimators changed from 10 to 100 in 0.22.

criterion{“gini”, “entropy”}, default=”gini”

The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Note: this parameter is tree-specific.
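
As a rough illustration of the two criteria, the sketch below computes both impurities from the class counts of a single node. It assumes entropy is measured in bits (log base 2), which the text does not state; the function names are illustrative.

```python
import math

def gini(counts):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy in bits, skipping classes with no samples."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in proportions)

print(gini([5, 5]), entropy([5, 5]))    # evenly mixed two-class node
print(gini([10, 0]), entropy([10, 0]))  # pure node
```

Both measures are zero for a pure node and largest when the classes are evenly mixed.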

max_depthint, default=None

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_splitint or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.
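
The int-versus-float rule for min_samples_split (and, identically, min_samples_leaf) can be sketched as follows. The helper resolve_min_samples is hypothetical and only mirrors the ceil(...) expression given in the text; any extra clamping sklearn may apply is not shown.

```python
import math

def resolve_min_samples(value, n_samples):
    """Turn an int-or-float setting into an absolute sample count."""
    if isinstance(value, int):
        return value                        # int: used as the count directly
    return math.ceil(value * n_samples)     # float: fraction of n_samples, rounded up

print(resolve_min_samples(2, 10))      # int setting
print(resolve_min_samples(0.25, 10))   # 0.25 * 10 = 2.5, rounded up
```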

min_samples_leafint or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaffloat, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_features{“auto”, “sqrt”, “log2”}, int or float, default=”auto”

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=sqrt(n_features).

  • If “sqrt”, then max_features=sqrt(n_features) (same as “auto”).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires effectively inspecting more than max_features features.
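
The max_features rules listed above for the classifier can be summarised in a few lines. This is a sketch, not sklearn's implementation: resolve_max_features is a hypothetical helper, and truncating the square root and logarithm to an integer is an assumption.

```python
import math

def resolve_max_features(max_features, n_features):
    """Number of candidate features per split, following the listed rules."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            return int(math.sqrt(n_features))   # truncation is an assumption
        if max_features == "log2":
            return int(math.log2(n_features))   # truncation is an assumption
        raise ValueError(f"unknown max_features: {max_features!r}")
    if isinstance(max_features, int):
        return max_features
    return int(max_features * n_features)       # float: fraction of the features

for setting in (None, "auto", "sqrt", "log2", 10, 0.25):
    print(setting, resolve_max_features(setting, 64))
```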

max_leaf_nodesint, default=None

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.

min_impurity_decreasefloat, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
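
A worked instance of the equation may help. The function below is a direct transcription of the formula; the sample counts, impurities, and threshold are made-up numbers chosen only for illustration.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Weighted impurity decrease of one candidate split."""
    return N_t / N * (impurity
                      - N_t_R / N_t * right_impurity
                      - N_t_L / N_t * left_impurity)

# Hypothetical node: 40 of 100 samples reach it; the split sends 10 left, 30 right.
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4)
min_impurity_decrease = 0.05   # hypothetical threshold
print(decrease, decrease >= min_impurity_decrease)
```

Here the decrease is about 0.08, so the node would be split under the hypothetical threshold of 0.05.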

New in version 0.19.

min_impurity_splitfloat, default=None

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

bootstrapbool, default=True

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

oob_scorebool, default=False

Whether to use out-of-bag samples to estimate the generalization accuracy.

n_jobsint, default=None

The number of jobs to run in parallel. fit(), predict(), decision_path() and apply() are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint or RandomState, default=None

Controls both the randomness of the bootstrapping of the samples used when building trees (if bootstrap=True) and the sampling of the features to consider when looking for the best split at each node (if max_features < n_features). See Glossary for details.

verboseint, default=0

Controls the verbosity when fitting and predicting.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest. See the Glossary.

class_weight{“balanced”, “balanced_subsample”}, dict or list of dicts, default=None

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

The “balanced_subsample” mode is the same as “balanced” except that weights are computed based on the bootstrap sample for every tree grown.

For multi-output, the weights of each column of y will be multiplied.

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
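
The "balanced" formula n_samples / (n_classes * np.bincount(y)) can be reproduced without NumPy for a single-output problem. This sketch uses collections.Counter in place of np.bincount, so it only covers labels that actually occur in y; the helper name is illustrative.

```python
from collections import Counter

def balanced_class_weights(y):
    """n_samples / (n_classes * count) for each class label in y."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

y = [0, 0, 0, 1]
weights = balanced_class_weights(y)
print(weights)
```

The rare class 1 receives the larger weight, which is the intended effect of weighting inversely to class frequency.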

ccp_alphanon-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See minimal_cost_complexity_pruning for details.

New in version 0.22.

max_samplesint or float, default=None

If bootstrap is True, the number of samples to draw from X to train each base estimator.

  • If None (default), then draw X.shape[0] samples.

  • If int, then draw max_samples samples.

  • If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0, 1).

New in version 0.22.
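
The three max_samples cases can be sketched as a small resolver. The helper bootstrap_sample_size is hypothetical, and rounding the float case to the nearest integer is an assumption, since the text only gives the product max_samples * X.shape[0].

```python
def bootstrap_sample_size(max_samples, n_samples):
    """Number of samples drawn for each tree when bootstrap=True."""
    if max_samples is None:
        return n_samples                    # default: as many draws as rows in X
    if isinstance(max_samples, int):
        return max_samples                  # int: absolute number of draws
    if not 0.0 < max_samples < 1.0:
        raise ValueError("a float max_samples must lie in (0, 1)")
    return int(round(max_samples * n_samples))  # rounding is an assumption

for setting in (None, 200, 0.5):
    print(setting, bootstrap_sample_size(setting, 1000))
```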

Attributes

base_estimator_DecisionTreeClassifier

The child estimator template used to create the collection of fitted sub-estimators.

estimators_list of DecisionTreeClassifier

The collection of fitted sub-estimators.

classes_ndarray of shape (n_classes,) or a list of such arrays

The class labels (single output problem), or a list of arrays of class labels (multi-output problem).

n_classes_int or list

The number of classes (single output problem), or a list containing the number of classes for each output (multi-output problem).

n_features_int

The number of features when fit is performed.

n_outputs_int

The number of outputs when fit is performed.

feature_importances_ndarray of shape (n_features,)

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.
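
The phrase "(normalized) total reduction of the criterion" can be illustrated for a single tree. In this sketch the list of (feature index, weighted impurity decrease) pairs is invented, and how the forest combines the per-tree values is not covered.

```python
def normalized_importances(split_decreases, n_features):
    """Sum the criterion reduction per feature, then scale the totals to sum to 1."""
    totals = [0.0] * n_features
    for feature_index, decrease in split_decreases:
        totals[feature_index] += decrease
    grand_total = sum(totals)
    if grand_total == 0.0:
        return totals                      # no splits: every importance stays 0
    return [t / grand_total for t in totals]

# Hypothetical (feature index, weighted impurity decrease) pairs from one tree.
splits = [(0, 0.3), (1, 0.1), (0, 0.1)]
importances = normalized_importances(splits, n_features=3)
print(importances)
```

A feature that is never used for a split, such as feature 2 here, ends up with importance 0.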

oob_score_float

Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

oob_decision_function_ndarray of shape (n_samples, n_classes)

Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN. This attribute exists only when oob_score is True.
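
How likely the NaN case is can be estimated with a short calculation. The sketch assumes each tree draws n_samples points with replacement (the max_samples=None case) and that trees are sampled independently; the function name is illustrative.

```python
def prob_never_out_of_bag(n_samples, n_estimators):
    """Probability that one sample lands in every tree's bootstrap sample."""
    # One draw misses the sample with probability 1 - 1/n, so all n draws
    # miss it with probability (1 - 1/n) ** n; otherwise the sample is in-bag.
    p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    return p_in_bag ** n_estimators

for n_trees in (1, 5, 25, 100):
    print(n_trees, prob_never_out_of_bag(1000, n_trees))
```

With 5 trees roughly one sample in ten is never out-of-bag, while with 100 trees the probability is negligible.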

See Also

DecisionTreeClassifier, ExtraTreesClassifier

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

References

1

L. Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001.

Examples

>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=1000, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = RandomForestClassifier(max_depth=2, random_state=0)
>>> clf.fit(X, y)
RandomForestClassifier(...)
>>> print(clf.predict([[0, 0, 0, 0]]))
[1]

Full API documentation: RandomForestClassifierScikitsLearnNode

class mdp.nodes.RandomForestRegressorScikitsLearnNode

A random forest regressor. This node has been automatically generated by wrapping the sklearn.ensemble._forest.RandomForestRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. A random forest is a meta estimator that fits a number of regression decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default), otherwise the whole dataset is used to build each tree.

Read more in the User Guide.

Parameters

n_estimatorsint, default=100

The number of trees in the forest.

Changed in version 0.22: The default value of n_estimators changed from 10 to 100 in 0.22.

criterion{“mse”, “mae”}, default=”mse”

The function to measure the quality of a split. Supported criteria are “mse” for the mean squared error, which is equal to variance reduction as feature selection criterion, and “mae” for the mean absolute error.
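
The remark that "mse" equals variance reduction can be checked numerically: the MSE impurity of a node is the population variance of its targets. In the sketch below, using the node median as the reference point for "mae" is an assumption (it is the value that minimises absolute error); the function names are illustrative.

```python
import statistics

def mse_impurity(targets):
    """Mean squared deviation from the node mean (the population variance)."""
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)

def mae_impurity(targets):
    """Mean absolute deviation from the node median."""
    median = statistics.median(targets)
    return sum(abs(t - median) for t in targets) / len(targets)

node = [1.0, 2.0, 3.0, 10.0]
print(mse_impurity(node), statistics.pvariance(node))
print(mae_impurity(node))
```

The outlier 10.0 dominates the squared-error impurity far more than the absolute-error one, which is the usual reason to consider "mae".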

New in version 0.18: Mean Absolute Error (MAE) criterion.

max_depthint, default=None

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_splitint or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.

min_samples_leafint or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaffloat, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_features{“auto”, “sqrt”, “log2”}, int or float, default=”auto”

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=n_features.

  • If “sqrt”, then max_features=sqrt(n_features).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires effectively inspecting more than max_features features.

max_leaf_nodesint, default=None

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.

min_impurity_decreasefloat, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

New in version 0.19.

min_impurity_splitfloat, default=None

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

bootstrapbool, default=True

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

oob_scorebool, default=False

Whether to use out-of-bag samples to estimate the R^2 on unseen data.

n_jobsint, default=None

The number of jobs to run in parallel. fit(), predict(), decision_path() and apply() are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint or RandomState, default=None

Controls both the randomness of the bootstrapping of the samples used when building trees (if bootstrap=True) and the sampling of the features to consider when looking for the best split at each node (if max_features < n_features). See Glossary for details.

verboseint, default=0

Controls the verbosity when fitting and predicting.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest. See the Glossary.

ccp_alphanon-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See minimal_cost_complexity_pruning for details.

New in version 0.22.

max_samplesint or float, default=None

If bootstrap is True, the number of samples to draw from X to train each base estimator.

  • If None (default), then draw X.shape[0] samples.

  • If int, then draw max_samples samples.

  • If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0, 1).

New in version 0.22.

Attributes

base_estimator_DecisionTreeRegressor

The child estimator template used to create the collection of fitted sub-estimators.

estimators_list of DecisionTreeRegressor

The collection of fitted sub-estimators.

feature_importances_ndarray of shape (n_features,)

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

n_features_int

The number of features when fit is performed.

n_outputs_int

The number of outputs when fit is performed.

oob_score_float

Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

oob_prediction_ndarray of shape (n_samples,)

Prediction computed with out-of-bag estimate on the training set. This attribute exists only when oob_score is True.

See Also

DecisionTreeRegressor, ExtraTreesRegressor

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

The default value max_features="auto" uses n_features rather than n_features / 3. The latter was originally suggested in [1], whereas the former was more recently justified empirically in [2].

References

1

L. Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001.

2

P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

Examples

>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=4, n_informative=2,
...                        random_state=0, shuffle=False)
>>> regr = RandomForestRegressor(max_depth=2, random_state=0)
>>> regr.fit(X, y)
RandomForestRegressor(...)
>>> print(regr.predict([[0, 0, 0, 0]]))
[-8.32987858]

Full API documentation: RandomForestRegressorScikitsLearnNode

class mdp.nodes.RandomTreesEmbeddingScikitsLearnNode

An ensemble of totally random trees. This node has been automatically generated by wrapping the sklearn.ensemble._forest.RandomTreesEmbedding class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. An unsupervised transformation of a dataset to a high-dimensional sparse representation. A datapoint is coded according to which leaf of each tree it is sorted into. Using a one-hot encoding of the leaves, this leads to a binary coding with as many ones as there are trees in the forest.

The dimensionality of the resulting representation is n_out <= n_estimators * max_leaf_nodes. If max_leaf_nodes == None, the number of leaf nodes is at most n_estimators * 2 ** max_depth.
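
The bound on the output dimensionality is simple arithmetic and can be sketched as a helper. The function embedding_dim_bound is hypothetical and only restates the two inequalities given in the text.

```python
def embedding_dim_bound(n_estimators, max_depth=None, max_leaf_nodes=None):
    """Upper bound on the number of one-hot columns produced by the embedding."""
    if max_leaf_nodes is not None:
        return n_estimators * max_leaf_nodes
    if max_depth is None:
        raise ValueError("no finite bound without max_depth or max_leaf_nodes")
    # A binary tree of depth d has at most 2 ** d leaves.
    return n_estimators * 2 ** max_depth

print(embedding_dim_bound(n_estimators=100, max_depth=5))      # the default settings
print(embedding_dim_bound(n_estimators=5, max_depth=1))
print(embedding_dim_bound(n_estimators=10, max_leaf_nodes=8))
```

With n_estimators=5 and max_depth=1 the bound is 10 columns.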

Read more in the User Guide.

Parameters

n_estimatorsint, default=100

Number of trees in the forest.

Changed in version 0.22: The default value of n_estimators changed from 10 to 100 in 0.22.

max_depthint, default=5

The maximum depth of each tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_splitint or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.

min_samples_leafint or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaffloat, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_leaf_nodesint, default=None

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.

min_impurity_decreasefloat, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

New in version 0.19.

min_impurity_splitfloat, default=None

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

sparse_outputbool, default=True

Whether to return a sparse CSR matrix (the default) or a dense array compatible with dense pipeline operators.

n_jobsint, default=None

The number of jobs to run in parallel. fit(), transform(), decision_path() and apply() are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint or RandomState, default=None

Controls the generation of the random y used to fit the trees and the draw of the splits for each feature at the trees’ nodes. See Glossary for details.

verboseint, default=0

Controls the verbosity when fitting and predicting.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest. See the Glossary.

Attributes

estimators_list of DecisionTreeClassifier

The collection of fitted sub-estimators.

References

1

P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

2

F. Moosmann, B. Triggs, and F. Jurie, “Fast discriminative visual codebooks using randomized clustering forests”, NIPS 2007.

Examples

>>> from sklearn.ensemble import RandomTreesEmbedding
>>> X = [[0,0], [1,0], [0,1], [-1,0], [0,-1]]
>>> random_trees = RandomTreesEmbedding(
...    n_estimators=5, random_state=0, max_depth=1).fit(X)
>>> X_sparse_embedding = random_trees.transform(X)
>>> X_sparse_embedding.toarray()
array([[0., 1., 1., 0., 1., 0., 0., 1., 1., 0.],
       [0., 1., 1., 0., 1., 0., 0., 1., 1., 0.],
       [0., 1., 0., 1., 0., 1., 0., 1., 0., 1.],
       [1., 0., 1., 0., 1., 0., 1., 0., 1., 0.],
       [0., 1., 1., 0., 1., 0., 0., 1., 1., 0.]])
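
Each row of the output above contains exactly five ones, one per tree. The sketch below shows how such a row could be assembled from per-tree leaf indices. It assumes the columns are grouped tree by tree with two leaves each (which max_depth=1 allows); the leaf indices are hypothetical and chosen so that the result matches the first row.

```python
def one_hot_leaves(leaf_indices, leaves_per_tree):
    """Concatenate one one-hot block per tree, marking the leaf the sample reached."""
    row = []
    for leaf in leaf_indices:
        block = [0.0] * leaves_per_tree
        block[leaf] = 1.0
        row.extend(block)
    return row

# Hypothetical leaf indices for one sample in a forest of 5 depth-1 trees.
row = one_hot_leaves([1, 0, 0, 1, 0], leaves_per_tree=2)
print(row, sum(row))
```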

Full API documentation: RandomTreesEmbeddingScikitsLearnNode

class mdp.nodes.ExtraTreesClassifierScikitsLearnNode

An extra-trees classifier. This node has been automatically generated by wrapping the sklearn.ensemble._forest.ExtraTreesClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Read more in the User Guide.

Parameters

n_estimatorsint, default=100

The number of trees in the forest.

Changed in version 0.22: The default value of n_estimators changed from 10 to 100 in 0.22.

criterion{“gini”, “entropy”}, default=”gini”

The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.

max_depthint, default=None

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_splitint or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.

min_samples_leafint or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaffloat, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_features{“auto”, “sqrt”, “log2”}, int or float, default=”auto”

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=sqrt(n_features).

  • If “sqrt”, then max_features=sqrt(n_features).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires effectively inspecting more than max_features features.

max_leaf_nodesint, default=None

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.

min_impurity_decreasefloat, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

New in version 0.19.

min_impurity_splitfloat, default=None

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

bootstrapbool, default=False

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

oob_scorebool, default=False

Whether to use out-of-bag samples to estimate the generalization accuracy.

n_jobsint, default=None

The number of jobs to run in parallel. fit(), predict(), decision_path() and apply() are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint, RandomState, default=None

Controls 3 sources of randomness:

  • the bootstrapping of the samples used when building trees (if bootstrap=True)

  • the sampling of the features to consider when looking for the best split at each node (if max_features < n_features)

  • the draw of the splits for each of the max_features

See Glossary for details.
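
The third source of randomness, the draw of the splits, is what distinguishes extra-trees from a random forest. The sketch below follows the general idea of the extremely randomized trees algorithm, where a cut point is drawn uniformly between the minimum and maximum of a feature in the node. It is not sklearn's splitter, and the function name and data are illustrative.

```python
import random

def draw_random_threshold(feature_values, rng):
    """Pick a cut point uniformly between the feature's min and max in the node."""
    low, high = min(feature_values), max(feature_values)
    return rng.uniform(low, high)

rng = random.Random(0)               # a fixed seed plays the role of random_state
values = [0.2, 1.5, 3.1, 4.8]
threshold = draw_random_threshold(values, rng)
left = [v for v in values if v <= threshold]
right = [v for v in values if v > threshold]
print(threshold, left, right)
```

Reusing the same seed reproduces the same threshold, which is why fixing random_state makes fitting deterministic.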

verboseint, default=0

Controls the verbosity when fitting and predicting.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest. See the Glossary.

class_weight{“balanced”, “balanced_subsample”}, dict or list of dicts, default=None

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

The “balanced_subsample” mode is the same as “balanced” except that weights are computed based on the bootstrap sample for every tree grown.

For multi-output, the weights of each column of y will be multiplied.

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

ccp_alphanon-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See minimal_cost_complexity_pruning for details.

New in version 0.22.

max_samplesint or float, default=None

If bootstrap is True, the number of samples to draw from X to train each base estimator.

  • If None (default), then draw X.shape[0] samples.

  • If int, then draw max_samples samples.

  • If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0, 1).

New in version 0.22.
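The three max_samples cases can be sketched as a small function. The name n_samples_per_tree is hypothetical, and rounding the float case to the nearest integer is an assumption; the text only states that max_samples * X.shape[0] samples are drawn.

```python
def n_samples_per_tree(n_samples, max_samples=None):
    """Number of bootstrap samples drawn for each tree (illustrative sketch)."""
    if max_samples is None:
        return n_samples                      # draw X.shape[0] samples
    if isinstance(max_samples, int):
        return max_samples                    # draw exactly max_samples
    # float: a fraction of the dataset; the rounding rule is an assumption
    return int(round(max_samples * n_samples))

sizes = [n_samples_per_tree(200),
         n_samples_per_tree(200, 50),
         n_samples_per_tree(200, 0.25)]
```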

Attributes

base_estimator_ExtraTreeClassifier

The child estimator template used to create the collection of fitted sub-estimators.

estimators_list of DecisionTreeClassifier

The collection of fitted sub-estimators.

classes_ndarray of shape (n_classes,) or a list of such arrays

The class labels (single output problem), or a list of arrays of class labels (multi-output problem).

n_classes_int or list

The number of classes (single output problem), or a list containing the number of classes for each output (multi-output problem).

feature_importances_ndarray of shape (n_features,)

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

n_features_int

The number of features when fit is performed.

n_outputs_int

The number of outputs when fit is performed.

oob_score_float

Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

oob_decision_function_ndarray of shape (n_samples, n_classes)

Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN. This attribute exists only when oob_score is True.

See Also

sklearn.tree.ExtraTreeClassifier: Base classifier for this ensemble.

RandomForestClassifier: Ensemble classifier based on trees with optimal splits.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

1

P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

Examples

>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_features=4, random_state=0)
>>> clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
>>> clf.fit(X, y)
ExtraTreesClassifier(random_state=0)
>>> clf.predict([[0, 0, 0, 0]])
array([1])

Full API documentation: ExtraTreesClassifierScikitsLearnNode

class mdp.nodes.ExtraTreesRegressorScikitsLearnNode

An extra-trees regressor. This node has been automatically generated by wrapping the sklearn.ensemble._forest.ExtraTreesRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Read more in the User Guide.

Parameters

n_estimatorsint, default=100

The number of trees in the forest.

Changed in version 0.22: The default value of n_estimators changed from 10 to 100 in 0.22.

criterion{“mse”, “mae”}, default=”mse”

The function to measure the quality of a split. Supported criteria are “mse” for the mean squared error, which is equal to variance reduction as feature selection criterion, and “mae” for the mean absolute error.

New in version 0.18: Mean Absolute Error (MAE) criterion.

max_depthint, default=None

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_splitint or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.
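The int and float cases of min_samples_split (and, analogously, min_samples_leaf) can be sketched as follows. The function name is hypothetical; only the ceil(fraction * n_samples) rule comes from the text.

```python
import math

def effective_min_samples(value, n_samples):
    """Resolve an int-or-float minimum into an absolute sample count."""
    if isinstance(value, int):
        return value                          # already an absolute number
    return math.ceil(value * n_samples)       # fraction of the training set

as_int = effective_min_samples(2, 150)
as_fraction = effective_min_samples(0.25, 150)   # ceil(37.5)
```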

min_samples_leafint or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaffloat, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_features{“auto”, “sqrt”, “log2”}, int or float, default=”auto”

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=n_features.

  • If “sqrt”, then max_features=sqrt(n_features).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.
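The max_features rules listed for this regressor can be summarised in a short sketch. The function name resolve_max_features is hypothetical, and clamping the result to at least 1 is an assumption not stated in the text.

```python
import math

def resolve_max_features(max_features, n_features):
    """Translate a max_features setting into a feature count (sketch)."""
    if max_features is None or max_features == "auto":
        return n_features                         # regressor: use all features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # float: fraction of the features; the lower bound of 1 is an assumption
    return max(1, int(max_features * n_features))

resolved = {key: resolve_max_features(key, 16)
            for key in ("auto", "sqrt", "log2", 3, 0.5, None)}
```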

max_leaf_nodesint, default=None

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

min_impurity_decreasefloat, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

New in version 0.19.
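The weighted impurity decrease equation given for min_impurity_decrease translates directly into code. The function below is an illustrative transcription with made-up numbers, not the library's implementation.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """N_t / N * (impurity - N_t_R / N_t * right - N_t_L / N_t * left)."""
    return N_t / N * (impurity
                      - N_t_R / N_t * right_impurity
                      - N_t_L / N_t * left_impurity)

# 100 samples overall, 40 at the node, split into 10 (left) and 30 (right).
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.2, right_impurity=0.4)
```

With these values the decrease is 0.4 * (0.5 - 0.3 - 0.05) = 0.06, so the node would be split for any min_impurity_decrease up to 0.06.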

min_impurity_splitfloat, default=None

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

bootstrapbool, default=False

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

oob_scorebool, default=False

Whether to use out-of-bag samples to estimate the R^2 on unseen data.

n_jobsint, default=None

The number of jobs to run in parallel. fit(), predict(), decision_path() and apply() are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint or RandomState, default=None

Controls 3 sources of randomness:

  • the bootstrapping of the samples used when building trees (if bootstrap=True)

  • the sampling of the features to consider when looking for the best split at each node (if max_features < n_features)

  • the draw of the splits for each of the max_features

See Glossary for details.

verboseint, default=0

Controls the verbosity when fitting and predicting.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See the Glossary.

ccp_alphanon-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See minimal_cost_complexity_pruning for details.

New in version 0.22.

max_samplesint or float, default=None

If bootstrap is True, the number of samples to draw from X to train each base estimator.

  • If None (default), then draw X.shape[0] samples.

  • If int, then draw max_samples samples.

  • If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0, 1).

New in version 0.22.

Attributes

base_estimator_ExtraTreeRegressor

The child estimator template used to create the collection of fitted sub-estimators.

estimators_list of DecisionTreeRegressor

The collection of fitted sub-estimators.

feature_importances_ndarray of shape (n_features,)

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

n_features_int

The number of features.

n_outputs_int

The number of outputs.

oob_score_float

Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

oob_prediction_ndarray of shape (n_samples,)

Prediction computed with out-of-bag estimate on the training set. This attribute exists only when oob_score is True.

See Also

sklearn.tree.ExtraTreeRegressor: Base estimator for this ensemble. RandomForestRegressor: Ensemble regressor using trees with optimal splits.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

1

P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

Examples

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.ensemble import ExtraTreesRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> reg = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(
...    X_train, y_train)
>>> reg.score(X_test, y_test)
0.2708...

Full API documentation: ExtraTreesRegressorScikitsLearnNode

class mdp.nodes.BaggingClassifierScikitsLearnNode

A Bagging classifier. This node has been automatically generated by wrapping the sklearn.ensemble._bagging.BaggingClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [1]. If samples are drawn with replacement, then the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [4].
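The four sampling schemes named here differ only in which axis is subsampled and whether samples are drawn with replacement. The sketch below draws row and column indices for each scheme using the standard library; the helper names and the 0.5 fractions are illustrative assumptions, not library defaults.

```python
import random

def draw_indices(n_total, n_draw, replace, rng):
    """Draw n_draw indices from range(n_total), with or without replacement."""
    if replace:
        return [rng.randrange(n_total) for _ in range(n_draw)]
    return rng.sample(range(n_total), n_draw)

# (sample fraction, draw samples with replacement, feature fraction);
# the fractions are arbitrary values chosen for illustration.
STRATEGIES = {
    "Pasting":          (0.5, False, 1.0),
    "Bagging":          (1.0, True,  1.0),
    "Random Subspaces": (1.0, False, 0.5),
    "Random Patches":   (0.5, False, 0.5),
}

def draw_subset(name, n_samples, n_features, rng):
    """Return (row indices, column indices) for one base estimator."""
    sample_frac, replace, feature_frac = STRATEGIES[name]
    rows = draw_indices(n_samples, int(sample_frac * n_samples), replace, rng)
    cols = draw_indices(n_features, int(feature_frac * n_features), False, rng)
    return rows, cols

rng = random.Random(0)
rows, cols = draw_subset("Random Patches", n_samples=10, n_features=4, rng=rng)
```

In terms of the parameters documented below, the sample axis corresponds to max_samples and bootstrap, and the feature axis to max_features and bootstrap_features.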

Read more in the User Guide.

New in version 0.15.

Parameters

base_estimatorobject, default=None

The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.

n_estimatorsint, default=10

The number of base estimators in the ensemble.

max_samplesint or float, default=1.0

The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details).

  • If int, then draw max_samples samples.

  • If float, then draw max_samples * X.shape[0] samples.

max_featuresint or float, default=1.0

The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details).

  • If int, then draw max_features features.

  • If float, then draw max_features * X.shape[1] features.

bootstrapbool, default=True

Whether samples are drawn with replacement. If False, sampling without replacement is performed.

bootstrap_featuresbool, default=False

Whether features are drawn with replacement.

oob_scorebool, default=False

Whether to use out-of-bag samples to estimate the generalization error.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble. See the Glossary.

New in version 0.17: warm_start constructor parameter.

n_jobsint, default=None

The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint or RandomState, default=None

Controls the random resampling of the original dataset (sample wise and feature wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. See Glossary.

verboseint, default=0

Controls the verbosity when fitting and predicting.

Attributes

base_estimator_estimator

The base estimator from which the ensemble is grown.

n_features_int

The number of features when fit() is performed.

estimators_list of estimators

The collection of fitted base estimators.

estimators_samples_list of arrays

The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.

estimators_features_list of arrays

The subset of drawn features for each base estimator.

classes_ndarray of shape (n_classes,)

The class labels.

n_classes_int or list

The number of classes.

oob_score_float

Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

oob_decision_function_ndarray of shape (n_samples, n_classes)

Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN. This attribute exists only when oob_score is True.
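To get a feel for how likely the NaN case is, the following back-of-the-envelope sketch estimates the probability that a given point is in-bag for every estimator. It assumes each estimator draws n_samples points with replacement (bootstrap=True, max_samples=1.0) and that the draws are independent; the function name is hypothetical.

```python
def prob_never_out_of_bag(n_samples, n_estimators):
    """Probability that one point appears in every bootstrap sample."""
    # Chance that a single bootstrap sample contains the point at least once.
    p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    # The point has no out-of-bag estimate only if it is in-bag every time.
    return p_in_bag ** n_estimators

p_small_ensemble = prob_never_out_of_bag(n_samples=100, n_estimators=10)
p_large_ensemble = prob_never_out_of_bag(n_samples=100, n_estimators=100)
```

Under these assumptions roughly 1% of the points have no out-of-bag prediction with the default of 10 estimators, while with 100 estimators the probability is negligible.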

Examples

>>> from sklearn.svm import SVC
>>> from sklearn.ensemble import BaggingClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = BaggingClassifier(base_estimator=SVC(),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> clf.predict([[0, 0, 0, 0]])
array([1])

References

1

L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, 36(1), 85-103, 1999.

2

L. Breiman, “Bagging predictors”, Machine Learning, 24(2), 123-140, 1996.

3

T. Ho, “The random subspace method for constructing decision forests”, Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.

4

G. Louppe and P. Geurts, “Ensembles on Random Patches”, Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.

Full API documentation: BaggingClassifierScikitsLearnNode

class mdp.nodes.BaggingRegressorScikitsLearnNode

A Bagging regressor. This node has been automatically generated by wrapping the sklearn.ensemble._bagging.BaggingRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. A Bagging regressor is an ensemble meta-estimator that fits base regressors each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [1]. If samples are drawn with replacement, then the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [4].

Read more in the User Guide.

New in version 0.15.

Parameters

base_estimatorobject, default=None

The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.

n_estimatorsint, default=10

The number of base estimators in the ensemble.

max_samplesint or float, default=1.0

The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details).

  • If int, then draw max_samples samples.

  • If float, then draw max_samples * X.shape[0] samples.

max_featuresint or float, default=1.0

The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details).

  • If int, then draw max_features features.

  • If float, then draw max_features * X.shape[1] features.

bootstrapbool, default=True

Whether samples are drawn with replacement. If False, sampling without replacement is performed.

bootstrap_featuresbool, default=False

Whether features are drawn with replacement.

oob_scorebool, default=False

Whether to use out-of-bag samples to estimate the generalization error.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble. See the Glossary.

n_jobsint, default=None

The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

random_stateint or RandomState, default=None

Controls the random resampling of the original dataset (sample wise and feature wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. See Glossary.

verboseint, default=0

Controls the verbosity when fitting and predicting.

Attributes

base_estimator_estimator

The base estimator from which the ensemble is grown.

n_features_int

The number of features when fit() is performed.

estimators_list of estimators

The collection of fitted sub-estimators.

estimators_samples_list of arrays

The subset of drawn samples (i.e., the in-bag samples) for each base estimator. Each subset is defined by an array of the indices selected.

estimators_features_list of arrays

The subset of drawn features for each base estimator.

oob_score_float

Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

oob_prediction_ndarray of shape (n_samples,)

Prediction computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_prediction_ might contain NaN. This attribute exists only when oob_score is True.

Examples

>>> from sklearn.svm import SVR
>>> from sklearn.ensemble import BaggingRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=4,
...                        n_informative=2, n_targets=1,
...                        random_state=0, shuffle=False)
>>> regr = BaggingRegressor(base_estimator=SVR(),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> regr.predict([[0, 0, 0, 0]])
array([-2.8720...])

References

1

L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, 36(1), 85-103, 1999.

2

L. Breiman, “Bagging predictors”, Machine Learning, 24(2), 123-140, 1996.

3

T. Ho, “The random subspace method for constructing decision forests”, Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.

4

G. Louppe and P. Geurts, “Ensembles on Random Patches”, Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.

Full API documentation: BaggingRegressorScikitsLearnNode

class mdp.nodes.IsolationForestScikitsLearnNode

Isolation Forest Algorithm. This node has been automatically generated by wrapping the sklearn.ensemble._iforest.IsolationForest class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Return the anomaly score of each sample using the IsolationForest algorithm.

The IsolationForest ‘isolates’ observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.

Since recursive partitioning can be represented by a tree structure, the number of splittings required to isolate a sample is equivalent to the path length from the root node to the terminating node.

This path length, averaged over a forest of such random trees, is a measure of normality and our decision function.

Random partitioning produces noticeably shorter paths for anomalies. Hence, when a forest of random trees collectively produces shorter path lengths for particular samples, they are highly likely to be anomalies.
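The link between path length and anomaly score can be made concrete with the normalisation from Liu et al. (2008): s(x, n) = 2^(-E[h(x)] / c(n)), where c(n) = 2 H(n - 1) - 2 (n - 1) / n and H(i) is approximated by ln(i) + 0.5772. The sketch below follows the paper's formula for n > 2; it is not the wrapped class's code, and the offset_ description in this section suggests the library reports scores with the opposite sign.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def average_path_length(n):
    """c(n): expected path length of an unsuccessful binary search tree lookup."""
    if n <= 2:
        raise ValueError("this sketch only covers n > 2")
    harmonic = math.log(n - 1) + EULER_GAMMA      # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """Paper-style score in (0, 1]; values near 1 indicate anomalies."""
    return 2.0 ** (-mean_path_length / average_path_length(n))

c_256 = average_path_length(256)
short_path = anomaly_score(2.0, 256)    # isolated quickly: likely anomaly
typical_path = anomaly_score(c_256, 256)
```

A sample whose mean path length equals c(n) scores exactly 0.5, and shorter paths push the score towards 1.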

Read more in the User Guide.

New in version 0.18.

Parameters

n_estimatorsint, default=100

The number of base estimators in the ensemble.

max_samples“auto”, int or float, default=”auto”

The number of samples to draw from X to train each base estimator.

  • If int, then draw max_samples samples.

  • If float, then draw max_samples * X.shape[0] samples.

  • If “auto”, then max_samples=min(256, n_samples).

If max_samples is larger than the number of samples provided, all samples will be used for all trees (no sampling).

contamination‘auto’ or float, default=’auto’

The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples.

  • If ‘auto’, the threshold is determined as in the original paper.

  • If float, the contamination should be in the range [0, 0.5].

Changed in version 0.22: The default value of contamination changed from 0.1 to 'auto'.

max_featuresint or float, default=1.0

The number of features to draw from X to train each base estimator.

  • If int, then draw max_features features.

  • If float, then draw max_features * X.shape[1] features.

bootstrapbool, default=False

If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed.

n_jobsint, default=None

The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

behaviourstr, default=’deprecated’

This parameter has no effect, is deprecated, and will be removed.

New in version 0.20: behaviour is added in 0.20 for back-compatibility purpose.

Deprecated since version 0.20: behaviour='old' is deprecated in 0.20 and will not be possible in 0.22.

Deprecated since version 0.22: behaviour parameter is deprecated in 0.22 and removed in 0.24.

random_stateint or RandomState, default=None

Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest.

Pass an int for reproducible results across multiple function calls. See Glossary.

verboseint, default=0

Controls the verbosity of the tree building process.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See the Glossary.

New in version 0.21.

Attributes

estimators_list of ExtraTreeRegressor

The collection of fitted sub-estimators.

estimators_samples_list of arrays

The subset of drawn samples (i.e., the in-bag samples) for each base estimator.

max_samples_int

The actual number of samples.

offset_float

Offset used to define the decision function from the raw scores. We have the relation: decision_function = score_samples - offset_. offset_ is defined as follows. When the contamination parameter is set to “auto”, the offset is equal to -0.5, as the scores of inliers are close to 0 and the scores of outliers are close to -1. When a contamination parameter other than “auto” is provided, the offset is defined so that we obtain the expected number of outliers (samples with decision function < 0) in training.

New in version 0.20.
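The relation decision_function = score_samples - offset_ can be illustrated with plain lists. The scores below are made-up values, the helper names only mirror the method names for readability, and the mapping of negative decision values to the label -1 follows the “samples with decision function < 0” wording above.

```python
AUTO_OFFSET = -0.5   # offset_ when contamination="auto", as stated in the text

def decision_function(score_samples, offset=AUTO_OFFSET):
    """Shift raw scores so that outliers fall below zero."""
    return [score - offset for score in score_samples]

def predict(score_samples, offset=AUTO_OFFSET):
    """Label each sample: -1 for outliers (decision < 0), 1 for inliers."""
    return [-1 if d < 0 else 1
            for d in decision_function(score_samples, offset)]

# Hypothetical raw scores: the first is close to 0 (inlier-like),
# the second is close to -1 (outlier-like).
labels = predict([-0.45, -0.70])
```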

estimators_features_list of arrays

The subset of drawn features for each base estimator.

Notes

The implementation is based on an ensemble of ExtraTreeRegressor. The maximum depth of each tree is set to ceil(log_2(n)) where n is the number of samples used to build the tree (see (Liu et al., 2008) for more details).
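Combining the max_samples rules from the Parameters section with the depth limit gives a quick way to estimate tree size. The function below is a hypothetical sketch; capping an int max_samples at n_samples follows the statement that all samples are used when max_samples exceeds the number provided.

```python
import math

def isolation_tree_limits(n_samples, max_samples="auto"):
    """Return (samples per tree, maximum tree depth) for the given settings."""
    if max_samples == "auto":
        n = min(256, n_samples)
    elif isinstance(max_samples, int):
        n = min(max_samples, n_samples)       # no sampling if max_samples is larger
    else:
        n = int(max_samples * n_samples)      # float: fraction of the data
    return n, math.ceil(math.log2(n))

default_limits = isolation_tree_limits(10000)        # "auto" on a large dataset
small_limits = isolation_tree_limits(100)            # fewer than 256 samples
```

With the default setting and a large dataset each tree sees 256 samples and is at most 8 levels deep, which keeps the trees small regardless of dataset size.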

References

1

Liu, Fei Tony, Ting, Kai Ming and Zhou, Zhi-Hua. “Isolation forest.” Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on.

2

Liu, Fei Tony, Ting, Kai Ming and Zhou, Zhi-Hua. “Isolation-based anomaly detection.” ACM Transactions on Knowledge Discovery from Data (TKDD) 6.1 (2012): 3.

See Also

sklearn.covariance.EllipticEnvelope

An object for detecting outliers in a Gaussian distributed dataset.

sklearn.svm.OneClassSVM

Unsupervised Outlier Detection. Estimate the support of a high-dimensional distribution. The implementation is based on libsvm.

sklearn.neighbors.LocalOutlierFactor

Unsupervised Outlier Detection using Local Outlier Factor (LOF).

Examples

>>> from sklearn.ensemble import IsolationForest
>>> X = [[-1.1], [0.3], [0.5], [100]]
>>> clf = IsolationForest(random_state=0).fit(X)
>>> clf.predict([[0.1], [0], [90]])
array([ 1,  1, -1])

Full API documentation: IsolationForestScikitsLearnNode

class mdp.nodes.AdaBoostClassifierScikitsLearnNode

An AdaBoost classifier. This node has been automatically generated by wrapping the sklearn.ensemble._weight_boosting.AdaBoostClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. An AdaBoost [1] classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.

This class implements the algorithm known as AdaBoost-SAMME [2].
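One boosting round of the discrete SAMME variant, as described by Zhu et al. [2], can be sketched as follows. This is a simplified illustration rather than the wrapped class's code: it assumes the weighted error lies strictly between 0 and 1 - 1/K, and the function name is hypothetical.

```python
import math

def samme_boost_step(sample_weight, incorrect, n_classes, learning_rate=1.0):
    """Return (estimator weight, renormalised sample weights) for one round."""
    total = sum(sample_weight)
    # Weighted error of the current classifier.
    err = sum(w for w, bad in zip(sample_weight, incorrect) if bad) / total
    # Estimator weight; the log(K - 1) term is the multi-class correction.
    alpha = learning_rate * (math.log((1.0 - err) / err)
                             + math.log(n_classes - 1.0))
    # Up-weight the misclassified samples, then renormalise.
    updated = [w * math.exp(alpha) if bad else w
               for w, bad in zip(sample_weight, incorrect)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]

# Four equally weighted samples, the first one misclassified, two classes.
alpha, new_weights = samme_boost_step(
    [0.25, 0.25, 0.25, 0.25], [True, False, False, False], n_classes=2)
```

After this round the misclassified sample carries half of the total weight, so the next classifier is pushed to get it right.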

Read more in the User Guide.

New in version 0.14.

Parameters

base_estimatorobject, default=None

The base estimator from which the boosted ensemble is built. Support for sample weighting is required, as well as proper classes_ and n_classes_ attributes. If None, then the base estimator is DecisionTreeClassifier(max_depth=1).

n_estimatorsint, default=50

The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early.

learning_ratefloat, default=1.

Learning rate shrinks the contribution of each classifier by learning_rate. There is a trade-off between learning_rate and n_estimators.

algorithm{‘SAMME’, ‘SAMME.R’}, default=’SAMME.R’

If ‘SAMME.R’ then use the SAMME.R real boosting algorithm. base_estimator must support calculation of class probabilities. If ‘SAMME’ then use the SAMME discrete boosting algorithm. The SAMME.R algorithm typically converges faster than SAMME, achieving a lower test error with fewer boosting iterations.

random_stateint or RandomState, default=None

Controls the random seed given at each base_estimator at each boosting iteration. Thus, it is only used when base_estimator exposes a random_state. Pass an int for reproducible output across multiple function calls. See Glossary.

Attributes

base_estimator_estimator

The base estimator from which the ensemble is grown.

estimators_list of classifiers

The collection of fitted sub-estimators.

classes_ndarray of shape (n_classes,)

The class labels.

n_classes_int

The number of classes.

estimator_weights_ndarray of floats

Weights for each estimator in the boosted ensemble.

estimator_errors_ndarray of floats

Classification error for each estimator in the boosted ensemble.

feature_importances_ndarray of shape (n_features,)

The impurity-based feature importances if supported by the base_estimator (when based on decision trees).

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

See Also

AdaBoostRegressor

An AdaBoost regressor that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset but where the weights of instances are adjusted according to the error of the current prediction.

GradientBoostingClassifier

GB builds an additive model in a forward stage-wise fashion. Regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.

sklearn.tree.DecisionTreeClassifier

A non-parametric supervised learning method used for classification. Creates a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

References

1

Y. Freund, R. Schapire, “A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting”, 1995.

2

J. Zhu, H. Zou, S. Rosset, T. Hastie, “Multi-class AdaBoost”, 2009.

Examples

>>> from sklearn.ensemble import AdaBoostClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=1000, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = AdaBoostClassifier(n_estimators=100, random_state=0)
>>> clf.fit(X, y)
AdaBoostClassifier(n_estimators=100, random_state=0)
>>> clf.predict([[0, 0, 0, 0]])
array([1])
>>> clf.score(X, y)
0.983...

Full API documentation: AdaBoostClassifierScikitsLearnNode

class mdp.nodes.AdaBoostRegressorScikitsLearnNode

An AdaBoost regressor. This node has been automatically generated by wrapping the sklearn.ensemble._weight_boosting.AdaBoostRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. An AdaBoost [1] regressor is a meta-estimator that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset but where the weights of instances are adjusted according to the error of the current prediction. As such, subsequent regressors focus more on difficult cases.

This class implements the algorithm known as AdaBoost.R2 [2].

Read more in the User Guide.

New in version 0.14.

Parameters

base_estimatorobject, default=None

The base estimator from which the boosted ensemble is built. If None, then the base estimator is DecisionTreeRegressor(max_depth=3).

n_estimatorsint, default=50

The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early.

learning_ratefloat, default=1.

Learning rate shrinks the contribution of each regressor by learning_rate. There is a trade-off between learning_rate and n_estimators.

loss{‘linear’, ‘square’, ‘exponential’}, default=’linear’

The loss function to use when updating the weights after each boosting iteration.
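As a rough illustration of how the three loss options differ, the sketch below follows the AdaBoost.R2 description in Drucker (1997): absolute errors are scaled by the largest error and then transformed. The function name r2_sample_losses is hypothetical, and this is a simplified reading of the algorithm, not the wrapped library's code.

```python
import math


def r2_sample_losses(y_true, y_pred, loss="linear"):
    """Per-sample losses in [0, 1] in the style of AdaBoost.R2 (sketch)."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    max_err = max(abs_err)
    if max_err == 0:
        # Perfect fit: every loss is zero.
        return [0.0] * len(abs_err)
    scaled = [e / max_err for e in abs_err]
    if loss == "linear":
        return scaled
    if loss == "square":
        return [s ** 2 for s in scaled]
    if loss == "exponential":
        return [1.0 - math.exp(-s) for s in scaled]
    raise ValueError("loss must be 'linear', 'square' or 'exponential'")


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.5, 5.0]
print(r2_sample_losses(y_true, y_pred, "linear"))   # [0.0, 0.25, 1.0]
print(r2_sample_losses(y_true, y_pred, "square"))   # [0.0, 0.0625, 1.0]
```

Samples with a larger loss receive more weight in the next boosting iteration, so the choice of loss controls how strongly the hardest cases are emphasised.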

random_stateint or RandomState, default=None

Controls the random seed given to each base_estimator at each boosting iteration. Thus, it is only used when base_estimator exposes a random_state. In addition, it controls the bootstrap of the weights used to train the base_estimator at each boosting iteration. Pass an int for reproducible output across multiple function calls. See Glossary.

Attributes

base_estimator_estimator

The base estimator from which the ensemble is grown.

estimators_list of regressors

The collection of fitted sub-estimators.

estimator_weights_ndarray of floats

Weights for each estimator in the boosted ensemble.

estimator_errors_ndarray of floats

Regression error for each estimator in the boosted ensemble.

feature_importances_ndarray of shape (n_features,)

The impurity-based feature importances if supported by the base_estimator (when based on decision trees).

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

Examples

>>> from sklearn.ensemble import AdaBoostRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=4, n_informative=2,
...                        random_state=0, shuffle=False)
>>> regr = AdaBoostRegressor(random_state=0, n_estimators=100)
>>> regr.fit(X, y)
AdaBoostRegressor(n_estimators=100, random_state=0)
>>> regr.predict([[0, 0, 0, 0]])
array([4.7972...])
>>> regr.score(X, y)
0.9771...

See also

AdaBoostClassifier, GradientBoostingRegressor, sklearn.tree.DecisionTreeRegressor

References

[1] Y. Freund, R. Schapire, “A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting”, 1995.

[2] H. Drucker, “Improving Regressors using Boosting Techniques”, 1997.

Full API documentation: AdaBoostRegressorScikitsLearnNode

class mdp.nodes.GradientBoostingClassifierScikitsLearnNode

Gradient Boosting for classification. This node has been automatically generated by wrapping the sklearn.ensemble._gb.GradientBoostingClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage n_classes_ regression trees are fit on the negative gradient of the binomial or multinomial deviance loss function. Binary classification is a special case where only a single regression tree is induced.

Read more in the User Guide.

Parameters

loss{‘deviance’, ‘exponential’}, default=’deviance’

Loss function to be optimized. ‘deviance’ refers to deviance (= logistic regression) for classification with probabilistic outputs. For loss ‘exponential’, gradient boosting recovers the AdaBoost algorithm.
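For binary labels, the two loss options can be written per sample as below, up to constant factors. The negative gradient with respect to the raw prediction is the target that each stage's regression tree is fit to. This is a textbook-style sketch with hypothetical function names, not the wrapped library's internal implementation.

```python
import math


def deviance_loss(y, raw):
    """Binomial negative log-likelihood for y in {0, 1} and a raw score."""
    return math.log1p(math.exp(raw)) - y * raw


def deviance_negative_gradient(y, raw):
    """Label minus predicted probability (sigmoid of the raw score)."""
    return y - 1.0 / (1.0 + math.exp(-raw))


def exponential_loss(y, raw):
    """Exponential loss on the signed label s = 2y - 1 in {-1, +1}."""
    s = 2 * y - 1
    return math.exp(-s * raw)


def exponential_negative_gradient(y, raw):
    """Negative derivative of exponential_loss with respect to raw."""
    s = 2 * y - 1
    return s * math.exp(-s * raw)


# At raw = 0 the predicted probability is 0.5, so a positive sample
# has a deviance residual of 0.5.
print(deviance_negative_gradient(1, 0.0))   # 0.5
```

The exponential loss grows much faster for badly misclassified samples, which is why it behaves like AdaBoost's reweighting.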

learning_ratefloat, default=0.1

Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.

n_estimatorsint, default=100

The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.

subsamplefloat, default=1.0

The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.

criterion{‘friedman_mse’, ‘mse’, ‘mae’}, default=’friedman_mse’

The function to measure the quality of a split. Supported criteria are ‘friedman_mse’ for the mean squared error with improvement score by Friedman, ‘mse’ for mean squared error, and ‘mae’ for the mean absolute error. The default value of ‘friedman_mse’ is generally the best as it can provide a better approximation in some cases.

New in version 0.18.

min_samples_splitint or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.
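The int/float rule for min_samples_split can be sketched as follows. The helper name effective_min_samples_split is hypothetical and only mirrors the rule as documented here; the wrapped library may apply additional bounds.

```python
import math


def effective_min_samples_split(min_samples_split, n_samples):
    """Translate the documented int/float rule into a sample count."""
    if isinstance(min_samples_split, int):
        # An int is used directly as the minimum number of samples.
        return min_samples_split
    # A float is a fraction of the training set, rounded up.
    return math.ceil(min_samples_split * n_samples)


print(effective_min_samples_split(2, 150))     # 2
print(effective_min_samples_split(0.05, 150))  # 8
```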

min_samples_leafint or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaffloat, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_depthint, default=3

Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.

min_impurity_decreasefloat, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
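The weighted impurity decrease can be evaluated directly from the formula above. The function below is a plain transcription with a hypothetical name; the numbers in the usage lines are made up for illustration.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Evaluate the weighted impurity decrease for one candidate split."""
    return N_t / N * (impurity
                      - N_t_R / N_t * right_impurity
                      - N_t_L / N_t * left_impurity)


# 100 samples in total; the node holds 40 of them and the candidate
# split sends 30 to the left child and 10 to the right child.
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=30, N_t_R=10,
    impurity=0.5, left_impurity=0.4, right_impurity=0.2)
print(round(decrease, 6))  # 0.06
```

With min_impurity_decrease=0.05 this split would be accepted, while with min_impurity_decrease=0.1 the node would stay a leaf.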

New in version 0.19.

min_impurity_splitfloat, default=None

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

initestimator or ‘zero’, default=None

An estimator object that is used to compute the initial predictions. init has to provide fit() and predict_proba(). If ‘zero’, the initial raw predictions are set to zero. By default, a DummyEstimator predicting the class priors is used.

random_stateint or RandomState, default=None

Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details). It also controls the random splitting of the training data to obtain a validation set if n_iter_no_change is not None. Pass an int for reproducible output across multiple function calls. See Glossary.

max_features{‘auto’, ‘sqrt’, ‘log2’}, int or float, default=None

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If ‘auto’, then max_features=sqrt(n_features).

  • If ‘sqrt’, then max_features=sqrt(n_features).

  • If ‘log2’, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Choosing max_features < n_features leads to a reduction of variance and an increase in bias.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
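The max_features options listed above can be summarised in a small resolver. This is a sketch under stated assumptions: the name resolve_max_features is hypothetical, and truncating to an integer with a floor of one feature is an assumption rather than something this page states.

```python
import math


def resolve_max_features(max_features, n_features):
    """Map a max_features setting to a feature count (classifier rules)."""
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        # Assumption: truncate and keep at least one feature.
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # A float is a fraction of the available features.
    return max(1, int(max_features * n_features))


for setting in (None, "sqrt", "log2", 3, 0.5):
    print(setting, resolve_max_features(setting, n_features=16))
```

The resolved value corresponds to what the fitted estimator exposes as max_features_.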

verboseint, default=0

Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree.

max_leaf_nodesint, default=None

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution. See the Glossary.

presortdeprecated, default=’deprecated’

This parameter is deprecated and will be removed in v0.24.

Deprecated since version 0.22.

validation_fractionfloat, default=0.1

The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if n_iter_no_change is set to an integer.

New in version 0.20.

n_iter_no_changeint, default=None

n_iter_no_change is used to decide if early stopping will be used to terminate training when validation score is not improving. By default it is set to None to disable early stopping. If set to a number, it will set aside validation_fraction size of the training data as validation and terminate training when validation score is not improving in all of the previous n_iter_no_change numbers of iterations. The split is stratified.

New in version 0.20.

tolfloat, default=1e-4

Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops.

New in version 0.20.
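The interplay of n_iter_no_change and tol can be pictured as a patience rule over the validation loss. The sketch below is a simplification with a hypothetical function name; the wrapped library's exact bookkeeping may differ.

```python
def stages_before_stop(validation_losses, n_iter_no_change, tol=1e-4):
    """Return how many stages are fitted under a simple patience rule."""
    best = float("inf")
    stale = 0
    for stage, loss in enumerate(validation_losses, start=1):
        if loss < best - tol:
            # Improvement of more than tol: reset the patience counter.
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= n_iter_no_change:
                return stage
    return len(validation_losses)


losses = [1.0, 0.8, 0.7, 0.69999, 0.70001, 0.71, 0.5]
print(stages_before_stop(losses, n_iter_no_change=3))   # 6
print(stages_before_stop(losses, n_iter_no_change=10))  # 7
```

In the first call training stops after six stages, so the later improvement to 0.5 is never reached; a larger n_iter_no_change trades training time for a lower risk of stopping too early.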

ccp_alphanon-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See minimal_cost_complexity_pruning for details.

New in version 0.22.

Attributes

n_estimators_int

The number of estimators as selected by early stopping (if n_iter_no_change is specified). Otherwise it is set to n_estimators.

New in version 0.20.

feature_importances_ndarray of shape (n_features,)

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

oob_improvement_ndarray of shape (n_estimators,)

The improvement in loss (= deviance) on the out-of-bag samples relative to the previous iteration. oob_improvement_[0] is the improvement in loss of the first stage over the init estimator. Only available if subsample < 1.0.

train_score_ndarray of shape (n_estimators,)

The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1 this is the deviance on the training data.

loss_LossFunction

The concrete LossFunction object.

init_estimator

The estimator that provides the initial predictions. Set via the init argument or loss.init_estimator.

estimators_ndarray of DecisionTreeRegressor of shape (n_estimators, loss_.K)

The collection of fitted sub-estimators. loss_.K is 1 for binary classification, otherwise n_classes.

classes_ndarray of shape (n_classes,)

The class labels.

n_features_int

The number of data features.

n_classes_int

The number of classes.

max_features_int

The inferred value of max_features.

Notes

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> clf = GradientBoostingClassifier(random_state=0)
>>> clf.fit(X_train, y_train)
GradientBoostingClassifier(random_state=0)
>>> clf.predict(X_test[:2])
array([1, 0])
>>> clf.score(X_test, y_test)
0.88

See also

sklearn.ensemble.HistGradientBoostingClassifier, sklearn.tree.DecisionTreeClassifier, RandomForestClassifier, AdaBoostClassifier

References

J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.

J. Friedman, Stochastic Gradient Boosting, 1999.

T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.

Full API documentation: GradientBoostingClassifierScikitsLearnNode

class mdp.nodes.GradientBoostingRegressorScikitsLearnNode

Gradient Boosting for regression. This node has been automatically generated by wrapping the sklearn.ensemble._gb.GradientBoostingRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.
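To make the forward stage-wise idea concrete, here is a minimal pure-Python sketch for the squared-error loss on a single feature, using one-split stumps instead of full regression trees. The names fit_stump and boost are hypothetical and the code is a teaching aid, not the wrapped implementation; it assumes the feature has at least two distinct values.

```python
def fit_stump(x, targets):
    """Fit a one-split regression stump on a 1-D feature by least squares."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [t for xi, t in zip(x, targets) if xi <= threshold]
        right = [t for xi, t in zip(x, targets) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_estimators=50, learning_rate=0.1):
    """Forward stage-wise boosting for the squared-error loss."""
    init = sum(y) / len(y)            # constant initial model
    pred = [init] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # For 0.5 * (y - f)^2 the negative gradient is the residual y - f.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: init + learning_rate * sum(s(xi) for s in stumps)


x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [v * v for v in x]
model = boost(x, y, n_estimators=50, learning_rate=0.1)
print(model(3))
```

Each stage only corrects a fraction (learning_rate) of the remaining residual, which is why a smaller learning rate generally needs more stages to reach the same training error.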

Read more in the User Guide.

Parameters

loss{‘ls’, ‘lad’, ‘huber’, ‘quantile’}, default=’ls’

Loss function to be optimized. ‘ls’ refers to least squares regression. ‘lad’ (least absolute deviation) is a highly robust loss function solely based on order information of the input variables. ‘huber’ is a combination of the two. ‘quantile’ allows quantile regression (use alpha to specify the quantile).
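The four regression losses can be compared on a single residual with the textbook definitions below. Function names are hypothetical, and delta is passed explicitly here as an assumption of the sketch (the documented alpha parameter selects a quantile rather than a fixed threshold).

```python
def ls_loss(y, f):
    """Least squares: half the squared residual."""
    return 0.5 * (y - f) ** 2


def lad_loss(y, f):
    """Least absolute deviation."""
    return abs(y - f)


def huber_loss(y, f, delta):
    """Quadratic for small residuals, linear beyond delta."""
    r = abs(y - f)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def quantile_loss(y, f, alpha=0.9):
    """Pinball loss: under-prediction costs alpha, over-prediction 1 - alpha."""
    diff = y - f
    return alpha * diff if diff >= 0 else (alpha - 1.0) * diff


print(ls_loss(3.0, 1.0))              # 2.0
print(lad_loss(3.0, 1.0))             # 2.0
print(huber_loss(3.0, 1.0, delta=1))  # 1.5
```

With alpha=0.9 the quantile loss penalises predictions that fall below the target nine times as heavily as predictions above it, which pushes the fit towards the 90th percentile.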

learning_ratefloat, default=0.1

Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.

n_estimatorsint, default=100

The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.

subsamplefloat, default=1.0

The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.

criterion{‘friedman_mse’, ‘mse’, ‘mae’}, default=’friedman_mse’

The function to measure the quality of a split. Supported criteria are “friedman_mse” for the mean squared error with improvement score by Friedman, “mse” for mean squared error, and “mae” for the mean absolute error. The default value of “friedman_mse” is generally the best as it can provide a better approximation in some cases.

New in version 0.18.

min_samples_splitint or float, default=2

The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.

  • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

Changed in version 0.18: Added float values for fractions.

min_samples_leafint or float, default=1

The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

  • If int, then consider min_samples_leaf as the minimum number.

  • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

Changed in version 0.18: Added float values for fractions.

min_weight_fraction_leaffloat, default=0.0

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_depthint, default=3

Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.

min_impurity_decreasefloat, default=0.0

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

New in version 0.19.

min_impurity_splitfloat, default=None

Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split has changed from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.

initestimator or ‘zero’, default=None

An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If ‘zero’, the initial raw predictions are set to zero. By default a DummyEstimator is used, predicting either the average target value (for loss=’ls’), or a quantile for the other losses.

random_stateint or RandomState, default=None

Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details). It also controls the random splitting of the training data to obtain a validation set if n_iter_no_change is not None. Pass an int for reproducible output across multiple function calls. See Glossary.

max_features{‘auto’, ‘sqrt’, ‘log2’}, int or float, default=None

The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.

  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

  • If “auto”, then max_features=n_features.

  • If “sqrt”, then max_features=sqrt(n_features).

  • If “log2”, then max_features=log2(n_features).

  • If None, then max_features=n_features.

Choosing max_features < n_features leads to a reduction of variance and an increase in bias.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.

alphafloat, default=0.9

The alpha-quantile of the huber loss function and the quantile loss function. Only if loss='huber' or loss='quantile'.

verboseint, default=0

Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree.

max_leaf_nodesint, default=None

Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution. See the Glossary.

presortdeprecated, default=’deprecated’

This parameter is deprecated and will be removed in v0.24.

Deprecated since version 0.22.

validation_fractionfloat, default=0.1

The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if n_iter_no_change is set to an integer.

New in version 0.20.

n_iter_no_changeint, default=None

n_iter_no_change is used to decide if early stopping will be used to terminate training when validation score is not improving. By default it is set to None to disable early stopping. If set to a number, it will set aside validation_fraction size of the training data as validation and terminate training when validation score is not improving in all of the previous n_iter_no_change numbers of iterations.

New in version 0.20.

tolfloat, default=1e-4

Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops.

New in version 0.20.

ccp_alphanon-negative float, default=0.0

Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See minimal_cost_complexity_pruning for details.

New in version 0.22.

Attributes

feature_importances_ndarray of shape (n_features,)

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

oob_improvement_ndarray of shape (n_estimators,)

The improvement in loss (= deviance) on the out-of-bag samples relative to the previous iteration. oob_improvement_[0] is the improvement in loss of the first stage over the init estimator. Only available if subsample < 1.0.

train_score_ndarray of shape (n_estimators,)

The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1 this is the deviance on the training data.

loss_LossFunction

The concrete LossFunction object.

init_estimator

The estimator that provides the initial predictions. Set via the init argument or loss.init_estimator.

estimators_ndarray of DecisionTreeRegressor of shape (n_estimators, 1)

The collection of fitted sub-estimators.

n_features_int

The number of data features.

max_features_int

The inferred value of max_features.

Notes

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

Examples

>>> from sklearn.datasets import make_regression
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_regression(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> reg = GradientBoostingRegressor(random_state=0)
>>> reg.fit(X_train, y_train)
GradientBoostingRegressor(random_state=0)
>>> reg.predict(X_test[1:2])
array([-61...])
>>> reg.score(X_test, y_test)
0.4...

See also

sklearn.ensemble.HistGradientBoostingRegressor, sklearn.tree.DecisionTreeRegressor, RandomForestRegressor

References

J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.

J. Friedman, Stochastic Gradient Boosting, 1999.

T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.

Full API documentation: GradientBoostingRegressorScikitsLearnNode

class mdp.nodes.VotingClassifierScikitsLearnNode

Soft Voting/Majority Rule classifier for unfitted estimators. This node has been automatically generated by wrapping the sklearn.ensemble._voting.VotingClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

New in version 0.17.

Read more in the User Guide.

Parameters

estimatorslist of (str, estimator) tuples

Invoking the fit method on the VotingClassifier will fit clones of those original estimators that will be stored in the class attribute self.estimators_. An estimator can be set to 'drop' using set_params.

Changed in version 0.21: 'drop' is accepted.

Deprecated since version 0.22: Using None to drop an estimator is deprecated in 0.22 and support will be dropped in 0.24. Use the string 'drop' instead.

voting{‘hard’, ‘soft’}, default=’hard’

If ‘hard’, uses predicted class labels for majority rule voting. Else if ‘soft’, predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.

weightsarray-like of shape (n_classifiers,), default=None

Sequence of weights (float or int) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights if None.
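The difference between hard and soft voting, and the effect of weights, can be sketched for a single sample as follows. The helper names are hypothetical, and breaking ties in favour of the smallest label is an assumption of this sketch.

```python
from collections import defaultdict


def hard_vote(labels, weights=None):
    """Weighted majority vote over the predicted labels of one sample."""
    weights = weights or [1] * len(labels)
    tally = defaultdict(float)
    for label, w in zip(labels, weights):
        tally[label] += w
    # Iterating over sorted labels makes ties go to the smallest label.
    return max(sorted(tally), key=lambda c: tally[c])


def soft_vote(probas, classes, weights=None):
    """Argmax of the weighted average of per-classifier probabilities."""
    weights = weights or [1] * len(probas)
    total = sum(weights)
    avg = [sum(w * p[k] for p, w in zip(probas, weights)) / total
           for k in range(len(classes))]
    return classes[max(range(len(classes)), key=lambda k: avg[k])]


classes = [1, 2]
# One confident classifier and two lukewarm ones.
probas = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
labels = [1, 2, 2]                        # each classifier's own argmax

print(hard_vote(labels))                  # 2
print(soft_vote(probas, classes))         # 1
print(hard_vote(labels, weights=[3, 1, 1]))  # 1
```

Soft voting lets the confident first classifier outweigh two marginal votes, which is only sensible if the probabilities are well calibrated.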

n_jobsint, default=None

The number of jobs to run in parallel for fit. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

New in version 0.18.

flatten_transformbool, default=True

Affects the shape of the transform output only when voting=’soft’. If voting=’soft’ and flatten_transform=True, the transform method returns a matrix with shape (n_samples, n_classifiers * n_classes). If flatten_transform=False, it returns (n_classifiers, n_samples, n_classes).
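The two output layouts can be related with nested lists. The sketch assumes that flattening places each classifier's class probabilities side by side for every sample; the column order is an assumption, and flatten is a hypothetical helper.

```python
def flatten(probas):
    """Turn (n_classifiers, n_samples, n_classes) into
    (n_samples, n_classifiers * n_classes)."""
    n_samples = len(probas[0])
    return [[p for clf in probas for p in clf[i]] for i in range(n_samples)]


# Two classifiers, two samples, two classes.
probas = [
    [[0.9, 0.1], [0.2, 0.8]],   # classifier 0
    [[0.6, 0.4], [0.3, 0.7]],   # classifier 1
]
flat = flatten(probas)
print(flat)                      # [[0.9, 0.1, 0.6, 0.4], [0.2, 0.8, 0.3, 0.7]]
print(len(flat), len(flat[0]))   # 2 4
```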

verbosebool, default=False

If True, the time elapsed while fitting will be printed as it is completed.

Attributes

estimators_list of classifiers

The collection of fitted sub-estimators as defined in estimators that are not ‘drop’.

named_estimators_Bunch

Attribute to access any fitted sub-estimators by name.

New in version 0.20.

classes_array-like of shape (n_predictions,)

The class labels.

See Also

VotingRegressor: Prediction voting regressor.

Examples

>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.ensemble import RandomForestClassifier, VotingClassifier
>>> clf1 = LogisticRegression(multi_class='multinomial', random_state=1)
>>> clf2 = RandomForestClassifier(n_estimators=50, random_state=1)
>>> clf3 = GaussianNB()
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> eclf1 = VotingClassifier(estimators=[
...         ('lr', clf1), ('rf', clf2), ('gnb', clf3)], voting='hard')
>>> eclf1 = eclf1.fit(X, y)
>>> print(eclf1.predict(X))
[1 1 1 2 2 2]
>>> np.array_equal(eclf1.named_estimators_.lr.predict(X),
...                eclf1.named_estimators_['lr'].predict(X))
True
>>> eclf2 = VotingClassifier(estimators=[
...         ('lr', clf1), ('rf', clf2), ('gnb', clf3)],
...         voting='soft')
>>> eclf2 = eclf2.fit(X, y)
>>> print(eclf2.predict(X))
[1 1 1 2 2 2]
>>> eclf3 = VotingClassifier(estimators=[
...        ('lr', clf1), ('rf', clf2), ('gnb', clf3)],
...        voting='soft', weights=[2,1,1],
...        flatten_transform=True)
>>> eclf3 = eclf3.fit(X, y)
>>> print(eclf3.predict(X))
[1 1 1 2 2 2]
>>> print(eclf3.transform(X).shape)
(6, 6)

Full API documentation: VotingClassifierScikitsLearnNode

class mdp.nodes.VotingRegressorScikitsLearnNode

Prediction voting regressor for unfitted estimators. This node has been automatically generated by wrapping the sklearn.ensemble._voting.VotingRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.

New in version 0.21.

A voting regressor is an ensemble meta-estimator that fits several base regressors, each on the whole dataset. Then it averages the individual predictions to form a final prediction.

Read more in the User Guide.

Parameters

estimatorslist of (str, estimator) tuples

Invoking the fit method on the VotingRegressor will fit clones of those original estimators that will be stored in the class attribute self.estimators_. An estimator can be set to 'drop' using set_params.

Changed in version 0.21: 'drop' is accepted.

Deprecated since version 0.22: Using None to drop an estimator is deprecated in 0.22 and support will be dropped in 0.24. Use the string 'drop' instead.

weightsarray-like of shape (n_regressors,), default=None

Sequence of weights (float or int) to weight the occurrences of predicted values before averaging. Uses uniform weights if None.
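The averaging step can be written out in a few lines. The function name vote_average is hypothetical; predictions is assumed to hold one list of predicted values per regressor.

```python
def vote_average(predictions, weights=None):
    """Weighted average of per-regressor predictions, sample by sample."""
    weights = weights or [1.0] * len(predictions)
    total = sum(weights)
    n_samples = len(predictions[0])
    return [sum(w * p[i] for p, w in zip(predictions, weights)) / total
            for i in range(n_samples)]


preds = [[1.0, 2.0],    # regressor 0
         [3.0, 6.0]]    # regressor 1
print(vote_average(preds))                  # [2.0, 4.0]
print(vote_average(preds, weights=[3, 1]))  # [1.5, 3.0]
```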

n_jobsint, default=None

The number of jobs to run in parallel for fit. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

verbosebool, default=False

If True, the time elapsed while fitting will be printed as it is completed.

Attributes

estimators_list of regressors

The collection of fitted sub-estimators as defined in estimators that are not ‘drop’.

named_estimators_Bunch

Attribute to access any fitted sub-estimators by name.

New in version 0.20.

See Also

VotingClassifier: Soft Voting/Majority Rule classifier.

Examples

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.ensemble import VotingRegressor
>>> r1 = LinearRegression()
>>> r2 = RandomForestRegressor(n_estimators=10, random_state=1)
>>> X = np.array([[1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36]])
>>> y = np.array([2, 6, 12, 20, 30, 42])
>>> er = VotingRegressor([('lr', r1), ('rf', r2)])
>>> print(er.fit(X, y).predict(X))
[ 3.3  5.7 11.8 19.7 28.  40.3]

Full API documentation: VotingRegressorScikitsLearnNode

class mdp.nodes.StackingClassifierScikitsLearnNode

Stack of estimators with a final classifier. This node has been automatically generated by wrapping the sklearn.ensemble._stacking.StackingClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Stacked generalization consists of stacking the outputs of the individual estimators and using a classifier to compute the final prediction. Stacking makes it possible to use the strength of each individual estimator by using their outputs as the input of a final estimator.

Note that estimators_ are fitted on the full X while final_estimator_ is trained using cross-validated predictions of the base estimators using cross_val_predict.

New in version 0.22.

Read more in the User Guide.

Parameters

estimatorslist of (str, estimator)

Base estimators which will be stacked together. Each element of the list is defined as a tuple of string (i.e. name) and an estimator instance. An estimator can be set to ‘drop’ using set_params.

final_estimatorestimator, default=None

A classifier which will be used to combine the base estimators. The default classifier is a LogisticRegression.

cvint, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy used in cross_val_predict to train final_estimator. Possible inputs for cv are:

  • None, to use the default 5-fold cross validation,

  • integer, to specify the number of folds in a (Stratified) KFold,

  • An object to be used as a cross-validation generator,

  • An iterable yielding train, test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Note

A larger number of splits will provide no benefit if the number of training samples is large enough; it will only increase the training time. cv is not used for model evaluation but for prediction.

stack_method{‘auto’, ‘predict_proba’, ‘decision_function’, ‘predict’}, default=’auto’

Methods called for each base estimator. It can be:

  • if ‘auto’, it will try to invoke, for each estimator, ‘predict_proba’, ‘decision_function’ or ‘predict’ in that order.

  • otherwise, one of ‘predict_proba’, ‘decision_function’ or ‘predict’. If the method is not implemented by the estimator, it will raise an error.

n_jobsint, default=None

The number of jobs to run in parallel for fit of all estimators. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

passthroughbool, default=False

When False, only the predictions of estimators will be used as training data for final_estimator. When True, the final_estimator is trained on the predictions as well as the original training data.
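
As a rough illustration of passthrough, the sketch below compares the features handed to the final estimator in both settings. The choice of base estimators and the helper name make_stack are assumptions made for the example. On iris (4 features, 3 classes) two base estimators using predict_proba should yield 6 columns, and 10 columns once the original features are passed through.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes


def make_stack(passthrough):
    """Hypothetical helper: build the same stack with a given passthrough."""
    return StackingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        passthrough=passthrough,
    )


# transform() returns the features that are fed to final_estimator_.
meta_only = make_stack(False).fit(X, y).transform(X)
meta_plus_x = make_stack(True).fit(X, y).transform(X)
print(meta_only.shape, meta_plus_x.shape)
```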

verboseint, default=0

Verbosity level.

Attributes

classes_ndarray of shape (n_classes,)

Class labels.

estimators_list of estimators

The elements of the estimators parameter, having been fitted on the training data. If an estimator has been set to ‘drop’, it will not appear in estimators_.

named_estimators_Bunch

Attribute to access any fitted sub-estimators by name.

final_estimator_estimator

The classifier which predicts given the output of estimators_.

stack_method_list of str

The method used by each base estimator.

Notes

When predict_proba is used by each estimator (i.e. most of the time for stack_method=’auto’ or specifically for stack_method=’predict_proba’), the first column predicted by each estimator will be dropped in the case of a binary classification problem, because the two columns would otherwise be perfectly collinear.
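
The column dropping described in this note can be observed on a small binary problem. This is a sketch under assumed settings (a synthetic dataset from make_classification and two arbitrary base estimators): each estimator contributes a single probability column, so the stacked features have one column per estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem (make_classification defaults to 2 classes).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
).fit(X, y)

methods = clf.stack_method_   # method chosen for each base estimator
meta = clf.transform(X)       # one probability column per estimator
print(methods, meta.shape)
```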

References

[1] Wolpert, David H. “Stacked generalization.” Neural networks 5.2 (1992): 241-259.

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.svm import LinearSVC
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.ensemble import StackingClassifier
>>> X, y = load_iris(return_X_y=True)
>>> estimators = [
...     ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
...     ('svr', make_pipeline(StandardScaler(),
...                           LinearSVC(random_state=42)))
... ]
>>> clf = StackingClassifier(
...     estimators=estimators, final_estimator=LogisticRegression()
... )
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, stratify=y, random_state=42
... )
>>> clf.fit(X_train, y_train).score(X_test, y_test)
0.9...

Full API documentation: StackingClassifierScikitsLearnNode

class mdp.nodes.StackingRegressorScikitsLearnNode

Stack of estimators with a final regressor. This node has been automatically generated by wrapping the sklearn.ensemble._stacking.StackingRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Stacked generalization consists of stacking the outputs of the individual estimators and using a regressor to compute the final prediction. Stacking makes it possible to use the strength of each individual estimator by using their outputs as the input of a final estimator.

Note that estimators_ are fitted on the full X while final_estimator_ is trained using cross-validated predictions of the base estimators using cross_val_predict.
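
The two-stage training described above can be sketched by hand with cross_val_predict. This is an illustrative approximation, not the wrapped class's actual code; the base estimators, cv=5 and the variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
base = [LinearRegression(), DecisionTreeRegressor(max_depth=3, random_state=0)]

# Out-of-fold predictions: each sample is predicted by a model
# that did not see it during training.
meta_features = np.column_stack(
    [cross_val_predict(est, X, y, cv=5) for est in base]
)

# The final estimator learns how to combine the base predictions.
final = RidgeCV().fit(meta_features, y)

# The base estimators used at prediction time are fitted on the full X.
fitted_base = [est.fit(X, y) for est in base]
stacked_pred = final.predict(
    np.column_stack([est.predict(X) for est in fitted_base])
)
print(meta_features.shape, stacked_pred.shape)
```

Training the final estimator on out-of-fold predictions keeps it from simply trusting whichever base estimator overfits the training data most.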

New in version 0.22.

Read more in the User Guide.

Parameters

estimatorslist of (str, estimator)

Base estimators which will be stacked together. Each element of the list is defined as a tuple of string (i.e. name) and an estimator instance. An estimator can be set to ‘drop’ using set_params.

final_estimatorestimator, default=None

A regressor which will be used to combine the base estimators. The default regressor is a RidgeCV.

cvint, cross-validation generator or an iterable, default=None

Determines the cross-validation splitting strategy used in cross_val_predict to train final_estimator. Possible inputs for cv are:

  • None, to use the default 5-fold cross validation,

  • integer, to specify the number of folds in a (Stratified) KFold,

  • An object to be used as a cross-validation generator,

  • An iterable yielding train, test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Note

A larger number of splits will provide no benefit if the number of training samples is large enough; it will only increase the training time. cv is not used for model evaluation but for prediction.

n_jobsint, default=None

The number of jobs to run in parallel for fit of all estimators. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

passthroughbool, default=False

When False, only the predictions of estimators will be used as training data for final_estimator. When True, the final_estimator is trained on the predictions as well as the original training data.

verboseint, default=0

Verbosity level.

Attributes

estimators_list of estimator

The elements of the estimators parameter, having been fitted on the training data. If an estimator has been set to ‘drop’, it will not appear in estimators_.

named_estimators_Bunch

Attribute to access any fitted sub-estimators by name.

final_estimator_estimator

The fitted regressor used to stack the base estimators.

References

[1] Wolpert, David H. “Stacked generalization.” Neural networks 5.2 (1992): 241-259.

Examples

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import RidgeCV
>>> from sklearn.svm import LinearSVR
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.ensemble import StackingRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> estimators = [
...     ('lr', RidgeCV()),
...     ('svr', LinearSVR(random_state=42))
... ]
>>> reg = StackingRegressor(
...     estimators=estimators,
...     final_estimator=RandomForestRegressor(n_estimators=10,
...                                           random_state=42)
... )
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=42
... )
>>> reg.fit(X_train, y_train).score(X_test, y_test)
0.3...

Full API documentation: StackingRegressorScikitsLearnNode

class mdp.nodes.DictVectorizerScikitsLearnNode

Transforms lists of feature-value mappings to vectors. This node has been automatically generated by wrapping the sklearn.feature_extraction._dict_vectorizer.DictVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This transformer turns lists of mappings (dict-like objects) of feature names to feature values into Numpy arrays or scipy.sparse matrices for use with scikit-learn estimators.

When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature “f” that can take on the values “ham” and “spam” will become two features in the output, one signifying “f=ham”, the other “f=spam”.

However, note that this transformer will only do a binary one-hot encoding when feature values are of type string. If categorical features are represented as numeric values such as int, the DictVectorizer can be followed by sklearn.preprocessing.OneHotEncoder to complete binary one-hot encoding.

Features that do not occur in a sample (mapping) will have a zero value in the resulting array/matrix.
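
A short sketch of the behaviour described above, using the “ham”/“spam” feature from the text plus a made-up numeric feature “n”: the string-valued feature is one-hot encoded, the numeric one is kept as is, and a feature missing from a sample becomes zero.

```python
from sklearn.feature_extraction import DictVectorizer

samples = [
    {"f": "ham", "n": 1},
    {"f": "spam"},  # "n" is missing here, so its column is 0
]

v = DictVectorizer(sparse=False)
X = v.fit_transform(samples)

names = v.feature_names_  # ['f=ham', 'f=spam', 'n']
rows = X.tolist()         # [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(names)
print(rows)
```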

Read more in the User Guide.

Parameters

dtypedtype, default=np.float64

The type of feature values. Passed to Numpy array/scipy.sparse matrix constructors as the dtype argument.

separatorstr, default=”=”

Separator string used when constructing new features for one-hot coding.

sparsebool, default=True

Whether transform should produce scipy.sparse matrices.

sortbool, default=True

Whether feature_names_ and vocabulary_ should be sorted when fitting.

Attributes

vocabulary_dict

A dictionary mapping feature names to feature indices.

feature_names_list

A list of length n_features containing the feature names (e.g., “f=ham” and “f=spam”).

Examples

>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
>>> X = v.fit_transform(D)
>>> X
array([[2., 0., 1.],
       [0., 1., 3.]])
>>> v.inverse_transform(X) == [{'bar': 2.0, 'foo': 1.0}, {'baz': 1.0, 'foo': 3.0}]
True
>>> v.transform({'foo': 4, 'unseen_feature': 3})
array([[0., 0., 4.]])

See also

FeatureHasher : performs vectorization using only a hash function.

sklearn.preprocessing.OrdinalEncoder : handles nominal/categorical features encoded as columns of arbitrary data types.

Full API documentation: DictVectorizerScikitsLearnNode

class mdp.nodes.FeatureHasherScikitsLearnNode

Implements feature hashing, aka the hashing trick. This node has been automatically generated by wrapping the sklearn.feature_extraction._hash.FeatureHasher class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This class turns sequences of symbolic feature names (strings) into scipy.sparse matrices, using a hash function to compute the matrix column corresponding to a name. The hash function employed is the signed 32-bit version of Murmurhash3.

Feature names of type byte string are used as-is. Unicode strings are converted to UTF-8 first, but no Unicode normalization is done. Feature values must be (finite) numbers.

This class is a low-memory alternative to DictVectorizer and CountVectorizer, intended for large-scale (online) learning and situations where memory is tight, e.g. when running prediction code on embedded devices.

Read more in the User Guide.

New in version 0.13.

Parameters

n_featuresint, default=2**20

The number of features (columns) in the output matrices. Small numbers of features are likely to cause hash collisions, but large numbers will cause larger coefficient dimensions in linear learners.

input_type{“dict”, “pair”, “string”}, default=”dict”

Either “dict” (the default) to accept dictionaries over (feature_name, value); “pair” to accept pairs of (feature_name, value); or “string” to accept single strings. feature_name should be a string, while value should be a number. In the case of “string”, a value of 1 is implied. The feature_name is hashed to find the appropriate column for the feature. The value’s sign might be flipped in the output (but see alternate_sign, below).
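
As an illustration of input_type=”string”, the sketch below hashes two small token lists. The tokens and the small n_features=8 are made up for the example, and alternate_sign=False is set so that every token contributes +1 and each row sums to its number of tokens even if two tokens collide.

```python
from sklearn.feature_extraction import FeatureHasher

# With "string" input every token carries an implied value of 1.
h = FeatureHasher(n_features=8, input_type="string", alternate_sign=False)

tokens = [["dog", "cat", "dog"], ["run"]]  # one list of tokens per sample
f = h.transform(tokens)

shape = f.shape
row_sums = f.toarray().sum(axis=1).tolist()
print(shape, row_sums)
```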

dtypenumpy dtype, default=np.float64

The type of feature values. Passed to scipy.sparse matrix constructors as the dtype argument. Do not set this to bool, np.bool_ or any unsigned integer type.

alternate_signbool, default=True

When True, an alternating sign is added to the features so as to approximately conserve the inner product in the hashed space even for small n_features. This approach is similar to sparse random projection.

Changed in version 0.19: alternate_sign replaces the now deprecated non_negative parameter.

Examples

>>> from sklearn.feature_extraction import FeatureHasher
>>> h = FeatureHasher(n_features=10)
>>> D = [{'dog': 1, 'cat':2, 'elephant':4},{'dog': 2, 'run': 5}]
>>> f = h.transform(D)
>>> f.toarray()
array([[ 0.,  0., -4., -1.,  0.,  0.,  0.,  0.,  0.,  2.],
       [ 0.,  0.,  0., -2., -5.,  0.,  0.,  0.,  0.,  0.]])

See also

DictVectorizer : vectorizes string-valued features using a hash table.

sklearn.preprocessing.OneHotEncoder : handles nominal/categorical features.

Full API documentation: FeatureHasherScikitsLearnNode

class mdp.nodes.PatchExtractorScikitsLearnNode

Extracts patches from a collection of images. This node has been automatically generated by wrapping the sklearn.feature_extraction.image.PatchExtractor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

New in version 0.9.

Parameters

patch_sizetuple of int (patch_height, patch_width)

The dimensions of one patch.

max_patchesint or float, default=None

The maximum number of patches per image to extract. If max_patches is a float in (0, 1), it is taken to mean a proportion of the total number of patches.

random_stateint, RandomState instance, default=None

Determines the random number generator used for random sampling when max_patches is not None. Use an int to make the randomness deterministic. See Glossary.
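
The effect of max_patches on the number of extracted patches can be sketched on toy data. The two 6x6 images below are made up for the example: a 3x3 patch fits in (6 - 3 + 1)**2 = 16 positions per image, an integer max_patches caps the count per image, and a float is read as a proportion of those 16.

```python
import numpy as np
from sklearn.feature_extraction.image import PatchExtractor

# Two 6x6 grayscale images filled with arbitrary values.
images = np.arange(2 * 6 * 6, dtype=float).reshape(2, 6, 6)

# All patches: 16 per image.
all_patches = PatchExtractor(patch_size=(3, 3)).fit(images).transform(images)

# At most 5 randomly sampled patches per image.
five_each = PatchExtractor(
    patch_size=(3, 3), max_patches=5, random_state=0
).fit(images).transform(images)

# Half of the 16 possible patches per image.
half_each = PatchExtractor(
    patch_size=(3, 3), max_patches=0.5, random_state=0
).fit(images).transform(images)

print(all_patches.shape, five_each.shape, half_each.shape)
```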

Examples

>>> from sklearn.datasets import load_sample_images
>>> from sklearn.feature_extraction import image
>>> # Use the array data from the second image in this dataset:
>>> X = load_sample_images().images[1]
>>> print('Image shape: {}'.format(X.shape))
Image shape: (427, 640, 3)
>>> pe = image.PatchExtractor(patch_size=(2, 2))
>>> pe_fit = pe.fit(X)
>>> pe_trans = pe.transform(X)
>>> print('Patches shape: {}'.format(pe_trans.shape))
Patches shape: (545706, 2, 2)

Full API documentation: PatchExtractorScikitsLearnNode

class mdp.nodes.HashingVectorizerScikitsLearnNode

Convert a collection of text documents to a matrix of token occurrences. This node has been automatically generated by wrapping the sklearn.feature_extraction.text.HashingVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. It turns a collection of text documents into a scipy.sparse matrix holding token occurrence counts (or binary occurrence information), possibly normalized as token frequencies if norm=’l1’ or projected on the Euclidean unit sphere if norm=’l2’.

This text vectorizer implementation uses the hashing trick to find the token string name to feature integer index mapping.

This strategy has several advantages:

  • it is very memory-efficient and scales to large datasets, as there is no need to store a vocabulary dictionary in memory

  • it is fast to pickle and un-pickle as it holds no state besides the constructor parameters

  • it can be used in a streaming (partial fit) or parallel pipeline as there is no state computed during fit.

There are also a couple of cons (vs using a CountVectorizer with an in-memory vocabulary):

  • there is no way to compute the inverse transform (from feature indices to string feature names) which can be a problem when trying to introspect which features are most important to a model.

  • there can be collisions: distinct tokens can be mapped to the same feature index. However in practice this is rarely an issue if n_features is large enough (e.g. 2 ** 18 for text classification problems).

  • no IDF weighting as this would render the transformer stateful.

The hash function employed is the signed 32-bit version of Murmurhash3.
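
The statelessness listed among the advantages can be illustrated as follows. This is a minimal sketch with made-up documents: two independent vectorizers, neither of them fitted, map the same document to the same row, which is what makes streaming and parallel use possible.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer

docs_a = ["the quick brown fox"]
docs_b = ["the lazy dog", "the quick brown fox"]

# No fit is needed: the column of a token depends only on its hash.
first = HashingVectorizer(n_features=2**10).transform(docs_a)
second = HashingVectorizer(n_features=2**10).transform(docs_b)

# The shared document gets the same row from both instances.
same_row = np.array_equal(first[0].toarray(), second[1].toarray())
print(first.shape, second.shape, same_row)
```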

Read more in the User Guide.

Parameters

inputstring {‘filename’, ‘file’, ‘content’}, default=’content’

If ‘filename’, the sequence passed as an argument to fit is expected to be a list of filenames that need reading to fetch the raw content to analyze.

If ‘file’, the sequence items must have a ‘read’ method (file-like object) that is called to fetch the bytes in memory.

Otherwise the input is expected to be a sequence of items that can be of type string or byte.

encodingstring, default=’utf-8’

If bytes or files are given to analyze, this encoding is used to decode.

decode_error{‘strict’, ‘ignore’, ‘replace’}, default=’strict’

Instruction on what to do if a byte sequence is given to analyze that contains characters not of the given encoding. By default, it is ‘strict’, meaning that a UnicodeDecodeError will be raised. Other values are ‘ignore’ and ‘replace’.

strip_accents{‘ascii’, ‘unicode’}, default=None

Remove accents and perform other character normalization during the preprocessing step. ‘ascii’ is a fast method that only works on characters that have a direct ASCII mapping. ‘unicode’ is a slightly slower method that works on any characters. None (default) does nothing.

Both ‘ascii’ and ‘unicode’ use NFKD normalization from unicodedata.normalize().

lowercasebool, default=True

Convert all characters to lowercase before tokenizing.

preprocessorcallable, default=None

Override the preprocessing (string transformation) stage while preserving the tokenizing and n-grams generation steps. Only applies if analyzer is not callable.

tokenizercallable, default=None

Override the string tokenization step while preserving the preprocessing and n-grams generation steps. Only applies if analyzer == 'word'.

stop_wordsstring {‘english’}, list, default=None

If ‘english’, a built-in stop word list for English is used. There are several known issues with ‘english’ and you should consider an alternative (see stop_words).

If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens. Only applies if analyzer == 'word'.

token_patternstring

Regular expression denoting what constitutes a “token”, only used if analyzer == 'word'. The default regexp selects tokens of 2 or more alphanumeric characters (punctuation is completely ignored and always treated as a token separator).

ngram_rangetuple (min_n, max_n), default=(1, 1)

The lower and upper boundary of the range of n-values for different n-grams to be extracted. All values of n such that min_n <= n <= max_n will be used. For example an ngram_range of (1, 1) means only unigrams, (1, 2) means unigrams and bigrams, and (2, 2) means only bigrams. Only applies if analyzer is not callable.

analyzerstring, {‘word’, ‘char’, ‘char_wb’} or callable, default=’word’

Whether the feature should be made of word or character n-grams. Option ‘char_wb’ creates character n-grams only from text inside word boundaries; n-grams at the edges of words are padded with space.

If a callable is passed it is used to extract the sequence of features out of the raw, unprocessed input.

Changed in version 0.21.

Since v0.21, if input is filename or file, the data is first read from the file and then passed to the given callable analyzer.

n_featuresint, default=(2 ** 20)

The number of features (columns) in the output matrices. Small numbers of features are likely to cause hash collisions, but large numbers will cause larger coefficient dimensions in linear learners.

binarybool, default=False

If True, all non zero counts are set to 1. This is useful for discrete probabilistic models that model binary events rather than integer counts.

norm{‘l1’, ‘l2’}, default=’l2’

Norm used to normalize term vectors. None for no normalization.

alternate_signbool, default=True

When True, an alternating sign is added to the features so as to approximately conserve the inner product in the hashed space even for small n_features. This approach is similar to sparse random projection.

New in version 0.19.

dtypetype, default=np.float64

Type of the matrix returned by fit_transform() or transform().

Examples

>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> corpus = [
...     'This is the first document.',
...     'This document is the second document.',
...     'And this is the third one.',
...     'Is this the first document?',
... ]
>>> vectorizer = HashingVectorizer(n_features=2**4)
>>> X = vectorizer.fit_transform(corpus)
>>> print(X.shape)
(4, 16)

See Also

CountVectorizer, TfidfVectorizer

Full API documentation: HashingVectorizerScikitsLearnNode

class mdp.nodes.CountVectorizerScikitsLearnNode

Convert a collection of text documents to a matrix of token counts. This node has been automatically generated by wrapping the sklearn.feature_extraction.text.CountVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This implementation produces a sparse representation of the counts using scipy.sparse.csr_matrix.

If you do not provide an a-priori dictionary and you do not use an analyzer that does some kind of feature selection then the number of features will be equal to the vocabulary size found by analyzing the data.

Read more in the User Guide.

Parameters

inputstring {‘filename’, ‘file’, ‘content’}, default=’content’

If ‘filename’, the sequence passed as an argument to fit is expected to be a list of filenames that need reading to fetch the raw content to analyze.

If ‘file’, the sequence items must have a ‘read’ method (file-like object) that is called to fetch the bytes in memory.

Otherwise the input is expected to be a sequence of items that can be of type string or byte.

encodingstring, default=’utf-8’

If bytes or files are given to analyze, this encoding is used to decode.

decode_error{‘strict’, ‘ignore’, ‘replace’}, default=’strict’

Instruction on what to do if a byte sequence is given to analyze that contains characters not of the given encoding. By default, it is ‘strict’, meaning that a UnicodeDecodeError will be raised. Other values are ‘ignore’ and ‘replace’.

strip_accents{‘ascii’, ‘unicode’}, default=None

Remove accents and perform other character normalization during the preprocessing step. ‘ascii’ is a fast method that only works on characters that have a direct ASCII mapping. ‘unicode’ is a slightly slower method that works on any characters. None (default) does nothing.

Both ‘ascii’ and ‘unicode’ use NFKD normalization from unicodedata.normalize().

lowercasebool, default=True

Convert all characters to lowercase before tokenizing.

preprocessorcallable, default=None

Override the preprocessing (string transformation) stage while preserving the tokenizing and n-grams generation steps. Only applies if analyzer is not callable.

tokenizercallable, default=None

Override the string tokenization step while preserving the preprocessing and n-grams generation steps. Only applies if analyzer == 'word'.

stop_wordsstring {‘english’}, list, default=None

If ‘english’, a built-in stop word list for English is used. There are several known issues with ‘english’ and you should consider an alternative (see stop_words).

If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens. Only applies if analyzer == 'word'.

If None, no stop words will be used. max_df can be set to a value in the range [0.7, 1.0) to automatically detect and filter stop words based on intra corpus document frequency of terms.

token_patternstring

Regular expression denoting what constitutes a “token”, only used if analyzer == 'word'. The default regexp selects tokens of 2 or more alphanumeric characters (punctuation is completely ignored and always treated as a token separator).

ngram_rangetuple (min_n, max_n), default=(1, 1)

The lower and upper boundary of the range of n-values for different word n-grams or char n-grams to be extracted. All values of n such that min_n <= n <= max_n will be used. For example an ngram_range of (1, 1) means only unigrams, (1, 2) means unigrams and bigrams, and (2, 2) means only bigrams. Only applies if analyzer is not callable.

analyzerstring, {‘word’, ‘char’, ‘char_wb’} or callable, default=’word’

Whether the feature should be made of word n-grams or character n-grams. Option ‘char_wb’ creates character n-grams only from text inside word boundaries; n-grams at the edges of words are padded with space.

If a callable is passed it is used to extract the sequence of features out of the raw, unprocessed input.

Changed in version 0.21.

Since v0.21, if input is filename or file, the data is first read from the file and then passed to the given callable analyzer.
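
The difference between the ‘char’ and ‘char_wb’ analyzers can be seen on a tiny input. This is an illustrative sketch with a made-up two-word document and character bigrams: ‘char’ produces a bigram that spans the space between the words, while ‘char_wb’ stays inside each word and pads its edges with a space.

```python
from sklearn.feature_extraction.text import CountVectorizer

doc = ["ab cd"]

char = CountVectorizer(analyzer="char", ngram_range=(2, 2)).fit(doc)
char_wb = CountVectorizer(analyzer="char_wb", ngram_range=(2, 2)).fit(doc)

# vocabulary_ maps each extracted n-gram to its column index.
char_terms = sorted(char.vocabulary_)
char_wb_terms = sorted(char_wb.vocabulary_)
print(char_terms)     # [' c', 'ab', 'b ', 'cd']
print(char_wb_terms)  # [' a', ' c', 'ab', 'b ', 'cd', 'd ']
```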

max_dffloat in range [0.0, 1.0] or int, default=1.0

When building the vocabulary, ignore terms that have a document frequency strictly higher than the given threshold (corpus-specific stop words). If float, the parameter represents a proportion of documents; if integer, an absolute count. This parameter is ignored if vocabulary is not None.

min_dffloat in range [0.0, 1.0] or int, default=1

When building the vocabulary, ignore terms that have a document frequency strictly lower than the given threshold. This value is also called cut-off in the literature. If float, the parameter represents a proportion of documents; if integer, an absolute count. This parameter is ignored if vocabulary is not None.
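
The two thresholds can be tried on the small corpus used in the examples of this node. In this sketch the values min_df=2 and max_df=0.75 are arbitrary choices for illustration: terms found in fewer than 2 documents are dropped, and so are terms found in more than 75% of the 4 documents (that is, in more than 3).

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

# min_df is an absolute count here, max_df a proportion of documents.
vectorizer = CountVectorizer(min_df=2, max_df=0.75).fit(corpus)

kept = sorted(vectorizer.vocabulary_)
print(kept)  # ['document', 'first']
```

“document” occurs in exactly 3 documents and is kept because only frequencies strictly higher than the threshold are removed.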

max_featuresint, default=None

If not None, build a vocabulary that only considers the top max_features terms ordered by term frequency across the corpus.

This parameter is ignored if vocabulary is not None.

vocabularyMapping or iterable, default=None

Either a Mapping (e.g., a dict) where keys are terms and values are indices in the feature matrix, or an iterable over terms. If not given, a vocabulary is determined from the input documents. Indices in the mapping should not be repeated and should not have any gap between 0 and the largest index.

binarybool, default=False

If True, all non zero counts are set to 1. This is useful for discrete probabilistic models that model binary events rather than integer counts.

dtypetype, default=np.int64

Type of the matrix returned by fit_transform() or transform().

Attributes

vocabulary_dict

A mapping of terms to feature indices.

fixed_vocabulary_bool

True if a fixed vocabulary of term-to-index mappings is provided by the user.

stop_words_set

Terms that were ignored because they either:

  • occurred in too many documents (max_df)

  • occurred in too few documents (min_df)

  • were cut off by feature selection (max_features).

This is only available if no vocabulary was given.

Examples

>>> from sklearn.feature_extraction.text import CountVectorizer
>>> corpus = [
...     'This is the first document.',
...     'This document is the second document.',
...     'And this is the third one.',
...     'Is this the first document?',
... ]
>>> vectorizer = CountVectorizer()
>>> X = vectorizer.fit_transform(corpus)
>>> print(vectorizer.get_feature_names())
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
>>> print(X.toarray())
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]
>>> vectorizer2 = CountVectorizer(analyzer='word', ngram_range=(2, 2))
>>> X2 = vectorizer2.fit_transform(corpus)
>>> print(vectorizer2.get_feature_names())
['and this', 'document is', 'first document', 'is the', 'is this',
 'second document', 'the first', 'the second', 'the third', 'third one',
 'this document', 'this is', 'this the']
>>> print(X2.toarray())
[[0 0 1 1 0 0 1 0 0 0 0 1 0]
 [0 1 0 1 0 1 0 1 0 0 1 0 0]
 [1 0 0 1 0 0 0 0 1 1 0 1 0]
 [0 0 1 0 1 0 1 0 0 0 0 0 1]]

See Also

HashingVectorizer, TfidfVectorizer

Notes

The stop_words_ attribute can get large and increase the model size when pickling. This attribute is provided only for introspection and can be safely removed using delattr or set to None before pickling.

Full API documentation: CountVectorizerScikitsLearnNode

class mdp.nodes.TfidfTransformerScikitsLearnNode

Transform a count matrix to a normalized tf or tf-idf representation. This node has been automatically generated by wrapping the sklearn.feature_extraction.text.TfidfTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Tf means term-frequency while tf-idf means term-frequency times inverse document-frequency. This is a common term weighting scheme in information retrieval, that has also found good use in document classification.

The goal of using tf-idf instead of the raw frequencies of occurrence of a token in a given document is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative than features that occur in a small fraction of the training corpus.

The formula that is used to compute the tf-idf for a term t of a document d in a document set is tf-idf(t, d) = tf(t, d) * idf(t), and the idf is computed as idf(t) = log [ n / df(t) ] + 1 (if smooth_idf=False), where n is the total number of documents in the document set and df(t) is the document frequency of t; the document frequency is the number of documents in the document set that contain the term t. The effect of adding “1” to the idf in the equation above is that terms with zero idf, i.e., terms that occur in all documents in a training set, will not be entirely ignored. (Note that the idf formula above differs from the standard textbook notation that defines the idf as idf(t) = log [ n / (df(t) + 1) ]).

If smooth_idf=True (the default), the constant “1” is added to the numerator and denominator of the idf as if an extra document was seen containing every term in the collection exactly once, which prevents zero divisions: idf(t) = log [ (1 + n) / (1 + df(t)) ] + 1.
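
Both idf formulas above can be reproduced with NumPy and compared with the idf_ attribute. This is a sketch on a small count matrix (the same counts as in the Examples section of this node); the variable names are chosen for the illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Term counts: 4 documents (rows) by 8 terms (columns).
counts = np.array([[1, 1, 1, 1, 0, 1, 0, 0],
                   [1, 2, 0, 1, 1, 1, 0, 0],
                   [1, 0, 0, 1, 0, 1, 1, 1],
                   [1, 1, 1, 1, 0, 1, 0, 0]])

n = counts.shape[0]
df = (counts > 0).sum(axis=0)  # document frequency of each term

idf_smooth = np.log((1 + n) / (1 + df)) + 1  # smooth_idf=True (default)
idf_plain = np.log(n / df) + 1               # smooth_idf=False

sk_smooth = TfidfTransformer(smooth_idf=True).fit(counts).idf_
sk_plain = TfidfTransformer(smooth_idf=False).fit(counts).idf_
print(np.allclose(idf_smooth, sk_smooth), np.allclose(idf_plain, sk_plain))
```

Terms that occur in every document (df = n) get an idf of exactly 1 under both formulas, so they are down-weighted but not ignored.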

Furthermore, the formulas used to compute tf and idf depend on parameter settings that correspond to the SMART notation used in IR as follows:

Tf is “n” (natural) by default, “l” (logarithmic) when sublinear_tf=True. Idf is “t” when use_idf is given, “n” (none) otherwise. Normalization is “c” (cosine) when norm='l2', “n” (none) when norm=None.

Read more in the User Guide.

Parameters

norm{‘l1’, ‘l2’}, default=’l2’

Each output row will have unit norm, either:

  • ‘l2’: Sum of squares of vector elements is 1. The cosine similarity between two vectors is their dot product when l2 norm has been applied.

  • ‘l1’: Sum of absolute values of vector elements is 1. See preprocessing.normalize().

use_idfbool, default=True

Enable inverse-document-frequency reweighting.

smooth_idfbool, default=True

Smooth idf weights by adding one to document frequencies, as if an extra document was seen containing every term in the collection exactly once. Prevents zero divisions.

sublinear_tfbool, default=False

Apply sublinear tf scaling, i.e. replace tf with 1 + log(tf).

Attributes

idf_array of shape (n_features)

The inverse document frequency (IDF) vector; only defined if use_idf is True.

New in version 0.20.

Examples

>>> from sklearn.feature_extraction.text import TfidfTransformer
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> from sklearn.pipeline import Pipeline
>>> import numpy as np
>>> corpus = ['this is the first document',
...           'this document is the second document',
...           'and this is the third one',
...           'is this the first document']
>>> vocabulary = ['this', 'document', 'first', 'is', 'second', 'the',
...               'and', 'one']
>>> pipe = Pipeline([('count', CountVectorizer(vocabulary=vocabulary)),
...                  ('tfid', TfidfTransformer())]).fit(corpus)
>>> pipe['count'].transform(corpus).toarray()
array([[1, 1, 1, 1, 0, 1, 0, 0],
       [1, 2, 0, 1, 1, 1, 0, 0],
       [1, 0, 0, 1, 0, 1, 1, 1],
       [1, 1, 1, 1, 0, 1, 0, 0]])
>>> pipe['tfid'].idf_
array([1.        , 1.22314355, 1.51082562, 1.        , 1.91629073,
       1.        , 1.91629073, 1.91629073])
>>> pipe.transform(corpus).shape
(4, 8)

References

Yates2011

R. Baeza-Yates and B. Ribeiro-Neto (2011). Modern Information Retrieval. Addison Wesley, pp. 68-74.

MRS2008

C.D. Manning, P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 118-120.

Full API documentation: TfidfTransformerScikitsLearnNode

class mdp.nodes.TfidfVectorizerScikitsLearnNode

Convert a collection of raw documents to a matrix of TF-IDF features. This node has been automatically generated by wrapping the sklearn.feature_extraction.text.TfidfVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Equivalent to CountVectorizer followed by TfidfTransformer.

Read more in the User Guide.

Parameters

input{‘filename’, ‘file’, ‘content’}, default=’content’

If ‘filename’, the sequence passed as an argument to fit is expected to be a list of filenames that need reading to fetch the raw content to analyze.

If ‘file’, the sequence items must have a ‘read’ method (file-like object) that is called to fetch the bytes in memory.

Otherwise the input is expected to be a sequence of items that can be of type string or byte.

encodingstr, default=’utf-8’

If bytes or files are given to analyze, this encoding is used to decode.

decode_error{‘strict’, ‘ignore’, ‘replace’}, default=’strict’

Instruction on what to do if a byte sequence is given to analyze that contains characters not of the given encoding. By default, it is ‘strict’, meaning that a UnicodeDecodeError will be raised. Other values are ‘ignore’ and ‘replace’.

strip_accents{‘ascii’, ‘unicode’}, default=None

Remove accents and perform other character normalization during the preprocessing step. ‘ascii’ is a fast method that only works on characters that have a direct ASCII mapping. ‘unicode’ is a slightly slower method that works on any characters. None (default) does nothing.

Both ‘ascii’ and ‘unicode’ use NFKD normalization from unicodedata.normalize().

lowercasebool, default=True

Convert all characters to lowercase before tokenizing.

preprocessorcallable, default=None

Override the preprocessing (string transformation) stage while preserving the tokenizing and n-grams generation steps. Only applies if analyzer is not callable.

tokenizercallable, default=None

Override the string tokenization step while preserving the preprocessing and n-grams generation steps. Only applies if analyzer == 'word'.

analyzer{‘word’, ‘char’, ‘char_wb’} or callable, default=’word’

Whether the feature should be made of word or character n-grams. Option ‘char_wb’ creates character n-grams only from text inside word boundaries; n-grams at the edges of words are padded with space.

If a callable is passed it is used to extract the sequence of features out of the raw, unprocessed input.

Changed in version 0.21.

Since v0.21, if input is filename or file, the data is first read from the file and then passed to the given callable analyzer.

stop_words{‘english’}, list, default=None

If a string, it is passed to _check_stop_list and the appropriate stop list is returned. ‘english’ is currently the only supported string value. There are several known issues with ‘english’ and you should consider an alternative (see stop_words).

If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens. Only applies if analyzer == 'word'.

If None, no stop words will be used. max_df can be set to a value in the range [0.7, 1.0) to automatically detect and filter stop words based on intra corpus document frequency of terms.

token_patternstr

Regular expression denoting what constitutes a “token”, only used if analyzer == 'word'. The default regexp selects tokens of 2 or more alphanumeric characters (punctuation is completely ignored and always treated as a token separator).
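The behaviour described above can be reproduced with the standard `re` module. The pattern below is what I understand the scikit-learn default to be; treat it as an assumption and check the wrapped instance if the exact value matters.

```python
import re

# Assumed default: runs of two or more word characters between word boundaries.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    """Lowercase the text, then extract tokens with the pattern above."""
    return TOKEN_PATTERN.findall(text.lower())

tokens = tokenize("This is a test: I/O, isn't it?")
```

Single-character tokens ("a", "i", "o", "t") are dropped, and punctuation only acts as a separator, which is why "isn't" contributes just "isn".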

ngram_rangetuple (min_n, max_n), default=(1, 1)

The lower and upper boundary of the range of n-values for different n-grams to be extracted. All values of n such that min_n <= n <= max_n will be used. For example an ngram_range of (1, 1) means only unigrams, (1, 2) means unigrams and bigrams, and (2, 2) means only bigrams. Only applies if analyzer is not callable.
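A minimal sketch of how ngram_range expands a token list, assuming tokens are joined with a single space; the function name `word_ngrams` is hypothetical.

```python
def word_ngrams(tokens, ngram_range=(1, 1)):
    """Return all n-grams with min_n <= n <= max_n, in order of increasing n."""
    min_n, max_n = ngram_range
    ngrams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[start:start + n]))
    return ngrams

tokens = ["this", "is", "the", "first"]
unigrams = word_ngrams(tokens, (1, 1))
uni_and_bigrams = word_ngrams(tokens, (1, 2))
bigrams = word_ngrams(tokens, (2, 2))
```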

max_dffloat or int, default=1.0

When building the vocabulary ignore terms that have a document frequency strictly higher than the given threshold (corpus-specific stop words). If float in range [0.0, 1.0], the parameter represents a proportion of documents, integer absolute counts. This parameter is ignored if vocabulary is not None.

min_dffloat or int, default=1

When building the vocabulary ignore terms that have a document frequency strictly lower than the given threshold. This value is also called cut-off in the literature. If float in range of [0.0, 1.0], the parameter represents a proportion of documents, integer absolute counts. This parameter is ignored if vocabulary is not None.
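The interaction of min_df and max_df can be sketched as a filter over document frequencies. This is an illustration under the rules stated above (floats are proportions, integers are absolute counts, both bounds are strict exclusions); the function name is hypothetical.

```python
def filter_by_document_frequency(doc_freq, n_docs, min_df=1, max_df=1.0):
    """Keep terms whose document frequency lies within [min_df, max_df]."""
    # Floats are proportions of documents, integers are absolute counts.
    min_count = min_df * n_docs if isinstance(min_df, float) else min_df
    max_count = max_df * n_docs if isinstance(max_df, float) else max_df
    # A term is ignored only if it is strictly above max or strictly below min.
    return {term for term, df in doc_freq.items()
            if min_count <= df <= max_count}

doc_freq = {"the": 4, "document": 3, "first": 2, "second": 1}

kept_default = filter_by_document_frequency(doc_freq, n_docs=4)
kept_trimmed = filter_by_document_frequency(doc_freq, n_docs=4,
                                            min_df=2, max_df=0.75)
```

With min_df=2 and max_df=0.75 on four documents, "second" is too rare and "the" is too common, so only "document" and "first" remain.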

max_featuresint, default=None

If not None, build a vocabulary that only considers the top max_features ordered by term frequency across the corpus.

This parameter is ignored if vocabulary is not None.

vocabularyMapping or iterable, default=None

Either a Mapping (e.g., a dict) where keys are terms and values are indices in the feature matrix, or an iterable over terms. If not given, a vocabulary is determined from the input documents.

binarybool, default=False

If True, all non-zero term counts are set to 1. This does not mean outputs will have only 0/1 values, only that the tf term in tf-idf is binary. (Set idf and normalization to False to get 0/1 outputs).

dtypedtype, default=float64

Type of the matrix returned by fit_transform() or transform().

norm{‘l1’, ‘l2’}, default=’l2’

Each output row will have unit norm, either:

  • ‘l2’: Sum of squares of vector elements is 1. The cosine similarity between two vectors is their dot product when l2 norm has been applied.

  • ‘l1’: Sum of absolute values of vector elements is 1. See preprocessing.normalize().

use_idfbool, default=True

Enable inverse-document-frequency reweighting.

smooth_idfbool, default=True

Smooth idf weights by adding one to document frequencies, as if an extra document was seen containing every term in the collection exactly once. Prevents zero divisions.

sublinear_tfbool, default=False

Apply sublinear tf scaling, i.e. replace tf with 1 + log(tf).

Attributes

vocabulary_dict

A mapping of terms to feature indices.

fixed_vocabulary_: bool

True if a fixed vocabulary of term to indices mapping is provided by the user

idf_array of shape (n_features,)

The inverse document frequency (IDF) vector; only defined if use_idf is True.

stop_words_set

Terms that were ignored because they either:

  • occurred in too many documents (max_df)

  • occurred in too few documents (min_df)

  • were cut off by feature selection (max_features).

This is only available if no vocabulary was given.

See Also

CountVectorizer : Transforms text into a sparse matrix of n-gram counts.

TfidfTransformer : Performs the TF-IDF transformation from a provided matrix of counts.

Notes

The stop_words_ attribute can get large and increase the model size when pickling. This attribute is provided only for introspection and can be safely removed using delattr or set to None before pickling.

Examples

>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> corpus = [
...     'This is the first document.',
...     'This document is the second document.',
...     'And this is the third one.',
...     'Is this the first document?',
... ]
>>> vectorizer = TfidfVectorizer()
>>> X = vectorizer.fit_transform(corpus)
>>> print(vectorizer.get_feature_names())
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
>>> print(X.shape)
(4, 9)

Full API documentation: TfidfVectorizerScikitsLearnNode

class mdp.nodes.SelectPercentileScikitsLearnNode

Select features according to a percentile of the highest scores. This node has been automatically generated by wrapping the sklearn.feature_selection._univariate_selection.SelectPercentile class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

score_funccallable

Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below “See also”). The default function only works with classification tasks.

New in version 0.18.

percentileint, optional, default=10

Percent of features to keep.

Attributes

scores_array-like of shape (n_features,)

Scores of features.

pvalues_array-like of shape (n_features,)

p-values of feature scores, None if score_func returned only scores.

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.feature_selection import SelectPercentile, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> X_new = SelectPercentile(chi2, percentile=10).fit_transform(X, y)
>>> X_new.shape
(1797, 7)

Notes

Ties between features with equal scores will be broken in an unspecified way.

See also

f_classif: ANOVA F-value between label/feature for classification tasks. mutual_info_classif: Mutual information for a discrete target. chi2: Chi-squared stats of non-negative features for classification tasks. f_regression: F-value between label/feature for regression tasks. mutual_info_regression: Mutual information for a continuous target. SelectKBest: Select features based on the k highest scores. SelectFpr: Select features based on a false positive rate test. SelectFdr: Select features based on an estimated false discovery rate. SelectFwe: Select features based on family-wise error rate. GenericUnivariateSelect: Univariate feature selector with configurable mode.

Full API documentation: SelectPercentileScikitsLearnNode

class mdp.nodes.SelectKBestScikitsLearnNode

Select features according to the k highest scores. This node has been automatically generated by wrapping the sklearn.feature_selection._univariate_selection.SelectKBest class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

score_funccallable

Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below “See also”). The default function only works with classification tasks.

New in version 0.18.

kint or “all”, optional, default=10

Number of top features to select. The “all” option bypasses selection, for use in a parameter search.
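A plain-Python sketch of the selection rule: build a boolean mask for the k highest scores and apply it to the columns of X. The function name `select_k_best` is hypothetical, and ties are resolved here by sort order, which is just one of the unspecified possibilities.

```python
def select_k_best(scores, k):
    """Return a boolean mask marking the k features with the highest scores."""
    if k == "all":
        # The "all" option bypasses selection entirely.
        return [True] * len(scores)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    chosen = set(top)
    return [i in chosen for i in range(len(scores))]

scores = [0.2, 3.5, 1.1, 7.0, 0.9]
mask = select_k_best(scores, k=2)

X = [[1, 2, 3, 4, 5],
     [6, 7, 8, 9, 10]]
# Keep only the selected columns of each row.
X_new = [[value for value, keep in zip(row, mask) if keep] for row in X]
```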

Attributes

scores_array-like of shape (n_features,)

Scores of features.

pvalues_array-like of shape (n_features,)

p-values of feature scores, None if score_func returned only scores.

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.feature_selection import SelectKBest, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> X_new = SelectKBest(chi2, k=20).fit_transform(X, y)
>>> X_new.shape
(1797, 20)

Notes

Ties between features with equal scores will be broken in an unspecified way.

See also

f_classif: ANOVA F-value between label/feature for classification tasks. mutual_info_classif: Mutual information for a discrete target. chi2: Chi-squared stats of non-negative features for classification tasks. f_regression: F-value between label/feature for regression tasks. mutual_info_regression: Mutual information for a continuous target. SelectPercentile: Select features based on percentile of the highest scores. SelectFpr: Select features based on a false positive rate test. SelectFdr: Select features based on an estimated false discovery rate. SelectFwe: Select features based on family-wise error rate. GenericUnivariateSelect: Univariate feature selector with configurable mode.

Full API documentation: SelectKBestScikitsLearnNode

class mdp.nodes.SelectFprScikitsLearnNode

Filter: Select the pvalues below alpha based on a FPR test. This node has been automatically generated by wrapping the sklearn.feature_selection._univariate_selection.SelectFpr class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. FPR test stands for False Positive Rate test. It controls the total amount of false detections.

Read more in the User Guide.

Parameters

score_funccallable

Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below “See also”). The default function only works with classification tasks.

alphafloat, optional

The highest p-value for features to be kept.

Attributes

scores_array-like of shape (n_features,)

Scores of features.

pvalues_array-like of shape (n_features,)

p-values of feature scores.

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import SelectFpr, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFpr(chi2, alpha=0.01).fit_transform(X, y)
>>> X_new.shape
(569, 16)

See also

f_classif: ANOVA F-value between label/feature for classification tasks. chi2: Chi-squared stats of non-negative features for classification tasks. mutual_info_classif: Mutual information for a discrete target. f_regression: F-value between label/feature for regression tasks. mutual_info_regression: Mutual information between features and the target. SelectPercentile: Select features based on percentile of the highest scores. SelectKBest: Select features based on the k highest scores. SelectFdr: Select features based on an estimated false discovery rate. SelectFwe: Select features based on family-wise error rate. GenericUnivariateSelect: Univariate feature selector with configurable mode.

Full API documentation: SelectFprScikitsLearnNode

class mdp.nodes.SelectFdrScikitsLearnNode

Filter: Select the p-values for an estimated false discovery rate. This node has been automatically generated by wrapping the sklearn.feature_selection._univariate_selection.SelectFdr class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This uses the Benjamini-Hochberg procedure. alpha is an upper bound on the expected false discovery rate.

Read more in the User Guide.

Parameters

score_funccallable

Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below “See also”). The default function only works with classification tasks.

alphafloat, optional

The highest uncorrected p-value for features to keep.
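The Benjamini-Hochberg procedure mentioned above can be sketched in a few lines. This is a textbook-style illustration, not a copy of the wrapped implementation; the function name is hypothetical.

```python
def benjamini_hochberg_mask(pvalues, alpha):
    """Select features whose p-value is at or below the BH cutoff."""
    n = len(pvalues)
    ranked = sorted(pvalues)
    # Find sorted p-values p_(i) that satisfy p_(i) <= alpha * i / n.
    passing = [p for i, p in enumerate(ranked, start=1) if p <= alpha * i / n]
    if not passing:
        return [False] * n
    # Everything up to the largest passing p-value is selected (step-up rule).
    cutoff = max(passing)
    return [p <= cutoff for p in pvalues]

mask_a = benjamini_hochberg_mask([0.001, 0.008, 0.039, 0.041, 0.6], alpha=0.05)
mask_b = benjamini_hochberg_mask([0.001, 0.008, 0.039, 0.041, 0.042], alpha=0.05)
```

In the second call the largest p-value passes its own threshold (0.042 <= 0.05), which pulls in 0.039 and 0.041 as well even though they fail their individual rank thresholds.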

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import SelectFdr, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFdr(chi2, alpha=0.01).fit_transform(X, y)
>>> X_new.shape
(569, 16)

Attributes

scores_array-like of shape (n_features,)

Scores of features.

pvalues_array-like of shape (n_features,)

p-values of feature scores.

References

https://en.wikipedia.org/wiki/False_discovery_rate

See also

f_classif: ANOVA F-value between label/feature for classification tasks. mutual_info_classif: Mutual information for a discrete target. chi2: Chi-squared stats of non-negative features for classification tasks. f_regression: F-value between label/feature for regression tasks. mutual_info_regression: Mutual information for a continuous target. SelectPercentile: Select features based on percentile of the highest scores. SelectKBest: Select features based on the k highest scores. SelectFpr: Select features based on a false positive rate test. SelectFwe: Select features based on family-wise error rate. GenericUnivariateSelect: Univariate feature selector with configurable mode.

Full API documentation: SelectFdrScikitsLearnNode

class mdp.nodes.SelectFweScikitsLearnNode

Filter: Select the p-values corresponding to Family-wise error rate. This node has been automatically generated by wrapping the sklearn.feature_selection._univariate_selection.SelectFwe class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

score_funccallable

Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). Default is f_classif (see below “See also”). The default function only works with classification tasks.

alphafloat, optional

The highest uncorrected p-value for features to keep.
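To contrast the family-wise error rate filter with a plain false positive rate test, the sketch below assumes a Bonferroni-style correction, where alpha is divided by the number of features. That correction is an assumption about the wrapped class and is not stated in this documentation; both function names are hypothetical.

```python
def fpr_mask(pvalues, alpha):
    """Uncorrected test: keep every feature with p-value below alpha."""
    return [p < alpha for p in pvalues]

def fwe_mask(pvalues, alpha):
    """Assumed Bonferroni correction: compare against alpha / n_features."""
    corrected = alpha / len(pvalues)
    return [p < corrected for p in pvalues]

pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
kept_fpr = fpr_mask(pvalues, alpha=0.05)
kept_fwe = fwe_mask(pvalues, alpha=0.05)
```

Under this assumption the family-wise filter is the stricter of the two: here it keeps two features where the uncorrected test keeps four.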

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import SelectFwe, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFwe(chi2, alpha=0.01).fit_transform(X, y)
>>> X_new.shape
(569, 15)

Attributes

scores_array-like of shape (n_features,)

Scores of features.

pvalues_array-like of shape (n_features,)

p-values of feature scores.

See also

f_classif: ANOVA F-value between label/feature for classification tasks. chi2: Chi-squared stats of non-negative features for classification tasks. f_regression: F-value between label/feature for regression tasks. SelectPercentile: Select features based on percentile of the highest scores. SelectKBest: Select features based on the k highest scores. SelectFpr: Select features based on a false positive rate test. SelectFdr: Select features based on an estimated false discovery rate. GenericUnivariateSelect: Univariate feature selector with configurable mode.

Full API documentation: SelectFweScikitsLearnNode

class mdp.nodes.GenericUnivariateSelectScikitsLearnNode

Univariate feature selector with configurable strategy. This node has been automatically generated by wrapping the sklearn.feature_selection._univariate_selection.GenericUnivariateSelect class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

score_funccallable

Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). For modes ‘percentile’ or ‘k_best’ it can return a single array scores.

mode{‘percentile’, ‘k_best’, ‘fpr’, ‘fdr’, ‘fwe’}

Feature selection mode.

paramfloat or int depending on the feature selection mode

Parameter of the corresponding mode.

Attributes

scores_array-like of shape (n_features,)

Scores of features.

pvalues_array-like of shape (n_features,)

p-values of feature scores, None if score_func returned scores only.

Examples

>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import GenericUnivariateSelect, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> transformer = GenericUnivariateSelect(chi2, mode='k_best', param=20)
>>> X_new = transformer.fit_transform(X, y)
>>> X_new.shape
(569, 20)

See also

f_classif: ANOVA F-value between label/feature for classification tasks. mutual_info_classif: Mutual information for a discrete target. chi2: Chi-squared stats of non-negative features for classification tasks. f_regression: F-value between label/feature for regression tasks. mutual_info_regression: Mutual information for a continuous target. SelectPercentile: Select features based on percentile of the highest scores. SelectKBest: Select features based on the k highest scores. SelectFpr: Select features based on a false positive rate test. SelectFdr: Select features based on an estimated false discovery rate. SelectFwe: Select features based on family-wise error rate.

Full API documentation: GenericUnivariateSelectScikitsLearnNode

class mdp.nodes.VarianceThresholdScikitsLearnNode

Feature selector that removes all low-variance features. This node has been automatically generated by wrapping the sklearn.feature_selection._variance_threshold.VarianceThreshold class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This feature selection algorithm looks only at the features (X), not the desired outputs (y), and can thus be used for unsupervised learning.

Read more in the User Guide.

Parameters

thresholdfloat, optional

Features with a training-set variance lower than this threshold will be removed. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples.
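The rule can be sketched in plain Python. The strict comparison (keep a feature only if its variance exceeds the threshold) is an assumption chosen so that the default threshold of 0 drops constant features, as described above; the function name is hypothetical.

```python
def variance_threshold(X, threshold=0.0):
    """Drop columns whose (population) variance does not exceed threshold."""
    n = len(X)
    variances = []
    for column in zip(*X):
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    keep = [var > threshold for var in variances]
    X_new = [[v for v, k in zip(row, keep) if k] for row in X]
    return X_new, variances

X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
X_new, variances = variance_threshold(X)
```

The first and last columns are constant, so their variance is 0 and they are removed.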

Attributes

variances_array, shape (n_features,)

Variances of individual features.

Notes

Allows NaN in the input.

Examples

The following dataset has integer features, two of which are the same in every sample. These are removed with the default setting for threshold:

>>> from sklearn.feature_selection import VarianceThreshold
>>> X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
>>> selector = VarianceThreshold()
>>> selector.fit_transform(X)
array([[2, 0],
       [1, 4],
       [1, 1]])

Full API documentation: VarianceThresholdScikitsLearnNode

class mdp.nodes.RFEScikitsLearnNode

Feature ranking with recursive feature elimination. This node has been automatically generated by wrapping the sklearn.feature_selection._rfe.RFE class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.
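The elimination loop described above can be sketched without any estimator at all. In the sketch below, `abs_covariance` is a toy stand-in for the coef_ or feature_importances_ of a fitted model, and `rfe_sketch` is a hypothetical name; neither belongs to the wrapped class.

```python
def abs_covariance(X, y):
    """Toy importance: absolute covariance of each column with y."""
    n = len(y)
    y_mean = sum(y) / n
    weights = []
    for column in zip(*X):
        c_mean = sum(column) / n
        cov = sum((c - c_mean) * (t - y_mean) for c, t in zip(column, y)) / n
        weights.append(abs(cov))
    return weights

def rfe_sketch(X, y, importance_fn, n_features_to_select, step=1):
    n_features = len(X[0])
    remaining = list(range(n_features))
    dropped_batches = []
    while len(remaining) > n_features_to_select:
        # "Refit" on the surviving columns and score them.
        X_sub = [[row[j] for j in remaining] for row in X]
        weights = importance_fn(X_sub, y)
        # Never remove more features than needed to reach the target size.
        n_drop = min(step, len(remaining) - n_features_to_select)
        weakest = sorted(range(len(remaining)), key=lambda i: weights[i])[:n_drop]
        batch = [remaining[i] for i in weakest]
        dropped_batches.append(batch)
        remaining = [j for j in remaining if j not in batch]
    # Selected features get rank 1; features dropped later rank better.
    ranking = [1] * n_features
    for offset, batch in enumerate(reversed(dropped_batches)):
        for j in batch:
            ranking[j] = offset + 2
    support = [j in remaining for j in range(n_features)]
    return support, ranking

X = [[1, 0, 5], [2, 1, 5], [3, 0, 5], [4, 1, 5]]
y = [1, 2, 3, 4]
support, ranking = rfe_sketch(X, y, abs_covariance, n_features_to_select=1)
```

The constant third column is dropped first and therefore gets the worst rank, the weakly related second column goes next, and the first column survives with rank 1.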

Read more in the User Guide.

Parameters

estimatorobject

A supervised learning estimator with a fit method that provides information about feature importance either through a coef_ attribute or through a feature_importances_ attribute.

n_features_to_selectint or None (default=None)

The number of features to select. If None, half of the features are selected.

stepint or float, optional (default=1)

If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration.

verboseint, (default=0)

Controls verbosity of output.

Attributes

n_features_int

The number of selected features.

support_array of shape [n_features]

The mask of selected features.

ranking_array of shape [n_features]

The feature ranking, such that ranking_[i] corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.

estimator_object

The external estimator fit on the reduced dataset.

Examples

The following example shows how to retrieve the 5 most informative features in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFE
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFE(estimator, n_features_to_select=5, step=1)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False,
       False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

Notes

Allows NaN/Inf in the input if the underlying estimator does as well.

See also

RFECV : Recursive feature elimination with built-in cross-validated selection of the best number of features.

References

1

Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002.

Full API documentation: RFEScikitsLearnNode

class mdp.nodes.RFECVScikitsLearnNode

Feature ranking with recursive feature elimination and cross-validated selection of the best number of features. This node has been automatically generated by wrapping the sklearn.feature_selection._rfe.RFECV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. See glossary entry for cross-validation estimator.

Read more in the User Guide.

Parameters

estimatorobject

A supervised learning estimator with a fit method that provides information about feature importance either through a coef_ attribute or through a feature_importances_ attribute.

stepint or float, optional (default=1)

If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration. Note that the last iteration may remove fewer than step features in order to reach min_features_to_select.

min_features_to_selectint, (default=1)

The minimum number of features to be selected. This number of features will always be scored, even if the difference between the original feature count and min_features_to_select isn’t divisible by step.

New in version 0.20.

cvint, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • integer, to specify the number of folds.

  • CV splitter,

  • An iterable yielding (train, test) splits as arrays of indices.

For integer/None inputs, if the estimator is a classifier and y is binary or multiclass, sklearn.model_selection.StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, sklearn.model_selection.KFold is used.

Refer to the User Guide for the various cross-validation strategies that can be used here.

Changed in version 0.22: cv default value of None changed from 3-fold to 5-fold.

scoringstring, callable or None, optional, (default=None)

A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).

verboseint, (default=0)

Controls verbosity of output.

n_jobsint or None, optional (default=None)

Number of cores to run in parallel while fitting across folds. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

New in version 0.18.

Attributes

n_features_int

The number of selected features with cross-validation.

support_array of shape [n_features]

The mask of selected features.

ranking_array of shape [n_features]

The feature ranking, such that ranking_[i] corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.

grid_scores_array of shape [n_subsets_of_features]

The cross-validation scores such that grid_scores_[i] corresponds to the CV score of the i-th subset of features.

estimator_object

The external estimator fit on the reduced dataset.

Notes

The size of grid_scores_ is equal to ceil((n_features - min_features_to_select) / step) + 1, where step is the number of features removed at each iteration.
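The size formula can be checked by enumerating the feature-subset sizes that get scored. This sketch assumes an integer step; both function names are hypothetical.

```python
import math

def rfecv_subset_sizes(n_features, min_features_to_select=1, step=1):
    """List the number of features scored at each elimination round."""
    sizes = [n_features]
    while sizes[-1] > min_features_to_select:
        # The last round may remove fewer than `step` features.
        sizes.append(max(sizes[-1] - step, min_features_to_select))
    return sizes

def expected_grid_size(n_features, min_features_to_select=1, step=1):
    """The formula quoted above for the length of grid_scores_."""
    return math.ceil((n_features - min_features_to_select) / step) + 1

sizes_step1 = rfecv_subset_sizes(10, min_features_to_select=1, step=1)
sizes_step4 = rfecv_subset_sizes(10, min_features_to_select=1, step=4)
```

With 10 features and step=4 the scored subsets have 10, 6, 2 and 1 features, which matches ceil(9 / 4) + 1 = 4.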

Allows NaN/Inf in the input if the underlying estimator does as well.

Examples

The following example shows how to retrieve the 5 informative features, not known a priori, in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False,
       False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])

See also

RFE : Recursive feature elimination

References

1

Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002.

Full API documentation: RFECVScikitsLearnNode

class mdp.nodes.SelectFromModelScikitsLearnNode

Meta-transformer for selecting features based on importance weights. This node has been automatically generated by wrapping the sklearn.feature_selection._from_model.SelectFromModel class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. New in version 0.17.

Parameters

estimatorobject

The base estimator from which the transformer is built. This can be both a fitted (if prefit is set to True) or a non-fitted estimator. The estimator must have either a feature_importances_ or coef_ attribute after fitting.

thresholdstring, float, optional default None

The threshold value to use for feature selection. Features whose importance is greater than or equal to the threshold are kept while the others are discarded. If “median” (resp. “mean”), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. If None and if the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g., Lasso), the threshold used is 1e-5. Otherwise, “mean” is used by default.
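How a string threshold turns into a number can be sketched as follows. The parsing helper `resolve_threshold` is hypothetical, and using the absolute value of each coefficient as its importance is an assumption for the single-row coef_ case; the coefficients are the ones printed in the Examples for this node.

```python
import statistics

def resolve_threshold(importances, threshold="mean"):
    """Turn "mean", "median", "1.25*mean" or a number into a float."""
    reference = {"mean": statistics.mean, "median": statistics.median}
    if isinstance(threshold, str):
        # Split an optional scaling factor from the reference name.
        scale, _, name = threshold.rpartition("*")
        factor = float(scale) if scale else 1.0
        return factor * reference[name.strip()](importances)
    return float(threshold)

coef = [-0.3252302, 0.83462377, 0.49750423]
importances = [abs(c) for c in coef]

threshold = resolve_threshold(importances, "mean")
support = [imp >= threshold for imp in importances]
scaled = resolve_threshold(importances, "1.25*mean")
```

The mean of the absolute coefficients is about 0.55245, so only the second feature is kept.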

prefitbool, default False

Whether a prefit model is expected to be passed into the constructor directly or not. If True, transform must be called directly and SelectFromModel cannot be used with cross_val_score, GridSearchCV and similar utilities that clone the estimator. Otherwise train the model using fit and then transform to do feature selection.

norm_ordernon-zero int, inf, -inf, default 1

Order of the norm used to filter the vectors of coefficients below threshold in the case where the coef_ attribute of the estimator is of dimension 2.

max_featuresint or None, optional

The maximum number of features to select. To only select based on max_features, set threshold=-np.inf.

New in version 0.20.

Attributes

estimator_an estimator

The base estimator from which the transformer is built. This is stored only when a non-fitted estimator is passed to the SelectFromModel, i.e. when prefit is False.

threshold_float

The threshold value used for feature selection.

Notes

Allows NaN/Inf in the input if the underlying estimator does as well.

Examples

>>> from sklearn.feature_selection import SelectFromModel
>>> from sklearn.linear_model import LogisticRegression
>>> X = [[ 0.87, -1.34,  0.31 ],
...      [-2.79, -0.02, -0.85 ],
...      [-1.34, -0.48, -2.55 ],
...      [ 1.92,  1.48,  0.65 ]]
>>> y = [0, 1, 0, 1]
>>> selector = SelectFromModel(estimator=LogisticRegression()).fit(X, y)
>>> selector.estimator_.coef_
array([[-0.3252302 ,  0.83462377,  0.49750423]])
>>> selector.threshold_
0.55245...
>>> selector.get_support()
array([False,  True, False])
>>> selector.transform(X)
array([[-1.34],
       [-0.02],
       [-0.48],
       [ 1.48]])

Full API documentation: SelectFromModelScikitsLearnNode

class mdp.nodes.OneVsRestClassifierScikitsLearnNode

One-vs-the-rest (OvR) multiclass/multilabel strategy. This node has been automatically generated by wrapping the sklearn.multiclass.OneVsRestClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Also known as one-vs-all, this strategy consists in fitting one classifier per class. For each classifier, the class is fitted against all the other classes. In addition to its computational efficiency (only n_classes classifiers are needed), one advantage of this approach is its interpretability. Since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. This is the most commonly used strategy for multiclass classification and is a fair default choice.

This strategy can also be used for multilabel learning, where a classifier predicts multiple labels per sample, by fitting on a 2-d matrix in which cell [i, j] is 1 if sample i has label j and 0 otherwise.

In the multilabel learning literature, OvR is also known as the binary relevance method.
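The 2-d label matrix used for multilabel fitting can be built as in the following sketch. The helper to_indicator is hypothetical and only illustrates the layout, with one row per sample and one column per label.

```python
def to_indicator(label_sets):
    """Return (classes, Y) where Y[i][j] is 1 if sample i has label classes[j]."""
    classes = sorted({label for labels in label_sets for label in labels})
    column = {label: j for j, label in enumerate(classes)}
    Y = [[0] * len(classes) for _ in label_sets]
    for i, labels in enumerate(label_sets):
        for label in labels:
            Y[i][column[label]] = 1
    return classes, Y


# Three samples: the first has two labels, the last has none.
classes, Y = to_indicator([{"a", "b"}, {"c"}, set()])
print(classes)
print(Y)
```

Each column of Y is then the binary target of one of the per-label classifiers.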

Read more in the User Guide.

Parameters

estimatorestimator object

An estimator object implementing fit and one of decision_function or predict_proba.

n_jobsint or None, optional (default=None)

The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Changed in version 0.20: n_jobs default changed from 1 to None.

Attributes

estimators_list of n_classes estimators

Estimators used for predictions.

classes_array, shape = [n_classes]

Class labels.

n_classes_int

Number of classes.

label_binarizer_LabelBinarizer object

Object used to transform multiclass labels to binary labels and vice-versa.

multilabel_boolean

Whether a OneVsRestClassifier is a multilabel classifier.

Examples

>>> import numpy as np
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.svm import SVC
>>> X = np.array([
...     [10, 10],
...     [8, 10],
...     [-5, 5.5],
...     [-5.4, 5.5],
...     [-20, -20],
...     [-15, -20]
... ])
>>> y = np.array([0, 0, 1, 1, 2, 2])
>>> clf = OneVsRestClassifier(SVC()).fit(X, y)
>>> clf.predict([[-19, -20], [9, 9], [-5, 5]])
array([2, 0, 1])

Full API documentation: OneVsRestClassifierScikitsLearnNode

class mdp.nodes.OneVsOneClassifierScikitsLearnNode

One-vs-one multiclass strategy. This node has been automatically generated by wrapping the sklearn.multiclass.OneVsOneClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This strategy consists in fitting one classifier per class pair. At prediction time, the class which received the most votes is selected. Since it requires fitting n_classes * (n_classes - 1) / 2 classifiers, this method is usually slower than one-vs-the-rest, due to its O(n_classes^2) complexity. However, this method may be advantageous for algorithms such as kernel algorithms which don’t scale well with n_samples. This is because each individual learning problem only involves a small subset of the data whereas, with one-vs-the-rest, the complete dataset is used n_classes times.
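The pairwise training and the voting step can be sketched as follows. This is an illustration in plain Python, not the wrapped implementation: each pairwise "classifier" is a toy nearest-centroid rule, and fit_ovo and predict_ovo are hypothetical names.

```python
import math
from collections import Counter
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def fit_ovo(X, y):
    """Fit one toy model per unordered class pair, using only that pair's data."""
    classes = sorted(set(y))
    models = {}
    for a, b in combinations(classes, 2):
        models[(a, b)] = (centroid([x for x, t in zip(X, y) if t == a]),
                          centroid([x for x, t in zip(X, y) if t == b]))
    return classes, models


def predict_ovo(classes, models, x):
    """Each pairwise model casts one vote; the class with most votes wins."""
    votes = Counter()
    for (a, b), (center_a, center_b) in models.items():
        votes[a if math.dist(x, center_a) <= math.dist(x, center_b) else b] += 1
    return max(classes, key=lambda c: votes[c])


X = [[10, 10], [8, 10], [-5, 5.5], [-5.4, 5.5], [-20, -20], [-15, -20]]
y = [0, 0, 1, 1, 2, 2]
classes, models = fit_ovo(X, y)
predictions = [predict_ovo(classes, models, x)
               for x in [[-19, -20], [9, 9], [-5, 5]]]
print(len(models), predictions)
```

With three classes, 3 * 2 / 2 = 3 pairwise models are fitted, and each sees only the samples of its two classes.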

Read more in the User Guide.

Parameters

estimatorestimator object

An estimator object implementing fit and one of decision_function or predict_proba.

n_jobsint or None, optional (default=None)

The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

estimators_list of n_classes * (n_classes - 1) / 2 estimators

Estimators used for predictions.

classes_numpy array of shape [n_classes]

Array containing labels.

n_classes_int

Number of classes.

pairwise_indices_list, length = len(estimators_), or None

Indices of samples used when training the estimators. None when the estimator does not have a _pairwise attribute.

Full API documentation: OneVsOneClassifierScikitsLearnNode

class mdp.nodes.OutputCodeClassifierScikitsLearnNode

(Error-Correcting) Output-Code multiclass strategy. This node has been automatically generated by wrapping the sklearn.multiclass.OutputCodeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Output-code based strategies consist in representing each class with a binary code (an array of 0s and 1s). At fitting time, one binary classifier per bit in the code book is fitted. At prediction time, the classifiers are used to project new points in the class space and the class closest to the points is chosen. The main advantage of these strategies is that the number of classifiers used can be controlled by the user, either for compressing the model (0 < code_size < 1) or for making the model more robust to errors (code_size > 1). See the documentation for more details.
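The decoding step can be illustrated with a small hand-written code book. The sketch below is not the wrapped implementation: the code book is fixed by hand instead of being drawn at random, the binary classifiers are replaced by a ready-made bit vector, and Hamming distance on hard 0/1 outputs is used as a simplifying assumption. The helper names are hypothetical.

```python
def n_estimators(n_classes, code_size):
    """Number of binary classifiers, as given in the Attributes section."""
    return int(n_classes * code_size)


def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, code_book):
    """Return the class whose code word is closest to the predicted bits."""
    return min(code_book, key=lambda label: hamming(bits, code_book[label]))


# 4 classes, code_size = 1.5, hence 6 bits per class.
# Any two code words differ in 4 positions, so one wrong bit is tolerated.
code_book = {
    0: [0, 0, 0, 1, 1, 1],
    1: [0, 1, 1, 0, 0, 1],
    2: [1, 0, 1, 0, 1, 0],
    3: [1, 1, 0, 1, 0, 0],
}

# Output of the six binary classifiers for a class-1 sample, last bit wrong.
noisy_bits = [0, 1, 1, 0, 0, 0]
decoded = decode(noisy_bits, code_book)
print(n_estimators(4, 1.5), n_estimators(4, 0.5), decoded)
```

A code_size above 1 buys redundancy of this kind, while a code_size below 1 uses fewer classifiers than there are classes.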

Read more in the User Guide.

Parameters

estimatorestimator object

An estimator object implementing fit and one of decision_function or predict_proba.

code_sizefloat

Percentage of the number of classes to be used to create the code book. A number between 0 and 1 will require fewer classifiers than one-vs-the-rest. A number greater than 1 will require more classifiers than one-vs-the-rest.

random_stateint, RandomState instance or None, optional, default: None

The generator used to initialize the codebook. Pass an int for reproducible output across multiple function calls. See Glossary.

n_jobsint or None, optional (default=None)

The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

estimators_list of int(n_classes * code_size) estimators

Estimators used for predictions.

classes_numpy array of shape [n_classes]

Array containing labels.

code_book_numpy array of shape [n_classes, code_size]

Binary array containing the code of each class.

Examples

>>> from sklearn.multiclass import OutputCodeClassifier
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = OutputCodeClassifier(
...     estimator=RandomForestClassifier(random_state=0),
...     random_state=0).fit(X, y)
>>> clf.predict([[0, 0, 0, 0]])
array([1])

References

[1] “Solving multiclass learning problems via error-correcting output codes”, Dietterich T., Bakiri G., Journal of Artificial Intelligence Research 2, 1995.

[2] “The error coding method and PICTs”, James G., Hastie T., Journal of Computational and Graphical Statistics 7, 1998.

[3] “The Elements of Statistical Learning”, Hastie T., Tibshirani R., Friedman J., page 606 (second-edition) 2008.

Full API documentation: OutputCodeClassifierScikitsLearnNode

class mdp.nodes.GaussianProcessRegressorScikitsLearnNode

Gaussian process regression (GPR). This node has been automatically generated by wrapping the sklearn.gaussian_process._gpr.GaussianProcessRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The implementation is based on Algorithm 2.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams.

In addition to the standard scikit-learn estimator API, GaussianProcessRegressor:

  • allows prediction without prior fitting (based on the GP prior)

  • provides an additional method sample_y(X), which evaluates samples drawn from the GPR (prior or posterior) at given inputs

  • exposes a method log_marginal_likelihood(theta), which can be used externally for other ways of selecting hyperparameters, e.g., via Markov chain Monte Carlo.

Read more in the User Guide.

New in version 0.18.

Parameters

kernelkernel instance, default=None

The kernel specifying the covariance function of the GP. If None is passed, the kernel “1.0 * RBF(1.0)” is used as default. Note that the kernel’s hyperparameters are optimized during fitting.

alphafloat or array-like of shape (n_samples,), default=1e-10

Value added to the diagonal of the kernel matrix during fitting. Larger values correspond to increased noise level in the observations. This can also prevent a potential numerical issue during fitting, by ensuring that the calculated values form a positive definite matrix. If an array is passed, it must have the same number of entries as the data used for fitting and is used as datapoint-dependent noise level. Note that this is equivalent to adding a WhiteKernel with noise_level=alpha. Specifying the noise level directly as a parameter is mainly for convenience and for consistency with Ridge.

optimizer“fmin_l_bfgs_b” or callable, default=”fmin_l_bfgs_b”

Can either be one of the internally supported optimizers for optimizing the kernel’s parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature:

def optimizer(obj_func, initial_theta, bounds):
    # * 'obj_func' is the objective function to be minimized, which
    #   takes the hyperparameters theta as parameter and an
    #   optional flag eval_gradient, which determines if the
    #   gradient is returned additionally to the function value
    # * 'initial_theta': the initial value for theta, which can be
    #   used by local optimizers
    # * 'bounds': the bounds on the values of theta
    ....
    # Returned are the best found hyperparameters theta and
    # the corresponding value of the target function.
    return theta_opt, func_min
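As an illustration of this signature, the following sketch implements a callable that performs a plain random search within the bounds. It is a hedged example, not one of the internally supported optimizers: random_search_optimizer, its extra keyword arguments n_samples and seed, and toy_objective are hypothetical, and plain lists are used here where the library would pass arrays.

```python
import random


def random_search_optimizer(obj_func, initial_theta, bounds,
                            n_samples=200, seed=0):
    """Gradient-free optimizer with the (obj_func, initial_theta, bounds) signature."""
    rng = random.Random(seed)
    theta_opt = list(initial_theta)
    # Ask only for the function value, not the gradient.
    func_min = obj_func(theta_opt, eval_gradient=False)
    for _ in range(n_samples):
        candidate = [rng.uniform(low, high) for low, high in bounds]
        value = obj_func(candidate, eval_gradient=False)
        if value < func_min:
            theta_opt, func_min = candidate, value
    return theta_opt, func_min


def toy_objective(theta, eval_gradient=True):
    """Stand-in objective with the same calling convention (minimum at theta = 1)."""
    value = sum((t - 1.0) ** 2 for t in theta)
    if eval_gradient:
        return value, [2.0 * (t - 1.0) for t in theta]
    return value


initial_theta = [3.0, -2.0]
bounds = [(-5.0, 5.0), (-5.0, 5.0)]
theta_opt, func_min = random_search_optimizer(toy_objective, initial_theta, bounds)
print(theta_opt, func_min)
```

Because the search starts from initial_theta and only accepts improvements, the returned value is never worse than the objective at the starting point.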

By default, the ‘L-BFGS-B’ algorithm from scipy.optimize.minimize is used. If None is passed, the kernel’s parameters are kept fixed. Available internal optimizers are:

'fmin_l_bfgs_b'

n_restarts_optimizerint, default=0

The number of restarts of the optimizer for finding the kernel’s parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel’s initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer == 0 implies that one run is performed.

normalize_yboolean, optional (default: False)

Whether the target values y are normalized, i.e. whether their mean and variance are set to 0 and 1 respectively. This is recommended for cases where zero-mean, unit-variance priors are used. Note that, in this implementation, the normalisation is reversed before the GP predictions are reported.

Changed in version 0.23.

copy_X_trainbool, default=True

If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.

random_stateint or RandomState, default=None

Determines random number generation used to initialize the centers. Pass an int for reproducible results across multiple function calls. See Glossary.

Attributes

X_train_array-like of shape (n_samples, n_features) or list of object

Feature vectors or other representations of training data (also required for prediction).

y_train_array-like of shape (n_samples,) or (n_samples, n_targets)

Target values in training data (also required for prediction).

kernel_kernel instance

The kernel used for prediction. The structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters.

L_array-like of shape (n_samples, n_samples)

Lower-triangular Cholesky decomposition of the kernel in X_train_.

alpha_array-like of shape (n_samples,)

Dual coefficients of training data points in kernel space.

log_marginal_likelihood_value_float

The log-marginal-likelihood of self.kernel_.theta.
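How the Cholesky factor and the dual coefficients relate to each other can be sketched in plain Python, following the Cholesky-based scheme of GPML Algorithm 2.1 for the predictive mean only. This is a simplified illustration under stated assumptions, not the wrapped implementation: the kernel is a fixed RBF with unit length scale, no hyperparameters are optimized, and all function names are hypothetical.

```python
import math


def rbf(a, b, length_scale=1.0):
    """RBF kernel exp(-|a - b|^2 / (2 * length_scale^2))."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq / length_scale ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x


def solve_lower_transposed(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_fit(X, y, alpha=1e-10):
    """Return (L, dual) with L L^T = K + alpha * I and dual = (K + alpha * I)^-1 y."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (alpha if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    return L, solve_lower_transposed(L, solve_lower(L, y))


def gpr_predict_mean(X_train, dual, x_new):
    """Predictive mean: kernel values against training points, weighted by dual."""
    return sum(rbf(x_new, xi) * a for xi, a in zip(X_train, dual))


X_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 1.0, 0.5]
L, dual = gpr_fit(X_train, y_train)
mean_at_training_point = gpr_predict_mean(X_train, dual, [1.0])
print(mean_at_training_point)
```

With a very small alpha the predictive mean passes almost exactly through the training targets; a larger alpha lets it deviate, in line with the noise interpretation of that parameter.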

Examples

>>> from sklearn.datasets import make_friedman2
>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
>>> X, y = make_friedman2(n_samples=500, noise=0, random_state=0)
>>> kernel = DotProduct() + WhiteKernel()
>>> gpr = GaussianProcessRegressor(kernel=kernel,
...         random_state=0).fit(X, y)
>>> gpr.score(X, y)
0.3680...
>>> gpr.predict(X[:2,:], return_std=True)
(array([653.0..., 592.1...]), array([316.6..., 316.6...]))

Full API documentation: GaussianProcessRegressorScikitsLearnNode

class mdp.nodes.GaussianProcessClassifierScikitsLearnNode

Gaussian process classification (GPC) based on Laplace approximation. This node has been automatically generated by wrapping the sklearn.gaussian_process._gpc.GaussianProcessClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The implementation is based on Algorithms 3.1, 3.2, and 5.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams.

Internally, the Laplace approximation is used for approximating the non-Gaussian posterior by a Gaussian.

Currently, the implementation is restricted to using the logistic link function. For multi-class classification, several binary one-versus-rest classifiers are fitted. Note that this class thus does not implement a true multi-class Laplace approximation.

Read more in the User Guide.

Parameters

kernelkernel instance, default=None

The kernel specifying the covariance function of the GP. If None is passed, the kernel “1.0 * RBF(1.0)” is used as default. Note that the kernel’s hyperparameters are optimized during fitting.

optimizer‘fmin_l_bfgs_b’ or callable, default=’fmin_l_bfgs_b’

Can either be one of the internally supported optimizers for optimizing the kernel’s parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature:

def optimizer(obj_func, initial_theta, bounds):
    # * 'obj_func' is the objective function to be minimized, which
    #   takes the hyperparameters theta as parameter and an
    #   optional flag eval_gradient, which determines if the
    #   gradient is returned additionally to the function value
    # * 'initial_theta': the initial value for theta, which can be
    #   used by local optimizers
    # * 'bounds': the bounds on the values of theta
    ....
    # Returned are the best found hyperparameters theta and
    # the corresponding value of the target function.
    return theta_opt, func_min

By default, the ‘L-BFGS-B’ algorithm from scipy.optimize.minimize is used. If None is passed, the kernel’s parameters are kept fixed. Available internal optimizers are:

'fmin_l_bfgs_b'

n_restarts_optimizerint, default=0

The number of restarts of the optimizer for finding the kernel’s parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel’s initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer=0 implies that one run is performed.

max_iter_predictint, default=100

The maximum number of iterations in Newton’s method for approximating the posterior during predict. Smaller values will reduce computation time at the cost of worse results.

warm_startbool, default=False

If warm-starts are enabled, the solution of the last Newton iteration on the Laplace approximation of the posterior mode is used as initialization for the next call of _posterior_mode(). This can speed up convergence when _posterior_mode is called several times on similar problems as in hyperparameter optimization. See the Glossary.

copy_X_trainbool, default=True

If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.

random_stateint or RandomState, default=None

Determines random number generation used to initialize the centers. Pass an int for reproducible results across multiple function calls. See Glossary.

multi_class{‘one_vs_rest’, ‘one_vs_one’}, default=’one_vs_rest’

Specifies how multi-class classification problems are handled. Supported are ‘one_vs_rest’ and ‘one_vs_one’. In ‘one_vs_rest’, one binary Gaussian process classifier is fitted for each class, which is trained to separate this class from the rest. In ‘one_vs_one’, one binary Gaussian process classifier is fitted for each pair of classes, which is trained to separate these two classes. The predictions of these binary predictors are combined into multi-class predictions. Note that ‘one_vs_one’ does not support predicting probability estimates.

n_jobsint, default=None

The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

kernel_kernel instance

The kernel used for prediction. In case of binary classification, the structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters. In case of multi-class classification, a CompoundKernel is returned which consists of the different kernels used in the one-versus-rest classifiers.

log_marginal_likelihood_value_float

The log-marginal-likelihood of self.kernel_.theta

classes_array-like of shape (n_classes,)

Unique class labels.

n_classes_int

The number of classes in the training data

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> from sklearn.gaussian_process.kernels import RBF
>>> X, y = load_iris(return_X_y=True)
>>> kernel = 1.0 * RBF(1.0)
>>> gpc = GaussianProcessClassifier(kernel=kernel,
...         random_state=0).fit(X, y)
>>> gpc.score(X, y)
0.9866...
>>> gpc.predict_proba(X[:2,:])
array([[0.83548752, 0.03228706, 0.13222543],
       [0.79064206, 0.06525643, 0.14410151]])

New in version 0.18.

Full API documentation: GaussianProcessClassifierScikitsLearnNode

class mdp.nodes.RBFSamplerScikitsLearnNode

Approximates feature map of an RBF kernel by Monte Carlo approximation of its Fourier transform. This node has been automatically generated by wrapping the sklearn.kernel_approximation.RBFSampler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. It implements a variant of Random Kitchen Sinks.[1]

Read more in the User Guide.

Parameters

gammafloat

Parameter of RBF kernel: exp(-gamma * x^2)

n_componentsint

Number of Monte Carlo samples per original feature. Equals the dimensionality of the computed feature space.

random_stateint, RandomState instance or None, optional (default=None)

Pseudo-random number generator to control the generation of the random weights and random offset when fitting the training data. Pass an int for reproducible output across multiple function calls. See Glossary.

Attributes

random_offset_ndarray of shape (n_components,), dtype=float64

Random offset used to compute the projection in the n_components dimensions of the feature space.

random_weights_ndarray of shape (n_features, n_components), dtype=float64

Random projection directions drawn from the Fourier transform of the RBF kernel.
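The roles of the random weights and the random offset can be illustrated with a plain-Python sketch of random Fourier features. It is not the wrapped implementation; fit_rff and transform_rff are hypothetical names, and the sketch assumes the RBF kernel exp(-gamma * |x - y|^2), weights drawn from a normal distribution with variance 2 * gamma, and offsets drawn uniformly from [0, 2 * pi).

```python
import math
import random


def fit_rff(n_features, n_components, gamma, seed=0):
    """Draw random projection directions and phase offsets."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)
    weights = [[scale * rng.gauss(0.0, 1.0) for _ in range(n_components)]
               for _ in range(n_features)]
    offset = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_components)]
    return weights, offset


def transform_rff(x, weights, offset):
    """Map x to sqrt(2 / D) * cos(w_k . x + b_k) for k = 1..D."""
    n_components = len(offset)
    norm = math.sqrt(2.0 / n_components)
    return [norm * math.cos(sum(x[i] * weights[i][k] for i in range(len(x)))
                            + offset[k])
            for k in range(n_components)]


gamma = 1.0
weights, offset = fit_rff(n_features=2, n_components=4000, gamma=gamma, seed=1)
zx = transform_rff([0.0, 0.0], weights, offset)
zy = transform_rff([1.0, 0.0], weights, offset)

# The inner product of the feature vectors approximates the kernel value.
approx = sum(a * b for a, b in zip(zx, zy))
exact = math.exp(-gamma * 1.0)
print(approx, exact)
```

The approximation error shrinks roughly with the inverse square root of n_components, which is why that parameter trades accuracy against the size of the feature space.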

Examples

>>> from sklearn.kernel_approximation import RBFSampler
>>> from sklearn.linear_model import SGDClassifier
>>> X = [[0, 0], [1, 1], [1, 0], [0, 1]]
>>> y = [0, 0, 1, 1]
>>> rbf_feature = RBFSampler(gamma=1, random_state=1)
>>> X_features = rbf_feature.fit_transform(X)
>>> clf = SGDClassifier(max_iter=5, tol=1e-3)
>>> clf.fit(X_features, y)
SGDClassifier(max_iter=5)
>>> clf.score(X_features, y)
1.0

Notes

See “Random Features for Large-Scale Kernel Machines” by A. Rahimi and Benjamin Recht.

[1] “Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning” by A. Rahimi and Benjamin Recht. (https://people.eecs.berkeley.edu/~brecht/papers/08.rah.rec.nips.pdf)

Full API documentation: RBFSamplerScikitsLearnNode

class mdp.nodes.SkewedChi2SamplerScikitsLearnNode

Approximates feature map of the “skewed chi-squared” kernel by Monte Carlo approximation of its Fourier transform. This node has been automatically generated by wrapping the sklearn.kernel_approximation.SkewedChi2Sampler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

skewednessfloat

“skewedness” parameter of the kernel. Needs to be cross-validated.

n_componentsint

number of Monte Carlo samples per original feature. Equals the dimensionality of the computed feature space.

random_stateint, RandomState instance or None, optional (default=None)

Pseudo-random number generator to control the generation of the random weights and random offset when fitting the training data. Pass an int for reproducible output across multiple function calls. See Glossary.

Examples

>>> from sklearn.kernel_approximation import SkewedChi2Sampler
>>> from sklearn.linear_model import SGDClassifier
>>> X = [[0, 0], [1, 1], [1, 0], [0, 1]]
>>> y = [0, 0, 1, 1]
>>> chi2_feature = SkewedChi2Sampler(skewedness=.01,
...                                  n_components=10,
...                                  random_state=0)
>>> X_features = chi2_feature.fit_transform(X, y)
>>> clf = SGDClassifier(max_iter=10, tol=1e-3)
>>> clf.fit(X_features, y)
SGDClassifier(max_iter=10)
>>> clf.score(X_features, y)
1.0

References

See “Random Fourier Approximations for Skewed Multiplicative Histogram Kernels” by Fuxin Li, Catalin Ionescu and Cristian Sminchisescu.

See also

AdditiveChi2Sampler : A different approach for approximating an additive variant of the chi squared kernel.

sklearn.metrics.pairwise.chi2_kernel : The exact chi squared kernel.

Full API documentation: SkewedChi2SamplerScikitsLearnNode

class mdp.nodes.AdditiveChi2SamplerScikitsLearnNode

Approximate feature map for additive chi2 kernel. This node has been automatically generated by wrapping the sklearn.kernel_approximation.AdditiveChi2Sampler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The approximation samples the Fourier transform of the kernel characteristic at regular intervals.

Since the kernel that is to be approximated is additive, the components of the input vectors can be treated separately. Each entry in the original space is transformed into 2*sample_steps-1 features, where sample_steps is a parameter of the method. Typical values of sample_steps include 1, 2 and 3.

Optimal choices for the sampling interval for certain data ranges can be computed (see the reference). The default values should be reasonable.
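The per-entry construction can be sketched as follows, after the explicit feature map described in the Vedaldi and Zisserman reference. The sketch is an illustration under assumptions, not the wrapped implementation: chi2_features is a hypothetical helper, the sampling interval 0.5 is an assumed value, and the exact kernel is taken to be 2xy / (x + y). Each entry yields one square-root term plus a cosine/sine pair for every further sampling point.

```python
import math


def chi2_features(x, sample_steps=2, sample_interval=0.5):
    """Approximate additive chi2 feature map of one non-negative entry x."""
    n_out = 2 * sample_steps - 1
    if x == 0.0:
        return [0.0] * n_out
    features = [math.sqrt(x * sample_interval)]
    log_x = math.log(x)
    for j in range(1, sample_steps):
        # Spectrum of the chi2 kernel sampled at frequency j * sample_interval.
        factor = math.sqrt(2.0 * x * sample_interval
                           / math.cosh(math.pi * j * sample_interval))
        features.append(factor * math.cos(j * sample_interval * log_x))
        features.append(factor * math.sin(j * sample_interval * log_x))
    return features


def additive_chi2(x, y):
    """Exact additive chi2 kernel for two positive scalars (assumed form)."""
    return 2.0 * x * y / (x + y)


zx = chi2_features(4.0)
zy = chi2_features(1.0)
approx = sum(a * b for a, b in zip(zx, zy))
exact = additive_chi2(4.0, 1.0)
print(len(zx), approx, exact)
```

For a full input vector the same map is applied to every entry and the results are concatenated, so the output dimension grows linearly with sample_steps.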

Read more in the User Guide.

Parameters

sample_stepsint, optional

Gives the number of (complex) sampling points.

sample_intervalfloat, optional

Sampling interval. Must be specified when sample_steps is not in {1,2,3}.

Attributes

sample_interval_float

Stored sampling interval. Specified as a parameter if sample_steps is not in {1,2,3}.

Examples

>>> from sklearn.datasets import load_digits
>>> from sklearn.linear_model import SGDClassifier
>>> from sklearn.kernel_approximation import AdditiveChi2Sampler
>>> X, y = load_digits(return_X_y=True)
>>> chi2sampler = AdditiveChi2Sampler(sample_steps=2)
>>> X_transformed = chi2sampler.fit_transform(X, y)
>>> clf = SGDClassifier(max_iter=5, random_state=0, tol=1e-3)
>>> clf.fit(X_transformed, y)
SGDClassifier(max_iter=5, random_state=0)
>>> clf.score(X_transformed, y)
0.9499...

Notes

This estimator approximates a slightly different version of the additive chi squared kernel than sklearn.metrics.pairwise.additive_chi2_kernel computes.

See also

SkewedChi2Sampler : A Fourier-approximation to a non-additive variant of the chi squared kernel.

sklearn.metrics.pairwise.chi2_kernel : The exact chi squared kernel.

sklearn.metrics.pairwise.additive_chi2_kernel : The exact additive chi squared kernel.

References

See “Efficient additive kernels via explicit feature maps” by A. Vedaldi and A. Zisserman, Pattern Analysis and Machine Intelligence, 2011.

Full API documentation: AdditiveChi2SamplerScikitsLearnNode

class mdp.nodes.NystroemScikitsLearnNode

Approximate a kernel map using a subset of the training data. This node has been automatically generated by wrapping the sklearn.kernel_approximation.Nystroem class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Constructs an approximate feature map for an arbitrary kernel using a subset of the data as basis.

Read more in the User Guide.

New in version 0.13.

Parameters

kernelstring or callable, default=”rbf”

Kernel map to be approximated. A callable should accept two arguments and the keyword arguments passed to this object as kernel_params, and should return a floating point number.

gammafloat, default=None

Gamma parameter for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.

coef0float, default=None

Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels.

degreefloat, default=None

Degree of the polynomial kernel. Ignored by other kernels.

kernel_paramsmapping of string to any, optional

Additional parameters (keyword arguments) for kernel function passed as callable object.

n_componentsint

Number of features to construct. How many data points will be used to construct the mapping.

random_stateint, RandomState instance or None, optional (default=None)

Pseudo-random number generator to control the uniform sampling without replacement of n_components of the training data to construct the basis kernel. Pass an int for reproducible output across multiple function calls. See Glossary.

Attributes

components_array, shape (n_components, n_features)

Subset of training points used to construct the feature map.

component_indices_array, shape (n_components)

Indices of components_ in the training set.

normalization_array, shape (n_components, n_components)

Normalization matrix needed for embedding. Square root of the kernel matrix on components_.

Examples

>>> from sklearn import datasets, svm
>>> from sklearn.kernel_approximation import Nystroem
>>> X, y = datasets.load_digits(n_class=9, return_X_y=True)
>>> data = X / 16.
>>> clf = svm.LinearSVC()
>>> feature_map_nystroem = Nystroem(gamma=.2,
...                                 random_state=1,
...                                 n_components=300)
>>> data_transformed = feature_map_nystroem.fit_transform(data)
>>> clf.fit(data_transformed, y)
LinearSVC()
>>> clf.score(data_transformed, y)
0.9987...

References

  • Williams, C.K.I. and Seeger, M. “Using the Nystroem method to speed up kernel machines”, Advances in neural information processing systems 2001

  • T. Yang, Y. Li, M. Mahdavi, R. Jin and Z. Zhou “Nystroem Method vs Random Fourier Features: A Theoretical and Empirical Comparison”, Advances in Neural Information Processing Systems 2012

See also

RBFSampler : An approximation to the RBF kernel using random Fourier features.

sklearn.metrics.pairwise.kernel_metrics : List of built-in kernels.

Full API documentation: NystroemScikitsLearnNode

class mdp.nodes.KernelRidgeScikitsLearnNode

Kernel ridge regression. This node has been automatically generated by wrapping the sklearn.kernel_ridge.KernelRidge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Kernel ridge regression (KRR) combines ridge regression (linear least squares with l2-norm regularization) with the kernel trick. It thus learns a linear function in the space induced by the respective kernel and the data. For non-linear kernels, this corresponds to a non-linear function in the original space.

The form of the model learned by KRR is identical to support vector regression (SVR). However, different loss functions are used: KRR uses squared error loss while support vector regression uses epsilon-insensitive loss, both combined with l2 regularization. In contrast to SVR, fitting a KRR model can be done in closed-form and is typically faster for medium-sized datasets. On the other hand, the learned model is non-sparse and thus slower at prediction time than SVR, which learns a sparse model for epsilon > 0.

This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape [n_samples, n_targets]).

Read more in the User Guide.

Parameters

alphafloat or array-like of shape (n_targets,)

Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to 1 / (2C) in other linear models such as LogisticRegression or sklearn.svm.LinearSVC. If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number. See ridge_regression for formula.

kernelstring or callable, default=”linear”

Kernel mapping used internally. This parameter is directly passed to sklearn.metrics.pairwise.pairwise_kernels. If kernel is a string, it must be one of the metrics in pairwise.PAIRWISE_KERNEL_FUNCTIONS. If kernel is “precomputed”, X is assumed to be a kernel matrix. Alternatively, if kernel is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from X as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.

gammafloat, default=None

Gamma parameter for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.

degreefloat, default=3

Degree of the polynomial kernel. Ignored by other kernels.

coef0float, default=1

Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels.

kernel_paramsmapping of string to any, optional

Additional parameters (keyword arguments) for kernel function passed as callable object.

Attributes

dual_coef_ndarray of shape (n_samples,) or (n_samples, n_targets)

Representation of weight vector(s) in kernel space

X_fit_{ndarray, sparse matrix} of shape (n_samples, n_features)

Training data, which is also required for prediction. If kernel == “precomputed” this is instead the precomputed training matrix, of shape (n_samples, n_samples).
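The closed-form fit and the role of the dual coefficients can be sketched in plain Python. This is an illustration, not the wrapped implementation: the function names are hypothetical, a linear kernel is used, and no intercept is fitted. The dual coefficients solve (K + alpha * I) a = y, and a prediction is the kernel-weighted sum over the training points.

```python
def linear_kernel(a, b):
    """Inner product of two rows."""
    return sum(ai * bi for ai, bi in zip(a, b))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def fit_krr(X, y, alpha=1.0, kernel=linear_kernel):
    """Dual coefficients: solution of (K + alpha * I) a = y."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (alpha if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, y)


def predict_krr(X_fit, dual_coef, x_new, kernel=linear_kernel):
    """Prediction: sum over training points of dual_coef_i * k(x_i, x_new)."""
    return sum(a * kernel(xi, x_new) for xi, a in zip(X_fit, dual_coef))


X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.0, 2.0]
dual_coef = fit_krr(X, y, alpha=1.0)
prediction = predict_krr(X, dual_coef, [2.0])

# With a linear kernel the result matches ordinary (primal) ridge regression.
primal_weight = (sum(x[0] * t for x, t in zip(X, y))
                 / (sum(x[0] ** 2 for x in X) + 1.0))
print(prediction, 2.0 * primal_weight)
```

Every training point contributes to the prediction, which is the non-sparsity mentioned in the description of this node.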

References

  • Kevin P. Murphy “Machine Learning: A Probabilistic Perspective”, The MIT Press chapter 14.4.3, pp. 492-493

See also

sklearn.linear_model.Ridge : Linear ridge regression.

sklearn.svm.SVR : Support Vector Regression implemented using libsvm.

Examples

>>> from sklearn.kernel_ridge import KernelRidge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> rng = np.random.RandomState(0)
>>> y = rng.randn(n_samples)
>>> X = rng.randn(n_samples, n_features)
>>> clf = KernelRidge(alpha=1.0)
>>> clf.fit(X, y)
KernelRidge(alpha=1.0)

Full API documentation: KernelRidgeScikitsLearnNode

class mdp.nodes.GaussianMixtureScikitsLearnNode

Gaussian Mixture. This node has been automatically generated by wrapping the sklearn.mixture._gaussian_mixture.GaussianMixture class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Representation of a Gaussian mixture model probability distribution. This class allows estimating the parameters of a Gaussian mixture distribution.
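Parameter estimation for a Gaussian mixture is commonly done with the EM algorithm. The following plain-Python sketch shows the idea for one-dimensional data only. It is a simplified illustration, not the wrapped implementation: fit_gmm_1d is a hypothetical helper, the initial means are supplied by hand, and the arguments tol, reg_covar and max_iter are assumed to play the roles described for the parameters of the same name below.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, means, tol=1e-3, reg_covar=1e-6, max_iter=100):
    """EM for a 1-D Gaussian mixture, started from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    previous = -math.inf
    for n_iter in range(1, max_iter + 1):
        # E-step: responsibilities and average log-likelihood.
        resp, log_lik = [], 0.0
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            log_lik += math.log(total)
            resp.append([pi / total for pi in p])
        average = log_lik / len(data)
        # Stop when the average gain falls below tol.
        if abs(average - previous) < tol:
            break
        previous = average
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = (sum(r[j] * (x - means[j]) ** 2
                                for r, x in zip(resp, data)) / nj + reg_covar)
    return weights, means, variances, n_iter


data = [-5.2, -5.0, -4.8, 4.8, 5.0, 5.2]
weights, means, variances, n_iter = fit_gmm_1d(data, means=[-1.0, 1.0])
print(weights, means, variances, n_iter)
```

On these two well-separated clusters the loop stops after a few iterations, with the means close to -5 and 5 and roughly equal weights.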

Read more in the User Guide.

New in version 0.18.

Parameters

n_componentsint, defaults to 1.

The number of mixture components.

covariance_type{‘full’ (default), ‘tied’, ‘diag’, ‘spherical’}

String describing the type of covariance parameters to use. Must be one of:

‘full’

each component has its own general covariance matrix

‘tied’

all components share the same general covariance matrix

‘diag’

each component has its own diagonal covariance matrix

‘spherical’

each component has its own single variance

tolfloat, defaults to 1e-3.

The convergence threshold. EM iterations will stop when the lower bound average gain is below this threshold.

reg_covarfloat, defaults to 1e-6.

Non-negative regularization added to the diagonal of covariance. Ensures that the covariance matrices are all positive.

max_iterint, defaults to 100.

The number of EM iterations to perform.

n_initint, defaults to 1.

The number of initializations to perform. The best results are kept.

init_params{‘kmeans’, ‘random’}, defaults to ‘kmeans’.

The method used to initialize the weights, the means and the precisions. Must be one of:

'kmeans' : responsibilities are initialized using kmeans.
'random' : responsibilities are initialized randomly.
weights_initarray-like, shape (n_components, ), optional

The user-provided initial weights, defaults to None. If it is None, weights are initialized using the init_params method.

means_initarray-like, shape (n_components, n_features), optional

The user-provided initial means, defaults to None. If it is None, means are initialized using the init_params method.

precisions_initarray-like, optional.

The user-provided initial precisions (inverse of the covariance matrices), defaults to None. If it is None, precisions are initialized using the ‘init_params’ method. The shape depends on ‘covariance_type’:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
random_stateint, RandomState instance or None, optional (default=None)

Controls the random seed given to the method chosen to initialize the parameters (see init_params). In addition, it controls the generation of random samples from the fitted distribution (see the method sample). Pass an int for reproducible output across multiple function calls. See Glossary.

warm_startbool, default to False.

If ‘warm_start’ is True, the solution of the last fitting is used as initialization for the next call of fit(). This can speed up convergence when fit is called several times on similar problems. In that case, ‘n_init’ is ignored and only a single initialization occurs upon the first call. See the Glossary.

verboseint, default to 0.

Enable verbose output. If 1 then it prints the current initialization and each iteration step. If greater than 1 then it prints also the log probability and the time needed for each step.

verbose_intervalint, default to 10.

Number of iterations performed between two consecutive prints.

Attributes

weights_array-like, shape (n_components,)

The weight of each mixture component.

means_array-like, shape (n_components, n_features)

The mean of each mixture component.

covariances_array-like

The covariance of each mixture component. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
precisions_array-like

The precision matrices for each component in the mixture. A precision matrix is the inverse of a covariance matrix. A covariance matrix is symmetric positive definite so the mixture of Gaussian can be equivalently parameterized by the precision matrices. Storing the precision matrices instead of the covariance matrices makes it more efficient to compute the log-likelihood of new samples at test time. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
precisions_cholesky_array-like

The cholesky decomposition of the precision matrices of each mixture component. A precision matrix is the inverse of a covariance matrix. A covariance matrix is symmetric positive definite so the mixture of Gaussian can be equivalently parameterized by the precision matrices. Storing the precision matrices instead of the covariance matrices makes it more efficient to compute the log-likelihood of new samples at test time. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
converged_bool

True when convergence was reached in fit(), False otherwise.

n_iter_int

Number of steps used by the best fit of EM to reach convergence.

lower_bound_float

Lower bound value on the log-likelihood (of the training data with respect to the model) of the best fit of EM.

See Also

BayesianGaussianMixture : Gaussian mixture model fit with variational inference.
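
The dependence of covariances_ on covariance_type can be checked with a short sketch. It uses scikit-learn directly rather than the MDP wrapper, and the synthetic two-blob data set is an assumption made for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Two well-separated blobs with 3 features each (illustrative data).
X = np.vstack([rng.randn(100, 3) - 5.0, rng.randn(100, 3) + 5.0])

shapes = {}
for cov_type in ("spherical", "tied", "diag", "full"):
    gm = GaussianMixture(n_components=2, covariance_type=cov_type,
                         random_state=0).fit(X)
    shapes[cov_type] = gm.covariances_.shape
    # Print the covariance shape and whether EM converged.
    print(cov_type, gm.covariances_.shape, gm.converged_)
```

Here n_components is 2 and n_features is 3, so the four shapes listed in the Attributes section can be told apart.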

Full API documentation: GaussianMixtureScikitsLearnNode

class mdp.nodes.BayesianGaussianMixtureScikitsLearnNode

Variational Bayesian estimation of a Gaussian mixture. This node has been automatically generated by wrapping the sklearn.mixture._bayesian_mixture.BayesianGaussianMixture class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This class infers an approximate posterior distribution over the parameters of a Gaussian mixture distribution. The effective number of components can be inferred from the data.

This class implements two types of prior for the weights distribution: a finite mixture model with a Dirichlet distribution and an infinite mixture model with the Dirichlet Process. In practice, the Dirichlet Process inference algorithm is approximated and uses a truncated distribution with a fixed maximum number of components (called the Stick-breaking representation). The number of components actually used almost always depends on the data.

New in version 0.18.

Read more in the User Guide.

Parameters

n_componentsint, defaults to 1.

The number of mixture components. Depending on the data and the value of the weight_concentration_prior the model can decide to not use all the components by setting some component weights_ to values very close to zero. The number of effective components is therefore smaller than n_components.

covariance_type{‘full’, ‘tied’, ‘diag’, ‘spherical’}, defaults to ‘full’

String describing the type of covariance parameters to use. Must be one of:

'full' (each component has its own general covariance matrix),
'tied' (all components share the same general covariance matrix),
'diag' (each component has its own diagonal covariance matrix),
'spherical' (each component has its own single variance).
tolfloat, defaults to 1e-3.

The convergence threshold. EM iterations will stop when the lower bound average gain on the likelihood (of the training data with respect to the model) is below this threshold.

reg_covarfloat, defaults to 1e-6.

Non-negative regularization added to the diagonal of covariance. Ensures that the covariance matrices are all positive.

max_iterint, defaults to 100.

The number of EM iterations to perform.

n_initint, defaults to 1.

The number of initializations to perform. The result with the highest lower bound value on the likelihood is kept.

init_params{‘kmeans’, ‘random’}, defaults to ‘kmeans’.

The method used to initialize the weights, the means and the covariances. Must be one of:

'kmeans' : responsibilities are initialized using kmeans.
'random' : responsibilities are initialized randomly.
weight_concentration_prior_typestr, defaults to ‘dirichlet_process’.

String describing the type of the weight concentration prior. Must be one of:

'dirichlet_process' (using the Stick-breaking representation),
'dirichlet_distribution' (can favor more uniform weights).
weight_concentration_priorfloat | None, optional.

The Dirichlet concentration of each component on the weight distribution (Dirichlet). This is commonly called gamma in the literature. A higher concentration puts more mass in the center and leads to more components being active, while a lower concentration parameter leads to more mass at the edge of the mixture weights simplex. The value of the parameter must be greater than 0. If it is None, it is set to 1. / n_components.

mean_precision_priorfloat | None, optional.

The precision prior on the mean distribution (Gaussian). Controls the extent of where means can be placed. Larger values concentrate the cluster means around mean_prior. The value of the parameter must be greater than 0. If it is None, it is set to 1.

mean_priorarray-like, shape (n_features,), optional

The prior on the mean distribution (Gaussian). If it is None, it is set to the mean of X.

degrees_of_freedom_priorfloat | None, optional.

The prior of the number of degrees of freedom on the covariance distributions (Wishart). If it is None, it’s set to n_features.

covariance_priorfloat or array-like, optional

The prior on the covariance distribution (Wishart). If it is None, the empirical covariance prior is initialized using the covariance of X. The shape depends on covariance_type:

(n_features, n_features) if 'full',
(n_features, n_features) if 'tied',
(n_features)             if 'diag',
float                    if 'spherical'
random_stateint, RandomState instance or None, optional (default=None)

Controls the random seed given to the method chosen to initialize the parameters (see init_params). In addition, it controls the generation of random samples from the fitted distribution (see the method sample). Pass an int for reproducible output across multiple function calls. See Glossary.

warm_startbool, default to False.

If ‘warm_start’ is True, the solution of the last fitting is used as initialization for the next call of fit(). This can speed up convergence when fit is called several times on similar problems. See the Glossary.

verboseint, default to 0.

Enable verbose output. If 1 then it prints the current initialization and each iteration step. If greater than 1 then it prints also the log probability and the time needed for each step.

verbose_intervalint, default to 10.

Number of iterations performed between two consecutive prints.

Attributes

weights_array-like, shape (n_components,)

The weight of each mixture component.

means_array-like, shape (n_components, n_features)

The mean of each mixture component.

covariances_array-like

The covariance of each mixture component. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
precisions_array-like

The precision matrices for each component in the mixture. A precision matrix is the inverse of a covariance matrix. A covariance matrix is symmetric positive definite so the mixture of Gaussian can be equivalently parameterized by the precision matrices. Storing the precision matrices instead of the covariance matrices makes it more efficient to compute the log-likelihood of new samples at test time. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
precisions_cholesky_array-like

The cholesky decomposition of the precision matrices of each mixture component. A precision matrix is the inverse of a covariance matrix. A covariance matrix is symmetric positive definite so the mixture of Gaussian can be equivalently parameterized by the precision matrices. Storing the precision matrices instead of the covariance matrices makes it more efficient to compute the log-likelihood of new samples at test time. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'
converged_bool

True when convergence was reached in fit(), False otherwise.

n_iter_int

Number of steps used by the best fit of inference to reach convergence.

lower_bound_float

Lower bound value on the likelihood (of the training data with respect to the model) of the best fit of inference.

weight_concentration_prior_tuple or float

The dirichlet concentration of each component on the weight distribution (Dirichlet). The type depends on weight_concentration_prior_type:

(float, float) if 'dirichlet_process' (Beta parameters),
float          if 'dirichlet_distribution' (Dirichlet parameters).

A higher concentration puts more mass in the center and leads to more components being active, while a lower concentration parameter leads to more mass at the edge of the simplex.

weight_concentration_array-like, shape (n_components,)

The dirichlet concentration of each component on the weight distribution (Dirichlet).

mean_precision_prior_float

The precision prior on the mean distribution (Gaussian). Controls the extent of where means can be placed. Larger values concentrate the cluster means around mean_prior. If mean_precision_prior is set to None, mean_precision_prior_ is set to 1.

mean_precision_array-like, shape (n_components,)

The precision of each component on the mean distribution (Gaussian).

mean_prior_array-like, shape (n_features,)

The prior on the mean distribution (Gaussian).

degrees_of_freedom_prior_float

The prior of the number of degrees of freedom on the covariance distributions (Wishart).

degrees_of_freedom_array-like, shape (n_components,)

The number of degrees of freedom of each component in the model.

covariance_prior_float or array-like

The prior on the covariance distribution (Wishart). The shape depends on covariance_type:

(n_features, n_features) if 'full',
(n_features, n_features) if 'tied',
(n_features)             if 'diag',
float                    if 'spherical'

See Also

GaussianMixture : Finite Gaussian mixture fit with EM.

References

1

Bishop, Christopher M. (2006). “Pattern recognition and machine learning”. Vol. 4 No. 4. New York: Springer.

2

Hagai Attias. (2000). “A Variational Bayesian Framework for Graphical Models”. In Advances in Neural Information Processing Systems 12.

3

Blei, David M. and Michael I. Jordan. (2006). “Variational inference for Dirichlet process mixtures”. Bayesian analysis 1.1
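
The defaults documented for the prior parameters can be inspected on a fitted model. The sketch below uses scikit-learn directly rather than the MDP wrapper. The synthetic data and the choice of n_components=5 and max_iter=500 are assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
# Two synthetic clusters with 2 features each (illustrative data).
X = np.vstack([rng.randn(150, 2) - 4.0, rng.randn(150, 2) + 4.0])

# Deliberately allow more components than there are clusters.
bgm = BayesianGaussianMixture(n_components=5, max_iter=500,
                              random_state=0).fit(X)

print(np.round(bgm.weights_, 3))         # some weights may be close to zero
print(bgm.weight_concentration_prior_)   # 1. / n_components when left as None
print(bgm.mean_precision_prior_)         # 1 when left as None
print(bgm.degrees_of_freedom_prior_)     # n_features when left as None
```

Components whose weight is close to zero are effectively unused, which is how the model can end up with fewer effective components than n_components.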

Full API documentation: BayesianGaussianMixtureScikitsLearnNode

class mdp.nodes.GaussianNBScikitsLearnNode

Gaussian Naive Bayes (GaussianNB). This node has been automatically generated by wrapping the sklearn.naive_bayes.GaussianNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Can perform online updates to model parameters via partial_fit(). For details on the algorithm used to update feature means and variances online, see Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque.

Read more in the User Guide.

Parameters

priorsarray-like of shape (n_classes,)

Prior probabilities of the classes. If specified the priors are not adjusted according to the data.

var_smoothingfloat, default=1e-9

Portion of the largest variance of all features that is added to variances for calculation stability.

New in version 0.20.

Attributes

class_count_ndarray of shape (n_classes,)

number of training samples observed in each class.

class_prior_ndarray of shape (n_classes,)

probability of each class.

classes_ndarray of shape (n_classes,)

class labels known to the classifier

epsilon_float

absolute additive value to variances

sigma_ndarray of shape (n_classes, n_features)

variance of each feature per class

theta_ndarray of shape (n_classes, n_features)

mean of each feature per class

Examples

>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([1, 1, 1, 2, 2, 2])
>>> from sklearn.naive_bayes import GaussianNB
>>> clf = GaussianNB()
>>> clf.fit(X, Y)
GaussianNB()
>>> print(clf.predict([[-0.8, -1]]))
[1]
>>> clf_pf = GaussianNB()
>>> clf_pf.partial_fit(X, Y, np.unique(Y))
GaussianNB()
>>> print(clf_pf.predict([[-0.8, -1]]))
[1]
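
The online update and the var_smoothing parameter can be checked with a short sketch. It uses scikit-learn directly, and the split of the data into two interleaved batches is an assumption made for illustration. The check on epsilon_ assumes that "largest variance of all features" means the largest per-feature variance of the training data.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
Y = np.array([1, 1, 1, 2, 2, 2])

# Batch fit on all samples at once.
full = GaussianNB().fit(X, Y)

# Online fit in two batches; classes must be given on the first call.
online = GaussianNB()
online.partial_fit(X[::2], Y[::2], classes=np.unique(Y))
online.partial_fit(X[1::2], Y[1::2])

# The per-class feature means agree between the two fits.
print(np.allclose(full.theta_, online.theta_))

# epsilon_ is var_smoothing times the largest per-feature variance.
expected_epsilon = full.var_smoothing * np.var(X, axis=0).max()
print(np.isclose(full.epsilon_, expected_epsilon))
```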

Full API documentation: GaussianNBScikitsLearnNode

class mdp.nodes.MultinomialNBScikitsLearnNode

Naive Bayes classifier for multinomial models. This node has been automatically generated by wrapping the sklearn.naive_bayes.MultinomialNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.

Read more in the User Guide.

Parameters

alphafloat, default=1.0

Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).

fit_priorbool, default=True

Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

class_priorarray-like of shape (n_classes,), default=None

Prior probabilities of the classes. If specified the priors are not adjusted according to the data.

Attributes

class_count_ndarray of shape (n_classes,)

Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

class_log_prior_ndarray of shape (n_classes, )

Smoothed empirical log probability for each class.

classes_ndarray of shape (n_classes,)

Class labels known to the classifier

coef_ndarray of shape (n_classes, n_features)

Mirrors feature_log_prob_ for interpreting MultinomialNB as a linear model.

feature_count_ndarray of shape (n_classes, n_features)

Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.

feature_log_prob_ndarray of shape (n_classes, n_features)

Empirical log probability of features given a class, P(x_i|y).

intercept_ndarray of shape (n_classes, )

Mirrors class_log_prior_ for interpreting MultinomialNB as a linear model.

n_features_int

Number of features of each sample.

Examples

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, y)
MultinomialNB()
>>> print(clf.predict(X[2:3]))
[3]
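
The role of alpha can be made concrete by recomputing feature_log_prob_ from feature_count_. This is a minimal sketch that assumes the usual additive-smoothing estimate, log((N_yi + alpha) / (N_y + alpha * n_features)). The variable names smoothed and manual are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])

alpha = 1.0
clf = MultinomialNB(alpha=alpha).fit(X, y)

# Add alpha to every (class, feature) count, then normalize per class.
smoothed = clf.feature_count_ + alpha
manual = np.log(smoothed) - np.log(smoothed.sum(axis=1, keepdims=True))

print(np.allclose(manual, clf.feature_log_prob_))
```

Each row of exp(feature_log_prob_) sums to one, so every class defines a probability distribution over the features.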

Notes

For the rationale behind the names coef_ and intercept_, i.e. naive Bayes as a linear classifier, see J. Rennie et al. (2003), Tackling the poor assumptions of naive Bayes text classifiers, ICML.

References

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

Full API documentation: MultinomialNBScikitsLearnNode

class mdp.nodes.ComplementNBScikitsLearnNode

The Complement Naive Bayes classifier described in Rennie et al. (2003). This node has been automatically generated by wrapping the sklearn.naive_bayes.ComplementNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The Complement Naive Bayes classifier was designed to correct the “severe assumptions” made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets.

Read more in the User Guide.

New in version 0.20.

Parameters

alphafloat, default=1.0

Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).

fit_priorbool, default=True

Only used in edge case with a single class in the training set.

class_priorarray-like of shape (n_classes,), default=None

Prior probabilities of the classes. Not used.

normbool, default=False

Whether or not a second normalization of the weights is performed. The default behavior mirrors the implementations found in Mahout and Weka, which do not follow the full algorithm described in Table 9 of the paper.

Attributes

class_count_ndarray of shape (n_classes,)

Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

class_log_prior_ndarray of shape (n_classes,)

Smoothed empirical log probability for each class. Only used in edge case with a single class in the training set.

classes_ndarray of shape (n_classes,)

Class labels known to the classifier

feature_all_ndarray of shape (n_features,)

Number of samples encountered for each feature during fitting. This value is weighted by the sample weight when provided.

feature_count_ndarray of shape (n_classes, n_features)

Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.

feature_log_prob_ndarray of shape (n_classes, n_features)

Empirical weights for class complements.

n_features_int

Number of features of each sample.

Examples

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import ComplementNB
>>> clf = ComplementNB()
>>> clf.fit(X, y)
ComplementNB()
>>> print(clf.predict(X[2:3]))
[3]
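
The relation between feature_all_ and feature_count_ can be verified on the same toy data. This sketch assumes that feature_all_ is the per-feature total accumulated over all classes, which matches the documented shapes (n_features,) and (n_classes, n_features).

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])

clf = ComplementNB().fit(X, y)

# Summing the per-class counts over classes gives the per-feature totals.
per_feature_total = clf.feature_count_.sum(axis=0)
print(np.allclose(clf.feature_all_, per_feature_total))

# Without sample weights these totals equal the column sums of X.
print(np.allclose(clf.feature_all_, X.sum(axis=0)))
```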

References

Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive bayes text classifiers. In ICML (Vol. 3, pp. 616-623). https://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

Full API documentation: ComplementNBScikitsLearnNode

class mdp.nodes.BernoulliNBScikitsLearnNode

Naive Bayes classifier for multivariate Bernoulli models. This node has been automatically generated by wrapping the sklearn.naive_bayes.BernoulliNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.

Read more in the User Guide.

Parameters

alphafloat, default=1.0

Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).

binarizefloat or None, default=0.0

Threshold for binarizing (mapping to booleans) of sample features. If None, input is presumed to already consist of binary vectors.

fit_priorbool, default=True

Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

class_priorarray-like of shape (n_classes,), default=None

Prior probabilities of the classes. If specified the priors are not adjusted according to the data.

Attributes

class_count_ndarray of shape (n_classes)

Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

class_log_prior_ndarray of shape (n_classes)

Log probability of each class (smoothed).

classes_ndarray of shape (n_classes,)

Class labels known to the classifier

feature_count_ndarray of shape (n_classes, n_features)

Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.

feature_log_prob_ndarray of shape (n_classes, n_features)

Empirical log probability of features given a class, P(x_i|y).

n_features_int

Number of features of each sample.

Examples

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB()
>>> print(clf.predict(X[2:3]))
[3]
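
The effect of binarize can be illustrated by comparing the default threshold with a manually binarized input. This is a sketch only. It assumes that values strictly greater than the threshold map to 1 and all others to 0, and X_bin is a hypothetical name.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
Y = np.array([1, 2, 3, 4, 4, 5])

# Let the classifier binarize the counts at the default threshold 0.0.
auto = BernoulliNB(binarize=0.0).fit(X, Y)

# Binarize by hand and tell the classifier the input is already binary.
X_bin = (X > 0).astype(int)
manual = BernoulliNB(binarize=None).fit(X_bin, Y)

print(np.allclose(auto.feature_log_prob_, manual.feature_log_prob_))
print(np.array_equal(auto.predict(X), manual.predict(X_bin)))
```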

References

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html

A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.

V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes – Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).

Full API documentation: BernoulliNBScikitsLearnNode

class mdp.nodes.CategoricalNBScikitsLearnNode

Naive Bayes classifier for categorical features. This node has been automatically generated by wrapping the sklearn.naive_bayes.CategoricalNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The categorical Naive Bayes classifier is suitable for classification with discrete features that are categorically distributed. The categories of each feature are drawn from a categorical distribution.

Read more in the User Guide.

Parameters

alphafloat, default=1.0

Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).

fit_priorbool, default=True

Whether to learn class prior probabilities or not. If false, a uniform prior will be used.

class_priorarray-like of shape (n_classes,), default=None

Prior probabilities of the classes. If specified the priors are not adjusted according to the data.

Attributes

category_count_list of arrays of shape (n_features,)

Holds arrays of shape (n_classes, n_categories of respective feature) for each feature. Each array provides the number of samples encountered for each class and category of the specific feature.

class_count_ndarray of shape (n_classes,)

Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.

class_log_prior_ndarray of shape (n_classes,)

Smoothed empirical log probability for each class.

classes_ndarray of shape (n_classes,)

Class labels known to the classifier

feature_log_prob_list of arrays of shape (n_features,)

Holds arrays of shape (n_classes, n_categories of respective feature) for each feature. Each array provides the empirical log probability of categories given the respective feature and class, P(x_i|y).

n_features_int

Number of features of each sample.

Examples

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import CategoricalNB
>>> clf = CategoricalNB()
>>> clf.fit(X, y)
CategoricalNB()
>>> print(clf.predict(X[2:3]))
[3]
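
The list-of-arrays layout of category_count_ and feature_log_prob_ can be inspected as follows. The sketch assumes that the number of categories of a feature is inferred as its largest observed value plus one. The name n_categories_0 is hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])

clf = CategoricalNB().fit(X, y)

# One array per feature, each of shape (n_classes, n_categories).
n_categories_0 = int(X[:, 0].max()) + 1
print(len(clf.category_count_), clf.category_count_[0].shape)

# For a given feature and class, the category probabilities sum to one.
probs_0 = np.exp(clf.feature_log_prob_[0])
print(np.allclose(probs_0.sum(axis=1), 1.0))
```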

Full API documentation: CategoricalNBScikitsLearnNode

class mdp.nodes.BernoulliRBMScikitsLearnNode

Bernoulli Restricted Boltzmann Machine (RBM). This node has been automatically generated by wrapping the sklearn.neural_network._rbm.BernoulliRBM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. A Restricted Boltzmann Machine with binary visible units and binary hidden units. Parameters are estimated using Stochastic Maximum Likelihood (SML), also known as Persistent Contrastive Divergence (PCD) [2].

The time complexity of this implementation is O(d ** 2) assuming d ~ n_features ~ n_components.

Read more in the User Guide.

Parameters

n_componentsint, default=256

Number of binary hidden units.

learning_ratefloat, default=0.1

The learning rate for weight updates. It is highly recommended to tune this hyper-parameter. Reasonable values are in the 10**[0., -3.] range.

batch_sizeint, default=10

Number of examples per minibatch.

n_iterint, default=10

Number of iterations/sweeps over the training dataset to perform during training.

verboseint, default=0

The verbosity level. The default, zero, means silent mode.

random_stateinteger or RandomState, default=None

Determines random number generation for:

  • Gibbs sampling from visible and hidden layers.

  • Initializing components, sampling from layers during fit.

  • Corrupting the data when scoring samples.

Pass an int for reproducible results across multiple function calls. See Glossary.

Attributes

intercept_hidden_array-like, shape (n_components,)

Biases of the hidden units.

intercept_visible_array-like, shape (n_features,)

Biases of the visible units.

components_array-like, shape (n_components, n_features)

Weight matrix, where n_features is the number of visible units and n_components is the number of hidden units.

h_samples_array-like, shape (batch_size, n_components)

Hidden activation sampled from the model distribution, where batch_size is the number of examples per minibatch and n_components is the number of hidden units.

Examples

>>> import numpy as np
>>> from sklearn.neural_network import BernoulliRBM
>>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
>>> model = BernoulliRBM(n_components=2)
>>> model.fit(X)
BernoulliRBM(n_components=2)
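
After fitting, the attribute shapes listed above can be checked and the hidden-unit activation probabilities obtained with transform. This is a minimal sketch that uses scikit-learn directly. The value random_state=0 is an arbitrary choice made for reproducibility.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
model = BernoulliRBM(n_components=2, random_state=0).fit(X)

# Weight matrix and biases follow the documented shapes.
print(model.components_.shape)          # (n_components, n_features)
print(model.intercept_hidden_.shape)    # (n_components,)
print(model.intercept_visible_.shape)   # (n_features,)

# transform returns the hidden-unit activation probabilities per sample.
H = model.transform(X)
print(H.shape)
```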

References

[1] Hinton, G. E., Osindero, S. and Teh, Y. A fast learning algorithm for deep belief nets. Neural Computation 18, pp 1527-1554. https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf

[2] Tieleman, T. Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient. International Conference on Machine Learning (ICML) 2008

Full API documentation: BernoulliRBMScikitsLearnNode

class mdp.nodes.MLPClassifierScikitsLearnNode

Multi-layer Perceptron classifier. This node has been automatically generated by wrapping the sklearn.neural_network._multilayer_perceptron.MLPClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This model optimizes the log-loss function using LBFGS or stochastic gradient descent.

New in version 0.18.

Parameters

hidden_layer_sizestuple, length = n_layers - 2, default=(100,)

The ith element represents the number of neurons in the ith hidden layer.

activation{‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default=’relu’

Activation function for the hidden layer.

  • ‘identity’, no-op activation, useful to implement linear bottleneck, returns f(x) = x

  • ‘logistic’, the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).

  • ‘tanh’, the hyperbolic tan function, returns f(x) = tanh(x).

  • ‘relu’, the rectified linear unit function, returns f(x) = max(0, x)
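
The four activation formulas above can be written out in NumPy. This is an illustrative sketch of the formulas only, not the library's internal implementation, and the function names are hypothetical.

```python
import numpy as np

def identity(x):
    # f(x) = x
    return x

def logistic(x):
    # f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # f(x) = tanh(x)
    return np.tanh(x)

def relu(x):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 2.0])
for name, f in [("identity", identity), ("logistic", logistic),
                ("tanh", tanh), ("relu", relu)]:
    print(name, f(x))
```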

solver{‘lbfgs’, ‘sgd’, ‘adam’}, default=’adam’

The solver for weight optimization.

  • ‘lbfgs’ is an optimizer in the family of quasi-Newton methods.

  • ‘sgd’ refers to stochastic gradient descent.

  • ‘adam’ refers to a stochastic gradient-based optimizer proposed by Kingma, Diederik, and Jimmy Ba

Note: The default solver ‘adam’ works pretty well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, ‘lbfgs’ can converge faster and perform better.

alphafloat, default=0.0001

L2 penalty (regularization term) parameter.

batch_sizeint, default=’auto’

Size of minibatches for stochastic optimizers. If the solver is ‘lbfgs’, the classifier will not use minibatch. When set to “auto”, batch_size=min(200, n_samples)

learning_rate{‘constant’, ‘invscaling’, ‘adaptive’}, default=’constant’

Learning rate schedule for weight updates.

  • ‘constant’ is a constant learning rate given by ‘learning_rate_init’.

  • ‘invscaling’ gradually decreases the learning rate at each time step ‘t’ using an inverse scaling exponent of ‘power_t’. effective_learning_rate = learning_rate_init / pow(t, power_t)

  • ‘adaptive’ keeps the learning rate constant to ‘learning_rate_init’ as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least tol, or fail to increase validation score by at least tol if ‘early_stopping’ is on, the current learning rate is divided by 5.

Only used when solver='sgd'.
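
The ‘invscaling’ formula can be evaluated by hand. The sketch below uses the default values listed for learning_rate_init and power_t. The helper name effective_learning_rate and the chosen time steps are assumptions made for illustration.

```python
import math

learning_rate_init = 0.001   # default listed for learning_rate_init
power_t = 0.5                # default listed for power_t

def effective_learning_rate(t):
    # effective_learning_rate = learning_rate_init / pow(t, power_t)
    return learning_rate_init / pow(t, power_t)

# With power_t = 0.5 the rate shrinks with the square root of t.
for t in (1, 4, 100):
    print(t, effective_learning_rate(t))
```

With these defaults the rate is halved at t = 4 and divided by ten at t = 100.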

learning_rate_initdouble, default=0.001

The initial learning rate used. It controls the step-size in updating the weights. Only used when solver=’sgd’ or ‘adam’.

power_tdouble, default=0.5

The exponent for inverse scaling learning rate. It is used in updating effective learning rate when the learning_rate is set to ‘invscaling’. Only used when solver=’sgd’.

max_iterint, default=200

Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

shufflebool, default=True

Whether to shuffle samples in each iteration. Only used when solver=’sgd’ or ‘adam’.

random_stateint, RandomState instance, default=None

Determines random number generation for weights and bias initialization, train-test split if early stopping is used, and batch sampling when solver=’sgd’ or ‘adam’. Pass an int for reproducible results across multiple function calls. See Glossary.

tolfloat, default=1e-4

Tolerance for the optimization. When the loss or score is not improving by at least tol for n_iter_no_change consecutive iterations, unless learning_rate is set to ‘adaptive’, convergence is considered to be reached and training stops.
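The stopping rule described above can be sketched as follows. This is a simplified reading of the text, not the library's exact bookkeeping: it counts consecutive epochs whose loss fails to improve on the best loss so far by at least tol. The function name epochs_until_stop is hypothetical.

```python
def epochs_until_stop(losses, tol=1e-4, n_iter_no_change=10):
    """Return the 1-based epoch at which training would stop, or None."""
    best_loss = float("inf")
    no_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss > best_loss - tol:
            # Improvement smaller than tol: count the epoch as stalled.
            no_improvement += 1
        else:
            no_improvement = 0
        best_loss = min(best_loss, loss)
        if no_improvement >= n_iter_no_change:
            return epoch
    return None


# Three improving epochs followed by a plateau of ten epochs.
stop_epoch = epochs_until_stop([1.0, 0.5, 0.4] + [0.4] * 10)
```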

verbosebool, default=False

Whether to print progress messages to stdout.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

momentumfloat, default=0.9

Momentum for gradient descent update. Should be between 0 and 1. Only used when solver=’sgd’.

nesterovs_momentumboolean, default=True

Whether to use Nesterov’s momentum. Only used when solver=’sgd’ and momentum > 0.
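As an illustration of how momentum and nesterovs_momentum interact, the sketch below applies one update to a single scalar parameter. It uses a common formulation of classical and Nesterov momentum; it is an assumption that the wrapped implementation follows the same form, and sgd_momentum_step is a hypothetical name.

```python
def sgd_momentum_step(param, grad, velocity, learning_rate=0.001,
                      momentum=0.9, nesterov=True):
    """One SGD update for a scalar parameter; returns (param, velocity)."""
    # Accumulate the gradient into the velocity term.
    velocity = momentum * velocity - learning_rate * grad
    if nesterov:
        # Look ahead: apply the momentum once more to the fresh velocity.
        update = momentum * velocity - learning_rate * grad
    else:
        update = velocity
    return param + update, velocity


# One step on f(w) = w**2 (gradient 2 * w) starting from w = 1.0.
w_plain, _ = sgd_momentum_step(1.0, 2.0, 0.0, learning_rate=0.1, nesterov=False)
w_nesterov, _ = sgd_momentum_step(1.0, 2.0, 0.0, learning_rate=0.1, nesterov=True)
```

With momentum set to 0 both variants reduce to plain gradient descent, which is why nesterovs_momentum only matters when momentum > 0.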

early_stoppingbool, default=False

Whether to use early stopping to terminate training when validation score is not improving. If set to true, it will automatically set aside 10% of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs. The split is stratified, except in a multilabel setting. Only effective when solver=’sgd’ or ‘adam’

validation_fractionfloat, default=0.1

The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True

beta_1float, default=0.9

Exponential decay rate for estimates of first moment vector in adam, should be in [0, 1). Only used when solver=’adam’

beta_2float, default=0.999

Exponential decay rate for estimates of second moment vector in adam, should be in [0, 1). Only used when solver=’adam’

epsilonfloat, default=1e-8

Value for numerical stability in adam. Only used when solver=’adam’
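The roles of beta_1, beta_2 and epsilon are easiest to see in a single update step. The sketch below follows the update rule from the Kingma and Ba paper for one scalar parameter; it is illustrative only, and the wrapped implementation may organize the computation differently. The name adam_step is hypothetical.

```python
import math


def adam_step(param, grad, m, v, t, learning_rate_init=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    """One Adam update for a scalar parameter at time step t (t >= 1)."""
    m = beta_1 * m + (1 - beta_1) * grad        # first moment estimate
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta_1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta_2 ** t)               # bias-corrected second moment
    # epsilon keeps the division stable when v_hat is close to zero.
    param = param - learning_rate_init * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# First step from w = 1.0 with gradient 2.0 and zero-initialized moments.
w, m, v = adam_step(1.0, 2.0, m=0.0, v=0.0, t=1)
```

On the first step the bias correction makes the update size roughly equal to learning_rate_init, regardless of the gradient's magnitude.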

n_iter_no_changeint, default=10

Maximum number of epochs to not meet tol improvement. Only effective when solver=’sgd’ or ‘adam’

New in version 0.20.

max_funint, default=15000

Only used when solver=’lbfgs’. Maximum number of loss function calls. The solver iterates until convergence (determined by ‘tol’), number of iterations reaches max_iter, or this number of loss function calls. Note that number of loss function calls will be greater than or equal to the number of iterations for the MLPClassifier.

New in version 0.22.

Attributes

classes_ndarray or list of ndarray of shape (n_classes,)

Class labels for each output.

loss_float

The current loss computed with the loss function.

coefs_list, length n_layers - 1

The ith element in the list represents the weight matrix corresponding to layer i.

intercepts_list, length n_layers - 1

The ith element in the list represents the bias vector corresponding to layer i + 1.
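The lengths and shapes of coefs_ and intercepts_ follow from the layer sizes. The sketch below assumes each weight matrix has shape (units in layer i, units in layer i + 1), which is not stated in the text above; mlp_parameter_shapes is a hypothetical helper.

```python
def mlp_parameter_shapes(n_features, hidden_layer_sizes, n_outputs):
    """Expected shapes of the weight matrices and bias vectors."""
    # Input layer, hidden layers, output layer: n_layers entries in total.
    layer_units = [n_features, *hidden_layer_sizes, n_outputs]
    n_transitions = len(layer_units) - 1  # n_layers - 1
    coef_shapes = [(layer_units[i], layer_units[i + 1])
                   for i in range(n_transitions)]
    intercept_shapes = [(layer_units[i + 1],) for i in range(n_transitions)]
    return coef_shapes, intercept_shapes


# 20 input features, one hidden layer of 100 units, a single output.
coef_shapes, intercept_shapes = mlp_parameter_shapes(20, (100,), 1)
```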

n_iter_int

The number of iterations the solver has run.

n_layers_int

Number of layers.

n_outputs_int

Number of outputs.

out_activation_string

Name of the output activation function.

Examples

>>> from sklearn.neural_network import MLPClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=100, random_state=1)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
...                                                     random_state=1)
>>> clf = MLPClassifier(random_state=1, max_iter=300).fit(X_train, y_train)
>>> clf.predict_proba(X_test[:1])
array([[0.038..., 0.961...]])
>>> clf.predict(X_test[:5, :])
array([1, 0, 1, 0, 1])
>>> clf.score(X_test, y_test)
0.8...

Notes

MLPClassifier trains iteratively since at each time step the partial derivatives of the loss function with respect to the model parameters are computed to update the parameters.

It can also have a regularization term added to the loss function that shrinks model parameters to prevent overfitting.

This implementation works with data represented as dense numpy arrays or sparse scipy arrays of floating point values.

References

Hinton, Geoffrey E. “Connectionist learning procedures.” Artificial intelligence 40.1 (1989): 185-234.

Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” International Conference on Artificial Intelligence and Statistics. 2010.

He, Kaiming, et al. “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.” arXiv preprint arXiv:1502.01852 (2015).

Kingma, Diederik, and Jimmy Ba. “Adam: A method for stochastic optimization.” arXiv preprint arXiv:1412.6980 (2014).

Full API documentation: MLPClassifierScikitsLearnNode

class mdp.nodes.MLPRegressorScikitsLearnNode

Multi-layer Perceptron regressor. This node has been automatically generated by wrapping the sklearn.neural_network._multilayer_perceptron.MLPRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This model optimizes the squared-loss using LBFGS or stochastic gradient descent.

New in version 0.18.

Parameters

hidden_layer_sizestuple, length = n_layers - 2, default=(100,)

The ith element represents the number of neurons in the ith hidden layer.

activation{‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default=’relu’

Activation function for the hidden layer.

  • ‘identity’, no-op activation, useful to implement linear bottleneck, returns f(x) = x

  • ‘logistic’, the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).

  • ‘tanh’, the hyperbolic tan function, returns f(x) = tanh(x).

  • ‘relu’, the rectified linear unit function, returns f(x) = max(0, x)
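The four activation formulas listed above can be written out for a single scalar input. This is a plain-Python sketch of the formulas only; it is not numerically hardened (the logistic function can overflow for very negative inputs), and the ACTIVATIONS dictionary is a hypothetical name.

```python
import math

# Scalar versions of the documented hidden-layer activations.
ACTIVATIONS = {
    "identity": lambda x: x,
    "logistic": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
}

# Evaluate every activation at the same input for comparison.
values_at_half = {name: f(0.5) for name, f in ACTIVATIONS.items()}
```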

solver{‘lbfgs’, ‘sgd’, ‘adam’}, default=’adam’

The solver for weight optimization.

  • ‘lbfgs’ is an optimizer in the family of quasi-Newton methods.

  • ‘sgd’ refers to stochastic gradient descent.

  • ‘adam’ refers to a stochastic gradient-based optimizer proposed by Diederik Kingma and Jimmy Ba.

Note: The default solver ‘adam’ works pretty well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, ‘lbfgs’ can converge faster and perform better.

alphafloat, default=0.0001

L2 penalty (regularization term) parameter.

batch_sizeint, default=’auto’

Size of minibatches for stochastic optimizers. If the solver is ‘lbfgs’, the regressor will not use minibatches. When set to “auto”, batch_size=min(200, n_samples).

learning_rate{‘constant’, ‘invscaling’, ‘adaptive’}, default=’constant’

Learning rate schedule for weight updates.

  • ‘constant’ is a constant learning rate given by ‘learning_rate_init’.

  • ‘invscaling’ gradually decreases the learning rate learning_rate_ at each time step ‘t’ using an inverse scaling exponent of ‘power_t’. effective_learning_rate = learning_rate_init / pow(t, power_t)

  • ‘adaptive’ keeps the learning rate constant to ‘learning_rate_init’ as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least tol, or fail to increase validation score by at least tol if ‘early_stopping’ is on, the current learning rate is divided by 5.

Only used when solver=’sgd’.
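The ‘adaptive’ schedule can be sketched from the description above: the rate is divided by 5 each time two consecutive epochs fail to reduce the training loss by at least tol. This follows the wording of the text and ignores the early_stopping variant; the library's actual trigger may differ in detail, and adaptive_learning_rates is a hypothetical name.

```python
def adaptive_learning_rates(losses, learning_rate_init=0.001, tol=1e-4):
    """Learning rate in effect after each epoch under the 'adaptive' rule."""
    rate = learning_rate_init
    best_loss = float("inf")
    stalled = 0
    rates = []
    for loss in losses:
        if loss > best_loss - tol:
            stalled += 1      # this epoch did not improve by at least tol
        else:
            stalled = 0
        best_loss = min(best_loss, loss)
        if stalled >= 2:
            rate /= 5.0       # two stalled epochs in a row: shrink the rate
            stalled = 0
        rates.append(rate)
    return rates


rates = adaptive_learning_rates([1.0, 0.5, 0.5, 0.5, 0.5, 0.5])
```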

learning_rate_initdouble, default=0.001

The initial learning rate used. It controls the step-size in updating the weights. Only used when solver=’sgd’ or ‘adam’.

power_tdouble, default=0.5

The exponent for inverse scaling learning rate. It is used in updating effective learning rate when the learning_rate is set to ‘invscaling’. Only used when solver=’sgd’.

max_iterint, default=200

Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

shufflebool, default=True

Whether to shuffle samples in each iteration. Only used when solver=’sgd’ or ‘adam’.

random_stateint, RandomState instance, default=None

Determines random number generation for weights and bias initialization, train-test split if early stopping is used, and batch sampling when solver=’sgd’ or ‘adam’. Pass an int for reproducible results across multiple function calls. See Glossary.

tolfloat, default=1e-4

Tolerance for the optimization. When the loss or score is not improving by at least tol for n_iter_no_change consecutive iterations, unless learning_rate is set to ‘adaptive’, convergence is considered to be reached and training stops.

verbosebool, default=False

Whether to print progress messages to stdout.

warm_startbool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

momentumfloat, default=0.9

Momentum for gradient descent update. Should be between 0 and 1. Only used when solver=’sgd’.

nesterovs_momentumboolean, default=True

Whether to use Nesterov’s momentum. Only used when solver=’sgd’ and momentum > 0.

early_stoppingbool, default=False

Whether to use early stopping to terminate training when validation score is not improving. If set to true, it will automatically set aside 10% of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs. Only effective when solver=’sgd’ or ‘adam’

validation_fractionfloat, default=0.1

The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True

beta_1float, default=0.9

Exponential decay rate for estimates of first moment vector in adam, should be in [0, 1). Only used when solver=’adam’

beta_2float, default=0.999

Exponential decay rate for estimates of second moment vector in adam, should be in [0, 1). Only used when solver=’adam’

epsilonfloat, default=1e-8

Value for numerical stability in adam. Only used when solver=’adam’

n_iter_no_changeint, default=10

Maximum number of epochs to not meet tol improvement. Only effective when solver=’sgd’ or ‘adam’

New in version 0.20.

max_funint, default=15000

Only used when solver=’lbfgs’. Maximum number of function calls. The solver iterates until convergence (determined by ‘tol’), number of iterations reaches max_iter, or this number of function calls. Note that number of function calls will be greater than or equal to the number of iterations for the MLPRegressor.

New in version 0.22.

Attributes

loss_float

The current loss computed with the loss function.

coefs_list, length n_layers - 1

The ith element in the list represents the weight matrix corresponding to layer i.

intercepts_list, length n_layers - 1

The ith element in the list represents the bias vector corresponding to layer i + 1.

n_iter_int

The number of iterations the solver has run.

n_layers_int

Number of layers.

n_outputs_int

Number of outputs.

out_activation_string

Name of the output activation function.

Examples

>>> from sklearn.neural_network import MLPRegressor
>>> from sklearn.datasets import make_regression
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_regression(n_samples=200, random_state=1)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...                                                     random_state=1)
>>> regr = MLPRegressor(random_state=1, max_iter=500).fit(X_train, y_train)
>>> regr.predict(X_test[:2])
array([-0.9..., -7.1...])
>>> regr.score(X_test, y_test)
0.4...

Notes

MLPRegressor trains iteratively since at each time step the partial derivatives of the loss function with respect to the model parameters are computed to update the parameters.

It can also have a regularization term added to the loss function that shrinks model parameters to prevent overfitting.

This implementation works with data represented as dense numpy arrays or sparse scipy arrays of floating point values.

References

Hinton, Geoffrey E. “Connectionist learning procedures.” Artificial intelligence 40.1 (1989): 185-234.

Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” International Conference on Artificial Intelligence and Statistics. 2010.

He, Kaiming, et al. “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.” arXiv preprint arXiv:1502.01852 (2015).

Kingma, Diederik, and Jimmy Ba. “Adam: A method for stochastic optimization.” arXiv preprint arXiv:1412.6980 (2014).

Full API documentation: MLPRegressorScikitsLearnNode

class mdp.nodes.GaussianRandomProjectionScikitsLearnNode

Reduce dimensionality through Gaussian random projection. This node has been automatically generated by wrapping the sklearn.random_projection.GaussianRandomProjection class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. The components of the random matrix are drawn from N(0, 1 / n_components).

Read more in the User Guide.

New in version 0.13.

Parameters

n_componentsint or ‘auto’, optional (default = ‘auto’)

Dimensionality of the target projection space.

n_components can be automatically adjusted according to the number of samples in the dataset and the bound given by the Johnson-Lindenstrauss lemma. In that case the quality of the embedding is controlled by the eps parameter.

It should be noted that the Johnson-Lindenstrauss lemma can yield very conservative estimates of the required number of components, as it makes no assumption about the structure of the dataset.

epsstrictly positive float, optional (default=0.1)

Parameter to control the quality of the embedding according to the Johnson-Lindenstrauss lemma when n_components is set to ‘auto’.

Smaller values lead to better embedding and higher number of dimensions (n_components) in the target projection space.
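To see how eps drives the automatic choice of n_components, the sketch below evaluates a commonly quoted form of the Johnson-Lindenstrauss bound, n_components >= 4 ln(n_samples) / (eps^2 / 2 - eps^3 / 3). The formula is an assumption here, since the text above does not spell it out, and jl_min_dim is a hypothetical helper.

```python
import math


def jl_min_dim(n_samples, eps=0.1):
    """Minimum number of components suggested by the JL bound."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be in (0, 1)")
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    return int(4 * math.log(n_samples) / denominator)


# 100 samples with the default eps=0.1.
n_components = jl_min_dim(100, eps=0.1)
```

Note that the bound depends only on the number of samples and eps, not on the number of input features, which is one reason it can be very conservative.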

random_stateint, RandomState instance or None, optional (default=None)

Controls the pseudo random number generator used to generate the projection matrix at fit time. Pass an int for reproducible output across multiple function calls. See Glossary.

Attributes

n_components_int

Concrete number of components computed when n_components=”auto”.

components_numpy array of shape [n_components, n_features]

Random matrix used for the projection.

Examples

>>> import numpy as np
>>> from sklearn.random_projection import GaussianRandomProjection
>>> rng = np.random.RandomState(42)
>>> X = rng.rand(100, 10000)
>>> transformer = GaussianRandomProjection(random_state=rng)
>>> X_new = transformer.fit_transform(X)
>>> X_new.shape
(100, 3947)

See Also

SparseRandomProjection

Full API documentation: GaussianRandomProjectionScikitsLearnNode

class mdp.nodes.SparseRandomProjectionScikitsLearnNode

Reduce dimensionality through sparse random projection. This node has been automatically generated by wrapping the sklearn.random_projection.SparseRandomProjection class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. A sparse random matrix is an alternative to a dense random projection matrix that guarantees similar embedding quality while being much more memory efficient and allowing faster computation of the projected data.

If we denote s = 1 / density, the components of the random matrix are drawn from:

  • -sqrt(s) / sqrt(n_components) with probability 1 / (2s)

  • 0 with probability 1 - 1 / s

  • +sqrt(s) / sqrt(n_components) with probability 1 / (2s)
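The three-valued distribution above can be sampled directly. The following is a plain-Python sketch of that distribution using the standard library's random module; it illustrates the definition only and is not how the wrapped class builds its sparse matrix. The name sample_sparse_matrix is hypothetical.

```python
import math
import random


def sample_sparse_matrix(n_components, n_features, density, seed=0):
    """Draw an (n_components x n_features) matrix as nested lists."""
    rng = random.Random(seed)
    s = 1.0 / density
    scale = math.sqrt(s) / math.sqrt(n_components)
    matrix = []
    for _ in range(n_components):
        row = []
        for _ in range(n_features):
            u = rng.random()
            if u < 1.0 / (2.0 * s):
                row.append(-scale)   # probability 1 / (2s)
            elif u < 1.0 / s:
                row.append(scale)    # probability 1 / (2s)
            else:
                row.append(0.0)      # probability 1 - 1 / s
        matrix.append(row)
    return matrix


projection = sample_sparse_matrix(n_components=20, n_features=500, density=0.1)
```

On average a fraction equal to density of the entries is non-zero, so a lower density gives a sparser matrix with larger non-zero values.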

Read more in the User Guide.

New in version 0.13.

Parameters

n_componentsint or ‘auto’, optional (default = ‘auto’)

Dimensionality of the target projection space.

n_components can be automatically adjusted according to the number of samples in the dataset and the bound given by the Johnson-Lindenstrauss lemma. In that case the quality of the embedding is controlled by the eps parameter.

It should be noted that the Johnson-Lindenstrauss lemma can yield very conservative estimates of the required number of components, as it makes no assumption about the structure of the dataset.

densityfloat in range ]0, 1], optional (default=’auto’)

Ratio of non-zero components in the random projection matrix.

If density = ‘auto’, the value is set to the minimum density as recommended by Ping Li et al.: 1 / sqrt(n_features).

Use density = 1 / 3.0 if you want to reproduce the results from Achlioptas, 2001.

epsstrictly positive float, optional, (default=0.1)

Parameter to control the quality of the embedding according to the Johnson-Lindenstrauss lemma when n_components is set to ‘auto’.

Smaller values lead to better embedding and higher number of dimensions (n_components) in the target projection space.

dense_outputboolean, optional (default=False)

If True, ensure that the output of the random projection is a dense numpy array even if the input and random projection matrix are both sparse. In practice, if the number of components is small the number of zero components in the projected data will be very small and it will be more CPU and memory efficient to use a dense representation.

If False, the projected data uses a sparse representation if the input is sparse.

random_stateint, RandomState instance or None, optional (default=None)

Controls the pseudo random number generator used to generate the projection matrix at fit time. Pass an int for reproducible output across multiple function calls. See Glossary.

Attributes

n_components_int

Concrete number of components computed when n_components=”auto”.

components_CSR matrix with shape [n_components, n_features]

Random matrix used for the projection.

density_float in range 0.0 - 1.0

Concrete density computed when density = “auto”.

Examples

>>> import numpy as np
>>> from sklearn.random_projection import SparseRandomProjection
>>> rng = np.random.RandomState(42)
>>> X = rng.rand(100, 10000)
>>> transformer = SparseRandomProjection(random_state=rng)
>>> X_new = transformer.fit_transform(X)
>>> X_new.shape
(100, 3947)
>>> # very few components are non-zero
>>> np.mean(transformer.components_ != 0)
0.0100...

See Also

GaussianRandomProjection

References

[1] Ping Li, T. Hastie and K. W. Church, 2006, “Very Sparse Random Projections”. https://web.stanford.edu/~hastie/Papers/Ping/KDD06_rp.pdf

[2] D. Achlioptas, 2001, “Database-friendly random projections”. https://users.soe.ucsc.edu/~optas/papers/jl.pdf

Full API documentation: SparseRandomProjectionScikitsLearnNode

class mdp.nodes.LabelPropagationScikitsLearnNode

Label Propagation classifier. This node has been automatically generated by wrapping the sklearn.semi_supervised._label_propagation.LabelPropagation class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. Read more in the User Guide.

Parameters

kernel{‘knn’, ‘rbf’} or callable, default=’rbf’

String identifier for kernel function to use or the kernel function itself. Only ‘rbf’ and ‘knn’ strings are valid inputs. The function passed should take two inputs, each of shape (n_samples, n_features), and return a (n_samples, n_samples) shaped weight matrix.

gammafloat, default=20

Parameter for rbf kernel.
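To illustrate what gamma controls, the sketch below computes RBF weights in plain Python, assuming the standard form exp(-gamma * ||x - y||^2), which the text above does not spell out. The function names are hypothetical; a callable passed as kernel would additionally need to return an array type the wrapped library accepts.

```python
import math


def rbf_weight(x, y, gamma=20.0):
    """Similarity of two points under the assumed RBF form."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def rbf_weight_matrix(X, Y, gamma=20.0):
    """Pairwise weights between the rows of X and the rows of Y."""
    return [[rbf_weight(x, y, gamma) for y in Y] for x in X]


points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
weights = rbf_weight_matrix(points, points)
```

A larger gamma makes the weights decay faster with distance, so each point is influenced mainly by its closest neighbors.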

n_neighborsint, default=7

Parameter for knn kernel which is a strictly positive integer.

max_iterint, default=1000

Maximum number of iterations allowed.

tolfloat, default=1e-3

Convergence tolerance: threshold to consider the system at steady state.

n_jobsint, default=None

The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

X_ndarray of shape (n_samples, n_features)

Input array.

classes_ndarray of shape (n_classes,)

The distinct labels used in classifying instances.

label_distributions_ndarray of shape (n_samples, n_classes)

Categorical distribution for each item.

transduction_ndarray of shape (n_samples,)

Label assigned to each item via the transduction.

n_iter_int

Number of iterations run.

Examples

>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelPropagation
>>> label_prop_model = LabelPropagation()
>>> iris = datasets.load_iris()
>>> rng = np.random.RandomState(42)
>>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
LabelPropagation(...)

References

Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002 http://pages.cs.wisc.edu/~jerryzhu/pub/CMU-CALD-02-107.pdf

See Also

LabelSpreading : Alternate label propagation strategy more robust to noise

Full API documentation: LabelPropagationScikitsLearnNode

class mdp.nodes.LabelSpreadingScikitsLearnNode

LabelSpreading model for semi-supervised learning. This node has been automatically generated by wrapping the sklearn.semi_supervised._label_propagation.LabelSpreading class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute. This model is similar to the basic Label Propagation algorithm, but uses an affinity matrix based on the normalized graph Laplacian and soft clamping across the labels.

Read more in the User Guide.

Parameters

kernel{‘knn’, ‘rbf’} or callable, default=’rbf’

String identifier for kernel function to use or the kernel function itself. Only ‘rbf’ and ‘knn’ strings are valid inputs. The function passed should take two inputs, each of shape (n_samples, n_features), and return a (n_samples, n_samples) shaped weight matrix.

gammafloat, default=20

Parameter for rbf kernel.

n_neighborsint, default=7

Parameter for knn kernel which is a strictly positive integer.

alphafloat, default=0.2

Clamping factor. A value in (0, 1) that specifies the relative amount that an instance should adopt the information from its neighbors as opposed to its initial label. alpha=0 means keeping the initial label information; alpha=1 means replacing all initial information.
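The effect of alpha can be sketched with one soft-clamping step of the form F_new = alpha * S * F + (1 - alpha) * Y0, the iteration described by Zhou et al. Here S is a normalized affinity matrix, F the current label distributions and Y0 the initial labels, all as nested lists. The names are hypothetical and the wrapped implementation may differ in detail.

```python
def spreading_step(S, F, Y0, alpha=0.2):
    """One soft-clamping update of the label distributions."""
    n_samples, n_classes = len(F), len(F[0])
    new_F = []
    for i in range(n_samples):
        row = []
        for c in range(n_classes):
            # Information flowing in from the neighbors of sample i.
            neighbor_info = sum(S[i][j] * F[j][c] for j in range(n_samples))
            row.append(alpha * neighbor_info + (1 - alpha) * Y0[i][c])
        new_F.append(row)
    return new_F


# Two connected samples: the first is labeled with class 0, the second is unlabeled.
S = [[0.0, 1.0], [1.0, 0.0]]
Y0 = [[1.0, 0.0], [0.0, 0.0]]
F1 = spreading_step(S, Y0, Y0, alpha=0.2)
```

With alpha=0.2 the labeled sample keeps 80% of its initial label mass while the unlabeled sample receives 20% of its neighbor's.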

max_iterint, default=30

Maximum number of iterations allowed.

tolfloat, default=1e-3

Convergence tolerance: threshold to consider the system at steady state.

n_jobsint, default=None

The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes

X_ndarray of shape (n_samples, n_features)

Input array.

classes_ndarray of shape (n_classes,)

The distinct labels used in classifying instances.

label_distributions_ndarray of shape (n_samples, n_classes)

Categorical distribution for each item.

transduction_ndarray of shape (n_samples,)

Label assigned to each item via the transduction.

n_iter_int

Number of iterations run.

Examples

>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelSpreading
>>> label_prop_model = LabelSpreading()
>>> iris = datasets.load_iris()
>>> rng = np.random.RandomState(42)
>>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
LabelSpreading(...)

References

Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, Bernhard Schoelkopf. Learning with local and global consistency (2004) http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.3219

See Also

LabelPropagation : Unregularized graph based semi-supervised learning

Full API documentation: LabelSpreadingScikitsLearnNode