sknn.mlp — Multi-Layer Perceptrons¶
In this module, a neural network is made up of multiple layers — hence the name multi-layer perceptron! You need to specify these layers by instantiating one of two types of specifications:

sknn.mlp.Layer
: A standard feed-forward layer that can use linear or non-linear activations.

sknn.mlp.Convolution
: An image-based convolve operation with shared weights, linear or not.

In practice, you need to create a list of these specifications and provide them as the layers parameter to the sknn.mlp.Regressor or sknn.mlp.Classifier constructors.
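For example, a minimal sketch of a regression network with one hidden layer (the training data here is synthetic, purely for illustration):

    import numpy as np
    from sknn.mlp import Regressor, Layer

    # Hypothetical training data: 100 samples, 10 input features, 1 output.
    X_train = np.random.uniform(size=(100, 10))
    y_train = np.random.uniform(size=(100, 1))

    nn = Regressor(
        layers=[
            Layer("Rectifier", units=100),   # non-linear hidden layer
            Layer("Linear")],                # linear output layer for regression
        learning_rate=0.02,
        n_iter=10)
    nn.fit(X_train, y_train)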
Layer Specifications¶

class sknn.mlp.Layer(type, warning=None, name=None, units=None, weight_decay=None, dropout=None, normalize=None, frozen=False)¶

Specification for a layer to be passed to the neural network during construction. This includes a variety of parameters to configure each layer based on its activation type.
Parameters:

type: str
    Select which activation function this layer should use, as a string. Specifically, options are Rectifier, Sigmoid, Tanh, and ExpLin for non-linear layers, and Linear or Softmax for output layers.

name: str, optional
    You can optionally specify a name for this layer, and its parameters will then be accessible to scikit-learn via a nested sub-object. For example, if name is set to layer1, then the parameter layer1__units from the network is bound to this layer's units variable.
    The name defaults to hiddenN, where N is the integer index of that layer; the final layer is always called output, without an index.

units: int
    The number of units (also known as neurons) in this layer. This applies to all layer types except for convolution.

weight_decay: float, optional
    The coefficient for L1 or L2 regularization of the weights. For example, a value of 0.0001 is multiplied by the L1 or L2 weight decay equation.

dropout: float, optional
    The ratio of inputs to drop out for this layer during training. For example, 0.25 means that 25% of the inputs will be excluded for each training sample, with the remaining inputs renormalized accordingly.

normalize: str, optional
    Enable normalization of this layer. Can be either batch for batch normalization or (soon) weights for weight normalization. Default is no normalization.

frozen: bool, optional
    Specify whether to freeze this layer's parameters so they are not adjusted during training. This is useful when relying on pre-trained neural networks.

warning: None
    You should use keyword arguments after type when initializing this object. If not, the code will raise an AssertionError.
Methods: set_params
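As an illustration, a short sketch of how a named layer's parameters become visible to scikit-learn (the name layer1 is just an example):

    from sknn.mlp import Regressor, Layer

    nn = Regressor(
        layers=[
            Layer("Rectifier", name="layer1", units=100, dropout=0.25),
            Layer("Linear")])

    # The named layer's settings are exposed as nested scikit-learn
    # parameters, so tools like GridSearchCV can tune layer1__units.
    nn.set_params(layer1__units=50)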

class sknn.mlp.Convolution(type, warning=None, name=None, channels=None, kernel_shape=None, kernel_stride=None, border_mode=u'valid', pool_shape=None, pool_type=None, scale_factor=None, weight_decay=None, dropout=None, normalize=None, frozen=False)¶

Specification for a convolution layer to be passed to the neural network in construction. This includes a variety of convolution-specific parameters to configure each layer, as well as activation-specific parameters.
Parameters:

type: str
    Select which activation function this convolution layer should use, as a string. For hidden layers, you can use the following convolution types: Rectifier, ExpLin, Sigmoid, Tanh or Linear.

name: str, optional
    You can optionally specify a name for this layer, and its parameters will then be accessible to scikit-learn via a nested sub-object. For example, if name is set to layer1, then the parameter layer1__units from the network is bound to this layer's units variable.
    The name defaults to hiddenN, where N is the integer index of that layer; the final layer is always called output, without an index.

channels: int
    Number of output channels for the convolution layers. Each channel has its own set of shared weights, which are trained by applying the kernel over the image.

kernel_shape: tuple of ints
    A two-dimensional tuple of integers corresponding to the shape of the kernel when convolution is used. For example, this could be a square kernel (3,3) or a full horizontal or vertical kernel on the input matrix, e.g. (N,1) or (1,N).

kernel_stride: tuple of ints, optional
    A two-dimensional tuple of integers that represents the steps taken by the kernel through the input image. By default, this is set to (1,1) and can be customized separately from pooling.

border_mode: str
    String indicating the way borders in the image should be processed, one of the following options:

    * valid — Only pixels from the input where the kernel fits within bounds are processed.
    * full — All pixels from the input are processed, and the boundaries are zero-padded.
    * same — The output resolution is set to exactly the same as the input.

    The size of the output depends on this mode: for same it's identical to the input, for full it's larger, and for valid (the default) it will be smaller or equal.

pool_shape: tuple of ints, optional
    A two-dimensional tuple of integers corresponding to the pool size for downsampling. This should be square, for example (2,2) to reduce the size by half, or (4,4) to make the output a quarter of the original.
    Pooling is applied after the convolution and the calculation of its activation.

pool_type: str, optional
    Type of pooling to be used; can be either max or mean. If a pool_shape is specified, the default is to take the maximum value of all inputs that fall into the pool. Otherwise, the default is None and no pooling is used, for performance.

scale_factor: tuple of ints, optional
    A two-dimensional tuple of integers corresponding to the upscaling ratio. This should be square, for example (2,2) to double the size, or (4,4) to make the output four times the original.
    Upscaling is applied before the convolution and the calculation of its activation.

weight_decay: float, optional
    The coefficient for L1 or L2 regularization of the weights. For example, a value of 0.0001 is multiplied by the L1 or L2 weight decay equation.

dropout: float, optional
    The ratio of inputs to drop out for this layer during training. For example, 0.25 means that 25% of the inputs will be excluded for each training sample, with the remaining inputs renormalized accordingly.

normalize: str, optional
    Enable normalization of this layer. Can be either batch for batch normalization or (soon) weights for weight normalization. Default is no normalization.

frozen: bool, optional
    Specify whether to freeze this layer's parameters so they are not adjusted during training. This is useful when relying on pre-trained neural networks.

warning: None
    You should use keyword arguments after type when initializing this object. If not, the code will raise an AssertionError.
Methods: set_params
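To illustrate, a hedged sketch of a small convolutional classifier, assuming image inputs shaped (n_samples, width, height); the data is synthetic:

    import numpy as np
    from sknn.mlp import Classifier, Convolution, Layer

    # Hypothetical 16x16 single-channel images with binary labels.
    X_images = np.random.uniform(size=(50, 16, 16))
    y_labels = np.random.randint(0, 2, size=(50,))

    nn = Classifier(
        layers=[
            Convolution("Rectifier", channels=8, kernel_shape=(3, 3),
                        pool_shape=(2, 2), pool_type="max"),
            Layer("Softmax")],
        learning_rate=0.02,
        n_iter=5)
    nn.fit(X_images, y_labels)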
MultiLayerPerceptron¶

Most of the functionality provided to simulate and train multi-layer perceptrons is implemented in the (abstract) class sknn.mlp.MultiLayerPerceptron. This class documents all the construction parameters for the Regressor and Classifier derived classes (see below), as well as their various helper functions.

class sknn.mlp.MultiLayerPerceptron(layers, warning=None, parameters=None, random_state=None, learning_rule=u'sgd', learning_rate=0.01, learning_momentum=0.9, normalize=None, regularize=None, weight_decay=None, dropout_rate=None, batch_size=1, n_iter=None, n_stable=10, f_stable=0.001, valid_set=None, valid_size=0.0, loss_type=None, callback=None, debug=False, verbose=None, **params)¶

Abstract base class for wrapping all neural network functionality from PyLearn2, common to multi-layer perceptrons in sknn.mlp and auto-encoders in sknn.ae.

Parameters:

layers: list of Layer
    An iterable sequence of layers, each as a sknn.mlp.Layer instance that contains its type, optional name, and any parameters required. For hidden layers, you can use the following layer types: Rectifier, ExpLin, Sigmoid, Tanh, or Convolution. For output layers, you can use the following layer types: Linear or Softmax.
    It's possible to mix and match any of the layer types, though most often you should probably use the hidden and output types as recommended here. Typically, the last entry in this layers list should contain Linear for regression, or Softmax for classification.

random_state: int, optional
    Seed for the initialization of the neural network parameters (e.g. weights and biases). This is fully deterministic.

parameters: list of tuple of array-like, optional
    A list of (weights, biases) tuples to be reloaded for each layer, in the same order as layers was specified. Useful for initializing with pre-trained networks.

learning_rule: str, optional
    Name of the learning rule used during stochastic gradient descent, one of sgd, momentum, nesterov, adadelta, adagrad or rmsprop at the moment. The default is vanilla sgd.

learning_rate: float, optional
    Real number indicating the default/starting rate of adjustment for the weights during gradient descent. Different learning rules may take this into account differently. Default is 0.01.

learning_momentum: float, optional
    Real number indicating the momentum factor to be used for the learning rule momentum. Default is 0.9.

batch_size: int, optional
    Number of training samples to group together when performing stochastic gradient descent (technically, a "mini-batch"). By default each sample is treated on its own, with batch_size=1. Larger batches are usually faster.

n_iter: int, optional
    The number of iterations of gradient descent to perform on the neural network's weights when training with fit().

n_stable: int, optional
    Number of iterations after which training should return when the validation error remains (near) constant. This is usually a sign that the data has been fitted, or that optimization may have stalled. If no validation set is specified, then stability is judged based on the training error. Default is 10.

f_stable: float, optional
    Threshold under which the validation error change is assumed to be stable, to be used in combination with n_stable. This is calculated as a relative ratio of improvement, so if the results are only 0.1% better, training is considered stable. The training set is used as a fallback if there's no validation set. Default is 0.001.

valid_set: tuple of array-like, optional
    Validation set (X_v, y_v) to be used explicitly while training. Both arrays should have the same size for the first dimension, and the second dimension should match the training data specified in fit().

valid_size: float, optional
    Ratio of the training data to be used for validation. 0.0 means no validation, and 1.0 would mean there's no training data! Common values are 0.1 or 0.25.

normalize: string, optional
    Enable normalization for all layers. Can be either batch for batch normalization or (soon) weights for weight normalization. Default is no normalization.

regularize: string, optional
    Which regularization technique to use on the weights, for example L2 (most common) or L1 (quite rare), as well as dropout. By default, there's no regularization, unless another parameter implies it should be enabled, e.g. if weight_decay or dropout_rate are specified.

weight_decay: float, optional
    The coefficient used to multiply either the L1 or L2 equations when computing the weight decay for regularization. If regularize is specified, this defaults to 0.0001.

dropout_rate: float, optional
    What rate to use for dropout training in the inputs (jittering) and the hidden layers, for each training example. Specify this as the ratio of inputs to be randomly excluded during training, e.g. 0.75 means only 25% of inputs will be included in the training.

loss_type: string, optional
    The cost function to use when training the network. The valid options are:

    * mse — Use mean squared error, for learning to predict the mean of the data.
    * mae — Use mean absolute error, for learning to predict the median of the data.
    * mcc — Use mean categorical cross-entropy, particularly for classifiers.

    The default option is mse for regressors and mcc for classifiers, but mae can only be applied to layers of type Linear or Gaussian, and they must be used as the output layer (PyLearn2 only).

callback: callable or dict, optional
    An observer mechanism that exposes information about the inner training loop. This is either a single function that takes cbs(event, **variables) as a parameter, or a dictionary of functions indexed by an event string that conform to cb(**variables).
    There are multiple events sent from the inner training loop:

    * on_train_start — Called when the main training function is entered.
    * on_epoch_start — Called first thing when a new iteration starts.
    * on_batch_start — Called before an individual batch is processed.
    * on_batch_finish — Called after that individual batch is processed.
    * on_epoch_finish — Called last of all when the iteration is done.
    * on_train_finish — Called just before the training function exits.

    For each function, the variables dictionary passed contains all local variables within the training implementation.

debug: bool, optional
    Should the underlying training algorithms perform validation on the data as it's optimizing the model? This makes things slower, but errors can be caught more effectively. Default is off.

verbose: bool, optional
    How to initialize the logging to display the results during training. If there is already a logger initialized, either sknn or the root logger, then this function does nothing. Otherwise:

    * False — Set up a new logger that shows only warnings and errors.
    * True — Set up a new logger that displays all debug messages.
    * None — Don't set up a new logger under any condition (default).

    Using the built-in Python logging module, you can control the detail and style of the output by customising the verbosity level and formatter for the sknn logger.

warning: None
    You should use keyword arguments after layers when initializing this object. If not, the code will raise an AssertionError.
Attributes: is_classifier, is_initialized

Methods: get_parameters, get_params, is_convolution, set_parameters
When using the multi-layer perceptron, you should initialize a Regressor or a Classifier directly.
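For example, a sketch combining several of the training-control parameters above with a callback; the printed message is illustrative only, and the exact contents of variables depend on the inner training loop:

    from sknn.mlp import Regressor, Layer

    def on_epoch_finish(**variables):
        # Receives the local variables of the inner training loop.
        print("finished an epoch")

    nn = Regressor(
        layers=[
            Layer("Rectifier", units=64),
            Layer("Linear")],
        learning_rule="momentum",
        learning_rate=0.01,
        valid_size=0.25,    # hold out 25% of the training data for validation
        n_stable=10,        # stop once 10 epochs pass without improvement
        f_stable=0.001,
        callback={"on_epoch_finish": on_epoch_finish})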
Regressor¶

See the class sknn.mlp.MultiLayerPerceptron for inherited construction parameters.

class sknn.mlp.Regressor(layers, warning=None, parameters=None, random_state=None, learning_rule=u'sgd', learning_rate=0.01, learning_momentum=0.9, normalize=None, regularize=None, weight_decay=None, dropout_rate=None, batch_size=1, n_iter=None, n_stable=10, f_stable=0.001, valid_set=None, valid_size=0.0, loss_type=None, callback=None, debug=False, verbose=None, **params)¶

Attributes: is_classifier, is_initialized

Methods: fit, get_parameters, get_params, is_convolution, predict, set_parameters

fit(X, y, w=None)¶

Fit the neural network to the given continuous data as a regression problem.

Parameters:

X : array-like, shape (n_samples, n_inputs)
    Training vectors as real numbers, where n_samples is the number of samples and n_inputs is the number of input features.

y : array-like, shape (n_samples, n_outputs)
    Target values as real numbers, used as regression targets.

w : array-like (optional), shape (n_samples,)
    Floating point weights for each of the training samples, used as a mask to modify the cost function during optimization.

Returns:

self : object
    Returns this instance.
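A brief sketch of the optional per-sample weights; the weighting scheme and data here are hypothetical:

    import numpy as np
    from sknn.mlp import Regressor, Layer

    X_train = np.random.uniform(size=(200, 10))
    y_train = X_train.sum(axis=1, keepdims=True)

    # Give the first 50 samples double influence on the cost function.
    w = np.ones(200)
    w[:50] = 2.0

    nn = Regressor(layers=[Layer("Rectifier", units=32), Layer("Linear")],
                   n_iter=10)
    nn.fit(X_train, y_train, w=w)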

get_parameters()¶

Extract the neural network's weights and biases layer by layer. Only valid once the neural network has been initialized, for example via the fit() function.

Returns:

params : list of tuples
    For each layer, in the order they were passed to the constructor, a named tuple of three items: weights, biases (both numpy arrays) and name (a string), in that order.

is_convolution(input=None, output=False)¶

Check whether this neural network includes convolution layers in the first or last position.

Parameters:

input : boolean, optional
    Whether the first layer should be checked for convolution. Default True.

output : boolean, optional
    Whether the last layer should be checked for convolution. Default False.

Returns:

is_conv : boolean
    True if either of the specified layers is indeed a convolution, False otherwise.

is_initialized¶

Check if the neural network was set up already.

predict(X)¶

Calculate predictions for the specified inputs.

Parameters:

X : array-like, shape (n_samples, n_inputs)
    The input samples as real numbers.

Returns:

y : array, shape (n_samples, n_outputs)
    The predicted values as real numbers.

set_parameters(storage)¶

Store the given weights and biases into the neural network. If the neural network has not been initialized, use the weights list as the construction parameter instead; otherwise, if the neural network is initialized, this function will extract the parameters from the input list or dictionary and store them accordingly.

Parameters:

storage : list of tuples, or dictionary of tuples
    Either a list of tuples for each layer, storing two items, weights and biases, in the exact same order as construction. Alternatively, if this is a dictionary, a string-to-tuple mapping for each layer, also storing weights and biases but not necessarily for all layers.
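For instance, a hedged sketch of transferring trained parameters into a second network with an identical layer structure (assuming nn has already been fitted as in the example above):

    from sknn.mlp import Regressor, Layer

    # Extract named tuples of (weights, biases, name) from the trained net,
    # then keep just the (weights, biases) pairs in layer order.
    params = [(p.weights, p.biases) for p in nn.get_parameters()]

    # A fresh network of the same architecture; set_parameters() stores
    # the weights for use when the network is initialized.
    nn_copy = Regressor(layers=[Layer("Rectifier", units=32), Layer("Linear")])
    nn_copy.set_parameters(params)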

Classifier¶

Also check the sknn.mlp.MultiLayerPerceptron class for inherited construction parameters.

class sknn.mlp.Classifier(layers, warning=None, parameters=None, random_state=None, learning_rule=u'sgd', learning_rate=0.01, learning_momentum=0.9, normalize=None, regularize=None, weight_decay=None, dropout_rate=None, batch_size=1, n_iter=None, n_stable=10, f_stable=0.001, valid_set=None, valid_size=0.0, loss_type=None, callback=None, debug=False, verbose=None, **params)¶

Attributes: classes_, is_classifier, is_initialized

Methods: fit, get_parameters, get_params, is_convolution, partial_fit, predict, predict_proba, set_parameters

classes_¶

Return a list of class labels used for each feature. For single-feature classification, the index of the label in the array is the same as returned by predict_proba() (e.g. labels [-1, 0, +1] mean indices [0, 1, 2]).

In the case of multiple-feature classification, the index of the label must be offset by the number of labels for previous features. For example, if the second feature also has labels [-1, 0, +1], its indices will be [3, 4, 5], resuming from the first feature in the array returned by predict_proba().

Returns:

c : list of array, shape (n_classes, n_labels)
    List of the labels as integers used for each feature.

fit(X, y, w=None)¶

Fit the neural network to symbolic labels as a classification problem.

Parameters:

X : array-like, shape (n_samples, n_features)
    Training vectors as real numbers, where n_samples is the number of samples and n_features is the number of input features.

y : array-like, shape (n_samples, n_classes)
    Target values as integer symbols, for either single- or multi-output classification problems.

w : array-like (optional), shape (n_samples,)
    Floating point weights for each of the training samples, used as a mask to modify the cost function during optimization.

Returns:

self : object
    Returns this instance.

get_parameters()¶

Extract the neural network's weights and biases layer by layer. Only valid once the neural network has been initialized, for example via the fit() function.

Returns:

params : list of tuples
    For each layer, in the order they were passed to the constructor, a named tuple of three items: weights, biases (both numpy arrays) and name (a string), in that order.

is_convolution(input=None, output=False)¶

Check whether this neural network includes convolution layers in the first or last position.

Parameters:

input : boolean, optional
    Whether the first layer should be checked for convolution. Default True.

output : boolean, optional
    Whether the last layer should be checked for convolution. Default False.

Returns:

is_conv : boolean
    True if either of the specified layers is indeed a convolution, False otherwise.

is_initialized¶

Check if the neural network was set up already.

predict(X)¶

Predict the class by converting the problem to a regression problem.

Parameters:

X : array-like, shape (n_samples, n_features)
    The input data.

Returns:

y : array-like, shape (n_samples,) or (n_samples, n_classes)
    The predicted classes, or the predicted values.

predict_proba(X, collapse=True)¶

Calculate probability estimates based on these input features.

Parameters:

X : array-like, shape (n_samples, n_features)
    The input data as a numpy array.

Returns:

y_prob : list of arrays, shape (n_samples, n_features, n_classes)
    The predicted probability of the sample for each class in the model, in the same order as the classes.
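To illustrate, a short sketch of probability estimates from a fitted classifier, with synthetic data purely for the example:

    import numpy as np
    from sknn.mlp import Classifier, Layer

    X = np.random.uniform(size=(100, 4))
    y = np.random.randint(0, 3, size=(100,))   # three integer class labels

    nn = Classifier(
        layers=[
            Layer("Rectifier", units=16),
            Layer("Softmax")],
        n_iter=10)
    nn.fit(X, y)

    proba = nn.predict_proba(X)   # probability estimates per class
    labels = nn.classes_          # label ordering matching those columns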

set_parameters(storage)¶

Store the given weights and biases into the neural network. If the neural network has not been initialized, use the weights list as the construction parameter instead; otherwise, if the neural network is initialized, this function will extract the parameters from the input list or dictionary and store them accordingly.

Parameters:

storage : list of tuples, or dictionary of tuples
    Either a list of tuples for each layer, storing two items, weights and biases, in the exact same order as construction. Alternatively, if this is a dictionary, a string-to-tuple mapping for each layer, also storing weights and biases but not necessarily for all layers.
