lasagne.nonlinearities

Non-linear activation functions for artificial neurons.

sigmoid(x) Sigmoid activation function \(\varphi(x) = \frac{1}{1 + e^{-x}}\)
softmax(x) Softmax activation function \(\varphi(\mathbf{x})_j = \frac{e^{\mathbf{x}_j}}{\sum_{k=1}^K e^{\mathbf{x}_k}}\) where \(K\) is the total number of neurons in the layer.
tanh(x) Tanh activation function \(\varphi(x) = \tanh(x)\)
ScaledTanH([scale_in, scale_out]) Scaled tanh \(\varphi(x) = \tanh(\alpha \cdot x) \cdot \beta\)
rectify(x) Rectify activation function \(\varphi(x) = \max(0, x)\)
LeakyRectify([leakiness]) Leaky rectifier \(\varphi(x) = (x > 0) ? x : \alpha \cdot x\)
leaky_rectify(x) Instance of LeakyRectify with leakiness \(\alpha=0.01\)
very_leaky_rectify(x) Instance of LeakyRectify with leakiness \(\alpha=1/3\)
elu(x) Exponential Linear Unit \(\varphi(x) = (x > 0) ? x : e^x - 1\)
SELU([scale, scale_neg]) Scaled Exponential Linear Unit
selu(x) Instance of SELU with \(\alpha \approx 1.6733, \lambda \approx 1.0507\)
softplus(x) Softplus activation function \(\varphi(x) = \log(1 + e^x)\)
linear(x) Linear activation function \(\varphi(x) = x\)
identity(x) Linear activation function \(\varphi(x) = x\)

Detailed description

lasagne.nonlinearities.sigmoid(x)[source]

Sigmoid activation function \(\varphi(x) = \frac{1}{1 + e^{-x}}\)

Parameters:
x : float32

The activation (the summed, weighted input of a neuron).

Returns:
float32 in [0, 1]

The output of the sigmoid function applied to the activation.
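
Examples

A minimal usage sketch: sigmoid is commonly used as the output nonlinearity of a single-unit output layer for binary classification (the layer sizes here are illustrative):

>>> from lasagne.layers import InputLayer, DenseLayer
>>> from lasagne.nonlinearities import sigmoid
>>> l_in = InputLayer((None, 100))
>>> l_out = DenseLayer(l_in, num_units=1, nonlinearity=sigmoid)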

lasagne.nonlinearities.softmax(x)[source]

Softmax activation function \(\varphi(\mathbf{x})_j = \frac{e^{\mathbf{x}_j}}{\sum_{k=1}^K e^{\mathbf{x}_k}}\) where \(K\) is the total number of neurons in the layer. This activation function gets applied row-wise.

Parameters:
x : float32

The activation (the summed, weighted input of a neuron).

Returns:
float32 where each row sums to 1 and each individual value is in [0, 1]

The output of the softmax function applied to the activation.
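
Examples

A minimal usage sketch: softmax is typically used in the final layer of a classifier, so that each row of the output can be read as class probabilities (the number of classes here is illustrative):

>>> from lasagne.layers import InputLayer, DenseLayer
>>> from lasagne.nonlinearities import softmax
>>> l_in = InputLayer((None, 100))
>>> l_out = DenseLayer(l_in, num_units=10, nonlinearity=softmax)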

lasagne.nonlinearities.tanh(x)[source]

Tanh activation function \(\varphi(x) = \tanh(x)\)

Parameters:
x : float32

The activation (the summed, weighted input of a neuron).

Returns:
float32 in [-1, 1]

The output of the tanh function applied to the activation.
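
Examples

A minimal usage sketch of tanh as a hidden-layer nonlinearity (layer sizes are illustrative):

>>> from lasagne.layers import InputLayer, DenseLayer
>>> from lasagne.nonlinearities import tanh
>>> l_in = InputLayer((None, 100))
>>> l1 = DenseLayer(l_in, num_units=200, nonlinearity=tanh)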

class lasagne.nonlinearities.ScaledTanH(scale_in=1, scale_out=1)[source]

Scaled tanh \(\varphi(x) = \tanh(\alpha \cdot x) \cdot \beta\)

This is a modified tanh function which allows rescaling both the input and the output of the activation.

Scaling the input down decreases the maximum slope of the tanh, so the function stays in its linear regime over a larger interval of the input space. Scaling the input up increases the maximum slope and brings the tanh closer to a step function.

Scaling the output changes the output interval to \([-\beta,\beta]\).

Parameters:
scale_in : float32

The scale parameter \(\alpha\) for the input

scale_out : float32

The scale parameter \(\beta\) for the output

Notes

LeCun et al. (in [1], Section 4.4) suggest scale_in=2./3 and scale_out=1.7159, which gives \(\varphi(\pm 1) = \pm 1\), places the maximum of the second derivative at \(x = 1\), and yields an effective gain close to 1.

By carefully matching \(\alpha\) and \(\beta\), the nonlinearity can also be tuned to preserve the mean and variance of its input:

  • scale_in=0.5, scale_out=2.4: If the input is a random normal variable, the output will have zero mean and unit variance.
  • scale_in=1, scale_out=1.6: Same property, but with a smaller linear regime in input space.
  • scale_in=0.5, scale_out=2.27: If the input is a uniform random variable, the output will have zero mean and unit variance.
  • scale_in=1, scale_out=1.48: Same property, but with a smaller linear regime in input space.

References

[1] LeCun, Yann A., et al. (1998): Efficient BackProp, http://link.springer.com/chapter/10.1007/3-540-49430-8_2, http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
[2] Masci, Jonathan, et al. (2011): Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction, http://link.springer.com/chapter/10.1007/978-3-642-21735-7_7, http://people.idsia.ch/~ciresan/data/icann2011.pdf

Examples

In contrast to other activation functions in this module, this is a class that needs to be instantiated to obtain a callable:

>>> from lasagne.layers import InputLayer, DenseLayer
>>> l_in = InputLayer((None, 100))
>>> from lasagne.nonlinearities import ScaledTanH
>>> scaled_tanh = ScaledTanH(scale_in=0.5, scale_out=2.27)
>>> l1 = DenseLayer(l_in, num_units=200, nonlinearity=scaled_tanh)
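
The mean/variance-preserving settings from the Notes can also be checked numerically; a quick sketch using numpy rather than Theano, for the scale_in=0.5, scale_out=2.4 case with standard normal input:

>>> import numpy as np
>>> x = np.random.randn(1000000)        # standard normal input
>>> y = 2.4 * np.tanh(0.5 * x)          # scale_in=0.5, scale_out=2.4
>>> bool(abs(y.mean()) < 0.01 and abs(y.std() - 1) < 0.01)
True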

Methods

__call__(x) Apply the scaled tanh function to the activation x.
lasagne.nonlinearities.ScaledTanh[source]

alias of ScaledTanH

lasagne.nonlinearities.rectify(x)[source]

Rectify activation function \(\varphi(x) = \max(0, x)\)

Parameters:
x : float32

The activation (the summed, weighted input of a neuron).

Returns:
float32

The output of the rectify function applied to the activation.
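
Examples

A minimal usage sketch of rectify as a hidden-layer nonlinearity; it is also the default nonlinearity of Lasagne's DenseLayer, so passing it explicitly as below is optional:

>>> from lasagne.layers import InputLayer, DenseLayer
>>> from lasagne.nonlinearities import rectify
>>> l_in = InputLayer((None, 100))
>>> l1 = DenseLayer(l_in, num_units=200, nonlinearity=rectify)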

class lasagne.nonlinearities.LeakyRectify(leakiness=0.01)[source]

Leaky rectifier \(\varphi(x) = (x > 0) ? x : \alpha \cdot x\)

The leaky rectifier was introduced in [1]. Compared to the standard rectifier rectify(), it has a nonzero gradient for negative input, which often helps convergence.

Parameters:
leakiness : float

Slope for negative input, usually between 0 and 1. A leakiness of 0 will lead to the standard rectifier, a leakiness of 1 will lead to a linear activation function, and any value in between will give a leaky rectifier.

See also

leaky_rectify
Instance with default leakiness of 0.01, as in [1].
very_leaky_rectify
Instance with high leakiness of 1/3, as in [2].

References

[1] Maas et al. (2013): Rectifier Nonlinearities Improve Neural Network Acoustic Models, http://web.stanford.edu/~awni/papers/relu_hybrid_icml2013_final.pdf
[2] Graham, Benjamin (2014): Spatially-sparse convolutional neural networks, http://arxiv.org/abs/1409.6070

Examples

In contrast to other activation functions in this module, this is a class that needs to be instantiated to obtain a callable:

>>> from lasagne.layers import InputLayer, DenseLayer
>>> l_in = InputLayer((None, 100))
>>> from lasagne.nonlinearities import LeakyRectify
>>> custom_rectify = LeakyRectify(0.1)
>>> l1 = DenseLayer(l_in, num_units=200, nonlinearity=custom_rectify)

Alternatively, you can use the provided instance for leakiness=0.01:

>>> from lasagne.nonlinearities import leaky_rectify
>>> l2 = DenseLayer(l_in, num_units=200, nonlinearity=leaky_rectify)

Or the one for a high leakiness of 1/3:

>>> from lasagne.nonlinearities import very_leaky_rectify
>>> l3 = DenseLayer(l_in, num_units=200, nonlinearity=very_leaky_rectify)

Methods

__call__(x) Apply the leaky rectify function to the activation x.
lasagne.nonlinearities.leaky_rectify(x)[source]

Instance of LeakyRectify with leakiness \(\alpha=0.01\)

lasagne.nonlinearities.very_leaky_rectify(x)[source]

Instance of LeakyRectify with leakiness \(\alpha=1/3\)

lasagne.nonlinearities.elu(x)[source]

Exponential Linear Unit \(\varphi(x) = (x > 0) ? x : e^x - 1\)

The Exponential Linear Unit (ELU) was introduced in [1]. Compared to the linear rectifier rectify(), it has a mean activation closer to zero and nonzero gradient for negative input, which can help convergence. Compared to the leaky rectifier LeakyRectify, it saturates for highly negative inputs.

Parameters:
x : float32

The activation (the summed, weighted input of a neuron).

Returns:
float32

The output of the exponential linear unit for the activation.

Notes

In [1], an additional parameter \(\alpha\) controls the (negative) saturation value for negative inputs, but is set to 1 for all experiments. It is omitted here.

References

[1] Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter (2015): Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), http://arxiv.org/abs/1511.07289
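
Examples

A minimal usage sketch of elu as a hidden-layer nonlinearity (layer sizes are illustrative):

>>> from lasagne.layers import InputLayer, DenseLayer
>>> from lasagne.nonlinearities import elu
>>> l_in = InputLayer((None, 100))
>>> l1 = DenseLayer(l_in, num_units=200, nonlinearity=elu)
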
class lasagne.nonlinearities.SELU(scale=1, scale_neg=1)[source]

Scaled Exponential Linear Unit \(\varphi(x)=\lambda \left[(x>0) ? x : \alpha(e^x-1)\right]\)

The Scaled Exponential Linear Unit (SELU) was introduced in [1] as an activation function that allows the construction of self-normalizing neural networks.

Parameters:
scale : float32

The scale parameter \(\lambda\) for scaling all output.

scale_neg : float32

The scale parameter \(\alpha\) for scaling output for nonpositive argument values.

See also

selu
Instance with \(\alpha\approx1.6733,\lambda\approx1.0507\) as used in [1].

References

[1] Günter Klambauer et al. (2017): Self-Normalizing Neural Networks, https://arxiv.org/abs/1706.02515

Examples

In contrast to other activation functions in this module, this is a class that needs to be instantiated to obtain a callable:

>>> from lasagne.layers import InputLayer, DenseLayer
>>> l_in = InputLayer((None, 100))
>>> from lasagne.nonlinearities import SELU
>>> selu = SELU(2, 3)
>>> l1 = DenseLayer(l_in, num_units=200, nonlinearity=selu)

Methods

__call__(x) Apply the SELU function to the activation x.
lasagne.nonlinearities.selu(x)[source]

Instance of SELU with \(\alpha\approx 1.6733, \lambda\approx 1.0507\)

This has a stable and attracting fixed point of \(\mu=0\), \(\sigma=1\) under the assumptions of the original paper on self-normalizing neural networks.
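
Examples

A minimal usage sketch using the provided selu instance (layer sizes are illustrative; note that the self-normalizing property also depends on an appropriate weight initialization, see [1] in SELU above):

>>> from lasagne.layers import InputLayer, DenseLayer
>>> from lasagne.nonlinearities import selu
>>> l_in = InputLayer((None, 100))
>>> l1 = DenseLayer(l_in, num_units=200, nonlinearity=selu)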

lasagne.nonlinearities.softplus(x)[source]

Softplus activation function \(\varphi(x) = \log(1 + e^x)\)

Parameters:
x : float32

The activation (the summed, weighted input of a neuron).

Returns:
float32

The output of the softplus function applied to the activation.
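
Examples

A minimal usage sketch: since softplus produces strictly positive outputs, it can be used, for instance, on an output layer that predicts a positive quantity (layer sizes are illustrative):

>>> from lasagne.layers import InputLayer, DenseLayer
>>> from lasagne.nonlinearities import softplus
>>> l_in = InputLayer((None, 100))
>>> l_out = DenseLayer(l_in, num_units=1, nonlinearity=softplus)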

lasagne.nonlinearities.linear(x)[source]

Linear activation function \(\varphi(x) = x\)

Parameters:
x : float32

The activation (the summed, weighted input of a neuron).

Returns:
float32

The output of the identity applied to the activation.

lasagne.nonlinearities.identity(x)[source]

Linear activation function \(\varphi(x) = x\)

Parameters:
x : float32

The activation (the summed, weighted input of a neuron).

Returns:
float32

The output of the identity applied to the activation.
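
Examples

A minimal usage sketch of a linear (identity) output layer; Lasagne layers also interpret nonlinearity=None as the identity, so the two constructions below are equivalent:

>>> from lasagne.layers import InputLayer, DenseLayer
>>> from lasagne.nonlinearities import linear
>>> l_in = InputLayer((None, 100))
>>> l_a = DenseLayer(l_in, num_units=10, nonlinearity=linear)
>>> l_b = DenseLayer(l_in, num_units=10, nonlinearity=None)  # same as above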