[Figure: a single gate in the computational graph: activations flow forward, and during backprop the upstream gradients are multiplied by the gate's "local gradient"]
Activation Functions

[Figure: plots of common activation functions: Sigmoid, tanh(x), ReLU max(0,x), Maxout, ELU]
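As a rough reference for the functions named above, here is a minimal NumPy sketch of the elementwise ones (the function names and the ELU alpha are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(x):
    # squashes inputs to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes inputs to (-1, 1), zero-centered
    return np.tanh(x)

def relu(x):
    # f(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # linear for x > 0, smooth exponential saturation toward -alpha for x < 0
    # (np.minimum keeps exp() from overflowing on large positive inputs)
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

# Maxout is not elementwise: it takes the max over several linear projections
# of the input (so it needs its own weights) and is omitted from this sketch.
```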
Sigmoid
σ(x) = 1 / (1 + exp(-x))
- Squashes numbers to range [0,1]
- Historically popular since it has a nice interpretation as a saturating "firing rate" of a neuron (illustrated in the sketch below)
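As a quick numerical look at that saturation, a sketch (assuming NumPy) of the sigmoid and its local gradient σ(x)(1 - σ(x)):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_local_grad(x):
    # derivative of the sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  local grad={sigmoid_local_grad(x):.5f}")
# the local gradient peaks at 0.25 at x = 0 and is nearly 0 once the unit
# saturates, so very little gradient flows back through a saturated sigmoid
```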
3 problems:
- Sigmoid outputs are not zero-centered. Consider a neuron whose inputs x are all positive: what can we say about the gradients on w? Since each dL/dw_i is the upstream gradient times x_i, the gradients on w are always all positive or all negative :( (this is also why you want zero-mean data!)
[Figure: allowed gradient-update directions zig-zag toward a hypothetical optimal w vector]
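A tiny sketch of that point (assuming NumPy; the single linear neuron f = w · x and the upstream value are illustrative): if every input x_i is positive, each dL/dw_i = (upstream gradient) · x_i shares the sign of the upstream gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(5) + 0.1     # all-positive inputs (e.g. outputs of a sigmoid layer)
upstream = -0.7             # scalar dL/df arriving from the layer above

# for f = w . x the local gradient w.r.t. each w_i is x_i,
# so dL/dw_i = upstream * x_i
dw = upstream * x
print(dw)           # every entry has the same sign as `upstream`
print(np.sign(dw))  # all -1 here: w can only move in an all-positive or
                    # all-negative direction, zig-zagging toward the optimum
```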
tanh(x)
- Squashes numbers to range [-1,1]
- Zero-centered (nice)
- Still saturates, like the sigmoid
ReLU
(Rectified Linear Unit)
- Computes f(x) = max(0,x)
- Does not saturate (in + region)
- Very computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)
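A minimal forward/backward sketch of ReLU (assuming NumPy; the function names are illustrative): the max is cheap to compute, and the local gradient is exactly 1 wherever x > 0, so nothing saturates in the positive region.

```python
import numpy as np

def relu_forward(x):
    out = np.maximum(0.0, x)
    return out, x                 # cache the input for the backward pass

def relu_backward(dout, x):
    # local gradient is 1 where x > 0 and 0 elsewhere, so the upstream
    # gradient is either passed straight through or zeroed out
    return dout * (x > 0)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
out, cache = relu_forward(x)
dx = relu_backward(np.ones_like(x), cache)
print(out)  # [0.  0.  0.  0.5 2. ]
print(dx)   # [0. 0. 0. 1. 1.]
```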
One problem: a "dead ReLU" will never activate => never update.
[Figure: the DATA CLOUD with an active ReLU overlapping the data and a dead ReLU lying outside it]
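One way to see this numerically, as a sketch under made-up assumptions (random data standing in for the data cloud, a single fully connected layer of 100 ReLU units, and strongly negative biases to provoke the problem):

```python
import numpy as np

rng = np.random.default_rng(1)

X = rng.normal(size=(1000, 50))          # a batch of inputs: the "data cloud"
W = rng.normal(size=(50, 100)) * 0.1     # weights of 100 hypothetical ReLU units
b = rng.normal(size=100) - 2.0           # strongly negative biases invite dead units

pre = X @ W + b                          # pre-activations, shape (1000, 100)
dead = np.all(pre <= 0, axis=0)          # units that never activate on this batch

print("dead ReLUs:", int(dead.sum()), "out of", pre.shape[1])
# a dead unit gets zero gradient from every example, so its weights never update
```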
Leaky ReLU
- Computes f(x) = max(αx, x) with a small fixed slope α (e.g. 0.01)
- Does not saturate
- Computationally efficient
- Converges much faster than sigmoid/tanh in practice! (e.g. 6x)
- will not "die"
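A matching sketch for Leaky ReLU (assuming NumPy; α = 0.01 is the usual illustrative choice): the local gradient is α rather than 0 for x < 0, so some gradient always flows back and the unit cannot die.

```python
import numpy as np

def leaky_relu_forward(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x), x      # cache x for the backward pass

def leaky_relu_backward(dout, x, alpha=0.01):
    # local gradient is 1 for x > 0 and alpha (not 0) otherwise,
    # so every unit keeps receiving at least a small gradient
    return dout * np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 2.0])
out, cache = leaky_relu_forward(x)
dx = leaky_relu_backward(np.ones_like(x), cache)
print(out)  # [-0.02  -0.005  0.5    2.   ]
print(dx)   # [0.01 0.01 1.   1.  ]
```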