elektronn2.neuromancer.optimiser module
class elektronn2.neuromancer.optimiser.AdaDelta(inputs, loss, grads, params, extra_updates, additional_outputs=None)
Bases: elektronn2.neuromancer.optimiser.Optimiser
AdaDelta optimiser (See https://arxiv.org/abs/1212.5701).
Like AdaGrad, but the squared gradients are accumulated only over a decaying window. The "delta" part acts as a diagonal Hessian approximation. The method claims to be robust against sudden large gradients because the denominator then explodes, but this explosion persists for a while (and the same argument applies to any method that accumulates squared gradients).
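As a minimal NumPy sketch of the update rule from the paper (for a single parameter array; names here are illustrative and do not reflect this class's actual Theano implementation):

    import numpy as np

    def adadelta_step(param, grad, sq_grad_acc, sq_delta_acc, rho=0.95, eps=1e-6):
        """One AdaDelta step on a single parameter array (illustrative names)."""
        # Accumulate squared gradients over a decaying window instead of all history.
        sq_grad_acc = rho * sq_grad_acc + (1 - rho) * grad ** 2
        # The RMS ratio acts as a diagonal curvature (Hessian) approximation.
        delta = -np.sqrt(sq_delta_acc + eps) / np.sqrt(sq_grad_acc + eps) * grad
        # Accumulate squared updates for the "delta" part.
        sq_delta_acc = rho * sq_delta_acc + (1 - rho) * delta ** 2
        return param + delta, sq_grad_acc, sq_delta_acc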
class elektronn2.neuromancer.optimiser.AdaGrad(inputs, loss, grads, params, extra_updates, additional_outputs=None)
Bases: elektronn2.neuromancer.optimiser.Optimiser
AdaGrad optimiser (See http://jmlr.org/papers/v12/duchi11a.html).
Tries to favour faster progress on parameters that usually receive small gradients. It somewhat ignores their actual direction, though: a parameter that receives many small gradients in the same direction and one that receives many small gradients in opposing directions both end up with a high effective learning rate.
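A minimal NumPy sketch of the AdaGrad accumulation described above (illustrative names, not this class's Theano code); note that the accumulator only ever grows and keeps no record of the gradients' signs:

    import numpy as np

    def adagrad_step(param, grad, sq_grad_acc, lr=0.01, eps=1e-8):
        """One AdaGrad step on a single parameter array (illustrative names)."""
        # Squared gradients are accumulated over the *whole* history (no window).
        sq_grad_acc = sq_grad_acc + grad ** 2
        # Parameters with historically small gradients get a larger effective LR,
        # regardless of whether those gradients pointed in a consistent direction.
        param = param - lr * grad / (np.sqrt(sq_grad_acc) + eps)
        return param, sq_grad_acc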
class elektronn2.neuromancer.optimiser.Adam(inputs, loss, grads, params, extra_updates, additional_outputs=None)
Bases: elektronn2.neuromancer.optimiser.Optimiser
Adam optimiser (See https://arxiv.org/abs/1412.6980v9).
Like AdaGrad with a windowed squared-gradient accumulator, plus momentum and a bias correction for the initial phase (t). Note that the normalisation in Adam and AdaGrad (and RMSProp) does not damp but rather exaggerates sudden steep gradients: their squared accumulator is still small while the current gradient is large.
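A minimal NumPy sketch of the Adam update with its bias correction (illustrative names, not this class's Theano code):

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam step on a single parameter array (illustrative names)."""
        t += 1
        m = beta1 * m + (1 - beta1) * grad       # momentum (1st moment)
        v = beta2 * v + (1 - beta2) * grad ** 2  # windowed squared accumulator (2nd moment)
        m_hat = m / (1 - beta1 ** t)             # bias correction for the initial phase
        v_hat = v / (1 - beta2 ** t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v, t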
class elektronn2.neuromancer.optimiser.Optimiser(inputs, loss, grads, params, additional_outputs)
Bases: object
Base optimiser class; allocates new shared variables matching the shapes of params/gradients (a minimal sketch follows the attribute list below).
global_lr = lr
global_mom = mom
global_weight_decay = weight_decay
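A hypothetical Theano sketch of what allocating per-parameter shared state and the global hyperparameter shared variables could look like; the helper name and initial values below are assumptions for illustration, not part of the ELEKTRONN2 API:

    import numpy as np
    import theano

    def alloc_like(params):
        """Allocate new shared variables matching the shapes of params/gradients
        (hypothetical helper; `params` are assumed to be Theano shared variables)."""
        return [theano.shared(np.zeros_like(p.get_value())) for p in params]

    # Global hyperparameters shared by all optimiser instances (illustrative values).
    global_lr = theano.shared(np.float32(1e-3), name='lr')
    global_mom = theano.shared(np.float32(0.9), name='mom')
    global_weight_decay = theano.shared(np.float32(5e-4), name='weight_decay')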
class elektronn2.neuromancer.optimiser.SGD(inputs, loss, grads, params, extra_updates, additional_outputs=None)
Bases: elektronn2.neuromancer.optimiser.Optimiser
SGD optimiser (See https://en.wikipedia.org/wiki/Stochastic_gradient_descent).
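A textbook SGD-with-momentum step in NumPy for illustration; the momentum and weight-decay terms are assumed here only because the base class exposes global_mom and global_weight_decay, and none of the names below reflect this class's actual update code:

    import numpy as np

    def sgd_step(param, grad, last_dir, lr=0.01, mom=0.9, weight_decay=0.0):
        """One SGD step with momentum and weight decay (illustrative names)."""
        direction = mom * last_dir - lr * (grad + weight_decay * param)
        return param + direction, direction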