Optimizers: How Variables Learn
Variables are tensors you can update.
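A one-line sketch, assuming a TensorFlow-style API (the source doesn't name a framework, but its "variables" map naturally onto tf.Variable):

```python
import tensorflow as tf

# A variable is a mutable tensor: assign ops overwrite its value in place.
w = tf.Variable(2.0)
w.assign_sub(0.5)    # in-place update: w is now 1.5
print(w.numpy())     # 1.5
```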
Optimizers compute updates from gradients:
- SGD: w ← w − η g
- Momentum: adds a velocity term to smooth updates, e.g. v ← β v + g, then w ← w − η v
- Adam: adaptive learning rates per parameter, from running estimates of the gradient's mean and variance
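To make the three update rules concrete, here is a minimal NumPy sketch of a single step of each; the hyperparameter names (lr for η, beta/b1/b2 for the decay rates, eps) are chosen for illustration:

```python
import numpy as np

def sgd_step(w, g, lr=0.1):
    # Plain SGD: step against the gradient.
    return w - lr * g

def momentum_step(w, g, v, lr=0.1, beta=0.9):
    # Velocity is an exponential average of past gradients.
    v = beta * v + g
    return w - lr * v, v

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Running estimates of the gradient's mean (m) and uncentered
    # variance (v), bias-corrected for early steps t = 1, 2, ...
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, g = np.array([1.0, -2.0]), np.array([0.5, 0.1])
print(sgd_step(w, g))   # [ 0.95 -2.01]
```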
Under the hood, these are additional ops in the graph that read gradients and write new values.
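For example, in TensorFlow (a sketch using its eager API; under tf.function the same update ops are staged into a graph) apply_gradients runs exactly this read-gradient, write-value cycle:

```python
import tensorflow as tf

w = tf.Variable(3.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = w * w                       # dloss/dw = 2w = 6.0
grads = tape.gradient(loss, [w])
opt.apply_gradients(zip(grads, [w]))   # update op: w ← w − η g
print(w.numpy())                       # 3.0 − 0.1 · 6.0 = 2.4
```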