Optimizers: How Variables Learn

Variables are tensors whose values can be updated in place. Optimizers compute those updates from gradients (each rule below is sketched in code after the list):

  • SGD: w ← w − η g, where η is the learning rate and g the gradient
  • Momentum: adds a velocity term that accumulates past gradients to smooth updates
  • Adam: per-parameter adaptive step sizes from running estimates of the gradient's mean and variance
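
The three rules above can be written out directly. Below is a minimal NumPy sketch; the function names, argument names, and hyperparameter defaults are illustrative choices here, not taken from any particular library.

    import numpy as np

    def sgd_step(w, g, lr=0.01):
        # Plain SGD: w <- w - lr * g
        return w - lr * g

    def momentum_step(w, g, v, lr=0.01, beta=0.9):
        # Velocity accumulates past gradients, smoothing the update direction.
        v = beta * v + g
        return w - lr * v, v

    def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
        # Running estimates of the gradient's mean (m) and uncentered variance (v)
        # give each parameter its own effective step size.
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)   # bias correction (t counts steps from 1)
        v_hat = v / (1 - b2 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v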

Under the hood, an optimizer step is just a set of additional ops in the graph: they read the gradients and write new values back into the variables.
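
For concreteness, here is that idea in a TensorFlow-style sketch (an assumption; the section itself never names the framework). In eager mode the update runs immediately; wrapped in tf.function, the same calls become ops in the computational graph. The apply_gradients call is the step that reads the gradient and writes the new value into the variable.

    import tensorflow as tf

    w = tf.Variable(2.0)                              # a variable: a tensor you can update
    opt = tf.keras.optimizers.SGD(learning_rate=0.1)

    with tf.GradientTape() as tape:
        loss = (w - 5.0) ** 2                         # dL/dw = 2 * (w - 5) = -6 at w = 2.0

    grad = tape.gradient(loss, w)
    opt.apply_gradients([(grad, w)])                  # writes w <- w - 0.1 * grad in place
    print(w.numpy())                                  # 2.6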