Mark As Completed Discussion

Hardware and Complexity

  • Training cost grows with data size, model size, and sequence/image resolution.
  • Batch size: how many samples per gradient step. Larger batches use more memory.
  • Epoch: one full pass over training data.
  • Typical accelerators: GPUs/TPUs; but conceptually all you need is the math we wrote.