The Math You Really Need
Here are the mathematical terms at play; a short code sketch after the list ties them all together:
- **Weights** (W) and **biases** (b): the parameters we learn.
- **Activation function** φ: adds non-linearity (e.g., ReLU(x) = max(0, x)).
- **Loss**: a scalar measuring error, e.g., MSE for regression, cross-entropy for classification.
- **Gradient**: the vector of partial derivatives that tells us how to tweak parameters to reduce loss.
- **Gradient descent**: the update rule θ ← θ − η ∇θ L with learning rate η.
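To make these definitions concrete, here is a minimal NumPy sketch of gradient descent on a toy one-hidden-layer regression network. The dataset, layer sizes, step count, and learning rate are illustrative assumptions, not values from the text; the point is just to show where each term lives in code.

```python
import numpy as np

# Hypothetical toy dataset (illustrative only): 32 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
y = rng.normal(size=(32, 1))  # regression targets

# Parameters we learn: weights W1, W2 and biases b1, b2.
W1 = rng.normal(size=(3, 4)) * 0.5
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)) * 0.5
b2 = np.zeros((1, 1))

eta = 0.05  # learning rate η (assumed value)

for step in range(200):
    # Forward pass with ReLU activation: φ(x) = max(0, x).
    z1 = X @ W1 + b1
    h = np.maximum(0.0, z1)
    pred = h @ W2 + b2

    # Loss: MSE, a single scalar measuring error.
    loss = np.mean((pred - y) ** 2)

    # Gradients via the chain rule (backpropagation by hand).
    n = X.shape[0]
    dpred = 2.0 * (pred - y) / n           # ∂L/∂pred
    dW2 = h.T @ dpred
    db2 = dpred.sum(axis=0, keepdims=True)
    dh = dpred @ W2.T
    dz1 = dh * (z1 > 0)                    # ReLU gradient: 1 where z1 > 0
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Gradient descent update: θ ← θ − η ∇θ L, applied per parameter.
    W1 -= eta * dW1
    b1 -= eta * db1
    W2 -= eta * dW2
    b2 -= eta * db2

print(f"loss after training: {loss:.4f}")
```

Each loop iteration is one gradient-descent step: forward pass to get predictions, loss to score them, gradients to find the downhill direction, and the η-scaled update to move the parameters there.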