Understanding Machine Learning
You just built a complete machine learning algorithm! Machine learning programs keep an internal state and update that state to reach a better, more accurate answer as you give them data to learn from. Let's map the pieces of our program onto concepts from machine learning.
1. Internal State/Weight: The `guess` variable is the internal state that is updated as the program receives new data to learn from. Every machine learning algorithm maintains some internal state during the learning and prediction process; some deep learning models have gigabytes of weights learned from data (we will learn more about weights soon).
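To make the idea concrete, here is a minimal sketch (an illustration with made-up values, not the original program) of an internal state being updated by incoming data:

```python
guess = 50                # internal state before learning
feedback = "higher"       # data: the answer is above our current guess

if feedback == "higher":  # learn from the data...
    guess += 10           # ...by updating the internal state
print(guess)              # 60: the state has moved toward the answer
```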
2. Loss: Although our program does not have an explicit loss value or loss function, `jump` can be thought of as a loss value: the further we are from the answer, the bigger the loss. We can see that the loss value (`jump`) decreases slowly, which means the algorithm is getting better at predicting the correct number. Every machine learning algorithm's main objective is to minimize the loss value as much as possible.
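A simple way to make the loss explicit is to write it as a function; the helper below is a hypothetical addition, not part of the original program:

```python
# Hypothetical loss for the guessing game: distance from the answer.
def loss(answer, guess):
    return abs(answer - guess)

print(loss(73, 50))  # 23 -> far from the answer, big loss
print(loss(73, 70))  # 3  -> close to the answer, small loss
```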
3. Learning Rate: The learning rate is the parameter that controls how fast the machine learning algorithm learns, and it can't be too high or too low. In our program, `decrement_jump` can be thought of as the learning rate. If the learning rate is too high, the algorithm never converges and only oscillates around the minimum; if it is too low, the algorithm takes a long time to converge. Both failure modes appear in the sketch below.
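The demonstration uses plain gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3 (a standard illustration, separate from the guessing game):

```python
def run(lr, steps=10):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)  # derivative of (w - 3)^2
        w -= lr * grad      # update scaled by the learning rate
    return w

print(run(lr=1.05))  # too high: oscillates around 3 and drifts away
print(run(lr=0.01))  # too low: after 10 steps, still far from 3
print(run(lr=0.3))   # reasonable: lands very close to 3
```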
4. Optimizer/Optimizing Algorithm: At this point, we know that a machine learning algorithm has data, weights, a loss, and a learning rate. Remember, the main objective of a machine learning algorithm is to optimize the loss function. To do that, the weights must somehow be updated using the loss value and the learning rate, and this is the optimizer's job. In our program, the incrementing and decrementing code inside the conditionals, together with the code that uses `last_comp` to change the jump direction, is the optimizing algorithm. We will learn about different optimizers and gradient descent algorithms in another lesson.
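Since the listing is not reproduced here, the following is only a speculative reconstruction of what that update step might look like, reusing the variable names mentioned above (`guess`, `jump`, `last_comp`, `decrement_jump`):

```python
def optimizer_step(guess, jump, comp, last_comp, decrement_jump):
    if comp != last_comp:  # direction flipped: we overshot the answer,
        jump = max(jump - decrement_jump, 1)  # so shrink the step size
    if comp == "higher":   # answer is above the guess
        guess += jump
    else:                  # answer is below the guess
        guess -= jump
    return guess, jump
```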
5. Convergence Threshold: Our program must meet a certain criterion to exit the while loop (i.e., when the computer has guessed the number or a value close enough to it). This is the convergence criterion. Most of the time, a value is preset such that when the loss becomes smaller than it, the criterion is met and the machine learning algorithm stops learning. That preset value is known as the convergence threshold.
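In code, the check typically looks something like this sketch, where the halving update is just a stand-in for real learning:

```python
threshold = 1            # convergence threshold, preset before training

loss = 23
while loss > threshold:  # convergence criterion: keep learning until met
    loss //= 2           # stand-in for an update that shrinks the loss
print("converged, loss =", loss)
```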
6. Learning Iteration: With every iteration of the while loop, we provide the model with data (whether to guess bigger or smaller) or a batch of data. The model then calculates the loss from the given data and weights and updates the weights accordingly. Each weight update marks one learning iteration.
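The toy loop below counts iterations in a guessing game of its own (the starting numbers are made up, not taken from the original program):

```python
answer, guess, jump = 73, 50, 10
iteration = 0
while guess != answer:
    guess += jump if answer > guess else -jump  # one weight update
    jump = max(jump - 1, 1)                     # shrink the step over time
    iteration += 1                              # one learning iteration done
print("iterations needed:", iteration)          # 10 for these numbers
```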
7. Epoch: If an algorithm is given a big dataset, it has to go over the data several times to improve what it learns. When all of the data is provided beforehand, this is known as offline learning; in offline learning, each time the machine learning algorithm goes over all of the data once, one epoch has passed.
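Here is a minimal offline-learning sketch on a toy dataset, where each pass of the outer loop is one epoch:

```python
dataset = [(1, 2), (2, 4), (3, 6)]  # toy (x, y) pairs with y = 2x
w, lr = 0.0, 0.1                    # single weight and learning rate

for epoch in range(5):              # 5 epochs = 5 full passes over the data
    for x, y in dataset:            # each pair is one learning iteration
        w -= lr * (w * x - y) * x   # gradient step for squared error
    print(f"epoch {epoch + 1}: w = {w:.3f}")  # w approaches 2.0
```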
8. Accuracy: After training a model (as we did in the program above), we have to test it to determine how effective the algorithm is. We can do this by measuring the model's accuracy, which can be calculated differently depending on the objective function. In our guessing game program, the relative error is |answer - guess| / answer x 100%, and the accuracy is 100% minus that error. Most of the time, accuracy is reported as a percentage.
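Plugging in some made-up numbers:

```python
answer, guess = 80, 76
error = abs(answer - guess) / answer * 100  # relative error: 5.0%
accuracy = 100 - error                      # 95.0%
print(f"error: {error}%, accuracy: {accuracy}%")
```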
9. Activation Function: Our guessing game program does not use one, but the job of an activation function is to post-process the output of a machine learning algorithm to keep it in an appropriate range. Some machine learning algorithms produce outputs that are too big, too small, negative, or otherwise undesired; in those situations, activation functions keep the output within a boundary.
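Two classic examples are the sigmoid, which squashes any real number into (0, 1), and ReLU, which clips negative outputs to zero (standard functions, not part of our game):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))   # any input -> a value in (0, 1)

def relu(x):
    return max(0.0, x)              # negative inputs -> 0

print(sigmoid(100), sigmoid(-100))  # ~1.0 and ~0.0: output stays bounded
print(relu(-5), relu(5))            # 0.0 and 5
```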