The ROC Curve

The idea of the Receiver Operating Characteristic (ROC) curve is to illustrate the performance of a model across all possible classification thresholds. By doing this, we can find the threshold that separates the classes best.

What do we mean by setting the threshold? Every time a new sample arrives, the model calculates the probability that the sample belongs to each of the possible classes. Based on that probability and the specified threshold, it assigns the label. For example, if we classify whether a person is obese or not, and the threshold is set at 0.5, then whenever the predicted probability of that person being obese is over 0.5, the model classifies them as obese.
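To make the thresholding step concrete, here is a minimal sketch in plain Python. The function name `classify` and the probability values are hypothetical and chosen only for illustration. In practice, the probabilities would come from a trained model.

```python
def classify(probabilities, threshold=0.5):
    """Assign label 1 (e.g. obese) when the probability is over the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities of being obese for five people.
probs = [0.10, 0.45, 0.50, 0.62, 0.91]

labels = classify(probs, threshold=0.5)
print(labels)  # [0, 0, 0, 1, 1]
```

Note that a probability of exactly 0.5 is not "over" the threshold here, so that sample is assigned to the negative class. Whether the comparison is strict is a convention that varies between implementations.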

Even though the intuitive approach is to set the threshold at 0.5, it is sometimes more useful to set it lower or higher, for example when classifying patients as sick or healthy. In such a case, in order to correctly classify all patients who are sick, we might have to lower the threshold and accept a higher number of false-positive predictions.
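The trade-off described above can be sketched with a small, made-up dataset. The labels and probabilities below are hypothetical (1 means sick, 0 means healthy), and `confusion_counts` is an illustrative helper, not part of any library.

```python
def confusion_counts(y_true, probs, threshold):
    """Count (TP, FP, TN, FN) when predicting 1 for probabilities over the threshold."""
    tp = fp = tn = fn = 0
    for actual, p in zip(y_true, probs):
        predicted = 1 if p > threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical data: true labels and predicted probabilities of being sick.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.90, 0.60, 0.35, 0.40, 0.20, 0.10, 0.55, 0.30]

print(confusion_counts(y_true, probs, threshold=0.5))   # (2, 1, 3, 2)
print(confusion_counts(y_true, probs, threshold=0.25))  # (4, 2, 2, 0)
```

With the threshold at 0.5, two sick patients are missed (FN = 2). Lowering it to 0.25 catches all four sick patients, at the cost of one additional false positive.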

So, by using the ROC curve, we can determine the optimal threshold for our approach. We do this by plotting the true positive rate (TPR):

$$TPR = \frac{TP}{TP + FN}$$

against the false positive rate (FPR):

$$FPR = \frac{FP}{FP + TN}$$

on a graph and drawing a line where each point represents the pair of FPR and TPR values obtained at a specific threshold.
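As a rough sketch of how the points of the curve could be computed, the plain-Python function below sweeps over candidate thresholds and records the FPR and TPR at each one. The function name `roc_points`, the choice of using each distinct score as a threshold, and the sample data are all assumptions made for illustration. Library implementations may choose thresholds and handle ties differently.

```python
def roc_points(y_true, scores):
    """Return a list of (FPR, TPR, threshold) tuples, one per candidate threshold.

    A sample is predicted positive when its score is over the threshold.
    Candidate thresholds are the distinct scores (highest first), plus
    negative infinity so that the last point classifies everything as positive.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("Both classes must be present to compute TPR and FPR.")

    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s > t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s > t and y == 0)
        tpr = tp / positives  # TP / (TP + FN)
        fpr = fp / negatives  # FP / (FP + TN)
        points.append((fpr, tpr, t))
    return points


# Hypothetical true labels and predicted probabilities for six samples.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
for fpr, tpr, t in points:
    print(f"threshold={t}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

The first point is (0, 0), where the threshold is so high that nothing is classified as positive, and the last is (1, 1), where everything is. Plotting FPR on the x-axis and TPR on the y-axis and connecting these points gives the ROC curve.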