IML L3.2 ROC curve

ROC curve

Let us consider the cancer data sample again:

img

Quality metrics

  • The performance of a binary classifier can be described by the confusion matrix:

|                    | true value is positive | true value is negative |
|--------------------|------------------------|------------------------|
| predicted positive | true positive (TP)     | false positive (FP)    |
| predicted negative | false negative (FN)    | true negative (TN)     |

  • From this matrix we can define several metrics to quantify the quality of the classification.

$\text{true positive rate}=\frac{TP}{TP+FN}$

and

$\text{false positive rate}=\frac{FP}{FP+TN}$
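As a minimal sketch of these definitions, the snippet below counts the four entries of the confusion matrix and derives the two rates from them. The labels are made up for illustration (they are not the cancer data), and the function names `confusion_counts` and `rates` are hypothetical helpers, not part of any library.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Return (true positive rate, false positive rate)."""
    tpr = tp / (tp + fn)  # fraction of actual positives that were found
    fpr = fp / (fp + tn)  # fraction of actual negatives that were flagged
    return tpr, fpr


# Made-up example: 4 positive and 4 negative samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
tpr, fpr = rates(tp, fp, fn, tn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"true positive rate={tpr:.2f}, false positive rate={fpr:.2f}")
```

Note that both rates are normalised by the number of true positives or true negatives in the sample, so they do not depend on how balanced the two classes are.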

We can see how well the prediction works by plotting the true value $y$ as a function of $z$ for each data point in the training sample:

img

  • The points with $z>0$ are assigned to the $y=1$ class
    • they correspond to $p > \frac{1}{2}$
  • those with $z<0$ are assigned to the $y=0$ class
    • they correspond to $p < \frac{1}{2}$
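The correspondence between the sign of $z$ and $p$ relative to $\frac{1}{2}$ can be checked numerically. The sketch below assumes that $p$ is obtained from $z$ through the logistic (sigmoid) function, which is consistent with the thresholds quoted above but is an assumption here; the scores are made up and `predict_class` is a hypothetical helper.

```python
import math


def sigmoid(z):
    """Logistic function, assumed here to map the score z to a probability p."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(z, z_threshold=0.0):
    """Assign class 1 if z lies to the right of the decision boundary, else class 0."""
    return 1 if z > z_threshold else 0


# Made-up scores on both sides of the boundary z = 0.
scores = [-2.0, -0.5, 0.3, 1.7]
probs = [sigmoid(z) for z in scores]
labels = [predict_class(z) for z in scores]

for z, p, label in zip(scores, probs, labels):
    print(f"z={z:+.1f}  p={p:.3f}  predicted class={label}")
```

Under this assumption, thresholding $z$ at 0 and thresholding $p$ at $\frac{1}{2}$ give the same class assignment, because the sigmoid is increasing and equals $\frac{1}{2}$ at $z=0$.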

The different categories (TP, FP, TN, FN) can be visualised on this plot:

img

If we are more worried about false negatives than about false positives, we can move the decision boundary to the left:

img

Of course, this means more false positives…

If we are more worried about false positives than about false negatives, we can move the decision boundary to the right:

img

Of course, this means more false negatives…
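The effect of moving the decision boundary can be sketched by recounting the four categories at different thresholds on $z$. The scores and labels below are made up for illustration, and `counts_at_threshold` is a hypothetical helper.

```python
def counts_at_threshold(z, y, threshold):
    """Count TP, FP, FN, TN when points with z > threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for zi, yi in zip(z, y):
        predicted = 1 if zi > threshold else 0
        if predicted == 1 and yi == 1:
            tp += 1
        elif predicted == 1 and yi == 0:
            fp += 1
        elif predicted == 0 and yi == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Made-up scores z and true labels y.
z = [-2.5, -1.5, -0.8, -0.3, 0.2, 0.6, 1.2, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

default = counts_at_threshold(z, y, 0.0)   # boundary at z = 0
left = counts_at_threshold(z, y, -1.0)     # boundary moved to the left
right = counts_at_threshold(z, y, 1.0)     # boundary moved to the right

for name, counts in [("z > -1", left), ("z > 0", default), ("z > 1", right)]:
    tp, fp, fn, tn = counts
    print(f"{name}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

On this toy sample, moving the boundary to the left removes the false negative at the cost of an extra false positive, and moving it to the right does the opposite.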

The curve describing this trade-off is the ROC curve (Receiver Operating Characteristic). It is the collection of (FP rate, TP rate) values for all values of the decision boundary.

img
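A minimal sketch of how such a curve can be built: sweep the decision boundary from the far right to the far left and record the (FP rate, TP rate) pair at each position. The data are made up, `roc_points` is a hypothetical helper, and the choice of placing one candidate boundary at each distinct score (with $z \ge$ boundary predicted positive) is a convention assumed for this example.

```python
def roc_points(z, y):
    """Return (FP rate, TP rate) pairs, one per candidate decision boundary."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    # Start to the right of every point (nothing predicted positive), then
    # step the boundary leftwards through each distinct score.
    thresholds = [float("inf")] + sorted(set(z), reverse=True)
    points = []
    for threshold in thresholds:
        tp = sum(1 for zi, yi in zip(z, y) if zi >= threshold and yi == 1)
        fp = sum(1 for zi, yi in zip(z, y) if zi >= threshold and yi == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up scores z and true labels y.
z = [-2.5, -1.5, -0.8, -0.3, 0.2, 0.6, 1.2, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

points = roc_points(z, y)
for fpr, tpr in points:
    print(f"FP rate={fpr:.2f}  TP rate={tpr:.2f}")
```

The curve starts at (0, 0), where nothing is predicted positive, and ends at (1, 1), where everything is; both rates can only grow as the boundary moves to the left.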

Move the threshold to the left:

  • more true positives
  • more false positives

img

Move the threshold to the right:

  • fewer true positives
  • fewer false positives

img

This post is licensed under CC BY 4.0 by the author.
