Confusion matrix

David Wells 24/01/2021


When we present a binary classification model we don't just report overall accuracy, because false positives and false negatives are not necessarily equally important. This is a quick explanation of how we calculate some of these measures, and a cheatsheet for looking up which is which, because they all have multiple names.

We fit a simple logistic regression to classify iris samples as virginica or another species.

In [2]:
%load_ext rpy2.ipython
from rpy2.robjects import r
In [5]:
%%R
packages <- c("ggplot2", "MASS", "plot.matrix")
lapply(packages, require, character.only=T)

# Create a binary target variable: is the sample virginica?
iris$target <- iris$Species == "virginica"

# Logistic regression on two predictors, then threshold the
# fitted probabilities at 0.5 to get a binary prediction
m1 <- glm(target ~ Sepal.Length + Petal.Width, family="binomial", data=iris)
pred <- predict(m1, type="response") >= 0.5
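
To see what the 0.5 threshold is doing, we can peek at the fitted probabilities themselves (a quick sketch, assuming m1 and pred from the cell above are still in the session): predict(m1, type="response") returns the modelled probability that each sample is virginica, and pred is just that probability dichotomised at 0.5.

In [ ]:
%%R
# Fitted probabilities of being virginica for the first few samples,
# and how many samples end up on each side of the 0.5 threshold
print(head(round(predict(m1, type="response"), 3)))
print(table(pred))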

For binary classification, a confusion matrix lets us visualise the types of predictions we make. Our prediction can be positive (we predict the sample is virginica) or negative (we predict the sample is not virginica). Each sample also has an actual category, separate from (but hopefully the same as) our predicted category: True (the sample is actually virginica) or False (the sample is actually something else). This gives us four types of prediction:

  • True positive: we predict the sample is virginica and we're correct
  • False negative: the sample is virginica but we predict it's something else
  • False positive: we predict the sample is virginica but we're wrong
  • True negative: we predict the sample is something else and we're correct.

In each case positive/negative is the type of prediction we make and true/false is whether that prediction was correct (not whether the sample was actually true or false).

We can count the number of samples that fall into these 4 categories and plot them as a confusion matrix.

In [6]:
%%R

tp <- sum(iris$target & pred)
fn <- sum(iris$target & !pred)
fp <- sum(!iris$target & pred)
tn <- sum(!iris$target & !pred)

# True positive rate, AKA sensitivity or recall
tpr <- tp/(tp+fn)
# True negative rate, AKA specificity
tnr <- tn/(tn+fp)

# Precision, AKA positive predictive value
precision <- tp/(tp+fp)

confuse <- matrix(c(tp,fn,fp,tn), nrow=2, byrow=T)

colnames(confuse) <- c("True","False")
rownames(confuse) <- c("True","False")
names(dimnames(confuse)) <- c("Actual", "Predicted")

plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")

The confusion matrix above shows that our model correctly identified 46 virginica samples and 98 non-virginica samples. It also shows that 4 virginica samples were misclassified as something else (false negatives), and 2 samples were incorrectly classified as virginica (false positives).
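
As a quick cross-check (a sketch, assuming iris$target and pred from the earlier cells), base R's table() gives the same four counts without assembling the matrix by hand:

In [ ]:
%%R
# Cross-tabulate actual class against predicted class;
# the counts should match the plotted confusion matrix
print(table(Actual = iris$target, Predicted = pred))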

True positive rate

AKA sensitivity or recall, this is the proportion of samples which are actually true that are predicted true. $$\frac{TP}{TP+FN}$$ Below I have highlighted which two parts of the confusion matrix are used to calculate this.

In [7]:
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(1.5,2.1,"True Positive Rate", col="darkviolet", cex=2)
segments(1.1,2,1.9,2, col="darkviolet", lwd=5)
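
As a numeric check (a sketch using tp, fn and pred from the cells above), the formula gives 46/(46+4) = 0.92, which is also just the proportion of the actually-virginica samples that we predict to be virginica:

In [ ]:
%%R
# True positive rate from the counts
print(tp / (tp + fn))
# Equivalent one-liner: mean prediction over the actually-virginica samples
print(mean(pred[iris$target]))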

True negative rate

AKA specificity, this is the proportion of samples which are actually false that are predicted false. $$\frac{TN}{TN+FP}$$

In [14]:
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(1.5,1.1,"True Negative Rate", col="#ff0000", cex=2)
segments(1.1,1,1.9,1, lwd=5, col="#ff0000")
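
The same kind of check works here (a sketch using tn, fp and pred from the cells above): 98/(98+2) = 0.98, the proportion of non-virginica samples that we correctly call not virginica:

In [ ]:
%%R
# True negative rate from the counts
print(tn / (tn + fp))
# Equivalent one-liner: proportion of actually-false samples predicted false
print(mean(!pred[!iris$target]))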

Precision

AKA positive predictive value, this is the proportion of samples which are predicted true that are actually true. $$\frac{TP}{TP+FP}$$

In [9]:
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(.9,1.5,"Precision", srt=-90, col="mediumblue", cex=2)
segments(1,1.9,1,1.1, lwd=5, col="mediumblue")
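
And for precision (a sketch using tp, fp and pred from the cells above): 46/(46+2), about 0.96, the proportion of our virginica calls that really are virginica:

In [ ]:
%%R
# Precision from the counts
print(tp / (tp + fp))
# Equivalent one-liner: proportion of predicted-virginica samples that are virginica
print(mean(iris$target[pred]))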

Reporting

Rather than reporting the accuracy of a binary classification model we often report

  • sensitivity and specificity, which are the proportions of actually true and actually false samples (respectively) that are predicted correctly,

or

  • precision and recall, which are the proportion of predicted trues that are correct and the proportion of actual trues that are predicted correctly, respectively.

The plot below highlights all three measures on the same confusion matrix.

In [15]:
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")

text(1.5,2.1,"True Positive Rate", col="darkviolet", cex=2)
segments(1.1,2,1.9,2, col="darkviolet", lwd=5)

text(1.5,1.1,"True Negative Rate", col="#ff0000", cex=2)
segments(1.1,1,1.9,1, lwd=5, col="#ff0000")

text(.9,1.5,"Precision", srt=-90, col="mediumblue", cex=2)
segments(1,1.9,1,1.1, lwd=5, col="mediumblue")
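
If we want the numbers rather than the picture, a small summary table covers both reporting pairs (a sketch, assuming tpr, tnr and precision from the cells above; recall is just another name for the true positive rate):

In [ ]:
%%R
# Both common reporting pairs in one row
print(data.frame(sensitivity = tpr, specificity = tnr,
                 precision = precision, recall = tpr))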