David Wells 24/01/2021
When we present a binary classification model we don't just report overall accuracy, because false positives and false negatives are not necessarily equally important. This is a quick explanation of how some of the common measures are calculated, and a cheatsheet for looking up which is which, because they all have multiple names.
We fit a simple logistic regression to classify iris samples as virginica or another species.
%load_ext rpy2.ipython
from rpy2.robjects import r
%%R
packages <- c("ggplot2", "MASS", "plot.matrix")
lapply(packages, require, character.only=T)
# Create a binary target variable: is this sample virginica?
iris$target <- iris$Species == "virginica"
m1 <- glm(target ~ Sepal.Length + Petal.Width, family="binomial", data=iris)
# Classify as virginica when the predicted probability is at least 0.5
pred <- predict(m1, type="response") >= 0.5
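As a quick check, base R's table() will cross-tabulate the thresholded predictions against the actual labels (the Actual/Predicted names here are just labels I've chosen); the sections below build up the same counts step by step.
%%R
# Quick cross-tabulation of actual labels against thresholded predictions
table(Actual = iris$target, Predicted = pred)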
For binary classification, a confusion matrix lets us visualise the types of predictions we make. Our result can be positive (we predicted the sample is virginica) or negative (we predicted the sample is not virginica). Each sample also has an actual category, separate from (but hopefully the same as) our predicted category: true (the sample really is virginica) or false (the sample is actually something else). This gives us four types of prediction: true positives, false negatives, false positives and true negatives.
In each case positive/negative is the type of prediction we make and true/false is whether that prediction was correct (not whether the sample was actually true or false).
We can count the number of samples that fall into these 4 categories and plot them as a confusion matrix.
%%R
# Count each of the four types of prediction
tp <- sum(iris$target & pred)
fn <- sum(iris$target & !pred)
fp <- sum(!iris$target & pred)
tn <- sum(!iris$target & !pred)
# True positive rate, AKA sensitivity or recall
tpr <- tp/(tp+fn)
# True negative rate, AKA specificity
tnr <- tn/(tn+fp)
# Precision, AKA positive predictive value
precision <- tp/(tp+fp)
# Rows are the actual class, columns are the predicted class
confuse <- matrix(c(tp,fn,fp,tn), nrow=2, byrow=T)
colnames(confuse) <- c("True","False")
rownames(confuse) <- c("True","False")
names(dimnames(confuse)) <- c("Actual", "Predicted")
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
The confusion matrix above shows that our model correctly identified 46 virginica samples and 98 non-virginica samples. It also shows that 4 virginica samples were misclassified as something else (false negatives), and 2 samples were incorrectly classified as virginica (false positives).
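For comparison, the overall accuracy mentioned at the start is just the correct predictions (the diagonal of the matrix) divided by the total number of samples, which with these counts is (46 + 98) / 150 = 0.96.
%%R
# Overall accuracy: correct predictions over all samples
(tp + tn) / (tp + fn + fp + tn)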
The true positive rate, AKA sensitivity or recall, is the proportion of samples which are actually true that are predicted true. $$\frac{TP}{TP+FN}$$ Below I have highlighted the two parts of the confusion matrix used to calculate this.
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(1.5,2.1,"True Positive Rate", col="darkviolet", cex=2)
segments(1.1,2,1.9,2, col="darkviolet", lwd=5)
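Plugging the counts into the formula, the true positive rate here is 46 / (46 + 4) = 0.92. Because the matrix has named dimensions, the same value can also be read straight out of it:
%%R
# TPR uses only the actual-True row: TP / (TP + FN)
confuse["True", "True"] / sum(confuse["True", ])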
The true negative rate, AKA specificity, is the proportion of samples which are actually false that are predicted false. $$\frac{TN}{TN+FP}$$
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(1.5,1.1,"True Negative Rate", col="#ff0000", cex=2)
segments(1.1,1,1.9,1, lwd=5, col="#ff0000")
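With these counts the true negative rate is 98 / (98 + 2) = 0.98, again taken from a single row of the matrix:
%%R
# TNR uses only the actual-False row: TN / (TN + FP)
confuse["False", "False"] / sum(confuse["False", ])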
Precision, AKA positive predictive value, is the proportion of samples which are predicted true that are actually true. $$\frac{TP}{TP+FP}$$
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(.9,1.5,"Precision", srt=-90, col="mediumblue", cex=2)
segments(1,1.9,1,1.1, lwd=5, col="mediumblue")
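Here the precision is 46 / (46 + 2) ≈ 0.96, this time using a single column of the matrix:
%%R
# Precision uses only the predicted-True column: TP / (TP + FP)
confuse["True", "True"] / sum(confuse[, "True"])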
Rather than reporting the accuracy of a binary classification model, we often report the true positive rate (sensitivity) and true negative rate (specificity), or precision and recall. The plot below highlights all three on the same confusion matrix.
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(1.5,2.1,"True Positive Rate", col="darkviolet", cex=2)
segments(1.1,2,1.9,2, col="darkviolet", lwd=5)
text(1.5,1.1,"True Negative Rate", col="#ff0000", cex=2)
segments(1.1,1,1.9,1, lwd=5, col="#ff0000")
text(.9,1.5,"Precision", srt=-90, col="mediumblue", cex=2)
segments(1,1.9,1,1.1, lwd=5, col="mediumblue")