David Wells *24/01/2021*

When we present a binary classification model we don't just report overall accuracy, because false positives and false negatives are not necessarily equally important. This is a quick explanation of how we calculate some of these measures, and a cheatsheet for looking up which is which, since they all have multiple names.

We fit a simple logistic regression to classify iris samples as virginica or another species.

In [2]:

```
%load_ext rpy2.ipython
from rpy2.robjects import r
```

In [5]:

```
%%R
packages <- c("ggplot2", "MASS", "plot.matrix")
lapply(packages, require, character.only=T)
# Create a binary target variable
iris$target <- iris$Species == "virginica"
m1 <- glm(target ~ Sepal.Length + Petal.Width, family="binomial", data=iris)
pred <- predict(m1, type="response") >= 0.5
```

For binary classification, a confusion matrix lets us visualise the types of predictions we make. Our prediction can be *positive* (we predict the sample is virginica) or *negative* (we predict the sample is not virginica). Each sample also has an actual category, separate from (but hopefully the same as) our predicted category: *True* (the sample is actually virginica) or *False* (the sample is actually something else). This gives us four types of prediction:

- True positive: we predict the sample is virginica and we're correct
- False negative: the sample is virginica but we predict it's something else
- False positive: we predict the sample is virginica but we're wrong
- True negative: we predict the sample is something else and we're correct

In each case positive/negative is the type of prediction we make and true/false is whether that prediction was correct (not whether the sample was actually true or false).

We can count the number of samples that fall into these 4 categories and plot them as a confusion matrix.

In [6]:

```
%%R
tp <- sum(iris$target & pred)
fn <- sum(iris$target & !pred)
fp <- sum(!iris$target & pred)
tn <- sum(!iris$target & !pred)
# True Positive Rate, sensitivity, recall
tpr <- tp/(tp+fn)
# True Negative Rate, specificity
tnr <- tn/(tn+fp)
# Precision, positive predictive value
precision <- tp/(tp+fp)
confuse <- matrix(c(tp,fn,fp,tn), nrow=2, byrow=T)
colnames(confuse) <- c("True","False")
rownames(confuse) <- c("True","False")
names(dimnames(confuse)) <- c("Actual", "Predicted")
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
```

The confusion matrix above shows that our model correctly identified 46 virginica samples and 98 non-virginica samples. It also shows that 4 virginica samples were misclassified as something else (false negatives), and 2 samples were incorrectly classified as virginica (false positives).
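As a quick sanity check (a minimal sketch reusing the counts computed in the cell above), the overall accuracy implied by these four numbers is high, even though the model makes twice as many false negatives as false positives:

```
%%R
# Overall accuracy from the confusion matrix counts: (46 + 98) / 150 = 0.96
(tp + tn) / (tp + fn + fp + tn)
```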

The *true positive rate*, AKA sensitivity or recall, is the proportion of actually true samples which are predicted true. $$\frac{TP}{TP+FN}$$ Below I have highlighted which two cells of the confusion matrix are used to calculate this.

In [7]:

```
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(1.5,2.1,"True Positive Rate", col="darkviolet", cex=2)
segments(1.1,2,1.9,2, col="darkviolet", lwd=5)
```
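Plugging in the counts from our confusion matrix (using the `tpr` variable computed in the earlier cell), the true positive rate for this model works out as:

```
%%R
# TP / (TP + FN) = 46 / (46 + 4) = 0.92
tpr
```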

The *true negative rate*, AKA specificity, is the proportion of actually false samples which are predicted false. $$\frac{TN}{TN+FP}$$

In [14]:

```
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(1.5,1.1,"True Negative Rate", col="#ff0000", cex=2)
segments(1.1,1,1.9,1, lwd=5, col="#ff0000")
```
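Again using the counts above (the `tnr` variable from the earlier cell), the true negative rate for this model is:

```
%%R
# TN / (TN + FP) = 98 / (98 + 2) = 0.98
tnr
```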

*Precision*, AKA positive predictive value, is the proportion of predicted true samples which are actually true. $$\frac{TP}{TP+FP}$$

In [9]:

```
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(.9,1.5,"Precision", srt=-90, col="mediumblue", cex=2)
segments(1,1.9,1,1.1, lwd=5, col="mediumblue")
```
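And the precision for this model (the `precision` variable from the earlier cell):

```
%%R
# TP / (TP + FP) = 46 / (46 + 2) ~ 0.958
precision
```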

Rather than reporting the accuracy of a binary classification model, we often report

- sensitivity and specificity, which are the proportions of actually true and actually false samples which are predicted correctly,

or

- precision and recall (recall is just another name for sensitivity), which are the proportion of predicted trues which are correct and the proportion of actual trues which are predicted true, respectively.
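To see the two pairs side by side, here is a small sketch reusing the variables defined above; note that sensitivity and recall are the same number under different names:

```
%%R
# sensitivity/recall and precision were computed earlier;
# specificity is the true negative rate
c(sensitivity = tpr, specificity = tnr, precision = precision, recall = tpr)
```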

In [15]:

```
%%R
plot(confuse, fmt.cell="%.0f", cex=2, col=colorRampPalette(c("azure", "lightseagreen"))(20), key=NULL, main="")
text(1.5,2.1,"True Positive Rate", col="darkviolet", cex=2)
segments(1.1,2,1.9,2, col="darkviolet", lwd=5)
text(1.5,1.1,"True Negative Rate", col="#ff0000", cex=2)
segments(1.1,1,1.9,1, lwd=5, col="#ff0000")
text(.9,1.5,"Precision", srt=-90, col="mediumblue", cex=2)
segments(1,1.9,1,1.1, lwd=5, col="mediumblue")
```
