David Wells 28/10/2020
Expectation maximisation is a method for assigning samples to an unobserved category based on observed variables. It's a form of unsupervised classification that works well if your data can be explained by a well defined parametric model. The expectation maximisation algorithm has two major steps: expectation and maximisation, which are repeated again and again until convergence.
There is some data generating process which differs between classes, you assign samples to the different groups based on which group you believe (or expect) them to belong to. Then you estimate the parameters of the data generating process which maximises the likelihood of the observed data. Then you reassign the samples to classes you expect them to have come from based on the new parameters. This goes on and on until the parameters of the data generating process have converged.
Here we're going through a simple example where there are two unobserved classes. Each sample has two observed variables drawn from a bivariate gaussian distribution with separate means and covariances for the two unobserved classes. In this case the parameters we estimate in order to maximise likelihood are the mean and covariances of the two classes. This is called a gaussian mixture model.
This blog post is a walkthrough of the underlying maths in a gaussian mixture model. And it also covers in more detail how classes are assigned to different classes including "soft assignments".
FlexMix guide and specific example.
Scikit explaination.
packages <- c("MASS", "ggplot2", "flexmix")
lapply(packages, require, character.only=T)
Specify the number of samples drawn from each class. We also need to set the means and covariances of the observed variables which in our case is different for the two unobserved classes.
# Number of samples in the two classes
n1 <- 100
n2 <- 100
# Vector of means for the two observed variables
mu1 <- c(3, 3)
mu2 <- c(6, 6)
# Covariance matrices of the two classes
sigma1 <- matrix(c(1,0.5,0.5,1), 2,2)
sigma2 <- matrix(c(2,1,1,2), 2,2)
# Simulate data
class1 <- mvrnorm(n1, mu=mu1, Sigma=sigma1)
class2 <- mvrnorm(n2, mu=mu2, Sigma=sigma2)
# Bind the data from the two classes together
simData <- as.data.frame(rbind(class1, class2))
options(repr.plot.width = 5, repr.plot.height = 4)
ggplot(simData, aes(x=V1, y=V2))+
geom_point(size=5, colour="grey")+
theme_classic()
The figure above shows the data and it is not immediately obvious where the distinction between the two classes are. By fitting a gaussian mixure model we will estimate the model which generated the data and the probabilty that each point belongs to each class.
#FlexMix Unstructured cov matrix model
model <- flexmix(cbind(V1,V2)~1, k=2, data=simData, model=FLXMCmvnorm(diag=F))
#Plot the 50% and 95% confidence intervals with the assigned classes.
plotEll(model, simData[,c('V1','V2')], col=c("orange","darkcyan"))
Visualise the probabilities of assignment. If there is good separation there should be peaks at 1 for each class.
plot(model)
#View estimated means and covariances of the two groups
model@components
#Get the probability of belonging to each group for each sample 
# posterior(model)
This is some inelegant code to generate a surface plot of the difference in component specific likelihoods. Although not especially interpretable it does illustrate that near the means the classification is clearly alternatives whereas away from the center of the clusters it is less clear which cluster samples belong too. This is not seen when plotting the probability unscaled=False as even far from the centre of clusters assignments are very confident.
options(repr.plot.width = 7, repr.plot.height = 7)
nx <- 15
ny <- 15
x <- seq(0,8, length.out=nx)
y <- seq(0,8, length.out=ny)
newdata <- expand.grid(V1=x, V2=y)
#Likelihoods rather than probability
prob <- posterior(model, newdata=newdata, unscaled=T)
z <- matrix(prob[,1]-prob[,2], nx, ny)
nrz <- nrow(z)
ncz <- ncol(z)
# Generate the desired number of colors from this palette
colfunc <- colorRampPalette(c("darkcyan", "orange"))
nbcol <- 100
color <- colfunc(nbcol)
# Compute the z-value at the facet centres
zfacet <- z[-1, -1] + z[-1, -ncz] + z[-nrz, -1] + z[-nrz, -ncz]
# Recode facet z-values into color indices
facetcol <- cut(zfacet, nbcol)
persp(x,y,z, phi=15, theta=-20, col = color[facetcol])
As we have simulated this data we can visualise the true classes by drawing elipses based on the true classes and colouring by the probability of belonging to one class.
Note that the stat_ellipse() function draws a data elipse so plotEll() is probability better for inference in most cases.
simData$class <- rep(c("Negative", "Positive"), c(n1, n2))
simData$prob1 <- posterior(model)[,1]
options(repr.plot.width = 5, repr.plot.height = 4)
ggplot(simData, aes(x=V1, y=V2, colour=prob1))+
geom_point(size=5)+
stat_ellipse(aes(group=class))+
scale_colour_gradientn(colors=c("darkcyan", "orange"))+
theme_classic()