David Wells 25/06/2020
When a response variable is ordinal (discrete but ordered) there are several possible approaches to modelling, see Anath et al 1997. Performing several bivariate analyses looses much of the power. A proportional odds model (AKA cummulative logit) is discussed in this notebook and this stack answer and this blog.
Do note however, that the Anath paper advocate either the cumulative logit or the continuous ratio model based on the assumptions and central question. Cumulative logit is best where the response is some arbitrary grouping and it is overall trends that you are interested in.
Ben Bolker has a post about calculating confidence intervals for ordinal probability predictions here.
Ananth, C. V., & Kleinbaum, D. G. (1997). Regression models for ordinal responses: A review of methods and applications. International Journal of Epidemiology, 26(6), 1323–1333. https://doi.org/10.1093/ije/26.6.1323.
%load_ext rpy2.ipython
from rpy2.robjects import r
%%R
packages <- c("ggplot2", "dplyr", "MASS", "diosR", "dotwhisker", "broom", "VGAM")
lapply(packages, require, character.only=T)
In binomial regression the response variable is the log odds of success. In a proportional odds model the response variable is the log odds of a given category or any lower category (hence the alternative model name cummulative logit). A separate intercept is used for each individual category but the same coefficients are used. This demonstrates the assumption of proportional odds, the effect of the covariates do not differ between the response categories.
%%R
data(housing)
m1 <- polr(Sat ~ Infl + Type + Cont, data=housing, weights=Freq)
summary(m1)
The coefficients are listed just as for any regression in R, the intercepts are slightly different, they are cutpoints along the response variable between the ordinal groups. The intercept used to calculate the cumulative probability of being in the medium or low group is 0.6907. For the low group (or lower but it is the lowest group) the equivalent value is -0.4961. You use either or, they are not summed.
The model assumes that the relationship between all pairs of levels is the same and so there is only one set of parameters. This assumption can be assessed graphically with a parallel slopes plot. I.E. the effect on the log odds ratio of a change in predictor variables should not depend on the response category. If we estimate a separate binary logistic regression for each cut point the coefficients should be the same in each case. The example in the UCLA blog post is equivalent to this but they plot the difference between estimated coefficients, also they are comparing predicted values but it simplifies to comparing coefficients.
Here all of the estimated coefficients are very similar between the models, indicating that the assumption of proportional odds holds.
%%R
m1a <- glm(I(as.numeric(Sat)>=2) ~ Infl + Type + Cont, family="binomial", data=housing, weights=Freq)
m1b <- glm(I(as.numeric(Sat)>=3) ~ Infl + Type + Cont, family="binomial", data=housing, weights=Freq)
#Note that dwplot() doesn't seem to have the correct magnitude of the effects
dwplot(list(m1a, m1b)) +
geom_vline(xintercept=0, lty=2)+
theme_classic()
Simulate data with separate probabilities for each class, in this case 0.6, 0.3, 0.1. This is what we will estimate with our model, but we have to calculate the cumulative log odds ourselves to know what the model should be returning. Calculate the cumulative probability of being in any class up to any including the specified class, 0.6, 0.9, 1, and the probability of being in a higher class 0.4, 0.1, 0. The cumulative odds is the ratio of these values 0.6/0.4, 0.9/0.1, and 1/0. Note that the final odds are undefined and ignored because the probability of getting none of the classes is zero. The log of these ratios are the cutoff points which we are estimating as intercepts.
%%R
print(log(0.6/0.4))
print(log(0.9/0.1))
%%R
#3 levels s, m, l. Probability of each 0.6, 0.3, 0.1
n <- 10000
y <- rmultinom(n, 1, prob=c(.6,.3,.1))
#Convert these into numeric as there's a single draw for each sample
y <-apply(y,2,function(col){which(col==1)})
#Convert to ordered factor
y <- ordered(y)
%%R
s1 <- polr(y~1, Hess=T)
summary(s1)
Simulate a coefficient effect which meets the assumption of proportional odds. There's quite a lot in definingy
so lets walkthrough it.
The base probability of each class is prob
, crob
is the cumulative probability. This is transformed onto the logit scale to make logit_crob
. Then we modify the logit cumulative probability of each group based on the covariate x
and its coefficient b
, this is called logit_crobx
. This is transformed back to the cumulative probability scale, crobx
. Finally the cumulative probability is converted to probability, giving us probx
the probability of a sample belonging to each category based on their value of x. probx
is then used to simulate y
.
%%R
n <- 10000
x <- rnorm(n)
b <- 2
# Probability of each class .6,.3,.1
prob <- c(0.6,0.3,0.1)
#Cumulative pROBability
crob <- cumsum(prob)
#Logit of cumulative probability
logit_crob <- qlogis(crob)
# Logit cumulative probability after accounting for the effect of x
logit_crobx <- sapply(x, function(value){logit_crob-(b*value)})
#Transform to the cumulative probability
crobx <- plogis(logit_crobx)
# Convert to probability
probx <- apply(crobx, 2, function(col){c(col[1],diff(col))})
y <- apply(probx, 2, function(p){rmultinom(1, 1, p)})
y <- apply(y, 2, function(col){which(col==1)})
y <- ordered(y)
data <- data.frame(x=x, y=y)
%%R
s2 <- polr(y~x, data=data, Hess=T)
summary(s2)
%%R
#test the assumption of proportional odds which should be met
s2a <- glm(I(as.numeric(y)>=2) ~ x, family="binomial", data=data)
s2b <- glm(I(as.numeric(y)>=3) ~ x, family="binomial", data=data)
dwplot(list(s2a,s2b)) +
geom_vline(xintercept=0, lty=2)+
theme_classic()
Note that VGAM uses a different model parameterisation, see this for a brief mathematical summary.
%%R
library(VGAM)
v2 <- vglm(y ~ x, family=propodds, data=data)
summary(v2)
There are probably several ways to do this but the easiest way seems to me to be have a second round of effects on y. Level 2 y
s have a probability of going to level 3 dependant on their value of x
. Therefore, the effect of x
depends on y
and the proportional odds assumption is violated.
%%R
# If y==2 they get a second chance to be 3
# The probability of success for this second round is plogis(b*x)
data$z <- data$y
data$z[data$y==2] <- 2 + rbinom(sum(data$y==2), 1, plogis(b*data$x[data$y==2]))
%%R
s3 <- polr(z~x, data=data, Hess=T)
summary(s3)
%%R
#test the assumption of proportional odds which should be met
s3a <- glm(I(as.numeric(z)>=2) ~ x, family="binomial", data=data)
s3b <- glm(I(as.numeric(z)>=3) ~ x, family="binomial", data=data)
dwplot(list(s3a,s3b)) +
geom_vline(xintercept=0, lty=2)+
theme_classic()
Proportional odds models predict the log odds ratio (logit) of the cumulative probability of belonging to group i or lower. Each category has its own intercept and they are not summed for subsequent categories. The model assumes that the effect of coefficients is the same regardless of category. We can validate this assumption by fitting models to predict probability of belonging to group i or higher and comparing the estimated coefficients.
Final figure illustrates how the probabilities of the three classes change with x
in model s2
.
%%R
pd <- data.frame(x=seq(-10,10, length.out=999))
pd <- cbind(pd, predict(s2, newdata=pd, type="probs"))
names(pd)[2:4] <- paste0("p", names(pd)[2:4])
ggplot(pd, aes(x=x, y=p1))+
geom_line(colour="#FFE300", size=3, se=F)+
geom_line(aes(y=p2), colour="#FF8F00", size=3, se=F)+
geom_line(aes(y=p3), colour="#FF3600", size=3, se=F)+
ylab("Probability")+
theme_classic()