Ordinal GLM

David Wells 25/06/2020

Home page

When a response variable is ordinal (discrete but ordered) there are several possible approaches to modelling, see Anath et al 1997. Performing several bivariate analyses looses much of the power. A proportional odds model (AKA cummulative logit) is discussed in this notebook and this stack answer and this blog.

Do note however, that the Anath paper advocate either the cumulative logit or the continuous ratio model based on the assumptions and central question. Cumulative logit is best where the response is some arbitrary grouping and it is overall trends that you are interested in.

Ben Bolker has a post about calculating confidence intervals for ordinal probability predictions here.

Ananth, C. V., & Kleinbaum, D. G. (1997). Regression models for ordinal responses: A review of methods and applications. International Journal of Epidemiology, 26(6), 1323–1333. https://doi.org/10.1093/ije/26.6.1323.

In [1]:
%load_ext rpy2.ipython
from rpy2.robjects import r
In [15]:
%%R
packages <- c("ggplot2", "dplyr", "MASS", "diosR", "dotwhisker", "broom", "VGAM")
lapply(packages, require, character.only=T)
[[1]]
[1] TRUE

[[2]]
[1] TRUE

[[3]]
[1] TRUE

[[4]]
[1] TRUE

[[5]]
[1] TRUE

[[6]]
[1] TRUE

[[7]]
[1] TRUE

In binomial regression the response variable is the log odds of success. In a proportional odds model the response variable is the log odds of a given category or any lower category (hence the alternative model name cummulative logit). A separate intercept is used for each individual category but the same coefficients are used. This demonstrates the assumption of proportional odds, the effect of the covariates do not differ between the response categories.

In [16]:
%%R
data(housing)

m1 <- polr(Sat ~ Infl + Type + Cont, data=housing, weights=Freq)
summary(m1)
Call:
polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq)

Coefficients:
                Value Std. Error t value
InflMedium     0.5664    0.10465   5.412
InflHigh       1.2888    0.12716  10.136
TypeApartment -0.5724    0.11924  -4.800
TypeAtrium    -0.3662    0.15517  -2.360
TypeTerrace   -1.0910    0.15149  -7.202
ContHigh       0.3603    0.09554   3.771

Intercepts:
            Value   Std. Error t value
Low|Medium  -0.4961  0.1248    -3.9739
Medium|High  0.6907  0.1255     5.5049

Residual Deviance: 3479.149 
AIC: 3495.149 

The coefficients are listed just as for any regression in R, the intercepts are slightly different, they are cutpoints along the response variable between the ordinal groups. The intercept used to calculate the cumulative probability of being in the medium or low group is 0.6907. For the low group (or lower but it is the lowest group) the equivalent value is -0.4961. You use either or, they are not summed.

Assumptions

The model assumes that the relationship between all pairs of levels is the same and so there is only one set of parameters. This assumption can be assessed graphically with a parallel slopes plot. I.E. the effect on the log odds ratio of a change in predictor variables should not depend on the response category. If we estimate a separate binary logistic regression for each cut point the coefficients should be the same in each case. The example in the UCLA blog post is equivalent to this but they plot the difference between estimated coefficients, also they are comparing predicted values but it simplifies to comparing coefficients.

Here all of the estimated coefficients are very similar between the models, indicating that the assumption of proportional odds holds.

In [17]:
%%R
m1a <- glm(I(as.numeric(Sat)>=2) ~ Infl + Type + Cont, family="binomial", data=housing, weights=Freq)
m1b <- glm(I(as.numeric(Sat)>=3) ~ Infl + Type + Cont, family="binomial", data=housing, weights=Freq)

#Note that dwplot() doesn't seem to have the correct magnitude of the effects
dwplot(list(m1a, m1b)) +
geom_vline(xintercept=0, lty=2)+
theme_classic()

Try a simulated example

Simulate data with separate probabilities for each class, in this case 0.6, 0.3, 0.1. This is what we will estimate with our model, but we have to calculate the cumulative log odds ourselves to know what the model should be returning. Calculate the cumulative probability of being in any class up to any including the specified class, 0.6, 0.9, 1, and the probability of being in a higher class 0.4, 0.1, 0. The cumulative odds is the ratio of these values 0.6/0.4, 0.9/0.1, and 1/0. Note that the final odds are undefined and ignored because the probability of getting none of the classes is zero. The log of these ratios are the cutoff points which we are estimating as intercepts.

In [5]:
%%R
print(log(0.6/0.4))
print(log(0.9/0.1))
[1] 0.4054651
[1] 2.197225
In [6]:
%%R
#3 levels s, m, l. Probability of each 0.6, 0.3, 0.1
n <- 10000
y <- rmultinom(n, 1, prob=c(.6,.3,.1))
#Convert these into numeric as there's a single draw for each sample
y <-apply(y,2,function(col){which(col==1)})
#Convert to ordered factor
y <- ordered(y)
In [7]:
%%R
s1 <- polr(y~1, Hess=T)
summary(s1)
Call:
polr(formula = y ~ 1, Hess = T)

No coefficients

Intercepts:
    Value   Std. Error t value
1|2  0.4238  0.0205    20.7246
2|3  2.1862  0.0332    65.8754

Residual Deviance: 17918.49 
AIC: 17922.49 

Simulate a coefficient

Simulate a coefficient effect which meets the assumption of proportional odds. There's quite a lot in definingy so lets walkthrough it.

The base probability of each class is prob, crob is the cumulative probability. This is transformed onto the logit scale to make logit_crob. Then we modify the logit cumulative probability of each group based on the covariate x and its coefficient b, this is called logit_crobx. This is transformed back to the cumulative probability scale, crobx. Finally the cumulative probability is converted to probability, giving us probx the probability of a sample belonging to each category based on their value of x. probx is then used to simulate y.

In [8]:
%%R
n <- 10000
x <- rnorm(n)
b <- 2
# Probability of each class .6,.3,.1
prob <- c(0.6,0.3,0.1)
#Cumulative pROBability
crob <- cumsum(prob)
#Logit of cumulative probability
logit_crob <- qlogis(crob)
# Logit cumulative probability after accounting for the effect of x
logit_crobx <- sapply(x, function(value){logit_crob-(b*value)})
#Transform to the cumulative probability
crobx <- plogis(logit_crobx)
# Convert to probability
probx <- apply(crobx, 2, function(col){c(col[1],diff(col))})

y <- apply(probx, 2, function(p){rmultinom(1, 1, p)})
y <- apply(y, 2, function(col){which(col==1)})
y <- ordered(y)
data <- data.frame(x=x, y=y)
In [9]:
%%R
s2 <- polr(y~x, data=data, Hess=T)
summary(s2)
Call:
polr(formula = y ~ x, data = data, Hess = T)

Coefficients:
  Value Std. Error t value
x 1.987    0.03439   57.78

Intercepts:
    Value   Std. Error t value
1|2  0.4208  0.0264    15.9532
2|3  2.2144  0.0363    61.0386

Residual Deviance: 14221.73 
AIC: 14227.73 
In [10]:
%%R
#test the assumption of proportional odds which should be met
s2a <- glm(I(as.numeric(y)>=2) ~ x, family="binomial", data=data)
s2b <- glm(I(as.numeric(y)>=3) ~ x, family="binomial", data=data)

dwplot(list(s2a,s2b)) +
geom_vline(xintercept=0, lty=2)+
theme_classic()

Note that VGAM uses a different model parameterisation, see this for a brief mathematical summary.

In [11]:
%%R
library(VGAM)
v2 <- vglm(y ~ x, family=propodds, data=data)
summary(v2)
Call:
vglm(formula = y ~ x, family = propodds, data = data)

Pearson residuals:
                      Min      1Q  Median       3Q   Max
logitlink(P[Y>=2]) -6.880 -0.5257 -0.1658  0.47935 10.45
logitlink(P[Y>=3]) -7.528 -0.3869 -0.1445 -0.04406 12.16

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1 -0.42076    0.02639  -15.95   <2e-16 ***
(Intercept):2 -2.21437    0.03625  -61.09   <2e-16 ***
x              1.98697    0.03438   57.79   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Names of linear predictors: logitlink(P[Y>=2]), logitlink(P[Y>=3])

Residual deviance: 14221.73 on 19997 degrees of freedom

Log-likelihood: -7110.864 on 19997 degrees of freedom

Number of Fisher scoring iterations: 6 

No Hauck-Donner effect found in any of the estimates


Exponentiated coefficients:
       x 
7.293389 

Violating the proportional odds assumption?

There are probably several ways to do this but the easiest way seems to me to be have a second round of effects on y. Level 2 ys have a probability of going to level 3 dependant on their value of x. Therefore, the effect of x depends on y and the proportional odds assumption is violated.

In [12]:
%%R

# If y==2 they get a second chance to be 3
# The probability of success for this second round is plogis(b*x)
data$z <- data$y
data$z[data$y==2] <- 2 + rbinom(sum(data$y==2), 1, plogis(b*data$x[data$y==2]))
In [13]:
%%R
s3 <- polr(z~x, data=data, Hess=T)
summary(s3)
Call:
polr(formula = z ~ x, data = data, Hess = T)

Coefficients:
  Value Std. Error t value
x 2.156    0.03992      54

Intercepts:
    Value   Std. Error t value
1|2  0.3932  0.0268    14.6571
2|3  1.0172  0.0289    35.1890

Residual Deviance: 12523.85 
AIC: 12529.85 
In [14]:
%%R
#test the assumption of proportional odds which should be met
s3a <- glm(I(as.numeric(z)>=2) ~ x, family="binomial", data=data)
s3b <- glm(I(as.numeric(z)>=3) ~ x, family="binomial", data=data)

dwplot(list(s3a,s3b)) +
geom_vline(xintercept=0, lty=2)+
theme_classic()

Conclusion

Proportional odds models predict the log odds ratio (logit) of the cumulative probability of belonging to group i or lower. Each category has its own intercept and they are not summed for subsequent categories. The model assumes that the effect of coefficients is the same regardless of category. We can validate this assumption by fitting models to predict probability of belonging to group i or higher and comparing the estimated coefficients.

Final figure illustrates how the probabilities of the three classes change with x in model s2.

In [57]:
%%R
pd <- data.frame(x=seq(-10,10, length.out=999))

pd <- cbind(pd, predict(s2, newdata=pd, type="probs"))
names(pd)[2:4] <- paste0("p", names(pd)[2:4])

ggplot(pd, aes(x=x, y=p1))+
geom_line(colour="#FFE300", size=3, se=F)+
geom_line(aes(y=p2), colour="#FF8F00", size=3, se=F)+
geom_line(aes(y=p3), colour="#FF3600", size=3, se=F)+
ylab("Probability")+
theme_classic()
In [ ]: