Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differences between get_predicted and get_predicted_ci for mixed models #797

Open
fhui28 opened this issue Aug 16, 2023 · 3 comments · Fixed by #814
Open

Differences between get_predicted and get_predicted_ci for mixed models #797

fhui28 opened this issue Aug 16, 2023 · 3 comments · Fixed by #814
Assignees
Labels
3 investigators ❔❓ Need to look further into this issue

Comments

@fhui28
Copy link

fhui28 commented Aug 16, 2023

This is something that came up during a visit to @bbolker with @emitanaka. Basically, for (G)LMMs, the standard errors returned by get_predicted and get_predicted_ci differ.

An example for LMMs first:

library(tidyverse)
library(lme4)
library(glmmTMB)
library(insight)

m1 <- lmer(Petal.Width ~ Sepal.Length + (1 | Species), iris)
get_predicted(m1, ci = 0.95) %>% as.data.frame() %>% head
predictions <- predict(m1)
get_predicted_ci(m1, predictions) %>% as.data.frame() %>% head 
#' Matches. Currently, both of these standard errors are actually wrong because they 
#' do not account for the uncertainty in the predicted random effects. 
#' This is an lme4 thing though, and Ben Bolker among others is aware of this. 


m2 <- glmmTMB(Petal.Width ~ Sepal.Length + (1 | Species), iris, REML = TRUE)
get_predicted(m2, ci = 0.95) %>% as.data.frame() %>% head 
predictions <- predict(m2)
get_predicted_ci(m2, predictions) %>% as.data.frame() %>% head 
#' Does not match. The former gets the TMB prediction error,  which is better. 
#' But the latter reverts back to the lme4 incorrect prediction error. 

For the issue above, and with respect to the insights package, my bigger concern is the inconsistency in modem m2. As mentioned, the issue why the prediction errors differ between lmer and glmmTMB is something @bbolker is already aware of; see this github issue from glmmTMB

Things gets a bit weirder for GLMMs though, and this is where I will throw my hands up in the air (even higher!)

m1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd), data = cbpp, family = binomial)
get_predicted(m1, ci = 0.95) %>% as.data.frame() %>% head
predictions <- predict(m1, type = "response")
get_predicted_ci(m1, predictions) %>% as.data.frame() %>% head 
#' These two do not match (the latter is conisderably larger); not sure why.


m2 <- glmmTMB(cbind(incidence, size - incidence) ~ period + (1 | herd), data = cbpp, family = binomial)
get_predicted(m2, ci = 0.95) %>% as.data.frame() %>% head
predictions <- predict(m2, type = "response")
get_predicted_ci(m2, predictions) %>% as.data.frame() %>% head 
#' These two also do not match (the latter is conisderably larger); not sure why. 
#' Also, note the latter gets very similar results to get_predicted_ci(m1, predictions), 
#' while the former is not the same as get_predicted(m1, ci = 0.95)
@strengejacke strengejacke added the 3 investigators ❔❓ Need to look further into this issue label Aug 21, 2023
@strengejacke
Copy link
Member

strengejacke commented Sep 26, 2023

get_predicted_ci() is mostly (internally) used for those model classes that do not return any standard errors for predictions (i.e. when there's nothing like se.fit = TRUE). This also applies to merMod-models. For these models, we use something like:

M <- model.matrix(m1)
V <- vcov(m1)
se <- sqrt(diag(M %*% V %*% t(M)))

based on https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#lme4. Thus, when calling get_predicted_ci(), standard errors are always based on the above formula. get_predicted() returns standard errors from predict() where possible. That's why get_predicted.glmmTMB() relies on standard errors returned by predict.glmmTMB(), but when calling get_predicted_ci() - even for a glmmTMB model - the above formula is used. That explains the difference between standard errors for glmmTMB models returned by get_predicted() or get_predicted_ci().

I think this should be clearly documented, and that get_predicted() is preferred (we may even hide get_predicted_ci() from the users?) in order to get SE and CI.

@strengejacke
Copy link
Member

Tagging @bbolker just FYI.

@strengejacke strengejacke self-assigned this Sep 26, 2023
strengejacke added a commit that referenced this issue Sep 27, 2023
…ls (#814)

* Differences between get_predicted and get_predicted_ci for mixed models
Fixes #797

* news, version

* docs

* this is intentional, turn styler off

* typos
@strengejacke strengejacke reopened this Sep 27, 2023
@strengejacke
Copy link
Member

Not sure if the documentation I added is sufficient, so I'll keep this issue open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 investigators ❔❓ Need to look further into this issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants