Plotting normality of residuals: Scaling issues / differences #335

mutlusun · 2024-03-27T22:19:34Z

Dear package authors,

Thank you very much for all your work and effort for the easystats packages. I like and use them a lot and they make my life definitely easier.

Recently, I stumbled upon a scaling "issue" with a data set consisting of two time points. Here is a small example for examination:

set.seed(1234)
x <- rnorm(100)
y <- rnorm(100, mean = x, sd = .5)
myd <- data.frame(x, y)

When fitting a regression model to this data and plotting the residuals, I get the impression that something is off here:

fit <- lm(y ~ x, data = myd)
tmp <- performance::check_normality(fit)
plot(tmp, type = "qq")

However, plotting in a different way doesn't show such a strong deviation:

plot(tmp, type = "qq", detrend = FALSE)

First of all, I'm not sure whether these differences in visualization are an "issue" of this package. When looking at the y-axis in the first plot, it is clear that the deviations don't have a strong magnitude. However, I still find it a bit strange that the same function suggests different interpretations.

Would it be an option to set certain minimal limits for the plot in the case of detrend = TRUE? I'm happy to implement this but wanted to ask for your thoughts before.

Note: This is not such an issue with qqplotr installed as the bands are often very wide and the interval wider.

The text was updated successfully, but these errors were encountered:

bwiernik · 2024-03-27T22:32:44Z

The second plot shows a very strong deviation. Those tails are much wider than expected under a normal distribution. The first plot better shows the amount of the deviation by removing the angular axis.

strengejacke · 2024-03-27T22:33:38Z

This issue is related to easystats/performance#643 (comment).
When you look at the examples (https://easystats.github.io/see/articles/performance.html#qq-plot), you see that detrend = TRUE often works well. But sometimes, due to the different limits of the y-axis, the plots look rather different (although they are not).

I still find it a bit strange that the same function suggests different interpretations.

It does not. It's just the scale that "zooms in" for detrend = TRUE, and actually, for detrend = TRUE, the deviation is usually easier to see.

That said, it may be surprising to users to see that large differences. Maybe we find a way to adjust the y-axis, so the differences are not that large.

bwiernik · 2024-03-27T22:37:11Z

We set the non-detrended y axis limits to fixed values if I recall correctly. We should do the same with appropriate values for the detrended y axis.

strengejacke · 2024-03-27T22:40:59Z

No, I think for both plots, we don't set limits, thus ggplot2 decides which range to use, based on the range of the data.

see/R/plot.check_normality.R

Line 214 in 1a090e1

.plot_diag_qq <- function(x,

The question is if it's possible to find reasonable y-limits for the detrended plot?

mutlusun · 2024-03-28T14:32:04Z

Thank you for your replies!

Sorry, for being sloppy in my description above. As both of you pointed out, the tails show a strong deviation and both plots show the same thing but the scaling is different. I am aware of that. I was, for example, more concerned with the range of -1 to 1 of the standard normal distribution quantiles which could be interpreted differently between both plots (when not paying attention to the limits of the y-axes).

The question is if it's possible to find reasonable y-limits for the detrended plot?

I have thought about that a while but I think it is hard to come up with reasonable y-limits. Is there a good rule of thumb for deciding when there is a deviation? (This way minimum y-limits could be set. But, I didn't find one and it is probably misleading to decide for one.)

As an alternative: Would it be possible to set qqplotr as a hard dependency for this plot? This could be an easy way to increase the y-limits a bit.

ClaudioZandonella · 2024-05-22T09:37:36Z

Adding on this topic, I found these results interesting.

qqplot with detrend = TRUE

qqplot with detrend = FALSE

They provide completely opposite results 🤯. This makes interpretation pretty troublesome.
Running shapiro.test() on the residuals I obtain p-value = 0.8866.

Do you have any idea on how to interpret these opposite results?

This is not just a toy example, I initially obtained these results from a real study. I am not sure if these strange results are related to the characteristics of the study: small fixed effect (Marginal R2 ~ 12%), but very large random effect (conditional R2 ~ 90%), so the remaining residual variance is really small.

Below is a minimal working example to simulate data and reproduce the plots.

Click me - Snippet Code

library('tidyverse')
library('ggplot2')
library('lme4')
set.seed(2024)

# create data
df <- expand_grid(
  id = 1:20,
  trial = 1:20,
  condition = 0:1
  ) %>% 
  mutate(
    across(c(id, trial, condition), factor)
  )

contrasts(df$condition) <- contr.sum(2)

# generate random effect
ran_ef <- rnorm(20, mean = 0, sd =.45)

# get model matrix random effects
my_model <- "y ~ condition + (1 | id)"
df$y <- 1 # lFormula need y
foo <- lFormula(eval(my_model), df)
Z <- t(as.matrix(foo$reTrms$Zt))

# get model matrix fixed effects
V <- model.matrix(~ condition, data = df)

# simulate y
df$y <- as.vector(V %*% c(1.4, .15) + Z %*% ran_ef + rnorm(nrow(df), mean = 0, sd = .15))

# plot data
ggplot(df) +
  geom_point(aes(x = id, y = y, color = condition), 
             alpha = .6, position = position_jitter(width = .2))

# fit model
my_fit <- lmer(y ~ condition + (1 | id), data = df)
performance::check_model(my_fit)
performance::r2(my_fit)

performance::check_normality(my_fit)
shapiro.test(residuals(my_fit))
qqnorm(residuals(my_fit))

tmp <- performance::check_normality(my_fit)
plot(tmp, type = "qq")
plot(tmp, type = "qq", detrend = FALSE)

By the way, thank you for the amazing package❤️❤️

strengejacke · 2024-05-22T09:58:33Z

It seems like qqplotr is not installed? I get completely different plots:

library(easystats)
#> # Attaching packages: easystats 0.7.1.2
#> ✔ bayestestR  0.13.2.1    ✔ correlation 0.8.4.2  
#> ✔ datawizard  0.10.0.4    ✔ effectsize  0.8.8    
#> ✔ insight     0.19.11.2   ✔ modelbased  0.8.7    
#> ✔ performance 0.11.0.9    ✔ parameters  0.21.7.1 
#> ✔ report      0.5.8.2     ✔ see         0.8.4.1

library(lme4)
#> Loading required package: Matrix

set.seed(2024)

# create data
df <- expand.grid(
  id = 1:20,
  trial = 1:20,
  condition = 0:1
) |>
  to_factor()

contrasts(df$condition) <- contr.sum(2)

# generate random effect
ran_ef <- rnorm(20, mean = 0, sd = .45)

# get model matrix random effects
my_model <- "y ~ condition + (1 | id)"
df$y <- 1 # lFormula need y
foo <- lFormula(eval(my_model), df)
Z <- t(as.matrix(foo$reTrms$Zt))

# get model matrix fixed effects
V <- model.matrix(~condition, data = df)

# simulate y
df$y <- as.vector(V %*% c(1.4, .15) + Z %*% ran_ef + rnorm(nrow(df), mean = 0, sd = .15))

# fit model
my_fit <- lmer(y ~ condition + (1 | id), data = df)
tmp <- performance::check_normality(my_fit)
plot(tmp, type = "qq")

plot(tmp, type = "qq", detrend = FALSE)

^{Created on 2024-05-22 with reprex v2.1.0}

ClaudioZandonella · 2024-05-22T10:07:47Z

Yes, thank you for the clarification!

After installing qqplotr I obtain more reasonable (normal😅) results

Before I obtained the message

For confidence bands, please install 'qqplotr'.

I thought it was only an "additional" feature, not a requirement.

Thank you for the immediate help!

strengejacke · 2024-05-22T11:13:04Z

I thought it was only an "additional" feature, not a requirement.

Yes, usually, it should indeed only be for CIs. But qqplotr has a detrend-argument that does the transformation of the plot, and w/o that package, we have written our own detrend-functions, which obviously doesn't work well in all situations.

@mattansb any ideas?

mattansb · 2024-05-22T11:20:36Z

The CIs are an additional feature enabled by qqplotr, but they really do make possible the interpretation of the the detrended QQ plot (and the non-detrended plot as well!)

strengejacke · 2024-05-22T11:28:25Z

I meant the own detrend-code:

see/R/plot.check_normality.R

Lines 281 to 304 in 7dc72a6

    
           qq_stuff <- list( 
        
             if (detrend) { 
        
               ggplot2::geom_hline( 
        
                 yintercept = 0, 
        
                 linewidth = size_line, 
        
                 colour = colors[1], 
        
                 na.rm = TRUE 
        
               ) 
        
             } else { 
        
               ggplot2::geom_qq_line( 
        
                 linewidth = size_line, 
        
                 colour = colors[1], 
        
                 na.rm = TRUE 
        
               ) 
        
             }, 
        
             ggplot2::geom_qq( 
        
               mapping = if (detrend) ggplot2::aes(y = ggplot2::after_stat(.data$sample) - ggplot2::after_stat(.data$theoretical)), 
        
               shape = 16, 
        
               na.rm = TRUE, 
        
               stroke = 0, 
        
               size = size_point, 
        
               colour = colors[2] # "#2c3e50" 
        
             ) 
        
           )

Any ideas, why the above plot (#335 (comment)) deviates that much when qqplotr is not installed?

mattansb · 2024-05-22T12:11:01Z

Hmmmmm.... there's some scaling issue here. I'll look into it.

mattansb · 2024-05-22T12:20:09Z

This isn't 100%, but it's very close.........

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.3.3

g <- rnorm(20, sd = 15)

old

ggplot() + 
  stat_qq(aes(sample = g,
              y = after_stat(sample - theoretical),
              x = after_stat(theoretical)))

fix?

# Re scale?
N <- length(g)
SD <- sd(g) * sqrt((N - 1) / N)

ggplot() + 
  stat_qq(aes(sample = g,
              y = after_stat(sample - theoretical * SD),
              x = after_stat(theoretical * SD)))

similiar to qqplotr

ggplot() + 
  qqplotr::stat_qq_point(aes(sample = g), detrend = TRUE)

^{Created on 2024-05-22 with reprex v2.1.0}

strengejacke · 2024-05-22T14:36:49Z

Thanks, I'll revise the code in see.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Plotting normality of residuals: Scaling issues / differences #335

Plotting normality of residuals: Scaling issues / differences #335

mutlusun commented Mar 27, 2024

bwiernik commented Mar 27, 2024

strengejacke commented Mar 27, 2024 •

edited

bwiernik commented Mar 27, 2024

strengejacke commented Mar 27, 2024 •

edited

mutlusun commented Mar 28, 2024

ClaudioZandonella commented May 22, 2024 •

edited

strengejacke commented May 22, 2024

ClaudioZandonella commented May 22, 2024

strengejacke commented May 22, 2024 •

edited

mattansb commented May 22, 2024

strengejacke commented May 22, 2024 •

edited

mattansb commented May 22, 2024

mattansb commented May 22, 2024

strengejacke commented May 22, 2024 •

edited

Plotting normality of residuals: Scaling issues / differences #335

Plotting normality of residuals: Scaling issues / differences #335

Comments

mutlusun commented Mar 27, 2024

bwiernik commented Mar 27, 2024

strengejacke commented Mar 27, 2024 • edited

bwiernik commented Mar 27, 2024

strengejacke commented Mar 27, 2024 • edited

mutlusun commented Mar 28, 2024

ClaudioZandonella commented May 22, 2024 • edited

strengejacke commented May 22, 2024

ClaudioZandonella commented May 22, 2024

strengejacke commented May 22, 2024 • edited

mattansb commented May 22, 2024

strengejacke commented May 22, 2024 • edited

mattansb commented May 22, 2024

mattansb commented May 22, 2024

strengejacke commented May 22, 2024 • edited

strengejacke commented Mar 27, 2024 •

edited

strengejacke commented Mar 27, 2024 •

edited

ClaudioZandonella commented May 22, 2024 •

edited

strengejacke commented May 22, 2024 •

edited

strengejacke commented May 22, 2024 •

edited

strengejacke commented May 22, 2024 •

edited