Plotting normality of residuals: Scaling issues / differences #335
The second plot shows a very strong deviation: the tails are much wider than expected under a normal distribution. The first plot shows the amount of deviation better because it removes the angled reference line.
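Both plots are built from the same numbers; the detrended version only subtracts the reference line before plotting. A minimal base-R sketch, assuming standardized residuals and the identity line `y = x` as the reference (the helper name `qq_points()` is hypothetical, not part of `see`):

```r
# Hypothetical helper (not part of `see`): coordinates for a standard and a
# detrended normal QQ plot of a numeric vector `x`.
qq_points <- function(x) {
  x <- sort(x)
  n <- length(x)
  # theoretical standard-normal quantiles at the plotting positions ppoints(n)
  theoretical <- qnorm(ppoints(n))
  # standardize the sample so both axes are on the same (z) scale
  z <- (x - mean(x)) / sd(x)
  data.frame(
    theoretical = theoretical,
    sample = z,
    # detrended: vertical distance of each point from the line y = x
    detrended = z - theoretical
  )
}

pts <- qq_points(c(1, 2, 3, 4, 5))
```

Plotting `sample` against `theoretical` gives the usual QQ plot, while plotting `detrended` against `theoretical` gives the detrended one. The y-axis of the latter spans only the deviations, which is why it looks "zoomed in".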
This issue is related to easystats/performance#643 (comment).
It does not. It's just the scale that "zooms in". That said, it may be surprising for users to see such large differences. Maybe we can find a way to adjust the y-axis so that the differences are not that large.
We set the y-axis limits of the non-detrended plot to fixed values, if I recall correctly. We should do the same with appropriate values for the y-axis of the detrended plot.
No, I think we don't set limits for either plot, so ggplot2 decides which range to use based on the range of the data (see line 214 in commit 1a090e1).
The question is whether it's possible to find reasonable y-limits for the detrended plot.
Thank you for your replies! Sorry for being sloppy in my description above. As both of you pointed out, the tails show a strong deviation, and both plots show the same thing with different scaling. I am aware of that. I was more concerned with, for example, the range from -1 to 1 of the standard normal quantiles, which could be interpreted differently in the two plots (when not paying attention to the y-axis limits).
I have thought about this for a while, but I think it is hard to come up with reasonable y-limits. Is there a good rule of thumb for deciding when there is a deviation? (That way, minimum y-limits could be set. But I didn't find one, and it is probably misleading to decide on one.) As an alternative: would it be possible to set …
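The "minimum y-limits" idea could look roughly like the sketch below: symmetric limits around zero that grow with the data but never shrink below a floor. The helper `detrend_ylim()` and the default floor of 1 (in standardized units) are hypothetical choices for illustration, not a recommended rule of thumb and not existing `see` behaviour.

```r
# Hypothetical helper (not part of `see`): symmetric y-limits for a
# detrended QQ plot that never shrink below +/- `min_half_width`.
# The default of 1 is an arbitrary assumption, in standardized units.
detrend_ylim <- function(detrended, min_half_width = 1) {
  # the largest absolute deviation wins, unless it is below the floor
  half <- max(abs(detrended), min_half_width, na.rm = TRUE)
  c(-half, half)
}

detrend_ylim(c(-0.2, 0.3)) # small deviations: the floor applies
detrend_ylim(c(-3, 2))     # large deviations: limits follow the data
```

In a ggplot2 plot, the result could be passed to `coord_cartesian(ylim = ...)`, which zooms without dropping points. The open question from the thread remains: any fixed floor encodes a judgment about what counts as a "large" deviation.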
It seems like …

```r
library(easystats)
#> # Attaching packages: easystats 0.7.1.2
#> ✔ bayestestR  0.13.2.1   ✔ correlation 0.8.4.2
#> ✔ datawizard  0.10.0.4   ✔ effectsize  0.8.8
#> ✔ insight     0.19.11.2  ✔ modelbased  0.8.7
#> ✔ performance 0.11.0.9   ✔ parameters  0.21.7.1
#> ✔ report      0.5.8.2    ✔ see         0.8.4.1
library(lme4)
#> Loading required package: Matrix

set.seed(2024)

# create data: 20 ids x 20 trials x 2 conditions, all columns as factors
df <- expand.grid(
  id = 1:20,
  trial = 1:20,
  condition = 0:1
) |>
  to_factor()
contrasts(df$condition) <- contr.sum(2)

# generate random intercepts (one per id)
ran_ef <- rnorm(20, mean = 0, sd = .45)

# get model matrix for the random effects
my_model <- "y ~ condition + (1 | id)"
df$y <- 1 # lFormula needs a response y
foo <- lFormula(as.formula(my_model), df)
Z <- t(as.matrix(foo$reTrms$Zt))

# get model matrix for the fixed effects
V <- model.matrix(~condition, data = df)

# simulate y
df$y <- as.vector(V %*% c(1.4, .15) + Z %*% ran_ef + rnorm(nrow(df), mean = 0, sd = .15))

# fit model
my_fit <- lmer(y ~ condition + (1 | id), data = df)
tmp <- performance::check_normality(my_fit)

plot(tmp, type = "qq")
plot(tmp, type = "qq", detrend = FALSE)
```

Created on 2024-05-22 with reprex v2.1.0
Yes, thank you for the clarification! After installing … Before that, I obtained the message …
I thought it was only an "additional" feature, not a requirement. Thank you for the immediate help!
Yes, usually it should indeed only be needed for the CIs. But qqplotr has a `detrend` argument that performs the transformation of the plot; without that package, we use our own detrend functions, which obviously don't work well in all situations. @mattansb, any ideas?
The CIs are an additional feature enabled by qqplotr, but they are really what makes it possible to interpret the detrended QQ plot (and the non-detrended plot as well!).
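To illustrate why bands matter, here is a simple pointwise simulation envelope for a detrended normal QQ plot: simulate many normal samples of the same size, detrend each one the same way as the data, and take pointwise quantiles. This is a swapped-in technique, not qqplotr's implementation (qqplotr offers its own band types), and `detrend_envelope()` is a hypothetical name.

```r
# Hypothetical sketch (not qqplotr's method): pointwise parametric-simulation
# envelope for a detrended normal QQ plot of a numeric vector `x`.
detrend_envelope <- function(x, n_sim = 1000, level = 0.95) {
  n <- length(x)
  theoretical <- qnorm(ppoints(n))
  # standardize and sort, so observed and simulated samples are comparable
  standardize <- function(v) sort((v - mean(v)) / sd(v))
  observed <- standardize(x) - theoretical
  # n x n_sim matrix: each column is one simulated normal sample, detrended
  sims <- replicate(n_sim, standardize(rnorm(n)) - theoretical)
  alpha <- (1 - level) / 2
  data.frame(
    theoretical = theoretical,
    detrended = observed,
    # pointwise quantiles across simulations, one pair per plotting position
    lower = apply(sims, 1, quantile, probs = alpha),
    upper = apply(sims, 1, quantile, probs = 1 - alpha)
  )
}

set.seed(1)
env <- detrend_envelope(rnorm(30), n_sim = 200)
```

Points falling outside `lower`/`upper` suggest a deviation larger than sampling noise alone would produce at that sample size, regardless of how far the y-axis is zoomed in. Pointwise bands are narrower than simultaneous ones, so a few excursions are expected even for normal data.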
I meant our own detrend code (lines 281 to 304 in commit 7dc72a6).
Any ideas why the plot above (#335 (comment)) deviates so much when qqplotr is not installed?
Hmmmmm.... there's some scaling issue here. I'll look into it.
This isn't 100%, but it's very close.........

```r
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.3.3

g <- rnorm(20, sd = 15)

# old
ggplot() +
  stat_qq(aes(sample = g,
              y = after_stat(sample - theoretical),
              x = after_stat(theoretical)))

# fix? Re-scale?
N <- length(g)
SD <- sd(g) * sqrt((N - 1) / N)
ggplot() +
  stat_qq(aes(sample = g,
              y = after_stat(sample - theoretical * SD),
              x = after_stat(theoretical * SD)))

# similar to qqplotr
ggplot() +
  qqplotr::stat_qq_point(aes(sample = g), detrend = TRUE)
```

Created on 2024-05-22 with reprex v2.1.0
Thanks, I'll revise the code in `see`.
Dear package authors,
Thank you very much for all your work and effort on the easystats packages. I like and use them a lot, and they definitely make my life easier.
Recently, I stumbled upon a scaling "issue" with a data set consisting of two time points. Here is a small example for examination:
When fitting a regression model to this data and plotting the residuals, I get the impression that something is off here:
However, plotting in a different way doesn't show such a strong deviation:
First of all, I'm not sure whether these differences in visualization are an "issue" with this package. Looking at the y-axis of the first plot, it is clear that the deviations are not large in magnitude. However, I still find it a bit strange that the same function suggests different interpretations.
Would it be an option to set certain minimum limits for the plot in the case of `detrend = TRUE`? I'm happy to implement this, but wanted to ask for your thoughts first.

Note: This is less of an issue with `qqplotr` installed, as the bands are often very wide and the interval wider.