`get_data.coxph` returns data without labels #790

iago-pssjd · 2023-07-08T15:41:41Z

`get_data.coxph()` returns data without variable labels. As a consequence, when it is used by `parameters::parameters()`, the `pretty_labels` attribute is not useful at all.

Indeed, look at the function in insight/R/get_data.R (lines 1827 to 1852 in a95325c):

```r
get_data.coxph <- function(x, source = "environment", verbose = TRUE, ...) {
  # try to recover data from environment
  model_data <- .get_data_from_environment(x, source = source, verbose = verbose, ...)
  if (!is.null(model_data)) {
    return(model_data)
  }
  # fall back to extract data from model frame
  # first try, parent frame
  dat <- tryCatch(
    {
      mf <- .recover_data_from_environment(x)
      mf <- .prepare_get_data(x, stats::na.omit(mf), verbose = FALSE)
    },
    error = function(x) NULL
  )
  # second try, default extractor. Less good because of coercion to other types
  if (is.null(dat)) {
    # second try, global env
    dat <- get_data.default(x, source = source, verbose = verbose, ...)
  }
  dat
}
```

The issue happens in `.prepare_get_data()`, where labels are removed from the variables.


strengejacke · 2023-07-09T16:49:10Z

Labels are not removed in general inside `.prepare_get_data()`; maybe there's a specific issue with coxph models. Will look into this.

```r
library(easystats)
#> # Attaching packages: easystats 0.6.0.10
#> ✔ bayestestR  0.13.1.2   ✔ correlation 0.8.4   
#> ✔ datawizard  0.8.0.3    ✔ effectsize  0.8.3.11
#> ✔ insight     0.19.3     ✔ modelbased  0.8.6.3 
#> ✔ performance 0.10.4.1   ✔ parameters  0.21.1.2
#> ✔ report      0.5.7.9    ✔ see         0.8.0.2
data(efc)
m <- lm(neg_c_7 ~ e42dep, data = efc)
str(get_data(m))
#> 'data.frame':    94 obs. of  2 variables:
#>  $ neg_c_7: num  12 20 11 12 19 15 11 15 10 28 ...
#>   ..- attr(*, "label")= chr "Negative impact with 7 items"
#>  $ e42dep : Factor w/ 4 levels "1","2","3","4": 3 3 3 4 4 4 4 4 4 4 ...
#>   ..- attr(*, "label")= chr "elder's dependency"
#>   ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
#>   .. ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"
str(get_data(m, source = "mf"))
#> 'data.frame':    94 obs. of  2 variables:
#>  $ neg_c_7: num  12 20 11 12 19 15 11 15 10 28 ...
#>   ..- attr(*, "label")= chr "Negative impact with 7 items"
#>  $ e42dep : Factor w/ 4 levels "1","2","3","4": 3 3 3 4 4 4 4 4 4 4 ...
#>   ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
#>   .. ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"
#>   ..- attr(*, "label")= chr "elder's dependency"
#>  - attr(*, "terms")=Classes 'terms', 'formula'  language neg_c_7 ~ e42dep
#>   .. ..- attr(*, "variables")= language list(neg_c_7, e42dep)
#>   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
#>   .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. ..$ : chr [1:2] "neg_c_7" "e42dep"
#>   .. .. .. ..$ : chr "e42dep"
#>   .. ..- attr(*, "term.labels")= chr "e42dep"
#>   .. ..- attr(*, "order")= int 1
#>   .. ..- attr(*, "intercept")= int 1
#>   .. ..- attr(*, "response")= int 1
#>   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>   .. ..- attr(*, "predvars")= language list(neg_c_7, e42dep)
#>   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "factor"
#>   .. .. ..- attr(*, "names")= chr [1:2] "neg_c_7" "e42dep"
#>  - attr(*, "na.action")= 'omit' Named int [1:6] 4 27 33 46 58 97
#>   ..- attr(*, "names")= chr [1:6] "4" "27" "33" "46" ...
#>  - attr(*, "is_subset")= logi FALSE
```

Created on 2023-07-09 with reprex v2.0.2

iago-pssjd · 2023-07-16T13:22:48Z

I should point out that `get_data()` is called with `source = "mf"`, since that is what is used here:

https://github.com/easystats/parameters/blob/71a5271a3f90c4707f67e5d2b5b07bd458ffe94b/R/format_parameters.R#L364-L373

(This code is called by `parameters:::.add_model_parameters_attributes()`, which in turn is called in https://github.com/easystats/parameters/blob/71a5271a3f90c4707f67e5d2b5b07bd458ffe94b/R/1_model_parameters.R#L616-L631.)

For a minimal example:

```r
library(insight)
library(survival)
dat_regression_test <- data.frame(
    time = c(4, 3, 1, 1, 2, 2, 3),
    status = c(1, 1, 1, 0, 1, 1, 0),
    x = c(0, 2, 1, 1, 1, 0, 0),
    sex = c(0, 0, 0, 0, 1, 1, 1)
)
attr(dat_regression_test$x, "label") <- "Pred"
mod <- survival::coxph(Surv(time, status) ~ x + strata(sex),
                       data = dat_regression_test,
                       ties = "breslow"
)

str(get_data(mod, source = "mf"))
#> 'data.frame':	7 obs. of  4 variables:
#>  $ time  : num  4 3 1 1 2 2 3
#>  $ status: num  1 1 1 0 1 1 0
#>  $ x     : num  0 2 1 1 1 0 0
#>  $ sex   : num  0 0 0 0 1 1 1
#>  - attr(*, "is_subset")= logi FALSE
```

For your example, `str(parameters(m))` includes

```
 - attr(*, "pretty_labels")= Named chr [1:4] "(Intercept)" "elder's dependency [slightly dependent]" "elder's dependency [moderately dependent]" "elder's dependency [severely dependent]"
```

However, for `str(parameters(mod))` it is only

```
 - attr(*, "pretty_labels")= Named chr "x"
  ..- attr(*, "names")= chr "x"
```

iago-pssjd · 2023-07-29T22:35:13Z

Maybe the issue is that `get_data.coxph()` passes the model frame through `stats::na.omit()` before handing it to `.prepare_get_data()` (line 1840), and `na.omit()` removes all labels.
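A minimal base R check of this hypothesis, on toy data and without insight: the data frame method of `stats::na.omit()` subsets the rows with `[`, and subsetting a plain numeric vector keeps only its names, so a `"label"` attribute is lost even when no row is actually removed.

```r
# Toy data: a labelled numeric column with one missing value
dat <- data.frame(x = c(0, 2, 1, NA), sex = c(0, 0, 1, 1))
attr(dat$x, "label") <- "Pred"
attr(dat$x, "label")      # "Pred"

# na.omit() subsets the rows, which drops the attribute
cleaned <- stats::na.omit(dat)
attr(cleaned$x, "label")  # NULL

# The same happens when there are no missing values at all
dat2 <- data.frame(x = c(0, 2, 1))
attr(dat2$x, "label") <- "Pred"
attr(stats::na.omit(dat2)$x, "label")  # NULL
```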

iago-pssjd · 2023-08-10T13:32:27Z

So, @strengejacke, why do some of the `get_data()` methods call `stats::na.omit()` inside `.prepare_get_data()` while others do not? Is there an alternative?

strengejacke · 2023-08-14T08:38:55Z

The original idea of `get_data()` was to retrieve the data that was used to fit the model, with the same number of observations (i.e., with missing values removed). Since then, because there are so many edge cases, and because missing values don't need to be removed for updating the model or calculating predictions, the default is now to retrieve the data from the environment, i.e., the original data. When this doesn't work, `get_data()` falls back to retrieving the data from the model frame.
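The difference between the two sources can be sketched in base R alone (a toy `lm()` fit rather than `coxph()`, and no insight calls): the original data keeps the row with a missing value, whereas the model frame, built with the default `na.action = na.omit`, contains only the rows actually used in the fit.

```r
# Toy data with one missing predictor value
dat <- data.frame(y = c(1, 2, 3, 4, 5), x = c(1, NA, 3, 4, 6))
fit <- lm(y ~ x, data = dat)

# Original data (what retrieval "from the environment" returns): NA row included
n_original <- nrow(dat)
# Model frame: only the complete cases used to fit the model
n_model_frame <- nrow(stats::model.frame(fit))

c(original = n_original, model_frame = n_model_frame)
```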

strengejacke · 2023-08-14T08:40:23Z

> However, for str(parameters(mod))

Yes, but that data isn't labelled, so no surprise here?

iago-pssjd · 2023-08-14T08:58:53Z

@strengejacke The issue is that `stats::na.omit()` removes the labels. Replacing it with `tidyr::drop_na()` solves the issue, but I know you avoid extra dependencies, and I did not find any other base R way to remove missing values while keeping the labels (other than copying the labels and reassigning them after removing the missing values).
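The copy-and-reassign workaround mentioned above could look roughly like this. It is a sketch only: `na_omit_keep_labels()` is a hypothetical helper, not part of insight or base R, and it restores only the `"label"` and `"labels"` attributes, leaving everything else exactly as `stats::na.omit()` returns it.

```r
# Hypothetical helper (not in insight): like stats::na.omit(), but restores
# the "label"/"labels" attributes that the row subset drops from each column.
na_omit_keep_labels <- function(data, keep = c("label", "labels")) {
  # remember the selected attributes of every column before subsetting
  saved <- lapply(data, function(col) {
    a <- attributes(col)
    a[intersect(keep, names(a))]
  })
  out <- stats::na.omit(data)
  # reassign the remembered attributes to the subsetted columns
  for (col in names(out)) {
    for (nm in names(saved[[col]])) {
      attr(out[[col]], nm) <- saved[[col]][[nm]]
    }
  }
  out
}

dat <- data.frame(x = c(0, 2, NA, 1), sex = c(0, 1, 1, 0))
attr(dat$x, "label") <- "Pred"
res <- na_omit_keep_labels(dat)
attr(res$x, "label")  # "Pred"
```

Restricting the restore to named label attributes avoids copying structural attributes such as `dim` or `names`, which must change when rows are dropped.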

> Yes, but that data isn't labelled, so no surprise here?

Wrong, it is labelled, since I had previously done

```r
attr(dat_regression_test$x, "label") <- "Pred"
```

strengejacke added the "3 investigators ❔❓" label (Need to look further into this issue) on Jul 9, 2023.
