get_data_extracts() gets less data for rows_distinct() than for col_vals_*() #475

Open
mayeulk opened this issue May 2, 2023 · 1 comment


mayeulk commented May 2, 2023

Description

get_data_extracts() behaves differently for rows_distinct() than for validation functions of the form col_vals_*() and for conjointly().
For rows_distinct(), the returned tibble contains only the tested columns, whereas for the other functions it contains all columns of the table.

Reproducible example

library(pointblank)
library(dplyr)

tbl <- tibble(
  id1 = 1:5,
  id2 = c("A", "b", "C", "D", "E"),
  a = c(8, 8, 8, 5, 9),
  b = c(11, 11:14),
  date = as.Date(paste0("2023-01-0", 1:5))
)

# The columns or set of columns that need to be displayed
# (to help identify the row) along the column with invalid value
id_columns <- c("id1", "id2")

agent <-
  create_agent(
    tbl = tbl,
    tbl_name = "small_table",
    label = "An example."
  ) %>%
  col_vals_gt(columns = vars(a), value = 6) %>%
  col_vals_gt(columns = vars(b), value = 11) %>%
  col_vals_regex(columns = vars(id2), regex = "[A-Z]")   %>%
  rows_distinct(columns = vars(a))  %>%
  rows_distinct(columns = c("b"))  %>%
  rows_distinct(columns = c("a", "b"))  %>%
  conjointly(
    ~ col_vals_lt(., columns = vars(a), value = 7),
    ~ col_vals_gt(., columns = vars(a), value = vars(b)))   %>%
  col_is_date(columns =  "date") %>%
  interrogate()

agent

agent %>% get_agent_report(display_table = FALSE)

# Loop over each step and display a selection of columns from failing rows
for (c_step in seq_len(nrow(get_agent_report(agent, display_table = FALSE)))) {
  print("====================")
  get_agent_x_list(agent, i = c_step)$briefs %>% print
  print(c("current step: ", c_step))
  get_agent_x_list(agent, i = c_step)$columns %>% print
  columns_to_display <- unique(c(id_columns, get_agent_x_list(agent, i = c_step)$columns))
  get_data_extracts(agent, i=c_step)  %>%
    select(columns_to_display)  %>%  # comment this line out to see the result
    print
}

Expected result

For the col_vals_*() and conjointly() functions, get_data_extracts() returns all columns, which makes it possible to select afterwards the columns one wishes to keep for display.
However, for rows_distinct(columns = vars(a)), only the 'a' column remains. I do not know of a way to get the full failing rows with pointblank.
The same issue affects the "CSV" buttons in the report produced by agent %>% get_agent_report(display_table = TRUE).
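
As a possible workaround outside of pointblank, the duplicated rows can be recomputed from the original table so that all columns are kept. The following is only a sketch in base R: full_duplicate_rows() is a hypothetical helper, not a pointblank function, and it assumes the original table is still available in memory.

```r
# Stand-in for the table from the reproducible example (base R only).
tbl <- data.frame(
  id1 = 1:5,
  id2 = c("A", "b", "C", "D", "E"),
  a = c(8, 8, 8, 5, 9),
  b = c(11, 11:14),
  date = as.Date(paste0("2023-01-0", 1:5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of pointblank): return every row whose
# combination of values in `cols` occurs more than once, keeping all columns.
full_duplicate_rows <- function(data, cols) {
  key <- data[, cols, drop = FALSE]
  # duplicated() alone flags only the second and later occurrences;
  # scanning from both ends flags every member of a duplicated group.
  is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data[is_dup, , drop = FALSE]
}

# Full rows that would fail rows_distinct() on `a`, and on `a` and `b` together.
dup_a <- full_duplicate_rows(tbl, "a")
dup_ab <- full_duplicate_rows(tbl, c("a", "b"))

print(dup_a[, c("id1", "id2", "a")])
print(dup_ab[, c("id1", "id2", "a", "b")])
```

This sidesteps the extract entirely, so it does not help with the "CSV" buttons in the report.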

In the example, we want two columns, id1 and id2, to be displayed (to help identify the failing row) along with the column with invalid values.
Commenting out the following line in the code above helps see the difference in behaviour:
select(columns_to_display) %>%
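
To make the difference visible without the loop stopping on a missing column, the selection could be restricted to the columns that actually exist in each extract. This is a minimal sketch in base R: select_available() is a hypothetical helper, and the two data frames are hand-built stand-ins shaped as described in this issue, not output captured from get_data_extracts().

```r
# Hypothetical helper (not part of pointblank or dplyr): keep only the
# requested columns that exist in `data`, and report the ones that do not.
select_available <- function(data, cols) {
  present <- intersect(cols, names(data))
  absent <- setdiff(cols, names(data))
  if (length(absent) > 0) {
    message("Columns not in extract: ", paste(absent, collapse = ", "))
  }
  data[, present, drop = FALSE]
}

# Assumed shapes: a col_vals_*() extract with all columns, and a
# rows_distinct() extract reduced to the tested column.
extract_col_vals <- data.frame(id1 = 4L, id2 = "D", a = 5, b = 13)
extract_rows_distinct <- data.frame(a = c(8, 8, 8))

columns_to_display <- c("id1", "id2", "a")

shown_full <- select_available(extract_col_vals, columns_to_display)
shown_reduced <- select_available(extract_rows_distinct, columns_to_display)

print(shown_full)
print(shown_reduced)
```

With the reduced extract, the identifying columns id1 and id2 are reported as absent, which is the behaviour this issue is about.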

Session info

sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 23.04

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0

locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8
[4] LC_COLLATE=fr_FR.UTF-8 LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
[7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_1.1.2 pointblank_0.11.4

loaded via a namespace (and not attached):
[1] rstudioapi_0.14 xml2_1.3.3 magrittr_2.0.3 tidyselect_1.2.0 gt_0.9.0 R6_2.5.1
[7] rlang_1.1.0 fastmap_1.1.1 fansi_1.0.4 tools_4.2.2 xfun_0.39 utf8_1.2.3
[13] blastula_0.3.3 cli_3.6.1 withr_2.5.0 commonmark_1.9.0 htmltools_0.5.5 digest_0.6.31
[19] tibble_3.2.1 lifecycle_1.0.3 crayon_1.5.2 sass_0.4.5 base64enc_0.1-3 vctrs_0.6.2
[25] glue_1.6.2 compiler_4.2.2 pillar_1.9.0 generics_0.1.3 markdown_1.6 pkgconfig_2.0.3


mayeulk commented May 3, 2023

I edited my example, which was missing c(...) in columns_to_display <- unique(c(id_columns, get_agent_x_list(agent, i = c_step)$columns))

rich-iannone added this to the v0.12.0 milestone Oct 28, 2023
rich-iannone modified the milestones: v0.12.0, v0.13.0 Feb 20, 2024
rich-iannone changed the title get_data_extracts() gets less data for rows_distinct() than for col_vals_*() Feb 20, 2024