Feature Request: assert_cols() #121

Open
billdenney opened this issue Feb 23, 2023 · 2 comments

Comments

@billdenney
Contributor

I often want to do a set of checks on the columns of data.frames before doing checks on the values within each column itself.

For example, I want to check that all columns are present, and rather than use the has_names() function with verify(), I'd like the output to specify what column or columns are missing. Similarly, I use verify(is.numeric(numeric_column_1)) %>% verify(is.numeric(numeric_column_2)) when a cleaner report would look more like assert(is.numeric, numeric_column_1, numeric_column_2).

What would you think about an assert_cols() function?

@tonyfischetti
Owner

I really like that idea. I'd like to make sure the semantics operate a lot like assert. Would it be like assert but on the vector of column names?

I think a lot of great functionality could come from assert_cols, like checking that:

  • there are no duplicate column names
  • column names contain no ridiculous characters
  • no columns are missing (like you said)
  • each column has the expected data type (like you said)
  • names fit a regex pattern
  • etc...
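
As a rough illustration of the name-level checks in the list above, here is a minimal base R sketch. The helper `check_col_names()` and its arguments `required` and `pattern` are hypothetical and are not part of assertr; the sketch only shows how each failing check could report the offending column names rather than a bare TRUE/FALSE.

```r
# Hypothetical helper, not part of assertr: collect problems found in the
# column names of a data.frame and report which names are at fault.
check_col_names <- function(data, required = character(), pattern = NULL) {
  nms <- names(data)
  problems <- list()

  # Required columns that are absent from the data
  missing_cols <- setdiff(required, nms)
  if (length(missing_cols) > 0) {
    problems$missing <- missing_cols
  }

  # Names that occur more than once
  dup_cols <- unique(nms[duplicated(nms)])
  if (length(dup_cols) > 0) {
    problems$duplicated <- dup_cols
  }

  # Names that do not match the supplied regular expression
  if (!is.null(pattern)) {
    bad_cols <- nms[!grepl(pattern, nms)]
    if (length(bad_cols) > 0) {
      problems$bad_pattern <- bad_cols
    }
  }

  problems
}

# Example data: "value" is required but absent, and one name contains a space
d <- data.frame(id = 1:2, `bad name` = 3:4, check.names = FALSE)
check_col_names(d, required = c("id", "value"), pattern = "^[A-Za-z0-9_]+$")
```

Returning a named list of problems, instead of stopping at the first failure, is one possible design choice: it lets a single call report every missing, duplicated, or badly named column at once.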

@billdenney
Contributor Author

Yeah, that covers a lot of the space I was thinking of. Generally, I'm thinking that it would be used in two different ways:

  1. On the column names (duplicate names, character check, missing columns, name regexp)
  2. On the column overall (data type is the main thing that I see here, since a regexp check on the values within a column seems like it would be handled by assert(). But if you wanted the result reported per column name instead of per row, then the regexp method could apply here, too.)

I see the above two as different ways to use the data, so I'd think they would either be two different functions (e.g. assert_col_names() and assert_col(), my preference) or one function with two modes of use (e.g. assert_col(..., assert_type = c("names", "values"))).
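
To make the second, whole-column mode concrete, here is a minimal base R sketch. The helper `check_cols()` is hypothetical and is not part of assertr; it only illustrates applying one predicate (such as is.numeric) to several columns and naming every column that fails, under the assumption that the predicate returns a single TRUE or FALSE per column.

```r
# Hypothetical helper, not part of assertr: apply a predicate to whole
# columns and stop with a message listing every column that fails.
check_cols <- function(data, predicate, cols) {
  # Fail early, by name, if any requested column is absent
  absent <- setdiff(cols, names(data))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }

  # One TRUE/FALSE per column, not per row
  passed <- vapply(data[cols], function(x) isTRUE(predicate(x)), logical(1))
  if (!all(passed)) {
    stop("Columns failing the check: ", paste(cols[!passed], collapse = ", "))
  }

  invisible(data)  # return the data so the call can sit in a pipeline
}

d <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5), c = c("x", "y", "z"))
check_cols(d, is.numeric, c("a", "b"))        # passes silently
try(check_cols(d, is.numeric, c("a", "c")))   # error message names column "c"
```

The name-level checks would live in a separate function under the two-function design preferred above; this sketch covers only the column-level half.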

FYI, https://sfirke.github.io/janitor/reference/clean_names.html can help a lot with rational column naming, but it is a correction function rather than a checking function.

What do you think? Are there other use cases?
