Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce usage of perl = TRUE in regex, where possible #663

Open
strengejacke opened this issue Oct 10, 2022 · 5 comments
Open

reduce usage of perl = TRUE in regex, where possible #663

strengejacke opened this issue Oct 10, 2022 · 5 comments
Labels
low priority 😴 This issue can be easily workaround or happens only in edge cases

Comments

@strengejacke
Copy link
Member

Currently, we use perl = TRUE in several instances where we have regular expression. However, this is very slow. If possible, we should remove it. Can anyone help judging the patterns and see if perl = TRUE is really needed?

https://github.com/easystats/insight/search?q=%22perl+%3D+TRUE%22

@IndrajeetPatil @bwiernik @vincentarelbundock

@vincentarelbundock
Copy link
Contributor

Are any of these calls executed thousands of times? Otherwise, even if a micro benchmark shows that perl regexes are an order of magnitude slower, noone will ever notice.

I'm super into performance improvement and think there are probably lots of low-hanging fruits in easystats, but it's usually best to start with profiling to avoid premature optimization, and loss of functionality.

@strengejacke strengejacke added low priority 😴 This issue can be easily workaround or happens only in edge cases and removed medium priority 🚶 labels Oct 10, 2022
@strengejacke
Copy link
Member Author

Yeah, probably you're right. I think the find_*() functions are called multiple times. And I have already replaced a lot of instances, e.g. also using startsWith() / endsWith() or adding fixed = TRUE. Maybe in total, this might give a noticeable difference?

@strengejacke
Copy link
Member Author

strengejacke commented Oct 10, 2022

In general, I'm curious if patterns like this one "^(?!sd_|cor_)(.*)" works w/o perl = TRUE?

@vincentarelbundock
Copy link
Contributor

In general, I'm curious if patterns like this one "^(?!sd_|cor_)(.*)" works w/o perl = TRUE?

Ya, I'm pretty sure that look-ahead and look-behinds do not work with the base R regexes, so we need perl=TRUE in those cases.

@strengejacke
Copy link
Member Author

strengejacke commented Oct 10, 2022

Ok. But maybe here:

.grep_non_smoothers <- function(x) {
  grepl("^(?!(s\\())", x, perl = TRUE) &
    # this one captures smoothers in zi- or mv-models from gam
    grepl("^(?!(s\\.\\d\\())", x, perl = TRUE) &
    grepl("^(?!(ti\\())", x, perl = TRUE) &
    grepl("^(?!(te\\())", x, perl = TRUE) &
    grepl("^(?!(t2\\())", x, perl = TRUE) &
    grepl("^(?!(gam::s\\())", x, perl = TRUE) &
    grepl("^(?!(gam::s\\.\\d\\())", x, perl = TRUE) &
    grepl("^(?!(VGAM::s\\())", x, perl = TRUE) &
    grepl("^(?!(mgcv::s\\())", x, perl = TRUE) &
    grepl("^(?!(mgcv::s\\.\\d\\())", x, perl = TRUE) &
    grepl("^(?!(mgcv::ti\\())", x, perl = TRUE) &
    grepl("^(?!(mgcv::te\\())", x, perl = TRUE) &
    grepl("^(?!(brms::s\\())", x, perl = TRUE) &
    grepl("^(?!(brms::t2\\())", x, perl = TRUE) &
    grepl("^(?!(smooth_sd\\[))", x, perl = TRUE)
}

we could instead use !startsWith()? (at least for some of those expressions)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
low priority 😴 This issue can be easily workaround or happens only in edge cases
Projects
None yet
Development

No branches or pull requests

2 participants