reduce usage of `perl = TRUE` in regex, where possible #663

strengejacke · 2022-10-10T16:48:35Z

Currently, we use `perl = TRUE` in several places where we have regular expressions. However, this is very slow. If possible, we should remove it. Can anyone help judge the patterns and check whether `perl = TRUE` is really needed?

https://github.com/easystats/insight/search?q=%22perl+%3D+TRUE%22

@IndrajeetPatil @bwiernik @vincentarelbundock

vincentarelbundock · 2022-10-10T16:52:56Z

Are any of these calls executed thousands of times? Otherwise, even if a micro benchmark shows that perl regexes are an order of magnitude slower, no one will ever notice.

I'm super into performance improvement and think there are probably lots of low-hanging fruits in easystats, but it's usually best to start with profiling to avoid premature optimization, and loss of functionality.
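As a rough starting point for that kind of measurement, here is a minimal base-R sketch. The input vector and the pattern are made up for illustration and are not taken from insight; which variant is faster depends on the pattern and the data, so the point is to measure rather than assume.

```r
# Hypothetical input: many parameter-like names (not real insight data).
x <- rep(c("sd_Intercept", "cor_a_b", "b_x1", "s(x2)"), times = 25000)

# Same prefix test, three ways. The pattern uses no Perl-only syntax,
# so all three variants are valid.
t_perl  <- system.time(r_perl  <- grepl("^sd_", x, perl = TRUE))
t_tre   <- system.time(r_tre   <- grepl("^sd_", x))
t_start <- system.time(r_start <- startsWith(x, "sd_"))

# Timings are only worth comparing if the results agree.
stopifnot(identical(r_perl, r_tre), identical(r_tre, r_start))

# Elapsed seconds per variant; values differ by machine and R version.
print(c(
  perl       = t_perl[["elapsed"]],
  default    = t_tre[["elapsed"]],
  startsWith = t_start[["elapsed"]]
))
```

For whole functions rather than single calls, `Rprof()` from base R can show whether the regex calls take a meaningful share of the run time at all.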

strengejacke · 2022-10-10T17:10:31Z

Yeah, probably you're right. I think the find_*() functions are called multiple times. And I have already replaced a lot of instances, e.g. also using startsWith() / endsWith() or adding fixed = TRUE. Maybe in total, this might give a noticeable difference?
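To illustrate the kind of replacement meant here, a small sketch with made-up names (the vector `x` is hypothetical, not from the package):

```r
# Hypothetical parameter names, for illustration only.
x <- c("sd_Intercept", "b_x1", "cor_a_b", "x1.zi", "a.b")

# Anchored literal prefix: regex vs. startsWith()
stopifnot(identical(grepl("^sd_", x), startsWith(x, "sd_")))

# Anchored literal suffix: regex vs. endsWith()
stopifnot(identical(grepl("\\.zi$", x), endsWith(x, ".zi")))

# Literal substring anywhere: escaped regex vs. fixed = TRUE
stopifnot(identical(grepl("\\.", x), grepl(".", x, fixed = TRUE)))
```

One difference to keep in mind: `startsWith()` and `endsWith()` return `NA` for missing values and require character input, while `grepl()` returns `FALSE` for `NA`. So the swap is only safe where `x` is a character vector without `NA`s, or where `NA` is handled separately.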

strengejacke · 2022-10-10T17:11:52Z

In general, I'm curious whether patterns like this one "^(?!sd_|cor_)(.*)" work w/o perl = TRUE?

vincentarelbundock · 2022-10-10T17:13:26Z

> In general, I'm curious whether patterns like this one "^(?!sd_|cor_)(.*)" work w/o perl = TRUE?

Ya, I'm pretty sure that look-aheads and look-behinds do not work with R's default regex engine, so we need perl = TRUE in those cases.
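That said, when a negative look-ahead is only used as a logical filter (as opposed to capturing text with `sub()` or `gsub()`), the same result can usually be had by negating an ordinary match. A minimal sketch, assuming the pattern is used with `grepl()` on a character vector without `NA`s; the names in `x` are made up:

```r
# Hypothetical parameter names, for illustration only.
x <- c("sd_Intercept", "cor_a_b", "b_x1", "sigma")

# Original approach: negative look-ahead, needs perl = TRUE.
keep_perl <- grepl("^(?!sd_|cor_)(.*)", x, perl = TRUE)

# Alternative 1: match the unwanted prefixes and negate the result.
keep_plain <- !grepl("^(sd_|cor_)", x)

# Alternative 2: no regex at all.
keep_prefix <- !(startsWith(x, "sd_") | startsWith(x, "cor_"))

stopifnot(identical(keep_perl, keep_plain), identical(keep_plain, keep_prefix))
keep_plain
# → FALSE FALSE  TRUE  TRUE
```

This does not help where the look-ahead sits inside a larger pattern or where the captured group is actually used; those cases still need `perl = TRUE`.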

strengejacke · 2022-10-10T17:15:20Z

Ok. But maybe here:

.grep_non_smoothers <- function(x) {
  grepl("^(?!(s\\())", x, perl = TRUE) &
    # this one captures smoothers in zi- or mv-models from gam
    grepl("^(?!(s\\.\\d\\())", x, perl = TRUE) &
    grepl("^(?!(ti\\())", x, perl = TRUE) &
    grepl("^(?!(te\\())", x, perl = TRUE) &
    grepl("^(?!(t2\\())", x, perl = TRUE) &
    grepl("^(?!(gam::s\\())", x, perl = TRUE) &
    grepl("^(?!(gam::s\\.\\d\\())", x, perl = TRUE) &
    grepl("^(?!(VGAM::s\\())", x, perl = TRUE) &
    grepl("^(?!(mgcv::s\\())", x, perl = TRUE) &
    grepl("^(?!(mgcv::s\\.\\d\\())", x, perl = TRUE) &
    grepl("^(?!(mgcv::ti\\())", x, perl = TRUE) &
    grepl("^(?!(mgcv::te\\())", x, perl = TRUE) &
    grepl("^(?!(brms::s\\())", x, perl = TRUE) &
    grepl("^(?!(brms::t2\\())", x, perl = TRUE) &
    grepl("^(?!(smooth_sd\\[))", x, perl = TRUE)
}

Could we use !startsWith() instead (at least for some of those expressions)?
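A possible shape for that, as a sketch only: the names `.smooth_prefixes` and `.grep_non_smoothers_alt` are hypothetical, and the prefixes are copied from the `grepl()` calls above. Literal prefixes go through `startsWith()`; the `s.<digit>(` variants still need a pattern, but one without look-ahead, so it runs without `perl = TRUE`.

```r
# Literal prefixes taken from the look-ahead patterns above.
.smooth_prefixes <- c(
  "s(", "ti(", "te(", "t2(",
  "gam::s(", "VGAM::s(",
  "mgcv::s(", "mgcv::ti(", "mgcv::te(",
  "brms::s(", "brms::t2(",
  "smooth_sd["
)

.grep_non_smoothers_alt <- function(x) {
  # TRUE where x starts with any of the literal prefixes
  literal <- Reduce(
    `|`,
    lapply(.smooth_prefixes, function(p) startsWith(x, p)),
    logical(length(x))
  )
  # "s.1(", "gam::s.1(", "mgcv::s.1(": smoothers in zi- or mv-models
  numbered <- grepl("^((gam|mgcv)::)?s\\.[0-9]\\(", x)
  !(literal | numbered)
}

# Made-up term labels, for illustration only.
x <- c("s(x)", "s.1(x)", "mgcv::te(a, b)", "brms::t2(a)",
       "smooth_sd[1]", "x1", "scale(x)", "time")
.grep_non_smoothers_alt(x)
# → FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
```

Unlike the original, this version assumes `x` is a character vector without `NA`s, since `startsWith()` errors on non-character input and returns `NA` for missing values.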

strengejacke added the medium priority 🚶 label Oct 10, 2022

strengejacke added the low priority 😴 label and removed the medium priority 🚶 label Oct 10, 2022
