Slow regex under certain circumstances involving \b #8166
Comments
This is likely an optimization that standard Ruby has implemented in its regex engine and that we are missing. It would be an enormous help if you could figure out what they changed! You might try checking their bug tracker to see if you can find a similar report.
@lopex That would make sense. Along with that change came the
@headius @lopex and I briefly looked at how onigmo implements linear_time?, and the method it uses does way more than is needed for just that feature. I guess the extra work is for the caching logic mentioned above? I think we determined that a match is only non-linear if we push a StackEntry (which not very many joni opcodes do). @lopex, you can correct me if that is wrong. In any case, if this issue does involve the regexp cache, then perhaps we do need the whole method :)
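For anyone who wants to see what MRI reports for a given pattern, here is a minimal sketch using `Regexp.linear_time?`, which exists in Ruby 3.2 and later. The patterns and the helper name `linear_time_report` are made up for illustration and are not the ones from this report; on runtimes without the method the sketch just returns `:unknown`.

```ruby
# Hypothetical patterns for illustration only (assumptions, not the
# patterns from this issue).
PATTERNS = {
  word_boundary: /\bfoo\b/,
  backreference: /(a)\1/
}.freeze

# Ask the runtime whether each pattern can be matched in linear time.
# Regexp.linear_time? is only available on Ruby 3.2+, so fall back to
# :unknown where the method is missing.
def linear_time_report(patterns)
  patterns.transform_values do |re|
    if Regexp.respond_to?(:linear_time?)
      Regexp.linear_time?(re)
    else
      :unknown
    end
  end
end

puts linear_time_report(PATTERNS).inspect
```

Running this on both MRI 3.2+ and JRuby gives a quick way to compare what each runtime claims about a pattern, without timing anything.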
Environment:
Linux 6.5.0-26-generic #26~22.04.1-Ubuntu
Consider the following minimal example:
Here are timing results (in seconds) for jruby-9.4.6.0 (very similar timings in current jruby-head):
For ruby 3.1.3:
For ruby 3.2.1:
So the issue is that certain (but not all) regexes employing \b show drastic performance problems when run against multi-line strings consisting of a lot of words. The issue does not occur if \w is used instead of \b. At first I thought it might be related to the Arabic characters, but it is not: it also happens on a purely ASCII string. The issue is also present in regular Ruby 3.1, but it seems to be fixed in 3.2. I haven't found a matching Ruby issue; if anyone knows of one, I would be interested.
My question is whether this will be fixed "automatically" once JRuby targets Ruby 3.2 compatibility (e.g. if the underlying regex engine is implemented purely in Ruby), or whether a separate fix is needed.
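For comparing runtimes, a timing harness along the following lines may help. It is only a sketch: the helper names `build_text` and `time_match`, the text sizes, and both patterns are placeholders chosen for illustration, not the regex or input from the example above, so they will not necessarily trigger the slowdown. Substitute the pattern and text under investigation.

```ruby
require 'benchmark'

# Build a multi-line, purely ASCII text made of many short words.
# The default sizes are arbitrary assumptions.
def build_text(lines: 200, words_per_line: 20)
  line = Array.new(words_per_line) { |i| "word#{i}" }.join(' ')
  Array.new(lines, line).join("\n")
end

# Time a single match attempt and return the elapsed seconds as a Float.
def time_match(pattern, text)
  Benchmark.realtime { pattern.match?(text) }
end

text = build_text

# Placeholder patterns (assumptions): one using \b and a variant using \w,
# mirroring the comparison described above.
with_boundary    = /\bmissing\b/
without_boundary = /\wmissing\w/

puts format('%-4s %.6f s', '\\b', time_match(with_boundary, text))
puts format('%-4s %.6f s', '\\w', time_match(without_boundary, text))
```

Running the same script under JRuby, Ruby 3.1 and Ruby 3.2 and increasing `lines` step by step should show whether the time grows roughly linearly or much faster with input size.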