Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix corner case regex colorization #678

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

tristanlatr
Copy link
Contributor

Trying to fix #668

@codecov
Copy link

codecov bot commented Jan 23, 2023

Codecov Report

Patch coverage: 90.00% and project coverage change: -0.02 ⚠️

Comparison is base (0636f0d) 92.49% compared to head (88d31d6) 92.48%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #678      +/-   ##
==========================================
- Coverage   92.49%   92.48%   -0.02%     
==========================================
  Files          47       47              
  Lines        8103     8131      +28     
  Branches     1935     1940       +5     
==========================================
+ Hits         7495     7520      +25     
- Misses        354      357       +3     
  Partials      254      254              
Impacted Files Coverage Δ
pydoctor/epydoc/markup/_pyval_repr.py 92.82% <90.00%> (-0.16%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@tristanlatr tristanlatr marked this pull request as ready for review January 31, 2023 05:18
# branch (the compiler may optimize this even more)
subpatternappend((IN, [item[0] for item in items]))
return subpattern
# pydoctor: remove all optimizations for round-tripping issues
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to edit sre_parse at all ?
I’m definitely not sure of what I’m doing when editing this file…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might indeed need this to remove some optimizations, also we might want to add support for latest python regex features. So there are good reasons to have our own fork of sre_parse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any regex guru in twisted contributors that could give me some feedback on how to remove all opmizations of our sre_parse fork and ensure we don’t break the logic ? @adiroiban

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adiroiban @glyph can you help me on that ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tristanlatr what do you need here exactly? I'm not super familiar with this code, and I am not sure what the relevant optimizations are. What would give you confidence the logic isn't broken, beyond the existing test suite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so basically there are several issues here. We vendor the sre_parse module from the python 3.6 standard library. First, why the 3.6 version? It is because the non capturing groups are optimized from 3.7 onwards such that the SubPattern instances cannot be converted back to the same string. Then, I suspect there are other optimizations that’ are preventing some regex to roundtrip, the changes up there is one of them. Lastly, some regex feature have been added recently to python 3.11 I believe. So we need to add support for those in our sre_parse module to provide complete colorizing for regexes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would give you confidence the logic isn't broken, beyond the existing test suite?

Its not broken from a regex matching perspective, but it is from a regex colorizing perspective (since the roundtrip is impossible for some cases)
And no, we just have our test suite to check that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not broken from a regex matching perspective, but it is from a regex colorizing perspective (since the roundtrip is impossible for some cases)

gotcha, this makes a bit more sense. My plate is honestly kind of full right now, so I don't know how much time I can dedicate to this. Maybe ping the mailing list to see if anyone is interested? Sustainable open source infrastructure is hard :(.

assert color_re(r'^<(?P<descr>.*)>$') == """r<span class="rst-variable-quote">'</span>^&lt;<span class="rst-re-group">(?P&lt;</span><span class="rst-re-ref">descr</span><span class="rst-re-group">&gt;</span>.<span class="rst-re-op">*</span><span class="rst-re-group">)</span>&gt;$<span class="rst-variable-quote">'</span>"""

@pytest.mark.xfail
def test_re_named_groups_weird() -> None:
# This regex triggers some weird behaviour: it adds the &crarr; element at the end where it should not be...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should be be fixed

linebreakok:bool = attr.ib(default=False, init=False)
warnings: List[str] = attr.ib(factory=list)

# state linked to regex colorization
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These state trackers should be moved into the state object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Edge case regex pattern colorization
2 participants