Find words using a word regex, rather than splitting on what is not #15

foolip commented Oct 15, 2012

This helped when finding typos in HTML, like <b>tihs</b>. An
alternative fix is of course to include <> in the split regex.
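
The difference between the two approaches can be sketched as follows. This is only an illustration: the patterns and function names are hypothetical stand-ins and are not taken from the misspellings source.

```python
import re

# Hypothetical patterns for illustration; not the ones misspellings uses.
DELIMITERS = re.compile(r'[\s.,;:!?"()]+')              # no < or >
DELIMITERS_WITH_TAGS = re.compile(r'[\s.,;:!?"()<>]+')  # alternative fix
WORD = re.compile(r"[A-Za-z']+")                        # what a word *is*


def words_by_splitting(line, delimiters=DELIMITERS):
    """Split on everything that is not part of a word, dropping empties."""
    return [w for w in delimiters.split(line) if w]


def words_by_matching(line):
    """Find words directly with a word regex."""
    return WORD.findall(line)


line = "typos in HTML, like <b>tihs</b>."

# Splitting leaves the tags glued to the word, so "tihs" is never looked up.
print(words_by_splitting(line))
# ['typos', 'in', 'HTML', 'like', '<b>tihs</b>']

# Matching on a word regex isolates "tihs".
print(words_by_matching(line))
# ['typos', 'in', 'HTML', 'like', 'b', 'tihs', 'b']

# Adding <> to the split set also isolates it, but leaves "/b" behind.
print(words_by_splitting(line, DELIMITERS_WITH_TAGS))
# ['typos', 'in', 'HTML', 'like', 'b', 'tihs', '/b']
```

With a delimiter list, every new kind of punctuation has to be added by hand. A word regex avoids that, at the cost of deciding up front which characters count as part of a word.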

lyda (Owner) commented Nov 14, 2012

I really need to run through this to address UTF-8 issues in general. There are words that could be in the misspell list but are not due to UTF-8 issues. I filed #16 to remind myself to dig into that some weekend.

In the meantime, I've submitted 475fe97, which should help with HTML, XML and things like it (Apache config files spring to mind). Beyond your example, every single word next to a tag boundary was going unchecked, which is annoying.

lyda closed this Mar 3, 2013

foolip (Author) commented Mar 4, 2013

Thanks for looking into this!

lyda (Owner) commented Mar 4, 2013

No prob. I do want to learn more about UTF-8 in Python, so I will get to this. I hope the change I made is sufficient for the delimiters. I've released a new version that has this change, so `pip install --upgrade misspellings` should grab it. Thanks for taking the time to give feedback!
