Doesn't check for valid termination #13
Comments
Hi, thank you for this note. I will think about it. So it means adding more logic and checking for these end characters only if the URL does not have a query part.
Oh, the [] were just meant to enclose the characters. The invalid symbols are ".,?!-" (quotes not included). Sorry for the misunderstanding. This is based on my own research, so it could be wrong 😅
OK, thanks. I will look into all your reported issues.
Hi @MacBox7, I still cannot tell from the example above whether the ',' and '.' characters should or should not be part of the URL. Thank you!
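The rule described in this comment could be sketched roughly as follows. This is only an illustration: `strip_trailing` and `INVALID_END_CHARS` are hypothetical names, not part of URLExtract, and the character set is the one reported in this issue.

```python
from urllib.parse import urlsplit

# Assumed set of characters that should not terminate a URL (from this issue).
INVALID_END_CHARS = ".,?!-"


def strip_trailing(url: str, chars: str = INVALID_END_CHARS) -> str:
    """Hypothetical helper: drop trailing punctuation unless the URL has a query."""
    if urlsplit(url).query:
        # A query string is present, so trailing characters may be data; keep them.
        return url
    return url.rstrip(chars)


print(strip_trailing("http://httpbin.org/status/204,"))   # http://httpbin.org/status/204
print(strip_trailing("http://example.com/search?q=a,"))   # unchanged, it has a query part
```

Whether a trailing comma inside a query string really belongs to the URL is still a judgment call; the sketch simply leaves such URLs untouched.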
In the meantime, one can hack (at least commas) via
Perhaps a sensible default is to treat unconventional special characters as forbidden in a URL, and to add a constructor argument that lets users allow them if they really want them in URLs?
A URL regex pattern is introduced in commit e39e5ee. For extracting URLs from `Email` content, the pattern performs better than URLExtract because the syntax for links is well defined (Markdown syntax) and URLExtract has problems with termination, see lipoja/URLExtract#13.
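To make the suggestion concrete, here is a minimal sketch of a configurable default. `TrailingCharFilter` and its `forbidden_trailing` argument are hypothetical and do not exist in URLExtract; the wrapper accepts any callable that returns a list of URLs, and the toy extractor below is only a stand-in.

```python
from typing import Callable, Iterable, List


class TrailingCharFilter:
    """Hypothetical wrapper that trims forbidden trailing characters from URLs."""

    def __init__(
        self,
        find_urls: Callable[[str], Iterable[str]],
        forbidden_trailing: str = ".,?!-",  # assumed default, taken from this issue
    ) -> None:
        self._find_urls = find_urls
        self._forbidden = forbidden_trailing

    def find_urls(self, text: str) -> List[str]:
        # Pass forbidden_trailing="" to keep every trailing character.
        return [url.rstrip(self._forbidden) for url in self._find_urls(text)]


def naive_extractor(text: str) -> List[str]:
    """Stand-in extractor: whitespace-separated tokens that start with 'http'."""
    return [token for token in text.split() if token.startswith("http")]


text = "See http://httpbin.org/status/204, and http://httpbin.org/status/204."
print(TrailingCharFilter(naive_extractor).find_urls(text))
```

Someone who really wants those characters kept could construct the wrapper with `forbidden_trailing=""`, which leaves the extractor's output unchanged.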
For the following input:
The output generated is:
['http://httpbin.org/status/204,', 'http://httpbin.org/status/204.']
The characters in the set `[.,?!-]` are not valid terminal symbols for a URL and thus should be checked.