Filename extracted as URL #43

Open
Larrax opened this issue Jun 24, 2019 · 4 comments

@Larrax

Larrax commented Jun 24, 2019

From the following input (which is a legit archive filename):

PAYMENT EUR 1,420.00.zip

URL is extracted using find_urls:

1,420.00.zip

@voldmar

voldmar commented Oct 7, 2019

.zip is a valid TLD, according to the public suffix list. But the comma should not be part of the domain, I guess.
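
To illustrate the point about the comma, here is a minimal sketch of a hostname character check. It is not how urlextract works internally; the function name `is_plausible_hostname` and the regular expression are assumptions made for illustration, based on the usual rule that a hostname label contains only letters, digits, and hyphens.

```python
import re

# One hostname label: letters, digits, hyphens; 1-63 characters;
# no leading or trailing hyphen. (Illustrative rule, not urlextract's code.)
_LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_plausible_hostname(candidate: str) -> bool:
    """Return True if every dot-separated label uses only hostname-safe characters."""
    labels = candidate.split(".")
    if len(labels) < 2:
        return False
    return all(_LABEL_RE.fullmatch(label) is not None for label in labels)


print(is_plausible_hostname("1,420.00.zip"))  # False: the label "1,420" contains a comma
print(is_plausible_hostname("420.00.zip"))    # True: every label is hostname-safe
```

Note that a check like this only rejects the candidate with the comma. If the extractor instead trimmed the match at the comma, `420.00.zip` would still look like a valid host, so the filename would remain a false positive.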

@gleb-shnshn

#47 fixed

@lipoja
Owner

lipoja commented Oct 26, 2019

First of all, thank you for the time you spent working on the patch. However, I am not quite sure about the fix; please have a look at my comment: #47 (comment)

And let's discuss this topic a bit; maybe we can agree on something.
Thank you!

@lipoja lipoja linked a pull request Mar 24, 2020 that will close this issue
@jayvdb jayvdb mentioned this issue Apr 5, 2020
@lipoja lipoja added the low label Apr 11, 2020
@andreys42

andreys42 commented May 7, 2024

As a growth point: you could add a probability to the extraction algorithm. For example, to decrease the false positive rate, you could use the frequency of TLD usage as an attribute (weight), so that the probability of a detected domain with a rare TLD (for example Apple.Inc) would be lower than that of one with a popular TLD (Apple.com, Apple.net and so on).
Finally, users could choose the threshold for this probability value that best fits their purposes ...
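
The weighting idea above could be sketched roughly as follows. Everything here is hypothetical: the weights in `TLD_WEIGHTS` are made-up placeholders rather than measured usage frequencies, and `tld_score` and `filter_by_threshold` are illustrative names, not part of urlextract.

```python
# Hypothetical relative weights; real values would have to come from
# measured TLD usage statistics.
TLD_WEIGHTS = {"com": 1.0, "net": 0.8, "org": 0.8, "zip": 0.1, "inc": 0.05}
DEFAULT_WEIGHT = 0.2  # assumed weight for TLDs missing from the table


def tld_score(candidate: str) -> float:
    """Score a candidate by the weight of its last dot-separated label."""
    tld = candidate.rsplit(".", 1)[-1].lower()
    return TLD_WEIGHTS.get(tld, DEFAULT_WEIGHT)


def filter_by_threshold(candidates, threshold):
    """Keep only the candidates whose TLD score reaches the user-chosen threshold."""
    return [c for c in candidates if tld_score(c) >= threshold]


candidates = ["Apple.com", "Apple.Inc", "1,420.00.zip"]
print(filter_by_threshold(candidates, threshold=0.5))  # ['Apple.com']
```

With a threshold of 0.5, only the candidate with a popular TLD survives; lowering the threshold trades fewer missed URLs for more false positives such as the archive filename from this issue.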
