This repository has been archived by the owner on Sep 7, 2023. It is now read-only.

Tag privacy Violators #3347

Open · wants to merge 11 commits into master

Conversation

@SepehrRasouli commented Aug 19, 2022

What does this PR do?

This PR adds the ability to tag privacy-violating websites, fixing issue #1987. I grabbed the ASN data from here, and I used IPWhois as it was used here. I used a dictionary as a cache so the IPWhois queries wouldn't slow the searches down drastically.
I also changed the templates so the tagged websites are displayed correctly.
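
For illustration only, here is a minimal sketch of that approach, not the PR's actual code. It assumes ipwhois's `lookup_rdap()` for the query; the `PRIVACY_VIOLATING_ASNS` set and both function names are invented for the example.

```python
import socket
from urllib.parse import urlparse

from ipwhois import IPWhois

# Hypothetical set of ASNs considered privacy-violating; 13335 is Cloudflare.
PRIVACY_VIOLATING_ASNS = {"13335"}

_asn_cache = {}  # host -> ASN string or None: the in-memory dict cache

def asn_for_host(host):
    """Return the ASN announcing this host's IP, memoized per host."""
    if host not in _asn_cache:
        try:
            ip = socket.gethostbyname(host)
            # lookup_rdap() returns a dict whose 'asn' key is the AS number
            _asn_cache[host] = IPWhois(ip).lookup_rdap(depth=0).get("asn")
        except Exception:
            _asn_cache[host] = None  # cache failures too, to avoid retrying
    return _asn_cache[host]

def is_privacy_violator(url):
    host = urlparse(url).hostname
    return host is not None and asn_for_host(host) in PRIVACY_VIOLATING_ASNS
```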

Why is this change important?

This PR is motivated by Issue #1987.

How to test this PR locally?

make run

Related issues

Closes #1987

@SepehrRasouli changed the title from Feature/tag privacy violators to Tag privacy Violators on Aug 19, 2022
@unixfox (Member) commented Sep 8, 2022

When doing a search, this adds some latency for each new IP address, right? I mean, it has to do a WHOIS lookup before rendering the results to the user's screen.

If so, IMO this feature should be configurable and disabled by default. It already takes quite some time to make the requests to the configured engines, so adding even more latency would degrade the user experience.

@SepehrRasouli (Author)

I've implemented some caching: when the user sends a query and sees some results, the privacy violators' URLs are cached, so no latency is added the next time the user sends another query.
But I think adding this as a feature, disabled by default, would be good.

@unixfox (Member) commented Sep 9, 2022

> I've implemented some caching: when the user sends a query and sees some results, the privacy violators' URLs are cached, so no latency is added the next time the user sends another query. But I think adding this as a feature, disabled by default, would be good.

Adding some caching helps little when the URLs are almost always going to be different: when a user searches on SearX, it is very unlikely they will keep searching for the exact same thing. Also, most engines return different result URLs even when you search for the same keyword.

The cache will hardly ever get used.

On top of that, due to the cache, this feature may decrease the user's privacy, because literally every search the user does is going to be saved in memory.

As discussed with @dalf about this feature, the best option is to replace the WHOIS lookup with a local database like this one: https://lite.ip2location.com/database-asn. It will require manual intervention from the user to set the database up, but querying it will be much quicker than doing a WHOIS request to an external server.
That external WHOIS server can also log every website that shows up in your SearX results, harming the user's privacy even more.
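
For illustration, a rough sketch of that local-database lookup, assuming the IP2Location LITE ASN CSV layout (integer `ip_from`/`ip_to` range, CIDR, ASN, AS name per row); the file path and function names are placeholders:

```python
import bisect
import csv
import socket
import struct

def _ip_to_int(ip):
    """Convert a dotted-quad IPv4 address to its integer form."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

# Load the database once at startup as a sorted list of (start, end, asn).
_ranges = []
with open("IP2LOCATION-LITE-ASN.CSV", newline="") as f:
    for ip_from, ip_to, _cidr, asn, _as_name in csv.reader(f):
        _ranges.append((int(ip_from), int(ip_to), asn))
_ranges.sort()
_starts = [start for start, _end, _asn in _ranges]

def asn_for_ip(ip):
    """Look up an IPv4 address's ASN locally, with no network request."""
    n = _ip_to_int(ip)
    i = bisect.bisect_right(_starts, n) - 1
    if i >= 0 and _ranges[i][0] <= n <= _ranges[i][1]:
        return _ranges[i][2]
    return None
```

Since every lookup is a binary search over in-memory ranges, it avoids both the latency and the logging concerns of an external WHOIS server.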

@SepehrRasouli (Author)

I agree with you: caching would harm the user's privacy, and I will remove it.
But what do you think we should do about this? I added it as a preference, disabled by default, so that requests wouldn't take longer by default.
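
As a sketch of what "disabled by default" could look like, assuming the option name `tag_privacy_violators` (invented here, not necessarily what the PR uses) and that searx exposes its YAML configuration as the `searx.settings` dict:

```python
from searx import settings

def tag_privacy_violators(results):
    # Feature is off unless the instance admin enabled it in settings.yml,
    # so the default search path performs no WHOIS lookups at all.
    if not settings['search'].get('tag_privacy_violators', False):
        return results
    for result in results:
        # is_privacy_violator() is the hypothetical helper sketched earlier
        result['privacy_violator'] = is_privacy_violator(result['url'])
    return results
```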

Development

Successfully merging this pull request may close these issues.

Filtering or tagging Cloudflare and other Reverse Proxy hoster