Protection against spam #299

Open
sirex opened this issue Dec 29, 2020 · 10 comments

sirex commented Dec 29, 2020

I just finished cleaning up spam users. It looks like Spirit does not have any protection against spam: I had to remove about 6000 spam users with random usernames and random emails.

I will try to hack something together to add protection against spam, but Spirit should definitely have this built in.

Also, since Spirit does not have good moderation tools, I had to delete the spam users directly from the Python shell. Now comment counts are incorrect, and I get errors when Spirit tries to jump to a page that no longer exists. It would be nice to have a script that updates all of that.

Author

sirex commented Dec 29, 2020

Currently I use the following script, which generates Python code that deletes users along with all their content:

from django.contrib.auth.models import User
from textwrap import wrap


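# Build the entries of a `pk__in=[...]` list: one per recently joined user,
# annotated with username, email and a snippet of the latest comment for review.
lines = []
for user in User.objects.order_by('-date_joined')[:100]:
    lines += ['', f"{user.pk:>6},  # {user.username:<20} {user.email:<40}"]
    # Text of the user's most recent comment, or None if there is none.
    last_comment = getattr(user.st_comments.order_by('-date').first(), 'comment', None)
    if last_comment:
        # Show at most 8 wrapped lines of the comment as Python comments.
        lines += [''] + wrap(
            last_comment,
            initial_indent=(' ' * 8) + ' # ',
            subsequent_indent=(' ' * 8) + ' # ',
            max_lines=8,
            width=72,
        )

# Print the delete statement instead of running it, so it can be edited first.
lines = '\n'.join(lines)
print(f"\nUser.objects.filter(pk__in=[\n{lines}\n]).delete()\n\n")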

I just review the generated script, remove all non-spam users from it, and run the resulting code.

Owner

nitely commented Dec 29, 2020

Deleting users is probably never a good idea. They can just register again with the same email. You should deactivate their accounts instead. Hard-deleting topics and comments will break notifications, bookmarks, and possibly other things.

There is a very simple registration protection that may help against bots, but humans will bypass any protection anyway. A few things may help, like not allowing new users to post links, or having a queue of messages that trusted users/mods can review and approve (à la Stack Overflow); but things like captchas are annoying and useless against humans.
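As a rough illustration of the "no links for new users" plus review queue idea, here is a minimal sketch in plain Python. It is not Spirit code: the `Author` class, the thresholds, and the `needs_review` helper are all hypothetical names made up for this example, and the link regex is deliberately loose.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds for this sketch; these are not Spirit settings.
MIN_ACCOUNT_AGE = timedelta(days=3)
MIN_APPROVED_COMMENTS = 5

# Deliberately loose: matches http(s) URLs and bare "www." hosts.
LINK_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


@dataclass
class Author:
    """Stand-in for a forum user; not the Django User model."""
    date_joined: datetime
    approved_comments: int


def is_trusted(author: Author, now: datetime) -> bool:
    """A user is trusted once the account is old enough and has a track record."""
    old_enough = now - author.date_joined >= MIN_ACCOUNT_AGE
    return old_enough and author.approved_comments >= MIN_APPROVED_COMMENTS


def needs_review(author: Author, text: str, now: datetime) -> bool:
    """Return True if the comment should wait in a moderation queue."""
    return bool(LINK_RE.search(text)) and not is_trusted(author, now)
```

With something like this, a comment from a brand-new account that contains a link would be held for a moderator, while link-free comments and comments from established users go through unchanged.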

There should be a way to soft delete all topics/comments by user.

I wonder, what kind of spam did you get?

Author

sirex commented Dec 29, 2020

Added reCAPTCHA, will see if it helps: sirex/ubuntu.lt@bb00c03...959453e

The spam is generated not by humans but by bots; somehow they easily get through the email verification. Email addresses are not the issue either: they use a random email every time.

Here are a few examples:

username               email
MartinHoike            hr3nod@yandex.com
Ererticeque            pzodoweecebipstisearm@creditreportsps.com
SpalSauro              nhomoweecebipstisearm@fastcheckcreditscores.com
NopdiepZicietS         fvjekweecebipstisearm@creditscoreusd.com
Unilypespoiziz         imoojweecebipstisearm@creditscorewww.com
prathyantash           lwiysweecebipstisearm@creditreportspa.com
bragreetweda           fhtfxweecebipstisearm@creditscorests.com
absoluteweddingstudio  vishalsh4325@gmail.com
GonnerturneCep         yklifweecebipstisearm@creditscorecheckww.com
JasonMix               garrettbranchgolf@yahoo.com
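Several of the addresses in this list end in the same local-part string before the `@`. A small heuristic can surface such clusters for manual review. The sketch below is only an illustration: the function names, the suffix length of 12, and the repeat threshold of 3 are assumptions picked for this sample, and it would not catch accounts like the yandex.com or gmail.com ones.

```python
from collections import Counter

# Hypothetical tuning knobs for this sketch.
SUFFIX_LEN = 12   # how many trailing characters of the local part to compare
MIN_REPEATS = 3   # how many addresses must share a suffix before it is flagged

# A few addresses taken from the list in this thread.
SAMPLE = [
    "hr3nod@yandex.com",
    "pzodoweecebipstisearm@creditreportsps.com",
    "nhomoweecebipstisearm@fastcheckcreditscores.com",
    "fvjekweecebipstisearm@creditscoreusd.com",
    "vishalsh4325@gmail.com",
]


def local_part(email):
    """Return the lowercased part of the address before the last '@'."""
    return email.rsplit("@", 1)[0].lower()


def shared_suffixes(emails, suffix_len=SUFFIX_LEN, min_repeats=MIN_REPEATS):
    """Find local-part suffixes that repeat across many addresses."""
    counts = Counter(
        local_part(email)[-suffix_len:]
        for email in emails
        if len(local_part(email)) >= suffix_len
    )
    return {suffix for suffix, n in counts.items() if n >= min_repeats}


def flag_suspicious(emails, suffix_len=SUFFIX_LEN, min_repeats=MIN_REPEATS):
    """Return the addresses whose local part ends in a repeated suffix."""
    suffixes = shared_suffixes(emails, suffix_len, min_repeats)
    return [e for e in emails if local_part(e)[-suffix_len:] in suffixes]
```

Calling `flag_suspicious(SAMPLE)` returns the three addresses that share a suffix and leaves the other two alone. This only narrows down candidates; a human should still review them before anything is removed.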

Quite quickly I reached the point where most users were spam users; more precisely, there were about 6000 real users and more than 6000 spam users.

As for messages, some topics have 3 posts from real users and 1000 posts from spam users. If I marked those posts as deleted, I would see endless pages of deleted posts.

So this is a case where most of the content is generated by spam users, and there is no point keeping that content in the database.

I hope reCAPTCHA will help. The next thing is to somehow fix the incorrect comment counts and the redirects to nonexistent pages.

The forum in question is https://ubuntu.lt/.

Registration form with reCaptcha looks like this: https://ubuntu.lt/user/register/

Author

sirex commented Dec 29, 2020

I would be really surprised if all this spam were generated by real humans. My guess is that spam bots have just become very sophisticated. Without serious anti-spam protection, they can ruin a community forum in days, and the ubuntu.lt community forum has existed for more than 15 years.

Owner

nitely commented Dec 29, 2020

> Registration form with reCaptcha looks like this: https://ubuntu.lt/user/register/

Am I supposed to see the captcha right away? I see there is a "captcha" label, but that's it... there is no captcha.

> I would be really surprised if all this spam were generated by real humans.

If they are bots, then the captcha should help. Let me know how it goes; it may be worth adding it as an optional feature.

> As for messages, some topics have 3 posts from real users and 1000 posts from spam users. If I marked those posts as deleted, I would see endless pages of deleted posts.

That's a good point.

Author

sirex commented Dec 29, 2020

> Am I supposed to see the captcha right away? I see there is a "captcha" label, but that's it... there is no captcha.

There are several reCAPTCHA versions. I'm using the latest, v3, which somehow detects bots automagically without showing an image or anything like that. Older versions show some images and ask you to enter what is in them.
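For readers wondering how v3 works without a visible challenge: the page obtains a token in the background, and the server sends that token to Google's `siteverify` endpoint, which answers with a score between 0.0 and 1.0 and the action name the token was issued for. The sketch below shows that server-side check using only the standard library. It is not the code from the linked commit; the function names and the 0.5 threshold are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"
MIN_SCORE = 0.5  # assumed threshold; tune it against real traffic


def fetch_verdict(secret, token, timeout=5):
    """POST the secret key and the client token, return the decoded JSON reply."""
    data = urlencode({"secret": secret, "response": token}).encode()
    with urlopen(VERIFY_URL, data=data, timeout=timeout) as resp:
        return json.load(resp)


def looks_human(verdict, expected_action, min_score=MIN_SCORE):
    """Accept only a successful verdict for the right action with a high enough score."""
    return (
        verdict.get("success") is True
        and verdict.get("action") == expected_action
        and verdict.get("score", 0.0) >= min_score
    )
```

A registration view would call `fetch_verdict(secret, token)` with the token submitted by the form and reject the signup when `looks_human(verdict, "register")` is false. The action name `"register"` here is just an example value.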

Author

sirex commented Dec 29, 2020

I think I managed to fix the comment count and last active date for topics with this query:

from spirit.topic.models import Topic
from spirit.comment.models import Comment
from django.db.models import Case, When, Value, Exists, OuterRef, Subquery, Count, Max


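# Recompute both denormalized fields for every topic in a single UPDATE.
Topic.objects.update(
    # Count regular comments that are not removed; fall back to 0 when none exist.
    comment_count=Case(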
        When(
            Exists(
                Comment.objects.
                filter(
                    topic_id=OuterRef('id'),
                    is_removed=False,
                    action=Comment.COMMENT,
                )
            ),
            then=Subquery(
                Comment.objects.
                filter(
                    topic_id=OuterRef('id'),
                    is_removed=False,
                    action=Comment.COMMENT,
                ).
                values('topic_id').
                order_by('topic_id').
                annotate(comment_count=Count('*')).
                values('comment_count')[:1]
            ),
        ),
        default=Value(0),
    ),
    # Latest comment date per topic; this is NULL for a topic with no comments.
    last_active=Subquery(
        Comment.objects.
        filter(topic_id=OuterRef('id')).
        values('topic_id').
        order_by('topic_id').
        annotate(last_active=Max('date')).
        values('last_active')[:1]
    ),
)

Author

sirex commented Dec 29, 2020

The last fix is for bookmarks pointing to a comment number that no longer exists. The fix is not perfect: it only ensures that the comment number is not greater than the total number of comments in a topic. So it does not guarantee that a bookmark points to the correct last-seen comment, but it ensures that you do not end up on a 404 page when the comment number points to a nonexistent comment.

The query used was:

from spirit.comment.bookmark.models import CommentBookmark
from django.db.models import Case, When, OuterRef, Subquery, Count, F, PositiveIntegerField


# Clamp each bookmark's comment number to the number of comments in its topic.
CommentBookmark.objects.update(
    comment_number=Subquery(
        CommentBookmark.objects.
        filter(id=OuterRef('id')).
        values('user_id', 'topic_id', 'comment_number').
        order_by('user_id', 'topic_id', 'comment_number').
        annotate(
            total_comments=Count('topic__comment'),
        ).
        annotate(
            comment_number_=Case(
                When(
                    comment_number__gt=F('total_comments'),
                    then=F('total_comments'),
                ),
                default=F('comment_number'),
                output_field=PositiveIntegerField(),
            ),
        ).
        values('comment_number_')[:1],
    ),
)

Author

sirex commented Dec 29, 2020

So, in summary, to improve protection against spam, Spirit needs the following things:

  • Email verification no longer protects against spam bots, so Spirit should also provide other anti-spam tools, for example new-user confirmation, a question challenge for new users, or integration with external anti-spam services like reCAPTCHA.
  • There should be a separate moderation page for new users, where moderators can see all new users and either approve a registration or mark the user as spam. If a user is marked as spam, all of that user's comments are marked as spam too. Currently /st/admin/user/ has nothing like this. It does not even have a link to all of a user's comments; the only option is to go through the user's comments manually and remove them one by one, which in my case would take forever.
  • If a comment is marked as spam, it should not show up in the topic. It should disappear completely, decreasing the bookmark number, the topic comment count, and the last active date.
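To make the second and third points concrete, here is a small in-memory sketch of the behaviour being asked for: marking a user as spam hides all of that user's comments, the topic's counters are recomputed from what remains, and bookmarks are clamped so they never point past the last visible comment. `CommentStub`, `TopicStub`, and the helper functions are hypothetical stand-ins written for this example, not Spirit models or APIs.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class CommentStub:
    """Stand-in for a forum comment."""
    author: str
    date: datetime
    is_spam: bool = False


@dataclass
class TopicStub:
    """Stand-in for a topic with denormalized counters."""
    comments: List[CommentStub] = field(default_factory=list)
    comment_count: int = 0
    last_active: Optional[datetime] = None

    def recount(self):
        """Recompute the counters from comments that are not spam."""
        visible = [c for c in self.comments if not c.is_spam]
        self.comment_count = len(visible)
        self.last_active = max((c.date for c in visible), default=None)


def mark_user_as_spam(username, topics):
    """Flag every comment by `username` and refresh the affected topics."""
    for topic in topics:
        touched = False
        for comment in topic.comments:
            if comment.author == username and not comment.is_spam:
                comment.is_spam = True
                touched = True
        if touched:
            topic.recount()


def clamp_bookmark(comment_number, topic):
    """Keep a bookmark from pointing past the last visible comment."""
    return min(comment_number, topic.comment_count)
```

In a real implementation the same steps would be database updates rather than loops over Python objects, but the order of operations would be the same: flag, recount, then clamp.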

If these features were available, spam bots would not be able to attack a forum at such a massive scale as happened with the ubuntu.lt community forum.

Author

sirex commented Feb 19, 2021

Now more than a month has passed, and during that time I found 4 new spam users; it looks like at least two of them were created manually. So it looks like reCAPTCHA did the job.
