Exploring ML for building a more robust and scalable version of Kindly #57

Open
nathanfletcher opened this issue Jul 13, 2021 · 5 comments

@nathanfletcher

Looking into ways to achieve what Kindly #41 does, and to improve on it.
This may also result in solutions that are not vendor-locked.
Maybe @nathanbaleeta may have a few ideas.

@lacabra
Collaborator

lacabra commented Jul 19, 2021

@nathanfletcher: can you document here some of the findings? Thanks! 🙏

@nathanfletcher
Author

nathanfletcher commented Jul 21, 2021

I will start here with the basics from my discussions with @nathanbaleeta.

A number of things I'll be looking into:

  • Data sourcing. Use the Twitter API to obtain training data.

  • Natural Language Processing/Understanding. Use the Natural Language Toolkit (NLTK) or spaCy (open-source, Python-based natural language processing libraries) to preprocess the data before fitting the cyberbullying model.

  • Machine Learning Algorithms. Build and evaluate the cyberbullying model with scikit-learn (an open-source, Python-based ML library), which ships with implementations of several ML algorithms out of the box. Start with shallow learning as a proof of concept while we collect enough data, then move to deep learning methods to achieve state-of-the-art results in the long run.

  • AI/ML technology stack: Python, scikit-learn, Pandas, NLTK, spaCy, TextBlob, NumPy, Keras, TensorFlow, Jupyter notebooks, Colab, TensorBoard & FastAPI.
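
As a rough illustration of the shallow-learning proof of concept mentioned above, here is a minimal sketch using scikit-learn. The example sentences and labels are made up purely for illustration (they are not project data), and TF-IDF features with logistic regression are just one possible choice of shallow model, not something this thread has settled on.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, invented for this sketch: 1 = bullying, 0 = not bullying.
texts = [
    "you are so stupid and ugly",
    "nobody likes you, loser",
    "shut up you worthless idiot",
    "you are a pathetic loser",
    "have a great day everyone",
    "thanks for the help, really appreciate it",
    "congratulations on the new job",
    "what a lovely photo of your dog",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF turns each text into a weighted word/bigram vector;
# logistic regression is the shallow classifier fitted on top of it.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Score two unseen messages.
predictions = model.predict(["you stupid loser", "have a lovely day"])
probabilities = model.predict_proba(["you stupid loser"])
print(predictions)
print(probabilities)
```

Wrapping the vectorizer and the classifier in one pipeline keeps preprocessing and model together, so the same object could later be evaluated with a proper train/test split once real data is available.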

@nathanbaleeta nathanbaleeta changed the title TensoFlow and other ML methods for Kindly Exploring ML for building a more robust and more scalable version of Kindly Jul 21, 2021
@nathanbaleeta nathanbaleeta changed the title Exploring ML for building a more robust and more scalable version of Kindly Exploring ML for building a more robust and scalable version of Kindly Jul 21, 2021
@nathanbaleeta

nathanbaleeta commented Jul 26, 2021

PROBLEM DEFINITION
The use of Twitter and other social networking sites (SNS) such as Facebook to communicate with one another and the world has led to increased instances of cyberbullying, especially among teenagers. (Reference)

Twitter is an American microblogging and social networking service on which users post and interact with messages known as "tweets". Registered users can post, like, and retweet tweets, but unregistered users can only read them. (Wikipedia)

Cyberbullying is the use of information and communication technology to harass and harm in a deliberate, repetitive, and hostile manner.

Types of cyberbullying include bullying someone through social media, harassment, sexting, cyberstalking, deception, impersonation, and sending nasty messages via chat rooms and instant messengers. Here are more examples of cyberbullying.

According to Twitter demographics published by www.statista.com, as of April 2021 users younger than 24 made up almost 24 percent of Twitter's worldwide audience, as shown in the graphic below:
[Figure: Twitter, distribution of global audiences in 2021 by age group (Statista, statistic ID 283119)]

SOLUTION
To solve this problem, we will follow the typical machine learning pipeline. First, we will import the required libraries and the dataset. Next, we will do exploratory data analysis to look for trends in the dataset. We will then preprocess the text to convert it into numeric data that a machine learning algorithm can use. Finally, we will use machine learning algorithms to train and test our sentiment analysis models.
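
To make the "text to numeric data" step more concrete, here is a small standard-library sketch. It is only an assumption about what preprocessing could look like: the helper names (`preprocess`, `build_vocabulary`, `vectorize`), the cleaning rules, and the sample tweets are all hypothetical, and in practice NLTK, spaCy, or scikit-learn vectorizers would likely replace this hand-rolled version.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and split it into word tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return TOKEN_RE.findall(text)


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(tokens, vocabulary):
    """Return a bag-of-words count vector; out-of-vocabulary tokens are ignored."""
    counts = Counter(tokens)
    vec = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vec[vocabulary[tok]] = n
    return vec


# Hypothetical sample tweets, invented for this sketch.
tweets = [
    "@user1 you are so dumb, so dumb http://example.com/x",
    "Great game today! https://example.com/y",
]
token_lists = [preprocess(t) for t in tweets]
vocabulary = build_vocabulary(token_lists)
vectors = [vectorize(tokens, vocabulary) for tokens in token_lists]

for tweet, vec in zip(tweets, vectors):
    print(vec, "<-", tweet)
```

Each tweet ends up as a fixed-length list of word counts, which is the kind of numeric input a classifier can be trained and tested on.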

@nathanfletcher
Author

@lacabra My files and practical learnings are in this repository: https://github.com/nathanfletcher/ml_text_classification

@amreenp7

@nathanfletcher to include this in documentation before closing it.
