paulknewton/twitter-ml

Welcome to TwitterML

Project to analyse text streams (tweets or docs) using big data and machine learning. Uses Apache Spark to build textual metrics, then runs the text through various classification models (via SciKit-Learn) to evaluate its sentiment.

[Images: waffle, wordcloud, learning_curve, roc_kfolds]

Features

  • Classifier Builder - standalone tool to configure classifiers and train them using pre-classified samples
  • Text Classify - a standalone program for classifying the sentiment of text using NLTK and SciKit-Learn classifiers
  • Document Scanner - a program for classifying text documents on the Spark platform
  • Twitter-Kafka Publisher - reads tweets from Twitter and pumps them into a Kafka server (where they can be consumed by our Twitter Consumer programs).
  • Twitter Analyser - reads tweets from Kafka and performs analysis of the text using the Spark platform.
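
The train-then-classify flow behind the Classifier Builder and Text Classify tools can be sketched roughly as follows. This is not the project's code: the project relies on NLTK and SciKit-Learn classifiers, whereas this sketch uses a tiny hand-written naive Bayes model in plain Python so that it runs without extra dependencies. The names `tokenize`, `TinyNaiveBayes` and the sample sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case and split on whitespace; a real pipeline would do much more."""
    return text.lower().split()


class TinyNaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one smoothing.

    Illustrative stand-in only; the project itself uses NLTK / SciKit-Learn.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of samples
        self.vocab = set()                       # every word seen in training

    def train(self, samples):
        """samples: iterable of (text, label) pairs, i.e. pre-classified text."""
        for text, label in samples:
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Start from the log prior of the label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up pre-classified samples, for illustration only.
samples = [
    ("great phone love it", "pos"),
    ("what a wonderful day", "pos"),
    ("terrible service hate it", "neg"),
    ("what an awful day", "neg"),
]
model = TinyNaiveBayes()
model.train(samples)
print(model.classify("love this wonderful phone"))
print(model.classify("awful terrible service"))
```

With real data, the same two steps (train on labelled samples, then classify unseen text) would be handled by the project's own tools and a far larger training set.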
