maufcost/hotel-reviews-sentiment-analysis

hotel-reviews-sentiment-analysis

A Python program that uses Natural Language Processing (NLTK) and Machine Learning (a binary Naive Bayes classifier) to perform sentiment analysis on hotel reviews, labeling each review as either positive or negative.

About the included files:

The 'preprocess_dataset.py' script preprocesses the raw dataset. The unedited dataset contains "invalid" hotel reviews, such as "No positive review" or "No negative opinion", which provide no useful signal when training the classifier, so the script filters them out. It also splits the raw dataset into two files, one for positive reviews and one for negative reviews, which are labeled later in 'training_classifier.py'.
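As a rough illustration of the filtering and splitting described above, here is a minimal sketch. It is not the code from 'preprocess_dataset.py': the column names ('positive', 'negative'), the exact placeholder strings, and the `split_reviews` helper are assumptions made for the example, and a small in-memory CSV stands in for the raw dataset.

```python
import csv
import io

# Assumed placeholder strings; the real dataset may spell them differently.
PLACEHOLDERS = {"no positive review", "no negative opinion", ""}


def split_reviews(rows, positive_col="positive", negative_col="negative"):
    """Return (positives, negatives), skipping placeholder entries.

    The column names are hypothetical, not taken from the repository.
    """
    positives, negatives = [], []
    for row in rows:
        pos = row[positive_col].strip()
        neg = row[negative_col].strip()
        if pos.lower() not in PLACEHOLDERS:
            positives.append(pos)
        if neg.lower() not in PLACEHOLDERS:
            negatives.append(neg)
    return positives, negatives


# Tiny in-memory stand-in for the raw CSV file.
raw = io.StringIO(
    "positive,negative\n"
    "Great location,No negative opinion\n"
    "No positive review,Room was dirty\n"
    "Friendly staff,Noisy at night\n"
)
positives, negatives = split_reviews(csv.DictReader(raw))
print(positives)
print(negatives)
```

In a real run, each list would then be written to its own file so the training script can label the reviews by source file.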

The 'training_classifier.py' script imports the preprocessed data, trains a Naive Bayes classifier, and pickles the Python objects that the 'sentiment_mod.py' module needs later.
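The training step could look roughly like the sketch below. It assumes NLTK is installed and uses a handful of made-up reviews instead of the real dataset. The regex tokenizer, the 50-word feature cap, the "pos"/"neg" labels, and the `tokenize` and `find_features` helpers are illustrative choices, not details confirmed by the repository.

```python
import pickle
import re

import nltk

# Made-up reviews standing in for the two preprocessed files.
positive_reviews = [
    "great location and friendly staff",
    "clean room and great breakfast",
]
negative_reviews = [
    "dirty room and rude staff",
    "noisy street and cold breakfast",
]


def tokenize(text):
    """Lowercase and split into word tokens (no NLTK data download needed)."""
    return re.findall(r"[a-z']+", text.lower())


labeled = [(tokenize(r), "pos") for r in positive_reviews]
labeled += [(tokenize(r), "neg") for r in negative_reviews]

# Use the most frequent words as features (the cap of 50 is an assumption).
all_words = nltk.FreqDist(w for words, _ in labeled for w in words)
word_features = [w for w, _ in all_words.most_common(50)]


def find_features(words):
    """Map each feature word to True/False depending on its presence."""
    present = set(words)
    return {w: (w in present) for w in word_features}


training_set = [(find_features(words), label) for words, label in labeled]
classifier = nltk.NaiveBayesClassifier.train(training_set)

# Round-trip through pickle, as one would when saving the objects to disk.
restored = pickle.loads(pickle.dumps(classifier))
features = find_features(tokenize("friendly staff and clean room"))
print(restored.classify(features))
```

With this little data the prediction is only illustrative. Note that the feature list has to be saved along with the classifier, because new reviews must be converted into the same feature dictionary before they can be classified.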

The 'sentiment_mod.py' script loads the pickled files (the features and the trained Naive Bayes classifier) and provides the 'get_sentiment_from(review)' function, which returns the classifier's prediction for the review passed in by the user.
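The general shape of such a module might look like the sketch below. Only the name 'get_sentiment_from(review)' comes from this README. Everything else is an assumption: the feature list is pickled into an in-memory buffer rather than read from the repository's files, and `KeywordClassifier` is a hypothetical stub that stands in for the trained Naive Bayes classifier so the example runs on its own.

```python
import io
import pickle
import re

# Stand-in for the pickled feature file (its real name is not given here).
buffer = io.BytesIO()
pickle.dump(["great", "friendly", "dirty", "noisy"], buffer)
buffer.seek(0)
word_features = pickle.load(buffer)


class KeywordClassifier:
    """Hypothetical stub with a classify() method, like an NLTK classifier."""

    NEGATIVE = {"dirty", "noisy"}

    def classify(self, featureset):
        hits = [w for w, present in featureset.items() if present]
        negative_hits = sum(w in self.NEGATIVE for w in hits)
        return "neg" if negative_hits > len(hits) / 2 else "pos"


# In the real module this object would be unpickled from disk instead.
classifier = KeywordClassifier()


def find_features(review):
    """Build the same kind of feature dictionary used at training time."""
    words = set(re.findall(r"[a-z']+", review.lower()))
    return {w: (w in words) for w in word_features}


def get_sentiment_from(review):
    """Return the classifier's label for a raw review string."""
    return classifier.classify(find_features(review))


print(get_sentiment_from("The room was dirty and noisy"))   # neg
print(get_sentiment_from("Great view, friendly staff"))     # pos
```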

The 'testing_mod.py' script tests the reliability of the classifier (about 90.20%). To use the trained classifier, import the 'sentiment_mod.py' module into your Python script and call its 'get_sentiment_from(review)' function to get the sentiment of your own reviews.
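Assuming "reliability" here means plain accuracy (the share of held-out reviews whose predicted label matches the true label), the measurement can be sketched as follows. The `accuracy` helper, the `toy_predict` function, and the four sample reviews are invented for the example. In practice the prediction function would be 'get_sentiment_from' from 'sentiment_mod.py'.

```python
def accuracy(predict, labeled_reviews):
    """Fraction of (review, label) pairs that predict() labels correctly."""
    correct = sum(predict(review) == label for review, label in labeled_reviews)
    return correct / len(labeled_reviews)


def toy_predict(review):
    """Hypothetical stand-in for get_sentiment_from()."""
    return "neg" if "dirty" in review.lower() else "pos"


# Invented held-out examples; real testing would use reviews not seen in training.
held_out = [
    ("Great location", "pos"),
    ("Dirty bathroom", "neg"),
    ("Noisy street", "neg"),
    ("Lovely breakfast", "pos"),
]
score = accuracy(toy_predict, held_out)
print(f"{score:.2%}")  # 75.00%
```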

Notes:

The raw and preprocessed datasets are not included in this repository due to copyright/licensing issues. You can either import the sentiment module or load the pickled classifier object into your scripts.