A spam classifier using Natural Language Processing (NLP) is a machine learning model designed to automatically categorize and filter out unwanted or irrelevant messages, typically in the context of emails or text messages. It analyzes the content of messages and applies NLP techniques to distinguish between legitimate and spam messages based on various features, such as the presence of specific keywords, patterns, or text characteristics.
- Introduction
- Getting Started
- Prerequisites
- Code Description
- Usage
- Screenshots
- Packages And Libraries
- Author
This program is designed to classify SMS messages into two categories: spam and ham. It processes the text messages using various techniques, such as data cleaning, preprocessing, and the Bag of Words model. The Naive Bayes classifier is used for making the final classification decision.
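To make the Bag of Words idea concrete, here is a minimal pure-Python sketch. It is only an illustration: the `bag_of_words` helper and the two sample messages are made up for this example, and the project itself relies on scikit-learn for this step.

```python
from collections import Counter


def bag_of_words(messages):
    """Return the sorted vocabulary and one word-count vector per message."""
    tokenized = [message.lower().split() for message in messages]
    vocabulary = sorted({word for words in tokenized for word in words})
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        # One column per vocabulary word, holding how often it occurs.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = bag_of_words(["free prize free", "call me later"])
print(vocabulary)  # ['call', 'free', 'later', 'me', 'prize']
print(vectors)     # [[0, 2, 0, 0, 1], [1, 0, 1, 1, 0]]
```

Each message becomes a row of word counts, which is the kind of numerical input a Naive Bayes classifier can be trained on.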
These instructions will help you get a copy of the project up and running on your local machine for testing and development purposes.
Before you begin, ensure you have met the following requirements:
- A basic understanding of lemmatization, stemming, stop words, the Bag of Words model, and the Naive Bayes classifier
- Dataset Link (https://archive.ics.uci.edu/dataset/228/sms+spam+collection)
- Python (>=3.0)
- Python libraries: pandas, nltk, scikit-learn (imported as sklearn)
You can install the required libraries using pip:
pip install pandas nltk scikit-learn
The code is structured into several main sections:
- Importing the Dataset: Reads the SMS dataset using Pandas.
- Data Cleaning and Preprocessing: Cleans and preprocesses the text data, including removing non-alphabetic characters, converting to lowercase, and applying stemming and stopword removal.
- Creating the Bag of Words Model: Uses the CountVectorizer from scikit-learn to convert the text data into numerical features.
- Train-Test Split: Splits the dataset into a training set and a testing set for model evaluation.
- Training the Naive Bayes Classifier: Trains the spam detection model with a Multinomial Naive Bayes classifier.
- Evaluating the Model: Calculates and displays the confusion matrix and accuracy score to evaluate model performance.
- Creating the Streamlit App: Generates the app.py file using the Streamlit library to create a user-friendly web application for spam detection.
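The steps from cleaning through evaluation can be sketched roughly as follows. This is a minimal sketch rather than the project's actual code: the toy messages, the `clean_message` helper, and the small `STOP_WORDS` set (a stand-in for the NLTK stopwords corpus, used so the example runs without a download) are all assumptions made for illustration.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Assumption: a tiny stand-in for the NLTK English stop word list.
STOP_WORDS = {"a", "an", "the", "to", "you", "your", "is", "are",
              "at", "for", "on", "in", "i", "me", "we", "and", "of"}

stemmer = PorterStemmer()


def clean_message(text):
    """Keep letters only, lowercase, drop stop words, and stem each word."""
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    return " ".join(stemmer.stem(w) for w in words if w not in STOP_WORDS)


# Made-up toy data (1 = spam, 0 = ham), not taken from the real dataset.
messages = [
    "WIN a FREE prize now!!!", "Claim your free cash reward today",
    "Free entry: text WIN to 80000", "Urgent! You won a prize, call now",
    "Are you coming to lunch?", "See you at the meeting tomorrow",
    "Call me when you are home", "Thanks for the help yesterday",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

corpus = [clean_message(m) for m in messages]

# Bag of Words: one row per message, one column per vocabulary word.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus).toarray()

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

model = MultinomialNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
matrix = confusion_matrix(y_test, y_pred, labels=[0, 1])
accuracy = accuracy_score(y_test, y_pred)
print(matrix)
print(accuracy)
```

On a corpus this small the scores are not meaningful; the sketch only shows how the pieces fit together.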
You can use this code as a starting point for SMS spam classification. To use the program, follow these steps:
- Install the prerequisites.
- Ensure you have a dataset with SMS messages and labels.
- Modify the file path to your dataset in the code.
- Run the code to train and evaluate the SMS spam classifier.
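As a rough illustration of the dataset step, the sketch below reads a tab-separated file of label and message rows with Pandas and adds a 0/1 target column. The `load_sms` helper, the column names, and the `SMSSpamCollection` path mentioned in the comment are assumptions for this example; adjust them to match your own copy of the data.

```python
import io

import pandas as pd


def load_sms(source):
    """Read tab-separated label/message rows and add a 0/1 target column."""
    df = pd.read_csv(source, sep="\t", names=["label", "message"])
    # Encode the labels numerically: ham -> 0, spam -> 1.
    df["target"] = df["label"].map({"ham": 0, "spam": 1})
    return df


# In practice, pass the path to your dataset, for example
# load_sms("SMSSpamCollection") (hypothetical path).
# A small in-memory sample is used here so the sketch runs on its own.
sample = io.StringIO("ham\tSee you at lunch\nspam\tWIN a free prize now\n")
df = load_sms(sample)
print(df)
```

The `target` column is the array of 0s and 1s that the classifier is trained on.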
- Tf-idf (X)
- (Y) Array of 0s and 1s for the labels ('spam', 'ham')
- Final Outcome
- pandas
- re
- nltk
- scikit-learn
- Streamlit
- pickle
This model was developed by Ayush Verma.