Spam_Classifier_Project

A spam classifier using Natural Language Processing (NLP) is a machine learning model designed to automatically categorize and filter out unwanted or irrelevant messages, typically in the context of emails or text messages. It analyzes the content of messages and applies NLP techniques to distinguish between legitimate and spam messages based on various features, such as the presence of specific keywords, patterns, or text characteristics.

Introduction

This program is designed to classify SMS messages into two categories: spam and ham. It processes the text messages using various techniques, such as data cleaning, preprocessing, and the Bag of Words model. The Naive Bayes classifier is used for making the final classification decision.
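
The cleaning step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name clean_message and the short STOP_WORDS set are assumptions made for this example. NLTK's full English stop-word list (available after nltk.download("stopwords")) would normally take the place of the small set.

```python
import re

from nltk.stem.porter import PorterStemmer

# Illustrative stop-word subset (an assumption for this sketch). NLTK's full
# English list is available from nltk.corpus.stopwords after running
# nltk.download("stopwords").
STOP_WORDS = {"a", "an", "the", "you", "have", "to", "is", "of", "and", "for"}

stemmer = PorterStemmer()


def clean_message(text):
    """Keep letters only, lowercase, drop stop words, and stem each word."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)  # digits/punctuation -> spaces
    words = letters_only.lower().split()
    stems = [stemmer.stem(word) for word in words if word not in STOP_WORDS]
    return " ".join(stems)


print(clean_message("You have WON 2 free tickets!!!"))
```

Each cleaned message is a single space-separated string, which is the form a Bag of Words vectorizer expects as input.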

Getting Started

These instructions will help you get a copy of the project up and running on your local machine for testing and development purposes.

Prerequisites

Before you begin, ensure you have met the following requirements:

A basic understanding of lemmatization, stemming, stop words, the Bag of Words model, and the Naive Bayes classifier
Dataset Link (https://archive.ics.uci.edu/dataset/228/sms+spam+collection)
Python (>=3.0)
Python libraries: pandas, nltk, scikit-learn (imported as sklearn), streamlit

You can install the required libraries using pip:

pip install pandas nltk scikit-learn streamlit

Code Description

The code is structured into several main sections:

Importing the Dataset: Reads the SMS dataset using Pandas.
Data Cleaning and Preprocessing: Cleans and preprocesses the text data, including removing non-alphabetic characters, converting to lowercase, and applying stemming and stopword removal.
Creating the Bag of Words Model: Utilizes the CountVectorizer from scikit-learn to convert the text data into numerical features.
Train-Test Split: Splits the dataset into a training set and a testing set for model evaluation.
Training the Naive Bayes Classifier: Utilizes a Multinomial Naive Bayes classifier to train the spam detection model.
Evaluating the Model: Calculates and displays the confusion matrix and accuracy score for model performance evaluation.
Creating the Streamlit App: Generates the app.py file using the Streamlit library to create a user-friendly web application for spam detection.
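
The middle steps of this list (Bag of Words, train-test split, Naive Bayes training, evaluation) can be sketched with scikit-learn as shown below. This is a hedged example rather than the repository's code: the tiny corpus, the 25% test size, and the random seed are made up for illustration. It follows the order described above (vectorize first, then split); fitting the vectorizer on the training portion only is a common alternative that keeps test vocabulary out of the model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up, already-cleaned messages for illustration (1 = spam, 0 = ham).
messages = [
    "win free prize now",
    "free cash claim now",
    "urgent win cash prize",
    "claim free ticket today",
    "see lunch today",
    "call later tonight",
    "meet home after work",
    "thank help yesterday",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Bag of Words: each column counts how often one vocabulary word occurs.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Hold out part of the data for evaluation (sizes chosen for this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

# Train Multinomial Naive Bayes on the word counts.
model = MultinomialNB()
model.fit(X_train, y_train)

# Evaluate on the held-out messages.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)
```

In the confusion matrix, rows are the true labels and columns the predicted labels, in the order ham (0), spam (1).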

Usage

You can use this code as a starting point for SMS spam classification. To use the program, follow these steps:

Install the prerequisites.
Ensure you have a dataset with SMS messages and labels.
Modify the file path to your dataset in the code.
Run the code to train and evaluate the SMS spam classifier.
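
As a rough sketch of the dataset-loading step, the UCI SMS Spam Collection file is tab-separated with a label and a message on each line and no header row. The helper name load_sms_dataset, the column names, the is_spam column, and the DATASET_PATH value are assumptions made for this example; adjust the path to wherever your copy of the file lives.

```python
import csv
import io

import pandas as pd

# Hypothetical location of the dataset file; change this to your own path.
DATASET_PATH = "SMSSpamCollection"


def load_sms_dataset(source):
    """Read tab-separated 'label<TAB>message' lines into a DataFrame."""
    df = pd.read_csv(
        source,
        sep="\t",
        names=["label", "message"],  # the file has no header row
        quoting=csv.QUOTE_NONE,      # messages may contain quote characters
    )
    # Encode the label as 0 (ham) or 1 (spam) for the classifier.
    df["is_spam"] = df["label"].map({"ham": 0, "spam": 1})
    return df


# Demonstration on two in-memory lines; with the real file you would call
# load_sms_dataset(DATASET_PATH) instead.
demo = io.StringIO("ham\tSee you at lunch\nspam\tWIN a free prize now\n")
print(load_sms_dataset(demo))
```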

Screenshots

Dataset Frame
After Data Cleaning
Difference between the Original Dataset and the Cleaned Dataset
Bag Of Words(X)
Tf-idf(X)
y: Array of 0's and 1's encoding the label ('spam' / 'ham')
X (independent variable) and y (dependent variable)
Confusion Matrix
Accuracy Score
Final Outcome

Packages And Libraries

pandas
re
nltk
Scikit-learn
Streamlit
pickle

Author

This model was developed by Ayush Verma.
