EDA and Machine Learning Model Training for the Student Performance Data

Ryaz16/Students-Analysis-Performance

Machine learning project

Students-Performance-Analysis

Problem Statement:

  • The goal is to determine how variables such as gender, race/ethnicity, parental level of education, lunch, and test preparation course affect student performance (test scores).

Data Collection:

Data Checks:

  • A series of data checks was performed to ensure that the data was clean, complete, and correctly formatted. The checks covered missing values, duplicate rows, outliers, column data types, and the number of unique values in each column.
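
The checks listed above could be sketched with pandas roughly as follows. The sample rows, the "math score" column name, and the 1.5 × IQR outlier rule are assumptions made for illustration; the project's actual checks may differ.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "male"],
    "lunch": ["standard", "free/reduced", "standard", "standard", "free/reduced"],
    "math score": [72, 47, 90, None, 47],
})

# Missing values per column.
missing_per_column = df.isna().sum()

# Number of fully duplicated rows (first occurrence is not counted).
duplicate_rows = int(df.duplicated().sum())

# Data type and number of distinct values of each column.
column_types = df.dtypes
unique_counts = df.nunique()

# Simple outlier check on one numeric column using the 1.5 * IQR rule (an assumed choice).
scores = df["math score"].dropna()
q1, q3 = scores.quantile(0.25), scores.quantile(0.75)
iqr = q3 - q1
outliers = scores[(scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)]

print(missing_per_column, duplicate_rows, column_types, unique_counts, outliers, sep="\n\n")
```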

Exploratory Data Analysis (EDA):

  • The data was analyzed to understand its structure, patterns, and relationships. This involved computing summary statistics, exploring correlations between variables, identifying potential outliers or missing values, and finding numerical and categorical columns along with the number of unique values in each categorical column.
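
As a rough illustration of these EDA steps, the sketch below separates numerical from categorical columns and computes summary statistics, correlations, and unique-value counts. The DataFrame contents and the score column names are hypothetical stand-ins, not the project's data.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "test preparation course": ["none", "completed", "none", "none"],
    "math score": [72, 47, 90, 65],
    "reading score": [74, 52, 95, 60],
})

# Split the columns by type.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Summary statistics and pairwise correlations for the numeric columns.
summary = df[numeric_cols].describe()
correlations = df[numeric_cols].corr()

# Number of distinct values in each categorical column.
unique_per_category = df[categorical_cols].nunique()

print(summary, correlations, unique_per_category, sep="\n\n")
```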

Data Visualization:

  • Visualizations were created to identify trends and patterns that may be difficult to see in tabular format, helping to gain insights quickly and communicate results effectively to others.

Data Pre-Processing:

  • The data was transformed to make it suitable for use with machine learning models. This involved techniques such as scaling, normalization, feature selection, or feature engineering.
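
One possible way to do this with scikit-learn is sketched below: numeric columns are standardized and categorical columns are one-hot encoded. One-hot encoding is not mentioned in the description above and is an assumption here, as are the sample rows and column names.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical sample rows; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "lunch": ["standard", "free/reduced", "standard", "free/reduced"],
    "reading score": [72, 50, 95, 60],
})

preprocessor = ColumnTransformer(
    transformers=[
        # One-hot encode categorical columns (assumed encoding choice).
        ("categorical", OneHotEncoder(handle_unknown="ignore"), ["gender", "lunch"]),
        # Scale numeric columns to zero mean and unit variance.
        ("numeric", StandardScaler(), ["reading score"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

Fitting the transformer on the training split only, and reusing it to transform the test split, keeps test-set information out of the scaling statistics.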

Model Training:

  • Machine learning models were built using the pre-processed data. The data was split into training and test sets, and the training set was used to train the models.
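
A minimal sketch of the split-then-train step is shown below. It uses synthetic data from `make_classification` in place of the student dataset, and the 80/20 split ratio, the random seed, and the choice of `RandomForestClassifier` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pre-processed features and labels.
X, y = make_classification(n_samples=200, n_features=6, random_state=42)

# Hold out 20% of the rows for testing; stratify to keep class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train only on the training split.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print(X_train.shape, X_test.shape)
```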

Model Evaluation:

  • The models, including a RandomForestClassifier, were evaluated using metrics such as accuracy, the confusion matrix (confusion_matrix), and the per-class report (classification_report). Comparing these metrics showed which models performed best.
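
For reference, the scikit-learn metric functions can be called as in the sketch below. The true and predicted labels are made-up values for illustration, and the binary 0/1 target is an assumption, since the description above does not state how the target classes are defined.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted labels for eight test rows.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Fraction of predictions that match the true labels.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# Text table with precision, recall, and F1 per class.
report = classification_report(y_true, y_pred)

print(accuracy)
print(cm)
print(report)
```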

Choosing the Best Model:

  • Based on the evaluation results, the best-performing model was chosen for predicting student performance.
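
One simple way to make this choice is to score each candidate on the same test split and keep the one with the highest score, as sketched below. The candidate models, the synthetic data, and the use of accuracy as the selection criterion are all assumptions; the project may have compared different models or metrics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pre-processed student data.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hypothetical candidate models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each candidate and record its test accuracy.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Pick the candidate with the highest accuracy.
best_name = max(scores, key=scores.get)
print(scores)
print("best:", best_name)
```

When candidates score within a small margin of each other on a single split, cross-validation gives a more stable comparison.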