taeefnajib/predict-gender-from-first-name

Predict Gender from First Names

This project trains a Multinomial Naive Bayes model to predict a person's gender from their first name. The dataset, downloaded from data.gov, is a zip file containing 142 txt files, one for each year from 1880 to 2021.
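To make the idea concrete, here is a minimal pure-Python sketch of Multinomial Naive Bayes applied to first names. It is not the code in train.py: the character-bigram features, the `TinyMultinomialNB` class, and the toy training names are all assumptions made for illustration, and the real project may use different features and a library implementation.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=2):
    """Split a name into overlapping character n-grams with boundary markers."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class TinyMultinomialNB:
    """Hypothetical, minimal Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)          # how many names per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for name, label in zip(names, labels):
            grams = char_ngrams(name)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict_proba(self, name):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for gram in char_ngrams(name):
                if gram in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            log_scores[label] = score
        # Convert log scores to probabilities in a numerically stable way.
        peak = max(log_scores.values())
        exp_scores = {k: math.exp(v - peak) for k, v in log_scores.items()}
        norm = sum(exp_scores.values())
        return {k: v / norm for k, v in exp_scores.items()}

    def predict(self, name):
        probs = self.predict_proba(name)
        return max(probs, key=probs.get)


# Toy data for illustration only; the real dataset has far more names.
names = ["Anna", "Maria", "Sophia", "Olivia", "John", "Mark", "Peter", "Josh"]
labels = ["F", "F", "F", "F", "M", "M", "M", "M"]
model = TinyMultinomialNB().fit(names, labels)
print(model.predict("Julia"), model.predict_proba("Julia"))
```

The classifier scores each class by its prior plus the smoothed log-likelihood of every n-gram in the name, then normalizes the scores into probabilities, which is the kind of gender and probability pair the project reports.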

### Instructions

  1. Clone this repository:

git clone https://github.com/taeefnajib/predict-gender-from-first-name

  2. Download the zip file from data.gov and unzip the names folder. Place it in the working directory.

  3. Install all the dependencies:

pip install -r requirements.txt

  4. data.py prepares a CSV file from all the txt files and pre-processes the dataset. You don't need to run it from the command line.

  5. train.py builds a model and trains it on the dataset. The repository already contains data.csv and model.pkl. If you remove them and run train.py, it recreates both files.

  6. test.py uses argparse so you can predict gender from a first name on the command line. Use --name or -n followed by the name. Example:

python test.py --name Josh

  7. If you want to use FastAPI instead, start the server:

uvicorn main:app --reload

This serves the app at 127.0.0.1 on port 8000, which is uvicorn's default. The Swagger UI is available at http://127.0.0.1:8000/docs. If you submit a first name as a string, it returns a dictionary with Gender and Probability.
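As a rough sketch of the CSV-preparation step described for data.py, the snippet below merges the yearly files into one CSV. It assumes the unzipped folder holds files named like yob1880.txt with one `name,sex,count` record per line, which is the layout of the national baby names files; the function name `build_dataset`, the output column names, and the choice to sum counts across years are assumptions, and the actual pre-processing in data.py may differ.

```python
import csv
from pathlib import Path


def build_dataset(names_dir, out_csv):
    """Merge every yob*.txt file into one CSV of total counts per (name, sex).

    Hypothetical helper for illustration; returns the number of rows written.
    """
    totals = {}
    for path in sorted(Path(names_dir).glob("yob*.txt")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                if len(row) != 3:  # skip blank or malformed lines
                    continue
                name, sex, count = row
                key = (name, sex)
                totals[key] = totals.get(key, 0) + int(count)

    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", "sex", "count"])
        for (name, sex), count in sorted(totals.items()):
            writer.writerow([name, sex, count])
    return len(totals)
```

Calling `build_dataset("names", "data.csv")` from the working directory would write one row per name and sex pair. Keeping the summed count lets a later training step weight or filter names that appear under both sexes.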
