Evaluation of gender bias in a graph-based classification algorithm.

gganssle/data-day-tx-2018

data-day-tx-2018

This is the talk given by Graham Ganssle and Steve Purves at Data Day Texas 2018. This talk was given in conjunction with Lynn Pausic and Chris LaCava's talk about how human bias is preserved in machine learning systems.

The Aim

We show how biased training data biases model outputs by assessing the qualification of loan applicants based on US Census data. We train our model on a dense, varied dataset and quantify the difference in apparent loan-worthiness with respect to applicant gender.

Are female loan applicants automatically screened out of credit applications by biased computer models?
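One simple way to quantify a difference in apparent loan-worthiness is to compare the rate of positive predictions per gender. The sketch below is only an illustration of that idea, not the metric used in the talk: the function names `approval_rates` and `parity_gap`, the group labels, and the toy predictions are all assumptions made for this example.

```python
from collections import defaultdict


def approval_rates(predictions, genders):
    """Return the fraction of positive (1) predictions for each gender label."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for pred, gender in zip(predictions, genders):
        total[gender] += 1
        approved[gender] += int(pred == 1)
    return {g: approved[g] / total[g] for g in total}


def parity_gap(predictions, genders, group_a="Male", group_b="Female"):
    """Difference in positive-prediction rate between two groups (a minus b)."""
    rates = approval_rates(predictions, genders)
    return rates[group_a] - rates[group_b]


# Hypothetical toy predictions (1 = predicted credit-worthy).
# These are NOT results from the talk.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
sex = ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"]

rates = approval_rates(preds, sex)
gap = parity_gap(preds, sex)
print(rates, gap)
```

A gap near zero means both groups are predicted credit-worthy at similar rates; a large positive gap means the model favors `group_a`.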

Methods

We use a graph convolutional network to predict a node property (credit worthiness) from other node properties and edge connections to other credit applicants.
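For readers new to graph convolutional networks, a single layer in the Kipf and Welling formulation computes H' = σ(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, D is the degree matrix of A + I, H holds the node features, and W is a learned weight matrix. The following is a minimal pure-Python sketch of that propagation rule on a toy graph. It is not the code in this repository; the helper names, the three-node graph, and the weights are illustrative assumptions, and a real implementation would use sparse matrices and a deep learning framework.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense adjacency matrix (list of lists)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in m]


def gcn_layer(adj_norm, features, weights, activation=relu):
    """One propagation step: activation(adj_norm @ features @ weights)."""
    return activation(matmul(adj_norm, matmul(features, weights)))


# Hypothetical toy graph: three applicants connected in a path 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],   # two made-up features per node
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[0.5, -1.0],   # made-up weights; normally learned during training
           [0.25, 1.0]]

adj_norm = normalize_adjacency(adj)
hidden = gcn_layer(adj_norm, features, weights)
print(hidden)
```

Each output row mixes a node's own features with those of its neighbors, which is how edge connections to other applicants can influence the predicted node property.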

Data

The data used in this experiment is extracted from the 1994 US Census. It is the commonly referenced Census-Income dataset, also known as the "Adult" dataset. We obtained it from the UCI Machine Learning Repository.
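As a rough illustration of what the raw records look like, the sketch below parses a few lines in the Adult dataset's comma-separated layout using only the standard library. The column names follow the dataset's published attribute list; the sample rows are made up for this example, and the `read_adult` helper and the `high_income` field are assumptions, not part of this repository's cleaning notebooks. Treating the income label as the prediction target is likewise an assumption here.

```python
import csv
import io

# Attribute names as listed in the Adult dataset documentation.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]


def read_adult(lines):
    """Parse Adult-style rows, dropping rows with missing ('?') values."""
    records = []
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != len(COLUMNS):
            continue  # skip blank lines and non-data lines
        if "?" in row:
            continue  # the dataset marks missing values with '?'
        record = dict(zip(COLUMNS, row))
        # Labels in the test split carry a trailing period, e.g. '>50K.'
        record["income"] = record["income"].rstrip(".")
        record["high_income"] = int(record["income"] == ">50K")
        records.append(record)
    return records


# Made-up rows in the dataset's format (NOT actual census records).
SAMPLE = """\
37, Private, 120000, Bachelors, 13, Married-civ-spouse, Exec-managerial, Wife, White, Female, 0, 0, 40, United-States, >50K.
25, ?, 90000, HS-grad, 9, Never-married, ?, Own-child, Black, Male, 0, 0, 30, United-States, <=50K

45, Private, 150000, Masters, 14, Divorced, Prof-specialty, Not-in-family, White, Male, 0, 0, 50, United-States, <=50K
"""

records = read_adult(io.StringIO(SAMPLE))
print(len(records), [r["sex"] for r in records])
```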

How do you run this thing?

First, condition the data by running the data_cleaning and test_cleaning notebooks. Then run the graphicator notebook to build the graph and its associated files from the clean CSV files.

Before you train the GCN, you have to build the GCN code: cd gcn; python setup.py install. Then, to train, cd into the one-level-deeper gcn/ directory and run the training script: cd gcn; python train --dataset credit.

A Tip of Our Hat

data-day-TX-2018 by Lynn Pausic, Graham Ganssle, Steve Purves, Expero Inc is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.

This work borrows heavily from Graph Convolutional Networks by Thomas Kipf and Max Welling, licensed MIT: ©Thomas Kipf, 2016. You can find their excellent paper here.

The data used in this experiment was obtained from the UCI ML Repository: Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.