data-day-tx-2018

This is the talk given by Graham Ganssle and Steve Purves at Data Day Texas 2018. This talk was given in conjunction with Lynn Pausic and Chris LaCava's talk about how human bias is preserved in machine learning systems.

The Aim

We show how biased training data biases model outputs by assessing the qualifications of loan applicants from US Census data. We train our model on a dense, varied dataset and quantify how apparent loan-worthiness differs with applicant gender.

Are female loan applicants automatically screened out of credit applications by biased computer models?
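
One simple way to put a number on that question is the gap in predicted approval rates between male and female applicants. The sketch below is an illustration only: the function name approval_rate_gap and the toy predictions are assumptions made for this example, not the metric or results used in the talk.

```python
def approval_rate_gap(predictions, genders):
    """Return (male approval rate) - (female approval rate).

    predictions: iterable of 0/1 model outputs (1 = judged credit-worthy).
    genders: iterable of "Male"/"Female" labels, aligned with predictions.
    """
    totals = {"Male": 0, "Female": 0}
    approved = {"Male": 0, "Female": 0}
    for prediction, gender in zip(predictions, genders):
        totals[gender] += 1
        approved[gender] += prediction
    male_rate = approved["Male"] / totals["Male"]
    female_rate = approved["Female"] / totals["Female"]
    return male_rate - female_rate


# Toy, made-up predictions for eight hypothetical applicants.
toy_predictions = [1, 1, 0, 1, 0, 0, 1, 0]
toy_genders = ["Male"] * 4 + ["Female"] * 4

gap = approval_rate_gap(toy_predictions, toy_genders)
print(f"approval-rate gap (male - female): {gap:.2f}")
```

A gap near zero means both groups are approved at similar rates; a large positive gap means the model approves male applicants more often.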

Methods

We use a graph convolutional network (GCN) to predict a node property (creditworthiness) from the node's other properties and its edge connections to other credit applicants. Each node in the graph is one applicant.
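
To make the idea concrete, here is a minimal NumPy sketch of a single graph convolution layer using the propagation rule from Kipf and Welling's paper, ReLU(D^-1/2 (A + I) D^-1/2 H W). The three-node graph, the feature matrix, and the random weights are toy assumptions; the real training code lives in the gcn package and is not reproduced here.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the renormalized adjacency matrix."""
    adj_self = adj + np.eye(adj.shape[0])   # add self-loops
    degree = adj_self.sum(axis=1)           # node degrees including self-loop
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    return adj_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: mix neighbor features, project, apply ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


# Toy graph: three applicants connected in a chain 0 - 1 - 2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.eye(3)                        # one-hot placeholder features
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 2))           # untrained, random weights

adj_norm = normalize_adjacency(adj)
hidden = gcn_layer(adj_norm, features, weights)
print(hidden.shape)
```

Each output row blends a node's own features with those of its neighbors, which is how an applicant's predicted creditworthiness comes to depend on the applicants it is connected to.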

Data

The data used in this experiment is extracted from the 1994 US Census data. It is the commonly referenced Census-Income dataset, AKA the "Adult" dataset. We got it from the UCI ML Repository (full citation in A Tip of Our Hat).
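
For orientation, the raw Adult file is a headerless, comma-separated file with 15 fields per row, and it marks missing values with "?". The sketch below parses that layout with the standard library. The helper name read_adult_rows and the two sample rows are made up for illustration, and this is not the cleaning logic from the notebooks.

```python
import csv
import io

# Field names as documented for the UCI Adult dataset.
ADULT_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]


def read_adult_rows(handle):
    """Yield one dict per applicant; the '?' missing marker becomes None."""
    reader = csv.reader(handle, skipinitialspace=True)
    for fields in reader:
        if len(fields) != len(ADULT_COLUMNS):
            continue  # skip blank or malformed lines
        yield {
            name: (None if value == "?" else value)
            for name, value in zip(ADULT_COLUMNS, fields)
        }


# Two illustrative rows in the file's layout (not real census records).
sample = io.StringIO(
    "39, Private, 100000, Bachelors, 13, Never-married, Sales, "
    "Not-in-family, White, Male, 0, 0, 40, United-States, <=50K\n"
    "52, ?, 120000, HS-grad, 9, Married-civ-spouse, ?, "
    "Wife, White, Female, 0, 0, 30, United-States, >50K\n"
)
rows = list(read_adult_rows(sample))
print(len(rows), rows[1]["workclass"], rows[1]["sex"])
```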

How do you run this thing?

You first have to condition the data by running the data_cleaning and test_cleaning notebooks. Then you have to run the graphicator notebook to build the graph and associated files out of the clean csv files.
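
The rule the graphicator notebook uses to connect applicants is not described in this README. As one hypothetical way to turn tabular rows into a graph, the sketch below links every pair of applicants that share a value in a chosen column. The function name build_edges, the choice of the occupation column, and the toy rows are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(rows, key):
    """Connect every pair of rows that share the same non-missing value for key.

    Returns a sorted list of (i, j) index pairs with i < j.
    """
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        value = row.get(key)
        if value is not None:
            groups[value].append(index)
    edges = set()
    for members in groups.values():
        edges.update(combinations(members, 2))
    return sorted(edges)


# Toy rows standing in for cleaned applicant records.
toy_rows = [
    {"occupation": "Sales"},
    {"occupation": "Tech-support"},
    {"occupation": "Sales"},
    {"occupation": None},
    {"occupation": "Sales"},
]
edges = build_edges(toy_rows, "occupation")
print(edges)  # [(0, 2), (0, 4), (2, 4)]
```

Note that this pairing grows quadratically with group size, so on a dataset the size of Adult a real pipeline would likely need a sparser rule.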

Before you train the GCN, you have to build the GCN code. Do this with cd gcn; python setup.py install. Then, to train, cd into the one-level-deeper gcn/ and run the training script: cd gcn; python train --dataset credit.

A Tip of Our Hat

data-day-TX-2018 by Lynn Pausic, Graham Ganssle, Steve Purves, Expero Inc is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.

This work borrows heavily from Graph Convolutional Networks by Thomas Kipf and Max Welling, licensed MIT: ©Thomas Kipf, 2016. You can find their excellent paper here.

The data used in this experiment was obtained from the UCI ML Repository: Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
