Credit Card Default Prediction using various approaches to assess class imbalance

This notebook details the Python implementation of six models that maximize a Bank Profit function under heavy class imbalance, and compares that function with other standard gain and loss functions. We used data on 30,000 customers in Taiwan, dated October 2005; a detailed description of the data set is available in the Machine Learning Repository of the University of California, Irvine.

I propose the following Bank Profit score: $\text{Average Profit}_{\text{per customer}} = \frac{\sum_i \left(\alpha \cdot \text{True Negative}_i - (1-\alpha) \cdot \text{False Negative}_i\right)}{\text{Total True Negative} + \text{Total False Negative}}$, where $\alpha$ is a parameter that sets the relative value of a client's non-default against a default. Since $\alpha$ is unknown, we train our models with $\alpha \in \{\frac{1}{3}, \frac{3}{7}, \frac{1}{2}\}$.
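The Bank Profit score can be sketched in plain Python as follows. This is a minimal sketch, not the notebook's code: it assumes default is labelled 1 and non-default 0, so that a true negative is a correctly predicted non-defaulter and a false negative is a defaulter predicted as a non-defaulter. The function name `bank_profit_score`, the toy labels, and the convention of returning 0 when the denominator is empty are illustrative assumptions.

```python
def bank_profit_score(y_true, y_pred, alpha):
    """Return (alpha * TN - (1 - alpha) * FN) / (TN + FN).

    Assumed labelling: 1 = default, 0 = non-default.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # True negatives: non-defaulters predicted as non-defaulters.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # False negatives: defaulters predicted as non-defaulters.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    predicted_negative = tn + fn
    if predicted_negative == 0:
        # Assumed convention: nobody is predicted as non-default, so no profit.
        return 0.0
    return (alpha * tn - (1 - alpha) * fn) / predicted_negative


# Toy labels, made up for illustration only.
y_true = [0, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0]

# Evaluate the score for the three alpha values used in this repository.
scores = {alpha: bank_profit_score(y_true, y_pred, alpha)
          for alpha in (1 / 3, 3 / 7, 1 / 2)}
```

In the toy data there are three true negatives and one false negative, so with $\alpha = \frac{1}{2}$ the score is $(1.5 - 0.5)/4 = 0.25$.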

This notebook has four sections: i) data loading and handling, ii) exploratory data analysis, iii) modelling, iv) conclusions.

Further research

Given recent major developments in AutoML, the notebooks folder now includes examples of using the Bank Profit score in AutoGluon (https://auto.gluon.ai/stable/index.html). Since this repository was published, many colleagues have considered replacing the F1-Score unnecessary. I am going to include results showing how misaligned the F1-Score and the company's actual profit can be under different scenarios.
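To illustrate how the F1-Score and the Bank Profit score can disagree, here is a small sketch with two hypothetical classifiers. The confusion-matrix counts are invented for illustration and are not results from this repository; the helper names `f1_from_counts` and `profit_from_counts` are assumptions, and default is taken as the positive class.

```python
def f1_from_counts(tp, fp, fn):
    """F1-Score of the positive (default) class from confusion-matrix counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def profit_from_counts(tn, fn, alpha):
    """Bank Profit score: (alpha * TN - (1 - alpha) * FN) / (TN + FN)."""
    denom = tn + fn
    return (alpha * tn - (1 - alpha) * fn) / denom if denom else 0.0


# Hypothetical counts on 100 customers (80 non-default, 20 default).
model_a = {"tp": 12, "fp": 8, "tn": 72, "fn": 8}   # conservative about flagging defaults
model_b = {"tp": 18, "fp": 30, "tn": 50, "fn": 2}  # flags many more customers as defaults

alpha = 0.5
f1_a = f1_from_counts(model_a["tp"], model_a["fp"], model_a["fn"])
f1_b = f1_from_counts(model_b["tp"], model_b["fp"], model_b["fn"])
profit_a = profit_from_counts(model_a["tn"], model_a["fn"], alpha)
profit_b = profit_from_counts(model_b["tn"], model_b["fn"], alpha)

print(f"Model A: F1 = {f1_a:.3f}, profit = {profit_a:.3f}")
print(f"Model B: F1 = {f1_b:.3f}, profit = {profit_b:.3f}")
```

With these made-up counts, model A has the higher F1-Score while model B has the higher Bank Profit score, so the two metrics would select different models.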

The use of SMOTE has also become controversial as of 2024. Further analysis with different datasets should be done, particularly with alternative scores such as the Bank Profit score.

Conclusions

Class imbalance is a very common issue when statistical methods are applied to a broad range of problems. Here we tried SMOTE, from Chawla et al. (2002), and a custom loss function for model selection during Hyperparameter Tuning (HPT) for LightGBM and CatBoost. The only improvement over no HPT (both with SMOTE) was on LightGBM with our custom function for $\alpha = \frac{1}{3}$, where the score rose from 0.1853 to 0.1991, a modest increase of 7.44%. We found no improvement on the rest of our models. Considering the importance of this area of research and the large number of options for addressing class imbalance, we believe that more studies are needed to understand it and to write more user-friendly code.
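For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified sketch of that idea in plain Python, assuming Euclidean distance and a randomly chosen base point for each synthetic sample; it is not the implementation used in the notebooks, and the name `smote_sketch` and the toy points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority points by distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class points, made up for illustration only.
minority_points = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]]
new_points = smote_sketch(minority_points, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already covered by the minority class.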

Bibliography

Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16:321–357.
