Bank Marketing Campaign Predictive Modeling

Section 1: Understanding the Dataset

About the dataset - Find the best strategy to improve the next marketing campaign

Link: https://www.kaggle.com/janiobachmann/bank-marketing-dataset

Data Description

1 - Age
2 - Job: Type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
3 - Marital: marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
4 - Education: (categorical: primary, secondary, tertiary and unknown)
5 - Default: has credit in default? (categorical: 'no','yes','unknown')
6 - Housing: has housing loan? (categorical: 'no','yes','unknown')
7 - Loan: has personal loan? (categorical: 'no','yes','unknown')
8 - Balance: Balance of the individual
9 - contact: contact communication type (categorical: 'cellular','telephone')
10 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
11 - day: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
12 - duration: last contact duration, in seconds (numeric)
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')
17 - y: has the client subscribed to a term deposit? (binary: 'yes','no')

Section 2: Exploratory Data Analysis (EDA)

Summary Statistics

Notes from Pandas Profiling

11k rows
There are no missing values
There are no duplicate rows
Some outliers in the field 'balance'
Just one percent of the rows had a default

Just 13% of the customers took a loan

Just 1.5% of the data points defaulted

Customers who defaulted tend to have a low account balance
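
The profiling notes above can be cross-checked with a few pandas calls. The following is a minimal sketch, not the notebook's actual code: the helper name `summarize` and the toy rows are made up for illustration, and the column names `balance`, `default` and `loan` are assumed to match the data description.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return the basic data-quality figures quoted in the profiling notes."""
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        # Share of rows where the flag column equals 'yes'.
        "default_rate": float((df["default"] == "yes").mean()),
        "loan_rate": float((df["loan"] == "yes").mean()),
        # Median balance per default group, to compare defaulters with the rest.
        "median_balance_by_default": df.groupby("default")["balance"].median().to_dict(),
    }


# Hypothetical toy rows, NOT taken from bank.csv.
toy = pd.DataFrame(
    {
        "balance": [1500, 20, 3200, -50, 800],
        "default": ["no", "yes", "no", "yes", "no"],
        "loan": ["no", "no", "yes", "no", "no"],
    }
)
report = summarize(toy)
print(report)
```

Running `summarize` on the full dataset instead of `toy` should give figures comparable to the notes, assuming the same column names.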

Section 3: Data Pre-processing

Steps:

Create target variables
Drop the original target variable columns
Encode the columns containing 'yes' and 'no' values
Correlation check
Separate features and target
Standardization
Train and Test split
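
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: the function name `preprocess`, the target column name `deposit`, and the toy rows are hypothetical, and every non-binary column is assumed to be numeric already. The sketch also standardizes with training-set statistics after the split, a common way to keep test data out of the scaling step; the correlation check is omitted.

```python
import pandas as pd


def preprocess(df, target_col="deposit", test_frac=0.2, seed=42):
    """Encode, split, and standardize a frame with a yes/no target column."""
    df = df.copy()
    # Create the numeric target, then drop the original target column.
    df["deposit_bool"] = (df[target_col] == "yes").astype(int)
    df = df.drop(columns=[target_col])
    # Encode every remaining column that only holds 'yes'/'no'.
    for col in df.columns:
        if set(df[col].unique()) <= {"yes", "no"}:
            df[col] = df[col].map({"no": 0, "yes": 1})
    # Separate features and target.
    X = df.drop(columns=["deposit_bool"]).astype(float)
    y = df["deposit_bool"]
    # Train/test split with a fixed seed so results are reproducible.
    test_idx = X.sample(frac=test_frac, random_state=seed).index
    X_train, X_test = X.drop(index=test_idx), X.loc[test_idx]
    y_train, y_test = y.drop(index=test_idx), y.loc[test_idx]
    # Standardize using training-set statistics only.
    mean = X_train.mean()
    std = X_train.std(ddof=0).replace(0, 1.0)
    return (X_train - mean) / std, (X_test - mean) / std, y_train, y_test


# Hypothetical toy rows, NOT taken from bank.csv.
toy = pd.DataFrame(
    {
        "age": [25, 40, 33, 58, 47, 29, 61, 36, 52, 44],
        "balance": [100, 2500, 700, 4000, 50, 300, 5200, 900, 1500, 10],
        "housing": ["yes", "no", "yes", "no", "yes", "yes", "no", "no", "yes", "no"],
        "deposit": ["no", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "no"],
    }
)
X_train, X_test, y_train, y_test = preprocess(toy)
print(X_train.shape, X_test.shape)
```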

Section 4: Model Building

Results

Three groups of algorithms were explored: boosting, neural networks, and automated ML.

Section 5: Fairness-Bias Check using SHAP values

To prepare the model for future deployment, we ran a fairness and bias check using SHAP values. The results show that variables commonly associated with bias, such as marital status and job, were not given high importance by our model. This suggests that the model's predictions do not lean heavily on these attributes, which reduces the risk of reinforcing biases in a non-interpretable and unaccountable way.
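
One common way to turn per-row SHAP values into a global importance ranking is to average their absolute values per feature. The sketch below assumes the SHAP values have already been computed (for example with the `shap` package) and are available as an (n_samples, n_features) array; the function name `rank_by_mean_abs_shap`, the feature list, and the numbers are hypothetical and do not come from the project's model.

```python
import numpy as np


def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, highest first."""
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.ndim != 2 or shap_values.shape[1] != len(feature_names):
        raise ValueError("expected an (n_samples, n_features) array")
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Hypothetical SHAP values for three rows and four features (illustration only).
names = ["duration", "balance", "marital", "job"]
values = [
    [0.30, -0.10, 0.01, -0.02],
    [-0.25, 0.12, -0.02, 0.01],
    [0.35, -0.08, 0.00, 0.03],
]
ranking = rank_by_mean_abs_shap(values, names)
sensitive = {"marital", "job"}
for name, score in ranking:
    flag = " (sensitive)" if name in sensitive else ""
    print(f"{name:10s} {score:.3f}{flag}")
```

Flagging the sensitive features in the printed ranking makes it easy to see where they fall relative to the rest.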

Section 6: Understanding Causal Inference

Chi square

Interpretation - The chart above visualizes our sample data. We compare the number of deposits before and after the campaign. If the campaign truly had no effect, the data would show the same ratio of 'deposit' to 'no deposit' in each period.

chi-square and p-value

The chi-squared value = 0.00824, the p-value = ~1, and the degrees of freedom = 2. With a p-value of ~1, we cannot reject the null hypothesis, so we find no evidence that the campaign affects the number of deposits. Note that these numbers vary on every run because we do not fix a random seed for the split; re-running the code can yield very different answers. Depending on the split, the campaign may or may not appear to have an effect. As mentioned earlier, the ideal scenario would be to treat this dataset as past data and use data collected after a new campaign to assess the effect.
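
A chi-square test of independence on a randomly split sample can be made reproducible by fixing the seed of the split. The sketch below uses `scipy.stats.chi2_contingency` on made-up data; the helper name `split_table`, the 50/50 split, and the outcomes are assumptions for illustration and are not the project's data. Because the toy table is 2x2, its degrees of freedom will be 1.

```python
import numpy as np
from scipy.stats import chi2_contingency


def split_table(deposit, seed):
    """Randomly split rows into two groups and count deposits in each group."""
    deposit = np.asarray(deposit, dtype=bool)
    rng = np.random.default_rng(seed)  # fixed seed -> same split every run
    first = rng.random(deposit.size) < 0.5
    return np.array(
        [
            [np.sum(deposit & first), np.sum(~deposit & first)],
            [np.sum(deposit & ~first), np.sum(~deposit & ~first)],
        ]
    )


# Hypothetical outcomes (NOT bank.csv): 200 clients, half of whom deposited.
deposit = np.tile([1, 0], 100)
table = split_table(deposit, seed=0)

# correction=False returns the plain Pearson statistic (no Yates correction).
stat, p_value, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2={stat:.4f}, p={p_value:.3f}, dof={dof}")
```

Calling `split_table` twice with the same seed returns the same table, so the statistic and p-value no longer change between runs.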

Section 7: DoWhy: Double Machine Learning for Causal Inference

Stratification

The hypothesis we want to test is whether there is a causal relationship between age and whether an individual will deposit.
Looking only at AGE as a treatment variable, the estimated causal effect of AGE is 0.001814747418610118. This would mean that for every one-unit increase in AGE, the probability that someone deposits goes up by about 0.18 percentage points.
While this may seem surprising, it is likely not realistic. There may be non-linear effects, or we may need to test using another method.

Regression

The hypothesis we want to test is whether there is a causal relationship between AGE and deposit_bool.

We also look at the causal relationship of other predictors with deposit_bool (such as "job" and "default").

Looking at AGE as the treatment variable, we see first that it is significant at the 10% level (p < 0.094) and that its effect using a linear regression model is -0.0006593246127298835.

This would imply that for every one-unit increase in AGE, the probability of someone depositing goes down by about 0.07 percentage points. This estimate is much more realistic, as there no longer seems to be an age bias.
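
To illustrate how a regression-based estimate adjusts for other variables, the sketch below fits an ordinary least squares model with NumPy and reads off the coefficient on the treatment. It is a plain linear backdoor adjustment written for illustration, not the DoWhy API and not the notebook's code. The function name `linear_effect`, the synthetic data, and the choice of `balance` as the only confounder are all assumptions. The outcome is a noise-free linear probability so that the known effect is recovered exactly.

```python
import numpy as np


def linear_effect(treatment, outcome, confounders=None):
    """OLS coefficient of `treatment` on `outcome`, adjusting for confounders."""
    treatment = np.asarray(treatment, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    # Design matrix: intercept, treatment, then any adjustment variables.
    columns = [np.ones_like(treatment), treatment]
    if confounders is not None:
        extra = np.asarray(confounders, dtype=float).reshape(len(treatment), -1)
        columns.append(extra)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return float(coef[1])


# Synthetic data with a known effect (illustration only, not bank.csv).
rng = np.random.default_rng(0)
n = 5000
balance = rng.normal(size=n)
age = 40 + 5 * balance + rng.normal(scale=10, size=n)  # age correlates with balance
true_effect = -0.0007
deposit_prob = 0.5 + true_effect * (age - 40) + 0.1 * balance

naive = linear_effect(age, deposit_prob)
adjusted = linear_effect(age, deposit_prob, confounders=balance)
print(f"naive={naive:.4f}, adjusted={adjusted:.4f}, true={true_effect}")
```

In this toy setup the unadjusted estimate has the opposite sign from the true effect because `age` is correlated with `balance`, which is one way two estimation methods can disagree on the direction of an effect.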

Here is the causal effect of each variable on the likelihood of depositing (deposit_bool) and the associated p-value:

age = 0.001596470177772813 (p = 0.001)
marital = -0.08611891032763919 (p < 0.001)
balance = 0.1030394858277377 (p < 0.001)
education (tertiary) = 0.06589147843681609 (p < 0.001)
default = -0.08093474884816493 (p = 0.016)
housing = -0.17525694004024445 (p < 0.001)
loan = -0.11660829943692341 (p < 0.001)

The biggest anomaly is the causal effect of housing. "housing" denotes whether someone has a housing loan. The estimate suggests that if someone has taken out a housing loan, the likelihood of them making a deposit after a marketing campaign goes down by about 17 percentage points. This seems quite steep at first but, on reflection, makes sense. Someone who takes out a housing loan is likely to already have an account, and funds are a major concern for them. Thus, when we run the campaign, the individual does not feel inclined to deposit but would rather save up (or remain in their current situation). Additionally, there might be some non-linear effect that is not captured properly.
