
Predicting whether or not a person deposits money after a marketing campaign. Gain insights to develop the best strategy in the next marketing campaign


McGill-MMA-EnterpriseAnalytics/datasectuals


Bank Marketing Campaign Predictive Modeling

Section 1: Understanding the Dataset

About the dataset - Find the best strategy to improve the next marketing campaign

Link: https://www.kaggle.com/janiobachmann/bank-marketing-dataset

Data Description

1 - age (numeric)
2 - job: type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
3 - marital: marital status (categorical: 'divorced','married','single'; note: 'divorced' means divorced or widowed)
4 - education: (categorical: 'primary','secondary','tertiary','unknown')
5 - default: has credit in default? (binary: 'no','yes')
6 - housing: has housing loan? (binary: 'no','yes')
7 - loan: has personal loan? (binary: 'no','yes')
8 - balance: average yearly account balance, in euros (numeric)
9 - contact: contact communication type (categorical: 'cellular','telephone','unknown')
10 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
11 - day: last contact day of the month (numeric)
12 - duration: last contact duration, in seconds (numeric)
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed after the client was last contacted from a previous campaign (numeric; -1 means the client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','other','success','unknown')
17 - deposit (called y in the original UCI data): has the client subscribed to a term deposit? (binary: 'yes','no')

Section 2: Exploratory Data Analysis (EDA)

  1. Summary Statistics

[Figure: summary statistics]

  2. Notes from Pandas Profiling
  • About 11k rows
  • No missing values
  • No duplicate rows
  • Some outliers in the 'balance' field
  • Very few rows (about 1.5%) had a default
  3. Only 13% of the customers took a personal loan

[Figure]

  4. Only about 1.5% of the data points defaulted

[Figure]

  5. Customers who default tend to have a low account balance

[Figure]
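The profiling notes above came from Pandas Profiling. As a hedged sketch, a few of the same headline figures (row count, missing cells, duplicate rows, default and personal-loan rates, balance outliers, balance by default status) can be recomputed with plain pandas. The helper name `profile_summary`, the 1.5×IQR outlier rule and the toy rows are illustrative assumptions, not the project's code or data.

```python
import pandas as pd

def profile_summary(df: pd.DataFrame) -> dict:
    """Recompute a few headline EDA figures with plain pandas (hypothetical helper)."""
    # Flag 'balance' outliers with the common 1.5 * IQR rule (an assumed threshold)
    q1, q3 = df["balance"].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = ((df["balance"] < q1 - 1.5 * iqr) | (df["balance"] > q3 + 1.5 * iqr)).sum()
    return {
        "n_rows": len(df),
        "n_missing": int(df.isna().sum().sum()),      # missing cells across all columns
        "n_duplicates": int(df.duplicated().sum()),   # fully duplicated rows
        "default_rate": float((df["default"] == "yes").mean()),
        "loan_rate": float((df["loan"] == "yes").mean()),
        "n_balance_outliers": int(outliers),
        # median balance for non-defaulters vs defaulters
        "median_balance_by_default": df.groupby("default")["balance"].median().to_dict(),
    }

# Hypothetical toy rows, not the real dataset
toy = pd.DataFrame({
    "default": ["no", "no", "yes", "no"],
    "loan": ["yes", "no", "no", "no"],
    "balance": [1200, 300, -50, 800],
})
summary = profile_summary(toy)
print(summary)
```

On the real file, the same call would give the share of defaulters and personal-loan holders directly as fractions of all rows.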

Section 3: Data Pre-processing

Steps:

  1. Create target variables
  2. Drop the original target variable columns
  3. Encode the columns containing 'yes' and 'no' values
  4. Correlation check
  5. Separate features and target
  6. Standardization
  7. Train and Test split
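The steps above can be sketched with pandas and scikit-learn as follows. This is a minimal sketch, not the repository's code: it assumes the raw target column is named `deposit`, treats `default`, `housing` and `loan` as the yes/no columns, one-hot encodes any remaining categoricals, and the helper name `preprocess` is hypothetical. One deliberate deviation from the listed order: the scaler is fitted on the training rows only, after the split, so that no test-set statistics leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

YES_NO_COLUMNS = ["default", "housing", "loan"]  # assumed binary yes/no columns

def preprocess(df, target="deposit", test_size=0.2, seed=42):
    """Hypothetical pre-processing pipeline following the listed steps."""
    df = df.copy()
    # 1-2. Create a 0/1 target and drop the original text column
    df["deposit_bool"] = (df[target] == "yes").astype(int)
    df = df.drop(columns=[target])
    # 3. Encode the yes/no columns as 0/1
    for col in YES_NO_COLUMNS:
        df[col] = (df[col] == "yes").astype(int)
    # 4. Correlation check on the numeric columns (target included)
    corr = df.select_dtypes("number").corr()
    # 5. Separate features and target; one-hot encode remaining categoricals
    y = df.pop("deposit_bool")
    X = pd.get_dummies(df, drop_first=True).astype(float)
    # 7. Train/test split, stratified to keep the class balance
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)
    # 6. Standardize, fitting the scaler on the training rows only
    scaler = StandardScaler().fit(X_train)
    X_train = pd.DataFrame(scaler.transform(X_train), columns=X.columns, index=X_train.index)
    X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)
    return X_train, X_test, y_train, y_test, corr

# Hypothetical toy frame with the same kinds of columns
rng = np.random.default_rng(0)
n = 40
toy = pd.DataFrame({
    "age": rng.integers(20, 70, n),
    "balance": rng.normal(1000, 500, n),
    "job": rng.choice(["admin.", "technician", "services"], n),
    "default": rng.choice(["yes", "no"], n),
    "housing": rng.choice(["yes", "no"], n),
    "loan": rng.choice(["yes", "no"], n),
    "deposit": ["yes", "no"] * (n // 2),
})
X_train, X_test, y_train, y_test, corr = preprocess(toy)
print(X_train.shape, X_test.shape)
```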

Section 4: Model Building

Results

Three groups of algorithms were explored: boosting, neural networks and automated ML.

[Figure: model results]
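The model code is not reproduced in this README, so the following is only a hedged sketch of the kind of comparison described: a boosting model (scikit-learn's `GradientBoostingClassifier`) and a small neural network (`MLPClassifier`) scored on the same held-out split with accuracy and ROC AUC. The data comes from `make_classification` as a stand-in for the pre-processed bank features, and AutoML is left out because it depends on a specific third-party tool.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the pre-processed features and deposit_bool target
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    "boosting": GradientBoostingClassifier(random_state=0),
    "neural_net": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, proba),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Scoring every model on one fixed split keeps the comparison fair; ROC AUC is included because the deposit classes need not be balanced.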

Section 5: Fairness-Bias Check using SHAP values

[Figure: SHAP values]

To prepare the model for future deployment, we ran a fairness and bias check using SHAP values. The results show that variables commonly associated with bias, such as marital status and job, were not given high importance by our model. This suggests that the model relies little on these attributes and is therefore less likely to reinforce existing biases in an opaque, unaccountable way.
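The project used SHAP for this check. As a dependency-light stand-in (a different technique, named plainly), the sketch below uses scikit-learn's permutation importance to run the same kind of test: rank features by importance on held-out data and flag any sensitive attribute that lands near the top. The feature names, the synthetic data and the "top two" threshold are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 2000
# Hypothetical encoded features; the two binary columns stand in for sensitive attributes
X = pd.DataFrame({
    "duration": rng.normal(0, 1, n),
    "balance": rng.normal(0, 1, n),
    "marital_single": rng.integers(0, 2, n),
    "job_student": rng.integers(0, 2, n),
})
# Synthetic outcome driven only by the non-sensitive features
logit = (2.0 * X["duration"] + 1.0 * X["balance"]).to_numpy()
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Fit on the first 1500 rows, measure importance on the held-out 500
model = GradientBoostingClassifier(random_state=0).fit(X.iloc[:1500], y[:1500])
imp = permutation_importance(model, X.iloc[1500:], y[1500:], n_repeats=10, random_state=0)
ranking = pd.Series(imp.importances_mean, index=X.columns).sort_values(ascending=False)

SENSITIVE = ["marital_single", "job_student"]
top_two = list(ranking.index[:2])
flagged = [f for f in SENSITIVE if f in top_two]  # sensitive features ranked near the top
print(ranking)
print("flagged:", flagged)
```

An empty `flagged` list is the analogue of the conclusion above; with SHAP, the ranking would instead come from mean absolute SHAP values per feature.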

Section 6: Understanding Causal Inference

Chi-square test

[Figure: 'deposit' vs. 'no deposit' before and after the campaign]

Interpretation: The chart above visualizes our sample data. If the campaign truly has no effect on the number of deposits, then comparing the period before it occurred with the period after should show the same split between 'deposit' and 'no deposit' in both periods.

Chi-square statistic and p-value: The chi-square value is 0.00824, the p-value is ~1 and there are 2 degrees of freedom. With a p-value of ~1, we cannot reject the null hypothesis, i.e. we find no evidence that the campaign affects the number of deposits. Note that these numbers change on every run because the split is random and no seed is set; in fact, re-running the code can yield very different answers. Depending on the split, the campaign may therefore appear to have an effect or not. Ideally, this dataset would serve as past (pre-campaign) data and be compared with data collected after a new campaign to assess its effect.
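The test itself can be reproduced with SciPy's `chi2_contingency`; whether the project used SciPy is an assumption. The sketch below also fixes the random seed, so the 'before'/'after' split, and therefore the statistic, is reproducible across runs. The helper `campaign_chi_square` and the toy outcome column are hypothetical. For a 2×2 table (two periods × deposit/no deposit) the degrees of freedom are (2-1)(2-1) = 1; 2 degrees of freedom correspond to a table with three rows or three columns.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def campaign_chi_square(deposit, seed=42):
    """Randomly assign rows to hypothetical 'before'/'after' groups and test independence."""
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible split
    period = rng.choice(["before", "after"], size=len(deposit))
    table = pd.crosstab(period, deposit)  # rows: period, columns: deposit no/yes
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return chi2, p_value, dof, table

# Toy outcome column, not the real data
deposit = np.array(["yes", "no"] * 500)
chi2, p_value, dof, table = campaign_chi_square(deposit, seed=0)
print(table)
print(f"chi2={chi2:.4f}, p={p_value:.3f}, dof={dof}")
```

Re-running with the same `seed` returns identical numbers; varying it shows how much the result depends on the split.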

Section 7: DoWhy: Double Machine Learning for Causal Inference

Stratification

The hypothesis we want to test is whether or not there is a causal relationship between age and whether an individual will deposit.
Looking only at AGE as the treatment variable, the estimated causal effect of AGE is 0.001814747418610118. This would mean that for every one-year increase in AGE, the probability that someone deposits goes up by about 0.18 percentage points.
While this may seem surprising, it is likely not realistic: there may be non-linear effects that this estimate does not capture, so we test it using another method.
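To illustrate what stratification does, here is a simplified sketch. It assumes a binary treatment `t` (for example, age above some cutoff) and a single categorical confounder `z`, both hypothetical. Within each stratum of `z` it compares deposit rates between treated and untreated rows, then averages those differences weighted by stratum size. DoWhy's stratification estimator works on strata of estimated propensity scores rather than a raw confounder, so this illustrates the idea rather than reproducing the project's numbers.

```python
import numpy as np
import pandas as pd

def stratified_effect(df, treatment, outcome, stratum):
    """Size-weighted average of within-stratum (treated - control) outcome differences."""
    effects, weights = [], []
    for _, group in df.groupby(stratum):
        treated = group.loc[group[treatment] == 1, outcome]
        control = group.loc[group[treatment] == 0, outcome]
        if treated.empty or control.empty:
            continue  # no overlap in this stratum, so it carries no information
        effects.append(treated.mean() - control.mean())
        weights.append(len(group))
    return float(np.average(effects, weights=weights))

def make_group(z, t, n, n_deposit):
    """Hypothetical block of n rows, n_deposit of which deposited."""
    return pd.DataFrame({"z": z, "t": t, "deposit_bool": [1] * n_deposit + [0] * (n - n_deposit)})

# Confounded toy data: z raises both the treatment rate and the deposit rate
toy = pd.concat([
    make_group(0, 1, 5, 1), make_group(0, 0, 20, 2),
    make_group(1, 1, 20, 14), make_group(1, 0, 5, 3),
], ignore_index=True)

naive = toy.loc[toy.t == 1, "deposit_bool"].mean() - toy.loc[toy.t == 0, "deposit_bool"].mean()
adjusted = stratified_effect(toy, "t", "deposit_bool", "z")
print(f"naive={naive:.2f}, stratified={adjusted:.2f}")  # naive overstates the effect
```

In this toy data the naive difference (0.4) is four times the within-stratum effect (0.1), which is the kind of distortion stratification is meant to remove.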

Regression

The hypothesis we want to test is whether or not there is a causal relationship between AGE and deposit_bool.

We also look at the causal relationship of other predictors (such as "job" and "default") on deposit_bool.

Looking at AGE as the treatment variable, we see first that it is significant at the 10% level (p < 0.094) and that its effect estimated with a linear regression model is -0.0006593246127298835.

This implies that for every one-year increase in AGE, the probability of someone depositing goes down by about 0.07 percentage points. This estimate is much more realistic, as there no longer seems to be an age bias.
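A regression estimate of this kind is, roughly, the coefficient on the treatment in an ordinary least squares fit of the 0/1 outcome on the treatment plus the adjustment variables, which is why a coefficient of -0.00066 reads as about -0.066 percentage points per year of age (about -0.66 points over a decade). The NumPy sketch below shows that mechanic on synthetic data; the confounder, the helper `treatment_coefficient` and all coefficients are hypothetical, chosen so that leaving the confounder out flips the sign of the age effect.

```python
import numpy as np

def treatment_coefficient(y, treatment, controls):
    """OLS of y on [intercept, treatment, controls]; returns the treatment coefficient."""
    X = np.column_stack([np.ones(len(y)), treatment, controls])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])

rng = np.random.default_rng(0)
n = 200_000
age = rng.uniform(20, 80, n)
# Hypothetical confounder: a standardized balance that rises with age
balance = 0.02 * (age - 50) + rng.normal(0, 1, n)
# Assumed true model: each extra year lowers the deposit probability by 0.0007 (0.07 pp)
p = np.clip(0.35 - 0.0007 * (age - 50) + 0.05 * balance, 0, 1)
deposit = (rng.random(n) < p).astype(float)

naive = treatment_coefficient(deposit, age, np.empty((n, 0)))     # no adjustment
adjusted = treatment_coefficient(deposit, age, balance[:, None])  # adjusts for balance
print(f"naive: {naive:+.5f}, adjusted: {adjusted:+.5f} ({100 * adjusted:+.3f} pp per year)")
```

Because older clients in this toy setup also hold higher balances, the unadjusted slope comes out positive while the adjusted one recovers the assumed negative effect.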

Here is the causal effect of each variable on the probability of depositing (deposit_bool) and the associated p-value:

age = 0.001596470177772813 (p = 0.001)
marital = -0.08611891032763919 (p < 0.001)
balance = 0.1030394858277377 (p < 0.001)
education (tertiary) = 0.06589147843681609 (p < 0.001)
default = -0.08093474884816493 (p = 0.016)
housing = -0.17525694004024445 (p < 0.001)
loan = -0.11660829943692341 (p < 0.001)

The biggest anomaly is the causal effect of housing, which denotes whether someone has a housing loan. The estimate suggests that if someone has taken a housing loan, the probability of them making a deposit after a marketing campaign goes down by about 17.5 percentage points. This seems steep at first but makes sense on reflection: someone who has taken a housing loan likely already has an account, and available funds are a major constraint. Thus, when we run the campaign, the individual does not feel inclined to deposit but rather to save up (or remain in their current situation). Additionally, there might be some non-linear effect that the linear model does not capture.
