A Python utility for Cramer's V Correlation Analysis for Categorical Features in Pandas Dataframes.

ayanatherate/dfcorrs

Compute pairwise categorical correlations (with an optional heatmap) across all columns of a Pandas DataFrame in just one line of code.

Test in Colab

Automatically detects categorical features and ignores numerical ones. Columns can also be added to or removed from the comparison manually.
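As a rough illustration of what dtype-based detection can look like, the sketch below keeps every column whose dtype is not numeric. It is only an assumption about the general idea, not dfcorrs' actual logic; `non_numeric_columns` and the example data are hypothetical.

```python
import pandas as pd


def non_numeric_columns(df: pd.DataFrame) -> list:
    """Return the names of columns whose dtype is not numeric (hypothetical helper)."""
    # select_dtypes(exclude="number") drops integer and float columns
    # and keeps everything else (strings, categoricals, booleans, ...).
    return df.select_dtypes(exclude="number").columns.tolist()


# Made-up example data, for illustration only.
example = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune"],
    "gender": ["F", "M", "F"],
    "age": [31, 45, 28],
    "salary": [10.5, 12.0, 9.75],
})

categorical = non_numeric_columns(example)
print(categorical)
```

A purely dtype-based rule like this cannot tell that an integer column such as an ID or a code is really categorical, which is why manual column selection is useful.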

Run:

git clone https://github.com/ayanatherate/dfcorrs.git
cd dfcorrs 
pip install -r requirements.txt

If using ipynb notebooks:

!git clone https://github.com/ayanatherate/dfcorrs.git

Then open any Python notebook or IDE:

Cramer's V correlation for categorical features

from dfcorrs.cramersvcorr import Cramers
import pandas as pd

cramers = Cramers()
data = pd.read_csv(r'../adatasetwithlotsofcategoricalandcontinuousfeatures.csv')


cramers.corr(data)

"""
 Cramer's V correlation between all pairs of categorical features
 returns a Pandas dataframe similar to .corr()
"""


cramers.corr(data, plot_htmp=True)

"""
plots the correlation heatmap using plotly
"""

cramers.corr(data)['feature_name']

"""
single out one categorical feature and inspect its correlations with the others, returns a Pandas Series
"""

At times, Pandas may by default interpret a sparse/categorical feature as a continuous one (for example 'City Code' or 'Candidate ID'), and vice versa. The options below address this.


To manually add categorical columns to the Cramer's V comparison, use:

cramers.corr(data, add_cols=['feature_name'])

"""
 the added column must be present in the dataset provided
 use .astype('str') to force-convert columns wrongly identified as continuous (if any) before calling
"""
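A minimal sketch of that conversion step, using made-up data: a numeric 'city_code' column is cast to strings so that it is no longer treated as continuous. The column names and values are hypothetical.

```python
import pandas as pd

# Made-up example data, for illustration only.
example = pd.DataFrame({
    "city_code": [101, 102, 101, 103],   # numeric codes that are really category labels
    "salary": [10.5, 12.0, 9.75, 11.0],  # genuinely continuous
})

# Before conversion, pandas infers an integer dtype for the codes.
was_numeric = pd.api.types.is_numeric_dtype(example["city_code"])

# Force the codes to strings so they are handled as categories.
example["city_code"] = example["city_code"].astype("str")
is_numeric_now = pd.api.types.is_numeric_dtype(example["city_code"])

print(was_numeric, is_numeric_now)

# The converted column can then be passed by name, as in the call above:
# cramers.corr(example, add_cols=['city_code'])
```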

To manually exclude categorical (or redundant) columns from the Cramer's V comparison, use:

cramers.corr(data, rem_cols=['feature_name'])

To use the wrapper for a single-shot Cramer's V correlation between two Python arrays or two separate Pandas dataframe columns:

"""
single-shot operation, does not remap
after applying the operation on the entire dataframe
"""
cramers.cramers_v(data['feature_name1'], data['feature_name2'])

cramers.cramers_v(list(some_classes1), list(some_classes2))  # say we have two Python arrays/lists instead
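For reference, the textbook (uncorrected) Cramer's V statistic is V = sqrt(chi2 / (n * min(r - 1, c - 1))), where chi2 is the chi-squared statistic of the r-by-c contingency table and n is the number of observations. The sketch below computes it with the standard library only. It is not the dfcorrs implementation, which may apply corrections or handle edge cases differently; `cramers_v_sketch` is a hypothetical name.

```python
from collections import Counter
from math import sqrt


def cramers_v_sketch(x, y):
    """Uncorrected Cramer's V between two equal-length sequences of labels."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    row_totals = Counter(x)         # marginal counts for x
    col_totals = Counter(y)         # marginal counts for y
    observed = Counter(zip(x, y))   # joint counts (the contingency table)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        # The statistic is undefined here; returning 0.0 is a choice made for this sketch.
        return 0.0
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Perfectly associated labels versus independent labels.
perfect = cramers_v_sketch(["a", "a", "b", "b"], ["u", "u", "v", "v"])
independent = cramers_v_sketch(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(perfect, independent)
```

The value ranges from 0 (no association) to 1 (one feature fully determines the other), and it is symmetric in its two arguments.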
