Why not just build this directly into pandas? #3

Open
patwater opened this issue Apr 8, 2016 · 2 comments

Comments

@patwater

patwater commented Apr 8, 2016

No description provided.

@rbeesley

Alternatively, make the library an extension of pandas.core.generic.NDFrame, the common base class of Series and DataFrame. These statistics could then be run against Series as well as DataFrames, and they wouldn't have to be instantiated as a separate object. This would let you import the library and use it directly on a DataFrame:

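import pandas as pd
from pandas.core.generic import NDFrame
from pandas_summary import DataFrameSummary

def monkeypatch_method(cls):
    """Decorator factory: attach the decorated function to `cls` as a method."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator

@monkeypatch_method(NDFrame)
def summary(self):
    """Does the same as dfs = DataFrameSummary(df); dfs[column]"""
    if isinstance(self, pd.Series):
        # Wrap the Series in a one-column DataFrame and return that column's summary
        name = self.name if self.name is not None else 0
        return DataFrameSummary(self.to_frame(name=name))[name]
    # For a DataFrame, return the summary of all columns
    return DataFrameSummary(self).summary()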
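The reason patching NDFrame is enough: a method set on a base class is found by normal attribute lookup on instances of every subclass, including instances created before the patch ran. A minimal, pandas-free sketch of that mechanism, using hypothetical stand-in classes (FakeNDFrame, FakeSeries, FakeDataFrame) in place of the real pandas types:

```python
def monkeypatch_method(cls):
    """Decorator factory: attach the decorated function to `cls` as a method."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator

# Hypothetical stand-ins for NDFrame and its two subclasses (not real pandas classes).
class FakeNDFrame:
    def __init__(self, values):
        self.values = list(values)

class FakeSeries(FakeNDFrame):
    pass

class FakeDataFrame(FakeNDFrame):
    pass

# Created *before* the patch; it still sees the new method, because
# method lookup goes through the class hierarchy at call time.
legs = FakeSeries([2, 4, 8, 0])

@monkeypatch_method(FakeNDFrame)
def summary(self):
    # Toy statistics for illustration only.
    return {"count": len(self.values), "max": max(self.values)}

print(legs.summary())                          # {'count': 4, 'max': 8}
print(FakeDataFrame([10, 2, 1, 8]).summary())  # {'count': 4, 'max': 10}
```

The same lookup rule is what would make `df['num_legs'].summary()` available as soon as the patching module is imported.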

Then it could be used like this:

import pandas as pd
import pandas_summary

df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

df['num_legs'].summary()

instead of:

import pandas as pd
from pandas_summary import DataFrameSummary

df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

dfs = DataFrameSummary(df)
dfs['num_legs']

The above code is untested, but it shows roughly how this would be exposed and used. I think this makes more sense than using properties: it is called the same way as DataFrame.describe(), but it returns the additional information the DataFrameSummary class provides, i.e. the summary. It's just more tightly integrated.

Developed as an extension library like this, it could also be brought into pandas itself more easily in the future.
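As a hedged aside, recent pandas versions also provide a public extension point, `pd.api.extensions.register_series_accessor`, which allows the same `df['num_legs'].summary()` call style without patching NDFrame directly. A minimal sketch under that assumption: the class name `SeriesSummaryAccessor` is hypothetical, and the statistics it returns (`describe()` plus illustrative `missing` and `uniques` rows) are a stand-in, not what DataFrameSummary actually computes.

```python
import pandas as pd

# Registering under "summary" makes `series.summary` return an instance of this
# class; defining __call__ lets it be invoked as `series.summary()`.
@pd.api.extensions.register_series_accessor("summary")
class SeriesSummaryAccessor:
    def __init__(self, pandas_obj: pd.Series):
        self._obj = pandas_obj

    def __call__(self) -> pd.Series:
        s = self._obj
        # Illustrative stand-in statistics, NOT DataFrameSummary's output.
        stats = s.describe()
        stats["missing"] = s.isna().sum()
        stats["uniques"] = s.nunique()
        return stats

df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

print(df['num_legs'].summary())
```

`register_dataframe_accessor` is the DataFrame counterpart. One trade-off: pandas warns if an accessor name shadows an existing attribute, which plain `setattr` on NDFrame would not.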

@mmourafiq
Collaborator

mmourafiq commented Oct 23, 2019

@rbeesley this is an interesting idea.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants