Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coerce function isn't columns-only #114

Open
rluedde opened this issue Jul 11, 2020 · 8 comments
Open

Coerce function isn't columns-only #114

rluedde opened this issue Jul 11, 2020 · 8 comments

Comments

@rluedde
Copy link
Contributor

rluedde commented Jul 11, 2020

The _coerce function can take in multiple columns at once. The docstring and argument name (column vs columns) say it can only deal with a single column. Is this something that should be changed?

@rluedde
Copy link
Contributor Author

rluedde commented Jul 11, 2020

If using in other modules, should the "_" of _coerce be removed?

@ljwolf
Copy link
Member

ljwolf commented Jul 13, 2020

Great question! Think about what _coerce would do if you passed a dataframe--- it would try to coerce the entire input and, if that failed for any reason, then the original dataframe would be returned. If you look at where it's used, I'm applying it column by column. This is so that all columns that can be coerced are coerced. You could define _coerce recursively and get this effect pretty easily:

def coerce(data, cast_to=numpy.float64):
    if isinstance(data, pandas.DataFrame):
        for column in data.columns:
            data[column] = coerce(data[column], cast_to=cast_to)
        return data
    else:
        try:
            return data.astype(cast_to)
        except:
            return data

@rluedde
Copy link
Contributor Author

rluedde commented Jul 13, 2020

Aha. This makes sense. Would it be a good idea to change coerce to get this change-what-you-can behavior (to be tuned for a whole DataFrame) instead of the all-or-nothing behavior that it currently is? This way you don't have to:

for variable in variables:
    data[variable] = _replace_missing(_coerce(data[variable], float))

it might instead look like:

data[variables] = _replace_missing(coerce(data[variables], float))

but _replace_missing is only built for single columns and would need to be altered. This seems unnecessary. Don't fix what isn't broken?

Also, how do you get the syntax highlighting in this markdown?

@ljwolf
Copy link
Member

ljwolf commented Jul 13, 2020

it might instead look like ...

I'm OK defining things recursively in the pattern I've shown you. That, generally, is quite useful! Polymorphic functions are very good things, especially when they are quite general. I am happy to have wide-ranging changes that improve the library 😄

also, how do you get the syntax highlighting ...

When you use three ticks for code blocks, put the name of the language you want to highlight in at the end of the three ticks.
Python:

def do_something():
    for i in range(1, n):
        print(i)

Julia:

function do_something(n)
    for i in 1:n
        println(i)
    end
end

@rluedde
Copy link
Contributor Author

rluedde commented Jul 13, 2020

So, to make sure I'm comprehending, you're for the idea of changing _replace_missing to also follow the pattern that you outlined with coerce (to be able to handle columns AND dataframes)?

def test():
    print("Science!")

Cooooooool. Thanks!

@ljwolf
Copy link
Member

ljwolf commented Jul 13, 2020

for the idea of changing _replace_missing

yep!

@rluedde
Copy link
Contributor Author

rluedde commented Jul 13, 2020

Is it valid to have the tuple _ACS_MISSING be local to _replace_missing and not as an argument/global?

@ljwolf
Copy link
Member

ljwolf commented Jul 13, 2020

yes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants