Case of replicate column names differ among years #116

Open
rluedde opened this issue Jul 13, 2020 · 1 comment

Comments

@rluedde
Contributor

rluedde commented Jul 13, 2020

>>> from cenpy.moe import replicate_table_utils as crtu
>>> data_14 = crtu.get_replicate_data_api(["B15002"], 2014, "140", "04")
https://www2.census.gov/programs-surveys/acs/replicate_estimates/2014/data/5-year/140/B15002_04.csv.gz
>>> # data_18: presumably the 2018 data, loaded the same way (call not shown)
>>> data_18.columns.levels[0][:3]
Index(['estimate', 'moe', 'SE'], dtype='object', name='categories')
>>> data_14.columns.levels[0][:3]
Index(['ESTIMATE', 'MOE', 'SE'], dtype='object', name='categories')

In the 2014 data, the estimate and MOE column names are uppercase, while in all other years (I think) they are lowercase; SE is uppercase in both. This mismatch causes issues in (at least) apply_func.

I don't have good ideas for a solution. I imagine it's a .upper() or .lower() call somewhere in read_replicate_file or get_replicate_data. A comprehension might be needed as well?

@dfolch
Contributor

dfolch commented Jul 13, 2020

Good idea to always convert to the same case. It is probably not worth the extra code to test for capitalization; just run upper on these three column names for every file read in with read_replicate_file.
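
A minimal sketch of that suggestion, using a comprehension as floated in the original report. The helper name `normalize_category_names` and the `TARGET_NAMES` set are hypothetical and not part of cenpy; where exactly this would be called inside read_replicate_file is left open.

```python
# Hypothetical helper: the name and signature are assumptions, not cenpy API.
# The three labels come from the column output shown in the issue.
TARGET_NAMES = {"ESTIMATE", "MOE", "SE"}


def normalize_category_names(names):
    """Return a list of names with the three category labels forced to uppercase.

    Any other label (and any non-string label) is passed through unchanged.
    """
    return [
        name.upper()
        if isinstance(name, str) and name.upper() in TARGET_NAMES
        else name
        for name in names
    ]


# Both casing styles seen in the issue end up identical after normalization.
upper_style = ["ESTIMATE", "MOE", "SE"]
lower_style = ["estimate", "moe", "SE"]
print(normalize_category_names(upper_style) == normalize_category_names(lower_style))
```

Because the call is unconditional, there is no need to detect which year's casing a file uses. If the labels sit in level 0 of a pandas MultiIndex, as in the output above, something like `df.rename(columns=str.upper, level=0)` might achieve the same in one call, though it would uppercase every label in that level rather than only these three.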
