getting B00001_001E when not requested #103

Open
dfolch opened this issue Apr 8, 2020 · 4 comments
Labels
documentation: issues that pertain to documentation

Comments

@dfolch
Contributor

dfolch commented Apr 8, 2020

B00001_001E is being returned when not requested.

tucson = products.ACS(2017).from_place('Tucson, AZ', level='tract', variables=['B00002*'])

tucson

GEOID        geometry                                           B00001_001E  B00002_001E  state  county  tract
04019000100  POLYGON ((-12353986.95 3791891.58, -12353934.6...  100.0        68.0         04     019     000100
04019002602  POLYGON ((-12352400.09 3798883.47, -12352399.9...  211.0        134.0        04     019     002602
04019001600  POLYGON ((-12350387.1 3795258.49, -12350376.19...  322.0        171.0        04     019     001600
@ronnie-llamado
Member

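This is the intended behavior. re is the default engine, and in a regular expression * means "zero or more of the preceding character", so the pattern B00002* reads as B0000 followed by any number of 2s. That matches both B00001_001E and B00002_001E.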

To fix this, either:

  1. Change your search string to B00002.*
  2. Restructure your code so that you can pass fnmatch or a custom function as the engine for the search. See the documentation for cenpy.products.ACS.filter_variables.

I'd recommend going with option 1 in this case.
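
To make the difference between the two engines concrete, here is a minimal sketch that uses only the Python standard library rather than cenpy itself. It assumes the regex engine behaves like re.search (a match anywhere in the name), which is what the results in this thread suggest; the helper names regex_hits and glob_hits are made up for illustration.

```python
import re
from fnmatch import fnmatch

# The two variable names from the report above.
names = ["B00001_001E", "B00002_001E"]


def regex_hits(pattern, candidates):
    """Keep names where the regular expression matches anywhere in the name."""
    return [n for n in candidates if re.search(pattern, n)]


def glob_hits(pattern, candidates):
    """Keep names that match a shell-style wildcard pattern in full."""
    return [n for n in candidates if fnmatch(n, pattern)]


# As a regex, '2*' means "zero or more 2s", so 'B0000' alone is enough to match.
print(regex_hits("B00002*", names))

# '.*' means "any characters", so the literal 2 is now required.
print(regex_hits("B00002.*", names))

# As a shell-style glob, '*' already means "any characters".
print(glob_hits("B00002*", names))
```

The first call keeps both names, while the second and third keep only B00002_001E.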

@dfolch
Contributor Author

dfolch commented Apr 21, 2021

Thank you for clarifying this @ronnie-llamado. Some thoughts on this.

Since this is not really a bug, maybe we just update the examples (i.e., Notebooks) and docs, e.g.:

>>> tools.national_to_block(cxn, *cxn.varslike('H001*"))
*cxn.varslike("P001*"),

I noticed that the ^P004 style syntax works, when it would seem that it shouldn't under standard re rules.

Currently this note is in the code acknowledging some weirdness.

cenpy/cenpy/remote.py

Lines 288 to 291 in fde2ad6

Only regex and fnmatch will be supported modules. Note that, while
regex is the default, the python regular expressions module has some
strange behavior if you're used to VIM or Perl-like regex. It may be
easier to use fnmatch if regex is not providing the results you expect.

@ronnie-llamado
Member

ronnie-llamado commented Apr 21, 2021

@dfolch, do you have any suggestions on which string pattern would be the best (most intuitive) for examples/docs?

Here's a quick snippet showing off some potential possibilities:

import cenpy

conn = cenpy.remote.APIConnection("ACSDT5Y2017")

# unintended variables returned
print('0', list(conn.varslike('B00002*').index))         # original
print('')

# intended variables returned
# (raw strings keep \w from being treated as a Python string escape)
print('1', list(conn.varslike('B00002.*').index))        # B00002.*
print('2', list(conn.varslike(r'B00002\w+').index))      # B00002\w+
print('3', list(conn.varslike('B00002').index))          # B00002
print('4', list(conn.varslike('^B00002').index))         # ^B00002
print('5', list(conn.varslike(r'^B00002\w+$').index))    # ^B00002\w+$

Returns:

0 ['B00001_001E', 'B00002_001E']

1 ['B00002_001E']
2 ['B00002_001E']
3 ['B00002_001E']
4 ['B00002_001E']
5 ['B00002_001E']

Option 3 (B00002) is the friendliest, but doesn't fully utilize re. Since the Census variables are already formatted and cenpy just searches for a substring within the variable, this works but may not be as intuitive.

@dfolch
Contributor Author

dfolch commented Apr 21, 2021

Your point is well taken that there is some mystery with Option 3. I didn't realize this query would return 166 items: conn.varslike('1002').
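
A small sketch of why a short unanchored pattern casts such a wide net, again using plain re.search rather than cenpy. The names below are a hand-picked illustrative sample, not results from the Census API, and search_hits is a hypothetical helper.

```python
import re

# Illustrative sample of variable names; not fetched from the Census API.
names = ["B01002_001E", "B11002_001E", "C21002_003E", "B00002_001E"]


def search_hits(pattern, candidates):
    """Keep names where the regular expression matches anywhere in the name."""
    return [n for n in candidates if re.search(pattern, n)]


# Unanchored: '1002' may appear at any position, so unrelated tables match too.
print(search_hits("1002", names))

# Anchored with '^': the name has to start with the table ID.
print(search_hits("^B01002", names))

# Fully anchored: table ID, underscore, three digits, then the 'E' suffix.
print(search_hits(r"^B01002_\d{3}E$", names))
```

The unanchored pattern keeps three of the four names; both anchored patterns keep only B01002_001E.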

Since Option 0 is not a great re example and it's not simple substring matching, I think it should be changed.

Option 3 covers most use cases and doesn't require people to even think about re so I would make this the standard in the docs and examples. Maybe insert an example somewhere showing that fancy re are possible. For example, getting just the variables for females (B01001_026 to B01001_049) from table B01001. There are some tables with a Puerto Rico specific counterpart (e.g., B05001 vs. B05001PR) which could make an interesting re example.
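
As a rough sketch of what those two "fancy" examples could look like, the snippet below tests candidate patterns against locally generated names. It assumes the usual TABLE_NNNE naming for estimate variables and a PR suffix on the table ID for the Puerto Rico counterpart; the lists and the FEMALE constant are invented for illustration and nothing here queries the API.

```python
import re

# Locally generated names, assumed to follow the TABLE_NNNE convention.
b01001 = [f"B01001_{i:03d}E" for i in range(1, 50)]
b05001 = ["B05001_001E", "B05001_002E", "B05001PR_001E", "B05001PR_002E"]

# 026-029 is covered by 2[6-9]; 030-049 is covered by [34] plus any digit.
FEMALE = r"^B01001_0(2[6-9]|[34]\d)E$"
females = [n for n in b01001 if re.search(FEMALE, n)]
print(females[0], females[-1], len(females))

# Optional group: the stateside table and its Puerto Rico counterpart.
both = [n for n in b05001 if re.search(r"^B05001(PR)?_", n)]

# Requiring the underscore right after the table ID excludes the PR table.
stateside = [n for n in b05001 if re.search(r"^B05001_", n)]

# Requiring the PR suffix keeps only the Puerto Rico table.
pr_only = [n for n in b05001 if re.search(r"^B05001PR_", n)]
print(both, stateside, pr_only, sep="\n")
```

If the engine does use search semantics, the same pattern strings could presumably be passed straight to varslike.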

ronnie-llamado added the documentation label May 4, 2021