Add initial support for V2 API #155

Open
3 of 16 tasks
JWCook opened this issue May 31, 2021 · 7 comments
Labels
new endpoint Add a new API endpoint

Comments

@JWCook
Member

JWCook commented May 31, 2021

With the V2 API in development, in the near future I'd like to start adding initial support for it, with the expectation that it's subject to change, and just for testing purposes for now. The main feature I'm interested in is being able to specify which fields to include in the response (with a very minimal response by default, containing only IDs).

A few of the more commonly used read-only endpoints would be a good place to start:

  • GET /observations
  • GET /taxa
  • GET /projects

General features:

  • Update models to accommodate differences in v2 response formats
    • Observation
    • Taxon
    • ControlledTerm
    • ???
  • Simplified syntax for specifying return fields
    • Top-level fields
    • Nested fields
  • Simplified syntax for specifying return fields to exclude
    • Top-level fields
    • Nested fields

Other:

  • Docs
  • Tests
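One possible shape for the "simplified syntax" item: accept dotted names like 'taxon.name' and convert them into the nested dict format the v2 API uses for fields. A minimal sketch, assuming that nested {name: True | {...}} format; expand_dotted_fields is a hypothetical helper, not part of pyinaturalist.

```python
from typing import Any, Dict, Iterable


def expand_dotted_fields(names: Iterable[str]) -> Dict[str, Any]:
    """Convert dotted field names (e.g. 'taxon.name') into a nested fields dict."""
    tree: Dict[str, Any] = {}
    for name in names:
        *parents, leaf = name.split('.')
        node = tree
        for part in parents:
            child = node.get(part)
            # Promote a bare `True` (ID-only) entry to a dict so sub-fields can be added
            if not isinstance(child, dict):
                child = node[part] = {}
            node = child
        # Don't replace sub-fields that were already requested with a bare `True`
        node.setdefault(leaf, True)
    return tree


print(expand_dotted_fields(['species_guess', 'taxon.name', 'taxon.rank', 'user.login']))
# {'species_guess': True, 'taxon': {'name': True, 'rank': True}, 'user': {'login': True}}
```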
@JWCook JWCook added the new endpoint Add a new API endpoint label May 31, 2021
@JWCook JWCook added this to the v0.15 milestone Jul 6, 2021
@JWCook JWCook removed this from the v0.15 milestone Sep 2, 2021
@JWCook JWCook added this to the v0.18 milestone Jun 28, 2022
@JWCook JWCook modified the milestones: v0.18, v1.0 Feb 5, 2023
@JWCook JWCook modified the milestones: v1.0, v1.1 Mar 8, 2023
@abubelinha

I'm not sure whether the v2 API will support both positive and negative ways of requesting fields, or perhaps wildcards for getting nested fields:

  • negated fields: e.g., if I want to keep getting all the v1 API info but remove a few overly verbose fields (e.g. identifications and observation_photos), it would be good to have a way to do that instead of having to explicitly list every needed field except those two.
  • nested fields: I couldn't find a way to request, e.g., taxon and all its nested fields. Maybe there is some kind of hidden syntax (e.g. taxon.* or similar). I also wonder how to get specific child fields (e.g. the ancestors list inside taxon).

Anyway, perhaps pyinaturalist could implement those features even if the iNaturalist v2 API doesn't make those requests easy to write.
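The negated-fields idea could be emulated client-side: start from a tree of every available field and drop the unwanted ones before sending the request. A minimal sketch, assuming such a full field tree is available locally; exclude_fields is hypothetical, and the `full` tree below is a small made-up subset, not the real v2 schema.

```python
import copy
from typing import Any, Dict, Iterable


def exclude_fields(all_fields: Dict[str, Any], excluded: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of a nested field tree with the given (possibly dotted) fields removed."""
    result = copy.deepcopy(all_fields)  # never modify the caller's tree
    for name in excluded:
        *parents, leaf = name.split('.')
        node = result
        for part in parents:
            node = node.get(part)
            if not isinstance(node, dict):
                break  # path doesn't exist in the tree, so there's nothing to remove
        else:
            node.pop(leaf, None)
    return result


# Made-up subset of fields, for illustration only
full = {
    'id': True,
    'identifications': {'id': True, 'body': True},
    'observation_photos': {'id': True, 'photo': {'url': True}},
    'taxon': {'id': True, 'name': True, 'ancestors': {'id': True, 'name': True}},
}
print(exclude_fields(full, ['identifications', 'observation_photos', 'taxon.ancestors']))
# {'id': True, 'taxon': {'id': True, 'name': True}}
```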

@JWCook
Member Author

JWCook commented May 27, 2023

Good suggestions. For nested objects, it looks like you have to specify every single sub-field you want to get, otherwise it defaults to an ID only, for example:

curl 'https://api.inaturalist.org/v2/observations?id=14150125&fields=species_guess,observed_on,user,taxon' |  jq '.results[0]'
{
  "uuid": "91a29d5f-d2bf-47ff-b629-d0b79d51e46c",
  "species_guess": "Common Loon",
  "observed_on": "2018-07-07",
  "user": {
    "id": 1020044
  },
  "taxon": {
    "id": 4626
  }
}
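For reference, the query string in that curl command can be built in Python without sending anything. A small sketch; observations_url is a hypothetical helper, and the resulting URL could then be passed to any HTTP client (e.g. requests.get).

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = 'https://api.inaturalist.org/v2/observations'


def observations_url(obs_id: int, fields: List[str]) -> str:
    """Build the GET URL from the curl example, with fields as a comma-separated list."""
    query = {'id': obs_id, 'fields': ','.join(fields)}
    # safe=',' keeps the commas readable instead of percent-encoding them as %2C
    return f'{BASE_URL}?{urlencode(query, safe=",")}'


print(observations_url(14150125, ['species_guess', 'observed_on', 'user', 'taxon']))
# https://api.inaturalist.org/v2/observations?id=14150125&fields=species_guess,observed_on,user,taxon
```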

To get complete taxon data, it requires passing something like this in the request body:

{
    "fields": {
        "taxon": {
            "ancestry": true,
            "ancestor_ids": true,
            "ancestors": {
                "id": true,
                "uuid": true,
                "name": true,
                "iconic_taxon_name": true,
                "is_active": true,
                "preferred_common_name": true,
                "rank": true,
                "rank_level": true
            },
            "default_photo": {
                "attribution": true,
                "license_code": true,
                "url": true,
                "square_url": true
            },
            "iconic_taxon_name": true,
            "id": true,
            "is_active": true,
            "name": true,
            "preferred_common_name": true,
            "rank": true,
            "rank_level": true
        }
    }
}

For pyinaturalist, I'd ideally like simpler usage like this:

# Get observation.taxon and all its sub-fields
get_observations(fields=['taxon'])

# Get all fields
get_observations(fields='all')

# Get all fields except identifications
get_observations(except_fields=['identifications'])
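A rough sketch of how a wrapper could translate those three forms into the nested fields value sent to the API, assuming the client keeps a local tree of all available fields. resolve_fields and ALL_FIELDS are hypothetical; the tree is a tiny made-up subset, and except_fields here only handles top-level names.

```python
import copy
from typing import Any, Dict, Iterable, Optional, Union

# Hypothetical, abbreviated tree of available fields; the real v2 schema is much larger
ALL_FIELDS: Dict[str, Any] = {
    'id': True,
    'species_guess': True,
    'identifications': {'id': True, 'body': True},
    'taxon': {'id': True, 'name': True, 'rank': True},
}


def resolve_fields(
    fields: Union[str, Iterable[str], None] = None,
    except_fields: Optional[Iterable[str]] = None,
) -> Optional[Dict[str, Any]]:
    """Translate the proposed shorthand into a nested fields dict (None = API default)."""
    if except_fields is not None:
        excluded = set(except_fields)
        return {k: copy.deepcopy(v) for k, v in ALL_FIELDS.items() if k not in excluded}
    if fields is None:
        return None  # let the API return its minimal, ID-only default
    if fields == 'all':
        return copy.deepcopy(ALL_FIELDS)
    if isinstance(fields, str):
        fields = [fields]
    # A bare name like 'taxon' expands to that object with all of its sub-fields
    return {name: copy.deepcopy(ALL_FIELDS.get(name, True)) for name in fields}
```

Note that this follows the first proposal above, where fields=['taxon'] means "taxon and all its sub-fields"; the alternative raised in reply (bare name = ID only, wildcard for children) would change only the last line.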

@abubelinha

abubelinha commented May 28, 2023

Thanks.
I've seen they suggest a more compact RISON syntax for passing the JSON request, but I was expecting them to provide some kind of wildcard or a getchildren=yes option as a way to reduce request length.
So for now I'll need to make my requests longer in order to make my responses shorter.
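For a sense of how much shorter the Rison form is: Rison writes objects as (key:value,...) and true as !t. Below is a minimal encoder covering only what a fields tree needs (nested objects, booleans, simple identifier keys); to_rison is hypothetical, and whether the v2 API accepts this exact string in its fields query parameter is an assumption based on the comment above. The full Rison format also has quoting rules, arrays, and numbers, which this sketch does not handle.

```python
import re
from typing import Any, Dict

# Keys made only of letters, digits and underscores (not starting with a digit)
# can be written unquoted in Rison; this sketch supports only such keys.
_SAFE_KEY = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')


def to_rison(fields: Dict[str, Any]) -> str:
    """Encode a nested {name: True | False | {...}} field tree as a Rison object string."""
    parts = []
    for key, value in fields.items():
        if not _SAFE_KEY.match(key):
            raise ValueError(f'Unsupported key for this sketch: {key!r}')
        if value is True:
            encoded = '!t'
        elif value is False:
            encoded = '!f'
        elif isinstance(value, dict):
            encoded = to_rison(value)
        else:
            raise TypeError(f'Unsupported value for {key!r}: {value!r}')
        parts.append(f'{key}:{encoded}')
    return '(' + ','.join(parts) + ')'


print(to_rison({'species_guess': True, 'taxon': {'name': True, 'rank': True}}))
# (species_guess:!t,taxon:(name:!t,rank:!t))
```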

Regarding your suggested syntax for pyinaturalist, yes, that's what I'd suggest too.
I'd make your first example, get_observations(fields=['taxon']), a bit more elaborate, so that users need to explicitly request subfields (to avoid confusion with v2 api default behavior, which does not return them), but in a much less verbose way.
So I'd just provide an easy wildcard syntax for users who want pyinaturalist to return additional fields.

# this should get only taxon ID (or whatever api v2 returns by default):
get_observations(fields=['taxon']) 

# get all nested children passing some kind of pyinaturalist wildcard flag: *,+,@
get_observations(fields=['taxon.*']) 

# some more elaborated combined requests with & without wildcards:
get_observations(fields=['taxon', # basic parent ¿id? & some children below:
    'taxon.name', 'taxon.threatened', 'taxon.introduced',  # basic children fields
    'taxon.default_photo.*', 'user.*', # children fields with all their descendants
    'observed_o*', # all fields starting with those letters
    'geo*.*' # both wildcard behaviors (start letters and children)
    ])

Those last lines would get observed_on, observed_on_details, and observed_on_string, plus geoprivacy and geojson with all its children.

So ideally wildcards could work in both of those ways, but perhaps this is too complicated. Just thinking out loud.
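Both wildcard behaviors could be expanded client-side with the standard library's fnmatch, as long as the client has a tree of the available fields to match against (the v2 API itself doesn't expand wildcards). A rough sketch; expand_globs and FIELD_TREE are hypothetical, and the tree is a small made-up subset, not the real v2 schema. A bare name stays ID-only, as the v2 API does for bare object names in the curl example, and a trailing '.*' pulls in everything below the match.

```python
import copy
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterable, List

# Hypothetical, made-up subset of the observation field tree (not the real v2 schema)
FIELD_TREE: Dict[str, Any] = {
    'id': True,
    'observed_on': True,
    'observed_on_details': {'date': True, 'year': True},
    'observed_on_string': True,
    'geoprivacy': True,
    'geojson': {'type': True, 'coordinates': True},
    'taxon': {'id': True, 'name': True, 'default_photo': {'url': True, 'attribution': True}},
    'user': {'id': True, 'login': True},
}


def expand_globs(patterns: Iterable[str], tree: Dict[str, Any] = FIELD_TREE) -> Dict[str, Any]:
    """Expand dotted glob patterns against a known field tree into a nested fields dict."""
    result: Dict[str, Any] = {}
    for pattern in patterns:
        _expand(pattern.split('.'), tree, result)
    return result


def _expand(parts: List[str], tree: Dict[str, Any], out: Dict[str, Any]) -> None:
    head, rest = parts[0], parts[1:]
    for name, subtree in tree.items():
        if not fnmatchcase(name, head):
            continue
        if not rest:
            out.setdefault(name, True)  # keep any sub-fields requested by other patterns
        elif rest == ['*']:
            # Copy the whole subtree (or just the field, if it has no sub-fields)
            out[name] = copy.deepcopy(subtree) if isinstance(subtree, dict) else True
        elif isinstance(subtree, dict):
            child = out.get(name)
            if not isinstance(child, dict):
                child = out[name] = {}  # upgrade an ID-only entry to a nested dict
            _expand(rest, subtree, child)


fields = expand_globs(['taxon', 'taxon.name', 'taxon.default_photo.*', 'user.*', 'observed_o*', 'geo*.*'])
```

With this tree, 'observed_o*' selects observed_on, observed_on_details, and observed_on_string, and 'geo*.*' selects geoprivacy plus geojson with all its children, matching the behavior described above.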

@JWCook
Member Author

JWCook commented May 28, 2023

to avoid confusion with v2 api default behavior, which does not return them

That's a good point. I'll consider the wildcard/glob format.

A problem with both that and my example, though, is that it's going to be fairly error-prone. If I want to specify more than just a couple of fields, it's going to require typing in a lot of strings without any IDE autocompletion or error checking. For example, if I had something like:

get_observations(fields=[
    'obscured',                         
    'observed_on',                      
    'outlinks',                         
    'out_of_range',                     
    'owners_identification_from_vision',
    'place_guess',                      
    'place_ids',                        
    'positional_acuracy',              
    'preferences',                      
    'private_location',                 
    'private_place_ids',                
    'private_place_guess',              
    'project_ids',                      
    'project_ids_with_curator_id',      
    'project_ids_without_curator_id',   
    'public_positional_accuracy',       
    'taxon.*',                          
    'user.*',                           
])

It would probably take me an embarrassingly long time to realize I misspelled 'positional_accuracy' and that it was missing from the response. Although I suppose I could add some error checking to validate field names.
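Field-name validation could catch exactly that kind of typo and suggest the intended name, using the standard library's difflib.get_close_matches. A minimal sketch; validate_fields and VALID_FIELDS are hypothetical, and a real list of valid names would need to come from the API docs or the model classes.

```python
import difflib
from typing import Iterable, List

# Hypothetical subset of valid top-level observation fields
VALID_FIELDS = [
    'obscured', 'observed_on', 'place_guess', 'positional_accuracy',
    'public_positional_accuracy', 'taxon', 'user',
]


def validate_fields(names: Iterable[str]) -> None:
    """Raise ValueError listing unknown field names, with a 'did you mean' hint if possible."""
    errors: List[str] = []
    for name in names:
        top_level = name.split('.')[0]  # this sketch only checks the top-level part
        if top_level not in VALID_FIELDS:
            matches = difflib.get_close_matches(top_level, VALID_FIELDS, n=1)
            hint = f" (did you mean '{matches[0]}'?)" if matches else ''
            errors.append(f'Unknown field {name!r}{hint}')
    if errors:
        raise ValueError('; '.join(errors))


validate_fields(['observed_on', 'taxon.*'])  # passes silently
try:
    validate_fields(['positional_acuracy'])
except ValueError as e:
    print(e)  # Unknown field 'positional_acuracy' (did you mean 'positional_accuracy'?)
```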

Another possibility would be to use model objects that contain all the available attributes (which are already mostly complete). It would still get a bit verbose, but it would allow for autocompletion. Example in VSCode:
[Screenshot: autocompletion example in VSCode]

That could have other downsides I haven't thought of, though, and there might be some scenarios where wildcard strings would be simpler. I'll need to give it some more thought.
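One piece of the model-object idea can be sketched independently of autocompletion: the model classes can act as the single source of truth for valid field names, for example to build the complete nested tree that fields='all' needs. This assumes dataclass-style models; the trimmed Taxon and Observation classes and the field_tree helper are hypothetical, and pyinaturalist's real models are defined differently.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Dict, Optional


# Hypothetical, heavily trimmed models; pyinaturalist's real model classes differ
@dataclass
class Taxon:
    id: Optional[int] = None
    name: Optional[str] = None
    rank: Optional[str] = None


@dataclass
class Observation:
    id: Optional[int] = None
    observed_on: Optional[str] = None
    positional_accuracy: Optional[int] = None
    taxon: Taxon = field(default_factory=Taxon)


def field_tree(model: type) -> Dict[str, Any]:
    """Build a nested {name: True | {...}} fields dict from a dataclass model's attributes."""
    tree: Dict[str, Any] = {}
    for f in fields(model):
        # Attributes typed as another model become nested dicts; others are plain fields
        tree[f.name] = field_tree(f.type) if is_dataclass(f.type) else True
    return tree


print(field_tree(Observation))
# {'id': True, 'observed_on': True, 'positional_accuracy': True, 'taxon': {'id': True, 'name': True, 'rank': True}}
```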

@JWCook
Member Author

JWCook commented May 29, 2023

There are some changes in the main branch to add basic support for GET /observations v2.

  • currently, fields is just passed directly to the API
  • fields='all' will get all available fields
  • except_fields=[...] will get all fields except those specified (no wildcard support)

Examples:

from pyinaturalist.v2 import get_observations

obs = get_observations(id=14150125, fields={'species_guess': True, 'user': {'login': True}})

obs = get_observations(id=14150125, fields='all', page='all')

obs = get_observations(id=14150125, except_fields=['identifications'])

@abubelinha

abubelinha commented May 30, 2023

Thanks a lot!!
Two questions:

  1. No rush, but is it possible to install it using pip?
    I tried pip install --upgrade pyinaturalist but I still get the same version I had: pyinaturalist 0.18.0 (Windows 10, Python 3.9)
  2. It makes no sense to do this, but what's the expected behavior if you pass both fields and except_fields in the same get_observations() call?

@JWCook
Member Author

JWCook commented May 30, 2023

  1. Yes, I made a pre-release build, which you can install with:
    pip install -U --pre pyinaturalist
  2. I updated it to raise a ValueError in that case.
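The check described in the second answer could look roughly like this. A minimal sketch; check_field_args and its error message are hypothetical, not pyinaturalist's actual code.

```python
from typing import Iterable, Optional, Union

FieldsArg = Union[str, dict, Iterable[str], None]


def check_field_args(fields: FieldsArg = None, except_fields: Optional[Iterable[str]] = None) -> None:
    """Raise ValueError if both fields and except_fields are given, since they conflict."""
    if fields is not None and except_fields is not None:
        raise ValueError('fields and except_fields cannot be used together')


check_field_args(fields='all')                       # OK
check_field_args(except_fields=['identifications'])  # OK
try:
    check_field_args(fields=['taxon'], except_fields=['identifications'])
except ValueError as e:
    print(e)  # fields and except_fields cannot be used together
```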
