
farmOS.py revisited: Design/collaboration proposal - hybrid sync/async, pluggable authentication/reactors #31

Open
symbioquine opened this issue Feb 3, 2020 · 17 comments

@symbioquine

symbioquine commented Feb 3, 2020

I've been digging into possible farmOS.py API structures and (relatively) "recent" improvements in Python's support for async co-routines as prompted by the discussion on symbioquine/farm-os-area-feature-proxy#1.

I plan to use this issue to capture my thoughts and see if we can converge on a design for a next major version of this library that could support a wider range of use-cases more gracefully.

Design goals

  • Fully abstract Drupal API semantics and return types
  • Support all major farmOS data types which can be accessed via a vanilla farmOS installation's APIs
  • Support both synchronous and asynchronous usage with same API and code base
  • Support pagination/iteration elegantly for all record types
  • Map elegantly/efficiently between the proposed Python API and the underlying farmOS transport API(s)
  • "Batteries included" extension strategy
    • Accept extensions for common functionality (async/OAuth2/etc) in-repo without creating dependencies
    • Model extensions so it is easy to use off-repo extensions where functionality is less general or cannot be included for code/license reasons
  • Pluggable authentication support without otherwise affecting API interface; session cookie, basic auth, OAuth2, etc
  • Pluggable transport backend without otherwise affecting API interface; requests, AIOHTTP, Twisted, etc
  • Make it easy to maintain near full test coverage
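To make the pluggable transport/auth goals a bit more concrete, one possible seam (all names here are illustrative sketches, not settled API) is a narrow request interface that each backend and auth mechanism implements:

```python
import abc
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass
class TransportResponse:
    """Minimal transport-agnostic response; hypothetical shape."""
    status: int
    body: bytes
    headers: Mapping[str, str] = field(default_factory=dict)


class AsyncTransport(abc.ABC):
    """The narrow seam the rest of the (async) client codes against.

    Concrete implementations could wrap AIOHTTP, Twisted, etc.
    """

    @abc.abstractmethod
    async def request(self, method: str, url: str, *,
                      params: Optional[Mapping[str, str]] = None,
                      data: Optional[bytes] = None) -> TransportResponse:
        ...


class AsyncAuthPlugin(abc.ABC):
    """Decorates requests with credentials; session cookie, OAuth2, etc."""

    @abc.abstractmethod
    async def headers(self) -> Mapping[str, str]:
        ...
```

The sync counterparts would be generated from these (see "How to support sync/async in the same code base" below), so a new transport or auth mechanism only has to implement one small interface.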

API Sketching

Synchronous Usage

Simple login/password session client;

import farmOS

with farmOS.create_session_client(url="http://farmos.local", username="root", password="test") as farm:
    print(farm.info())

Yields;

FarmOSFarmInfo(name='Test0', url='http://farmos.local', api_version='1.1')

OAuth2 client;

import farmOS

# TODO What should the parameters be... presumably this needs to support Authorization/PKCE flows for both interactive console and non-interactive console + configuration of scope/grant-type/urls/etc
with farmOS.create_oauth2_client(url="http://farmos.local", ...) as farm:
    print(farm.info())

Asynchronous Usage

All API usage should be identical except that;

  • Different farmOS.create_*_client methods are used to create the client
  • await is used for calling all methods which directly return a value
  • async with is used anywhere you would use with in synchronous usage
  • async for is used when iterating over pages of results - or derivatives thereof
import asyncio, farmOS

async def main():
    async with farmOS.create_asyncio_session_client(url="http://farmos.local", username="root", password="test") as farm:
        print(await farm.info())

asyncio.run(main())

Areas

By Id

farm.area.get_by_id(record_id=8)

Yields;

FarmOSArea(area_id='8', area_type='field', name='cactus_field', description='<p>This is the description!</p>\n', flags=['priority', 'monitor', 'review'])
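The record types in these examples could plausibly be frozen dataclasses; a sketch (the field set follows the examples in this issue, not a published schema):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FarmOSArea:
    """Immutable value object hiding the Drupal taxonomy-term details
    (e.g. that the primary id travels as 'tid' on the wire).

    All fields are optional so the same type can describe partial
    records, such as an update that only sets a new description.
    """
    area_id: Optional[str] = None
    area_type: Optional[str] = None
    name: Optional[str] = None
    description: Optional[str] = None
    flags: Optional[List[str]] = None
```

Being frozen, instances are safe to share between the client's layers and cannot be mutated accidentally; an update is expressed by constructing a new (possibly partial) record.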

Pagination

page = farm.area.query_page(filters={'area_type': 'field'})
while page:
    for area in page:
        print(area)
    page = page.next_page()

or

for page in farm.area.iterate_query_pages(filters={'area_type': 'field'}):
    for area in page:
        print(area)

Yields;

FarmOSArea(area_id='8', area_type='field', name='cactus_field', description='<p>This is the prickly description!</p>\n', flags=['priority', 'monitor'])
FarmOSArea(area_id='9', area_type='field', name='boulder_field', description='<p>This is the cold hard description!</p>\n', flags=['review'])
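One way the page object above could be modelled (a sketch; names and mechanics are illustrative, not settled API):

```python
from typing import Callable, Iterator, List, Optional


class Page:
    """One page of query results plus a handle to fetch the next page.

    'fetch_next' stands in for whatever transport call the real client
    would make to retrieve the following page.
    """

    def __init__(self, records: List[object],
                 fetch_next: Optional[Callable[[], "Optional[Page]"]] = None):
        self._records = records
        self._fetch_next = fetch_next

    def __iter__(self) -> Iterator[object]:
        return iter(self._records)

    def __bool__(self) -> bool:
        return bool(self._records)

    def next_page(self) -> "Optional[Page]":
        return self._fetch_next() if self._fetch_next else None


# Two fake pages of records wired together, mimicking the loop above.
last = Page([3, 4])
first = Page([1, 2], fetch_next=lambda: last)

collected = []
page = first
while page:
    collected.extend(page)
    page = page.next_page()
# collected == [1, 2, 3, 4]
```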

Iteration

for area in farm.area.iterate_query(filters={'area_type': 'field'}):
    print(area)

Yields;

FarmOSArea(area_id='8', area_type='field', name='cactus_field', description='<p>This is the prickly description!</p>\n', flags=['priority', 'monitor'])
FarmOSArea(area_id='9', area_type='field', name='boulder_field', description='<p>This is the cold hard description!</p>\n', flags=['review'])
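Under the hood, iterate_query could be a thin generator over the page-fetching layer; a self-contained sketch with a stub standing in for the real transport:

```python
from typing import Callable, Iterator, List, Optional


def iterate_query(query_page: Callable[[int], Optional[List[dict]]]) -> Iterator[dict]:
    """Yield records one at a time, fetching pages lazily (sketch).

    'query_page' is a stand-in for the client's real page fetch; it
    returns a list of records for a page number, or None past the end.
    """
    page_number = 0
    while True:
        records = query_page(page_number)
        if not records:
            return
        yield from records
        page_number += 1


# Stub: two pages of fake area records, then exhaustion.
pages = [[{'name': 'cactus_field'}, {'name': 'boulder_field'}],
         [{'name': 'neon_field'}]]
fetch = lambda n: pages[n] if n < len(pages) else None

names = [rec['name'] for rec in iterate_query(fetch)]
# names == ['cactus_field', 'boulder_field', 'neon_field']
```

Because it is a generator, no page is fetched until the consumer actually reaches it, which keeps memory usage bounded regardless of result-set size.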

Creation

farm.area.create(FarmOSArea(area_type='field', name='neon_field', description='<p>This is the vibrant description!</p>\n'))

Yields;

FarmOSArea(area_id='10', area_type='field', name='neon_field', description='<p>This is the vibrant description!</p>\n', flags=[])

Updates

farm.area.update(FarmOSArea(area_id='10', description='<p>This is the really vibrant description!</p>\n'))

Yields;

None/throws

Deletion

farm.area.delete(FarmOSArea(area_id='8', area_type='field', name='cactus_field', description='<p>This is the prickly description!</p>\n', flags=['priority', 'monitor']))

or

farm.area.delete_by_id(record_id='8')

Yields;

None/throws

How to support sync/async in the same code base

The layers that do most of the heavy lifting get written in the async style, but have no concrete external dependencies themselves.

These classes get put into python files named async_*.py and the classes are named following the pattern Async*.

Then we generate the non-async versions from that with some code seen below. The top level of the farmOS library then provides static factory methods for wiring together the various permutations of the sync/async client layers.

import glob
import os.path

import libcst as cst

# Vaguely based on https://stackoverflow.com/a/55365529
class AsyncToSync(cst.CSTTransformer):
    def leave_FunctionDef(self, old_node, node):
        return node.with_changes(asynchronous=None)

    def leave_For(self, old_node, node):
        return node.with_changes(asynchronous=None)

    def leave_CompFor(self, old_node, node):
        return node.with_changes(asynchronous=None)

    def leave_With(self, old_node, node):
        return node.with_changes(asynchronous=None)

    def leave_Await(self, old_node, node):
        return node.expression

    def leave_Name(self, old_node, node):
        if node.value == 'AbstractAsyncContextManager':
            return node.with_changes(value='AbstractContextManager')

        if node.value == '__aenter__':
            return node.with_changes(value='__enter__')

        if node.value == '__aexit__':
            return node.with_changes(value='__exit__')

        if not node.value.startswith("Async"):
            return node

        return node.with_changes(value=node.value[5:])  # Remove "Async" prefix

    def leave_ImportFrom(self, old_node, node):
        if not type(node.module) is cst.Attribute:
            return node

        if not type(node.module.attr) is cst.Name:
            return node

        if not node.module.attr.value.startswith("async_"):
            return node

        return node.with_changes(
            module=node.module.with_changes(
                attr=node.module.attr.with_changes(
                    value=node.module.attr.value[6:])))


for filename in glob.glob('./farmOS/**/async_*.py', recursive=True):
    print(filename)

    with open(filename, 'r') as ifp:
        source_tree = cst.parse_module(ifp.read())

        modified_tree = source_tree.visit(AsyncToSync())

        base_filename = os.path.basename(filename)

        output_filename = os.path.join(
            os.path.dirname(filename), base_filename[6:])

        with open(output_filename, 'w') as ofp:
            ofp.write(
                "# Generated from {!r} do not modify by hand.\n".format(base_filename))
            ofp.write(modified_tree.code)

Structure

Diagram .png files include embedded draw.io source.

2020_02_11_farmos_py_issue_31_top_level_module

Asynchronous

2020_02_11_farmos_py_issue_31_async

Synchronous

2020_02_11_farmos_py_issue_31_sync

Asynchronous Sensors

2020_02_11_farmos_py_issue_31_sensor_async

Synchronous Sensors

2020_02_11_farmos_py_issue_31_sensor_sync

FAQs

It seems like you are trying to "boil the ocean" with this issue... is all this tractable to tackle as a single "issue"?

Maybe not. I expect the API described here to take a lot of work to implement, document, and test thoroughly.

My purpose as mentioned above is to polish a vision for how the farmOS Python API should be. If we can align on that vision, a branch can be used to track the development against that vision until it can be released as a new major version.

Why expose pagination explicitly?

This gives consumers of the farmOS.py library the flexibility to handle large amounts of data in/from farmOS performantly. It would be really challenging or impossible to completely hide the pagination and still provide that flexibility.

Why are there no "get all" or "query all" methods which return collections, only pagination and iteration?

This would unnecessarily increase the "surface area" of the API while encouraging unperformant usage patterns that hold all entities/records in memory.

Consumers which require the entities/pages as collections can easily build them themselves;

all_areas = list(farm.area.iterate_query(filters={'area_type': 'field'}))
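Similarly, consumers who want bounded batches rather than the library's pages can re-batch any record iterator with stdlib tooling; a sketch:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar('T')


def in_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Re-batch any record iterator into fixed-size lists without
    materializing the whole result set in memory."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


batches = list(in_batches(range(7), 3))
# batches == [[0, 1, 2], [3, 4, 5], [6]]
```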

Will those static factory methods on the top-level module of farmOS explode combinatorially as the library supports more kinds of transports, auth mechanisms, etc?

Possibly, but this also lets us be opinionated about the supported combinations and provide concrete documentation with embedded code examples for each one.

How would the proposed API support retrieving multiple records by id in a single request - i.e. #14?

Naively, it would seem that this should be supported by the get_by_id methods - e.g. on FarmOSTaxonomyClient - however that would probably be a mistake since it would make the return type different depending on whether a single id or multiple ids are passed.

A better approach would be to support this via the filters argument to the query_page, iterate_query_pages, and iterate_query methods. This conveniently fits into both the proposed Python API (which already returns multiple records from those methods) and the existing Drupal entity "transport" API, simply by adding a requirement for the handling of the filters argument;

The values in the filters dictionary may be either a single value, or an iterable of values. If it is an iterable of values then the result will be the union of the results of the query run separately with each value passed singly - assuming no changes occurred between the individual queries.

Thus the example from #14 would be supported by the proposed API as follows;

for log in farm.log.iterate_query(filters={'id': ['163', '167']}):
    print(log)

Calls farmOS as;

https://farmos.local/log.json?id[]=163&id[]=167

Conveniently, this also allows for other kinds of cool queries to execute efficiently from a transport API perspective;

for log in farm.log.iterate_query(filters={'type': ['farm_harvest', 'farm_observation']}):
    print(log)

Calls farmOS as;

https://farmos.local/log.json?type[]=farm_harvest&type[]=farm_observation
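A sketch of how the client could translate such a filters dictionary into Drupal-style query parameters (illustrative, not the actual implementation):

```python
from urllib.parse import urlencode


def build_query(filters):
    """Translate the proposed 'filters' argument into Drupal-style
    query parameters (sketch). Iterable values become repeated 'key[]'
    params, matching the URLs shown above; scalars pass through."""
    pairs = []
    for key, value in filters.items():
        if isinstance(value, (list, tuple, set)):
            pairs.extend(('{}[]'.format(key), v) for v in value)
        else:
            pairs.append((key, value))
    return urlencode(pairs)


build_query({'id': ['163', '167']})
# 'id%5B%5D=163&id%5B%5D=167'  (i.e. id[]=163&id[]=167, URL-encoded)
```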

References

@paul121
Member

paul121 commented Feb 4, 2020

@symbioquine great!! Look forward to hearing your thoughts. As I mentioned in that issue, the only thing I've briefly looked into is the httpx library as a potential path to async support.

I've just released a new beta version of this library with additional support for OAuth (most everything up to this point has been a beta release for testing in the Aggregator). Once we get OAuth finalized in 7.x-1.4, I think we'll release a v0.1.6 of the library. Perhaps we could start drawing up async support for v0.2.0 and allow for some more breaking changes/improvements.

@paul121
Member

paul121 commented Feb 4, 2020

@symbioquine Just seeing the updated description! This is great.

Support all major farmOS data types which can be accessed via a vanilla farmOS installation's APIs

I would really love to have some farmOS Data Types! There's some discussion at farmOS/farmOS#231 about adding JSON Schema definitions to an API endpoint. This way, it might be possible to ingest the data type schemas (which might vary from one farmOS server to another) and then generate the data models. This will have to wait until farmOS 2.x, however.

Support both synchronous and asynchronous usage with same API and code base

👍

"Batteries included" extension strategy

👍

Pluggable authentication support without otherwise affecting API interface; session cookie, basic auth, OAuth2, etc

This sounds good! I think we will really only need to support OAuth and Session cookie auth, but adding support for Basic would be fairly easy too. However, because the modules for Basic Auth aren't included with farmOS core, let's not prioritize that now (although this might be different with D8)

Pluggable transport backend without otherwise affecting API interface; requests, AIOHTTP, Twisted, etc

Yea, interested in how we can do this! Also curious what the advantages/disadvantages of the different backends are. I want to lean towards whichever makes it easiest to achieve what you bring up in "How to support sync/async in the same code base"!

Also, @symbioquine are you familiar with Python's built-in ConfigParser library? I've been using this in the latest releases as it makes it a bit easier to provide default configurations for the various OAuth settings. In certain instances, it might also be nice to save different "Profiles" in one config file - I'm curious what your thoughts are on that. This would be related to the farmOS.create_session_client API sketch - perhaps the client type could be defined in a Config object?

@symbioquine
Author

Thanks for all the feedback @paul121!

I would really love to have some farmOS Data Types! There's some discussion at farmOS/farmOS#231 about adding JSON Schema definitions to an API endpoint. This way, it might be possible to ingest the data type schemas (which might vary from one farmOS server to another) and then generate the data models. This will have to wait until farmOS 2.x, however.

Hmmm... there's a tradeoff there. If those types are directly exposed to clients of farmOS.py it won't be abstracting the underlying transport (http/JSON/etc) API anymore.

I should update the proposal's design objectives to be clearer on the abstraction point so we can hash this out more fully. My thought was that the types should be part of farmOS.py so that it can abstract between farmOS versions and hide ugly parts of the transport API. e.g. why is the primary id field of an area called "tid"...

This sounds good! I think we will really only need to support OAuth and Session cookie auth, but adding support for Basic would be fairly easy too. However, because the modules for Basic Auth aren't included with farmOS core, let's not prioritize that now (although this might be different with D8)

Agreed on ignoring Basic Auth for the time being.

Pluggable transport backend without otherwise affecting API interface; requests, AIOHTTP, Twisted, etc

Yea, interested in how we can do this! Also curious what the advantages/disadvantages of the different backends are.

Supporting different backends would allow the client to be used directly in existing frameworks/applications. This is especially important for the async cases where it needs to play nicely with whichever event loop is being used to manage the async co-routines.

There are probably other differences between various backend options - performance, proxy configurability, TLS ciphers, etc. - but I don't know how important they are.

I want to lean towards whichever makes it easiest to achieve what you bring up in "How to support sync/async in the same code base"!

I have a prototype which already achieves this to some degree although it may require a bit of refining...

Planning to add more detail to this proposal in the next couple days which should capture a structural diagram and the high level interfaces involved.

Also, @symbioquine are you familiar with Python's built-in ConfigParser library? I've been using this in the latest releases as it makes it a bit easier to provide default configurations for the various OAuth settings. In certain instances, it might also be nice to save different "Profiles" in one config file - I'm curious what your thoughts are on that. This would be related to the farmOS.create_session_client API sketch - perhaps the client type could be defined in a Config object?

I've worked with a number of different configuration frameworks in the past - including, a very long time ago, ConfigParser... These days I'm pretty wary of baking "configuration" into APIs - especially early on or at too low of a level.

  • General purpose config frameworks usually represent a less succinct API than one could create directly in the programming language itself.
You already know your clients have tooling for working with Python and presumably chose to work in Python, but you probably don't know how well any given kind of config suits your clients or how nicely it plays with their tooling and problem space.
  • Config frameworks create a lot of extra testing surface area due to the structure/type conversions usually involved.

Don't get me wrong - there are scenarios where externalizing configuration makes sense. I'd just recommend waiting until we have a very distinct group of use-cases which are hard to serve any other way, whose criteria the API can be crafted against. Even then it is probably wise to make the configuration functionality an optional "convenience" part of the API rather than a core part.

@paul121
Member

paul121 commented Feb 6, 2020

I should update the proposal's design objectives to be clearer on the abstraction point so we can hash this out more fully. My thought was that the types should be part of farmOS.py so that it can abstract between farmOS versions and hide ugly parts of the transport API. e.g. why is the primary id field of an area called "tid"...

Hmm yea, I see what you're saying. Perhaps certain aspects like the "tid" could be abstracted, while also pulling the schemas from the farmOS server. A good example for this might be the "Soil Test" log type - this module might not be enabled on the server. Similarly, certain field types on a "log" may be provided by optional modules. Once pulled from the server, maybe these models could still be abstracted??

I've been working a lot with the Pydantic library and would be excited to use that here for data validation! I actually just recently learned about its Settings Management features:

This makes it easy to:
- Create a clearly-defined, type-hinted application configuration class
- Automatically read modifications to the configuration from environment variables
- Manually override specific settings in the initialiser where desired (e.g. in unit tests)

... Which might be an alternative to ConfigParser.. re: an API "configuration"...
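To make that idea a bit more concrete, here is a rough stdlib-only analogue of what Pydantic's settings management provides (this is an illustration of the concept, not the Pydantic API; all names are hypothetical): a dataclass whose fields fall back to environment variables.

```python
import os
from dataclasses import dataclass, field, fields


def _env(name, default=''):
    """Default factory that reads an environment variable at construction time."""
    return field(default_factory=lambda: os.environ.get(name, default))


@dataclass
class FarmOSOAuthSettings:
    """Hypothetical OAuth settings; Pydantic's BaseSettings would add
    type coercion and validation on top of this basic idea."""
    hostname: str = _env('FARMOS_HOSTNAME')
    client_id: str = _env('FARMOS_CLIENT_ID', 'farm')
    scope: str = _env('FARMOS_SCOPE', 'user_access')

    def dict(self):
        return {f.name: getattr(self, f.name) for f in fields(self)}


os.environ['FARMOS_HOSTNAME'] = 'https://farmos.local'
settings = FarmOSOAuthSettings()

# Individual settings can still be overridden in the initializer,
# e.g. in unit tests: FarmOSOAuthSettings(scope='farm_info')
```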

Don't get me wrong, there are scenarios where externalizing configuration makes sense, I'd just recommend waiting until we have a very distinct group of use-cases which are hard to serve any other way and their criteria upon which the API can be crafted. Even then it is probably wise to make the configuration functionality an optional "convenience" part of the API rather than a core part.

Yeah, I agree that the configuration should be an optional "convenience" feature. Supporting Drupal Session Auth & OAuth Auth got to be fairly complicated from a single API level because parameters like username and password could be used by either session type. OAuth in particular can require quite a few parameters, so the Config helps clean that up. But I really like your sketch of doing something like client = farmOS.create_oauth_session(...)!

A good use case for an API Configuration would be using the library to POST sensor data. In the case of the simple Sensor Listener asset, the hostname, publickey and privatekey could be saved in config, and read only when a python script is run.

I have a prototype which already achieves this to some degree although it may require a bit of refining...
Planning to add more detail to this proposal in the next couple days which should capture a structural diagram and the high level interfaces involved.

Great!! Excited to continue thinking this through! 👍

@jgaehring
Member

Skimming through this a bit since I saw the topic of JSON Schema was mentioned...

I would really love to have some farmOS Data Types! There's some discussion at farmOS/farmOS#231 about adding JSON Schema definitions to an API endpoint. This way, it might be possible to ingest the data type schemas (which might vary from one farmOS server to another) and then generate the data models. This will have to wait until farmOS 2.x, however.

Hmmm... there's a tradeoff there. If those types are directly exposed to clients of farmOS.py it won't be abstracting the underlying transport (http/JSON/etc) API anymore.
I should update the proposal's design objectives to be clearer on the abstraction point so we can hash this out more fully. My thought was that the types should be part of farmOS.py so that it can abstract between farmOS versions and hide ugly parts of the transport API. e.g. why is the primary id field of an area called "tid"...

Hmm yea, I see what you're saying. Perhaps certain aspects like the "tid" could be abstracted, while also pulling the schemas from the farmOS server. A good example for this might be the "Soil Test" log type - this module might not be enabled on the server. Similarly, certain field types on a "log" may be provided by optional modules. Once pulled from the server, maybe these models could still be abstracted??

I'm not sure if this speaks to your concern, depending on exactly what you meant by the "transport API", but the approach I've been taking with Field Kit is to take the schema data as a partially applied dependency, which could come from any sort of source, whether it's over the wire via http, or whether that schema is stored on disk, or what have you. Then that spits out some functions for creating and updating logs based on that schema. I've got a gist that demonstrates the basic usage I'm imagining (the actual implementation is still in the works): https://gist.github.com/jgaehring/8799af782404fdf8abbaa116b3e97614

I suppose this is not the purest form of type definition, b/c it still assumes JSON as the data interchange format, but I feel like if we're able to abstract the data model to JSON Schema, that brings us at least a step closer to something that would be truly format agnostic. I'm curious, do you know of any other methods of providing a spec for such data models that would be less coupled to a particular format?

In the meantime, I've opened a new issue, farmOS/farmOS#243, specifically for adding JSON Schema to farmOS.

I have among my long-term aspirations the dream of creating a local-first implementation of farmOS that could support the farmOS data model while not ever requiring connection to a farmOS server. It's my hope that the farmLog library I'm sketching out in the above gist could become the germ for such an application, where log types (as well as asset types etc) could be defined by providing a similar type of schema, while also preserving the modularity of farmOS. So far JSON Schema seems like a powerful way to attain that, but I'd love to know alternatives.

@symbioquine
Author

symbioquine commented Feb 11, 2020

@paul121

I should update the proposal's design objectives to be clearer on the abstraction point so we can hash this out more fully. My thought was that the types should be part of farmOS.py so that it can abstract between farmOS versions and hide ugly parts of the transport API. e.g. why is the primary id field of an area called "tid"...

Hmm yea, I see what you're saying. Perhaps certain aspects like the "tid" could be abstracted, while also pulling the schemas from the farmOS server. A good example for this might be the "Soil Test" log type - this module might not be enabled on the server. Similarly, certain field types on a "log" may be provided by optional modules. Once pulled from the server, maybe these models could still be abstracted??

I want to explore this further in my reply to Jamie's comments, but I'm initially skeptical of the idea of dynamically basing the model types on a schema vended by farmOS at runtime. I think it may be difficult to do that and still provide a useful abstraction to consumers of farmOS.py.

Don't get me wrong...

Yeah, I agree that the configuration should be an optional "convenience" feature. Supporting Drupal Session Auth & OAuth Auth got to be fairly complicated from a single API level because parameters like username and password could be used by either session type. OAuth in particular can require quite a few parameters, so the Config helps clean that up. But I really like your sketch of doing something like client = farmOS.create_oauth_session(...)!

Agreed that shoving all the parameter combinations into a single constructor/factory-method would get really complex - as you noted, I was trying to get around that with a separate static factory-method for each scenario. (I'll add a FAQ entry to the issue description tackling the possible combinatorial explosion of said factory methods.)

That Pydantic settings management functionality looks like a good way to provide the configuration convenience methods we touched on above...

async def create_session_client(url: str, username: str, password: str):
    ...

async def create_session_client_from_settings(settings: FarmOSSessionClientSettings):
    return await create_session_client(**settings.dict())

A good use case for an API Configuration would be using the library to POST sensor data. In the case of the simple Sensor Listener asset, the hostname, publickey and privatekey could be saved in config, and read only when a python script is run.

I'm interested to learn more about this use-case, but my initial reaction is that the actual sensor integration and accompanying environment might be out of scope for farmOS.py.

I would imagine the level of support that farmOS.py might need to provide is;

from datetime import datetime

async with farmOS.create_asyncio_sensor(url="https://farmos.local", public_key="xxxxxx", private_key="yyyyyy") as sensor:
    await sensor.publish(data={"temperature": 76.5, "humidity": 60})
    await sensor.publish(data={"temperature": 76.5, "humidity": 60}, datetime=datetime.fromtimestamp(1541519720))

or;

sensor_settings = FarmOSSensorSettings.parse_file("/path/to/sensor_settings.json")

async with farmOS.create_asyncio_sensor_from_settings(sensor_settings) as sensor:
    ...

@symbioquine
Author

@jgaehring, @paul121

I'm not sure if this speaks to your concern, depending on exactly what you meant by the "transport API"

I mean the REST/JSON/Drupal-entity API represents one abstraction boundary which aligns with what actually travels over the wire - via the http(s) network transport.

In its current incarnation, that "transport" API probably cannot be expected to abstract a wide range of current/future farmOS versions since it is partially an implementation detail of the underlying Drupal framework and likely needs to be able to change in order to allow farmOS itself to grow/evolve.

On the other hand, farmOS.py represents a different kind of API boundary which could provide a stable abstraction to its consumers over a (reasonably) wide range of farmOS versions. To put this more concretely, ideally none of the code in farmOS/farmOS-aggregator or symbioquine/farm-os-area-feature-proxy should need to change to support the D8 version (and be backwards compatible with the D7 version) of farmOS if we do this correctly. (Other than making sure to use a new enough version of farmOS.py.)

but the approach I've been taking with Field Kit is to take the schema data as a partially applied dependency, which could come from any sort of source, whether it's over the wire via http, or whether that schema is stored on disk, or what have you. Then that spits out some functions for creating and updating logs based on that schema. I've got a gist that demonstrates the basic usage I'm imagining (the actual implementation is still in the works): https://gist.github.com/jgaehring/8799af782404fdf8abbaa116b3e97614

I like where you're going with this, but to achieve what I've described above, the dependency would need to be a build-time dependency. i.e. model types or code could be generated from various schema at build-time, but unless I'm missing something, I don't think it would work at runtime without pushing version compatibility off on the consumers of farmOS.py.

I suppose this is not the purest form of type definition, b/c it still assumes JSON as the data interchange format, but I feel like if we're able to abstract the data model to JSON Schema, that brings us at least a step closer to something that would be truly format agnostic. I'm curious, do you know of any other methods of providing a spec for such data models that would be less coupled to a particular format?

I'm curious about the motivation for decoupling from JSON? This seems like a perfectly fine hard dependency to me - given the ubiquity of JSON and how well it is supported over a wide range of languages/tools. Do you know of any use-case for farmOS.py which cannot use JSON as a transport/interchange format?

In the meantime, I've opened a new issue, farmOS/farmOS#243, specifically for adding JSON Schema to farmOS.

Even in light of what I've said above, I agree that this would be a useful feature.

I have among my long-term aspirations the dream of creating a local-first implementation of farmOS that could support the farmOS data model while not ever requiring connection to a farmOS server.

I'd like to hear more about this. I've been considering something that seems related - that it would be useful to have a "mock/headless farmOS" reference implementation that just keeps the data in memory and hosts the transport API against which libraries like farmOS.py could be tested.

@symbioquine
Author

@paul121 @jgaehring Also I've updated the issue description to add more examples and diagrams showing how the dependencies/methods of the parallel async/sync APIs could be organized.

@symbioquine
Author

Added a references section, diagrams for the top-level module's methods, and a couple more FAQs.

Also tagging @mstenta for comment since he was active on symbioquine/farm-os-area-feature-proxy#1

@paul121
Member

paul121 commented Feb 12, 2020

@symbioquine Really appreciate all the thought you've put into this!! 💪 Overall, I think this proposal is great and would bring some really great features to the python library!

re: recent changes..

Why expose pagination explicitly?
...
It would be really challenging or impossible to completely hide the pagination and still provide that flexibility.

I like the idea of exposing pagination via the API. Currently, pagination is "exposed" via the returned response (see #15) and allows the consumer to request specific pages by passing a page into the filters object (see in code here). Admittedly this is a bit "hidden" and not very intuitive. We implemented this as a means of trying to match the API between the JS and Python libraries - but I think this is a good example of how the library should differ in favor of improving the "Pythonic" API 👍

Why are there no "get all" or "query all" methods?
...
This would unnecessarily increase the "surface area" of the API while encouraging unperformant usage patterns that hold all entities/records in memory.
Consumers which require the entities/pages as collections can easily do so themselves;

all_areas = list(farm.area.iterate_query(filters={'area_type': 'field'}))

The current implementation defaults to a "get all" action unless a page filter is provided. This likely became the default action as a result of not having other exposed pagination features... which does encourage unperformant usage patterns. Seeing this example is really quite convincing... a great example of iterators! 👍
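To make the memory argument concrete, here's a stdlib-only sketch with a stand-in generator for the proposed `iterate_query` (the real method and its behavior are hypothetical at this point):

```python
from itertools import islice

def iterate_query_stub(filters=None):
    """Stand-in for the proposed farm.area.iterate_query (hypothetical API)."""
    for i in range(1000):
        yield {"tid": i, "area_type": "field"}

# A consumer that only needs ten records stops after ten; a "get all"
# method would have fetched and held all 1000 records in memory first.
first_ten = list(islice(iterate_query_stub(filters={"area_type": "field"}), 10))
assert len(first_ten) == 10
```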

How would the proposed API support retrieving multiple records by id in a single request - i.e. #14?
A better approach would be to support this via the filters argument to the query_page, iterate_query_pages, iterate_query methods.

👍 👍 Great examples in the FAQ! Another benefit of the filters argument is that it could still allow supplying a page filter as described above.

I'm still thinking on the best ways to support Settings... it would be great to avoid the second API method. I think this would work just fine?

# Build settings object from env var or read from file
settings = FarmOSOAuthClientSettings()

async def create_oauth_client(**settings):
    ....

Re: JSON Schema....

I think our general motivation for JSON Schema has been to support record validation in the client libraries. The motivation for this comes from different log types having different fields: (screenshot of /farm.json output, which documents simple schema specs, but not JSON Schema compliant)

This is important because some farmOS servers could have the soil_test module disabled, and thus not support the farm_soil_test log type. Similarly, the farm_quantity module could be disabled and not add quantity fields to the above log types. (correct me if I'm wrong @mstenta )

I like where you're going with this, but to achieve what I've described above, the dependency would need to be a build-time dependency. i.e. model types or code could be generated from various schema at build-time, but unless I'm missing something, I don't think it would work at runtime without pushing version compatibility off on the consumers of farmOS.py.

From my understanding, consumers will always be responsible for checking the supported record types on a farmOS server they connect to - which is different than "version compatibility" of the farmOS server. What we're trying to achieve is some additional validation in the library that can adapt to the farmOS server at runtime. I think this is possible!

Pydantic has a cool feature for dynamic model creation:

There are some occasions where the shape of a model is not known until runtime. For this pydantic provides the create_model method to allow models to be created on the fly.

This is really great because the dynamic model can extend a base model. So..

  • Create a base log model
  • At runtime, request /farm.json and create dynamic models for additional log/asset/etc record types.
  • I think the FarmOSLogClient class @symbioquine has proposed could return any model that extends the base log model
  • When creating a log, the client would check the log_type and validate with the specified dynamic model

Note, the dynamic create_model method of Pydantic doesn't support JSON Schema, although there is an issue describing this and they are open to supporting that. I don't think it would be required for our use because we could check /farm.json (even as is without JSON Schema) for additional log types, and generate dynamic models based on available fields.

These thoughts are pretty conceptual, but I think that it might work. Worst case, the base log model could allow other fields and still facilitate sending/retrieving logs of any kind from the server, although with less support for validation.
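To make the dynamic-model idea a bit more concrete, here's a minimal pydantic sketch (assuming pydantic is installed; the field names are invented stand-ins for whatever /farm.json would actually report at runtime):

```python
from typing import List, Optional
from pydantic import BaseModel, create_model

class BaseLog(BaseModel):
    """Shared base model for all log types."""
    name: str
    type: str
    done: bool = False

# Field definitions "discovered" at runtime, e.g. parsed from /farm.json.
# Each entry maps a field name to a (type, default) tuple.
discovered_fields = {
    "quantity": (List[dict], []),
    "soil_lab": (Optional[str], None),
}

# Build a concrete model for this log type on the fly, extending the base.
SoilTestLog = create_model("SoilTestLog", __base__=BaseLog, **discovered_fields)

log = SoilTestLog(name="pH sample", type="farm_soil_test")
assert isinstance(log, BaseLog)
assert log.quantity == []
```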

@symbioquine
Author

@paul121 Really appreciate all the thoughts/feedback here!

The current implementation defaults to a "get all" action unless a page filter is provided. This likely became the default action as a result of not having other exposed pagination features... which does encourage unperformant usage patterns. Seeing this example is really quite convincing.. a great example of iterators! +1

How would the proposed API support retrieving multiple records by id in a single request - i.e. #14?
A better approach would be to support this via the filters argument to the query_page, iterate_query_pages, iterate_query methods.

+1 +1 Great examples in the FAQ! Another benefit of the filters argument is that it could still allow supplying a page filter as described above.

I think we should disallow controlling the pagination via the filters argument since that would yield code that is coupled to how the D7 RestWS module does entity pagination. Pagination should get dedicated limit and page arguments, separate from the filters that affect which entities get returned.

For example;

page = farm.area.query_page(filters={'area_type': 'field'}, page_num=3, max_page_size=10)

Calls farmOS as;

http://farmos.local/taxonomy_term.json?bundle=farm_areas&limit=10&page=3
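For what it's worth, iterate_query could plausibly be layered on top of query_page with a simple generator. A sketch, assuming query_page returns a list that is shorter than max_page_size on the final page:

```python
def iterate_query(query_page, filters=None, max_page_size=100):
    """Lazily yield records across pages (hypothetical helper, not the real API)."""
    page_num = 0
    while True:
        page = query_page(filters=filters, page_num=page_num,
                          max_page_size=max_page_size)
        yield from page
        # A short (or empty) page means the server had no further records.
        if len(page) < max_page_size:
            return
        page_num += 1
```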

I'm still thinking on the best ways to support Settings... it would be great to avoid the second API method. I think this would work just fine?

# Build settings object from env var or read from file
settings = FarmOSOAuthClientSettings()

async def create_oauth_client(**settings):
    ....

Seems like a good simplification as long as we document it well enough. My thought was that the method is very cheap to implement and provides a nice top-level place to put that documentation in the code...

Re: JSON Schema....

I think our general motivation for JSON Schema has been to support record validation in the client libraries. The motivation for this comes from different log types having different fields: (screenshot of /farm.json output, which documents simple schema specs, but not JSON Schema compliant)

...

This is important because some farmOS servers could have the soil_test module disabled, and thus not support the farm_soil_test log type. Similarly, the farm_quantity module could be disabled and not add quantity fields to the above log types. (correct me if I'm wrong @mstenta )

These are good points. I'll need to put a bit more thought into this part...

From my understanding, consumers will always be responsible for checking the supported record types on a farmOS server they connect to - which is different than "version compatibility" of the farmOS server.

I think you're right that there are several connected concerns here.

What we're trying to achieve is some additional validation in the library that can adapt to the farmOS server on runtime. I think this is possible!

Validation makes total sense, we can't/shouldn't get around the fact that some farmOS installations may support a sub/superset of the fields. Especially when writing data to farmOS, consumers of farmOS.py will necessarily be coupled to the abstract feature set of the specific farmOS installation(s) they are working with. e.g. As you said above, you can't write log records with quantities if the farm_quantity module isn't available.

I'm mostly worried about consumers of farmOS.py not being able to predict the structure of the model objects they get back - or the structure of the objects they can create locally.

To make this a bit more concrete, as a consumer of data via farmOS.py I would hope I don't have to be defensive about the existence of the quantities field on a FarmOSLog object.

For example, instead of writing;

for quantity in getattr(log, 'quantities', []):
    print(quantity)

I should be able to write;

for quantity in log.quantities:
    print(quantity)

If the farm_quantity module isn't available then the quantities field should just be an empty list - which is a case I already have to deal with since not all logs would necessarily have quantities anyway.

These thoughts are pretty conceptual, but I think that it might work. Worst case, the base log model could allow other fields and still facilitate sending/retrieving logs of any kind from the server, although with less support for validation.

Agreed that we should try not to make farmOS.py a barrier to working with any of the data in a reasonably canonical farmOS installation.

@jgaehring
Member

I'm mostly worried about consumers of farmOS.py not being able to predict the structure of the model objects they get back - or the structure of the objects they can create locally.

Yea, this is exactly what I'm dealing with in my farmLog module, and what JSON Schema goes a long ways in solving (in my context at least). Generally speaking, I think @paul121 is right that this is not so much a version compatibility issue, but more a dependency issue: specifically, how do we communicate dependencies and their required configurations across the client-server boundary? The fact that dependencies are going to bleed across that boundary seems unavoidable, and certainly produces its own issues, particularly if one of your goals is to abstract away the network protocols by which those dependencies are communicated.

I guess one way in which we're solving this with Field Kit is that we're mirroring the modularity of the server with similar modularity in the client. As we move further towards the Field Module architecture, this means that Field Kit itself makes fewer and fewer assumptions about the server's configuration, and instead each field module becomes more responsible for declaring its own dependencies explicitly. And again, I'm also allowing for some of that modularity by structuring farmLog to take a schema as a runtime dependency, which is a slightly different way of dealing with the modular dependency issue.

But that's application code. I'm not sure if it would be possible to take a similar modular approach with the farmOS.py library, but perhaps it's a thought?

Also, I'm still not sure about this:

I like where you're going with this, but to achieve what I've described above, the dependency would need to be a build-time dependency. i.e. model types or code could be generated from various schema at build-time, but unless I'm missing something, I don't think it would work at runtime without pushing version compatibility off on the consumers of farmOS.py.

It seems to me that there will always be some details (eg, configuration, modules, dependencies, etc) that need to be shared knowledge in a server-client architecture, and that cannot always be available at build-time. But I also don't see runtime dependencies as a barrier to abstraction. Maybe I'm still missing something though?

@symbioquine
Author

@jgaehring Thanks for the inputs!

But that's application code. I'm not sure if it would be possible to take a similar modular approach with the farmOS.py library, but perhaps it's a thought?

Yes, I think this is a key point. Field Kit can safely make certain assumptions and provide behavior which effectively yield an implied interface. The contract for that interface is: "If your farmOS installation behaves like X, these N features will be available in farmOS.app". Users of farmOS.app then presumably dynamically adapt their behavior based on the available features - and (hopefully) that feels natural to them.

An API like farmOS.py is most useful though if it can reduce the number of variables consumers have to deal with - and where that's not possible, provide predictable ways to deal with those variables.

It seems to me that there will always be some details (e.g., configuration, modules, dependencies, etc) that need to be shared knowledge in a server-client architecture, and that cannot always be available at build-time. But I also don't see runtime dependencies as a barrier to abstraction. Maybe I'm still missing something though?

Certain kinds of runtime dependencies are a barrier to abstraction. For example, I'm skeptical that it would be wise to generate the actual model Python classes at runtime since that would create the issue I described above where one has to be defensive when consuming farmOS.py even for the most basic attribute access cases.

Whether given modules are actually enabled and whether the corresponding data can ever be returned or saved from/to a particular farmOS installation is definitely only known at runtime.

@paul121
Member

paul121 commented Feb 13, 2020

The limit and page arguments should have dedicated arguments separate from filters that affect which entities get returned.

  page = farm.area.query_page(filters={'area_type': 'field'}, page_num=3, max_page_size=10)

Yea, this is nice. More explicit 👍

specifically, how do we communicate dependencies and their required configurations across the client-server boundary?

@symbioquine @jgaehring I think we're finally reaching the nitty-gritty details of this, great discussion 😃

Field Kit is a good example of why these server "dependencies" should be passed through the client to the consumer:

Users of farmOS.app then presumably dynamically adapt their behavior based on the available features - and (hopefully) that feels natural to them.

In general, I think farmOS.py should operate similarly. But perhaps there are different use cases. The quantity use case is a good example:

To make this a bit more concrete, as a consumer of data via farmOS.py I would hope I don't have to be defensive about the existence of the quantities field on a FarmOSLog object.

If the farm_quantity module isn't available then the quantities field should just be an empty list - which is a case I already have to deal with since not all logs would necessarily have quantities anyway.

For some use cases (I'm hesitant to say most) I think this behavior (always returning an empty list/adding other fields to logs) could be problematic. As a consumer, if you're using log quantities, wouldn't that mean your software/app/use-case is at least semi-dependent on the farm_quantity module being installed? And you should decide how to handle that, not the library? Perhaps you tell the user "Error: farm_quantity must be enabled to graph crop harvest data", OR if it isn't as critical, you can decide to parse all logs with an empty quantity list if that is easier.

Also, I think it's worth noting that if a log type has a quantity field, it returns an empty list [] when no quantity is provided. Adding an empty quantity [] to logs of all types seems to defeat the purpose of having log types - wouldn't you then need to set all other possible fields when returning records to the consumer?

Perhaps a good example is the farmOS-Aggregator which is designed to simply relay the records retrieved by farmOS.py. If farmOS.py added an empty quantity to all logs, that might be problematic to consumers of the aggregator if they were to then try and "update" that log with a quantity value. (The aggregator might be somewhat of a unique example, but it also was a motivation for building farmOS.py)

I'm mostly worried about consumers of farmOS.py not being able to predict the structure of the model objects they get back - or the structure of the objects they can create locally.

This seems like a general challenge 3rd parties working with farmOS will have to work around. Most of the time, it can be expected that most of the core modules will be available. But thinking to the future, very custom modules could provide additional fields to farmOS records for specific use cases. The farmOS client libraries should still support these extensions.. but that might require consumers to be "defensive" about the structure of the models that are returned. If the library makes decisions that change the models the farmOS server is actually returning, isn't that a problem? 🤔

Regarding creating records, a quick test found that this request...
(creating a farm_soil_test log with an illegal seed_source property that is used on farm_seeding logs)

new_log = farm.log.send({'name': "New test log.", 'type':'farm_soil_test', 'done':1, 'seed_source':'blah'})

gets this response from the server ... '406 Not Acceptable: Unknown data property field_farm_seed_source.' (and farmOS.py returns nothing! Not good. The error response from the server should at least be passed through to the consumer.) I think we all agree it would be great if validation in the library could prevent a request like this from being sent to the server.
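A minimal sketch of the kind of error propagation I mean (names are hypothetical, not the actual farmOS.py API) - the transport layer raises on non-2xx statuses instead of returning nothing:

```python
class FarmOSAPIError(Exception):
    """Hypothetical exception carrying the server's error response."""
    def __init__(self, status, body):
        super().__init__(f"farmOS server returned {status}: {body}")
        self.status = status
        self.body = body

def check_response(status, body):
    """Raise instead of silently swallowing non-2xx responses."""
    if not 200 <= status < 300:
        raise FarmOSAPIError(status, body)
    return body
```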

Question: are there any use cases where a consumer cannot be defensive? Is that inconvenience worth manipulating the response from the server? Perhaps this could be a configuration for the library, to return "raw" data, or (for lack of a better term) "clean" data?

@symbioquine I must admit that I might be stuck in thinking about only a subset of use-cases for farmOS.py. In general I think of two use cases:

  • A library that facilitates communication with a farmOS server for the backend of an application that displays, modifies or generates farmOS records. (Where the user interacts with a web browser)
  • For general scripting purposes as it makes the process of bulk importing/exporting farm data much easier (when working with custom CSV files, for example)
  • A third is potentially using the library to upload sensor data where it is already collected in a python environment (currently this doesn't require authentication with farmOS - so this might be better provided by a separate library, it would really just be a convenience wrapper around a standardized http request. But, imo, worth considering including here)

I'm glad you created the forum post to collect more thoughts on what is expected of the library! I might list my thoughts on use-cases there.

@mstenta
Member

mstenta commented Mar 20, 2020

Wow @symbioquine this is great! And so dense with ideas and considerations haha! Great job getting into all the details @paul121 and @jgaehring! I can't think of much more to add. There may be a few little details here and there that might need minor correction/clarification, but I think the overall issues remain the same. For example: the quantity module is a hard dependency of many log types, so the quantity field will not be missing on some sites and present on others. BUT, the "equipment used" field and the new "data" field ARE added dynamically depending on whether or not their module is enabled. Now this in itself is up for debate and I may be swayed to make them not optional if that simplifies things... the data field is an easy one... the "equipment used" field would put a hard dependency on the Equipment module, however, which I'm less sure about. As you can see we could probably spin off on lots of different tangents with these thoughts. :-)

In general, I love all the goals you laid out @symbioquine - and figuring out the details/considerations to move us towards them, even if that is done iteratively, is worthwhile IMO.

@mstenta
Member

mstenta commented Mar 20, 2020

And of course... we should also consider the changing API options in Drupal 8. We will probably be adding the GraphQL module, in addition to the core JSON:API module, which will open up new possibilities... it would be worth playing around with some of those things early in this process as well.

@paul121 paul121 mentioned this issue Sep 25, 2020
@paul121 paul121 added this to the v2 milestone Feb 12, 2024
@paul121
Member

paul121 commented Feb 12, 2024

We have some funding to make improvements to farmOS.py and farmOS-aggregator over the next few months. I'm tracking these in a v2 milestone: https://github.com/farmOS/farmOS.py/milestone/1

This issue is a little dated now but still has lots of great info and ideas. Some of these things have already been implemented in farmOS/farmOS.py, but certainly not everything (see the initial issue description for a nice summary). There are a couple of ideas from this issue I'd love to pick up in this next round of work:

  • Pluggable authentication support (really a "must have" in this work package, specifically targeting support for OAuth Client Credentials grant in addition to password)
  • Pluggable transport backend. At minimum a good option for async support, perhaps via HTTPX, which provides both sync and async APIs and has an API very similar to Requests. I wonder if the Requests API could roughly be the abstraction for a transport backend.
  • More robust data models + verification of data is quite interesting but not likely to be included in this work. As possible would love to give it some thought. I think this issue was partially concerned with compatibility between farmOS v1 and v2 so some specifics might be a moot point now, but in general still great things to consider. Add models for farmOS record types #25
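A rough sketch of what that pluggable-transport seam might look like (names are placeholders, not the actual farmOS.py API): the client depends only on a minimal interface, so a Requests-based, HTTPX-based, or in-memory test backend can be swapped in.

```python
from abc import ABC, abstractmethod

class Transport(ABC):
    """Hypothetical transport seam for farmOS.py."""
    @abstractmethod
    def request(self, method: str, url: str, **kwargs) -> dict:
        """Perform an HTTP request and return the decoded JSON body."""

class StubTransport(Transport):
    """In-memory backend, useful for tests and offline development."""
    def __init__(self, canned_responses):
        self.canned_responses = canned_responses

    def request(self, method, url, **kwargs):
        return self.canned_responses[(method, url)]

# The client code talks to the Transport interface only:
transport = StubTransport({("GET", "/farm.json"): {"name": "Test Farm"}})
info = transport.request("GET", "/farm.json")
assert info["name"] == "Test Farm"
```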

I think we could close this issue but will leave that for you @symbioquine - I'd prefer to move forward in some smaller issues, but want to make sure we don't dismiss any large topics you mentioned here.
