
[REQ] Allow Content Type JSON Lines #429

Closed
Marcelo00 opened this issue Apr 22, 2024 · 12 comments

Comments

@Marcelo00

Marcelo00 commented Apr 22, 2024

Is your feature request related to a problem? Please describe.

One of our endpoints provides a stream with the content type application/json-lines, based on this format. One example of the returned data is:

b'{"data": {"test_attribute_1": "example", "test_attribute_2": "example 2"}}\n{"data": {"test_attribute_1": "example 3", "test_attribute_2": "example 4"}}\n{"end": true}'

Currently, the regex used to detect whether a content type is JSON-based also matches this content type. Consequently, the client calls json.loads(response.data), which raises an error because the bytes contain multiple JSON documents.
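To illustrate the failure mode, here is a minimal sketch. The permissive pattern below is an assumption about what such a JSON content-type check might look like, not necessarily the exact regex the library uses: because nothing is anchored after "json", it also accepts application/json-lines, and json.loads then rejects the multi-document body with an "Extra data" error. The stricter, end-anchored pattern is likewise hypothetical and shown only for contrast.

```python
import json
import re

# Assumed permissive check: "application/", any non-"+" characters, an optional
# "+", then "json" followed by anything. Nothing stops the match right after
# "json", so "application/json-lines" is accepted too.
PERMISSIVE_JSON = re.compile(r"application/[^+]*[+]?(json);?.*")

# Hypothetical stricter check: "json" must be the whole subtype or a "+json"
# suffix, followed by optional whitespace and then ";" or the end of the string.
STRICT_JSON = re.compile(r"application/(?:[\w.+-]+\+)?json\s*(?:;|$)", re.IGNORECASE)


def decode_as_single_json(body: bytes):
    """Mimic a client that treats the whole body as one JSON document."""
    return json.loads(body)


body = (
    b'{"data": {"test_attribute_1": "example", "test_attribute_2": "example 2"}}\n'
    b'{"end": true}\n'
)

print(bool(PERMISSIVE_JSON.match("application/json-lines")))  # True
print(bool(STRICT_JSON.match("application/json-lines")))      # False
try:
    decode_as_single_json(body)
except json.JSONDecodeError as exc:
    print("json.loads failed:", exc.msg)  # Extra data
```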

In general, what is your approach for supporting different content types?

Describe the solution you'd like

It would be nice if it could support this content type. The deserialization could then look something like this:

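import json

all_data = []
for line in data.split(b'\n'):  # data: the raw response body (bytes)
    if line.strip():  # skip blank lines, e.g. after a trailing newline
        all_data.append(json.loads(line))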

However, I am not sure how such content should be validated.
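A streaming variant of the split-and-parse loop above might buffer incoming byte chunks, since a record can be split across chunk boundaries, and emit one decoded object per complete line. This is a minimal standard-library sketch; iter_json_lines and check_record are hypothetical names, and the validate argument is a stand-in hook for whatever schema validation a client would apply to each record.

```python
import json
from typing import Any, Callable, Iterable, Iterator, Optional

Validator = Optional[Callable[[Any], None]]


def _decode_line(line: bytes, validate: Validator) -> Iterator[Any]:
    if not line.strip():  # skip blank lines, e.g. after a trailing newline
        return
    value = json.loads(line)
    if validate is not None:
        validate(value)  # hypothetical hook: raises if the value is invalid
    yield value


def iter_json_lines(chunks: Iterable[bytes], validate: Validator = None) -> Iterator[Any]:
    """Yield one decoded JSON value per line from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Every piece except the last is a complete line; keep the tail buffered.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            yield from _decode_line(line, validate)
    # The last record may arrive without a trailing newline.
    yield from _decode_line(buffer, validate)


def check_record(value: Any) -> None:
    # Hypothetical check: each record is {"data": {...}} or {"end": true}.
    if not isinstance(value, dict) or not (
        isinstance(value.get("data"), dict) or value.get("end") is True
    ):
        raise ValueError(f"unexpected record: {value!r}")


# A record split across chunk boundaries is reassembled before decoding.
chunks = [b'{"data": {"test_attribute_1": "exa', b'mple"}}\n{"end": tr', b"ue}\n"]
records = list(iter_json_lines(chunks, validate=check_record))
```

Because validation runs inside the generator, an invalid record surfaces as an exception at the point where the caller consumes it, rather than after the whole stream has been read.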

Describe alternatives you've considered

If we just use application/octet-stream as the content type, I get an error in the next validate_base step:
uai_annotation_store_client.exceptions.ApiTypeError: Invalid type. Required value type is str and passed type was FileIO at ['args[0]']

Additional context

@spacether
Contributor

Why are you sending JSON Lines data as binary when plain text would work? It says that it is UTF-8 encoded, so it could be a string. Where is the spec definition of that content type and payload?

@spacether
Contributor

The approach to supporting different content types can be seen in the response body deserializer. Content types are handled explicitly, case by case, for types like:

  • plain text
  • json
  • octet stream
  • multipart form data
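Case-by-case handling of this kind can be pictured as a lookup from media type to deserializer. The sketch below is hypothetical: the function names and the DESERIALIZERS table are illustrative, not the library's code, and multipart form data is omitted for brevity. It shows where a separate JSON Lines entry would sit, and why comparing the bare media type exactly avoids routing application/json-lines to the single-document JSON decoder.

```python
import json
from typing import Any, Callable, Dict


def _plain_text(body: bytes) -> str:
    return body.decode("utf-8")


def _json(body: bytes) -> Any:
    return json.loads(body)


def _octet_stream(body: bytes) -> bytes:
    return body


def _json_lines(body: bytes) -> list:
    # One JSON value per non-blank line.
    return [json.loads(line) for line in body.splitlines() if line.strip()]


# Hypothetical registry; multipart/form-data is omitted for brevity.
DESERIALIZERS: Dict[str, Callable[[bytes], Any]] = {
    "text/plain": _plain_text,
    "application/json": _json,
    "application/octet-stream": _octet_stream,
    "application/json-lines": _json_lines,
}


def deserialize(content_type: str, body: bytes) -> Any:
    # Drop parameters such as "; charset=utf-8" and compare the bare media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    try:
        handler = DESERIALIZERS[media_type]
    except KeyError:
        raise ValueError(f"unsupported content type: {content_type}") from None
    return handler(body)
```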

@Marcelo00
Author

Why are you sending json lines data as binary when plain text will work? It says that it is utf8 encoded so it could be string.

I am not sure, as I joined the project after this content type was chosen. Since it is not a standardized content type, we are currently discussing whether to replace it with something else.

If we decide to stick with this content type, is it possible to support it in this library, or does it need to be one of the more standardized types such as application/json? There is one other type that could be useful in our case.

I also had a quick look at the response deserializer, but I only found cases for the last three content types, not for plain text. Did I miss something?

@spacether
Contributor

My mistake, plain text is not on the list in python.

@spacether
Contributor

spacether commented Apr 23, 2024

So my preference is not to support undefined content types unless there is significant prior work showing how the content type is sent and significant user need (lots of people want it).

Both of these look like streamed JSON responses. Why not just get the raw response back and deserialize it manually in a helper that you define? It is not clear how streams should be handled in OpenAPI. Should a function consume the response until it ends? What if it never ends? How should one terminate consumption of the response data early?

One way to return the data would be to return an io.IOBase context manager; that way the calling code could iterate over it and be responsible for closing it.
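A minimal sketch of that idea, assuming the raw response body is available as a binary file-like object (io.BytesIO stands in for it here). JsonLinesStream is a hypothetical name, not an existing class in the library. Iterating yields one decoded, optionally validated, value per line; breaking out of the loop stops consumption early; and leaving the with block closes the underlying stream even if the data never ends.

```python
import io
import json
from typing import Any, Callable, Iterator, Optional


class JsonLinesStream:
    """Hypothetical wrapper returned to the caller instead of a fully read body."""

    def __init__(self, raw: io.IOBase, validate: Optional[Callable[[Any], None]] = None):
        self._raw = raw
        self._validate = validate

    def __enter__(self) -> "JsonLinesStream":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()

    def close(self) -> None:
        self._raw.close()

    def __iter__(self) -> Iterator[Any]:
        # Binary file objects iterate line by line, reading lazily.
        for line in self._raw:
            if not line.strip():
                continue
            value = json.loads(line)
            if self._validate is not None:
                self._validate(value)  # validation runs as each item is consumed
            yield value


raw = io.BytesIO(b'{"data": {"n": 1}}\n{"data": {"n": 2}}\n{"end": true}\n')
seen = []
with JsonLinesStream(raw) as stream:
    for item in stream:
        seen.append(item)
        if item.get("data", {}).get("n") == 1:
            break  # stop early; the with block still closes the stream
print(seen)        # [{'data': {'n': 1}}]
print(raw.closed)  # True
```

Placing validation inside __iter__ means each item is checked only when the caller actually consumes it, which keeps the cost proportional to how much of the stream is read.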

@Marcelo00
Copy link
Author

There is apparently some traction for officially supporting streaming responses in the OpenAPI specification. They will hold a meeting tomorrow where, among other things, they will discuss how to support it. For more information, see this issue and PR.

One way to return the data would be to return an io.IOBase context manager, that way the calling code could iterate on it and be responsible for closing it.

Would this also mean that validation is not performed automatically by the library, and that the user needs to do it manually after iterating over it? For our use case, I think it is sufficient to have a way to just get the response from the server without validation.

@spacether
Contributor

Validation would be run when iterating.

@Marcelo00
Author

Should a function consume the response until it ends? What if it never ends? How should one terminate consumption of the response data early? One way to return the data would be to return an io.IOBase context manager, that way the calling code could iterate on it and be responsible for closing it.

Do you have a more detailed plan on how you would implement the functionality?

For our use case, either getting the raw response or support for a different content type such as JSON sequence would work.

Since we need the streaming endpoint to work, is there any way I can help?

@spacether
Contributor

spacether commented Apr 29, 2024

As my earlier responses described, a context manager would be returned, and methods could be called on it to yield validated results.
JSON sequence is an acceptable feature to add to the code base because it is defined in an RFC.
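For reference, RFC 7464 defines JSON Text Sequences (application/json-seq): each JSON text is preceded by an ASCII Record Separator (0x1E) and followed by a line feed. Below is a simplified standard-library sketch of a parser for a complete body; parse_json_seq is a hypothetical helper name. In line with the RFC's advice to keep parsing after a text fails, it skips texts that do not decode rather than aborting the whole sequence.

```python
import json
from typing import Any, List

RS = b"\x1e"  # ASCII Record Separator, the RFC 7464 text delimiter


def parse_json_seq(body: bytes) -> List[Any]:
    """Decode an application/json-seq body into a list of JSON values.

    Simplified sketch: texts that fail to decode are skipped so the rest of
    the sequence can still be read.
    """
    values = []
    for text in body.split(RS):
        if not text.strip():  # the body starts with RS, so the first piece is empty
            continue
        try:
            values.append(json.loads(text))
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            continue  # e.g. a truncated text; move on to the next one
    return values
```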

Paths forward here are

  1. You call the existing methods that return the raw response and deserialize the bytes as you describe. You can validate payloads using the schemas defined in the document.
  2. You submit a PR with a proposed feature.
  3. I submit a PR with the feature. I am applying to jobs at this time. If this is something you want, paying me for the work would be motivating. Otherwise, my suggestion is option 1 or 2.
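Option 1 might look like the following minimal sketch. It assumes the raw body bytes have already been obtained through one of the client's raw-response methods, which are not shown because their exact names are not given here. The Record dataclass and the checks in to_record are hypothetical stand-ins for the schemas defined in the OpenAPI document, built from the attribute names in the example payload; reading stops at the {"end": true} marker.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass(frozen=True)
class Record:
    # Hypothetical stand-in for a schema defined in the OpenAPI document.
    test_attribute_1: str
    test_attribute_2: str


def to_record(value: Any) -> Record:
    """Validate one decoded line and convert it to a Record."""
    if not isinstance(value, dict) or not isinstance(value.get("data"), dict):
        raise ValueError(f"expected an object with a 'data' object, got {value!r}")
    data = value["data"]
    for key in ("test_attribute_1", "test_attribute_2"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"'{key}' must be a string in {data!r}")
    return Record(data["test_attribute_1"], data["test_attribute_2"])


def read_records(body: bytes) -> Iterator[Record]:
    """Deserialize a JSON Lines body, stopping at the {"end": true} marker."""
    for line in body.splitlines():
        if not line.strip():
            continue
        value = json.loads(line)
        if isinstance(value, dict) and value.get("end") is True:
            return
        yield to_record(value)


body = (
    b'{"data": {"test_attribute_1": "example", "test_attribute_2": "example 2"}}\n'
    b'{"data": {"test_attribute_1": "example 3", "test_attribute_2": "example 4"}}\n'
    b'{"end": true}'
)
records = list(read_records(body))
```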

What were the results of the OpenAPI meeting?

@spacether
Contributor

@Marcelo00, I never heard back from you here. How would you like to move forward with this?

@Marcelo00
Author

Marcelo00 commented May 14, 2024

Sorry, I forgot to inform you about our decision. For our use case, it was sufficient to just get the raw response back.

I also watched part of the recent OpenAPI meeting, but it seems it will take more time until streaming content types (such as JSON Lines) are officially supported by OpenAPI. However, versions 3.0.4, 3.1.1 and 3.2.0 support two format options for the string type, which can be used to define either byte or binary data depending on the actual content (see this link for version 3.0.4). The PR I previously posted has also been merged.
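To make the distinction concrete: in OpenAPI 3.0, a string with format byte carries base64-encoded data, whereas format binary describes a raw sequence of octets. The sketch below shows how a client might turn a payload under each option into bytes; decode_string_payload is a hypothetical helper, not part of any library.

```python
import base64
from typing import Union


def decode_string_payload(payload: Union[str, bytes], fmt: str) -> bytes:
    """Return raw bytes for a string schema with the given format (hypothetical helper).

    fmt == "byte":   payload is base64-encoded text, so decode it.
    fmt == "binary": payload is already a raw sequence of octets.
    """
    if fmt == "byte":
        return base64.b64decode(payload, validate=True)
    if fmt == "binary":
        return bytes(payload)
    raise ValueError(f"unsupported format: {fmt!r}")
```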

@spacether
Contributor

Closing this issue because end users can meet their needs with existing functionality: receive the raw response, then iterate through the body, deserializing each line of content using the schemas defined in the OpenAPI document.
