[REQ] Allow Content Type JSON Lines #429

Marcelo00 · 2024-04-22T17:44:01Z

Is your feature request related to a problem? Please describe.

One of our endpoints provides a stream with the content type application/json-lines, based on the JSON Lines format. An example of the returned data:

b'{"data": {"test_attribute_1": "example", "test_attribute_2": "example 2"}}\n{"data": {"test_attribute_1": "example 3", "test_attribute_2": "example 4"}}\n{"end": true}'

Currently, the regex used to detect JSON-based content types also matches this type. Consequently, the client calls json.loads(response.data), which fails because the bytes contain multiple JSON documents.
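The thread does not show the library's actual regex, so the following is only an illustrative sketch. The pattern (JSON_CONTENT_TYPE and is_single_json are hypothetical names) accepts application/json, variants with parameters, and +json suffix types such as application/problem+json, while rejecting application/json-lines. The last lines reproduce the failure described above: json.loads stops with an "Extra data" error as soon as a body contains a second JSON document.

```python
import json
import re

# Hypothetical stricter pattern: application/json, optional "+json" suffix types,
# and optional parameters such as "; charset=utf-8". Anything else, including
# application/json-lines, does not match.
JSON_CONTENT_TYPE = re.compile(
    r"^application/(?:[\w.+-]+\+)?json\s*(?:;.*)?$", re.IGNORECASE
)


def is_single_json(content_type: str) -> bool:
    """Return True if the content type denotes a single JSON document."""
    return JSON_CONTENT_TYPE.match(content_type.strip()) is not None


print(is_single_json("application/json; charset=utf-8"))  # True
print(is_single_json("application/json-lines"))           # False

# A JSON Lines body holds several documents, so a single json.loads call fails.
body = b'{"data": {"a": 1}}\n{"end": true}'
try:
    json.loads(body)
except json.JSONDecodeError as exc:
    print(exc.msg)  # Extra data
```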

In general, what is your approach for supporting different content types?

Describe the solution you'd like

It would be nice if the library supported this content type. The deserialization could look something like this:

import json

# Parse each non-empty line of the raw body (bytes) as its own JSON document
all_data = [json.loads(line) for line in data.splitlines() if line.strip()]

However, I am not sure how such content should be validated.
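A minimal sketch of the per-line approach for a streamed body, assuming the response arrives as an iterable of byte chunks that may split lines at arbitrary positions. iter_json_lines is a hypothetical helper, and validate is a hypothetical hook standing in for whatever schema check the caller has (for example, one derived from the OpenAPI document); none of this is part of the library's API.

```python
import json
from typing import Any, Callable, Iterable, Iterator, Optional


def iter_json_lines(
    chunks: Iterable[bytes],
    validate: Optional[Callable[[Any], Any]] = None,
) -> Iterator[Any]:
    """Yield one decoded record per line of a JSON Lines byte stream.

    `validate` is a hypothetical hook: it receives each decoded record and
    should return it (possibly converted) or raise if the record is invalid.
    """

    def decode(line: bytes) -> Any:
        record = json.loads(line)
        return validate(record) if validate is not None else record

    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Every element except the last is a complete line; the last one may
        # be an incomplete line that continues in the next chunk.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():  # skip blank lines
                yield decode(line)
    # The final line is not required to end with a newline.
    if buffer.strip():
        yield decode(buffer)


chunks = [b'{"data": {"a": 1', b'}}\n{"data": {"a": 2}}\n{"en', b'd": true}']
print(list(iter_json_lines(chunks)))
# [{'data': {'a': 1}}, {'data': {'a': 2}}, {'end': True}]
```

Splitting on the newline byte before decoding is safe for UTF-8 input, because the byte 0x0A never occurs inside a multi-byte sequence. Validation happens record by record, so an invalid line surfaces when iteration reaches it rather than after the whole stream has been read.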

Describe alternatives you've considered

If we use application/octet-stream as the content type instead, we get an error in the subsequent validate_base step:
uai_annotation_store_client.exceptions.ApiTypeError: Invalid type. Required value type is str and passed type was FileIO at ['args[0]']

Additional context


spacether · 2024-04-22T18:04:57Z

Why are you sending JSON Lines data as binary when plain text would work? The format is specified as UTF-8 encoded, so it could be a string. Where is the spec definition of that content type and payload?

spacether · 2024-04-22T18:33:05Z

The approach to supporting different content types can be seen in the response body deserializer. Content types are handled explicitly, case by case, for types such as:

plain text
json
octet stream
multipart form data
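The case-by-case handling can be pictured as a lookup from media type to deserializer. The sketch below is hypothetical (it is not the library's code, the names are invented for illustration, multipart form data is omitted, and text/plain bodies are assumed to be UTF-8); it shows that supporting JSON Lines would amount to one more entry in such a table.

```python
import json
from typing import Any, Callable, Dict


def _json(body: bytes) -> Any:
    return json.loads(body)


def _json_lines(body: bytes) -> list:
    # One JSON document per non-blank line.
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def _text(body: bytes) -> str:
    return body.decode("utf-8")  # assumes UTF-8; a real client would honor charset


def _octet_stream(body: bytes) -> bytes:
    return body


# Hypothetical registry: media type without parameters -> deserializer.
DESERIALIZERS: Dict[str, Callable[[bytes], Any]] = {
    "application/json": _json,
    "application/json-lines": _json_lines,
    "text/plain": _text,
    "application/octet-stream": _octet_stream,
}


def deserialize(content_type: str, body: bytes) -> Any:
    """Dispatch on the media type, ignoring parameters such as charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    try:
        handler = DESERIALIZERS[media_type]
    except KeyError:
        raise ValueError(f"unsupported content type: {content_type!r}") from None
    return handler(body)


print(deserialize("application/json-lines", b'{"a": 1}\n{"b": 2}\n'))
# [{'a': 1}, {'b': 2}]
```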

Marcelo00 · 2024-04-23T14:52:33Z

Why are you sending json lines data as binary when plain text will work? It says that it is utf8 encoded so it could be string.

I am not sure, as I joined the project after this content type was chosen. Since it is not a standardized content type, we are currently discussing whether to replace it with something else.

If we decide to stick with this content type, is it possible to support it in this library, or does it need to be one of the more standardized types such as application/json? There is one other type that could be useful in our case.

I also took a quick look at the response deserializer, but I only found cases for the last three content types, not for plain text. Did I miss something?

spacether · 2024-04-23T17:35:41Z

My mistake: plain text is not on the list in Python.

spacether · 2024-04-23T17:42:27Z

So my preference is not to support undefined content types unless there is significant prior work showing how the content type is sent and significant user need (lots of people want it).

Both of these look like streamed JSON responses. Why not get back the raw response and deserialize it manually in a helper that you define? It is not clear how streams should be handled in OpenAPI. Should a function consume the response until it ends? What if it never ends? How should one terminate consumption of the response data early?

One way to return the data would be to return an io.IOBase context manager; that way, the calling code could iterate over it and be responsible for closing it.
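A minimal sketch of that idea, assuming the raw body is available as a binary file-like object (io.BytesIO stands in for a network stream here). JsonLinesStream is a hypothetical name; rather than subclassing io.IOBase, it implements the same context-manager protocol (__enter__, __exit__, close) plus iteration, and it validates each record only when iteration reaches it. Early termination is a plain break, after which the with block still closes the stream.

```python
import io
import json
from typing import Any, BinaryIO, Callable, Iterator, Optional


class JsonLinesStream:
    """Hypothetical wrapper around a raw JSON Lines response body.

    Records are decoded and validated lazily, one per line, during iteration.
    The caller owns the stream: `with` closes it, even after an early `break`.
    """

    def __init__(self, raw: BinaryIO, validate: Optional[Callable[[Any], Any]] = None):
        self._raw = raw
        self._validate = validate

    def __enter__(self) -> "JsonLinesStream":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()

    def close(self) -> None:
        self._raw.close()

    def __iter__(self) -> Iterator[Any]:
        for line in self._raw:  # binary file objects yield one line at a time
            if not line.strip():
                continue
            record = json.loads(line)
            if self._validate is not None:
                record = self._validate(record)  # runs only when reached
            yield record


raw = io.BytesIO(b'{"data": {"a": 1}}\n{"data": {"a": 2}}\n{"end": true}\n')
with JsonLinesStream(raw) as stream:
    for record in stream:
        if record.get("end"):
            break  # stop consuming; the stream is still closed by `with`
        print(record["data"])
print(raw.closed)  # True
```

A stream that never ends is handled the same way: the caller decides when to break, and closing the underlying object releases the connection.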

Marcelo00 · 2024-04-24T09:59:29Z

There is apparently some momentum toward officially supporting streaming responses in the OpenAPI specification. There is a meeting tomorrow where, among other things, they will discuss how to support it. For more information, see this issue and PR.

One way to return the data would be to return an io.IOBase context manager, that way the calling code could iterate on it and be responsible for closing it.

Would this also mean that validation is not performed automatically by the library, and the user has to do it manually after iterating over the data? For our use case, I think it would be sufficient to have a way to get the response from the server without validation.

spacether · 2024-04-24T14:42:32Z

Validation would run during iteration, as each item is produced.

Marcelo00 · 2024-04-29T15:27:41Z

Should a function consume the response until it ends? What if it never ends? How should one terminate consumption of the response data early? One way to return the data would be to return an io.IOBase context manager, that way the calling code could iterate on it and be responsible for closing it.

Do you have a more detailed plan on how you would implement the functionality?

For our use case, either getting the raw response or support for a different content type such as JSON text sequences would work.

As we need the streaming endpoint to work, is there any way I can help?

spacether · 2024-04-29T15:58:31Z

My responses described returning a context manager on which methods could be called to yield validated results.
JSON text sequences are an acceptable feature to add to the code base because they are defined by an RFC (RFC 7464).
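For reference, RFC 7464 defines the media type application/json-seq: each JSON text is preceded by an ASCII record separator (0x1E) and followed by a line feed. A simplified reader might look like the sketch below; iter_json_seq is a hypothetical helper, and it omits the RFC's additional check for top-level numbers or literals that may have been truncated.

```python
import json
from typing import Any, Iterator

RS = b"\x1e"  # ASCII record separator that starts every record in RFC 7464


def iter_json_seq(body: bytes) -> Iterator[Any]:
    """Yield each JSON text of an application/json-seq body.

    Simplified: a record that fails to parse is skipped and parsing continues,
    as RFC 7464 recommends, but the RFC's extra check for possibly truncated
    top-level numbers and literals is not implemented.
    """
    for segment in body.split(RS):
        text = segment.strip()  # drops the trailing line feed
        if not text:
            continue  # bytes before the first RS, or an empty record
        try:
            yield json.loads(text)
        except ValueError:  # JSONDecodeError or invalid UTF-8
            continue


body = b'\x1e{"data": {"a": 1}}\n\x1e{"data": \n\x1e{"end": true}\n'
print(list(iter_json_seq(body)))
# [{'data': {'a': 1}}, {'end': True}]
```

Unlike JSON Lines, the explicit separator lets a reader resynchronize after a malformed record, which is why the sketch skips it instead of aborting.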

Paths forward here are:

1. You call the existing methods that return the raw response and deserialize the bytes as you describe. You can validate payloads using schemas defined in the document.
2. You submit a PR with a proposed feature.
3. I submit a PR with the feature. I am applying to jobs at this time, so if this is something you want, paying me for the work would be motivating. Otherwise, my suggestion is option 1 or 2.

What were the results of the OpenAPI meeting?

spacether · 2024-05-13T16:59:52Z

@Marcelo00 never heard back from you here. How would you like to move forward with this?

Marcelo00 · 2024-05-14T14:11:40Z

Sorry, I forgot to inform you about our decision. For our use case, it was sufficient to get the raw response back.

I also watched part of the recent OpenAPI meeting, but it seems it will take more time until streaming content types (such as JSON Lines) are officially supported by OpenAPI. However, versions 3.0.4, 3.1.1, and 3.2.0 support two format options for type string that can be used to define either byte or binary, depending on the actual content (see this link for version 3.0.4). The PR I previously posted has also been merged.

spacether · 2024-05-14T16:23:21Z

Closing this issue because the end user can meet their needs with existing functionality: receive the raw response and iterate through the body, deserializing each line of content using schemas defined in the OpenAPI document.

spacether closed this as completed May 14, 2024
