Check input is valid JSON without full parsing #1780
Comments
I'm still interested in this use-case for simdjson, so it seemed worth converting this into an issue for tracking purposes.
It is entirely valid. We have not done any work toward this use case so far. Can you tell us more about it? I am not sure why you'd ever want to check validity without also parsing. If you can provide motivation, it could help move this issue along.
Thanks Daniel. The primary use-case is a database system (key-value store) which lets users store documents in two main datatypes -
Clients can write new documents to the server in either binary or JSON; the server (which is the code I'm responsible for) wants to verify the type of each document at the point the client writes it. We don't blindly trust the datatype the client specifies, primarily because of data consistency issues. For example, we don't want another client, at some point in the future, to try to manipulate a supposedly "JSON" document and then get an error that we cannot parse the field(s) they are trying to access because an earlier client sent us invalid JSON. Broadly speaking, the point at which we accept and store new JSON data isn't necessarily the point at which it is manipulated, and we want to check that it is valid up front rather than wait until its fields are accessed.
Great answer.
Use cases I can think of: datastores that store JSON without parsing it, as the other commenter said (and, hypothetically, database/service clients that validate JSON before sending it to such datastores, or before making network calls to external APIs that expect JSON), though I haven't had those real-world use cases personally.
That would also make it easier to avoid the need for an arbitrary depth limit (no longer need to allocate the additional 8-byte
This would let you reduce the amount of memory needed when setting a depth limit for the rare use cases where you want to ignore any sort of depth limit (set depth to
If this continued to have a depth limit, that'd be possible - it'd be proportional to (because
Discussed in #1393
Originally posted by daverigby January 14, 2021
Hi all,
As per subject, I'm wondering what's the most efficient way to use simdjson to simply check an input is valid JSON, without actually parsing out the content. Essentially the same use-case as http://www.json.org/JSON_checker/
Ideally this would require state which isn't proportional to the size of the input document; for my use-case I need to validate many MB JSON files without consuming a similar amount of memory.
Thanks in advance.
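
To make the "state not proportional to the input size" requirement concrete, here is a minimal sketch of a validate-only checker in the spirit of JSON_checker. It is not simdjson code and does not use simdjson's algorithm: `StreamingJsonChecker` and `is_valid_json` are hypothetical names, and the approach is a plain recursive-descent check over a `std::istream`, so working memory grows with nesting depth (bounded by an assumed `max_depth` parameter) rather than with document size. As a simplification, it checks JSON syntax only and does not validate UTF-8 or surrogate pairing inside strings.

```cpp
#include <cstddef>
#include <cstdio>   // EOF
#include <cstring>  // std::strchr
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of simdjson: checks that a stream holds exactly
// one JSON value. Working memory grows with nesting depth, not with input size.
class StreamingJsonChecker {
public:
    explicit StreamingJsonChecker(std::istream& in, std::size_t max_depth = 1024)
        : in_(in), max_depth_(max_depth) {}

    // One value, then only whitespace until end of input.
    bool check() { return value(0) && look() == EOF; }

private:
    std::istream& in_;
    std::size_t max_depth_;

    static bool is_digit(int c) { return c >= '0' && c <= '9'; }
    static bool is_hex(int c) {
        return is_digit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
    int look() {  // skip whitespace, then peek at the next byte
        for (int c; (c = in_.peek()) == ' ' || c == '\t' || c == '\n' || c == '\r';) in_.get();
        return in_.peek();
    }
    int next() { look(); return in_.get(); }  // skip whitespace, then consume a byte
    bool literal(const char* word) {  // match "true", "false" or "null" exactly
        for (; *word; ++word)
            if (in_.get() != *word) return false;
        return true;
    }
    bool digits() {  // one or more decimal digits
        if (!is_digit(in_.peek())) return false;
        while (is_digit(in_.peek())) in_.get();
        return true;
    }
    bool number() {
        if (in_.peek() == '-') in_.get();
        if (in_.peek() == '0') in_.get();  // a leading zero must stand alone
        else if (!digits()) return false;
        if (in_.peek() == '.') { in_.get(); if (!digits()) return false; }
        if (in_.peek() == 'e' || in_.peek() == 'E') {
            in_.get();
            if (in_.peek() == '+' || in_.peek() == '-') in_.get();
            return digits();
        }
        return true;
    }
    bool string_literal() {
        if (next() != '"') return false;
        for (;;) {
            int c = in_.get();
            if (c == '"') return true;
            if (c < 0x20) return false;  // EOF (-1) or a raw control character
            if (c != '\\') continue;
            int e = in_.get();
            if (e == 'u') {
                for (int i = 0; i < 4; ++i)
                    if (!is_hex(in_.get())) return false;
            } else if (e <= 0 || !std::strchr("\"\\/bfnrt", e)) {
                return false;  // unknown escape sequence
            }
        }
    }
    bool value(std::size_t depth) {
        if (depth > max_depth_) return false;  // bound recursion, and so memory
        switch (look()) {
            case '{': return container('}', depth);
            case '[': return container(']', depth);
            case '"': return string_literal();
            case 't': return literal("true");
            case 'f': return literal("false");
            case 'n': return literal("null");
            default:  return number();
        }
    }
    bool container(char close, std::size_t depth) {
        in_.get();  // consume '{' or '['
        if (look() == close) { in_.get(); return true; }  // empty container
        for (;;) {
            // Object members are "key" ':' value; array elements are just values.
            if (close == '}' && (!string_literal() || next() != ':')) return false;
            if (!value(depth + 1)) return false;
            int c = next();
            if (c == close) return true;
            if (c != ',') return false;
        }
    }
};

// Convenience wrapper for in-memory text (hypothetical name).
inline bool is_valid_json(const std::string& text, std::size_t max_depth = 1024) {
    std::istringstream in(text);
    return StreamingJsonChecker(in, max_depth).check();
}
```

The `is_valid_json` wrapper copies its argument into a string stream purely for demonstration. For the many-MB case, the checker could instead be constructed over a `std::ifstream` (or any other `std::istream`), so that only the stream's own buffer and the recursion stack are held in memory.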