Mimetype for CSV Sparql Query Results should use correct encoding as defined in the Specification #4856

Open
pajoma opened this issue Dec 13, 2023 · 0 comments
Labels
🐞 bug issue is a bug specification issues related to compliance to standards and external specs

pajoma commented Dec 13, 2023

Current Behavior

The query results are encoded in UTF-8:

public static final TupleQueryResultFormat CSV = new TupleQueryResultFormat("SPARQL/CSV", List.of("text/csv"),
     StandardCharsets.UTF_8, List.of("csv"), SPARQL_RESULTS_CSV_URI, NO_RDF_STAR);

The specification says:

Systems providing these formats should note that the content types for CSV is text/csv and for TSV text/tab-separated-values. Being text/*, the default character set is US-ASCII. The charset parameter should be used in conjunction with SPARQL Results; UTF-8 is recommended: text/csv; charset=utf-8 and text/tab-separated-values; charset=utf-8.

But the MIME type exposed by RDF4J is "text/csv" without a charset parameter (in SparqlMimeTypes):

public static final String CSV_VALUE = "text/csv";

UTF-8 is obviously the correct choice, but standard clients such as the Python requests library assume "ISO-8859-1" for the content type "text/csv" when no charset parameter is present.
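The fallback can be approximated with the standard library alone. This is a rough sketch of the behavior described above, not the actual requests code: `effective_charset` is a hypothetical helper, and the ISO-8859-1 default for text/* types is taken from the observation in this issue.

```python
from email.message import Message


def effective_charset(content_type, text_default="ISO-8859-1"):
    """Return the charset a client would likely assume for a Content-Type value.

    Hypothetical helper: an explicit charset parameter wins; otherwise
    text/* types fall back to text_default and other types yield None.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset is not None:
        return str(charset)
    if msg.get_content_maintype() == "text":
        return text_default
    return None


print(effective_charset("text/csv"))                 # no charset: falls back to the default
print(effective_charset("text/csv; charset=utf-8"))  # explicit charset is used
```

With the charset parameter present, the client no longer has to guess, which is what the specification's recommendation is meant to achieve.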

I can modify the REST controllers so that they do not use the standard RDF4J MIME types, e.g.:

    @PostMapping(value = "/query", consumes = {MediaType.TEXT_PLAIN_VALUE, SparqlMimeTypes.SPARQL_QUERY_VALUE},
            produces = { SparqlMimeTypes.JSON_VALUE, SparqlMimeTypes.CSV_VALUE+ ";charset=UTF-8"}
    )
    @ResponseStatus(HttpStatus.OK)
    Flux<BindingSet> queryBindingsPost(@RequestBody String query) {...}

but then I have to map from "text/csv;charset=UTF-8" to "text/csv" everywhere else, to get the correct ResultWriters.
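The mapping itself is small; the cost is that it has to be applied at every lookup. A language-neutral sketch in Python (the controllers above are Java, and `base_media_type`, `writer_for`, and the `WRITERS` table are hypothetical names, not RDF4J API):

```python
def base_media_type(content_type):
    """Drop parameters such as ';charset=UTF-8' and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


# Hypothetical lookup table keyed by the bare media type.
WRITERS = {
    "text/csv": "csv-writer",
    "application/sparql-results+json": "json-writer",
}


def writer_for(content_type):
    """Look up a writer after stripping any media type parameters."""
    return WRITERS[base_media_type(content_type)]


print(writer_for("text/csv;charset=UTF-8"))
```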

Expected Behavior

public static final TupleQueryResultFormat CSV = new TupleQueryResultFormat("SPARQL/CSV", List.of("text/csv"), StandardCharsets.UTF_8, List.of("csv"), SPARQL_RESULTS_CSV_URI, NO_RDF_STAR); 

should declare its MIME type as text/csv;charset=utf-8.

If "text/csv" remains included, the SPARQLResultsCSVWriter should use "ISO-8859-1" as the encoding (perhaps with a warning).

Steps To Reproduce

  1. Expose a SPARQL endpoint using the standard MIME types defined in RDF4J.
  2. Call it with the Python requests library and observe that it decodes the result as "ISO-8859-1":
            response = requests.post(
                url=f"...",
                data=query.encode("utf-8"),
                headers={
                    "X-API-KEY": api_key,
                    "Content-Type": "text/plain",
                    "Accept": "text/csv",
                    "X-Application": scope,
                },
            )
   
            enc = response.encoding  # is "ISO-8859-1", but in reality it is "UTF-8"
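Until the header carries a charset, a client can decode the body bytes explicitly. The sketch below uses only the standard library and a made-up sample payload to show the difference. With requests, the equivalent would be setting `response.encoding = "utf-8"` before reading `response.text`, or calling `response.content.decode("utf-8")`.

```python
# Sample payload standing in for response.content (assumed, not real endpoint output).
body = "name\nZoë\n".encode("utf-8")

# What a client produces when it trusts the ISO-8859-1 guess.
as_latin1 = body.decode("iso-8859-1")

# What the server actually sent.
as_utf8 = body.decode("utf-8")

print(repr(as_latin1))
print(repr(as_utf8))
```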

Version

4.3.8

Are you interested in contributing a solution yourself?

Perhaps?

Anything else?

No response

@pajoma pajoma added the 🐞 bug issue is a bug label Dec 13, 2023
@hmottestad hmottestad added the specification issues related to compliance to standards and external specs label Dec 13, 2023