Document required encoding of query parameters of search #2515

Open
wants to merge 1 commit into base: master

lucafavatella
Contributor

## Solr

A note in the Solr 4.1.0 changelog about the portability of Solr across web containers points out that ["Query strings passed in via the URL need to be properly-%-escaped, UTF-8 encoded bytes, otherwise Solr refuses to handle the request"](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/CHANGES.txt#L3376-L3381).
A note in the Solr 4.5.0 changelog states that the encoding of query parameters can be set with the [`ie` parameter](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/CHANGES.txt#L1995-L1997) (e.g. [`ie=iso-8859-1`](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/core/src/test/org/apache/solr/servlet/SolrRequestParserTest.java#L249)), that the encoding of a POST request body can be set with the [`Content-Type` header](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/CHANGES.txt#L1997-L1998) (e.g. [`application/x-www-form-urlencoded; charset=iso-8859-1`](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/core/src/test/org/apache/solr/servlet/SolrRequestParserTest.java#L251)), and that [UTF-8 is the default encoding](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/CHANGES.txt#L1997).
As of Solr 4.10.4, UTF-8 is still the [default](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers.java#L345-L348) encoding for both [query parameters](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers.java#L248) and the [POST request body](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.10.4/solr/core/src/java/org/apache/solr/servlet/SolrRequestParsers.java#L602-L606).
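
To illustrate why the declared encoding matters, here is a minimal Python sketch (not taken from Solr or yokozuna) showing that the same search term percent-encodes to different bytes under UTF-8 and ISO-8859-1. The search term `café` and the variable names are assumptions chosen for illustration; only the `ie` parameter name comes from the Solr changelog.

```python
from urllib.parse import quote, urlencode

# Hypothetical search term containing a non-ASCII character.
term = "café"

# UTF-8 (Solr's default): "é" is the two bytes C3 A9.
utf8_value = quote(term, encoding="utf-8")

# ISO-8859-1: "é" is the single byte E9.
latin1_value = quote(term, encoding="iso-8859-1")

# A query string relying on the UTF-8 default.
default_qs = urlencode({"q": term}, encoding="utf-8")

# A query string that declares its encoding via the `ie` parameter.
latin1_qs = urlencode({"q": term, "ie": "iso-8859-1"}, encoding="iso-8859-1")

print(utf8_value, latin1_value)
print(default_qs)
print(latin1_qs)
```

A server that assumes UTF-8 cannot tell the second form apart from invalid input unless the encoding is declared alongside it.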

## Riak Search

[The version of yokozuna in Riak KV 2.2.3 is 2.1.10](https://github.com/basho/riak/blob/riak-2.2.3/rebar.config#L24),
[which integrates Solr 4.10.4](https://github.com/basho/yokozuna/blob/2.1.10/tools/grab-solr.sh#L21)
(see also basho/yokozuna@7f0d464),
whose documentation is available [online](https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.10.pdf).

[Yokozuna 2.1.10 depends on riak_kv 2.1.7](https://github.com/basho/yokozuna/blob/2.1.10/rebar.config#L14),
which [via](https://github.com/basho/riak_kv/blob/2.1.7/rebar.config#L38) [riak_api 2.1.6 depends on basho/webmachine 1.10.8-basho1](https://github.com/basho/riak_api/blob/2.1.6/rebar.config#L6).
Webmachine contains e.g. module [`wrq`](https://github.com/basho/webmachine/blob/1.10.8-basho1/src/wrq.erl)
and [depends on mochiweb v2.9.0p2](https://github.com/basho/webmachine/blob/1.10.8-basho1/rebar.config#L9),
which contains e.g. module [`mochiweb_util`](https://github.com/basho/mochiweb/blob/v2.9.0p2/src/mochiweb_util.erl).

When receiving a [search request](https://docs.basho.com/riak/kv/2.2.3/developing/api/http/search-query/#request),
yokozuna [calls the `search` function](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_wm_search.erl#L58),
which [extracts](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_wm_search.erl#L125) [the](https://github.com/basho/webmachine/blob/1.10.8-basho1/src/wrq.erl#L111) [query](https://github.com/basho/webmachine/blob/1.10.8-basho1/src/wrq.erl#L68-L70), [percent-decoded but not decoded any further (e.g. into Unicode)](https://github.com/basho/mochiweb/blob/v2.9.0p2/src/mochiweb_util.erl#L202-L203),
then [appends some parameters related to distributed search](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_solr.erl#L323),
then [percent-encodes the parameters (without any further encoding, e.g. of Unicode)](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_solr.erl#L330),
and [contacts Solr via a POST request](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_solr.erl#L334),
[setting the `Content-Type` header to `application/x-www-form-urlencoded`](https://github.com/basho/yokozuna/blob/2.1.10/src/yz_solr.erl#L332).

Because this `Content-Type` header specifies no charset, Solr interprets the POST body as UTF-8.
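
The consequence can be sketched in Python as a rough model of the flow described above; this is not yokozuna's Erlang code. The function names `relay` and `decode_as_utf8` are hypothetical. `relay` percent-decodes to raw bytes and percent-encodes them again without interpreting them, and `decode_as_utf8` stands in for a receiver that assumes UTF-8. How Solr actually reacts to invalid bytes is not modelled; a replacement character is used here only to make the mismatch visible.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def relay(percent_encoded_value: str) -> str:
    """Percent-decode to raw bytes, then percent-encode the same bytes again.

    The bytes are never interpreted as text, so whatever character
    encoding the client used passes through unchanged.
    """
    raw = unquote_to_bytes(percent_encoded_value)
    return quote_from_bytes(raw, safe="")


def decode_as_utf8(percent_encoded_value: str) -> str:
    """Stand-in for a receiver that assumes the body is UTF-8."""
    raw = unquote_to_bytes(percent_encoded_value)
    # Invalid byte sequences become U+FFFD so the mismatch is visible.
    return raw.decode("utf-8", errors="replace")


# A client that encoded "café" as UTF-8 before percent-encoding.
utf8_result = decode_as_utf8(relay("caf%C3%A9"))

# A client that encoded "café" as ISO-8859-1 before percent-encoding.
latin1_result = decode_as_utf8(relay("caf%E9"))

print(repr(utf8_result))
print(repr(latin1_result))
```

Under these assumptions only the UTF-8 input survives the round trip intact, which is why the documentation change states the requirement explicitly.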
@@ -25,6 +25,8 @@ GET /search/query/<index_name>

## Optional Query Parameters

Query parameters must be UTF-8 encoded.
Contributor Author

I did not mention percent-encoding, as that is part of HTTP. To be fully explicit and to avoid any misunderstanding, this should instead read, e.g., "Query parameters must be UTF-8 encoded before being percent-encoded (as required by HTTP)."
