Unicode pipeline and plugin ids #15971
base: main
Conversation
Force-pushed from 96842dd to d6546b7.
Sigh. These tests all work on my local machine. I have not yet been able to isolate why they fail here (since the integration tests themselves clearly hold onto the UTF-8 bytes), but I suspect it has to do with the process being spawned for the integration test interpreting those bytes as something other than UTF-8, and/or the bytes being returned from its HTTP API being interpreted by the integration test's process as something other than UTF-8.
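The suspected failure mode can be reproduced in a few lines of plain Ruby (a sketch, independent of the actual integration-test harness): UTF-8 bytes reinterpreted under another encoding such as ISO-8859-1 come out as classic mojibake.

```ruby
# UTF-8 bytes for "müsic": the "ü" is the two-byte sequence 0xC3 0xBC
utf8 = "müsic"

# If a consuming process treats those raw bytes as ISO-8859-1 (Latin-1),
# each byte becomes its own character: 0xC3 => "Ã", 0xBC => "¼"
misread = utf8.b.force_encoding("ISO-8859-1").encode("UTF-8")

puts misread  # => "mÃ¼sic"
```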
Quality Gate passed. Kudos, no new issues were introduced! 0 new issues.
JRuby's `Ruby#newSymbol(String)` throws an exception when provided a `String` that contains characters outside of lower-ASCII, because JRuby internals expect "the incoming String to be one of our mangled ISO-8859-1 strings", as noted in a comment on jruby/jruby#6217. Instead, we use `Ruby#newString(String)` to create a new `RubyString` (which works properly), and then rely on `RubyString#intern` to get our `RubySymbol`. This fixes a regression introduced in the 8.7 series in which pipeline ids are consistently represented as ruby symbols in the metrics store, and ensures a similar issue does not exist when specifying a plugin id that contains characters above the lower-ASCII plane.
We cannot rely on `RubySymbol#toString` to produce a properly-encoded `String` when the string contains characters above the lower-ASCII plane, because the result is effectively a binary ruby-internal marshal of the bytes that only holds when the symbol contains lower-ASCII. Instead, we can use the internally-memoizing `RubySymbol#name` to get a properly-encoded `RubyString`, and `RubyString#asJavaString()` to get a properly-encoded java-`String`.
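In plain-Ruby terms (CRuby shown here; the patch targets the equivalent JRuby internals), the invariant being restored is that a symbol created by interning a properly-encoded string round-trips its unicode name intact:

```ruby
# Create the symbol the way the fix does: build a string, then intern it
# (the Ruby-level analogue of Ruby#newString(...) followed by RubyString#intern).
sym = "müsic-pipeline".intern

# Converting back yields a properly UTF-8-encoded string, not raw
# symbol-internal bytes.
name = sym.to_s
puts name            # => "müsic-pipeline"
puts name.encoding   # => UTF-8
```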
Jackson's JSON serializer leaks the JRuby-internal byte structure of Symbols, which only aligns with the byte-structure of the symbol's actual string when that string is wholly-comprised of lower-ASCII characters. By pre-converting Symbols to Strings, we ensure that the result is readable and useful.
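The pre-conversion step can be sketched in plain Ruby (the actual patch does this on the Java side before handing structures to Jackson, so take this purely as an illustration; the hash contents are made up):

```ruby
require 'json'

# Metrics-store-style hash keyed by symbols, including a non-ASCII pipeline id
metrics = { :"müsic-pipeline" => { "events" => 0 } }

# Convert top-level Symbol keys to Strings before serialization, so no
# symbol-internal byte representation can leak into the JSON output.
stringified = metrics.transform_keys(&:to_s)

puts JSON.generate(stringified)  # => {"müsic-pipeline":{"events":0}}
```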
Force-pushed from 49ca706 to 1a59b4e.
Quality Gate passed.
Release notes

Fixed a regression in which unicode `pipeline.id`'s caused the pipeline to fail to start up.

What does this PR do?
JRuby's `Ruby#newSymbol(String)` throws an exception when provided a `String` that contains characters outside of lower-ASCII, because JRuby internals expect "the incoming String to be one of our mangled ISO-8859-1 strings", as noted in a comment on jruby/jruby#6217. Instead, we use `Ruby#newString(String)` to create a new `RubyString` (which works properly), and then rely on `RubyString#intern` to get our `RubySymbol`.

This fixes a regression introduced in the 8.7 series in which pipeline ids are consistently represented as ruby symbols in the metrics store, and ensures a similar issue does not exist when specifying a plugin id that contains characters above the lower-ASCII plane.
This patch fixes related serialization issues when either plugin ids or pipeline ids contain characters outside the lower-ASCII plane.
Why is it important/What is the impact to the user?
Accented characters like umlauts have historically been supported in `pipeline.id`, and are useful in locales where they are meaningful.

Checklist
- [ ] I have made corresponding changes to the documentation
- [ ] I have made corresponding change to the default configuration files (and/or docker env variables)

How to test this PR locally
Run a pipeline whose `pipeline.id` has unicode characters. Observe that the pipeline starts, and that log messages include the pipeline id correctly.
Observe that the API responses contain the pipeline and plugin names correctly-encoded:
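As a sketch of what "correctly-encoded" means for that check (the response body and pipeline id below are hypothetical; a real check would fetch the body from the locally-running Logstash API):

```ruby
require 'json'

# Hypothetical raw response body, as read off the wire (binary bytes)
body = '{"pipelines":{"müsic-pipeline":{}}}'.b

# A correctly-encoded response is valid UTF-8 and parses with the
# unicode pipeline id intact.
utf8_body = body.force_encoding("UTF-8")
raise "response is not valid UTF-8" unless utf8_body.valid_encoding?

parsed = JSON.parse(utf8_body)
puts parsed.fetch("pipelines").keys.first  # => "müsic-pipeline"
```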
Related issues
Closes: #15968