-
-
Notifications
You must be signed in to change notification settings - Fork 658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent aliased string prefix matching on Erlang when prefix contains control/unicode characters. #3126
Comments
Also worth noting the issue also happens when there are any number of backlashes in the staring aliased prefix. I've encountered this issue just now when testing this code which aims to match prefixes with multiple backslashes. |
\\
.\\
.
That sounds correct to me! |
Upon some further investigation, it looks like the String prefix does indeed contain the text representation of escape sequences (so actually It also contains the text representation of unicode codepoints, so this prefix issue will happen with any string escape sequence as in this revised example: import gleam/io
pub fn match_some_prefixes(str: String) -> String {
case str {
"abc" as start <> rest -> {
"Matched starting 'abc' with alias"
}
"abc" <> rest -> {
"Matched starting 'abc' without alias"
}
"\n\r\t" as start <> rest -> {
"Matched starting '\n\r\t' with alias"
}
"\n\r\t" <> rest -> {
"Matched starting '\n\r\t' without alias"
}
"\u{1F600}" as start <> rest -> {
"Matched starting '\u{1F600}' with alias"
}
"\u{1F600}" <> rest -> {
"Matched starting '\u{1F600}' without alias"
}
_ -> "Did not match anything"
}
}
pub fn main() {
let _ = io.debug(match_some_prefixes("abc"))
let _ = io.debug(match_some_prefixes("\n\r\t"))
let _ = io.debug(match_some_prefixes("\u{1F600}"))
} $ gleam run --target javascript
Compiled in 0.01s
Running backslash_alias_test.main
"Matched starting 'abc' with alias"
"Matched starting '\n\r\t' with alias"
"Matched starting '😀' with alias"
$ gleam run --target erlang
Compiled in 0.01s
Running backslash_alias_test.main
"Matched starting 'abc' with alias"
"Matched starting '\n\r\t' without alias"
"Did not match anything" Note in the Erlang that for the unicode codepoint, the text representation is still using %%% cat build/dev/erlang/backslash_alias_test/_gleam_artefacts/backslash_alias_test.erl
-module(backslash_alias_test).
-compile([no_auto_import, nowarn_unused_vars, nowarn_unused_function, nowarn_nomatch]).
-export([match_some_prefixes/1, main/0]).
-spec match_some_prefixes(binary()) -> binary().
match_some_prefixes(Str) ->
case Str of
<<Start:3/binary, Rest/binary>> when Start =:= <<"abc">> ->
<<"Matched starting 'abc' with alias"/utf8>>;
<<"abc"/utf8, Rest@1/binary>> ->
<<"Matched starting 'abc' without alias"/utf8>>;
% Note prefix length of 6 instead of 3:
<<Start@1:6/binary, Rest@2/binary>> when Start@1 =:= <<"\n\r\t">> ->
<<"Matched starting '\n\r\t' with alias"/utf8>>;
<<"\n\r\t"/utf8, Rest@3/binary>> ->
<<"Matched starting '\n\r\t' without alias"/utf8>>;
% Note prefix length of 9 instead of the 4 this should have as UTF-8:
<<Start@2:9/binary, Rest@4/binary>> when Start@2 =:= <<"\u{1F600}">> ->
<<"Matched starting '\x{1F600}' with alias"/utf8>>;
<<"\u{1F600}"/utf8, Rest@5/binary>> ->
<<"Matched starting '\x{1F600}' without alias"/utf8>>;
_ ->
<<"Did not match anything"/utf8>>
end.
-spec main() -> binary().
main() ->
_ = gleam@io:debug(match_some_prefixes(<<"abc"/utf8>>)),
_ = gleam@io:debug(match_some_prefixes(<<"\n\r\t"/utf8>>)),
_ = gleam@io:debug(match_some_prefixes(<<"\x{1F600}"/utf8>>)). I'll update the description of the issue and try to propose a fix for both aspects of the issue soon. |
\\
.Fixes gleam-lang#3126. Fixes String prefix matching on Erlang when the prefix contains escape characters or unicode codepoints by ensuring the correct prefix byte length is matched against. Signed-off-by: Nashwan Azhari <aznashwan@icloud.com>
Fixes gleam-lang#3126. Fixes String prefix matching on Erlang when the prefix contains escape characters or unicode codepoints by ensuring the correct prefix byte length is matched against. Signed-off-by: Nashwan Azhari <aznashwan@icloud.com>
Fixes gleam-lang#3126. Fixes String prefix matching on Erlang when the prefix contains escape characters or unicode codepoints by ensuring the correct prefix byte length is matched against. Signed-off-by: Nashwan Azhari <aznashwan@icloud.com>
Fixes gleam-lang#3126. Fixes String prefix matching on Erlang when the prefix contains escape characters or unicode codepoints by ensuring the correct prefix byte length is matched against. Signed-off-by: Nashwan Azhari <aznashwan@icloud.com>
Fixes #3126. Fixes String prefix matching on Erlang when the prefix contains escape characters or unicode codepoints by ensuring the correct prefix byte length is matched against. Signed-off-by: Nashwan Azhari <aznashwan@icloud.com>
Matching against a String which starts with a backslash using an alias for the first part of the match leads to inconsistent behaviour on Erlang.
Sample code with intentional redundant patterns to showcase the issue:
Output:
Tested on macOS with following Homebrew-installed tooling:
Here's the generated Erlang FWIW:
I unfortunately don't know Erlang that well, but I believe the issue is that
Start@1:2/binary
should beStart@1:1/binary
since we're only trying to match one backslash, which is one single byte and not two.I would have expected Gleam to internally load
\\
from strings as a single byte representing\
so I am not sure why this would return2
instead of1
.The text was updated successfully, but these errors were encountered: