Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent aliased string prefix matching on Erlang when prefix contains control/unicode characters. #3126

Closed
aznashwan opened this issue May 10, 2024 · 3 comments · Fixed by #3139
Labels
bug Something isn't working good first issue Good for newcomers help wanted Contributions encouraged priority:high

Comments

@aznashwan
Copy link
Contributor

Matching against a String which starts with a backslash using an alias for the first part of the match leads to inconsistent behaviour on Erlang.

Sample code with intentional redundant patterns to showcase the issue:

import gleam/io

pub fn match_leading_slash(str: String) -> String {
  case str {
    "/" as start <> rest -> {
      "Matched starting '/' with alias"
    }
    "/" <> rest -> {
      "Matched starting '/' without alias"
    }
    "\\" as start <> rest -> {
      "Matched starting '\\' with alias"
    }
    "\\" <> rest -> {
      "Matched starting '\\' without alias"
    }
    _ -> "Did not match anything"
  }
}

pub fn main() {
  let _ = io.debug(match_leading_slash("/usr/bin"))
  let _ = io.debug(match_leading_slash("\\\\.\\pipe\\test"))
}

Output:

$ gleam run --target javascript
   Compiled in 0.01s
    Running backslash_alias_test.main
"Matched starting '/' with alias"
"Matched starting '\\' with alias"

$ gleam run --target erlang
   Compiled in 0.01s
    Running backslash_alias_test.main
"Matched starting '/' with alias"
# NOTE: this should have also matched the first clause with the alias like JS.
"Matched starting '\\' without alias"

Tested on macOS with following Homebrew-installed tooling:

$ brew info gleam
==> gleam: stable 1.1.0 (bottled), HEAD

$ brew info erlang
==> erlang: stable 26.2.5 (bottled), HEAD

$ brew info rebar3
==> rebar3: stable 3.23.0 (bottled)

$ brew info node
==> node: stable 22.1.0 (bottled), HEAD

Here's the generated Erlang FWIW:

%%% $ cat ./build/dev/erlang/backslash_alias_test/_gleam_artefacts/backslash_alias_test.erl

-module(backslash_alias_test).
-compile([no_auto_import, nowarn_unused_vars, nowarn_unused_function, nowarn_nomatch]).

-export([match_leading_slash/1, main/0]).

-spec match_leading_slash(binary()) -> binary().
match_leading_slash(Str) ->
    case Str of
        <<Start:1/binary, Rest/binary>> when Start =:= <<"/">> ->
            <<"Matched starting '/' with alias"/utf8>>;

        <<"/"/utf8, Rest@1/binary>> ->
            <<"Matched starting '/' without alias"/utf8>>;

        <<Start@1:2/binary, Rest@2/binary>> when Start@1 =:= <<"\\">> ->
            <<"Matched starting '\\' with alias"/utf8>>;

        <<"\\"/utf8, Rest@3/binary>> ->
            <<"Matched starting '\\' without alias"/utf8>>;

        _ ->
            <<"Did not match anything"/utf8>>
    end.

-spec main() -> binary().
main() ->
    _ = gleam@io:debug(match_leading_slash(<<"/usr/bin"/utf8>>)),
    _ = gleam@io:debug(match_leading_slash(<<"\\\\.\\pipe\\test"/utf8>>)).

I unfortunately don't know Erlang that well, but I believe the issue is that Start@1:2/binary should be Start@1:1/binary since we're only trying to match one backslash, which is one single byte and not two.

I would have expected Gleam to internally load \\ from strings as a single byte representing \ so I am not sure why this would return 2 instead of 1.

@aznashwan aznashwan added the bug Something isn't working label May 10, 2024
@aznashwan
Copy link
Contributor Author

Also worth noting the issue also happens when there are any number of backlashes in the staring aliased prefix.

I've encountered this issue just now when testing this code which aims to match prefixes with multiple backslashes.

@aznashwan aznashwan changed the title Inconsistent string pattern matching with prefix alias on Erlang when string starts with \\. Inconsistent string pattern matching with prefix alias on Erlang when prefix contains \\. May 10, 2024
@lpil
Copy link
Member

lpil commented May 13, 2024

I believe the issue is that Start@1:2/binary should be Start@1:1/binary since we're only trying to match one backslash, which is one single byte and not two.

That sounds correct to me!

@lpil lpil added help wanted Contributions encouraged good first issue Good for newcomers priority:medium priority:high and removed priority:medium labels May 13, 2024
@aznashwan
Copy link
Contributor Author

Upon some further investigation, it looks like the String prefix does indeed contain the text representation of escape sequences (so actually \t instead of the byte 0x09)

It also contains the text representation of unicode codepoints, so this prefix issue will happen with any string escape sequence as in this revised example:

import gleam/io

pub fn match_some_prefixes(str: String) -> String {
  case str {
    "abc" as start <> rest -> {
      "Matched starting 'abc' with alias"
    }
    "abc" <> rest -> {
      "Matched starting 'abc' without alias"
    }
    "\n\r\t" as start <> rest -> {
      "Matched starting '\n\r\t' with alias"
    }
    "\n\r\t" <> rest -> {
      "Matched starting '\n\r\t' without alias"
    }
    "\u{1F600}" as start <> rest -> {
      "Matched starting '\u{1F600}' with alias"
    }
    "\u{1F600}" <> rest -> {
      "Matched starting '\u{1F600}' without alias"
    }
    _ -> "Did not match anything"
  }
}

pub fn main() {
  let _ = io.debug(match_some_prefixes("abc"))
  let _ = io.debug(match_some_prefixes("\n\r\t"))
  let _ = io.debug(match_some_prefixes("\u{1F600}"))
}
$ gleam run --target javascript
   Compiled in 0.01s
    Running backslash_alias_test.main
"Matched starting 'abc' with alias"
"Matched starting '\n\r\t' with alias"
"Matched starting '😀' with alias"

$ gleam run --target erlang
   Compiled in 0.01s
    Running backslash_alias_test.main
"Matched starting 'abc' with alias"
"Matched starting '\n\r\t' without alias"
"Did not match anything"

Note in the Erlang that for the unicode codepoint, the text representation is still using \u{...} so isn't properly translated into Erlang's \x{...} as the actual return strings:

%%% cat build/dev/erlang/backslash_alias_test/_gleam_artefacts/backslash_alias_test.erl

-module(backslash_alias_test).
-compile([no_auto_import, nowarn_unused_vars, nowarn_unused_function, nowarn_nomatch]).

-export([match_some_prefixes/1, main/0]).

-spec match_some_prefixes(binary()) -> binary().
match_some_prefixes(Str) ->
    case Str of
        <<Start:3/binary, Rest/binary>> when Start =:= <<"abc">> ->
            <<"Matched starting 'abc' with alias"/utf8>>;

        <<"abc"/utf8, Rest@1/binary>> ->
            <<"Matched starting 'abc' without alias"/utf8>>;

        % Note prefix length of 6 instead of 3:
        <<Start@1:6/binary, Rest@2/binary>> when Start@1 =:= <<"\n\r\t">> ->
            <<"Matched starting '\n\r\t' with alias"/utf8>>;

        <<"\n\r\t"/utf8, Rest@3/binary>> ->
            <<"Matched starting '\n\r\t' without alias"/utf8>>;

        % Note prefix length of 9 instead of the 4 this should have as UTF-8:
        <<Start@2:9/binary, Rest@4/binary>> when Start@2 =:= <<"\u{1F600}">> ->
            <<"Matched starting '\x{1F600}' with alias"/utf8>>;

        <<"\u{1F600}"/utf8, Rest@5/binary>> ->
            <<"Matched starting '\x{1F600}' without alias"/utf8>>;

        _ ->
            <<"Did not match anything"/utf8>>
    end.

-spec main() -> binary().
main() ->
    _ = gleam@io:debug(match_some_prefixes(<<"abc"/utf8>>)),
    _ = gleam@io:debug(match_some_prefixes(<<"\n\r\t"/utf8>>)),
    _ = gleam@io:debug(match_some_prefixes(<<"\x{1F600}"/utf8>>)).

I'll update the description of the issue and try to propose a fix for both aspects of the issue soon.

@aznashwan aznashwan changed the title Inconsistent string pattern matching with prefix alias on Erlang when prefix contains \\. Inconsistent aliased string prefix matching on Erlang when prefix contains control/unicode characters. May 13, 2024
aznashwan added a commit to aznashwan/gleam that referenced this issue May 15, 2024
Fixes gleam-lang#3126.

Fixes String prefix matching on Erlang when the prefix
contains escape characters or unicode codepoints by ensuring
the correct prefix byte length is matched against.

Signed-off-by: Nashwan Azhari <aznashwan@icloud.com>
aznashwan added a commit to aznashwan/gleam that referenced this issue May 15, 2024
Fixes gleam-lang#3126.

Fixes String prefix matching on Erlang when the prefix
contains escape characters or unicode codepoints by ensuring
the correct prefix byte length is matched against.

Signed-off-by: Nashwan Azhari <aznashwan@icloud.com>
aznashwan added a commit to aznashwan/gleam that referenced this issue May 15, 2024
Fixes gleam-lang#3126.

Fixes String prefix matching on Erlang when the prefix
contains escape characters or unicode codepoints by ensuring
the correct prefix byte length is matched against.

Signed-off-by: Nashwan Azhari <aznashwan@icloud.com>
aznashwan added a commit to aznashwan/gleam that referenced this issue May 15, 2024
Fixes gleam-lang#3126.

Fixes String prefix matching on Erlang when the prefix
contains escape characters or unicode codepoints by ensuring
the correct prefix byte length is matched against.

Signed-off-by: Nashwan Azhari <aznashwan@icloud.com>
lpil pushed a commit that referenced this issue May 20, 2024
Fixes #3126.

Fixes String prefix matching on Erlang when the prefix
contains escape characters or unicode codepoints by ensuring
the correct prefix byte length is matched against.

Signed-off-by: Nashwan Azhari <aznashwan@icloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers help wanted Contributions encouraged priority:high
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants