Very slow file:write to stdout with latin1 encoding on OTP 26.2.3 #8305

Open
VLanvin opened this issue Mar 25, 2024 · 2 comments
Labels: bug (Issue is reported as a bug), team:VM (Assigned to OTP team VM)

Comments

VLanvin commented Mar 25, 2024

Describe the bug
file:write/2 to erlang:group_leader() with latin1 encoding is unexpectedly slow on OTP 26.2.3. It is fine on OTP 26.0.2.

To Reproduce
Run the following script on OTP 26.2.3 with the input file (3MB) provided in this archive (redirect output to /dev/null).

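#!/usr/bin/env escript
%% Write the decoded contents of "input" to stdout with latin1 encoding.
main(_) ->
  {ok, Bin} = file:read_file("input"),
  Data = binary_to_term(Bin),
  io:setopts(erlang:group_leader(), [binary, {encoding, latin1}]),
  file:write(erlang:group_leader(), Data),
  ok.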

time escript test.erl >> /dev/null takes ~1.6s. Commenting out the file:write or removing latin1 encoding reduces runtime to ~0.3s.
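To separate the cost of the write from VM startup, the call can also be timed in isolation with timer:tc/1. The following is only a sketch: the module name write_timing, the functions time_write/1 and run/0, and the synthetic 3 MB payload are assumptions for illustration, and synthetic data may not reproduce the slowdown seen with the real input file.

```erlang
-module(write_timing).
-export([time_write/1, run/0]).

%% Returns the wall-clock time, in microseconds, of one file:write/2 call
%% to the group leader after switching it to latin1 binary mode.
%% Note: the io:setopts/2 change is left in place afterwards.
time_write(Data) ->
    GL = erlang:group_leader(),
    ok = io:setopts(GL, [binary, {encoding, latin1}]),
    {Micros, ok} = timer:tc(fun() -> file:write(GL, Data) end),
    Micros.

run() ->
    %% Assumption: 3 MB of synthetic latin1 bytes stands in for the "input" file.
    Data = binary:copy(<<"abc", 233, "\n">>, 600000),
    Micros = time_write(Data),
    %% Report on stderr so the measurement is not mixed into the timed output.
    io:format(standard_error, "file:write/2 took ~p ms~n", [Micros div 1000]).
```

Calling write_timing:run() with stdout redirected to /dev/null on both an affected and an unaffected OTP version should show whether the difference sits in the write itself.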

Expected behavior
file:write should be near-instantaneous there.

Affected versions
OTP 26.2.3
OTP 26.1.2
Doesn't happen on OTP 26.0.2

Additional context
This is a minimal repro of what happens in ELP's parse server.
The input file provided in the archive was obtained by running the parse server on OTP's unicode_util and dumping the resulting term.

VLanvin added the bug label on Mar 25, 2024.
garazdawi (Contributor) commented

I haven't checked, but I assume this is because, with the new stdio implementation in OTP 26, we keep all internal data as unicode. So the data is first converted from latin1 to unicode and then converted back again before being output to stdout. Prior to OTP 26, the stdio used when redirecting to a non-terminal was not unicode aware, so it would just shuffle the bytes.

If my hypothesis is correct, the solution would be to make group aware that user_drv is currently in latin1 mode and then skip the conversion. A PR would be very welcome!
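The round trip described in this hypothesis can be illustrated with unicode:characters_to_binary/3. This is a minimal sketch of the kind of redundant work involved, not the actual code path in group or user_drv; the module name latin1_roundtrip and its functions are hypothetical.

```erlang
-module(latin1_roundtrip).
-export([roundtrip/1, demo/0]).

%% Hypothetical illustration only: converts latin1 bytes to UTF-8 and back,
%% which yields the original bytes at the cost of two full passes over the data.
roundtrip(Latin1) when is_binary(Latin1) ->
    %% latin1 -> UTF-8 (the "keep all internal data as unicode" step)
    Utf8 = unicode:characters_to_binary(Latin1, latin1, utf8),
    %% UTF-8 -> latin1 (the conversion back before output)
    unicode:characters_to_binary(Utf8, utf8, latin1).

demo() ->
    %% Byte 233 is "é" in latin1; it becomes two bytes in UTF-8.
    Data = <<"caf", 233, "\n">>,
    %% The match succeeds because the round trip is lossless for latin1 input.
    Data = roundtrip(Data),
    ok.
```

Since the output equals the input, skipping both conversions when the output side is already in latin1 mode would not change the bytes written.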

garazdawi self-assigned this on Mar 25, 2024.
michalmuskala (Contributor) commented

Ah, now that we know where to look, we'll work on sending a PR.

IngelaAndin added the team:VM label on Mar 26, 2024.