Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove invalid UTF8 check from Repr::from_utf8_unchecked #379

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

overlookmotel
Copy link
Contributor

@overlookmotel overlookmotel commented May 7, 2024

This PR removes the following code from Repr::from_utf8_unchecked:

// There's an edge case where the final byte of this buffer == `HEAP_MASK`, which is
// invalid UTF-8, but would result in us creating an inline variant, that identifies as
// a heap variant. If a user ever tried to reference the data at all, we'd incorrectly
// try and read data from an invalid memory address, causing undefined behavior.
if bytes_len == MAX_SIZE {
let last_byte = bytes[bytes_len - 1];
// If we hit the edge case, reserve additional space to make the repr becomes heap
// allocated, which prevents us from writing this last byte inline
if last_byte >= 0b11000000 {
repr.reserve(MAX_SIZE + 1)?;
}
}

In my opinion, this code should be removed for 2 reasons:

  1. It is part of the safety contract for from_utf8_unchecked that buf is the bytes of a valid UTF8 string. This check penalizes users who follow the safety contract (which everyone should), in order to support users who don't (and therefore have opted in to UB).
  2. This check alone is not enough to avoid UB anyway. If buf ends with the first byte of a multi-byte Unicode character, for example calling .as_str().chars().collect::<Vec<char>>() on the resulting CompactString will cause an out of bounds read.

@overlookmotel
Copy link
Contributor Author

overlookmotel commented May 7, 2024

CI fails are I believe caused by tests/fuzz which are breaking the safety constraint of utf8_unchecked. If intention is to merge this, those tests should be removed/adapted.

I've fixed the proptest, but I don't know how to alter the fuzzer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant