Integrate utf8 hPutStr to standard hPutStr #589

Open
wants to merge 2 commits into
base: master

BebeSparkelSparkel
Contributor

Data.Text.IO.Utf8.hPutStr was implemented in #503 and performs much better than Data.Text.IO.hPutStr when the encoding is "UTF-8" and the newline mode is LF, so this PR makes the standard hPutStr use it in that case.

Question: Is there a faster way to check for the encoding without checking for string equality?

Contributor

Lysxia left a comment

Inspecting the textEncodingName of TextEncoding like you did seems like the right way.
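
For readers outside the PR, here is a minimal sketch of the same name-based check using only the public System.IO API instead of the internal haCodec field. The helper name handleIsUtf8ByName is hypothetical and is not part of this PR.

```haskell
import GHC.IO.Encoding (textEncodingName)
import System.IO (Handle, hGetEncoding, hSetEncoding, latin1, stdout, utf8)

-- | Name-based check through the public API (hypothetical helper).
-- 'hGetEncoding' returns Nothing for a handle in binary mode, which is
-- treated here as "not UTF-8".
handleIsUtf8ByName :: Handle -> IO Bool
handleIsUtf8ByName h =
  maybe False ((== "UTF-8") . textEncodingName) <$> hGetEncoding h

main :: IO ()
main = do
  hSetEncoding stdout utf8
  handleIsUtf8ByName stdout >>= print   -- encoding name is "UTF-8"
  hSetEncoding stdout latin1
  handleIsUtf8ByName stdout >>= print   -- encoding name is not "UTF-8"
```

Inside hPutStr the handle's codec is already at hand, so the PR reads haCodec directly rather than calling hGetEncoding.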

src/Data/Text/IO.hs
wantWritableHandle "hPutStr" h $ \h_ -> do
bmode <- getSpareBuffer h_
return (bmode, haOutputNL h_)
let isUTF8 = maybe False ((== "UTF-8") . textEncodingName) $ haCodec h_
Contributor

Let's use textEncodingName GHC.IO.Encoding.utf8 instead of hard-coded "UTF-8".

Do we have any idea of performance impact? Comparing strings is not cheap. Maybe we can use GHC.Exts.unsafePtrEquality# GHC.IO.Encoding.utf8 (haCodec h_). After all, it's just an optimization.
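
A rough sketch of both suggestions, under some assumptions: the helper names are hypothetical, the codec is taken as a Maybe TextEncoding (as in the isUTF8 line above), and the pointer comparison uses the reallyUnsafePtrEquality# primop from GHC.Exts, which works on lifted values such as TextEncoding. This is not the code that ended up in the PR.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}

import GHC.Exts (isTrue#, reallyUnsafePtrEquality#)
import GHC.IO.Encoding (TextEncoding, latin1, textEncodingName, utf8)

-- | Name comparison without a hard-coded literal (hypothetical helper).
hasUtf8Name :: Maybe TextEncoding -> Bool
hasUtf8Name = maybe False ((== textEncodingName utf8) . textEncodingName)

-- | Pointer comparison (hypothetical helper). Both arguments are forced
-- first so that evaluated values are compared, not thunks. A False result
-- does not prove the values differ; it only means "not known to be the
-- same heap object".
sameObject :: a -> a -> Bool
sameObject !x !y = isTrue# (reallyUnsafePtrEquality# x y)

-- | True only if the codec is the very 'utf8' value exported by base.
isBuiltinUtf8 :: Maybe TextEncoding -> Bool
isBuiltinUtf8 = maybe False (sameObject utf8)

main :: IO ()
main = do
  print (hasUtf8Name (Just utf8), hasUtf8Name (Just latin1), hasUtf8Name Nothing)
  print (isBuiltinUtf8 (Just latin1), isBuiltinUtf8 Nothing)
```

Because the pointer check can report False for an encoding that is in fact UTF-8 (for example one built separately from the exported utf8 value), it is only suitable for choosing a fast path, where a false negative costs speed and not correctness.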

Contributor Author

Thanks for finding this. The string comparison was hurting performance a lot when I benchmarked it.

Contributor Author

One problem is that it is possible to make your own UTF-8 encoding: https://hackage.haskell.org/package/base-4.20.0.0/docs/GHC-IO-Encoding.html#t:TextEncoding

Contributor Author

But who does that?

Contributor

If you crafted your own Unicode TextEncoding, you would not benefit from the optimized code path, but that's fine.

It's actually safer to check GHC.Exts.unsafePtrEquality# GHC.IO.Encoding.utf8 (haCodec h_) than == "UTF-8", because anyone can set textEncodingName to whatever they like.
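
To illustrate the point about textEncodingName, here is a small sketch in which a Latin-1 codec is relabelled as "UTF-8". The names spoofed, nameSaysUtf8 and isBuiltinUtf8 are hypothetical, and the pointer check again uses reallyUnsafePtrEquality# from GHC.Exts as an assumption about how one might write it.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}

import GHC.Exts (isTrue#, reallyUnsafePtrEquality#)
import GHC.IO.Encoding (TextEncoding (..), latin1, utf8)

-- | A Latin-1 codec that merely claims to be UTF-8 (illustration only).
spoofed :: TextEncoding
spoofed = latin1 { textEncodingName = "UTF-8" }

-- | The name-based check is fooled by the relabelled codec.
nameSaysUtf8 :: TextEncoding -> Bool
nameSaysUtf8 enc = textEncodingName enc == textEncodingName utf8

-- | Force both values, then compare heap addresses.
sameObject :: a -> a -> Bool
sameObject !x !y = isTrue# (reallyUnsafePtrEquality# x y)

-- | The pointer-based check only accepts the 'utf8' value exported by base.
isBuiltinUtf8 :: TextEncoding -> Bool
isBuiltinUtf8 = sameObject utf8

main :: IO ()
main = do
  print (nameSaysUtf8 spoofed)   -- True: the name matches
  print (isBuiltinUtf8 spoofed)  -- False: it is a different object
```

With the name check, a handle using spoofed would be sent down the UTF-8 fast path and write bytes its codec never intended; with the pointer check it falls back to the general path.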

Contributor Author

I have tried implementing this, but I am unfamiliar with pointer equality. If I implemented it correctly, please resolve. Otherwise, I'll try to fix the errors.
