Integrate utf8 hPutStr to standard hPutStr #589
base: master
Inspecting the `textEncodingName` of `TextEncoding` like you did seems like the right way.
47987f8 to 603dffc
src/Data/Text/IO.hs
wantWritableHandle "hPutStr" h $ \h_ -> do
bmode <- getSpareBuffer h_
return (bmode, haOutputNL h_)
let isUTF8 = maybe False ((== "UTF-8") . textEncodingName) $ haCodec h_
Let's use `textEncodingName GHC.IO.Encoding.utf8` instead of hard-coded `"UTF-8"`.
Do we have any idea of performance impact? Comparing strings is not cheap. Maybe we can use `GHC.Exts.unsafePtrEquality# GHC.IO.Encoding.utf8 (haCodec h_)`. After all, it's just an optimization.
Thanks for finding this; the string comparison was hurting performance a lot when I benchmarked it.
One problem is that it is possible to make your own UTF-8 encoding: https://hackage.haskell.org/package/base-4.20.0.0/docs/GHC-IO-Encoding.html#t:TextEncoding
But who does that?
If you crafted your own Unicode `TextEncoding`, you would not benefit from the optimized code path, but that's fine.
It's actually safer to check `GHC.Exts.unsafePtrEquality# GHC.IO.Encoding.utf8 (haCodec h_)` than `== "UTF-8"`, because everyone can set `textEncodingName` to whatever they like.
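To make the trade-off in this thread concrete, here is a minimal sketch contrasting the two checks. It is an illustration, not the code from this PR: the names `isUtf8ByName`, `isUtf8ByPtr`, and `mislabelled` are hypothetical, and it uses `reallyUnsafePtrEquality#` from `GHC.Exts` (which works on ordinary lifted values) rather than the `unsafePtrEquality#` named in the comment. It also unwraps the `Maybe` explicitly, on the assumption that `haCodec` yields a `Maybe TextEncoding`, as the `maybe False ...` in the diff suggests.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (isTrue#, reallyUnsafePtrEquality#)
import GHC.IO.Encoding (TextEncoding (..), latin1, utf8)

-- Name-based check, in the spirit of the diff under review, but comparing
-- against the name of the stock utf8 value instead of a string literal.
isUtf8ByName :: Maybe TextEncoding -> Bool
isUtf8ByName = maybe False ((== textEncodingName utf8) . textEncodingName)

-- Identity-based check: True only if the codec is the very same heap object
-- as GHC.IO.Encoding.utf8. Pointer equality can report False for values that
-- are in fact the same, so it may only be used to enable an optimization.
isUtf8ByPtr :: Maybe TextEncoding -> Bool
isUtf8ByPtr (Just enc) = isTrue# (reallyUnsafePtrEquality# enc utf8)
isUtf8ByPtr Nothing = False

-- Hypothetical troublemaker: the Latin-1 codec relabelled as "UTF-8".
mislabelled :: TextEncoding
mislabelled = case latin1 of
  TextEncoding _ decoder encoder -> TextEncoding "UTF-8" decoder encoder
```

With these definitions, `isUtf8ByName (Just mislabelled)` accepts the relabelled Latin-1 codec, while `isUtf8ByPtr (Just mislabelled)` rejects it. The price of the pointer check is that a hand-built but genuine UTF-8 encoding falls back to the generic path, which is the "that's fine" case described above.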
I have tried implementing this but I am unfamiliar with pointer equality. If I implemented this correctly, please resolve. Otherwise, I'll try and resolve the errors.
`Data.Text.IO.Utf8.hPutStr` was implemented in #503 and has much better performance than `Data.Text.IO.hPutStr` when the encoding is `"UTF-8"` and the newline is `LF`, so I added it.
Question: Is there a faster way to check for the encoding without checking for string equality?
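The condition described here (UTF-8 encoding and `LF` output newlines) can be sketched as a standalone predicate on a handle. This is a rough illustration under stated assumptions, not the PR's implementation: `canUseUtf8FastPath` and `describeStdout` are hypothetical names, the check is done by name for simplicity, and it relies on GHC-internal modules (`GHC.IO.Handle.Internals`, `GHC.IO.Handle.Types`) whose interfaces may change between `base` versions.

```haskell
import GHC.IO.Encoding (textEncodingName, utf8)
import GHC.IO.Handle.Internals (wantWritableHandle)
import GHC.IO.Handle.Types (Handle__ (..), Newline (..))
import System.IO (Handle, stdout)

-- True when the handle encodes as UTF-8 and does no newline translation on
-- output, i.e. when already-UTF-8 bytes could be written through unchanged.
canUseUtf8FastPath :: Handle -> IO Bool
canUseUtf8FastPath h =
  wantWritableHandle "canUseUtf8FastPath" h $ \h_ ->
    pure (isUtf8 (haCodec h_) && haOutputNL h_ == LF)
  where
    -- A handle in binary mode has no codec, so it never takes the fast path.
    isUtf8 = maybe False ((== textEncodingName utf8) . textEncodingName)

-- Usage: report which path a write to stdout would take.
describeStdout :: IO String
describeStdout = do
  fast <- canUseUtf8FastPath stdout
  pure (if fast then "UTF-8 fast path" else "generic path")
```

The result of `describeStdout` depends on the locale and platform, since both determine the default encoding and newline mode of `stdout`.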