Integrate utf8 hPutStr to standard hPutStr #589

BebeSparkelSparkel · 2024-04-28T17:03:20Z

Data.Text.IO.Utf8.hPutStr was implemented in #503 and has much better performance than Data.Text.IO.hPutStr when the encoding is "UTF-8" and the newline mode is LF, so I integrated it into the standard hPutStr.

Question: Is there a faster way to check for the encoding without checking for string equality?

Lysxia

Inspecting the textEncodingName of TextEncoding like you did seems like the right way.

Bodigrim · 2024-05-30T20:56:21Z

src/Data/Text/IO.hs

       wantWritableHandle "hPutStr" h $ \h_ -> do
                     bmode <- getSpareBuffer h_
-                     return (bmode, haOutputNL h_)
+                     let isUTF8 = maybe False ((== "UTF-8") . textEncodingName) $ haCodec h_


Let's use textEncodingName GHC.IO.Encoding.utf8 instead of hard-coded "UTF-8".
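For illustration, here is a minimal standalone sketch of the name-based check with the literal replaced by textEncodingName utf8. The helper name isUtf8ByName is hypothetical and not taken from the PR; in the PR the Maybe TextEncoding comes from haCodec h_, which is replaced here by plain arguments so the example runs on its own.

```haskell
import GHC.IO.Encoding (TextEncoding, latin1, textEncodingName, utf8)

-- | Hypothetical helper: a name-based UTF-8 check that avoids the
-- hard-coded "UTF-8" literal by asking the stock encoding for its name.
-- This is still a String comparison, so it costs the same as before.
isUtf8ByName :: Maybe TextEncoding -> Bool
isUtf8ByName = maybe False ((== textEncodingName utf8) . textEncodingName)

main :: IO ()
main = do
  print (isUtf8ByName (Just utf8))   -- the stock UTF-8 encoding
  print (isUtf8ByName (Just latin1)) -- a different stock encoding
  print (isUtf8ByName Nothing)       -- no codec, e.g. a binary handle
```

This only removes the duplicated literal; it does not address the cost of the comparison itself.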

Do we have any idea of performance impact? Comparing strings is not cheap. Maybe we can use GHC.Exts.unsafePtrEquality# GHC.IO.Encoding.utf8 (haCodec h_). After all, it's just an optimization.

Thanks for finding this. The string comparison was hurting performance a lot when I benchmarked it.

One problem is that it is possible to make your own UTF-8 encoding: https://hackage.haskell.org/package/base-4.20.0.0/docs/GHC-IO-Encoding.html#t:TextEncoding

But who does that?

If you crafted your own Unicode TextEncoding, you would not benefit from the optimized code path, but that's fine.

It's actually safer to check GHC.Exts.unsafePtrEquality# GHC.IO.Encoding.utf8 (haCodec h_) than == "UTF-8", because anyone can set textEncodingName to whatever they like.
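A rough sketch of the pointer-based idea, under a few assumptions: it uses reallyUnsafePtrEquality# from GHC.Exts (which accepts ordinary lifted values) rather than the primop named above, the helpers samePointer and isUtf8ByPointer are hypothetical, and spoofed is a made-up encoding that is really Latin-1 but carries the name "UTF-8". This is not the code from the PR.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}

import GHC.Exts (isTrue#, reallyUnsafePtrEquality#)
import GHC.IO.Encoding (TextEncoding (..), latin1, utf8)

-- | Hypothetical helper: pointer-identity comparison. Both arguments are
-- forced first so that evaluated values, not thunks, are compared. The
-- result can be a false negative (equal encodings at different addresses),
-- which only means the optimized path is skipped.
samePointer :: TextEncoding -> TextEncoding -> Bool
samePointer !a !b = isTrue# (reallyUnsafePtrEquality# a b)

-- | Hypothetical helper: is this codec the very same object as the stock utf8?
isUtf8ByPointer :: Maybe TextEncoding -> Bool
isUtf8ByPointer = maybe False (samePointer utf8)

-- | Made-up example: a Latin-1 codec relabelled as "UTF-8".
spoofed :: TextEncoding
spoofed = latin1 { textEncodingName = "UTF-8" }

main :: IO ()
main = do
  -- The name check is fooled by the relabelled encoding.
  putStrLn ("name check, spoofed:    " ++ show (textEncodingName spoofed == textEncodingName utf8))
  -- The pointer check is not, because spoofed is a different object.
  putStrLn ("pointer check, spoofed: " ++ show (isUtf8ByPointer (Just spoofed)))
  putStrLn ("pointer check, utf8:    " ++ show (isUtf8ByPointer (Just utf8)))
```

The design trade-off matches the discussion: a hand-made UTF-8 TextEncoding would miss the fast path, while a mislabelled one can no longer reach it by accident.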

I have tried implementing this but I am unfamiliar with pointer equality. If I implemented this correctly, please resolve. Otherwise, I'll try and resolve the errors.

Lysxia reviewed Apr 30, 2024

Lysxia mentioned this pull request May 6, 2024

Added file write benchmarks #585

Merged

Lysxia reviewed May 7, 2024

Lysxia reviewed May 27, 2024

Bodigrim reviewed May 29, 2024

integrate utf8 hPutStr to standard hPutStr

603dffc

BebeSparkelSparkel force-pushed the directly-write-utf8 branch from 47987f8 to 603dffc Compare May 30, 2024 00:13

Bodigrim reviewed May 30, 2024

comparing encoding pointers instead of strings of encoding names

b56f072
