Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hPutBuilder for Text's Builder type #446

Open
ghost opened this issue Jun 22, 2022 · 11 comments
Open

Add hPutBuilder for Text's Builder type #446

ghost opened this issue Jun 22, 2022 · 11 comments

Comments

@ghost
Copy link

ghost commented Jun 22, 2022

Based on discussion in #442, having a way to write a text builder directly to a handle, like bytestring's hPutBuilder might be a nice performance win in some situations.

@nh2
Copy link
Member

nh2 commented Jul 4, 2023

I would like to have a function with which I can write a Text Builder to a Handle while specifying the encoding, e.g. hPutBuilderUtf8.

For example,

import System.IO
import qualified Data.Text.Lazy.IO as TL
main = withBinaryFile "test.txt" WriteMode $ \h -> TL.hPutStr h "Gebäude"

With text (at least version 1.*) this writes "garbage" into the file in the sense that it writes "modulo-truncated UTF-16", which cannot be read by most other programs.

For small values like the above one could call encodeUtf8 directly, but for large builder values one really wants to use an equivalent of TL.hPutStr which doesn't pick a non-useful encoding on withBinaryFile, and instead allows providing the encoding directly as an argument or in the function name.

@nh2
Copy link
Member

nh2 commented Jul 4, 2023

I suspect that Data.Text.Lazy.Encoding's encodeUtf8Builder :: Lazy.Text -> ByteString.Builder does the job:

Efficient file write without forcing the entire Text into memory.

{-# LANGUAGE OverloadedStrings #-}

import           Data.ByteString.Builder (hPutBuilder)
import qualified Data.Text.Lazy.Builder as TB
import           Data.Text.Lazy.Encoding (encodeUtf8Builder)
import           System.IO

main = do
  withBinaryFile "test.txt" WriteMode $ \h -> do
    hPutBuilder h (encodeUtf8Builder $ TB.toLazyText $ TB.fromText "Gebäude")

@phadej
Copy link
Contributor

phadej commented Jul 4, 2023

TB.toLazyText + encodeUtf8Builder is unnecessary indirection. The text's Builder is filling up buffers, these can be output directly to the Handle without allocating anything else in between (reusing the same buffer!)

@nh2
Copy link
Member

nh2 commented Jul 4, 2023

@phadej Agreed. It would be nice to have a function that goes straight from a text Builder to syscalls.

@BebeSparkelSparkel
Copy link
Contributor

BebeSparkelSparkel commented Apr 25, 2024

@phadej I do not see how this is any different using the lazy text hPutStr since Builder can only produce [StrictText]. It seems like this only cuts out the functions toLazyText and toChunks.

@BebeSparkelSparkel
Copy link
Contributor

It seems that for this to work well a few changes need to be made

  • ensureFree should use a given buffer size similar to toLasyTextWith so that it can match the handles buffer size
  • newBuffer should be able to use new or newPinned because newPinned will allow the handle to use the buffer without copying

@Lysxia
Copy link
Contributor

Lysxia commented May 27, 2024

The Builder type in text is defined as a producer of a list of chunks: its only purpose is to produce a LazyText. The only possible implementation of hPutBuilder is the composition of Lazy.hPutStr and Builder.toLazyText.

A better data structure for outputting to the OS is bytestring's Builder: it doesn't allocate its own buffers, allowing us to reuse a single one to send an output. There are already UTF-8 conversions to bytestring Builder from both StrictText and LazyText.

That's admittedly not very discoverable. I'm not sure how to improve this.

@BebeSparkelSparkel
Copy link
Contributor

@Lysxia You are correct with the current implementation of builder. However, in #586 I have modified Builder to be able to also be able to use pinned buffers. Pinned buffers allow the handle to use the same buffer that the builder uses thus skipping the copy from the builder buffer to the handle char buffer. This is helpful for all encodings that are not UTF-8.

@Lysxia
Copy link
Contributor

Lysxia commented May 29, 2024

I'm not convinced adding hPutBuilder is the right thing to do. If you want to append text values efficiently before printing them (which is where this issue originally comes from), you go through ByteString.Builder with encodeUtf8Builder. Text.Lazy.Builder is the wrong data structure for this (and it's specialized to UTF-8, whereas one could implement other encoders to ByteString.Builder), and since there is already a suitable data structure elsewhere, there's no point changing it.

@BebeSparkelSparkel
Copy link
Contributor

I see what you mean. Thanks

@BebeSparkelSparkel
Copy link
Contributor

Seems like this should be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants