Efficient way to read big files? #347

egorsmth · 2024-05-14T15:41:23Z

I need to read files in a paginated way. I tried 2 options:

parquetReader.iterator.slice(offset, offset + limit)
RecordFilter(index => index >= offset && index < offset + limit)

The first option is fast near the beginning of the file and slows down as we move toward the end; overall it is rather slow in my case. The second option reads each "page" in consistent time, but each read is slow compared with the first option's reads near the beginning of the file.
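One possible explanation for the slowdown, sketched with plain Scala collections rather than Parquet4S: if every page is read by slicing a fresh iterator, the iterator has to pull and discard all records before the offset, so the work per page grows with the offset. `CountingSource`, `freshIterator`, and `readPage` are hypothetical names used only for illustration, and whether the original code really creates a fresh iterator per page is an assumption.

```scala
// Hypothetical stand-in for a Parquet reader: every call to `freshIterator`
// starts again at record 0, and `consumed` counts the records actually pulled.
object SlicePagingCost {
  final class CountingSource(total: Int) {
    var consumed: Long = 0L
    def freshIterator: Iterator[Int] =
      Iterator.range(0, total).map { i => consumed += 1; i }
  }

  // Reads one page by slicing a fresh iterator, as in the first option.
  def readPage(source: CountingSource, offset: Int, limit: Int): Vector[Int] =
    source.freshIterator.slice(offset, offset + limit).toVector

  def main(args: Array[String]): Unit = {
    val source = new CountingSource(total = 1000)
    val limit = 100
    for (offset <- 0 until 1000 by limit) {
      val before = source.consumed
      val page = readPage(source, offset, limit)
      // The number of records pulled grows with the offset, not with the page size.
      println(s"offset=$offset size=${page.size} pulled=${source.consumed - before}")
    }
  }
}
```

In this sketch a page near the end of the source costs about as much as reading the whole source, which would match the behaviour described above.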

What is the right way to read big files?

mjakubowski84 · 2024-05-15T18:00:56Z

I am not sure why iterator + slice would get slower over time, especially since I do not know the rest of your code. Maybe you are loading the whole file into memory.

The second option can be quite slow in general because you are opening the file each time.

To avoid memory issues and keep performance high, I recommend using one of the reactive solutions that Parquet4S supports: Akka, Pekko, or FS2.
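The idea behind the streaming recommendation can be sketched without any of those libraries: open the source once, walk it in a single pass, and hand out pages as they are filled. The example below uses a plain Scala `Iterator` with `grouped` as a stand-in; it is not the Parquet4S, Akka, Pekko, or FS2 API, and `forEachPage` is a hypothetical helper.

```scala
object SinglePassPaging {
  // Walks the source once and hands each page of up to `limit` records,
  // together with its zero-based page index, to `handle`.
  def forEachPage[A](records: Iterator[A], limit: Int)(handle: (Int, Seq[A]) => Unit): Unit =
    records.grouped(limit).zipWithIndex.foreach { case (page, index) =>
      handle(index, page)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for the iterator of a single open reader.
    val records = Iterator.range(0, 1000)
    forEachPage(records, limit = 100) { (index, page) =>
      println(s"page $index: first=${page.head} last=${page.last} size=${page.size}")
    }
  }
}
```

Each record is pulled exactly once and only one page is held in memory at a time, which is the property the streaming integrations provide, with backpressure and resource handling on top.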

egorsmth · 2024-05-16T06:42:47Z

Yep, I guess my problem is that the whole file gets loaded each time. I will try FS2, thanks.

egorsmth closed this as completed May 16, 2024
