[FEA] Explicitly guarantee row group ordering in the parquet reader. #15697
Labels: cuIO, feature request, improvement, libcudf
From @devavret: the question came up as to whether we guarantee the relative ordering of row groups across multiple input files in the parquet reader. That is, if you have two files `[f1, f2]` and the row groups within the files (in one column) are specified as `[[r0, r3], [r0, r1]]`, do we guarantee the output ordering would be `[f1r0, f1r3, f2r0, f2r1]`?
The code does in fact do this, both when row groups are explicitly specified and when they are left unspecified (empty user input, meaning all row groups are read), but we don't make any guarantees about it. Making the guarantee explicit seems like a safe and easy thing to add.
cudf/cpp/src/io/parquet/reader_impl_helpers.cpp
Line 663 in 5d244df
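To make the proposed guarantee concrete, here is a minimal standalone sketch of the ordering in question. It is not the libcudf implementation: `expected_row_group_order` is a hypothetical helper, and the `f<N>r<M>` labels just mirror the notation used above. It assumes files are visited in the order given and row groups in the order listed for each file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of libcudf): flattens per-file row group
// selections into the output order the issue proposes to guarantee.
// Outer index = position of the file in the input list (labelled from f1),
// inner values = row group indices as listed by the user for that file.
std::vector<std::string> expected_row_group_order(
  std::vector<std::vector<int>> const& row_groups_per_file)
{
  std::vector<std::string> order;
  // Files are visited in input order...
  for (std::size_t file_idx = 0; file_idx < row_groups_per_file.size(); ++file_idx) {
    // ...and within each file, row groups keep the order they were listed in.
    for (int rg : row_groups_per_file[file_idx]) {
      order.push_back("f" + std::to_string(file_idx + 1) + "r" + std::to_string(rg));
    }
  }
  return order;
}
```

Calling it with `{{0, 3}, {0, 1}}` yields `f1r0, f1r3, f2r0, f2r1`, which is the ordering asked about above.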