-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-33759] [flink-parquet] Add support for nested array with row/map/array type #24795
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The PR looks good to me.
Hi @JingsongLi, please also take a look. I think we can port this to fix apache/paimon#1730
Hey @xccui, thanks for the quick review! Just checking in to keep this PR alive. |
Not really, I found out this issue because we were trying to save Parquet files to S3 and the writer would fail due to empty array being invalid in Parquet. When stepping in the debugger I realized that the methods were empty. After digging a bit i found 2 PRs here that jointly solved the problem, but both inactive. I simply created this PR merging both of them to facilitate the review/merge process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ViktorCosenza for taking care of it. The PR looks overall good. Just left one small question about the background context.
|
||
MapData mapData = row.getMap(ordinal); | ||
@Override | ||
public void write(ArrayData arrayData, int ordinal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Not sure why those write(ArrayData arrayData, int ordinal) methods didn't get implemented. It should be trivial to do it while building those Writers. Do you have any hints about the background info?
Also, as @xccui mentioned, there is a very similar issue/code with methods missing on apache-paimon, so maybe this code was copied around a bit without these methods |
It looks like those methods were skipped on purpose in #17542 and #17542 (comment) |
I see, Ive got the impression that they were forgotten, not purposely left
out because no tests covered writing nested structures ( if there were, the
tests would fail and the methods would have been implemented before)
Did you got the impression it wasn’t added on purpose? I could add more
tests if you think it would help
|
I talked with @JingsongLi offline. It seems that at that time there was no such requirement and therefore skipped those implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@ViktorCosenza would you please squash your commits and rebase it? Once it passed the CI, I will merge the PR. Thanks! |
Thanks for the hint! Would you mind if I picked up some of your code to paimon? |
166c40d
to
2cdec63
Compare
@flinkbot run azure |
Done! But I wasn't able to trigger the CI |
Sure, go ahead! |
@JingGe Are you able to trigger the CI manually? I think I't wasnt triggered after the squash because no changes were detected. |
@flinkbot run azure |
@ViktorCosenza did you rebase? |
Yes, i did an interactive rebase and then force-pushed the squashed commit |
@JingGe CI Passed, I dindn't realize the bot just editted the same comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What is the purpose of the change
This Pull request adds commits from #24029 and #23881 to fully resolve issue https://issues.apache.org/jira/browse/FLINK-33759
Brief change log
Adds support for ParquetWriter nested Array, Map and Row Types. Previously the writer just wrote nothing. Silently failing.
Verifying this change
Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation