
[FLINK-34379][table] Fix adding catalogtable logic #24788

Merged: 1 commit merged into apache:master on May 27, 2024

Conversation

jeyhunkarimov (Contributor):
What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)

Brief change log

  • Fix the catalog table adding logic

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

flinkbot (Collaborator) commented May 15, 2024

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

hasAdded = true;
}
boolean alreadyExists = false;
for (ContextResolvedTable table : tables) {
Contributor:
Not sure how big the tables set could get or how often this method will be called. Does it make sense to use e.g. a Map<K,V> to avoid the loop?

Contributor (Author):
Good point. But note that tables is already of type Set.

Contributor:
Sorry, I don't get your point. I meant that looping over the Set might have performance issues.

Contributor:
I think there are many reasons to use a Map instead of a Set:

  1. the logic is a point lookup instead of a loop search, as I mentioned below.
  2. O(1) instead of O(n) for better performance; since the DynamicPartitionPruningUtils class will be used centrally for batch jobs [1], it could become a bottleneck for large projects with many tables.
  3. less code when using e.g. Map.putIfAbsent(K, V)

[1]
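The reviewer's suggestion can be sketched as follows. This is a minimal illustration, not the actual Flink code: Table here is a hypothetical stand-in for Flink's ContextResolvedTable, keyed by a plain identifier string, and TableRegistry and addIfAbsent are invented names for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class TableRegistry {

    // Hypothetical stand-in for Flink's ContextResolvedTable.
    static final class Table {
        final String identifier;
        Table(String identifier) { this.identifier = identifier; }
    }

    private final Map<String, Table> tables = new HashMap<>();

    // putIfAbsent is an O(1) point lookup instead of an O(n) loop over a Set:
    // it returns null only when no table with that identifier was present,
    // in which case the new table has just been added.
    boolean addIfAbsent(Table t) {
        return tables.putIfAbsent(t.identifier, t) == null;
    }

    public static void main(String[] args) {
        TableRegistry r = new TableRegistry();
        System.out.println(r.addIfAbsent(new Table("db.t1"))); // true: newly added
        System.out.println(r.addIfAbsent(new Table("db.t1"))); // false: already present
    }
}
```

Compared with iterating a Set and comparing identifiers by hand, this collapses the existence check and the insertion into one map operation.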

@reswqa (Member) left a comment:

Thanks @jeyhunkarimov, LGTM.

I think we should also wait for @JingGe's +1.

@JingGe (Contributor) left a comment:

LGTM


}
boolean alreadyExists = false;
for (ContextResolvedTable table : tables) {
if (table.getIdentifier().equals(catalogTable.getIdentifier())) {
Contributor:
This is typical hash-map logic.


@JingGe JingGe merged commit 87b7193 into apache:master May 27, 2024