
[FLINK-34379][table] Fix adding catalogtable logic #24788

Merged: 1 commit merged into apache:master on May 27, 2024

Conversation

jeyhunkarimov (Contributor):
What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)

Brief change log

  • Fix the catalog table adding logic

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

flinkbot (Collaborator) commented May 15, 2024

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

hasAdded = true;
}
boolean alreadyExists = false;
for (ContextResolvedTable table : tables) {
Contributor:
Not sure how big the tables set could get or how often this method will be called. Does it make sense to use e.g. a Map<K,V> to avoid the loop?

Contributor (Author):
Good point. But note that tables is already of type Set.

Contributor:
Sorry, I don't get your point. I meant that looping over the Set might have performance issues.

Contributor:
I think there are many reasons to use a Map instead of a Set:

  1. the logic is a point lookup instead of a loop search, as I mentioned below.
  2. O(1) instead of O(n) for better performance; since the DynamicPartitionPruningUtils class will be used centrally for batch jobs [1], it could become a bottleneck for large projects with many tables.
  3. less code when using e.g. Map.putIfAbsent(K, V)

[1]
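The reviewer's suggestion can be sketched as follows. This is a minimal illustration, not the actual Flink code: Table here is a hypothetical stand-in for Flink's ContextResolvedTable, keyed by a plain identifier string, and TableRegistry and addIfAbsent are invented names for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class TableRegistry {

    // Hypothetical stand-in for Flink's ContextResolvedTable.
    static final class Table {
        final String identifier;
        Table(String identifier) { this.identifier = identifier; }
    }

    private final Map<String, Table> tables = new HashMap<>();

    // putIfAbsent is an O(1) point lookup instead of an O(n) loop over a Set:
    // it returns null only when no table with that identifier was present,
    // in which case the new table has just been added.
    boolean addIfAbsent(Table t) {
        return tables.putIfAbsent(t.identifier, t) == null;
    }

    public static void main(String[] args) {
        TableRegistry r = new TableRegistry();
        System.out.println(r.addIfAbsent(new Table("db.t1"))); // true: newly added
        System.out.println(r.addIfAbsent(new Table("db.t1"))); // false: already present
    }
}
```

Compared with iterating a Set and comparing identifiers by hand, this collapses the existence check and the insertion into one map operation.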

@reswqa (Member) left a comment:

Thanks @jeyhunkarimov, LGTM.

I think we should also wait for @JingGe's +1.

@JingGe (Contributor) left a comment:

LGTM


}
boolean alreadyExists = false;
for (ContextResolvedTable table : tables) {
if (table.getIdentifier().equals(catalogTable.getIdentifier())) {
Contributor:
This is typical hash-map logic.


@JingGe JingGe merged commit 87b7193 into apache:master May 27, 2024