Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occupancy improvement for Hash table build #15700

Draft
wants to merge 15 commits into
base: branch-24.06
Choose a base branch
from

Conversation

tgujar
Copy link
Contributor

@tgujar tgujar commented May 8, 2024

Description

Prototype implementation for: #15502

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented May 8, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label May 8, 2024
@tgujar
Copy link
Contributor Author

tgujar commented May 8, 2024

I think the approach of specializing the type dispatcher is very cumbersome and will lead to a lot of code replication. Currently, I have the conditional dispatch working for device_row_hasher but I am unsure if there is a better way to implement this. We could introduce a macro here to generate the code, what do you think?

@PointKernel PointKernel added non-breaking Non-breaking change 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function Performance Performance related issue labels May 8, 2024
@PointKernel
Copy link
Member

/ok to test

@PointKernel
Copy link
Member

/ok to test

@PointKernel
Copy link
Member

PointKernel commented May 14, 2024

@tgujar I've updated the docs to unblock CI. Have you noticed any performance regressions for other use cases? It seems that it improves the performance for mixed join but the performance drops significantly in other cases using row hasher.

Comment on lines +48 to +50
id_to_type<type_id::DECIMAL128>,
id_to_type<type_id::DECIMAL64>,
id_to_type<type_id::DECIMAL32>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think decimal types are complex type. They are just a wrapper around some integer type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equality operator for Decimal will perform scaling which uses exponentiation.

CUDF_HOST_DEVICE inline bool operator==(fixed_point<Rep1, Rad1> const& lhs,

I see a reduction in register usage if I comment out decimal types in #15502. I think we can still decide on the types excluded in the branches later on

@PointKernel
Copy link
Member

/ok to test

@PointKernel
Copy link
Member

@tgujar Could you take a look at the failing tests?

@PointKernel
Copy link
Member

/ok to test

@PointKernel
Copy link
Member

/ok to test

* @throw cudf::logic_error if the input tables were preprocessed to transform any nested children
* columns into integer columns but `PhysicalElementComparator` is not
* @throw cudf::logic_error if the input tables were preprocessed to transform any nested
* children columns into integer columns but `PhysicalElementComparator` is not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that a significant number of changes to this file are due to reformatting comments.
Would it be possible to undo those changes? This particular change is certainly not desirable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will fix. This was caused because of the clang-format extension on vscode

@davidwendt
Copy link
Contributor

This PR needs to be rebased on branch-24.08.

@tgujar
Copy link
Contributor Author

tgujar commented May 30, 2024

Specializing both the comparator and the hasher drops the register usage to 54 instead of the expected 46 for the mixed semi join case. Investigating why the register pressure is different from commenting out the code paths.
The current plan is to avoid using a macro(as mentioned here) and instead do dynamic dispatch on CPU side using std::variant and std::visit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet

4 participants