Introduce custom hash table data structures. #3940

Open
wants to merge 1 commit into
base: trunk
chandlerc
Contributor

The hash table design is heavily based on Abseil's "Swiss Tables" design. It uses an array of metadata bytes, one per entry, alongside an array of entries, each of which is a key-value pair. Each metadata byte holds 7 bits of the key's hash (distinct from the bits used to index the table) plus one bit marking a special entry -- either empty or deleted.
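
To make the layout described above concrete, here is a minimal sketch of a table with a metadata byte array next to an entry array. It is an illustration under stated assumptions, not the code in this PR: the names (`ToyTable`, `kEmpty`, `kDeleted`), the specific byte values, the choice of the low 7 hash bits as the tag, and the one-byte-at-a-time linear probe are all hypothetical. Swiss-table style implementations typically scan groups of metadata bytes at once, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed encoding (not taken from the PR): a set high bit marks a special
// slot; otherwise the low 7 bits hold a tag taken from the key's hash.
constexpr std::uint8_t kEmpty = 0x80;    // Never held a key; ends a probe.
constexpr std::uint8_t kDeleted = 0x81;  // Tombstone; probing continues.

class ToyTable {
 public:
  // `capacity` must be a power of two so masking can replace modulo.
  explicit ToyTable(std::size_t capacity)
      : metadata_(capacity, kEmpty), entries_(capacity) {
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
  }

  // Inserts or overwrites `key`. Returns false only if the table is full.
  bool Insert(std::uint64_t hash, int key, int value) {
    const std::size_t none = metadata_.size();
    std::size_t first_free = none;
    std::size_t i = StartIndex(hash);
    for (std::size_t n = 0; n < metadata_.size(); ++n, i = Next(i)) {
      if (metadata_[i] == Tag(hash) && entries_[i].first == key) {
        entries_[i].second = value;
        return true;
      }
      // Remember the first reusable slot, but keep probing past tombstones
      // in case the key lives further along the probe sequence.
      if (IsSpecial(metadata_[i]) && first_free == none) first_free = i;
      if (metadata_[i] == kEmpty) break;
    }
    if (first_free == none) return false;
    metadata_[first_free] = Tag(hash);
    entries_[first_free] = {key, value};
    return true;
  }

  // Returns a pointer to the stored value, or nullptr if `key` is absent.
  const int* Lookup(std::uint64_t hash, int key) const {
    std::size_t i = StartIndex(hash);
    for (std::size_t n = 0; n < metadata_.size(); ++n, i = Next(i)) {
      if (metadata_[i] == kEmpty) return nullptr;
      // The cheap one-byte tag comparison filters out most non-matching
      // slots before the key itself is compared.
      if (metadata_[i] == Tag(hash) && entries_[i].first == key) {
        return &entries_[i].second;
      }
    }
    return nullptr;
  }

  // Marks the slot holding `key` as deleted. Returns false if absent.
  bool Erase(std::uint64_t hash, int key) {
    std::size_t i = StartIndex(hash);
    for (std::size_t n = 0; n < metadata_.size(); ++n, i = Next(i)) {
      if (metadata_[i] == kEmpty) return false;
      if (metadata_[i] == Tag(hash) && entries_[i].first == key) {
        metadata_[i] = kDeleted;
        return true;
      }
    }
    return false;
  }

 private:
  // The tag uses the low 7 bits; the index uses the remaining bits, so the
  // two are distinct parts of the hash.
  static std::uint8_t Tag(std::uint64_t hash) {
    return static_cast<std::uint8_t>(hash & 0x7F);
  }
  static bool IsSpecial(std::uint8_t m) { return (m & 0x80) != 0; }
  std::size_t StartIndex(std::uint64_t hash) const {
    return static_cast<std::size_t>(hash >> 7) & (metadata_.size() - 1);
  }
  std::size_t Next(std::size_t i) const {
    return (i + 1) & (metadata_.size() - 1);
  }

  std::vector<std::uint8_t> metadata_;
  std::vector<std::pair<int, int>> entries_;
};
```

The hash is passed in explicitly here only so that a caller can hand two different keys the same hash and watch them land in adjacent slots; a real table would compute it from the key.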

TODO: document the design and PR more fully

@chandlerc chandlerc requested a review from josh11b May 7, 2024 10:21
Contributor

@josh11b josh11b left a comment

I wanted to give feedback on map.h early, before you update set.h

Contributor Author

@chandlerc chandlerc left a comment

Thanks, definitely useful to get map.h tidied up before I port it to set.h. I think I've responded to all the comments there.

Contributor

@josh11b josh11b left a comment

Sending what comments I have so far.

Contributor Author

@chandlerc chandlerc left a comment

Thanks for all the feedback so far, PTAL.

Contributor

@josh11b josh11b left a comment

Finished a pass through raw_hashtable.h

Contributor Author

@chandlerc chandlerc left a comment

PTAL!

Beyond the inline replies, I was also able to simplify the specialization strategy for the table type and improve some other comments I spotted as I was working.

Also, I think this is far enough along in review that I'm moving it out of draft.

@chandlerc chandlerc marked this pull request as ready for review May 24, 2024 01:42
@chandlerc chandlerc requested a review from josh11b May 24, 2024 01:42
The hash table design is heavily based on Abseil's ["Swiss
Tables"][swiss-tables] design. It uses an array of bytes storing
metadata about each entry and an array of entries where each is a pair
of key and value. The metadata byte consists of 7 bits of the key's hash
(distinct from the bits used to index the table) and one bit
indicating the presence of a special entry -- either empty or deleted.

[swiss-tables]: https://abseil.io/about/design/swisstables

See the comments in `raw_hashtable.h` for a detailed overview of the
design.

Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
@chandlerc
Contributor Author

So, I'm most of the way through adding really nice support for stateful keys, such as indices into vectors. This turned out to be necessary before we can even move the toolchain over to these data structures, since it also lets us use something other than operator== for equality, and we need that for APFloat keys. A few interesting things emerged:

  • I've ended up removing some unnecessary functionality that would also have made the stateful case unimplementable.

  • I've found and cleaned up a number of innocuous inconsistencies in the code.

  • I've been able to implement the test you asked for, where we control the hashing and force every key to collide. I can implement even more of these if desired.

@josh11b, I wanted both to let you know about my progress there and to ask whether you'd like me to fold this into this code review or keep it as a follow-up. I can easily manage either way.
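
For readers following along, the idea of stateful keys with custom equality and controllable hashing can be sketched roughly as below. This is only an illustration under assumptions: `IndexKeyContext`, `CollidingContext`, `HashKey`, `KeyEq`, and `ContainsEquivalent` are hypothetical names made up for this sketch and are not the interface in this PR. The keys are plain indices, and a separate context object carries the state (a pointer to the backing vector) needed to hash and compare them.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical context: keys are indices into `storage`, and both hashing
// and equality look through the index at the stored string rather than
// using operator== on the index itself.
struct IndexKeyContext {
  const std::vector<std::string>* storage;

  std::uint64_t HashKey(std::size_t index) const {
    return std::hash<std::string>{}((*storage)[index]);
  }
  bool KeyEq(std::size_t lhs, std::size_t rhs) const {
    return (*storage)[lhs] == (*storage)[rhs];
  }
};

// Hypothetical test-only context: every key gets the same hash, so any
// container using it is forced down its collision-handling path.
struct CollidingContext {
  const std::vector<std::string>* storage;

  std::uint64_t HashKey(std::size_t /*index*/) const { return 0; }
  bool KeyEq(std::size_t lhs, std::size_t rhs) const {
    return (*storage)[lhs] == (*storage)[rhs];
  }
};

// Stand-in for a table lookup: the hash is only a filter, and the context's
// equality has the final say. A linear scan keeps the sketch short.
template <typename ContextT>
bool ContainsEquivalent(const std::vector<std::size_t>& keys,
                        std::size_t probe, const ContextT& context) {
  const std::uint64_t probe_hash = context.HashKey(probe);
  for (std::size_t key : keys) {
    if (context.HashKey(key) == probe_hash && context.KeyEq(key, probe)) {
      return true;
    }
  }
  return false;
}
```

With either context, indices 0 and 2 into `{"a", "b", "a"}` compare equal even though the indices differ, and swapping in the colliding context changes only how often the equality check runs, not the answer.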

@josh11b
Contributor

josh11b commented May 28, 2024

I'm fine either way. If it simplifies the files I haven't reviewed yet, that would be a benefit.
