Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected performance of hash maps #785

Open
jannisteunissen opened this issue Apr 2, 2024 · 8 comments
Open

Unexpected performance of hash maps #785

jannisteunissen opened this issue Apr 2, 2024 · 8 comments

Comments

@jannisteunissen
Copy link
Contributor

Today I started exploring the stdlib (which I think is a great effort!) and I did a performance test of the hash map implementation. The test I did is rather simple: generate keys (with duplicates), and then add a key if it is not yet present, and remove a key if it is already present. The code is copied below. I compared performance against another Fortran hash map implementation I once made (https://github.com/jannisteunissen/ffhash), which was based on khash from https://github.com/attractivechaos/klib

With standard compile flags (-O2) and a test size of 5 million, on an Intel i7-1185G7 laptop CPU, I got the following numbers:

open_hashmap_type from stdlib, using fnv_1_hasher:
Elapsed time (s) 0.2749E+01
Entries/s 0.1819E+07

while the other code (ffhash, murmur3 hasher)

Elapsed time (s) 0.2541E+00
Entries/s 0.1968E+08

A factor 10 seems a bit much. I would expect some overhead due to the class-type interface, but not this much. Perhaps this benchmark can be useful in improving the performance.

program open_hashmap_benchmark
  use stdlib_kinds, only: dp, int8, int32
  use stdlib_hashmaps, only : open_hashmap_type
  use stdlib_hashmap_wrappers

  implicit none

  type(open_hashmap_type)     :: map
  integer, parameter          :: n_max = 5*1000*1000
  integer                     :: n
  integer(int32), allocatable :: keys(:)
  integer, allocatable        :: key_counts(:)
  real(dp)                    :: t_start, t_end
  real(dp), allocatable       :: r_uniform(:)
  type(key_type)              :: key
  logical                     :: present
  integer(int8)               :: int32_as_int8(4)

  call map%init(fnv_1_hasher, slots_bits=10)

  allocate(keys(n_max), r_uniform(n_max))

  call random_number(r_uniform)
  keys = nint(r_uniform * n_max * 0.25_dp)

  call cpu_time(t_start)
  do n = 1, n_max
     int32_as_int8 = transfer(keys(n), int32_as_int8)
     call set(key, int32_as_int8)

     call map%key_test(key, present)

     if (present) then
        call map%remove(key)
     else
        call map%map_entry(key)
     end if
  end do
  call cpu_time(t_end)

  write(*, "(A,E12.4)") "Elapsed time (s) ", t_end - t_start
  write(*, "(A,E12.4)") "Entries/s        ", n_max/(t_end - t_start)
  write(*, "(A,I12)")   "n_occupied       ", map%entries()
  write(*, "(A,I12)")   "n_buckets        ", map%num_slots()

  ! Count number of keys that occur an odd number of times
  allocate(key_counts(minval(keys):maxval(keys)))
  key_counts = 0
  do n = 1, n_max
     key_counts(keys(n)) = key_counts(keys(n)) + 1
  end do
  n = sum(iand(key_counts, 1))

  if (n /= map%entries()) then
     error stop "FAILED"
  else
     print *, "PASSED"
  end if

  ! Clean up allocated storage
  deallocate(key_counts)

end program open_hashmap_benchmarka

The benchmark I compared against is https://github.com/jannisteunissen/ffhash/blob/master/example_benchmark.f90

@jvdp1
Copy link
Member

jvdp1 commented Apr 2, 2024

Thank you @jannisteunissen for this report. I tested your code on my machine, and I got similar results with gfortran and the fpm options used with the profile release. I tested other hashers and got similar times.

@gareth-nx
Copy link
Contributor

@wclodius2

@gareth-nx
Copy link
Contributor

gareth-nx commented Apr 4, 2024

I have limited knowledge of hash maps, but am puzzled by this small modification of @jannisteunissen's program.

If replace the open_hashmap_type with a chaining_hashmap_type then the speed is significantly faster, but the program fails at the error stop.

From inspection, it seems that for chaining_hashmap_type a call to map%remove() does not always lead to map%entries() being reduced by 1. Whereas that does happen for the open_hashmap_type.

Does anyone know if this is expected? @wclodius2

program chaining_hashmap_benchmark
  use stdlib_kinds, only: dp, int8, int32
  use stdlib_hashmaps, only : open_hashmap_type, chaining_hashmap_type
  use stdlib_hashmap_wrappers

  implicit none

  !! Choice of hashmap -- should these do the same thing? 
  !type(open_hashmap_type)     :: map 
  type(chaining_hashmap_type)     :: map

  integer, parameter          :: n_max = 5*1000*1000
  integer                     :: n
  integer(int32), allocatable :: keys(:)
  integer, allocatable        :: key_counts(:)
  real(dp)                    :: t_start, t_end
  real(dp), allocatable       :: r_uniform(:)
  type(key_type)              :: key
  logical                     :: present
  integer(int8)               :: int32_as_int8(4)

  call map%init(fnv_1_hasher, slots_bits=10)

  allocate(keys(n_max), r_uniform(n_max))

  call random_number(r_uniform)
  keys = nint(r_uniform * n_max * 0.25_dp)

  call cpu_time(t_start)
  do n = 1, n_max
     int32_as_int8 = transfer(keys(n), int32_as_int8)
     call set(key, int32_as_int8)

     call map%key_test(key, present)

     if (present) then
        !print*, map%entries()
        call map%remove(key)
        !print*, map%entries() ! Did this reduce?
     else
        call map%map_entry(key)
     end if
  end do
  call cpu_time(t_end)

  write(*, "(A,E12.4)") "Elapsed time (s) ", t_end - t_start
  write(*, "(A,E12.4)") "Entries/s        ", n_max/(t_end - t_start)
  write(*, "(A,I12)")   "n_occupied       ", map%entries()
  write(*, "(A,I12)")   "n_buckets        ", map%num_slots()

  ! Count number of keys that occur an odd number of times
  allocate(key_counts(minval(keys):maxval(keys)))
  key_counts = 0
  do n = 1, n_max
     key_counts(keys(n)) = key_counts(keys(n)) + 1
  end do
  n = sum(iand(key_counts, 1))

  if (n /= map%entries()) then
     error stop "FAILED"
  else
     print *, "PASSED"
  end if

  ! Clean up allocated storage
  deallocate(key_counts)

end program chaining_hashmap_benchmark

@jvdp1
Copy link
Member

jvdp1 commented Apr 4, 2024

@gareth-nx I spotted the same issue, and plan to submit a PR soon to fix it.

@gareth-nx
Copy link
Contributor

gareth-nx commented Apr 4, 2024

@jvdp1 I was just looking at this, and found that if we append the line

map % num_entries = map % num_entries - 1

right at the end of remove_chaining_entry, then the test passes, while the speed remains much better than for open_hashmap_type.

I guess we need to add some additional tests of the hashmaps!

@jannisteunissen
Copy link
Contributor Author

Glad to see that this test was also useful in other ways! I'd be happy to contribute it to to the test suite. Let me also mention that the code I compared with used open addressing (and quadratic probing), so in principle better performance should be possible for open_hashmap_type.

@wclodius2
Copy link
Contributor

wclodius2 commented Apr 10, 2024 via email

@gareth-nx
Copy link
Contributor

No problem at all @wclodius2 -- enjoy your vacation -- and please don't feel obliged to respond unless you feel like it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants