Reduce zero-init overhead #145

Open
mratsim opened this issue May 17, 2020 · 0 comments

mratsim commented May 17, 2020

To support destructible and sinkable types, in particular atomic refcounted types, tasks must zero-init their data buffer.
This was introduced in #144 to properly support the refcounted FlowEvent.

However, this adds a significant 17% overhead on very short-running tasks, as measured on fib(40).

Note: "significant" is relative. fib(40) spawns an exponential number of tasks (hundreds of millions, with 2^40 as a loose upper bound), and each task does less work than the zero-initialization itself.
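
For a sense of scale, the number of calls made by a naive recursive Fibonacci can be counted exactly. The sketch below assumes the usual recursion with base case n < 2; `fibCalls` is a hypothetical helper and not part of Weave. How many of those calls become tasks depends on how the benchmark spawns them, so treat the result as an order of magnitude. 2^40 is only a loose upper bound on it.

```nim
proc fibCalls(n: int): int =
  ## Number of invocations made by a naive recursive fib(n)
  ## with base case n < 2 (assumed shape of the benchmark).
  ## Recurrence: calls(n) = calls(n-1) + calls(n-2) + 1, calls(0) = calls(1) = 1.
  if n < 2:
    return 1
  var a = 1 # calls(i-2)
  var b = 1 # calls(i-1)
  for _ in 2 .. n:
    let next = a + b + 1
    a = b
    b = next
  result = b

echo fibCalls(40) # 331160281
```

At a few hundred million tasks, even a few nanoseconds of extra zeroing per task shows up clearly in the total runtime.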

The change: https://github.com/mratsim/weave/pull/144/files#diff-c5d52e34ee454756d2c729faec306b62L113

 proc newTaskFromCache*(): Task =
-  result = workerContext.taskCache.pop()
+  result = workerContext.taskCache.pop0()
   if result.isNil:
-    result = myMemPool().borrow(deref(Task))
+    result = myMemPool().borrow0(deref(Task))
-  # Zeroing is expensive, it's 96 bytes
-
-  # result.fn = nil # Always overwritten
-  # result.parent = nil # Always overwritten
-  # result.scopedBarrier = nil # Always overwritten
-  result.prev = nil
-  result.next = nil
-  result.start = 0
-  result.cur = 0
-  result.stop = 0
-  result.stride = 0
-  result.futures = nil
-  result.isLoop = false
-  result.hasFuture = false
+  # The task must be fully zero-ed including the data buffer
+  # otherwise datatypes that use custom destructors
+  # and that rely on "myPointer.isNil" to return early
+  # may read recycled garbage data.
+  # "FlowEvent" is such an example
+
+  # TODO: The perf cost to the following is 17% as measured on fib(40)
+
+  # # Zeroing is expensive, it's 96 bytes
+  # # result.fn = nil # Always overwritten
+  # # result.parent = nil # Always overwritten
+  # # result.scopedBarrier = nil # Always overwritten
+  # result.prev = nil
+  # result.next = nil
+  # result.start = 0
+  # result.cur = 0
+  # result.stop = 0
+  # result.stride = 0
+  # result.futures = nil
+  # result.isLoop = false
+  # result.hasFuture = false

The simplest optimization would be to zero-init only the part of the data buffer that will be overwritten by the task arguments.
An alternative would be to zero-init the buffer only for non-trivial types, as detected by supportsCopyMem.
A third possibility would be to do both.
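
A minimal sketch of the combined approach, under stated assumptions: `TaskStub`, `TaskDataSize`, `prepareData` and `recycled` are hypothetical names for illustration, not Weave's actual types or API. Only `supportsCopyMem` (from `std/typetraits`) and `zeroMem` are real Nim calls. The idea is to pick the zeroing strategy at compile time from the argument type, and to clear only the bytes that type will occupy.

```nim
import std/typetraits

const TaskDataSize = 144 # hypothetical buffer size, not Weave's actual constant

type
  TaskStub = object
    ## Hypothetical stand-in for a task: only the argument buffer is modelled.
    data: array[TaskDataSize, byte]

proc prepareData[T](task: var TaskStub) =
  ## Zero only the bytes an argument of type T will occupy, and only
  ## when T has a destructor or contains GC-managed memory.
  static: doAssert sizeof(T) <= TaskDataSize
  when not supportsCopyMem(T):
    zeroMem(addr task.data, sizeof(T))

proc recycled(): TaskStub =
  ## Simulate a task popped from the cache with stale bytes in its buffer.
  for b in result.data.mitems:
    b = 0xFF

var trivial = recycled()
prepareData[int](trivial)    # trivially copyable: buffer left untouched

var managed = recycled()
prepareData[string](managed) # managed type: first sizeof(string) bytes zeroed
```

Because the `when` branch is resolved at compile time, tasks carrying plain integers or pointers, as in the Fibonacci benchmark, would pay nothing for the zeroing, while destructor-bearing arguments still start from a clean slot.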
