Make SequenceLearner points hashable by passing the sequence to the function. #266

basnijholt · 2020-04-14T15:15:27Z

Description

With this change, the function of the SequenceLearner contains the sequence.

This is needed because points that are passed to the Runner need to be hashable, and this is not required by the entries of the sequence in SequenceLearner.

Fixes #265

Checklist

Fixed style issues using pre-commit run --all (first install using pip install pre-commit)
pytest passed

Type of change

Check relevant option(s).

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

adaptive/learner/sequence_learner.py

Co-Authored-By: Anton Akhmerov <anton.akhmerov@gmail.com>

adaptive/tests/test_sequence_learner.py

akhmerov · 2020-04-16T15:13:14Z

I propose to:

Repurpose SequenceLearner with unhashable entries in sequence break Runner #265 to an issue that runner breaks on unhashable inputs: that's the real problem.
Make this PR dedicated to a simplification of the sequence learner (which happens to close the current SequenceLearner with unhashable entries in sequence break Runner #265 a side effect).
Also throw away the unnecessary __getstate__/__setstate__ logic from _IndexToPoint.

@basnijholt what do you think?

adaptive/learner/sequence_learner.py

codecov-io · 2020-04-16T15:35:05Z

Codecov Report

Merging #266 into master will increase coverage by 0.16%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #266      +/-   ##
==========================================
+ Coverage   79.52%   79.68%   +0.16%     
==========================================
  Files          32       33       +1     
  Lines        4424     4416       -8     
  Branches      815      814       -1     
==========================================
+ Hits         3518     3519       +1     
+ Misses        779      768      -11     
- Partials      127      129       +2

Impacted Files	Coverage Δ
adaptive/learner/sequence_learner.py	`86.66% <100.00%> (+0.95%)`	⬆️
adaptive/tests/test_learners.py	`98.63% <100.00%> (-0.08%)`	⬇️
adaptive/tests/test_sequence_learner.py	`100.00% <100.00%> (ø)`
adaptive/learner/learner1D.py	`91.71% <0.00%> (-0.62%)`	⬇️
adaptive/runner.py	`69.66% <0.00%> (+3.37%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3d9397d...230b356. Read the comment docs.

basnijholt · 2020-04-16T15:37:15Z

@akhmerov, I've implemented the changes you suggested and opened #267.

Thanks for your review! 🙌

akhmerov · 2020-04-16T15:39:07Z

Sorry, I'm having second thoughts. Passing the complete sequence of arguments to the function means we're distributing it across all workers. Is that definitely OK? It might be, but I think we should consider this carefully.

adaptive/tests/test_sequence_learner.py

akhmerov · 2020-04-16T15:53:34Z

Passing the complete sequence of arguments to the function means we're distributing it across all workers. Is that definitely OK?

Ultimately I think this would only be a problem in extremely exotic cases, and the simplification of the codebase justifies this downside.

basnijholt · 2020-04-16T16:52:44Z

I actually lean towards not accepting the overhead.

From learning the following function on a ProcessPoolExecutor:

class FailOnce:
    def __init__(self):
        self.n = 0

    def __call__(self, value):
        self.n += 1
        return self.n

Which will always return 1, I learn that the sequence will be serialized and send each point!

As an alternative solution, I have opened https://github.com/python-adaptive/adaptive/pull/268/files. Which appeared to be way less work than I thought initially.

basnijholt · 2020-04-23T10:39:10Z

Superseded by #268.

akhmerov · 2020-04-23T11:00:47Z

I'm actually not sure whether creating multiple copies of a sequence is an important limitation and I am wondering whether we should merge this anyway. I still believe this implementation of a sequence learner is simpler, and therefore deserves consideration separate of its relation to #265.

@jbweston, what do you think?

basnijholt · 2020-04-23T11:06:43Z

@akhmerov, I've benchmarked this and the overhead is completely unacceptable.

For example, at this exact moment I am doing simulations with SequenceLearners of ~1e6 items with 15 parameters.

import pickle
import numpy as np
sequence = list(np.random.rand(1_000_000, 15))
%timeit pickle.dumps(sequence)

4.92 s ± 44.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
An that happens on each function evaluation. And this does not even include unserializing and sending it over the network.

akhmerov · 2020-04-23T11:08:39Z

OK, fair enough.

basnijholt added 3 commits April 14, 2020 18:14

pass the sequence with the function, fixes #265

396ed34

add test_fail_with_sequence_of_unhashable

91990fd

no not treat the SequenceLearner specially

093f148

basnijholt force-pushed the fix-sequence-learner branch from 897d4c8 to 093f148 Compare April 14, 2020 16:14

basnijholt requested review from akhmerov and jbweston April 14, 2020 16:15