Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.1 random_walk_fastest is slightly incorrect: does not include starting position 0 #119

Open
manifoldhiker opened this issue Jan 16, 2024 · 1 comment

Comments

@manifoldhiker
Copy link

manifoldhiker commented Jan 16, 2024

There is a subtle incorrectness in random_walk_fastest. It does not return the first position 0, compared to other two implementations. The following code demonstrates this:

N = 99
print(len(random_walk(N)), len(random_walk_faster(N)), len(random_walk_fastest(N)))
# >>> (100, 100, 99)

This may look as a minor distinction, but may cause bugs. There are two ways to remove this inconsistency: exclude initial 0 position from first two functions (trivial) or prepend it to the last function. There are such options for the latter path:

  • Convert steps to numpy list after cumsum and concat with [0]
  • Concatenate np.array([0])
  • Allocate empty array in advance, and use out parameter of cumsum

My implementation of these options:

N = 10_000

# Option 1: Convert `steps` to python list after `cumsum` and concat with `[0]`
def random_walk_option1(n=N):
    steps = np.random.choice([-1,+1], n)
    steps = np.cumsum(steps)
    return [0] + steps.tolist()

# Option 2.1: Concatenate `np.array([0])`
def random_walk_option2_1(n=N):
    steps = np.random.choice([-1,+1], n)
    steps = np.cumsum(steps)
    return np.concatenate((np.array([0]), steps))

# Option 2.2: Concatenate `[0]`
def random_walk_option2_2(n=N):
    steps = np.random.choice([-1,+1], n)
    steps = np.cumsum(steps)
    return np.concatenate(([0], steps))

# Option 3: Allocate empty array in advance, and use `out` parameter of `cumsum`
def random_walk_option3_1(n=N):
    walk = np.empty(n+1)
    walk[0] = 0
    steps = np.random.choice([-1,+1], n)
    np.cumsum(steps, out=walk[1:])
    return walk


def random_walk_option3_2(n=N):
    walk = np.zeros(n+1)
    steps = np.random.choice([-1,+1], n)
    np.cumsum(steps, out=walk[1:])
    return walk

I also benchmarked them to see the difference: https://colab.research.google.com/drive/19OJdrD4SZLk4ug6OI6DC6wq3XwHQPAgr?usp=sharing

While option_2_1 is slightly better, the benchmark is not stable. Maybe it is because the difference in the number of operations is up to the constant, the difference is not significant?

Fixing it in a book may be nasty/non-trivial because will break the simplicity of implementation and add more confusion for the reader :)

@rougier
Copy link
Owner

rougier commented Jan 22, 2024

Thanks for the report (you're totally right) and all the proposed fixes!
I think Option 1 is the most readable and can go inside the book. Can you make a PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants