Anatomy of an array introduction. Obvious way is the fastest. #29

Open
ichernob opened this issue Jan 2, 2017 · 10 comments

@ichernob

ichernob commented Jan 2, 2017

Hello,
I've tried this code:

import numpy as np
# timeit below is the book's helper function, not the standard-library module

Z = np.ones(4 * 1000000, np.float32)
timeit("Z[...] = 0", globals())
timeit("Z.view(np.float16)[...] = 0", globals())
timeit("Z.view(np.int16)[...] = 0", globals())
timeit("Z.view(np.int32)[...] = 0", globals())
timeit("Z.view(np.float32)[...] = 0", globals())
timeit("Z.view(np.int64)[...] = 0", globals())
timeit("Z.view(np.float64)[...] = 0", globals())
timeit("Z.view(np.complex128)[...] = 0", globals())
timeit("Z.view(np.int8)[...] = 0", globals())

And got the following results:
100 loops, best of 3: 905 usec per loop
100 loops, best of 3: 918 usec per loop
100 loops, best of 3: 925 usec per loop
100 loops, best of 3: 915 usec per loop
100 loops, best of 3: 910 usec per loop
100 loops, best of 3: 912 usec per loop
100 loops, best of 3: 902 usec per loop
100 loops, best of 3: 1.9 msec per loop
100 loops, best of 3: 1.91 msec per loop

I don't understand why these results are the opposite of those in the book. Could you kindly clarify?
Thanks in advance.

P.S. I'm using the 64-bit version of Python 3.5.2 with Anaconda.
The sysinfo() output:
Date: 01/02/17
Python: 3.5.2
Numpy: 1.11.1
Scipy: 0.17.1
Matplotlib: 1.5.1
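
For readers who want to reproduce the measurement without the book's helper, here is a minimal sketch that uses only the standard-library `timeit` module. The `bench` function and the `namespace` dictionary are hypothetical names introduced for this illustration; the loop count and repeat count are arbitrary choices, so absolute numbers will differ from those reported in this thread.

```python
import timeit

import numpy as np


def bench(stmt, namespace, number=20, repeat=3):
    """Best per-loop time in seconds for stmt (hypothetical helper)."""
    timer = timeit.Timer(stmt, globals=namespace)
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Same array as in the post above: 4,000,000 float32 values.
Z = np.ones(4 * 1000000, np.float32)
namespace = {"Z": Z, "np": np}

dtypes = ["float16", "int16", "int32", "float32",
          "int64", "float64", "complex128", "int8"]

# Time the plain assignment first, then the same assignment through each view.
results = {"Z[...] = 0": bench("Z[...] = 0", namespace)}
for name in dtypes:
    stmt = "Z.view(np.{})[...] = 0".format(name)
    results[stmt] = bench(stmt, namespace)

for stmt, seconds in results.items():
    print("{:<36s} {:10.1f} usec per loop".format(stmt, seconds * 1e6))
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes, but results still depend heavily on the CPU, memory, and NumPy build.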

@rougier
Owner

rougier commented Jan 2, 2017

Thanks for the report. Your results are surprising. Could you also test using IPython and the %timeit magic (just to be sure I did not mess up the timeit function)?

Note: I edited your post because the listing was not displayed properly.

@ichernob
Author

ichernob commented Jan 2, 2017

Thanks for the answer. I will try a little later and post the results here.

@ichernob
Author

ichernob commented Jan 2, 2017

Well, unfortunately, right now I'm unable to use numpy via IronPython (I had never come across it before and can't work out how to get numpy without pip). But I ran the same code on another computer and got different results:
100 loops, best of 3: 1.21 msec per loop
100 loops, best of 3: 1.21 msec per loop
100 loops, best of 3: 1.26 msec per loop
100 loops, best of 3: 1.22 msec per loop
100 loops, best of 3: 1.21 msec per loop
10 loops, best of 3: 4.3 msec per loop
10 loops, best of 3: 4.22 msec per loop
100 loops, best of 3: 2.21 msec per loop
100 loops, best of 3: 1.01 msec per loop
Also, the results from PTVS show a different trend:
[image: PTVS results]

@claws

claws commented Jan 6, 2017

@ruichernob, I think you are confusing IronPython with IPython. IPython is what you want, not IronPython. You can install IPython into your existing Python using pip:

$ pip install ipython

rougier closed this as completed Feb 28, 2018
@godaygo

godaygo commented Mar 21, 2018

Hi! To start, thank you for the great tutorial!
I am seeing the same timing issue as the OP. I measured the following snippets with your timeit function (I also tested with %timeit; the results are very close):

timeit("Z[...] = 0", globals())
timeit("Z.view(np.float64)[...] = 0", globals())
timeit("Z.view(np.float32)[...] = 0", globals())
timeit("Z.view(np.float16)[...] = 0", globals())
timeit("Z.view(np.complex)[...] = 0", globals())
timeit("Z.view(np.int64)[...] = 0", globals())
timeit("Z.view(np.int32)[...] = 0", globals())
timeit("Z.view(np.int16)[...] = 0", globals())
timeit("Z.view(np.int8)[...] = 0", globals())
timeit("Z.fill(0)", globals())  

I've measured on two computers, with:

Python 3.6.4
numpy 1.14.2

The specs of the first computer:
Windows 10
CPU: Intel Xeon E5-1650v4 3.60GHz
RAM: 128GB DDR4-2400
Times:

100 loops, best of 3: 750 usec per loop
100 loops, best of 3: 758 usec per loop
100 loops, best of 3: 757 usec per loop
100 loops, best of 3: 760 usec per loop
100 loops, best of 3: 1.06 msec per loop
100 loops, best of 3: 758 usec per loop
100 loops, best of 3: 757 usec per loop
100 loops, best of 3: 760 usec per loop
100 loops, best of 3: 758 usec per loop
100 loops, best of 3: 747 usec per loop

The specs of the second computer:
Windows 7
CPU: Intel Pentium P6100 2.00GHz
RAM: 4GB DDR3-1333
Times:

100 loops, best of 3: 2.59 msec per loop
10 loops, best of 3: 3.38 msec per loop
10 loops, best of 3: 2.59 msec per loop
100 loops, best of 3: 2.62 msec per loop
100 loops, best of 3: 3.26 msec per loop
100 loops, best of 3: 2.69 msec per loop
100 loops, best of 3: 2.62 msec per loop
100 loops, best of 3: 2.63 msec per loop
10 loops, best of 3: 3.32 msec per loop
100 loops, best of 3: 2.55 msec per loop

As you can see, the results are somewhat consistent with each other, but do not match your observations.
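
As background for reading these timings, the sketch below checks what the benchmarked statements have in common: every view reinterprets the same buffer, so each statement writes the same number of bytes, and only the element count and element width change. It assumes `Z` is the 4,000,000-element float32 array from the original post. It does not explain the timing differences; it only makes explicit what is being compared.

```python
import numpy as np

# Assumption: the same array as in the original post.
Z = np.ones(4 * 1000000, np.float32)

for dtype in (np.int8, np.float16, np.float32, np.float64, np.complex128):
    V = Z.view(dtype)
    # A view reinterprets Z's buffer; no data is copied.
    assert np.shares_memory(Z, V)
    assert V.nbytes == Z.nbytes
    print("{:<12s} itemsize={:2d} elements={:9d} bytes={}".format(
        np.dtype(dtype).name, V.itemsize, V.size, V.nbytes))

# Zeroing through any view zeroes the original array as well.
Z.view(np.int8)[...] = 0
assert not Z.any()
```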

@rougier
Owner

rougier commented Mar 22, 2018

Given the consistent output from you and @ruichernob, it looks like I might be wrong. I don't remember how I came to this conclusion. I'm pretty sure I got the results written in the book, but I might be the only one in the end 😄. Would you mind proposing a PR to fix what's written in the book?

@godaygo

godaygo commented Mar 22, 2018

It would be great if you had the opportunity to recheck these results on your computer with the current version of numpy. After all, anything is possible :) And of course the results posted in the book may well have been accurate at the time.

Since the basic idea of this section is that the obvious method is not optimal, a mere change in the timings makes the section meaningless. As for me, the only obvious way to fill an entire array with some value is the .fill method of ndarray, and evidently this interface was introduced for that purpose.

I've tried to come up with a similarly simple example where such tricks beat another obvious way, but unfortunately I have not found one yet :) In addition, "There should be one-- and preferably only one --obvious way to do it." Having said this, if your rechecked results agree with ours, I would just drop this example so that it does not mislead. I apologize that I cannot offer a replacement example.
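
To make the comparison in this comment concrete, here is a small sketch showing that ellipsis assignment and `ndarray.fill` leave an array in the same state. It also shows a caveat that applies to the view trick in general: writing through a reinterpreted view only gives the expected result for zero, because zero is the all-zero bit pattern in every dtype used in this thread. The array names `A`, `B`, and `C` are introduced only for this illustration.

```python
import numpy as np

A = np.ones(4 * 1000000, np.float32)
B = np.ones(4 * 1000000, np.float32)

A[...] = 0   # ellipsis assignment, as in the benchmarked statements
B.fill(0)    # ndarray.fill, the method suggested in this comment

assert np.array_equal(A, B)
assert A.dtype == B.dtype == np.float32

# Caveat: filling through a view with a nonzero value sets every byte to
# that value, which does not produce the float32 value 1.0.
C = np.zeros(4, np.float32)
C.view(np.int8).fill(1)
assert not np.array_equal(C, np.ones(4, np.float32))
```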

@rougier
Owner

rougier commented Mar 23, 2018

On OSX 10.13.3, Python 3.6.4, numpy 1.14.2, I got:

>>> Z.view(np.float16)[...] = 0
100 loops, best of 3: 2.85 msec per loop
>>> Z.view(np.int16)[...] = 0
100 loops, best of 3: 2.87 msec per loop
>>> Z.view(np.int32)[...] = 0
100 loops, best of 3: 1.46 msec per loop
>>> Z.view(np.float32)[...] = 0
100 loops, best of 3: 1.58 msec per loop
>>> Z.view(np.int64)[...] = 0
100 loops, best of 3: 1 msec per loop
>>> Z.view(np.float64)[...] = 0
100 loops, best of 3: 1.01 msec per loop
>>> Z.view(np.complex128)[...] = 0
100 loops, best of 3: 918 usec per loop
>>> Z.view(np.int8)[...] = 0
100 loops, best of 3: 614 usec per loop

@godaygo

godaygo commented Mar 23, 2018

Thank you, interesting results! Could you also time the array.fill method? And would you mind if I asked a question about this on SO?

@rougier
Owner

rougier commented Mar 28, 2018

More or less the same:

>>> Z.view(np.float16).fill(0)
100 loops, best of 3: 2.82 msec per loop
>>> Z.view(np.int16).fill(0)
100 loops, best of 3: 2.82 msec per loop
>>> Z.view(np.int32).fill(0)
100 loops, best of 3: 1.48 msec per loop
>>> Z.view(np.float32).fill(0)
100 loops, best of 3: 1.52 msec per loop
>>> Z.view(np.int64).fill(0)
100 loops, best of 3: 1.05 msec per loop
>>> Z.view(np.float64).fill(0)
100 loops, best of 3: 1.04 msec per loop
>>> Z.view(np.complex128).fill(0)
100 loops, best of 3: 930 usec per loop
>>> Z.view(np.int8).fill(0)
100 loops, best of 3: 601 usec per loop
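
One way to compare these timings across machines is to convert them to memory throughput. The sketch below does that arithmetic for the `.fill` timings listed above. It assumes that `Z` is the 4,000,000-element float32 array from the original post (16,000,000 bytes), which this comment does not state explicitly, so treat the resulting figures as illustrative.

```python
# Timings copied from the comment above, in seconds per loop.
timings = {
    "float16": 2.82e-3,
    "int16": 2.82e-3,
    "int32": 1.48e-3,
    "float32": 1.52e-3,
    "int64": 1.05e-3,
    "float64": 1.04e-3,
    "complex128": 930e-6,
    "int8": 601e-6,
}

# Assumption: Z holds 4,000,000 float32 values, as in the original post.
nbytes = 4 * 1000000 * 4

# Bytes written per second, expressed in GB/s (1 GB = 1e9 bytes).
throughput = {name: nbytes / seconds / 1e9 for name, seconds in timings.items()}

for name, gb_per_s in sorted(throughput.items(), key=lambda item: item[1]):
    print("{:<12s} {:6.2f} GB/s".format(name, gb_per_s))
```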

rougier reopened this Mar 28, 2018