CLN/PERF: remove unused functions; use C skip list for rolling median #11450

Merged: 1 commit merged into pandas-dev:master from the CLN-PERF-roll-median branch on Nov 2, 2015

Conversation

kawochen
Contributor

removes some unused code
reverts this commit a40226e

performance consideration

import pandas
import numpy
arr = numpy.random.rand(1000000)
%timeit pandas.rolling_median(arr, 1000)

master

1 loops, best of 3: 4.94 s per loop

branch

1 loops, best of 3: 821 ms per loop
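
For context on why the data structure matters here: a rolling median has to find the middle element of each window, and keeping the window sorted incrementally avoids re-sorting it at every step. The sketch below is not the code from this PR. It swaps in a plain sorted Python list maintained with `bisect` (linear-time insert and delete) as a stand-in for the C skip list, which supports the same three operations (insert, remove, look up by rank) in expected logarithmic time. The function name `rolling_median_sketch` is hypothetical, and NaN handling and `min_periods` are deliberately left out.

```python
import math
from bisect import bisect_left, insort
from collections import deque


def rolling_median_sketch(values, window):
    """Rolling median over `values`; the first window-1 results are NaN.

    Assumes `values` contains no NaN. A sorted list stands in for the
    skip list: insert, remove, and index-by-rank are all that is needed.
    """
    out = []
    sorted_win = []   # window contents, kept in sorted order
    arrivals = deque()  # same contents, in arrival order
    mid = window // 2
    for x in values:
        arrivals.append(x)
        insort(sorted_win, x)
        if len(arrivals) > window:
            # Evict the oldest value from both structures.
            old = arrivals.popleft()
            del sorted_win[bisect_left(sorted_win, old)]
        if len(arrivals) < window:
            out.append(math.nan)
        elif window % 2:
            out.append(sorted_win[mid])
        else:
            out.append((sorted_win[mid - 1] + sorted_win[mid]) / 2)
    return out


result = rolling_median_sketch([5, 1, 4, 2, 3], 3)
print(result[2:])  # [4, 2, 3]
```

With a window of 1000, as in the timing above, each step touches one insert, one delete, and one rank lookup, so the cost of those three operations dominates the run time.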

@kawochen kawochen force-pushed the CLN-PERF-roll-median branch from bc82dee to 83f555a Compare October 28, 2015 00:51
@jreback jreback added the Performance (memory or execution speed performance) and Reshaping (concat, merge/join, stack/unstack, explode) labels Oct 28, 2015
@jreback jreback added this to the 0.17.1 milestone Oct 28, 2015
@@ -58,7 +58,7 @@ Performance Improvements


- Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)

- ``rolling_median`` uses c skip list implementation

add the PR number (as the issue number)

@jreback
Contributor

jreback commented Oct 28, 2015

You may be able to insert nogil at the top level and make almost all of skiplist.pyx nogil (obviously, wherever you raise, you have to re-acquire the GIL).

@kawochen kawochen force-pushed the CLN-PERF-roll-median branch 3 times, most recently from a0be0fc to 42a9468 Compare October 30, 2015 05:33
@kawochen
Contributor Author

In [2]: import pandas
   ...: import numpy
   ...: arr = numpy.random.rand(1000000)

In [4]: from pandas.util import testing

In [5]: @testing.test_parallel(2)
   ...: def g():
   ...:     pandas.rolling_median(arr, 1000)
   ...:

In [6]: @testing.test_parallel(1)
   ...: def f():
   ...:     pandas.rolling_median(arr, 1000)
   ...:

In [7]: %timeit f()
1 loops, best of 3: 752 ms per loop

In [8]: %timeit g()
1 loops, best of 3: 972 ms per loop
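
To make the comparison above easier to read: `g()` runs the same call in two threads and `f()` in one, so `g()` does twice the work. If the GIL were held for the whole computation, one would expect `g()` to take roughly twice as long as `f()`, and 972 ms against 752 ms suggests the two threads overlap. The sketch below shows the general idea of such a decorator using only the standard library. It is an assumption about the technique, not the `pandas.util.testing.test_parallel` implementation, and `run_in_threads` is a hypothetical name.

```python
import threading
from functools import wraps


def run_in_threads(num_threads):
    """Hypothetical stand-in for a test_parallel-style decorator.

    Calling the decorated function starts `num_threads` threads that each
    run the original function once, then waits for all of them to finish.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            threads = [
                threading.Thread(target=func, args=args, kwargs=kwargs)
                for _ in range(num_threads)
            ]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
        return wrapper
    return decorator


calls = []


@run_in_threads(2)
def record():
    # list.append is safe to call from several threads at once.
    calls.append(threading.get_ident())


record()
print(len(calls))  # 2
```

Timing the wrapper therefore measures the wall-clock time for all threads together, which only drops below the serial total when the wrapped code releases the GIL.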

@jreback
Contributor

jreback commented Oct 30, 2015

Can you add an asv benchmark? I'm not sure we have much for rolling in general.

@@ -59,7 +59,7 @@ Performance Improvements


- Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)

- ``rolling_median`` uses c skip list implementation (:issue:`11450`)


say Improved performance of ....

@kawochen
Contributor Author

OK. might as well release the GIL on roll_*

@jreback
Contributor

jreback commented Oct 30, 2015

yep - in the same PR or another

@kawochen kawochen force-pushed the CLN-PERF-roll-median branch from 42a9468 to 521bbb2 Compare November 1, 2015 20:43
@kawochen kawochen force-pushed the CLN-PERF-roll-median branch from 521bbb2 to 11c8427 Compare November 1, 2015 20:50
@kawochen
Contributor Author

kawochen commented Nov 1, 2015

Added asv benchmarks for the GIL release, but I can't get any of them to show up in asv at all. Have I written the tests incorrectly?
I run asv continuous upstream/master HEAD -b gil while on my branch.
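
For readers unfamiliar with asv, a benchmark is a class whose `setup` method prepares data and whose `time_*` methods are the code being timed; the `-b` option selects benchmarks by matching a regular expression against their names. The sketch below shows that general shape only and is not the benchmark added in this PR. The class name `RollingMedianSketch`, the parameter values, and the array size are hypothetical, and it assumes the newer `Series.rolling(...).median()` spelling rather than the `pandas.rolling_median` function used in this thread.

```python
import numpy as np
import pandas as pd


class RollingMedianSketch:
    """Hypothetical asv benchmark; asv times methods named ``time_*``."""

    # asv runs every benchmark once per parameter value.
    params = [10, 1000]
    param_names = ["window"]

    def setup(self, window):
        # Runs before timing, so data creation is excluded from the result.
        rng = np.random.default_rng(0)
        self.series = pd.Series(rng.random(100_000))

    def time_rolling_median(self, window):
        self.series.rolling(window).median()


# Outside asv, the same methods can be called by hand as a smoke test.
bench = RollingMedianSketch()
bench.setup(10)
bench.time_rolling_median(10)
```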

A simple timeit does show the improvement, e.g.
for rolling_mean

import pandas
import numpy
from pandas.util.testing import test_parallel
arr = numpy.random.rand(1000000)
@test_parallel(num_threads=2)
def f():
    pandas.rolling_mean(arr, 100)
%timeit f()

branch

100 loops, best of 3: 12.5 ms per loop

master

10 loops, best of 3: 19.5 ms per loop

for rolling_kurt
branch

10 loops, best of 3: 65.5 ms per loop

master

10 loops, best of 3: 98.8 ms per loop

The time_rolling_median benchmark in stat_ops.py shows the same 5x improvement I get using timeit.

jreback added a commit that referenced this pull request Nov 2, 2015
CLN/PERF: remove unused functions; use C skip list for rolling median
@jreback jreback merged commit eb66bcc into pandas-dev:master Nov 2, 2015
@jreback
Contributor

jreback commented Nov 2, 2015

thanks!
