CLN/PERF: remove used functions; use C skip list for rolling median #11450

kawochen · 2015-10-28T00:13:33Z

Removes some unused code.
Reverts commit a40226e.

Performance comparison:

import pandas
import numpy
arr = numpy.random.rand(1000000)
%timeit pandas.rolling_median(arr, 1000)

master

1 loops, best of 3: 4.94 s per loop

branch

1 loops, best of 3: 821 ms per loop
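
For readers wondering why the window data structure matters here: a rolling median needs the window kept in sorted order as values enter and leave. The sketch below is not the C skip list this PR switches to. It swaps in a plain sorted Python list maintained with `bisect`, so each insert/remove shifts list elements and costs O(window), whereas a skip list does the same updates in expected O(log window). The function name `rolling_median_sorted` is hypothetical, and the sketch assumes no NaNs and emits `None` until a full window is available.

```python
from bisect import bisect_left, insort
from collections import deque


def rolling_median_sorted(values, window):
    """Rolling median over `values`; None until a full window is available."""
    if window < 1:
        raise ValueError("window must be >= 1")
    in_order = deque()  # window contents in arrival order
    ordered = []        # the same contents, kept sorted
    out = []
    for x in values:
        in_order.append(x)
        insort(ordered, x)
        if len(in_order) > window:
            # Drop the value that just left the window from the sorted view.
            oldest = in_order.popleft()
            del ordered[bisect_left(ordered, oldest)]
        if len(ordered) < window:
            out.append(None)
        else:
            mid = window // 2
            if window % 2:
                out.append(ordered[mid])
            else:
                out.append((ordered[mid - 1] + ordered[mid]) / 2)
    return out
```

The median lookup itself is a constant-time index into the sorted view; the per-step cost is dominated by keeping that view sorted, which is the part a skip list makes cheaper.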

jreback · 2015-10-28T10:09:14Z

doc/source/whatsnew/v0.17.1.txt

@@ -58,7 +58,7 @@ Performance Improvements


 - Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)
-
+- ``rolling_median`` uses c skip list implementation


add the pr number (as the issue number)

jreback · 2015-10-28T10:22:51Z

you may be able to insert nogil (at the top-level), and make almost all of skiplist.pyx nogil (obviously, where you raise you have to re-acquire the GIL).

kawochen · 2015-10-30T05:58:07Z

In [2]: import pandas
   ...: import numpy
   ...: arr = numpy.random.rand(1000000)

In [4]: from pandas.util import testing

In [5]: @testing.test_parallel(2)
   ...: def g():
   ...:     pandas.rolling_median(arr, 1000)
   ...:

In [6]: @testing.test_parallel(1)
   ...: def f():
   ...:     pandas.rolling_median(arr, 1000)
   ...:

In [7]: %timeit f()
1 loops, best of 3: 752 ms per loop

In [8]: %timeit g()
1 loops, best of 3: 972 ms per loop
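
To make the comparison above easier to read: the decorator runs the same function once per thread, concurrently, so `g` does twice the work of `f`. The sketch below is a minimal stand-in for that kind of helper, not the pandas implementation; the name `run_in_threads` is hypothetical. If the wrapped function held the GIL the whole time, the two-thread run would be expected to take roughly twice as long as the one-thread run.

```python
import threading
from functools import wraps


def run_in_threads(num_threads=2):
    """Decorator: each call runs the wrapped function once per thread, concurrently."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            threads = [
                threading.Thread(target=func, args=args, kwargs=kwargs)
                for _ in range(num_threads)
            ]
            for t in threads:
                t.start()
            # Return only after every thread has finished, so timing the
            # wrapper measures the whole parallel run.
            for t in threads:
                t.join()
        return wrapper
    return decorator
```

Timing a function wrapped with `run_in_threads(1)` against the same function wrapped with `run_in_threads(2)` gives the same kind of comparison as `f` and `g` above.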

jreback · 2015-10-30T12:47:28Z

can you add an asv benchmark? not sure we have much for rolling in general......

jreback · 2015-10-30T12:47:53Z

doc/source/whatsnew/v0.17.1.txt

@@ -59,7 +59,7 @@ Performance Improvements


 - Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)
-
+- ``rolling_median`` uses c skip list implementation (:issue:`11450`)



say Improved performance of ....

kawochen · 2015-10-30T13:48:43Z

OK. might as well release the GIL on roll_*

jreback · 2015-10-30T13:54:31Z

yep - same or another

kawochen · 2015-11-01T22:31:44Z

Added asv benchmarks for the GIL release, but I can't get any of them to show up in asv bench at all. Have I written the tests incorrectly?
I run asv continuous upstream/master HEAD -b gil while on my branch.

A simple timeit does show the improvement, e.g. for rolling_mean:

import pandas
import numpy
from pandas.util.testing import test_parallel
arr = numpy.random.rand(1000000)
@test_parallel(num_threads=2)
def f():
    pandas.rolling_mean(arr, 100)
%timeit f()

branch

100 loops, best of 3: 12.5 ms per loop

master

10 loops, best of 3: 19.5 ms per loop

For rolling_kurt:
branch

10 loops, best of 3: 65.5 ms per loop

master

10 loops, best of 3: 98.8 ms per loop

The time_rolling_median benchmark in stat_ops.py shows the same 5x improvement I get using timeit.

jreback · 2015-11-02T11:45:02Z

thanks!

kawochen force-pushed the CLN-PERF-roll-median branch from bc82dee to 83f555a Compare October 28, 2015 00:51

jreback added the Performance and Reshaping labels Oct 28, 2015

jreback added this to the 0.17.1 milestone Oct 28, 2015

jreback reviewed Oct 28, 2015
kawochen force-pushed the CLN-PERF-roll-median branch 3 times, most recently from a0be0fc to 42a9468 Compare October 30, 2015 05:33

jreback reviewed Oct 30, 2015
kawochen force-pushed the CLN-PERF-roll-median branch from 42a9468 to 521bbb2 Compare November 1, 2015 20:43

CLN/PERF: remove used functions; use C skip list for rolling median

11c8427

kawochen force-pushed the CLN-PERF-roll-median branch from 521bbb2 to 11c8427 Compare November 1, 2015 20:50

jreback added a commit that referenced this pull request Nov 2, 2015

Merge pull request #11450 from kawochen/CLN-PERF-roll-median

eb66bcc

CLN/PERF: remove used functions; use C skip list for rolling median

jreback merged commit eb66bcc into pandas-dev:master Nov 2, 2015
