ENH: Add DatetimeIndexResampler.nlargest #17791

edschofield · 2017-10-05T04:38:34Z

Code Sample, a copy-pastable example if possible

With this setup:

import numpy as np
import pandas as pd
n = 1000
dates = pd.date_range(start='2010-01-01', periods=n)
rain_random = pd.Series(data=np.random.uniform(size=n), index=dates)

these two operations give different results:

rain_random.groupby(rain_random.index.year).nlargest(3)

rain_random.resample('A').nlargest(3)

Problem description

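The Series.resample().nlargest() operation is inconsistent with DataFrame.resample()[column].nlargest() and the groupby equivalent. It emits a FutureWarning and, as the warning text says, implicitly takes the mean of each bin before nlargest is applied.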

Output:

/Users/schofield/miniconda/envs/py36/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: 
.resample() is now a deferred operation
You called nlargest(...) on this deferred object which materialized it into a series
by implicitly taking the mean.  Use .resample(...).mean() instead
  """Entry point for launching an IPython kernel.
Out[427]:
2010-12-31    0.507550
2012-12-31    0.490082
2011-12-31    0.478356
dtype: float64
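
For readers following along, here is a minimal sketch of what the warning above appears to describe: the resampled object is first collapsed to one mean per year, and nlargest then picks among those yearly means rather than among the daily values. The seeded generator and the 'YS' (year start) alias are assumptions made for this sketch; the report uses 'A', which newer pandas releases deprecate, so the bin labels here fall on 1 January instead of 31 December.

```python
import numpy as np
import pandas as pd

# Seeded only so the sketch is repeatable; the original report is unseeded.
rng = np.random.default_rng(0)
n = 1000
dates = pd.date_range(start="2010-01-01", periods=n)
rain_random = pd.Series(rng.uniform(size=n), index=dates)

# Step 1: collapse each year to its mean (what the warning says happened).
yearly_mean = rain_random.resample("YS").mean()

# Step 2: take the largest of those means, i.e. one value per year at most.
largest_means = yearly_mean.nlargest(3)

# The same numbers via groupby on the calendar year, for comparison.
groupby_means = rain_random.groupby(rain_random.index.year).mean().nlargest(3)

print(largest_means)
```

With 1000 daily observations starting in 2010 there are only three yearly bins, which is why the reported output has three rows of values near 0.5 instead of three rows per year.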

Expected output:

Date        Date      
1930-12-31  1930-10-06      288.135370
            1930-10-05      285.587734
            1930-10-07      259.439935
            1930-10-08      227.587389
            1930-10-09      190.054844
1931-12-31  1931-01-26     3052.104566
            1931-01-25     2839.126102
            1931-01-29     2196.167129
            1931-02-01     1953.331709
            1931-01-27     1893.975328
1932-12-31  1932-01-19     9526.953864
            1932-01-20     4278.291105
            1932-03-03     2952.348903
            1932-03-02     2946.385433
            1932-03-04     2098.108897

pd.show_versions() output:

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.5.0

jreback · 2017-10-05T12:27:37Z

nlargest is not a first class operation on resample, so you need to do this.

In [4]: rain_random.resample('A').apply(lambda x: x.nlargest(3))
Out[4]: 
2010-12-31  2010-09-24    0.998530
            2010-04-27    0.997371
            2010-03-09    0.996582
2011-12-31  2011-11-30    0.999936
            2011-02-20    0.997470
            2011-01-17    0.992270
2012-12-31  2012-07-23    0.999762
            2012-06-20    0.998130
            2012-02-25    0.998010
dtype: float64
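
As a rough check of the workaround above, the sketch below compares it with the groupby-by-year call from the original report. The seeded data and the 'YS' alias (used instead of 'A' to avoid the alias deprecation in newer pandas) are assumptions of this sketch. Only the values are compared, because whether the result carries the bin label as an extra index level depends on the pandas version and the group_keys setting.

```python
import numpy as np
import pandas as pd

# Seeded only so the sketch is repeatable.
rng = np.random.default_rng(0)
dates = pd.date_range(start="2010-01-01", periods=1000)
rain_random = pd.Series(rng.uniform(size=1000), index=dates)

# Workaround from the comment above: apply nlargest to each yearly bin.
top3_resample = rain_random.resample("YS").apply(lambda x: x.nlargest(3))

# groupby equivalent from the original report.
top3_groupby = rain_random.groupby(rain_random.index.year).nlargest(3)

# Both visit the years in order and sort each year's picks descending,
# so the flat value arrays should line up element by element.
same_values = np.allclose(top3_resample.to_numpy(), top3_groupby.to_numpy())
print(same_values)
```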

discort · 2018-08-23T15:30:07Z

@jreback @sinhrks
The bug is not reproducible now:

In [14]: rain_random.resample('A').nlargest(3)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-ec29cc197ee8> in <module>()
----> 1 rain_random.resample('A').nlargest(3)

/Users/discort/python/fun/pandas/pandas/core/resample.py in __getattr__(self, attr)
     96             return self[attr]
     97
---> 98         return object.__getattribute__(self, attr)
     99
    100     def __iter__(self):

AttributeError: 'DatetimeIndexResampler' object has no attribute 'nlargest'

version

INSTALLED VERSIONS

commit: 9122952
python: 3.5.3.candidate.1
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+510.g9122952d8
pytest: 3.7.2
pip: 18.0
setuptools: 33.1.1
Cython: 0.28.4
numpy: 1.12.0
scipy: None
pyarrow: None
xarray: None
IPython: 5.2.2
sphinx: 1.6.6
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

hellojinwoo · 2020-06-07T18:15:26Z

I am using pandas version 1.0.1 and rain_random.resample('A').nlargest(3) is still not working. Hope this function is added in the following updates.

AttributeError                            Traceback (most recent call last)
<ipython-input-102-7057e1436432> in <module>
----> 1 rain_random.resample('A').nlargest(3)

~\anaconda3\lib\site-packages\pandas\core\resample.py in __getattr__(self, attr)
    105             return self[attr]
    106 
--> 107         return object.__getattribute__(self, attr)
    108 
    109     def __iter__(self):

AttributeError: 'DatetimeIndexResampler' object has no attribute 'nlargest'
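
Until such a method exists, one possible stand-in is to group with pd.Grouper and call the existing SeriesGroupBy.nlargest. The helper name resample_nlargest below is hypothetical and not part of pandas, and the 'YS' alias and seeded data are assumptions of this sketch; it is one way to get the per-bin result, not a proposed implementation.

```python
import numpy as np
import pandas as pd


def resample_nlargest(series: pd.Series, rule: str, n: int = 3) -> pd.Series:
    """Hypothetical helper (not a pandas API): n largest values per time bin."""
    # pd.Grouper(freq=rule) bins a DatetimeIndex by frequency, and
    # SeriesGroupBy.nlargest then selects within each bin.
    return series.groupby(pd.Grouper(freq=rule)).nlargest(n)


# Seeded only so the sketch is repeatable.
rng = np.random.default_rng(0)
dates = pd.date_range(start="2010-01-01", periods=1000)
rain_random = pd.Series(rng.uniform(size=1000), index=dates)

top3 = resample_nlargest(rain_random, "YS", n=3)
print(top3)
```

The result is indexed by the bin label on the outer level and the original date on the inner level, which is the shape shown in the expected output of the original report.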

jreback · 2020-06-07T19:36:20Z

pull requests are accepted;
this is how issues get addressed in open source

hellojinwoo · 2020-06-08T11:28:00Z

pull requests are accepted;
this is how issues get addressed in open source

May I ask what you mean by "this is how issues get addressed in open source"?

jreback · 2020-06-08T11:55:34Z

pandas and virtually all open source projects are all volunteer

the core team will review pull requests

since there are 3000+ open issues most patches must come from the community

issues get fixed when folks like you open pull requests

hellojinwoo · 2020-06-08T12:05:40Z

pandas and virtually all open source projects are all volunteer

the core team will review pull requests

since there are 3000+ open issues most patches must come from the community

issues get fixed when folks like you open pull requests

Yeah, I know that pandas is an open-source project. But regarding this issue, resample('D').nlargest(3), I can see neither assignees nor linked pull requests, which would be shown on the right side of this webpage. So I was curious to know what you meant by "pull requests are accepted".

And since this issue was raised about two and a half years ago, I just wanted to point out that it has not been resolved yet. So it puzzled me a little when you said "this is how issues get addressed in open source".

jreback · 2020-06-08T14:08:15Z

there are no assignees (who would we assign?)

and PRs would be linked to the issue

that’s the point here - no one has submitted anything

you or anyone else are welcome to do so

in this or any other issue

noting that something is not done is not that helpful - the issue is marked open

what IS helpful is submitting changes / examples / tests

hellojinwoo · 2020-06-08T16:50:24Z

there are no assignees (who would we assign?)

and PRs would be linked to the issue

that’s the point here - no one has submitted anything

you or anyone else are welcome to do so

in this or any other issue

noting that something is not done is not that helpful - the issue is marked open

what IS helpful is submitting changes / examples / tests

Now I can see why your replies have been sour. I am new to this pandas-dev zone, so if it is rude to report an old issue once again, I would like to apologize. You don't need to be sulky like that either because that IS NOT helpful either, right? Good day

jreback · 2020-06-08T19:00:56Z

@hellojinwoo thanks for the apology

we have been getting the "why has this x year old issue not been resolved" question many times

and to be honest it’s very rude of folks to do this but i guess new folks just don’t realize this so ok

people work extremely hard in open source and volunteer much time - yet continued comments like this (and to be clear i am not calling you out at all) cause burnout for this thankless task

so thank you for commenting in the issue
as i said above - if you would like to help out great

sinhrks added Groupby Resample resample method labels Oct 6, 2017

mroeschke added the Enhancement label May 11, 2020

rhshadrach changed the title from "Series.resample().nlargest produces incorrect output" to "ENH: Add DatetimeIndexResampler.nlargest" Nov 10, 2024
