Improve the performance of instantiating a Series object with dictionary data and a datetimeindex #14894

Closed
nateyoder opened this issue Dec 16, 2016 · 0 comments
Labels: Datetime (Datetime data dtype), Performance (Memory or execution speed performance)

nateyoder (Contributor) commented:

When a Series is constructed from dictionary data with a DatetimeIndex, the current code path always raises an exception at:

data = lib.fast_multiget(data, index.astype('O'),

which is then caught. The fix gives only a slight performance advantage, but hopefully the change makes the code less confusing for newcomers like me.
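
For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of a fast path that always raises and a fallback that always runs. It is only an illustration: `fast_lookup` and `build_values` are hypothetical names, not pandas functions, and the tuple-versus-list check merely stands in for whatever argument type the real `lib.fast_multiget` call rejects.

```python
def fast_lookup(mapping, keys, default=None):
    """Hypothetical fast path that only accepts a list of keys."""
    if not isinstance(keys, list):
        # Stands in for a typed routine rejecting an unexpected argument type.
        raise TypeError("keys must be a list")
    return [mapping.get(k, default) for k in keys]


def build_values(mapping, keys):
    """Return (values, path_taken) for the given keys."""
    try:
        return fast_lookup(mapping, keys), "fast"
    except TypeError:
        # Fallback: same result, but reached only after paying for the exception.
        return [mapping.get(k) for k in keys], "fallback"


data = {"a": 1, "b": 2}

# Passing a tuple: the fast path raises every time, so the fallback always runs.
values_before, path_before = build_values(data, ("a", "b", "c"))

# Passing the type the fast path expects: no exception is raised.
values_after, path_after = build_values(data, ["a", "b", "c"])

print(path_before, values_before)
print(path_after, values_after)
```

Both calls return the same values, which is why a problem like this can go unnoticed: the result is correct either way, and only the path taken (and its cost) differs.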

Code Sample, a copy-pastable example if possible

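from datetime import datetime

import pandas as pd
from pandas import Series

dr = pd.date_range(
    start=datetime(2015, 10, 26),
    end=datetime(2016, 1, 1),
    freq='10s'
)
# Dictionary keyed by the index timestamps, one entry per index value
data = {d: v for d, v in zip(dr, range(len(dr)))}
s = Series(data=data, index=dr)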

ASV output of new benchmark
Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[ 0.00%] · For pandas commit hash 5f05fdc:
[ 0.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.................................
[ 0.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 50.00%] ··· Running ...x.time_series_constructor_no_data_datetime_index 3.26s
[ 50.00%] · For pandas commit hash 3ba2cff:
[ 50.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 50.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[100.00%] ··· Running ...x.time_series_constructor_no_data_datetime_index 3.77s

    before     after       ratio
  [3ba2cff]  [5f05fdc]
-    3.77s      3.26s      0.87  series_methods.series_constructor_dict_data_datetime_index.time_series_constructor_no_data_datetime_index
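
As a quick sanity check on the comparison above, the ratio column is the "after" time divided by the "before" time. The sketch below recomputes it from the rounded timings shown in the log; because those are rounded to two decimals, the result (about 0.865) differs slightly from the reported 0.87, which presumably comes from the unrounded measurements.

```python
before = 3.77  # seconds, commit 3ba2cff (as printed in the log)
after = 3.26   # seconds, commit 5f05fdc (as printed in the log)

# A ratio below 1.0 means the "after" commit is faster.
ratio = after / before
time_saved_pct = (1 - ratio) * 100

print(f"ratio: {ratio:.3f}")
print(f"time saved: {time_saved_pct:.1f}%")
```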
    
sinhrks added the Performance and Datetime labels on Dec 16, 2016
jreback added this to the 0.19.2 milestone on Dec 16, 2016
ischurov pushed a commit to ischurov/pandas that referenced this issue Dec 19, 2016
closes pandas-dev#14894
Fix usage of fast_multiget with index which was always throwing an
exception that was then caught; add ASV that show slight improvement

Author: Nate Yoder

Closes pandas-dev#14895 from nateyoder/series_dict_index and squashes the following commits:

56be091 [Nate Yoder] Update whatsnew and fix pep8 issue
5f05fdc [Nate Yoder] Fix usage of fast_multiget with index which was always throwing an exception that was then caught; add ASV that show slight improvement
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this issue Dec 24, 2016
(cherry picked from commit e503d40)
ShaharBental pushed a commit to ShaharBental/pandas that referenced this issue Dec 26, 2016