Skip to content

BUG: Index dtype may not be applied properly #11017

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Sep 7, 2015

Fixed 2 problems:

  • Specified dtype is not applied to other iterables.
import numpy as np
import pandas as pd

pd.Index([1, 2, 3], dtype=int)
# Index([1, 2, 3], dtype='object')
  • Specifying category to ndarray-like results in TypeError
pd.Index(np.array([1, 2, 3]), dtype='category')
# TypeError: data type "category" not understood

@sinhrks sinhrks added Bug Indexing Related to indexing on series/frames, not to indexes themselves Dtype Conversions Unexpected or buggy dtype conversions and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Sep 7, 2015
@sinhrks sinhrks added this to the 0.17.0 milestone Sep 7, 2015
@jreback
Copy link
Contributor

jreback commented Sep 7, 2015

pls run perf check on this -

@jreback
Copy link
Contributor

jreback commented Sep 9, 2015

@sinhrks can you see if this affects perf?

if is_categorical_dtype(data) or is_categorical_dtype(dtype):
return CategoricalIndex(data, copy=copy, name=name, **kwargs)

from pandas.tseries.index import DatetimeIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would leave all of the imports where they were, no reason to actually import them until they are used (see below)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually on 2nd thought this is fine

@sinhrks
Copy link
Member Author

sinhrks commented Sep 10, 2015

Following is a asv result. Will consider a better path.

    before     after       ratio
  [789e07d9] [ce89f9d1]
    33.94μs    44.57μs      1.31  ctors.index_from_series_ctor.time_index_from_series_ctor
   121.86μs   173.21μs      1.42  frame_methods.frame_get_dtype_counts.time_frame_get_dtype_co
    16.45μs    27.07μs      1.65  index_object.index_float64_construct.time_index_float64_construct

@jreback
Copy link
Contributor

jreback commented Sep 10, 2015

this is prob just from the imports (e.g. a Float64Index) doesn't care about Datetimeindex, so the import adds the extra time (or the check for the import anyhow)

@sinhrks
Copy link
Member Author

sinhrks commented Sep 11, 2015

@jreback Thanks, suggested changes improve perf a little. I assume other slowness is caused by categorical condition moved to the top.

All benchmarks:

    before     after       ratio
  [4d4a2e33] [c3eaeb07]
    33.85μs    41.24μs      1.22  ctors.index_from_series_ctor.time_index_from_series_ctor
   129.48μs   160.86μs      1.24  frame_methods.frame_get_dtype_counts.time_frame_get_dtype_counts
    16.36μs    22.88μs      1.40  index_object.index_float64_construct.time_index_float64_construct

@jreback
Copy link
Contributor

jreback commented Sep 11, 2015

master (this is index_object.index_float64_construct.time_index_float64_construct) benchmark

In [1]: arr = np.arange(1000000.0)
In [2]: %timeit Index(arr)
The slowest run took 12.23 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 15.7 µs per loop

this branch

In [2]: %timeit Index(arr)
The slowest run took 7.49 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 19.8 µs per loop

fix jreback@29c0325

In [2]: %timeit Index(arr)
The slowest run took 11.74 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 13.8 µs per loop

will merge in a bit, thanks @sinhrks

@sinhrks
Copy link
Member Author

sinhrks commented Sep 11, 2015

Wow, great. Thanks!

@jreback
Copy link
Contributor

jreback commented Sep 11, 2015

merged via ead3ca8 (my change in another commit)

thanks!

I don't believe we had an issue assosicated, correct?

@sinhrks
Copy link
Member Author

sinhrks commented Sep 11, 2015

Though this was based on gitter chat, #5196 refers to the same issue. Closed.

@sinhrks sinhrks deleted the index_dtype branch September 11, 2015 14:29
@jreback
Copy link
Contributor

jreback commented Sep 11, 2015

@sinhrks awesome thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants