Label-based indexing on a Series with an index of dtype=object raises TypeError when using slices with integer bound #26491

Closed
plammens opened this issue May 22, 2019 · 1 comment · Fixed by #37675
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@plammens
Contributor

plammens commented May 22, 2019

When indexing a pandas axis that has an index of dtype=object, with label-based indexing, passing a slice that contains an integer bound results in a TypeError, even if the integer is indeed a label in the index of the series.

Details

Code Sample

import pandas as pd

series = pd.Series(range(4), index=[1, 'spam', 2, 'eggs'])
series
## 1       0
## spam    1
## 2       2
## eggs    3
## dtype: int64

series.index
## Index([1, 'spam', 2, 'eggs'], dtype='object')

series.loc['spam':'eggs']
## spam    1
## 2       2
## eggs    3
## dtype: int64

series.loc[1:'eggs']
# raises TypeError

Problem description

When indexing a pd.Series (or an equivalent pandas object) whose index has dtype=object (upcast, for example, from a list of ints and strs) with .loc's label-based indexing, passing a slice that contains an integer bound (as in 1:'eggs') raises a TypeError, even if the integer is a label in the index of the series:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1504, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1871, in _getitem_axis
    return self._get_slice_axis(key, axis=axis)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1537, in _get_slice_axis
    slice_obj.step, kind=self.name)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4784, in slice_indexer
    kind=kind)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 5002, in slice_locs
    start_slice = self.get_slice_bound(start, 'left', kind)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4914, in get_slice_bound
    label = self._maybe_cast_slice_bound(label, side, kind)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4861, in _maybe_cast_slice_bound
    self._invalid_indexer('slice', label)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 3154, in _invalid_indexer
    kind=type(key)))
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>

Other non-integer (and non-float) slice bounds are accepted, as shown above.

Is this the expected behaviour? Possibly it's a documentation issue. According to the section Selection by Label in the indexing doc page, the second warning says:

.loc is strict when you present slicers that are not compatible (or convertible) with the index type. For example using integers in a DatetimeIndex. These will raise a TypeError.

What exactly is meant by "compatible or convertible"? An int is definitely an object, so it should be compatible with the index type.

Furthermore, the first paragraph in the section reads (emphasis mine):

pandas provides a suite of methods in order to have purely label based indexing. This is a strict inclusion based protocol. Every label asked for must be in the index, or a KeyError will be raised. When slicing, both the start bound AND the stop bound are included, if present in the index. Integers are valid labels, but they refer to the label and not the position.

And, finally, the subsection Slicing with labels says:

When using .loc with slices, if both the start and the stop labels are present in the index, then elements located between the two (including them) are returned
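
The documented rule can be sketched in plain Python, independent of pandas. This is only an illustration of the inclusive-bounds behaviour described in the quote, not pandas' implementation; the helper name `slice_by_labels` is made up for this example, it assumes unique labels, and it raises ValueError (not KeyError) for a missing label.

```python
def slice_by_labels(labels, values, start=None, stop=None):
    """Return (label, value) pairs from start to stop, both bounds included.

    Bounds are looked up as labels, never as positions, so an integer
    bound such as 1 refers to the label 1 wherever it sits in the list.
    """
    lo = 0 if start is None else labels.index(start)
    hi = len(labels) - 1 if stop is None else labels.index(stop)
    return list(zip(labels[lo:hi + 1], values[lo:hi + 1]))


labels = [1, "spam", 2, "eggs"]
values = [0, 1, 2, 3]

mixed = slice_by_labels(labels, values, 1, "eggs")    # integer start bound
strings = slice_by_labels(labels, values, "spam", "eggs")
```

Under this reading, an integer bound is treated like any other label, which is the behaviour the documentation appears to promise.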

By reading this, to me it would seem that using a slice such as 1:'eggs' or 1:2 in the above example with loc should be perfectly valid. The indexers are present in the index, and, as stated above, integers are valid labels.


The exception originates at the _maybe_cast_slice_bound check, which rejects all integers regardless of whether the Index contains any integers:

def _maybe_cast_slice_bound(self, label, side, kind):
    assert kind in ['ix', 'loc', 'getitem', None]
    # We are a plain index here (sub-class override this method if they
    # wish to have special treatment for floats/ints, e.g. Float64Index and
    # datetimelike Indexes
    # reject them
    if is_float(label):
        if not (kind in ['ix'] and (self.holds_integer() or
                                    self.is_floating())):
            self._invalid_indexer('slice', label)
    # we are trying to find integer bounds on a non-integer based index
    # this is rejected (generally .loc gets you here)
    elif is_integer(label):
        self._invalid_indexer('slice', label)
    return label
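
For contrast, here is a standalone sketch of a check that takes the index contents into account. It is purely hypothetical: the function name `validate_slice_bound` and its behaviour are assumptions made for illustration, not pandas code and not necessarily what any fix does. It only shows that an integer bound could be accepted when it is an actual label.

```python
import numbers


def validate_slice_bound(labels, bound):
    """Hypothetical check: accept an integer bound only if it is a label.

    Non-integer bounds pass through unchanged; bool is excluded from the
    integer test because it is a subclass of int.
    """
    is_int = isinstance(bound, numbers.Integral) and not isinstance(bound, bool)
    if is_int and bound not in labels:
        raise TypeError(
            f"cannot do slice indexing with integer bound {bound!r} "
            "that is not a label of the index"
        )
    return bound


labels = [1, "spam", 2, "eggs"]
accepted = validate_slice_bound(labels, 1)       # 1 is a label, so it passes
passthrough = validate_slice_bound(labels, "eggs")
```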

Expected Output

I expected series.loc[1:'eggs'] not to raise TypeError because of the integer label. I expected that expression to return the following slice view of the Series:

1       0
spam    1
2       2
eggs    3
dtype: int64
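
On versions where the slice raises, the same result can be obtained by resolving each label to a position and slicing positionally. This is a workaround sketch, assuming the index labels are unique (so `Index.get_loc` returns a single integer); the helper name `loc_slice_by_position` is made up for this example.

```python
import pandas as pd

series = pd.Series(range(4), index=[1, "spam", 2, "eggs"])


def loc_slice_by_position(obj, start_label, stop_label):
    """Emulate an inclusive label slice via positional slicing.

    Assumes unique labels: get_loc then returns one integer position.
    """
    start = obj.index.get_loc(start_label)
    stop = obj.index.get_loc(stop_label)
    # iloc excludes the stop position, so add 1 to include the stop label.
    return obj.iloc[start:stop + 1]


result = loc_slice_by_position(series, 1, "eggs")
```

Because the bounds are converted with `get_loc`, the integer 1 is looked up as a label rather than used as a position.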

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: 6d2398a58fda68e40f116f199439504558c7774c
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.25.0.dev0+598.g6d2398a58
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.3.0
pyarrow: 0.13.0
xarray: 0.12.1
IPython: 7.5.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: 1.8.1
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.1.0
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: 0.2.1
fastparquet: 0.3.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None
@TomAugspurger
Contributor

TomAugspurger commented May 22, 2019 via email

@jbrockmendel jbrockmendel added the Indexing Related to indexing on series/frames, not to indexes themselves label Jul 23, 2019
@jreback jreback added this to the 1.2 milestone Nov 8, 2020
@jreback jreback added the Bug label Nov 8, 2020