Label-based indexing on a Series with an index of dtype=object raises TypeError when using slices with integer bound #26491

plammens · 2019-05-22T16:39:13Z

With label-based indexing on a pandas object whose index has dtype=object, passing a slice that contains an integer bound raises a TypeError, even if the integer is a label in the index.

Details

Code Sample

import pandas as pd

series = pd.Series(range(4), index=[1, 'spam', 2, 'eggs'])
series
## 1       0
## spam    1
## 2       2
## eggs    3
## dtype: int64

series.index
## Index([1, 'spam', 2, 'eggs'], dtype='object')

series.loc['spam':'eggs']
## spam    1
## 2       2
## eggs    3
## dtype: int64

series.loc[1:'eggs']
# raises TypeError

Problem description

When indexing a pd.Series (or an equivalent pandas object) whose index has dtype=object (upcast, for example, from a list of ints and strs) with .loc's label-based indexing, passing a slice that contains an integer bound (as in 1:'eggs') raises a TypeError, even if the integer is a label in the index of the series:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1504, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1871, in _getitem_axis
    return self._get_slice_axis(key, axis=axis)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1537, in _get_slice_axis
    slice_obj.step, kind=self.name)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4784, in slice_indexer
    kind=kind)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 5002, in slice_locs
    start_slice = self.get_slice_bound(start, 'left', kind)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4914, in get_slice_bound
    label = self._maybe_cast_slice_bound(label, side, kind)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4861, in _maybe_cast_slice_bound
    self._invalid_indexer('slice', label)
  File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 3154, in _invalid_indexer
    kind=type(key)))
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>

Other non-integer (and non-float) slice bounds are accepted, as shown above.

Is this the expected behaviour? Possibly it's a documentation issue. According to the section Selection by Label in the indexing doc page, the second warning says:

.loc is strict when you present slicers that are not compatible (or convertible) with the index type. For example using integers in a DatetimeIndex. These will raise a TypeError.

What exactly is meant by "compatible or convertible"? An int is definitely an object, so it should be compatible with the index type.

Furthermore, the first paragraph in the section reads (emphasis mine):

pandas provides a suite of methods in order to have purely label based indexing. This is a strict inclusion based protocol. Every label asked for must be in the index, or a KeyError will be raised. When slicing, both the start bound AND the stop bound are included, if present in the index. *Integers are valid labels*, but they refer to the label and not the position.

And, finally, the subsection Slicing with labels says:

When using .loc with slices, if both the start and the stop labels are present in the index, then elements located between the two (including them) are returned

Reading this, it seems to me that a slice such as 1:'eggs' or 1:2 in the above example should be perfectly valid with .loc. Both bounds are present in the index, and, as stated above, integers are valid labels.
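The semantics quoted from the documentation can be modeled without pandas. The following is a minimal sketch, assuming unique labels; `inclusive_label_slice` is a hypothetical helper written for illustration, not a pandas function. It shows that the bound 1 is looked up as a label (found at position 0), not used as a position.

```python
def inclusive_label_slice(labels, values, start, stop):
    """Return (label, value) pairs from start to stop, both bounds included.

    Hypothetical helper for illustration; assumes every label is unique.
    """
    # list.index compares by equality, so each bound is looked up
    # as a label, never as a position.
    try:
        first = labels.index(start)
        last = labels.index(stop)
    except ValueError as exc:
        raise KeyError(f"label not found: {exc}") from exc
    return list(zip(labels[first:last + 1], values[first:last + 1]))


labels = [1, 'spam', 2, 'eggs']
values = [0, 1, 2, 3]

print(inclusive_label_slice(labels, values, 1, 'eggs'))
# [(1, 0), ('spam', 1), (2, 2), ('eggs', 3)]
print(inclusive_label_slice(labels, values, 1, 2))
# [(1, 0), ('spam', 1), (2, 2)]
```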

The exception originates at the _maybe_cast_slice_bound check, which rejects all integers regardless of whether the Index contains any integers:

pandas/pandas/core/indexes/base.py

Lines 4846 to 4863 in 6d2398a

    
    def _maybe_cast_slice_bound(self, label, side, kind):
        assert kind in ['ix', 'loc', 'getitem', None]

        # We are a plain index here (sub-class override this method if they
        # wish to have special treatment for floats/ints, e.g. Float64Index and
        # datetimelike Indexes
        # reject them
        if is_float(label):
            if not (kind in ['ix'] and (self.holds_integer() or
                                        self.is_floating())):
                self._invalid_indexer('slice', label)

        # we are trying to find integer bounds on a non-integer based index
        # this is rejected (generally .loc gets you here)
        elif is_integer(label):
            self._invalid_indexer('slice', label)

        return label
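One possible relaxation of the integer branch would be to reject an integer bound only when the index holds no integer labels at all. The sketch below models that idea in plain Python. `check_slice_bound`, `is_integer_label`, and the list-based index are assumptions made for illustration; none of this is pandas code, and it does not claim to match any fix made in pandas.

```python
def is_integer_label(label):
    # bool is a subclass of int in Python; exclude it so True/False
    # are not mistaken for integer labels.
    return isinstance(label, int) and not isinstance(label, bool)


def check_slice_bound(index_labels, label):
    """Return label if it is acceptable as a slice bound, else raise TypeError.

    Hypothetical stand-in for the integer branch of _maybe_cast_slice_bound.
    """
    holds_integer = any(is_integer_label(item) for item in index_labels)
    if is_integer_label(label) and not holds_integer:
        raise TypeError(
            f"cannot do slice indexing with integer bound {label!r} "
            "on an index that has no integer labels"
        )
    return label


# Accepted: the index contains integer labels.
print(check_slice_bound([1, 'spam', 2, 'eggs'], 1))

# Rejected: the index contains no integer labels.
try:
    check_slice_bound(['spam', 'eggs'], 1)
except TypeError as exc:
    print(f"TypeError: {exc}")
```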

Expected Output

I expected series.loc[1:'eggs'] not to raise TypeError because of the integer label. I expected that expression to return the following slice view of the Series:

1       0
spam    1
2       2
eggs    3
dtype: int64
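Until .loc accepts the integer bound, the expected result can be obtained by translating the labels to positions. This is a minimal sketch, assuming unique labels so that Index.get_loc returns a single integer; `loc_slice_inclusive` is a hypothetical helper, not part of pandas. It tries .loc first and falls back to positional slicing only if a TypeError is raised.

```python
import pandas as pd


def loc_slice_inclusive(series, start, stop):
    """Slice series from label start to label stop, both included.

    Hypothetical helper; assumes unique labels, so get_loc returns an int.
    """
    try:
        # Works on pandas versions where .loc accepts the integer bound.
        return series.loc[start:stop]
    except TypeError:
        # Fall back to positions looked up from the labels.
        first = series.index.get_loc(start)
        last = series.index.get_loc(stop)
        return series.iloc[first:last + 1]


series = pd.Series(range(4), index=[1, 'spam', 2, 'eggs'])
result = loc_slice_inclusive(series, 1, 'eggs')
print(list(result.index))  # [1, 'spam', 2, 'eggs']
```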

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: 6d2398a58fda68e40f116f199439504558c7774c
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.25.0.dev0+598.g6d2398a58
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.3.0
pyarrow: 0.13.0
xarray: 0.12.1
IPython: 7.5.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: 1.8.1
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.1.0
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: 0.2.1
fastparquet: 0.3.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None

TomAugspurger · 2019-05-22T19:41:52Z

This raising probably isn't intentional.

jbrockmendel added the Indexing label (related to indexing on series/frames, not to indexes themselves) Jul 23, 2019

phofl mentioned this issue Nov 6, 2020

Bug in loc raised for numeric label even when label is in Index #37675

Merged

5 tasks

jreback added this to the 1.2 milestone Nov 8, 2020

jreback added the Bug label Nov 8, 2020

jreback closed this as completed in #37675 Nov 9, 2020
