Label-based indexing on a Series with an index of dtype=object raises TypeError when using slices with integer bound #26491
This raising probably isn't intentional.
…On Wed, May 22, 2019 at 11:39 AM Paolo Lammens ***@***.***> wrote:
Code Sample
import pandas as pd

series = pd.Series(range(4), index=[1, 'spam', 2, 'eggs'])
series
## 1       0
## spam    1
## 2       2
## eggs    3
## dtype: int64
series.index
## Index([1, 'spam', 2, 'eggs'], dtype='object')
series.loc['spam':'eggs']
## spam    1
## 2       2
## eggs    3
## dtype: int64
series.loc[1:'eggs']  # raises TypeError
Problem description
When indexing a pd.Series (or an equivalent pandas object) that has an
index of dtype=object (upcasted, for example, from a list of ints and strs),
with .loc's label-based indexing, passing a slice that contains an
integer bound (as in 1:'eggs') results in a TypeError, even if the
integer *is indeed a label* in the index of the series:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1504, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1871, in _getitem_axis
return self._get_slice_axis(key, axis=axis)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexing.py", line 1537, in _get_slice_axis
slice_obj.step, kind=self.name)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4784, in slice_indexer
kind=kind)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 5002, in slice_locs
start_slice = self.get_slice_bound(start, 'left', kind)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4914, in get_slice_bound
label = self._maybe_cast_slice_bound(label, side, kind)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 4861, in _maybe_cast_slice_bound
self._invalid_indexer('slice', label)
File "C:\Users\Paolo\Code\PycharmProjects\pandas\pandas\core\indexes\base.py", line 3154, in _invalid_indexer
kind=type(key)))
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>
Other non-integer (and non-float) slice bounds are accepted, as shown
above.
Is this the expected behaviour? Possibly it's a documentation issue.
According to the section Selection by Label
<http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-label>
in the indexing doc page, the second warning says:
.loc is strict when you present slicers that are not compatible (or
convertible) with the index type. For example using integers in a
DatetimeIndex. These will raise a TypeError.
What exactly is meant by "compatible or convertible"? An int is
definitely an object, so it should be compatible with the index type.
Furthermore, the first paragraph in the section reads (emphasis mine):
pandas provides a suite of methods in order to have purely label based
indexing. This is a strict inclusion based protocol. Every label asked for
must be in the index, or a KeyError will be raised. When slicing, both the
start bound AND the stop bound are included, if present in the index. *Integers
are valid labels*, but they refer to the label and not the position.
And, finally, the subsection Slicing with labels
<http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#slicing-with-labels>
says:
When using .loc with slices, if both the start and the stop labels are
present in the index, then elements located between the two (including
them) are returned
By reading this, to me it would seem that using a slice such as 1:'eggs'
or 1:2 in the above example with the loc indexer should be perfectly
valid. The indexer is present in the index, and, as stated above, integers
are valid labels.
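
To make that reading of the docs concrete, here is a small pure-Python sketch of the inclusive, label-based semantics described above. It is not pandas code: label_slice is a hypothetical helper written only for illustration, and it assumes the labels are unique and that both bounds are present.

```python
def label_slice(labels, values, start=None, stop=None):
    """Return (label, value) pairs from start to stop, both inclusive.

    Hypothetical helper illustrating the documented slicing rule;
    assumes unique labels and that any given bound is present.
    """
    # list.index compares by equality, so an int bound matches an int label.
    lo = 0 if start is None else labels.index(start)
    hi = len(labels) - 1 if stop is None else labels.index(stop)
    return list(zip(labels[lo:hi + 1], values[lo:hi + 1]))


labels = [1, "spam", 2, "eggs"]
values = [0, 1, 2, 3]

print(label_slice(labels, values, "spam", "eggs"))
print(label_slice(labels, values, 1, "eggs"))
print(label_slice(labels, values, 1, 2))
```

Under this reading, an integer bound is treated like any other label: it is looked up by equality, never by position.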
------------------------------
The exception originates at the _maybe_cast_slice_bound check, which
rejects all integers regardless of whether the Index contains any
integers:
https://github.com/pandas-dev/pandas/blob/6d2398a58fda68e40f116f199439504558c7774c/pandas/core/indexes/base.py#L4846-L4863
Maybe it should first check if self.holds_integer?
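
One way to picture a less strict check is sketched below. This is a standalone illustration, not the pandas implementation: check_slice_bound and its arguments are hypothetical names, and it uses a plain membership test instead of holds_integer, purely as an assumption about what a per-label check could look like.

```python
import numbers


def check_slice_bound(index_labels, label):
    """Return label if it can serve as a slice bound, else raise TypeError.

    Hypothetical stand-in for the check discussed above: a numeric
    bound is rejected only when it is not one of the index labels.
    """
    is_number = isinstance(label, numbers.Real) and not isinstance(label, bool)
    if is_number and label not in index_labels:
        raise TypeError(
            f"cannot do slice indexing with {label!r} of {type(label)}"
        )
    return label


index_labels = [1, "spam", 2, "eggs"]
print(check_slice_bound(index_labels, 1))       # accepted: 1 is a label
print(check_slice_bound(index_labels, "spam"))  # accepted: non-numeric bound
```

With such a check, a bound like 3, which is not a label in the example index, would still raise TypeError, while 1 and 2 would pass through.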
Expected Output
I expected series.loc[1:'eggs'] not to raise TypeError because of the
integer label. I expected that expression to return the following slice
view of the Series:
1 0
spam 1
2 2
eggs 3
dtype: int64
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: 6d2398a
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.25.0.dev0+598.g6d2398a58
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.3.0
pyarrow: 0.13.0
xarray: 0.12.1
IPython: 7.5.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: 1.8.1
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.1.0
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: 0.2.1
fastparquet: 0.3.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None