Bug in Fancy/Boolean Indexing with nested lists #2702

jim22k · 2013-01-15T21:43:41Z

Fancy or Boolean indexing on a Series has two strange behaviors. My examples only show the behavior with Fancy indexing, but it's the same for Boolean indexing.

LHS vs RHS length

    >>> s = pd.Series(list('abc'))
    >>> s[[0,1,2]] = range(27)
    >>> list(s)
    [0, 1, 2]

I would have expected an error, similar to what I get with slice indexing

    >>> s = pd.Series(list('abc'))
    >>> s[0:3] = range(27)
    ValueError: cannot copy sequence with size 27 to array axis with dimension 3

An even odder behavior is when you have too few items in the RHS

    >>> s = pd.Series(list('abc'))
    >>> s[[0,1,2]] = range(2)
    >>> list(s)
    [0, 1, 0]

It seems to be using something like itertools.cycle, which seems very arbitrary to me.
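
A minimal sketch of the cycling pattern described above. It only models the observed output and makes no claim about what pandas or NumPy does internally; `cycled_fill` is a hypothetical name used for illustration.

```python
from itertools import cycle, islice


def cycled_fill(values, n):
    """Return the first n items of `values` repeated end to end."""
    return list(islice(cycle(values), n))


# RHS shorter than the three target slots: the values wrap around.
short_fill = cycled_fill(range(2), 3)
# RHS longer than the three target slots: the extra values are dropped.
long_fill = cycled_fill(range(27), 3)

print(short_fill)
print(long_fill)
```

Both results match the lists printed in the two examples above, which is why the behavior looks like cycling plus truncation.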

Nested RHS

This may seem like a strange use of pandas, but I need to store Python lists

    >>> s = pd.Series(list('abc'))
    >>> s[[0,1,2]] = [[100,200], [300,400], [500,600]]
    >>> list(s)
    [100, 200, 300]

Very strange. It's like it flattens the input first.
But this flattening only happens if the nested levels are all the same size.

    >>> s = pd.Series(list('abc'))
    >>> s[[0,1,2]] = [[100,200], [300,400], [500,600, 601, 602]]
    >>> list(s)
    [[100,200], [300,400], [500,600, 601, 602]]

I know in numpy the array constructor would make a distinction between these two inputs, so maybe that's the reason for the difference, but I still don't see why ndarrays are being flattened.
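
A short sketch of the constructor distinction mentioned above, assuming a reasonably recent NumPy. `dtype=object` is passed explicitly here as an assumption of this example, so that both calls are well-defined; the variable names are illustrative only.

```python
import numpy as np

uniform = [[100, 200], [300, 400], [500, 600]]
ragged = [[100, 200], [300, 400], [500, 600, 601, 602]]

# dtype=object is given explicitly; without it, newer NumPy releases
# reject the ragged input instead of building an object array.
uniform_arr = np.array(uniform, dtype=object)
ragged_arr = np.array(ragged, dtype=object)

print(uniform_arr.shape)  # 2-D: every inner list has the same length
print(ragged_arr.shape)   # 1-D: the inner lists stay as Python list objects
```

The equal-length input becomes a 2-D array of six scalars, while the ragged input becomes a 1-D array holding three lists, which is consistent with the two different results shown above.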

I can work around the issue by converting the RHS to a 1-D array and passing that in.

    >>> s = pd.Series(list('abc'))
    >>> rhs = np.empty(3).astype('object')
    >>> rhs[:] = [[100,200], [300,400], [500,600]]
    >>> s[[0,1,2]] = rhs
    >>> list(s)
    [[100,200], [300,400], [500,600]]
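
The same workaround can be wrapped in a small helper. This is only a sketch: `as_object_1d` is a hypothetical name, and it fills the array one slot at a time so that NumPy never gets a chance to interpret the nested lists as a 2-D array.

```python
import numpy as np


def as_object_1d(items):
    """Pack each item of `items` into one slot of a 1-D object array."""
    items = list(items)
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # integer-index assignment stores the object itself
    return out


rhs = as_object_1d([[100, 200], [300, 400], [500, 600]])
print(rhs.shape)
print(rhs[0])
```

The resulting `rhs` can then be used on the right-hand side of the assignment in place of the nested list.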

Slice indexing doesn't have this problem at all

    >>> s = pd.Series(list('abc'))
    >>> s[0:3] = [[100,200], [300,400], [500,600]]
    >>> list(s)
    [[100,200], [300,400], [500,600]]

My Question: Are these behaviors a bug or a "feature"? I think Fancy/Boolean indexing should operate the same as slice indexing -- i.e. check for matching lengths and don't auto-convert to numpy array.


wesm · 2013-01-20T19:06:11Z

Oh boy. Hitting a bunch of buggy/underspecified NumPy stuff here. I'm having a look but may kick this can down the road

wesm · 2013-01-20T19:09:29Z

This is all NumPy behavior. It's going to be too much work for me to fix this anytime soon. I'm already completely fed up with the NumPy library, so I would like to overhaul all this mess to make it consistent at some point in the future.

jim22k · 2013-01-21T15:35:34Z

You're right. I just validated the same bugs on a plain ndarray. Do you think there is any value in raising this issue on a NumPy forum?

Thanks for looking into these corner cases. Pandas just keeps getting better and I find myself using it more and more when dealing with any non-trivial dataset.

jtratner · 2013-09-05T00:27:49Z

@jreback is this resolved for pandas now that Series isn't an ndarray anymore?

cpcloud · 2013-09-05T00:31:44Z

Did I miss something? Series is no longer an NDFrame?

jreback · 2013-09-05T00:36:25Z

I will take a look - haven't seen this issue before

jtratner · 2013-09-05T00:39:47Z

@cpcloud whoops! miswrote - meant no longer an ndarray

cpcloud · 2013-09-05T00:46:31Z

@jtratner No worries! Figured it was something like that....just wanted to stay in the loop!

jreback · 2013-09-05T15:00:16Z

Making all of these act the same is easy; it's just an extension in where. Right now, for ndim==1, we basically handle a single element and a single list element on the rhs, as well as a boolean indexer that matches the rhs.

So this is good (#2745):

    In [3]: s = Series([1, 2])

    In [4]: s[[True, False]] = [0, 1]

    In [5]: s
    Out[5]:
    0    0
    1    2
    dtype: int64

Otherwise it is converted to an ndarray. So we just need to deal with the shorter/longer ones and raise a ValueError.
https://github.com/pydata/pandas/blob/master/pandas/core/generic.py#L2285
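
A rough sketch of the kind of length check described above. This is not the code merged for this issue; `check_rhs_length` and its error message are hypothetical, and the sketch assumes the right-hand side is list-like and should match the indexer one-to-one.

```python
def check_rhs_length(indexer, values):
    """Raise ValueError unless `values` has exactly one item per indexer entry."""
    if len(values) != len(indexer):
        raise ValueError(
            f"cannot set {len(values)} values using an indexer of length {len(indexer)}"
        )


# Boolean indexer and rhs of the same length: passes silently.
check_rhs_length([True, False], [0, 1])

# A longer rhs is rejected instead of being truncated.
try:
    check_rhs_length([0, 1, 2], range(27))
except ValueError as exc:
    too_long_error = str(exc)

# A shorter rhs is rejected instead of being cycled.
try:
    check_rhs_length([0, 1, 2], range(2))
except ValueError as exc:
    too_short_error = str(exc)

print(too_long_error)
print(too_short_error)
```

With a check like this in front of the assignment, the mismatched cases from the original report would fail loudly, the same way the slice-indexing example does.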

ghost assigned wesm Jan 20, 2013

jreback mentioned this issue Sep 5, 2013

BUG/ER: (GH2702) Bug with Series indexing not raising an error when the right-hand-side has an incorrect length #4756

Merged

jreback closed this as completed in #4756 Sep 6, 2013
