DataFrame with MultiIndex columns implicitly assumes that stored data has an ndim attribute #7914

briangerke · 2014-08-04T04:51:34Z

I submit that this should be a valid pandas DataFrame, odd though it may appear

import numpy as np
from pandas import DataFrame, MultiIndex

df = DataFrame([[np.mean, np.median],['mean','median']],
               columns=MultiIndex.from_tuples([('functs','mean'),
                                               ('functs','median')]),
               index=['function', 'name'])

It looks like this:

In [45]: print df
                                functs                                
                                  mean                          median
function  <function mean at 0x33fb8c0>  <function median at 0x34fac80>
name                              mean                          median

However, it can't always be indexed using the .ix or .loc indexers.

This works as expected:

print df[('functs','mean')]['function']

As does this:

print df.loc['name'][('functs','mean')]

But this raises an AttributeError: np.mean has no ndim attribute, yet the indexing code assumes that it does:

print df.loc['function'][('functs','mean')]

The traceback is as follows:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-43-95d403db0b5a> in <module>()
----> 1 df.loc['function',('functs','mean')]

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1140     def __getitem__(self, key):
   1141         if type(key) is tuple:
-> 1142             return self._getitem_tuple(key)
   1143         else:
   1144             return self._getitem_axis(key, axis=0)

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    653     def _getitem_tuple(self, tup):
    654         try:
--> 655             return self._getitem_lowerdim(tup)
    656         except IndexingError:
    657             pass

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_lowerdim(self, tup)
    761         # we may have a nested tuples indexer here
    762         if self._is_nested_tuple_indexer(tup):
--> 763             return self._getitem_nested_tuple(tup)
    764 
    765         # we maybe be using a tuple to represent multiple dimensions here

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_nested_tuple(self, tup)
    842             # has the dim of the obj changed?
    843             # GH 7199
--> 844             if obj.ndim < current_ndim:
    845 
    846                 # GH 7516

AttributeError: 'function' object has no attribute 'ndim'

This worked for me until I upgraded to 0.14.1. Either it is a bug or I am wrong that DataFrames are type-agnostic containers. (In the latter case, it seems that some kind of error-handling should be in place to check for disallowed datatypes.)


jreback · 2014-08-04T13:34:39Z

did you upgrade from < 0.14.0?

you should always use this form of access: df.loc['function',('functs','mean')], FYI

this is a bug, as np.isscalar(np.mean) is False (which IMHO is sort of weird). But I think I'll just check for a .ndim attribute. The indexing code does this to figure out whether to return the object or iterate through additional dimensions.
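
A minimal sketch of the check described here, not the actual pandas patch. The name `is_leaf_value` is hypothetical, and the real code in pandas/core/indexing.py may be structured differently. It only illustrates why `np.isscalar` alone cannot decide when to stop descending, and how an `ndim` check covers arbitrary objects such as functions.

```python
import numpy as np


def is_leaf_value(obj):
    """Hypothetical helper: True if obj should be returned as-is
    instead of being indexed along further dimensions."""
    # np.isscalar accepts numbers, strings and NumPy scalar types,
    # but not arbitrary Python objects such as functions.
    if np.isscalar(obj):
        return True
    # An object without an ndim attribute has no further axes to index.
    return not hasattr(obj, "ndim")


print(np.isscalar(np.mean))             # False: a function is not a NumPy scalar
print(is_leaf_value(np.mean))           # function: no ndim, so treated as a leaf
print(is_leaf_value("mean"))            # string: a scalar, so a leaf
print(is_leaf_value(np.zeros((2, 2))))  # array: has ndim, so not a leaf
```

With a guard of this kind, a lookup that lands on a stored function returns it directly rather than attempting `obj.ndim`.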

briangerke · 2014-08-04T18:29:08Z

Thanks for the quick turnaround. I don't know if it's useful to answer your question at this point, but I don't remember what I had before upgrading. I think it was some flavor of 0.13; I don't recall going to 0.14.0 on the system I was using.

And I know I should use .loc for indexing. Using the "wrong" idiom was just a handy way to discover and demonstrate the issue.

jreback · 2014-08-04T18:59:52Z

@briangerke that's what I figured. This path was introduced in 0.14.0 (and that's probably where the bug cropped up). Fixed in master now. Thanks for the report.
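
For anyone who wants to confirm the fix on their own install, here is a short check, assuming a pandas version that includes it. It rebuilds the frame from the report and uses the single `.loc` call with a row label and a column tuple. The variable names `func` and `name` are illustrative only.

```python
import numpy as np
from pandas import DataFrame, MultiIndex

columns = MultiIndex.from_tuples([("functs", "mean"), ("functs", "median")])
df = DataFrame(
    [[np.mean, np.median], ["mean", "median"]],
    columns=columns,
    index=["function", "name"],
)

# One .loc call with (row label, column tuple) instead of chained indexing.
func = df.loc["function", ("functs", "mean")]
name = df.loc["name", ("functs", "mean")]

print(func is np.mean)  # the stored function object should come back unchanged
print(name)
```

On an affected version (0.14.0 or 0.14.1 per this thread), the first lookup is the one that raises the AttributeError shown in the traceback.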

jreback added the Bug label Aug 4, 2014

jreback added this to the 0.15.0 milestone Aug 4, 2014

jreback added the MultiIndex label Aug 4, 2014

jreback mentioned this issue Aug 4, 2014

REGR: Regression in multi-index indexing with a non-scalar type object (GH7914) #7921

Merged

jreback closed this as completed in #7921 Aug 4, 2014
