DataFrame with MultiIndex columns implicitly assumes that stored data has an ndim attribute #7914

Closed
briangerke opened this issue Aug 4, 2014 · 3 comments · Fixed by #7921
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

Comments

@briangerke

I submit that this should be a valid pandas DataFrame, odd though it may appear:

import numpy as np
from pandas import DataFrame, MultiIndex

df = DataFrame([[np.mean, np.median], ['mean', 'median']],
               columns=MultiIndex.from_tuples([('functs', 'mean'),
                                               ('functs', 'median')]),
               index=['function', 'name'])

It looks like this:

In [45]: print df
                                functs                                
                                  mean                          median
function  <function mean at 0x33fb8c0>  <function median at 0x34fac80>
name                              mean                          median

However, it can't always be indexed using the .ix or .loc indexers.

This works as expected:

print df[('functs','mean')]['function']

As does this:

print df.loc['name'][('functs','mean')]

But this raises an AttributeError: np.mean has no ndim attribute, yet the indexing code assumes that it does:

print df.loc['function'][('functs','mean')]

The traceback (captured here from the single-call form df.loc['function',('functs','mean')]) is as follows:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-43-95d403db0b5a> in <module>()
----> 1 df.loc['function',('functs','mean')]

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1140     def __getitem__(self, key):
   1141         if type(key) is tuple:
-> 1142             return self._getitem_tuple(key)
   1143         else:
   1144             return self._getitem_axis(key, axis=0)

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    653     def _getitem_tuple(self, tup):
    654         try:
--> 655             return self._getitem_lowerdim(tup)
    656         except IndexingError:
    657             pass

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_lowerdim(self, tup)
    761         # we may have a nested tuples indexer here
    762         if self._is_nested_tuple_indexer(tup):
--> 763             return self._getitem_nested_tuple(tup)
    764 
    765         # we maybe be using a tuple to represent multiple dimensions here

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_nested_tuple(self, tup)
    842             # has the dim of the obj changed?
    843             # GH 7199
--> 844             if obj.ndim < current_ndim:
    845 
    846                 # GH 7516

AttributeError: 'function' object has no attribute 'ndim'

This worked for me until I upgraded to 0.14.1. Either it is a bug or I am wrong that DataFrames are type-agnostic containers. (In the latter case, it seems that some kind of error-handling should be in place to check for disallowed datatypes.)

@jreback
Contributor

jreback commented Aug 4, 2014

did you upgrade from < 0.14.0?

you should always use this form of access: df.loc['function',('functs','mean')], FYI

this is a bug, as np.isscalar(np.mean) is False (which IMHO is sort of weird). But I think I'll just check for a .ndim attribute. The indexing code does this check to figure out whether to return the object or iterate thru additional dimensions.
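
For readers following along, here is a minimal sketch of the kind of guard described above. It is not the actual patch. It uses only plain Python so it runs without pandas, and every name in it (FakeSeries, stored_function, unguarded_check, guarded_check) is hypothetical and chosen for illustration. The unguarded version mirrors the `obj.ndim < current_ndim` comparison from the traceback. The guarded version assumes that an object without an ndim attribute is a final stored value that should simply be returned.

```python
class FakeSeries(object):
    """Hypothetical stand-in for a pandas object; only .ndim matters here."""
    ndim = 1


def stored_function(values):
    """A plain callable stored as a cell value (plays the role of np.mean)."""
    return sum(values) / float(len(values))


def unguarded_check(obj, current_ndim):
    # Mirrors the comparison shown in the traceback: assumes obj has .ndim.
    return obj.ndim < current_ndim


def guarded_check(obj, current_ndim):
    # Illustrative guard (an assumption, not the real fix): an object with no
    # .ndim attribute is treated as a final value, so indexing can stop here.
    if not hasattr(obj, "ndim"):
        return True
    return obj.ndim < current_ndim


try:
    unguarded_check(stored_function, 2)
except AttributeError as exc:
    # Same failure mode as the report: functions have no ndim attribute.
    print("unguarded check failed: %s" % exc)

print("guarded check on a function: %s" % guarded_check(stored_function, 2))
print("guarded check on FakeSeries: %s" % guarded_check(FakeSeries(), 2))
```

Checking for the attribute rather than relying on np.isscalar avoids having to enumerate which stored objects count as scalars; anything that cannot be indexed further is returned as-is.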

@briangerke
Author

Thanks for the quick turnaround. I don't know if it's useful to answer your question at this point, but I don't remember what I had before upgrading. I think it was some flavor of 0.13; I don't recall going to 0.14.0 on the system I was using.

And I know I should use .loc for indexing. Using the "wrong" idiom was just a handy way to discover and demonstrate the issue.

@jreback
Contributor

jreback commented Aug 4, 2014

@briangerke that's what I figured. This path was introduced in 0.14.0 (and prob where the bug cropped up). Fixed in master now. Thanks for the report.

2 participants