BUG/Not Implemented Panel.to_frame() with MultiIndex #5402

Closed
TomAugspurger opened this issue Oct 31, 2013 · 8 comments · Fixed by #5417
Labels
API Design; Bug; Indexing (related to indexing on series/frames, not to indexes themselves); Internals (related to non-user accessible pandas implementation); MultiIndex; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Comments

@TomAugspurger
Contributor

Should this be doable?

In [39]: df = pd.DataFrame({'A': [1, 2], 'B': pd.to_datetime(['a', 'b'])},
                  index=pd.MultiIndex.from_tuples([(1, 'one'), (1, 'two')]))

In [40]: df
Out[40]: 
       A  B
1 one  1  a
  two  2  b

In [41]: wp = pd.Panel({'i1': df, 'i2': df})

In [42]: wp.to_frame()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-e49d9f2f9609> in <module>()
----> 1 wp.to_frame()

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.12.0_993_gda89834-py2.7-macosx-10.8-x86_64.egg/pandas/core/panel.pyc in to_frame(self, filter_observations)
    846         index = MultiIndex(levels=[self.major_axis, self.minor_axis],
    847                            labels=[major_labels, minor_labels],
--> 848                            names=[maj_name, min_name], verify_integrity=False)
    849 
    850         return DataFrame(data, index=index, columns=self.items)

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.12.0_993_gda89834-py2.7-macosx-10.8-x86_64.egg/pandas/core/index.pyc in __new__(cls, levels, labels, sortorder, names, copy, verify_integrity)
   1880         if names is not None:
   1881             # handles name validation
-> 1882             subarr._set_names(names)
   1883 
   1884         if sortorder is not None:

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.12.0_993_gda89834-py2.7-macosx-10.8-x86_64.egg/pandas/core/index.pyc in _set_names(self, values, validate)
   2150         # set the name
   2151         for name, level in zip(values, self.levels):
-> 2152             level.rename(name, inplace=True)
   2153 
   2154     names = property(

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.12.0_993_gda89834-py2.7-macosx-10.8-x86_64.egg/pandas/core/index.pyc in set_names(self, names, inplace)
    333         """
    334         if not com.is_list_like(names):
--> 335             raise TypeError("Must pass list-like as `names`.")
    336         if inplace:
    337             idx = self

TypeError: Must pass list-like as `names`.

I think the issue comes when the index of the lower dimensional DataFrame (df in this case) is already a MultiIndex. These two work:

In [45]: wp.transpose(1, 0, 2).to_frame()
Out[45]: 
              1    
            one two
major minor        
i1    A       1   2
      B       a   b
i2    A       1   2
      B       a   b

In [46]: wp.transpose(1, 2, 0).to_frame()
Out[46]: 
              1    
            one two
major minor        
A     i1      1   2
      i2      1   2
B     i1      a   b
      i2      a   b
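
The failing call at the bottom of the traceback can be reproduced without Panel, which later pandas releases removed. This is a minimal sketch, assuming a recent pandas; the level names `num` and `word` are made up for illustration. It shows that a MultiIndex rejects a single scalar name, which is what happens when a MultiIndex is used as one "level" and handed one name:

```python
import pandas as pd

# A two-level index like the one on `df` in the report.
mi = pd.MultiIndex.from_tuples([(1, 'one'), (1, 'two')])

# A MultiIndex needs one name per level, so a bare scalar is rejected.
try:
    mi.set_names('major')
    scalar_name_error = None
except TypeError as exc:
    scalar_name_error = exc

# A list-like with one name per level is accepted.
named = mi.set_names(['num', 'word'])
```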

I was expecting that wp.to_frame() would create a new MultiIndex with 3 levels:

In [63]: df = pd.DataFrame({'A': [1, 2, 1, 2], 'B': pd.to_datetime(['a', 'b', 'a', 'b'])},
                  index=pd.MultiIndex.from_tuples([('i1', 1, 'one'), ('i1', 1, 'two'), ('i2', 1, 'one'), ('i2', 1, 'two')]))

In [64]: df
Out[64]: 
          A  B
i1 1 one  1  a
     two  2  b
i2 1 one  1  a
     two  2  b

The position of the new level (holding wp.items) within the MultiIndex is ambiguous, but something like that. You could always call swaplevel later.
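
The expected shape can be sketched without Panel. This is not what Panel.to_frame() does; it uses pd.concat as a stand-in, assuming a recent pandas, and uses plain strings for column B:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 'one'), (1, 'two')])
df = pd.DataFrame({'A': [1, 2], 'B': ['a', 'b']}, index=index)

# Concatenating a dict of frames prepends the dict keys as a new outermost level.
stacked = pd.concat({'i1': df, 'i2': df})

# The new level can be moved afterwards, here to the innermost position.
# reorder_levels only permutes the levels; it does not sort the rows.
inner = stacked.reorder_levels([1, 2, 0])
```

Either ordering carries the same information, which is why the choice of where to insert the items level is a convention rather than a correctness question.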

(Side note to myself: check whether verify_integrity is the same thing as validate in MultiIndex land. It doesn't get passed to _set_names.)

@jtratner
Contributor

verify_integrity vs validate:

verify_integrity means 'everything matches up' and is non-trivial (i.e. all labels are the same length, and no label is >= the length of the corresponding levelset).

validate means 'check that the length of the names/levels/labels is the same as the length of the other levels/names/labels, or vice versa'. You need this because at some point you have to set levels, labels, and names in the constructor, and at that point some of those things don't exist yet.

So validate is internal to MI, whereas verify_integrity is more external (internal functions that reconstruct an MI already know the result is valid integrity-wise, so they can set it to False).

It doesn't matter whether you verify integrity on set_names, because names just get set on the levelsets themselves and don't matter for integrity purposes (if the lengths match, you're golden).
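
A rough illustration of the verify_integrity half of this, assuming a recent pandas in which the constructor argument called `labels` in this thread is named `codes`. The internal validate flag is not part of the public API and is not shown:

```python
import pandas as pd

levels = [['a', 'b'], ['x', 'y']]
bad_codes = [[0, 2], [0, 1]]  # 2 is out of range for a two-element level

# With the check enabled, the out-of-range code is caught at construction.
try:
    pd.MultiIndex(levels=levels, codes=bad_codes, verify_integrity=True)
    integrity_error = None
except ValueError as exc:
    integrity_error = exc

# With the check skipped, the object is built without complaint. It is
# inconsistent and should not be used; this is only safe for callers that
# already know their levels and codes match up.
unchecked = pd.MultiIndex(levels=levels, codes=bad_codes, verify_integrity=False)
```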

@jtratner
Contributor

edited my comment to be complete explanation.

@jtratner
Contributor

In particular, this means that doing assert len(mi.names) == len(mi.levels) is totally meaningless, because mi.names iterates over levels to return the names. It might make sense to cache that eventually though...
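
A small sketch of the observable behaviour, assuming a recent pandas. It says nothing about how names are stored internally in any particular version, only that the number of names always tracks the number of levels and that a mismatched list is rejected up front:

```python
import pandas as pd

mi = pd.MultiIndex.from_product([[1, 2], ['one', 'two']], names=['num', 'word'])

# There is always exactly one name per level.
same_length = len(mi.names) == len(mi.levels)

# Supplying the wrong number of names fails instead of producing a mismatch.
try:
    mi.set_names(['only_one'])
    length_error = None
except ValueError as exc:
    length_error = exc
```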

@TomAugspurger
Contributor Author

Ok thanks for the explanation. The thread where you worked on MultiIndex was also hugely informative.

Are there any fundamental objections to being able to do Panel.to_frame when we'd be shoving a new level into a MultiIndex? I may be missing edge cases but I don't see any problems. The example I gave at the bottom of my OP should have probably put the new level (the items i1 and i2) on the innermost level of the MultiIndex to be consistent with stack(). I can look at this over the weekend.
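
For reference, the stack() convention mentioned here can be seen on a small frame. A minimal sketch, assuming a recent pandas and integer-only data to keep the result dtype simple:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 'one'), (1, 'two')])
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=index)

# stack() moves the column labels into the index as a new innermost level.
stacked = df.stack()
innermost = set(stacked.index.get_level_values(-1))
```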

@jreback
Contributor

jreback commented Nov 1, 2013

@TomAugspurger would welcome this. to_frame is probably not tested all that well (this is related to your other issue), and there are some construction issues in Panel as well (from dicts).

@TomAugspurger
Contributor Author

Ok. I'll write some more tests to go along with this.

The bug comes from trying to build the result MultiIndex. The major and minor indices get passed to the MultiIndex constructor (as the levels argument). This isn't a problem when they're plain Indexes, but in this case it's trying to pass a MultiIndex instance to the MultiIndex constructor. Are there any other cases where MultiIndex() accepting another (smaller) MultiIndex would be useful? If not, it should be pretty easy to put a fix in Panel.to_frame(): just flatten things out and replicate the labels for each additional level.
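
The "flatten things out" idea might look roughly like the following. This is an illustrative sketch only, not the patch that went into #5417; the names `num`, `word`, and `minor` are assumptions, and it builds the index from tuples rather than from levels and labels:

```python
import itertools

import pandas as pd

major = pd.MultiIndex.from_tuples([(1, 'one'), (1, 'two')], names=['num', 'word'])
minor = pd.Index(['A', 'B'], name='minor')

# Iterating a MultiIndex yields tuples, so each major entry can be extended
# by one element instead of being nested inside a new MultiIndex.
tuples = [maj + (mnr,) for maj, mnr in itertools.product(major, minor)]
result = pd.MultiIndex.from_tuples(tuples, names=list(major.names) + [minor.name])
```

The result has one level per level of the major axis plus one for the minor axis, so no MultiIndex is ever passed in as a single level.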

@jtratner
Contributor

jtratner commented Nov 1, 2013

It's a little weird, but not impossible, for MultiIndex(MultiIndex) to just return the object. It just means that both __init__ and __new__ have to be aware of it. I'm not clear on why you'd need to do that though.

@TomAugspurger
Contributor Author

Ya I couldn't think of any reasons that would be useful either.


TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue Jan 15, 2014
Fixes bug that caused failure on Panel.to_frame()
if major_axis was a MultiIndex (pandas-dev#5402)

refactor reindexing to index.py

API/REF: refactor bits to index. rename