Incorrect index label displayed on MultiIndex DataFrame #14882

Closed
fonnesbeck opened this issue Dec 14, 2016 · 5 comments
@fonnesbeck
fonnesbeck commented Dec 14, 2016

I have a DataFrame with a hierarchical index as follows:

MultiIndex(levels=[[30, 40, 50], [6, 12, 24], ['MRIgFUS', 'ablation', 'hysterectomy', 'iud', 'myomectomy', 'none', 'uae']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2], [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6]],
           names=['age', 'followup', 'next intervention'])

Notice in particular the values of the first two levels, and that they are balanced in the labels. However, the table is displayed incorrectly in the notebook. Here is what it looks like:

[Screenshot of the rendered table; image not preserved.]

Notice that the second label of each of the first two levels of the hierarchical index is repeated (40 and 40 instead of 40 and 50, and 12 and 12 instead of 12 and 24).
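The balance described above can be checked independently of the display. The following is a minimal sketch, assuming the index is the full cross product of the three levels (which the labels shown above suggest); it rebuilds the index with `MultiIndex.from_product` rather than from the original data, and the `counts` name is only illustrative.

```python
import pandas as pd

# Assumption: the reported index is the full cross product of its levels.
idx = pd.MultiIndex.from_product(
    [
        [30, 40, 50],
        [6, 12, 24],
        ["MRIgFUS", "ablation", "hysterectomy", "iud", "myomectomy", "none", "uae"],
    ],
    names=["age", "followup", "next intervention"],
)

# Count rows per (age, followup) pair; a balanced index has the same
# number of rows (one per intervention) in every pair.
counts = pd.Series(1, index=idx).groupby(level=["age", "followup"]).sum()
print(counts)
```

If every pair reports the same count, the data behind the index are fine and the repeated labels are purely a rendering problem.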

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 31.0.1
Cython: 0.25.2
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@fonnesbeck
Author

NB: I confirmed that this also occurs in a build from the current master.

@fonnesbeck
Author

This issue disappears if I set notebook_repr_html to False. Does this make it a Jupyter issue?
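For reference, a small sketch of what that setting changes, assuming a toy three-row frame instead of the original data. `display.notebook_repr_html` is the full option name, and `_repr_html_` is the hook Jupyter calls to ask an object for rich output; when it returns `None`, the notebook falls back to the plain-text repr, which goes through a different formatter.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})  # toy frame, not the reported data

# Temporarily disable the HTML repr, as described in the comment above.
with pd.option_context("display.notebook_repr_html", False):
    html_off = df._repr_html_()  # None, so Jupyter uses the text repr
    text_repr = repr(df)

# With the default setting, pandas itself produces the HTML table.
html_on = df._repr_html_()

print(html_off)
print(text_repr)
print(html_on[:60])
```

Since the HTML string is produced by pandas and only displayed by Jupyter, a wrong label in the HTML view would point at pandas rather than the notebook.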

@TomAugspurger
Contributor

Probably a pandas issue.

I'm having trouble reproducing it with just

import pandas as pd
from pandas import MultiIndex

idx = MultiIndex(levels=[[30, 40, 50], [6, 12, 24], ['MRIgFUS', 'ablation', 'hysterectomy', 'iud', 'myomectomy', 'none', 'uae']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2], [0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6]],
           names=['age', 'followup', 'next intervention'])
df = pd.DataFrame({"A": 1}, index=idx)
df

I suspect it's related to the particular sorting of your index. Do you have repr issues with df.sort_index()? Could you paste the output of df.index.values.tolist()?
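The two diagnostics requested here could be gathered roughly as follows. This is a sketch on a stand-in frame built with `MultiIndex.from_product` (an assumption, since the original data were not posted); `tuples` and `already_sorted` are illustrative names.

```python
import pandas as pd

# Stand-in for the reporter's frame: same levels and names, constant column.
idx = pd.MultiIndex.from_product(
    [
        [30, 40, 50],
        [6, 12, 24],
        ["MRIgFUS", "ablation", "hysterectomy", "iud", "myomectomy", "none", "uae"],
    ],
    names=["age", "followup", "next intervention"],
)
df = pd.DataFrame({"A": 1}, index=idx)

# The index as a plain list of (age, followup, intervention) tuples,
# which is easy to paste into an issue comment.
tuples = df.index.values.tolist()
print(tuples[:3])

# Does sorting change the row order? If not, sorting is unlikely to be the cause.
already_sorted = df.index.equals(df.sort_index().index)
print(already_sorted)
```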

@jorisvandenbossche
Member

jorisvandenbossche commented Dec 14, 2016

It's a pandas bug (pandas creates the HTML that Jupyter displays), and it probably has something to do with the truncation code: when you set the number of rows higher, e.g. pd.options.display.max_rows = 100, the table displays correctly.
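One way to probe the truncation path outside a notebook is to compare the full and truncated HTML directly. The sketch below is only an approximation of what the notebook repr does: it uses a stand-in 63-row frame (an assumption), passes `max_rows=60` to `DataFrame.to_html` to mimic the default `display.max_rows`, and uses a hypothetical helper, `header_cells`, to pull the `<th>` texts out with a regular expression.

```python
import re

import pandas as pd

idx = pd.MultiIndex.from_product(
    [
        [30, 40, 50],
        [6, 12, 24],
        ["MRIgFUS", "ablation", "hysterectomy", "iud", "myomectomy", "none", "uae"],
    ],
    names=["age", "followup", "next intervention"],
)
df = pd.DataFrame({"A": 1}, index=idx)  # 63 rows, just over the default limit of 60


def header_cells(html):
    """Hypothetical helper: return the text of every <th> cell in the HTML."""
    return [t.strip() for t in re.findall(r"<th[^>]*>(.*?)</th>", html, flags=re.S)]


full_html = df.to_html()                  # every row rendered
truncated_html = df.to_html(max_rows=60)  # exercises the truncation code

ages = {"30", "40", "50"}
print("full:     ", sorted(ages & set(header_cells(full_html))))
print("truncated:", sorted(ages & set(header_cells(truncated_html))))
```

On an affected version, the truncated output would be expected to miss or repeat an age label while the full output lists all three; on a release that includes the fix, both lines should agree.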

jorisvandenbossche added the Bug and Output-Formatting (__repr__ of pandas objects, to_string) labels Dec 14, 2016
@Dr-Irv
Contributor

Dr-Irv commented Dec 22, 2016

I'm attempting to chase this down. @jorisvandenbossche is correct. It is the truncation code.

jreback added Docs Difficulty Intermediate and removed Docs Output-Formatting __repr__ of pandas objects, to_string labels Dec 22, 2016
jreback added this to the Next Major Release milestone Dec 22, 2016
jreback modified the milestones: 0.20.0, Next Major Release Jan 3, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017