
df.head() and .tail() weirdness #5370


Closed
michaelaye opened this issue Oct 29, 2013 · 9 comments · Fixed by #5373

Comments

@michaelaye
Contributor

I am data wrangling some Excel data into a dataframe.

At some point it has an Index like so:

Index([u'(K)', -440.0, -439.0, -438.0, -437.0, -436.0, -435.0, -434.0, -433.0, -432.0], dtype='object')

Naturally, I want to drop the u'(K)' label from the index. Interestingly, after doing so, head() and tail() go completely haywire, regardless of whether I keep this Index with dtype object or convert it to the new Float64Index:

<class 'pandas.core.frame.DataFrame'>
Float64Index: 446 entries, -440.0 to 5.0
Data columns (total 7 columns):
Channel 3 (A3)    446  non-null values
Channel 4 (A4)    446  non-null values
Channel 5 (A5)    446  non-null values
Channel 6 (A6)    446  non-null values
Channel 7 (B1)    446  non-null values
Channel 8 (B2)    446  non-null values
Channel 9 (B3)    446  non-null values
dtypes: float64(7)

or before, with dtypes still as 'object':

<class 'pandas.core.frame.DataFrame'>
Float64Index: 446 entries, -440.0 to 5.0
Data columns (total 7 columns):
Channel 3 (A3)    446  non-null values
Channel 4 (A4)    446  non-null values
Channel 5 (A5)    446  non-null values
Channel 6 (A6)    446  non-null values
Channel 7 (B1)    446  non-null values
Channel 8 (B2)    446  non-null values
Channel 9 (B3)    446  non-null values
dtypes: object(7)

I think I can see what's happening here: the head calculation somehow picks up the 5.0 value in the Index and chooses to display the dataframe up to that label. Which now makes me wonder: did the API of head() change without my noticing? My apologies if this actually is a case of PEBKAC.

The process of my data wrangling can be seen here:
http://nbviewer.ipython.org/7208717

pandas version: '0.12.0-1000-gea97682'
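The setup described in this report can be approximated with a small stand-in frame. This is only a sketch: the data and the names `labels` and `cleaned` are made up for illustration, and it is written against current pandas rather than the 0.12/0.13 development version in the report.

```python
import pandas as pd

# Made-up stand-in for the Excel data: an object-dtype index whose first
# label is the unit string '(K)', followed by float labels.
labels = ["(K)", -440.0, -439.0, -438.0, -437.0, -436.0, -435.0]
df = pd.DataFrame(
    {"Channel 3 (A3)": list(range(len(labels)))},
    index=pd.Index(labels, dtype=object),
)

# Drop the unit row, then cast the remaining labels to float.
cleaned = df.drop("(K)")
cleaned.index = cleaned.index.astype(float)

print(cleaned.index.dtype)
```

After the cast the index holds floats only, which is the state in which head() and tail() misbehaved in the report.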

@jreback
Contributor

jreback commented Oct 29, 2013

@michaelaye you have a Float64Index, which selects only on values; you must use iloc to select by position, see here: http://pandas.pydata.org/pandas-docs/dev/indexing.html#float64index. This is new in 0.13.0.

you exposed a bug in what head/tail were doing, fixed in #5373
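The difference between selecting by value and selecting by position on a float index can be sketched as follows. The frame is made up, and the example uses `.loc` and `.iloc` explicitly so that it behaves the same way on current pandas.

```python
import pandas as pd

# Made-up frame with a float index whose labels include 5.0.
df = pd.DataFrame(
    {"value": list(range(8))},
    index=[-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)

# Positional: the first five rows, whatever their labels are.
by_position = df.iloc[:5]

# Label-based: every row up to and including the label 5.0.
by_label = df.loc[:5.0]

print(len(by_position), len(by_label))
```

A label-based slice that ends at 5.0 returns the whole frame here, which matches the symptom in the report where the display ran up to the 5.0 label.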

@jtratner
Contributor

This emphasizes the need for a convert_objects call in Excel reader,
because right now all numbers get read in as floats, so you can end up with
issues like that Float64Index. [though in this specific case it actually
makes zero difference because the '(K)' messes things up. We should
consider giving convert_objects a cast_index option too.]

@jreback
Contributor

jreback commented Oct 29, 2013

this has nothing to do with conversion

nor should convert_objects have anything to do with index conversion (which cannot be done in any event, as the index is by definition object in this case)

@jtratner
Contributor

Just that you end up with surprising dtypes from Excel.

@jreback
Contributor

jreback commented Oct 29, 2013

@jtratner oh... yes, you almost certainly need to do a convert_objects when reading from Excel (maybe via an infer=True kw, or even allow dtype={} to specify), but that's a separate issue
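For readers on current pandas: `convert_objects` was later deprecated and removed, so the sketch below uses `infer_objects` and `pd.to_numeric` as rough stand-ins for the conversion discussed here. The frame and the names `raw`, `inferred`, and `coerced` are made up; this is not what the Excel reader does internally.

```python
import pandas as pd

# Made-up column that arrived as object dtype even though it holds floats.
raw = pd.DataFrame({"Channel 3 (A3)": [1.5, 2.5, 3.5]}, dtype=object)

# Soft conversion: infer a better dtype only where the values allow it.
inferred = raw.infer_objects()

# Forced conversion: anything non-numeric would become NaN.
coerced = raw.apply(pd.to_numeric, errors="coerce")

print(raw.dtypes, inferred.dtypes, coerced.dtypes, sep="\n")
```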

@michaelaye
Contributor Author

@jreback I'm a bit confused as I didn't do any selection, I was only calling head(). Or is it that head() internally calls exactly the same mechanisms as I would do when I select the first five rows?

@michaelaye
Contributor Author

Yeah, okay, spare your typing, squash other bugs instead! ;) I had a look at your PR, github is so cool...

@jreback
Contributor

jreback commented Oct 29, 2013

@michaelaye head was basically doing the wrong kind of selection (well, it works on all non-float indexes)... fixed up. Thanks for the report!
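A purely positional head/tail can be sketched with `iloc`. The helpers `head_positional` and `tail_positional` are hypothetical names for illustration and are not the code from #5373; the point is only that slicing by position ignores the index labels entirely.

```python
import pandas as pd


def head_positional(frame, n=5):
    """Return the first n rows by position, ignoring index labels."""
    return frame.iloc[:n]


def tail_positional(frame, n=5):
    """Return the last n rows by position, ignoring index labels."""
    if n == 0:
        # iloc[-0:] would return the whole frame, so handle zero explicitly.
        return frame.iloc[0:0]
    return frame.iloc[-n:]


# Made-up frame with a float index that contains the label 5.0.
df = pd.DataFrame(
    {"value": list(range(8))},
    index=[-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
)

first = head_positional(df, 5)
last = tail_positional(df, 3)
print(first)
print(last)
```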

@jtratner
Contributor

Yeah, shouldn't have hijacked this :)


3 participants