BUG GH11600 - MultiIndex column level names lost when to_sparse() called #11606
def test_to_sparse_preserve_multiindex_names_on_columns(self):
    sparse_multiindex_frame = self.dense_multiindex_frame.to_sparse()
    self.assertTrue(self.dense_multiindex_frame.columns.equals(sparse_multiindex_frame.columns))
this should all be 2 tests; just use assert_sp_frame_equal (or the dense version) to compare against an expected frame
Okay, I can change that.
Digging into assert_sp_frame_equal, it appears that if I feed it a dense reference frame and a sparse frame, it converts the sparse frame to dense and compares them in the dense state. That would make an assert_sp_frame_equal of the sparse and dense frames the same as the round-trip test (i.e. it only tests the sparse frame after passing it through to_dense).
I am trying to test the column names while the frame is still sparse against the reference column names, to make sure we don't wind up just carrying the column names through the conversion and putting them back on when going back to_dense. So I actually want a test of the sparse column names without making the frame dense again. I agree the round-trip test goes much better with assert_frame_equal, but it looks like checking that the column names are valid while sparse can't use assert_sp_frame_equal. Do I have that right?
To be more concise: I think two of the tests can collapse into a single assert_sp_frame_equal, but the other two (which could become one test with two asserts) still need to test the column names directly, perhaps using assert_index_equal.
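To illustrate why a names-aware assert matters here, below is a minimal sketch using dense indexes only. The labels and level names are hypothetical, and the import path `pandas.testing` is an assumption about a newer pandas release than the one this PR targets (where the helpers lived under `pandas.util.testing`). It shows that `.equals()` compares labels without looking at level names, while `assert_index_equal` checks names by default.

```python
import pandas as pd
from pandas.testing import assert_index_equal

# Hypothetical column labels and level names, for illustration only.
tuples = [("a", "x"), ("a", "y"), ("b", "x")]
named = pd.MultiIndex.from_tuples(tuples, names=["outer", "inner"])
unnamed = pd.MultiIndex.from_tuples(tuples)  # same labels, level names missing

# .equals() compares the labels only, so missing level names go unnoticed.
equals_result = named.equals(unnamed)

# assert_index_equal compares names as well (check_names=True by default),
# so it raises AssertionError when the level names differ.
try:
    assert_index_equal(named, unnamed)
    names_mismatch_caught = False
except AssertionError:
    names_mismatch_caught = True

print(equals_result, names_mismatch_caught)
```

Under these assumptions, a test built on `columns.equals(...)` alone would not detect lost level names, whereas one built on `assert_index_equal` would.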
you prob want 2 tests here: one that does a full round-trip (dense->sparse->dense) to check for preservation, and another that takes a constructed sparse frame and checks for correct attribute preservation on construction (compared against sparse again)
Hi Jeff, hopefully I understood what you were looking for in the tests. It now has one round-trip test and one check after sparse construction, both using the pandas-specific asserts. If this was not what you meant, just let me know how it could be better. Cheers, Zeke
    self.dense_multiindex_frame = dense_multiindex_frame.fillna(value=3.14)

def test_to_sparse_preserve_multiindex_names_columns(self):
    sparse_multiindex_frame = self.dense_multiindex_frame.to_sparse().copy()
you don't need to copy
couple of points; pls add a whatsnew note (bug fixes) for 0.17.1. ping when green.
pls squash as well
Code all fixed up but squashing still left me two commits for some reason. I'll look into the Git stuff but it might take a couple days as I have some other stuff to do.
BUG GH11600 - MultiIndex column level names were getting lost in sparse conversion Updated testing and whatsnew to follow project preferences
d131971 to 8d2f7fa
Squashed all my stuff down to one commit and updated the pull request. Looks like you have a commit mixed in there to adjust a warning, but GitHub says it's okay. Let me know if anything can be better; I think I hit everything you asked for.
merged via 207e0ce thanks!
Great! I updated on Stack Overflow with a link to the issue here, just in case somebody (possibly someone who hasn't updated) should ever run into it. Seems perhaps unlikely for such an edge case, but better that than somebody finding the SO question and not knowing it's fixed if they update!
closes #11600
Fixed a problem with MultiIndex column level names not propagating into sparse frames, or back out to dense on a round trip through sparse. Includes 4 new tests to cover some relevant scenarios. The problem is fixed for the conventional to_sparse path; I'm not sure about other paths where something else is passed to SparseDataSeries directly, but those would be outside the scope of this bug.