Confusing "MergeError: incompatible merge keys [1] category and category, must be the same type" #26136

Closed
chrish42 opened this issue Apr 18, 2019 · 3 comments · Fixed by #26242
Labels
Categorical Categorical Data Type Error Reporting Incorrect or improved errors from pandas
Milestone

Comments

@chrish42
Contributor

I'm using pd.merge_asof() to merge two dataframes together. The by keys are categoricals, but apparently not equal ones (one has a few more values than the other). It took me a little while to figure that out, however, because the error message for this was a bit confusing: "incompatible merge keys [1] category and category, must be the same type". It'd be nice if it was clearer. I can do the pull request, once we figure out how to make this better.
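A minimal reproduction sketch of the situation described above. The frames, column names and values are made up for illustration, and the exact wording of the error depends on the pandas version installed. It shows that two categorical dtypes with different categories both print as "category" while comparing unequal, which is why the message reads "category and category".

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Two categorical dtypes that differ only in their categories (illustrative values).
left_dtype = CategoricalDtype(["a", "b", "c"])
right_dtype = CategoricalDtype(["a", "b"])

# Both have the same string form, yet they do not compare equal.
same_name = str(left_dtype) == str(right_dtype)
dtypes_equal = left_dtype == right_dtype

left = pd.DataFrame({
    "time": [1, 2, 3],
    "key": pd.Series(["a", "b", "c"], dtype=left_dtype),
    "left_val": [10, 20, 30],
})
right = pd.DataFrame({
    "time": [1, 2, 3],
    "key": pd.Series(["a", "b", "a"], dtype=right_dtype),
    "right_val": [1.0, 2.0, 3.0],
})

# merge_asof requires the "by" keys to have equal dtypes on both sides.
try:
    pd.merge_asof(left, right, on="time", by="key")
    error_message = None
except pd.errors.MergeError as exc:
    error_message = str(exc)

print(same_name, dtypes_equal)
print(error_message)
```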

I see the following possible solutions:

  1. Special-case the error message for dtypes that take parameters and so are not necessarily all equal. Something a bit like: "incompatible merge keys: both are category, but not equal ones" (Easiest solution.)
  2. Make a nicer error message for categories, by digging a little bit into what makes them not equal. Something like "incompatible merge keys: both categories, but the left one has 3 levels more", or "but they have different levels: ..."
  3. Change the __str__ method for CategoricalDtype to print something a bit more informative than "category". (No guarantee though that what we would print would always allow people to distinguish two not-equal categoricals as not equal, unless we were to print out all the levels.)
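To make option 2 more concrete, here is a rough sketch of what "digging into" the difference could look like. The helper name describe_category_mismatch and the message wording are hypothetical; nothing like this is claimed to exist in pandas. It only uses the public categories and ordered attributes of CategoricalDtype.

```python
from pandas.api.types import CategoricalDtype


def describe_category_mismatch(left, right):
    """Hypothetical helper: explain why two categorical dtypes differ."""
    if left == right:
        return "categorical dtypes are equal"

    reasons = []
    if left.ordered != right.ordered:
        reasons.append(
            f"ordered differs (left={left.ordered}, right={right.ordered})"
        )

    # Categories present on one side only.
    left_only = left.categories.difference(right.categories)
    right_only = right.categories.difference(left.categories)
    if len(left_only):
        reasons.append(f"only in left: {list(left_only)}")
    if len(right_only):
        reasons.append(f"only in right: {list(right_only)}")

    # Unequal, same ordered flag, same set of categories: the order must differ.
    if not reasons:
        reasons.append("same categories in a different order")

    return "incompatible merge keys: both are category, but " + "; ".join(reasons)


message = describe_category_mismatch(
    CategoricalDtype(["a", "b", "c"]),
    CategoricalDtype(["a", "b"]),
)
print(message)
```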

Anything that sounds good here?

@WillAyd
Member

WillAyd commented Apr 18, 2019

This is pretty tricky. Considering both option 1 and 2 to be a form of special casing I wouldn't be in favor of either of them. Not sure how option 3 would end up looking or if it even makes sense.

What would even be the requirements here? That the categoricals being merged would have to be ordered, monotonic, and that the right categorical would have to be a subset of the left?

@WillAyd WillAyd added the Categorical Categorical Data Type label Apr 18, 2019
@chrish42
Contributor Author

This is about categoricals on both sides of the by key of pd.merge_asof(). My superficial understanding is that they need to be equal, because there's the equivalent of a groupby on those happening somewhere under the hood. But I'll let someone who knows more chime in.

@jreback
Contributor

jreback commented Apr 21, 2019

@chrish42 this is correct, the categoricals must be exactly equal. I suppose the error message could be enhanced if you wanted to do a pull request. Solution 1 is the most reasonable; the others are non-trivial / hard.
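Since the categoricals must be exactly equal, one way to get past the error on the user side is to cast both by columns to a single shared dtype before merging. This is only a sketch with made-up data and column names; taking the union of the categories is one possible choice, not something prescribed in this thread.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

left = pd.DataFrame({
    "time": [1, 5, 10],
    "key": pd.Categorical(["a", "b", "c"]),
    "left_val": [1, 2, 3],
})
right = pd.DataFrame({
    "time": [1, 2, 3],
    "key": pd.Categorical(["a", "b", "a"]),
    "right_val": [10, 20, 30],
})

# Build one dtype that covers the categories seen on either side.
all_categories = left["key"].cat.categories.union(right["key"].cat.categories)
shared = CategoricalDtype(all_categories)

# Cast both "by" columns to that dtype so they compare equal.
left["key"] = left["key"].astype(shared)
right["key"] = right["key"].astype(shared)

merged = pd.merge_asof(left, right, on="time", by="key")
print(merged)
```

Rows whose key has no earlier match on the right side (here the "c" row) end up with a missing right_val.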

@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Apr 21, 2019
@jreback jreback added this to the 0.25.0 milestone May 1, 2019
3 participants