I'm using `pd.merge_asof()` to merge two dataframes together. The `by` keys are categoricals on both sides, but apparently not equal ones (one has a few more values than the other). It took me a little while to figure that out, because the error message was a bit confusing: "incompatible merge keys [1] category and category, must be the same type". It'd be nice if it were clearer. I can do the pull request once we figure out how to make this better.
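For anyone trying to reproduce this, here is a minimal sketch. The frames, column names and values are made up for illustration and are not from my real data. It shows why the message is confusing: both dtypes print as "category" even though they do not compare equal.

```python
import pandas as pd

# Illustrative data only: the left "key" has one more category ("c") than the right.
left = pd.DataFrame({
    "time": [1, 5, 10],
    "key": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"]),
    "left_val": [1.0, 2.0, 3.0],
})
right = pd.DataFrame({
    "time": [1, 2, 3, 6, 7],
    "key": pd.Categorical(["a", "b", "a", "b", "a"], categories=["a", "b"]),
    "right_val": [10, 20, 30, 60, 70],
})

left_dtype, right_dtype = left["key"].dtype, right["key"].dtype
print(str(left_dtype), str(right_dtype))  # both print as "category"
print(left_dtype == right_dtype)          # False: the category sets differ

# The merge is rejected because the `by` dtypes are not equal.
# The merge error derives from ValueError, so that is what is caught here.
error_message = None
try:
    pd.merge_asof(left, right, on="time", by="key")
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The exact wording of the message depends on the pandas version, so it is printed rather than hard-coded.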
I see the following possible solutions:
1. Special-case the error message for dtypes that take parameters, since two such dtypes can share a name without being equal. Something like: "incompatible merge keys: both are category, but not equal ones". (Easiest solution.)
2. Make a nicer error message for categoricals by digging a little into what makes them unequal. Something like "incompatible merge keys: both are categories, but the left one has 3 more levels", or "but they have different levels: ..."
3. Change the `__str__` method of `CategoricalDtype` to print something more informative than "category". (There is no guarantee, though, that what we print would always let people tell two unequal categoricals apart, unless we printed out all the levels.)
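To make the second option concrete, here is a rough sketch of the kind of diagnostic it could produce. `describe_category_mismatch` is a hypothetical helper written for this discussion, not an existing pandas function, and the message wording is only a suggestion.

```python
import pandas as pd


def describe_category_mismatch(left, right):
    """Hypothetical helper: explain why two CategoricalDtype objects differ."""
    if left == right:
        return "categories are equal"

    # Levels present on one side only.
    left_only = left.categories.difference(right.categories)
    right_only = right.categories.difference(left.categories)

    parts = []
    if len(left_only):
        parts.append(f"left has {len(left_only)} extra level(s): {list(left_only)}")
    if len(right_only):
        parts.append(f"right has {len(right_only)} extra level(s): {list(right_only)}")
    if not parts:
        # Same levels, so the difference must be in ordering or the `ordered` flag.
        parts.append("same levels, but different order or `ordered` flag")

    return "incompatible merge keys: both are category, but " + "; ".join(parts)


left_dtype = pd.CategoricalDtype(["a", "b", "c"])
right_dtype = pd.CategoricalDtype(["a", "b"])
message = describe_category_mismatch(left_dtype, right_dtype)
print(message)
```

Printing the differing levels could get long for categoricals with many levels, so a real implementation would probably need to truncate the list.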
Anything that sounds good here?
This is pretty tricky. Considering both option 1 and 2 to be a form of special casing I wouldn't be in favor of either of them. Not sure how option 3 would end up looking or if it even makes sense.
What would even be the requirements here? That the categoricals being merged would have to be ordered, monotonic, and that the right categorical would have to be a subset of the left?
This is about categoricals on both sides of the by key of pd.merge_asof(). My superficial understanding is that they need to be equal, because there's the equivalent of a groupby on those happening somewhere under the hood. But I'll let someone who knows more chime in.
@chrish42 this is correct, the categoricals must be exactly equal. I suppose the error message could be enhanced if you wanted to do a pull request. Solution 1 is the most reasonable; the others are non-trivial / hard.
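Since the dtypes must match exactly, one workaround in the meantime is to cast both `by` columns to a shared `CategoricalDtype` before merging. This is a sketch with made-up frames and column names, and it assumes that taking the union of both category sets is acceptable for your data.

```python
import pandas as pd

# Illustrative data only: the `by` column has different categories on each side.
left = pd.DataFrame({
    "time": [1, 5, 10],
    "key": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"]),
    "left_val": [1.0, 2.0, 3.0],
})
right = pd.DataFrame({
    "time": [1, 2, 3, 6, 7],
    "key": pd.Categorical(["a", "b", "a", "b", "a"], categories=["a", "b"]),
    "right_val": [10, 20, 30, 60, 70],
})

# Build one dtype covering the categories of both sides.
all_categories = left["key"].cat.categories.union(right["key"].cat.categories)
shared_dtype = pd.CategoricalDtype(categories=all_categories)

# Cast both `by` columns so their dtypes compare equal.
left_aligned = left.assign(key=left["key"].astype(shared_dtype))
right_aligned = right.assign(key=right["key"].astype(shared_dtype))

merged = pd.merge_asof(left_aligned, right_aligned, on="time", by="key")
print(merged)
```

Casting to a shared dtype keeps the columns categorical. Converting both keys to plain strings would also make the dtypes match, at the cost of losing the categorical representation.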