-
Notifications
You must be signed in to change notification settings - Fork 48
feat: df.apply(axis=1)
to support remote function with mutiple params
#851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have docs to update?
def nary_remote_function_op_impl( | ||
*operands: ibis_types.Value, op: ops.NaryRemoteFunctionOp | ||
): | ||
ibis_node = getattr(op.func, "ibis_node", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to do this in the current pr - but we need to move away from storing ibis values in the op definition. We will want to generate this at compile-time only to allow non-ibis compilation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. b/356686746
# For this to work the following condition must be true: | ||
# 1. The number or input params in the function must be same | ||
# as the number of columns in the dataframe | ||
# 2. The dtypes of the columns in the dataframe must be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to accept compatible dtypes? eg the column is int, but the function takes decimal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function taking decimal is not something we support right now. There is a longer term desire to expand the datatype support.
RF_SUPPORTED_IO_PYTHON_TYPES = {bool, bytes, float, int, str} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having said that, are there other places where we reconcile dtypes across dataframes or in operations?
reprojected_series = bigframes.series.Series( | ||
series_list[0]._block._force_reproject() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wyh do we need a force_reproject?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just copy-pasting the pattern introduced in this change: ff3bb89#diff-8718ceb6a8f6b68d7b06a15e84043fb866c500d5bfb1f33ad8c945f06815a140
Is the reasoning (got a bit detached unintentionally, sitting at the beginning of the function) still valid?
# Reproject as workaround to applying filter too late. This forces the filter
# to be applied before passing data to remote function, protecting from bad
# inputs causing errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it does make a difference, quickly tested in #874 and series.mask
doctest is failing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the comment back close to reproject, PTAL
The code samples added should suffice for this change. |
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕