
feat: df.apply(axis=1) to support remote function with multiple params #851


Merged: 16 commits, merged Aug 2, 2024


Contributor

@shobsi shobsi commented Jul 22, 2024


@shobsi shobsi requested review from a team as code owners July 22, 2024 17:25
@shobsi shobsi requested a review from GarrettWu July 22, 2024 17:25
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jul 22, 2024
Contributor

@GarrettWu GarrettWu left a comment


Do we have docs to update?

@shobsi shobsi requested a review from TrevorBergeron July 23, 2024 02:15
def nary_remote_function_op_impl(
    *operands: ibis_types.Value, op: ops.NaryRemoteFunctionOp
):
    ibis_node = getattr(op.func, "ibis_node", None)
Contributor

This doesn't need to happen in the current PR, but we need to move away from storing ibis values in the op definition. We want to generate them only at compile time, to allow non-ibis compilation.
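One way to follow this suggestion, sketched in plain Python under stated assumptions: the op keeps only a backend-neutral reference to the function (here a hypothetical `routine_name` string), and each backend builds its own representation only when the expression is compiled. `NaryFunctionOp`, `register_lowering`, and `compile_op` are hypothetical names, not bigframes APIs, and the SQL string stands in for what an ibis backend would build as a node.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class NaryFunctionOp:
    """Hypothetical op: holds only a backend-neutral reference to the function.

    No backend expression (such as an ibis node) is stored here, so the same
    op can be lowered by any compiler.
    """

    routine_name: str


# Hypothetical registry mapping a backend name to its lowering function.
Lowering = Callable[[NaryFunctionOp, Sequence[str]], str]
_LOWERINGS: Dict[str, Lowering] = {}


def register_lowering(backend: str) -> Callable[[Lowering], Lowering]:
    def decorator(fn: Lowering) -> Lowering:
        _LOWERINGS[backend] = fn
        return fn

    return decorator


@register_lowering("sql")
def lower_to_sql(op: NaryFunctionOp, operands: Sequence[str]) -> str:
    # Render a plain SQL function call; an ibis backend would build its node here instead.
    return f"{op.routine_name}({', '.join(operands)})"


def compile_op(op: NaryFunctionOp, operands: Sequence[str], backend: str) -> str:
    # The backend-specific representation is produced only at compile time.
    return _LOWERINGS[backend](op, operands)
```

Because the op holds only plain values in a frozen dataclass, it can be hashed and compared independently of any backend, and adding a non-ibis compiler means registering one more lowering rather than changing the op.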

Contributor Author

@shobsi shobsi Jul 31, 2024

Ack. b/356686746

# For this to work the following conditions must be true:
# 1. The number of input params in the function must be the same
#    as the number of columns in the dataframe
# 2. The dtypes of the columns in the dataframe must be
Contributor

Do we want to accept compatible dtypes? E.g., the column is int but the function takes decimal?

Contributor Author

@shobsi shobsi Jul 31, 2024

A function taking decimal is not something we support right now. There is a longer-term desire to expand datatype support.

RF_SUPPORTED_IO_PYTHON_TYPES = {bool, bytes, float, int, str}
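The two conditions in the diff comment, together with `RF_SUPPORTED_IO_PYTHON_TYPES`, can be expressed as a pre-flight check. This is a minimal sketch, assuming column types are given as plain Python types and that each row is passed to the function positionally in column order. `check_row_function` and `apply_rows` are hypothetical helpers, not bigframes APIs. Because condition 2 is cut off in the excerpt, the sketch takes the strict reading (exact type match).

```python
import inspect
from typing import Any, Callable, Dict, List, Sequence

# Supported scalar types, as in the diff excerpt.
RF_SUPPORTED_IO_PYTHON_TYPES = {bool, bytes, float, int, str}


def check_row_function(func: Callable, column_types: Sequence[type]) -> None:
    """Hypothetical check before applying `func` row-wise.

    Condition 1: one parameter per column.
    Condition 2 (assumed strict): each parameter's annotation is a supported
    type and is exactly the column's type.
    """
    params = list(inspect.signature(func).parameters.values())
    if len(params) != len(column_types):
        raise ValueError(
            f"{func.__name__} takes {len(params)} params, "
            f"but there are {len(column_types)} columns"
        )
    for param, col_type in zip(params, column_types):
        annotation = param.annotation
        if annotation not in RF_SUPPORTED_IO_PYTHON_TYPES:
            raise TypeError(f"unsupported type for {param.name!r}: {annotation!r}")
        # Exact match only: an int column does not satisfy a float param.
        if annotation is not col_type:
            raise TypeError(
                f"column type {col_type.__name__} does not match "
                f"{param.name!r}: {annotation.__name__}"
            )


def apply_rows(
    rows: List[Dict[str, Any]], columns: Sequence[str], func: Callable
) -> List[Any]:
    # Each row is unpacked into positional arguments, in column order.
    return [func(*(row[c] for c in columns)) for row in rows]
```

Accepting compatible dtypes, as asked in the review (for example an int column feeding a wider numeric parameter), would amount to relaxing the exact-match check into a lookup of allowed conversions.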

Contributor Author

Having said that, are there other places where we reconcile dtypes across dataframes or in operations?

Comment on lines +3533 to +3535
reprojected_series = bigframes.series.Series(
    series_list[0]._block._force_reproject()
)
Contributor

Why do we need a force_reproject?

Contributor Author

I am just copy-pasting the pattern introduced in this change: ff3bb89#diff-8718ceb6a8f6b68d7b06a15e84043fb866c500d5bfb1f33ad8c945f06815a140

Is the reasoning still valid? (The comment explaining it got unintentionally detached and now sits at the beginning of the function.)

# Reproject as workaround to applying filter too late. This forces the filter
# to be applied before passing data to remote function, protecting from bad
# inputs causing errors.
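A toy, in-memory model of the hazard this comment describes. It is plain Python, not bigframes or BigQuery code, and every name in it is hypothetical. If the function is evaluated before the filter, a row the filter would have removed can still make the call fail. Materializing the filtered rows first, which is the role the comment gives the reprojection, avoids that.

```python
from typing import Callable, List


def reciprocal(x: float) -> float:
    # Stand-in for a remote function that raises on bad input (x == 0).
    return 1.0 / x


def apply_then_filter(
    values: List[float],
    keep: Callable[[float], bool],
    func: Callable[[float], float],
) -> List[float]:
    # The function runs on every row first and the filter is applied
    # afterwards, so rows the filter would drop can still raise.
    results = [func(v) for v in values]
    return [r for v, r in zip(values, results) if keep(v)]


def filter_then_apply(
    values: List[float],
    keep: Callable[[float], bool],
    func: Callable[[float], float],
) -> List[float]:
    # The filtered rows are materialized first (the role the reprojection
    # plays), so the function only sees rows that passed the filter.
    filtered = [v for v in values if keep(v)]
    return [func(v) for v in filtered]
```

In the toy, `filter_then_apply([2.0, 0.0, 4.0], lambda v: v != 0, reciprocal)` returns `[0.5, 0.25]`, while `apply_then_filter` with the same inputs raises `ZeroDivisionError`.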

Contributor Author

Actually, it does make a difference: I quickly tested in #874, and the series.mask doctest is failing.

Contributor Author

I moved the comment back next to the reproject call. PTAL.

Contributor Author

shobsi commented Aug 2, 2024

Re: "Do we have docs to update?"

The code samples added should suffice for this change.

@shobsi shobsi merged commit 2158818 into main Aug 2, 2024
22 checks passed
@shobsi shobsi deleted the shobs-rf-multiple-params branch August 2, 2024 17:02
4 participants