refactor: Decorate api methods that require total ordering #802

Merged
merged 21 commits into from
Jun 28, 2024

Conversation

TrevorBergeron
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label (bot) added the "size: m" and "api: bigquery" labels Jun 20, 2024
@TrevorBergeron marked this pull request as ready for review June 20, 2024 20:25
@TrevorBergeron requested review from a team as code owners June 20, 2024 20:25
@TrevorBergeron requested review from tswast and removed request for chelsea-lin June 20, 2024 20:26
@GarrettWu removed their assignment Jun 21, 2024
@@ -424,6 +433,8 @@ def dropna(self, how: typing.Literal["all", "any"] = "any") -> Index:
        return Index(result)

    def drop_duplicates(self, *, keep: str = "first") -> Index:
        if keep is not False:
            enforce_ordered(self, "duplicated")
Collaborator

Is this a typo? I assume it should reflect the public API name used, right?

Suggested change
-            enforce_ordered(self, "duplicated")
+            enforce_ordered(self, "drop_duplicates")

Contributor Author

fixed
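
For readers unfamiliar with the helper under discussion, here is a minimal sketch of how a check like `enforce_ordered` and a decorator like `requires_strict_ordering` could work. Only the names `enforce_ordered`, `requires_strict_ordering`, and `OrderRequiredError` appear in this PR; the function bodies, the `_strictly_ordered` attribute, the error message text, and the `FakeIndex` class are assumptions made for illustration, and the real implementations may differ.

```python
import functools


class OrderRequiredError(ValueError):
    """Local stand-in for bigframes.exceptions.OrderRequiredError (assumed base class)."""


def enforce_ordered(obj, opname: str) -> None:
    # Hypothetical check: `_strictly_ordered` is an assumed attribute name.
    if not getattr(obj, "_strictly_ordered", True):
        raise OrderRequiredError(
            f"Op {opname} not supported when strict ordering is disabled."
        )


def requires_strict_ordering():
    """Decorator form: run the ordering check before the wrapped method."""

    def decorator(meth):
        @functools.wraps(meth)
        def guarded(self, *args, **kwargs):
            # Passing the public method name keeps the error message accurate.
            enforce_ordered(self, meth.__name__)
            return meth(self, *args, **kwargs)

        return guarded

    return decorator


class FakeIndex:
    """Toy object used only to exercise the sketch."""

    def __init__(self, values, strictly_ordered: bool):
        self.values = list(values)
        self._strictly_ordered = strictly_ordered

    def drop_duplicates(self, *, keep="first"):
        # keep=False drops every duplicated value, so no row has to be
        # picked as "first" or "last" and ordering is not required.
        if keep is not False:
            enforce_ordered(self, "drop_duplicates")
            # dict.fromkeys keeps the first occurrence of each value.
            return list(dict.fromkeys(self.values))
        return [v for v in self.values if self.values.count(v) == 1]

    @requires_strict_ordering()
    def cumsum(self):
        total, out = 0, []
        for v in self.values:
            total += v
            out.append(total)
        return out
```

The inline call covers methods that need ordering only for some argument values, while the decorator covers methods that always need it.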

@@ -30,6 +30,7 @@
import bigframes.core.expression as ex
import bigframes.core.ordering as order
import bigframes.core.utils as utils
from bigframes.core.validate import enforce_ordered, requires_strict_ordering
Collaborator

Contributor Author

done


    @property
    def row_bounded(self):
        # relevant for determining if window requires total ordering for determinism.
Collaborator

Nit: This could be transformed into a docstring. I like that you shared the "why" but I'd also like a little more explanation on what row_bounded means.

Contributor Author

changed to docstring
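
To illustrate why a window bounded by row counts needs a total ordering to be deterministic, the sketch below computes a two-row rolling sum over rows whose sort key contains a tie. The helper `rolling_sum_rows` and the sample data are hypothetical and are not part of bigframes.

```python
def rolling_sum_rows(rows, size):
    """Sum each row's value with the (size - 1) rows that precede it."""
    values = [value for _, value in rows]
    return [
        sum(values[max(0, i - size + 1) : i + 1]) for i in range(len(values))
    ]


# Two rows share the sort key 1, so sorting by key alone leaves their
# relative order unspecified. Both arrangements are valid sort results.
arrangement_a = [(1, 10), (1, 20), (2, 30)]
arrangement_b = [(1, 20), (1, 10), (2, 30)]

result_a = rolling_sum_rows(arrangement_a, size=2)
result_b = rolling_sum_rows(arrangement_b, size=2)
```

The two results differ even though both inputs are sorted by key, which is the nondeterminism that a total ordering removes.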

@@ -61,6 +61,7 @@
import bigframes.core.indexes as indexes
import bigframes.core.ordering as order
import bigframes.core.utils as utils
from bigframes.core.validate import enforce_ordered, requires_strict_ordering
Collaborator

Import packages not methods.

Contributor Author

done
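
The "import packages not methods" convention can be illustrated with the standard library. The bigframes module path is left out because its name was still under discussion in this review; `log_calls` and `add` are hypothetical names used only for the example.

```python
# Preferred under this convention: import the module and qualify names at
# the call site, so the origin of `wraps` is visible where it is used.
import functools

# Discouraged under this convention:
#   from functools import wraps


def log_calls(func):
    """Count how many times the decorated function is called."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@log_calls
def add(a, b):
    return a + b
```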

@@ -27,6 +27,7 @@
import bigframes.core.blocks as blocks
import bigframes.core.ordering as order
import bigframes.core.utils as utils
from bigframes.core.validate import requires_strict_ordering
Collaborator

Import packages not methods.

Contributor Author

done

Collaborator

Nit: I'd prefer a noun for a package name. How about validations.py?

Contributor Author

done

@@ -42,6 +42,11 @@ def uses_total_row_ordering(self):
    def can_order_by(self):
        return False

    @property
    def order_independent(self):
        """Whether the output of the operator depends on the ordering of input rows. Navigation functions are a notable case."""
Collaborator

Let's keep the title line short.

Suggested change
-        """Whether the output of the operator depends on the ordering of input rows. Navigation functions are a notable case."""
+        """True if the output of the operator depends on the ordering of input rows.
+        Navigation functions are a notable case.
+        """

Contributor Author

done

@@ -78,6 +83,11 @@ def name(self) -> str:
    def arguments(self) -> int:
        ...

    @property
    def order_independent(self):
        # Almost all aggregation functions are order independent, excepting array and string agg
Collaborator

Let's make this a docstring too.

Suggested change
-        # Almost all aggregation functions are order independent, excepting array and string agg
+        """True if results don't depend on the order of the input.
+        Almost all aggregation functions are order independent, excepting ``array_agg`` and ``string_agg``.
+        """

Contributor Author

done
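
The distinction drawn in that docstring can be shown with plain Python stand-ins. In the sketch below, `sum` and `max` play the role of order-independent aggregations, while list construction and string joining play the role of `array_agg` and `string_agg`. The variable names and sample data are hypothetical.

```python
rows = [3, 1, 2]
reordered = list(reversed(rows))  # same rows, different input order

# Order independent: the result is the same for any row order.
sum_matches = sum(rows) == sum(reordered)
max_matches = max(rows) == max(reordered)

# Order dependent: the result records the order in which rows arrived.
array_agg_a = list(rows)
array_agg_b = list(reordered)
string_agg_a = ",".join(str(v) for v in rows)
string_agg_b = ",".join(str(v) for v in reordered)
```

Here `sum_matches` and `max_matches` are both true, while the two array and string results differ, so only the latter kind of aggregation needs a defined row order.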

@@ -43,6 +43,7 @@
import bigframes.core.ordering as order
import bigframes.core.scalar as scalars
import bigframes.core.utils as utils
from bigframes.core.validate import enforce_ordered, requires_strict_ordering
Collaborator

import package

Contributor Author

done

        pytest.param(
            lambda x: x.cumsum(),
            id="cumsum",
            marks=pytest.mark.xfail(raises=bigframes.exceptions.OrderRequiredError),
Collaborator

Since all of these tests fail with the same exception, prefer `with pytest.raises(...)` in the test function body. Ideally we check for the function name in the error message with `match=` as well.

Contributor Author

done
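
A minimal sketch of the pattern the reviewer asked for might look like the following. It assumes `pytest` is installed; `OrderRequiredError` here is a local stand-in for `bigframes.exceptions.OrderRequiredError`, and `run_unordered_op`, its message text, and the list of operation names are hypothetical.

```python
import pytest


class OrderRequiredError(ValueError):
    """Local stand-in for bigframes.exceptions.OrderRequiredError."""


def run_unordered_op(opname: str):
    # Hypothetical operation that always fails; the message is an assumption.
    raise OrderRequiredError(
        f"Op {opname} not supported when strict ordering is disabled."
    )


@pytest.mark.parametrize("opname", ["cumsum", "cumprod"])
def test_unordered_ops_raise(opname):
    # The assertion lives in the test body instead of an xfail mark, so an
    # unexpected pass or a different exception type fails the test.
    # match= is a regular expression searched against str(exception).
    with pytest.raises(OrderRequiredError, match=opname):
        run_unordered_op(opname)
```

Compared with `pytest.mark.xfail(raises=...)`, this reports the test as passed when the expected error is raised and also verifies that the message names the operation.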

@product-auto-label (bot) added the "size: l" label Jun 26, 2024
@product-auto-label (bot) removed the "size: m" label Jun 26, 2024
@TrevorBergeron requested a review from tswast June 26, 2024 17:16
@tswast enabled auto-merge (squash) June 28, 2024 14:51
@tswast merged commit ec5b068 into main Jun 28, 2024
23 checks passed
@tswast deleted the unordered_mode branch June 28, 2024 15:57
Labels: api: bigquery, size: l
3 participants