deps: use pandas-gbq to determine schema in load_table_from_dataframe #2095

Merged
tswast merged 23 commits into main from b323176126-pandas-gbq
Feb 21, 2025

Conversation

tswast
Contributor

@tswast tswast commented Dec 26, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Towards go/pandas-gbq-and-bigframes-redundancy
🦕

@tswast tswast requested review from a team as code owners December 26, 2024 22:28
@tswast tswast requested a review from chalmerlowe December 26, 2024 22:28
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Dec 26, 2024
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Dec 26, 2024
@tswast
Contributor Author

tswast commented Dec 27, 2024

tests/unit/test_client.py::TestClientUpload::test_load_table_from_dataframe_w_higher_scale_decimal128_datatype looks like a real failure and might need a change in pandas-gbq.

@tswast
Contributor Author

tswast commented Jan 3, 2025

I've tested locally with googleapis/python-bigquery-pandas#844 and that PR should fix the remaining system test.

(venv) ➜  python-bigquery git:(b323176126-pandas-gbq) pytest tests/unit/test_client.py::TestClientUpload::test_load_table_from_dataframe_w_higher_scale_decimal128_datatype
==================================================== test session starts ====================================================
platform linux -- Python 3.12.6, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/swast/src/github.com/googleapis/python-bigquery
configfile: pyproject.toml
collected 1 item                                                                                                            

tests/unit/test_client.py .                                                                                           [100%]

===================================================== 1 passed in 0.08s =====================================================

@tswast tswast changed the title feat: use pandas-gbq to determine schema in load_table_from_dataframe deps: use pandas-gbq to determine schema in load_table_from_dataframe Jan 7, 2025
@tswast tswast requested a review from Linchin January 7, 2025 15:58
@tswast
Contributor Author

tswast commented Jan 22, 2025

test_load_table_from_dataframe_w_partial_schema_extra_types

Looks like a real failure (the warning message changed slightly in pandas-gbq). I'll try to update the test to be a bit more robust.
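
As an illustration of what a more robust assertion could look like, here is a minimal sketch using only the standard library. The function `infer_schema_with_warning` and its message text are hypothetical stand-ins, not the actual pandas-gbq warning. The idea is to check the warning category and one stable fragment (the column name) instead of the full message.

```python
import warnings


def infer_schema_with_warning():
    # Hypothetical stand-in for code that warns when it cannot determine
    # a column's type; the exact wording here is an assumption.
    warnings.warn(
        "Could not determine the type of columns: struct_field",
        UserWarning,
    )


def collect_warnings(func):
    """Run func and return every warning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters hide repeats
        func()
    return list(caught)


caught = collect_warnings(infer_schema_with_warning)

# Match on the category plus the column name only, so small wording
# changes in the upstream message do not break the test.
matching = [
    w
    for w in caught
    if issubclass(w.category, UserWarning) and "struct_field" in str(w.message)
]
assert len(matching) == 1
```

Matching on a short, stable fragment trades some strictness for resilience to upstream message edits.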

@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 23, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 23, 2025
@tswast
Contributor Author

tswast commented Jan 23, 2025

@Linchin @chalmerlowe I believe this is finally ready for review. The failing checks look like infrastructure / Kokoro problems, possibly caused by a new setup?

@@ -1259,7 +1259,7 @@ def test_upload_time_and_datetime_56(bigquery_client, dataset_id):
     df = pandas.DataFrame(
         dict(
             dt=[
-                datetime.datetime(2020, 1, 8, 8, 0, 0),
+                datetime.datetime(2020, 1, 8, 8, 0, 0, tzinfo=datetime.timezone.utc),
Contributor Author

Not sure if we should have defined what happens when you mix and match naive datetimes (DATETIME in BQ) with timezone-aware datetimes (TIMESTAMP in BQ). Since the test expects TIMESTAMP, I updated this value to be a datetime with a timezone.

We still mix/match timezones, so I think that part is important to make sure we end up returning UTC values.
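
To make the distinction concrete, here is a small standard-library sketch (not the library's implementation). The helper names `is_aware` and `to_utc` are made up for this example. It shows that a naive value has no UTC equivalent, while aware values with different offsets normalize to the same UTC instant.

```python
import datetime


def is_aware(value: datetime.datetime) -> bool:
    """True if the datetime carries a usable UTC offset."""
    return value.tzinfo is not None and value.utcoffset() is not None


def to_utc(value: datetime.datetime) -> datetime.datetime:
    """Convert an aware datetime to UTC; refuse naive values."""
    if not is_aware(value):
        raise ValueError("naive datetime has no defined UTC equivalent")
    return value.astimezone(datetime.timezone.utc)


# Naive: maps to DATETIME in BigQuery (no time zone attached).
naive = datetime.datetime(2020, 1, 8, 8, 0, 0)

# Aware: maps to TIMESTAMP in BigQuery. Two different offsets, same instant.
utc_value = datetime.datetime(2020, 1, 8, 8, 0, 0, tzinfo=datetime.timezone.utc)
plus_seven = datetime.datetime(
    2020, 1, 8, 15, 0, 0,
    tzinfo=datetime.timezone(datetime.timedelta(hours=7)),
)

normalized = [to_utc(v) for v in (utc_value, plus_seven)]
assert normalized[0] == normalized[1]
```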

Collaborator

@chalmerlowe chalmerlowe left a comment

By and large, this looks good.
I have a question that might be a potential blocker.

@@ -74,6 +74,9 @@ bqstorage = [
 ]
 pandas = [
     "pandas >= 1.1.0",
+    "pandas-gbq >= 0.26.1; python_version >= '3.8'",
Contributor Author

Note: this means we are testing without pandas-gbq on Python 3.7.
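
Because the environment marker makes pandas-gbq an optional install, calling code has to cope with its absence. The sketch below shows one generic way to detect an optional package and pick a code path. The function names and the "built-in" fallback label are assumptions for illustration, not the actual logic in `_pandas_helpers.py`.

```python
import importlib.util


def pandas_gbq_available() -> bool:
    """Report whether the optional pandas_gbq module can be imported."""
    return importlib.util.find_spec("pandas_gbq") is not None


def choose_schema_detector() -> str:
    # Hypothetical dispatch: prefer pandas-gbq when installed, otherwise
    # fall back to whatever detection the caller already had.
    if pandas_gbq_available():
        return "pandas-gbq"
    return "built-in"


detector = choose_schema_detector()
print(f"schema detection would use: {detector}")
```

Testing both branches requires at least one environment without pandas-gbq installed.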

@tswast
Contributor Author

tswast commented Feb 5, 2025

@chalmerlowe In 9a5656b I have explicitly uninstalled pandas-gbq in at least one unit and system test version. This should be robust to when we drop Python 3.7 and 3.8. I have verified the pip freeze output locally to confirm that pandas-gbq is not present in the sessions where we uninstall it.
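
For readers who want to repeat that check programmatically instead of reading pip freeze output, a minimal sketch using `importlib.metadata` could look like this. The helper name `is_installed` is made up for the example, and this is not the code from the commit.

```python
from importlib import metadata


def is_installed(distribution_name: str) -> bool:
    """Return True if the named distribution is installed in this environment."""
    try:
        metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return False
    return True


# In a session that uninstalls pandas-gbq, this would be expected to print False.
print("pandas-gbq installed:", is_installed("pandas-gbq"))
```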

Collaborator

@chalmerlowe chalmerlowe left a comment

LGTM.
Approved.

@tswast tswast enabled auto-merge (squash) February 18, 2025 22:35
@tswast tswast merged commit 7603bd7 into main Feb 21, 2025
15 of 23 checks passed
@tswast tswast deleted the b323176126-pandas-gbq branch February 21, 2025 17:45
chalmerlowe added a commit that referenced this pull request Feb 21, 2025
…e` (#2095)

* feat: use pandas-gbq to determine schema in `load_table_from_dataframe`

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* fix some unit tests

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* bump minimum pandas-gbq to 0.26.1

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* drop pandas-gbq from python 3.7 extras

* relax warning message text assertion

* use consistent time zone presence/absence in time datetime system test

* Update google/cloud/bigquery/_pandas_helpers.py

* Update google/cloud/bigquery/_pandas_helpers.py

Co-authored-by: Chalmer Lowe <[email protected]>

* remove pandas-gbq from at least 1 unit test and system test session

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Chalmer Lowe <[email protected]>
chalmerlowe added a commit that referenced this pull request Feb 28, 2025
* Initial batch of changes to remove 3.7 and 3.8

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* more updates to remove 3.7 and 3.8

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* updates samples/geography/reqs

* updates samples/magics/reqs

* updates samples/notebooks/reqs

* updates linting

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* updates conf due to linting issue

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* updates reqs.txt, fix mypy, lint, and debug in noxfile

* Updates owlbot to correct spacing issue in conf.py

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* updates owlbot imports

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* removes kokoro samples configs for 3.7 & 3.8

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* removes owlbots attempt to restore kokoro samples configs

* removes kokoro system-3.8.cfg

* edits repo sync settings

* updates assorted noxfiles for samples and pyproject.toml

* update test-samples-impl.sh

* updates install_deps template

* Edits to the contributing documentation

* deps: use pandas-gbq to determine schema in `load_table_from_dataframe` (#2095)

* feat: use pandas-gbq to determine schema in `load_table_from_dataframe`

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* fix some unit tests

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* bump minimum pandas-gbq to 0.26.1

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* drop pandas-gbq from python 3.7 extras

* relax warning message text assertion

* use consistent time zone presence/absence in time datetime system test

* Update google/cloud/bigquery/_pandas_helpers.py

* Update google/cloud/bigquery/_pandas_helpers.py

Co-authored-by: Chalmer Lowe <[email protected]>

* remove pandas-gbq from at least 1 unit test and system test session

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Chalmer Lowe <[email protected]>

* Feat: Adds foreign_type_info attribute to table class and adds unit tests. (#2126)

* adds foreign_type_info attribute to table

* feat: Adds foreign_type_info attribute and tests

* updates docstrings for foreign_type_info

* Updates property handling, especially as regards set/get_sub_prop

* Removes extraneous comments and debug expressions

* Refactors build_resource_from_properties w get/set_sub_prop

* updates to foreign_type_info, tests and wiring

* Adds logic to detect non-Sequence schema.fields value

* updates assorted tests and logic

* deps: updates required checks list in github (#2136)

* deps: updates required checks list in github

* deps: updates snippet and system checks in github to remove 3.9

* changes the order of two items in the list.

* updates linting

* reverts pandas back to 1.1.0

* Revert changes related to pandas <1.5

* Revert noxfile.py changes related to pandas <1.5

* Revert constraints-3.9 changes related to pandas <1.5

* Revert test_query_pandas.py changes related to pandas <1.5

* Revert test__pandas_helpers.py changes related to pandas <1.5

* Revert test__versions_helpers.py changes related to pandas <1.5

* Revert noxfile.py changes related to pandas <1.5

* Revert test__versions_helpers.py changes related to pandas <1.5

* Revert test_table.py changes related to pandas <1.5

* Update noxfile changes related to pandas <1.5

* Update pyproject.toml changes related to pandas <1.5

* Update constraints-3.9.txt changes related to pandas <1.5

* Update test_legacy_types.py changes related to pandas <1.5

* Updates magics.py as part of reverting from pandas 1.5

* Updates noxfile.py in reverting from pandas 1.5

* Updates pyproject.toml in reverting from pandas 1.5

* Updates constraints.txt in reverting from pandas 1.5

* Updates test_magics in reverting from pandas 1.5

* Updates test_table in reverting from pandas 1.5

* Updates in tests re: reverting from pandas 1.5

* Updates pyproject to match constraints.txt

* updates pyproject.toml to mirror constraints

* remove limit on virtualenv

* updates owlbot.py for test-samples-impl.sh

* updates to owlbot.py

* updates to test-samples-impl.sh

* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* further updates to owlbot.py

* removes unneeded files

* adds presubmit.cfg back in

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Tim Sweña (Swast) <[email protected]>
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: m Pull request size is medium.
4 participants