docs: set index_cols in read_gbq as a best practice #624

Merged 3 commits on Apr 22, 2024
18 changes: 11 additions & 7 deletions third_party/bigframes_vendored/pandas/io/gbq.py
@@ -27,13 +27,17 @@ def read_gbq(
):
"""Loads a DataFrame from BigQuery.

BigQuery tables are an unordered, unindexed data source. By default,
the DataFrame will have an arbitrary index and ordering.

Set the `index_col` argument to one or more columns to choose an
index. The resulting DataFrame is sorted by the index columns. For the
best performance, ensure the index columns don't contain duplicate
values.
BigQuery tables are an unordered, unindexed data source. To support
pandas compatibility, the following indexing options are available:

* (Default behavior) Add an arbitrary sequential index and ordering
using an analytic windowed operation that prevents filter
pushdown.
* (Recommended) Set the ``index_col`` argument to one or more columns.
Unique values for the row labels are recommended. Duplicate labels
are possible, but note that joins on a non-unique index can duplicate
rows and operations like ``cumsum()`` that window across a non-unique
index can be nondeterministic.

.. note::
By default, even SQL query inputs with an ORDER BY clause create a
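The two caveats in the recommended option (joins that duplicate rows, and ``cumsum()`` results that depend on tie order when the index is non-unique) can be illustrated with a small plain-Python sketch. This is not BigQuery DataFrames code and makes no claim about how the library implements joins or windows; the rows, labels, and helper names (`join_on_label`, `cumsum_in_label_order`) are hypothetical stand-ins chosen only to show the effect.

```python
from itertools import accumulate

# Hypothetical rows as (label, value) pairs. The label "a" is duplicated.
left = [("a", 1), ("a", 2), ("b", 3)]
right = [("a", 10), ("a", 11), ("b", 20)]


def join_on_label(lhs, rhs):
    """Inner-join two lists of (label, value) pairs on the label."""
    return [
        (l_label, l_val, r_val)
        for l_label, l_val in lhs
        for r_label, r_val in rhs
        if l_label == r_label
    ]


def cumsum_in_label_order(rows):
    """Order rows by label only, then take a running sum of the values.

    Python's sort is stable, so rows with equal labels keep their input
    order. An engine without a guaranteed tie order could return either
    arrangement of the tied rows.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    return list(accumulate(value for _, value in ordered))


# Caveat 1: each "a" row on the left matches both "a" rows on the right,
# so 3 input rows per side become 5 joined rows.
joined = join_on_label(left, right)

# Caveat 2: the same rows, with the two tied "a" rows swapped, give a
# different running sum at the first position.
cumsum_first = cumsum_in_label_order([("a", 1), ("a", 2), ("b", 3)])
cumsum_second = cumsum_in_label_order([("a", 2), ("a", 1), ("b", 3)])
```

With unique labels, each left row would match at most one right row and the label ordering would be total, so neither effect would appear. That is the reason unique index columns are the safer choice.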