
Commit a32b747

chore: sync latest changes from internal repo (#7)
docs: highlight bigframes is open-source
docs: correct the return types of DataFrame and Series
docs: create subfolders for notebooks
feat: add `bigframes.get_global_session()` and `bigframes.reset_session()` aliases
chore: mark ml.llm tests flaky
chore: make kokoro/build.sh executable
feat: add `Series.str` methods `isalpha`, `isdigit`, `isdecimal`, `isalnum`, `isspace`, `islower`, `isupper`, `zfill`, `center`
chore: pin max pytest-retry plugin version in tests
docs: sample ML Drug Name Generation notebook
docs: add samples and best practices to `read_gbq` docs
chore: fix Python download path in docs-presubmit tests
perf: add local cache for `__repr_*__` methods
feat: support `DataFrame.pivot`
fix: don't use query cache for Session construction
feat: add `bigframes.pandas.read_pickle` function
feat: support MultiIndex for DataFrame columns
chore: change the docs kokoro setup to Gerrit path
docs: transform remote function user guide into sample code
fix: raise exception for invalid function in `read_gbq_function`
docs: add release status to table of contents
feat: add `fit_transform` to `bigquery.ml` transformers
feat: use `pandas.Index` for column labels
docs: add ML section under Overview
fix: check that types are specified in `read_gbq_function`
fix: add error message to `set_index`
1 parent 76f4daa commit a32b747
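Among the features above, the new `Series.str` accessors mirror their pandas counterparts. A minimal sketch using plain pandas (not bigframes itself) to show the expected elementwise behavior; the sample values are illustrative only:

```python
import pandas as pd

# bigframes.pandas mirrors the pandas Series.str accessor, so the newly
# added methods behave like the pandas equivalents shown here.
s = pd.Series(["abc", "123", "a1 ", "HELLO"])

print(s.str.isalpha().tolist())  # [True, False, False, True]
print(s.str.isdigit().tolist())  # [False, True, False, False]
print(s.str.zfill(5).tolist())   # ['00abc', '00123', '00a1 ', 'HELLO']
print(s.str.center(7, "*")[0])   # **abc**
```

Each method applies the corresponding Python `str` predicate or padding function to every element, returning a boolean or string Series.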


65 files changed: +5909 -3807 lines changed

.kokoro/build.sh

File mode changed from 100644 to 100755.

.kokoro/docker/docs/Dockerfile (+4 -9)

@@ -60,19 +60,16 @@ RUN apt-get update \
     && rm -rf /var/lib/apt/lists/* \
     && rm -f /var/cache/apt/archives/*.deb

-###################### Install python 3.9.13 and 3.10.5
+###################### Install python 3.9.13

-# Download python 3.9.13 and 3.10.5
+# Download python 3.9.13
 RUN wget https://www.python.org/ftp/python/3.9.13/Python-3.9.13.tgz
-RUN wget https://www.python.org/ftp/python/3.9.13/Python-3.10.5.tgz

 # Extract files
 RUN tar -xvf Python-3.9.13.tgz
-RUN tar -xvf Python-3.10.5.tgz

-# Install python 3.9.13 and 3.10.5
+# Install python 3.9.13
 RUN ./Python-3.9.13/configure --enable-optimizations
-RUN ./Python-3.10.5/configure --enable-optimizations
 RUN make altinstall

 ###################### Install pip
@@ -82,7 +79,5 @@ RUN wget -O /tmp/get-pip.py 'https://bootstrap.pypa.io/get-pip.py' \

 # Test pip
 RUN python3 -m pip
-RUN python3.9 -m pip
-RUN python3.10 -m pip

-CMD ["python3.10"]
+CMD ["python3.9"]

.kokoro/docs/common.cfg (+2 -2)

@@ -11,7 +11,7 @@ action {
 gfile_resources: "/bigstore/cloud-devrel-kokoro-resources/trampoline"

 # Use the trampoline script to run in docker.
-build_file: "python-bigquery-dataframes/.kokoro/trampoline_v2.sh"
+build_file: "bigframes/.kokoro/trampoline_v2.sh"

 # Configure the docker image for kokoro-trampoline.
 env_vars: {
@@ -20,7 +20,7 @@ env_vars: {
 }
 env_vars: {
   key: "TRAMPOLINE_BUILD_FILE"
-  value: "github/python-bigquery-dataframes/.kokoro/publish-docs.sh"
+  value: "git/bigframes/.kokoro/publish-docs.sh"
 }

 env_vars: {

.kokoro/docs/docs-presubmit.cfg (+1 -1)

@@ -13,7 +13,7 @@ env_vars: {

 env_vars: {
   key: "TRAMPOLINE_BUILD_FILE"
-  value: "github/python-bigquery-dataframes/.kokoro/build.sh"
+  value: ".kokoro/build.sh"
 }

 # Only run this nox session.

README.rst (+125 -1)

@@ -7,6 +7,9 @@ powered by the BigQuery engine.
 * ``bigframes.pandas`` provides a pandas-compatible API for analytics.
 * ``bigframes.ml`` provides a scikit-learn-like API for ML.

+BigQuery DataFrames is an open-source package. You can run
+``pip install --upgrade bigframes`` to install the latest version.
+
 Documentation
 -------------

@@ -65,6 +68,127 @@ querying is not in the US multi-region. If you try to read a table from another
 location, you get a NotFound exception.


+ML Capabilities
+---------------
+
+The ML capabilities in BigQuery DataFrames let you preprocess data, and
+then train models on that data. You can also chain these actions together to
+create data pipelines.
+
+Preprocess data
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Create transformers to prepare data for use in estimators (models) by
+using the
+`bigframes.ml.preprocessing module <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.preprocessing>`_
+and the `bigframes.ml.compose module <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.compose>`_.
+BigQuery DataFrames offers the following transformations:
+
+* Use the `OneHotEncoder class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.preprocessing.OneHotEncoder>`_
+  in the ``bigframes.ml.preprocessing`` module to transform categorical values into numeric format.
+* Use the `StandardScaler class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.preprocessing.StandardScaler>`_
+  in the ``bigframes.ml.preprocessing`` module to standardize features by removing the mean and scaling to unit variance.
+* Use the `ColumnTransformer class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.compose.ColumnTransformer>`_
+  in the ``bigframes.ml.compose`` module to apply transformers to DataFrames columns.
+
+
+Train models
+^^^^^^^^^^^^
+
+Create estimators to train models in BigQuery DataFrames.
+
+**Clustering models**
+
+Create estimators for clustering models by using the
+`bigframes.ml.cluster module <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.cluster>`_.
+
+* Use the `KMeans class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.cluster.KMeans>`_
+  to create K-means clustering models. Use these models for
+  data segmentation. For example, identifying customer segments. K-means is an
+  unsupervised learning technique, so model training doesn't require labels or split
+  data for training or evaluation.
+
+**Decomposition models**
+
+Create estimators for decomposition models by using the `bigframes.ml.decomposition module <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.decomposition>`_.
+
+* Use the `PCA class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.decomposition.PCA>`_
+  to create principal component analysis (PCA) models. Use these
+  models for computing principal components and using them to perform a change of
+  basis on the data. This provides dimensionality reduction by projecting each data
+  point onto only the first few principal components to obtain lower-dimensional
+  data while preserving as much of the data's variation as possible.
+
+
+**Ensemble models**
+
+Create estimators for ensemble models by using the `bigframes.ml.ensemble module <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.ensemble>`_.
+
+* Use the `RandomForestClassifier class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.ensemble.RandomForestClassifier>`_
+  to create random forest classifier models. Use these models for constructing multiple
+  learning method decision trees for classification.
+* Use the `RandomForestRegressor class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.ensemble.RandomForestRegressor>`_
+  to create random forest regression models. Use
+  these models for constructing multiple learning method decision trees for regression.
+* Use the `XGBClassifier class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.ensemble.XGBClassifier>`_
+  to create gradient boosted tree classifier models. Use these models for additively
+  constructing multiple learning method decision trees for classification.
+* Use the `XGBRegressor class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.ensemble.XGBRegressor>`_
+  to create gradient boosted tree regression models. Use these models for additively
+  constructing multiple learning method decision trees for regression.
+
+
+**Forecasting models**
+
+Create estimators for forecasting models by using the `bigframes.ml.forecasting module <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.forecasting>`_.
+
+* Use the `ARIMAPlus class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.forecasting.ARIMAPlus>`_
+  to create time series forecasting models.
+
+**Imported models**
+
+Create estimators for imported models by using the `bigframes.ml.imported module <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.imported>`_.
+
+* Use the `ONNXModel class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.imported.ONNXModel>`_
+  to import Open Neural Network Exchange (ONNX) models.
+* Use the `TensorFlowModel class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.imported.TensorFlowModel>`_
+  to import TensorFlow models.
+
+**Linear models**
+
+Create estimators for linear models by using the `bigframes.ml.linear_model module <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.linear_model>`_.
+
+* Use the `LinearRegression class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.linear_model.LinearRegression>`_
+  to create linear regression models. Use these models for forecasting. For example,
+  forecasting the sales of an item on a given day.
+* Use the `LogisticRegression class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.linear_model.LogisticRegression>`_
+  to create logistic regression models. Use these models for the classification of two
+  or more possible values such as whether an input is ``low-value``, ``medium-value``,
+  or ``high-value``.
+
+**Large language models**
+
+Create estimators for LLMs by using the `bigframes.ml.llm module <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.llm>`_.
+
+* Use the `PaLM2TextGenerator class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.llm.PaLM2TextGenerator>`_ to create PaLM2 text generator models. Use these models
+  for text generation tasks.
+* Use the `PaLM2TextEmbeddingGenerator class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.llm.PaLM2TextEmbeddingGenerator>`_ to create PaLM2 text embedding generator models.
+  Use these models for text embedding generation tasks.
+
+
+Create pipelines
+^^^^^^^^^^^^^^^^
+
+Create ML pipelines by using
+`bigframes.ml.pipeline module <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.pipeline>`_.
+Pipelines let you assemble several ML steps to be cross-validated together while setting
+different parameters. This simplifies your code, and allows you to deploy data preprocessing
+steps and an estimator together.
+
+* Use the `Pipeline class <https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.pipeline.Pipeline>`_
+  to create a pipeline of transforms with a final estimator.
+
+
 ML locations
 ------------

@@ -181,7 +305,7 @@ following IAM roles:


 Quotas and limits
------------------
+------------------

 `BigQuery quotas <https://cloud.google.com/bigquery/quotas>`_
 including hardware, software, and network components.
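The README additions above describe the scikit-learn-style transformer/estimator pattern that `bigframes.ml` follows. A minimal stdlib sketch of that pattern (toy stand-in classes, not the bigframes implementation): each intermediate step exposes `fit`/`transform`/`fit_transform`, the final step is an estimator, and a pipeline chains them.

```python
# Illustrative sketch of the pipeline pattern described in the README.
# StandardScaler and MeanModel here are toy stand-ins, not bigframes classes.

class StandardScaler:
    """Toy scaler: removes the mean (unit-variance scaling omitted for brevity)."""
    def fit(self, xs):
        self.mean_ = sum(xs) / len(xs)
        return self

    def transform(self, xs):
        return [x - self.mean_ for x in xs]

    def fit_transform(self, xs):
        return self.fit(xs).transform(xs)

class MeanModel:
    """Toy estimator: always predicts the mean of the training targets."""
    def fit(self, xs, ys):
        self.pred_ = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.pred_ for _ in xs]

class Pipeline:
    """Chain transformers and a final estimator, scikit-learn style."""
    def __init__(self, steps):
        self.steps = steps  # list of (name, step); last step is the estimator

    def fit(self, xs, ys):
        for _name, step in self.steps[:-1]:
            xs = step.fit_transform(xs)
        self.steps[-1][1].fit(xs, ys)
        return self

    def predict(self, xs):
        for _name, step in self.steps[:-1]:
            xs = step.transform(xs)
        return self.steps[-1][1].predict(xs)

pipe = Pipeline([("scaler", StandardScaler()), ("model", MeanModel())])
pipe.fit([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(pipe.predict([4.0]))  # [20.0]
```

Deploying preprocessing and estimator together, as the README notes, means the fitted scaler's parameters travel with the model, so prediction inputs are transformed exactly as the training data was.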

bigframes/__init__.py (+4 -1)

@@ -16,13 +16,16 @@

 from bigframes._config import options
 from bigframes._config.bigquery_options import BigQueryOptions
+from bigframes.core.global_session import get_global_session, reset_session
 from bigframes.session import connect, Session
 from bigframes.version import __version__

 __all__ = [
+    "options",
     "BigQueryOptions",
+    "get_global_session",
+    "reset_session",
     "connect",
-    "options",
     "Session",
     "__version__",
 ]
