本教程介绍了如何将 AI.FORECAST
函数与 BigQuery ML 的内置 TimesFM 单变量模型搭配使用,根据给定列的历史值来预测该列的未来值。
本教程使用来自公开的 bigquery-public-data.san_francisco_bikeshare.bikeshare_trips
表中的数据。
目标
本教程将引导您将 AI.FORECAST 函数与内置的 TimesFM 模型结合使用,以预测共享单车行程。前两个部分介绍了如何预测单个时间序列的结果并直观呈现结果。第三部分介绍了如何预测多个时间序列。
费用
本教程使用 Google Cloud的可计费组件,包括以下组件:
- BigQuery
- BigQuery ML
如需详细了解 BigQuery 费用,请参阅 BigQuery 价格页面。
如需详细了解 BigQuery ML 费用,请参阅 BigQuery ML 价格。
准备工作
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
- 新项目会自动启用 BigQuery。如需在现有项目中启用 BigQuery,请前往
Enable the BigQuery API.
预测单个共享单车行程时间序列
使用 AI.FORECAST
函数预测未来的时序值。
以下查询根据过去四个月的历史数据,预测下个月(约 720 小时)每小时的订阅者自行车共享行程数。confidence_level
参数指示查询会生成置信度为 95% 的预测区间。
请按照以下步骤使用 TimesFM 模型预测数据:
在 Google Cloud 控制台中,前往 BigQuery 页面。
在查询编辑器中,粘贴以下查询,然后点击运行:
SELECT * FROM AI.FORECAST( ( SELECT TIMESTAMP_TRUNC(start_date, HOUR) as trip_hour, COUNT(*) as num_trips FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips` WHERE subscriber_type = 'Subscriber' AND start_date >= TIMESTAMP('2018-01-01') GROUP BY TIMESTAMP_TRUNC(start_date, HOUR) ), horizon => 720, confidence_level => 0.95, timestamp_col => 'trip_hour', data_col => 'num_trips');
结果类似于以下内容:
+-------------------------+-------------------+------------------+---------------------------------+---------------------------------+--------------------+ | forecast_timestamp | forecast_value | confidence_level | prediction_interval_lower_bound | prediction_interval_upper_bound | ai_forecast_status | +-------------------------+-------------------+------------------+---------------------------------+---------------------------------+--------------------+ | 2018-05-01 00:00:00 UTC | 26.3045959... | 0.95 | 21.7088378... | 30.9003540... | | +-------------------------+-------------------+------------------+---------------------------------+---------------------------------+--------------------+ | 2018-05-01 01:00:00 UTC | 34.0890502... | 0.95 | 2.47682913... | 65.7012714... | | +-------------------------+-------------------+------------------+---------------------------------+---------------------------------+--------------------+ | 2018-05-01 02:00:00 UTC | 24.2154693... | 0.95 | 2.87621605... | 45.5547226... | | +-------------------------+-------------------+------------------+---------------------------------+---------------------------------+--------------------+ | ... | ... | ... | ... | ... | | +-------------------------+-------------------+------------------+---------------------------------+---------------------------------+--------------------+
将预测数据与输入数据进行比较
将 AI.FORECAST
函数输出与函数输入数据的一部分绘制到图表中,以查看它们之间的对比情况。
请按照以下步骤绘制函数输出图表:
在 Google Cloud 控制台中,前往 BigQuery 页面。
在查询编辑器中,粘贴以下查询,然后点击运行:
WITH historical AS ( SELECT TIMESTAMP_TRUNC(start_date, HOUR) as trip_hour, COUNT(*) as num_trips FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips` WHERE subscriber_type = 'Subscriber' AND start_date >= TIMESTAMP('2018-01-01') GROUP BY TIMESTAMP_TRUNC(start_date, HOUR) ORDER BY TIMESTAMP_TRUNC(start_date, HOUR) ) SELECT * FROM ( (SELECT trip_hour as date, num_trips AS historical_value, NULL as forecast_value, 'historical' as type, NULL as prediction_interval_low, NULL as prediction_interval_upper_bound FROM historical ORDER BY historical.trip_hour DESC LIMIT 400) UNION ALL (SELECT forecast_timestamp AS date, NULL as historical_value, forecast_value as forecast_value, 'forecast' as type, prediction_interval_lower_bound, prediction_interval_upper_bound FROM AI.FORECAST( ( SELECT * FROM historical ), horizon => 720, confidence_level => 0.99, timestamp_col => 'trip_hour', data_col => 'num_trips'))) ORDER BY date asc;
查询运行完毕后,点击查询结果窗格中的图表标签页。生成的图表如下所示:
您可以看到,输入数据和预测数据显示了类似的共享单车使用情况。您还可以看到,随着预测时间点向未来延伸,预测区间的下限和上限会增大。
预测多个共享单车行程时序
以下查询根据过去四个月的历史数据,预测下个月(约 720 小时)每种订阅者类型的每小时骑行共享行程数。confidence_level
参数指示查询会生成置信度为 95% 的预测区间。
请按照以下步骤使用 TimesFM 模型预测数据:
在 Google Cloud 控制台中,前往 BigQuery 页面。
在查询编辑器中,粘贴以下查询,然后点击运行:
SELECT * FROM AI.FORECAST( ( SELECT TIMESTAMP_TRUNC(start_date, HOUR) as trip_hour, subscriber_type, COUNT(*) as num_trips FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips` WHERE start_date >= TIMESTAMP('2018-01-01') GROUP BY TIMESTAMP_TRUNC(start_date, HOUR), subscriber_type ), horizon => 720, confidence_level => 0.95, timestamp_col => 'trip_hour', data_col => 'num_trips', id_cols => ['subscriber_type']);
结果类似于以下内容:
+---------------------+--------------------------+------------------+------------------+---------------------------------+---------------------------------+--------------------+ | subscriber_type | forecast_timestamp | forecast_value | confidence_level | prediction_interval_lower_bound | prediction_interval_upper_bound | ai_forecast_status | +---------------------+--------------------------+------------------+------------------+---------------------------------+---------------------------------+--------------------+ | Subscriber | 2018-05-01 00:00:00 UTC | 26.3045959... | 0.95 | 21.7088378... | 30.9003540... | | +---------------------+--------------------------+------------------+------------------+---------------------------------+---------------------------------+--------------------+ | Subscriber | 2018-05-01 01:00:00 UTC | 34.0890502... | 0.95 | 2.47682913... | 65.7012714... | | +---------------------+-------------------+------------------+-------------------------+---------------------------------+---------------------------------+--------------------+ | Subscriber | 2018-05-01 02:00:00 UTC | 24.2154693... | 0.95 | 2.87621605... | 45.5547226... | | +---------------------+--------------------------+------------------+------------------+---------------------------------+---------------------------------+--------------------+ | ... | ... | ... | ... | ... | ... | | +---------------------+--------------------------+------------------+------------------+---------------------------------+---------------------------------+--------------------+
清理
为避免因本教程中使用的资源导致您的 Google Cloud 账号产生费用,请删除包含这些资源的项目,或者保留项目但删除各个资源。
删除项目
要删除项目,请执行以下操作:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
后续步骤
- 如需大致了解 BigQuery ML,请参阅 BigQuery 中的 AI 和机器学习简介。