# GPU Bot Details

This page describes in detail how the GPU bots are set up, which files affect
their configuration, and how to both modify their behavior and add new bots.

[TOC]

## Overview of the GPU bots' setup

Unlike the majority of the project's test machines, Chromium's GPU bots are
physical pieces of hardware. When end users run the Chrome browser, they are
almost surely running it on a physical piece of hardware with a real graphics
processor. Some portions of the code base simply cannot be exercised by running
the browser in a virtual machine, or on a software implementation of the
underlying graphics libraries. The GPU bots were developed and deployed in
order to cover these code paths, and to avoid regressions that are otherwise
inevitable in a project the size of the Chromium browser.

The GPU bots are used on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls, and on various tryservers, as described in [Using the GPU Bots].

[chromium.gpu]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console
[chromium.gpu.fyi]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console
[Using the GPU Bots]: gpu_testing.md#Using-the-GPU-Bots

All of the physical hardware for the bots lives in the Swarming pool, and most
of it in the chromium.tests.gpu Swarming pool. The waterfall bots are simply
virtual machines which spawn Swarming tasks with the appropriate tags to get
them to run on the desired GPU and operating system type. So, for example, the
[Win10 x64 Release (NVIDIA)] bot is actually a virtual machine which spawns all
of its jobs with the Swarming parameters:

[Win10 x64 Release (NVIDIA)]: https://ci.chromium.org/p/chromium/builders/ci/Win10%20x64%20Release%20%28NVIDIA%29

```json
{
    "gpu": "nvidia-quadro-p400-win10-stable",
    "os": "Windows-10",
    "pool": "chromium.tests.gpu"
}
```

Since the GPUs in the Swarming pool are mostly homogeneous, this is sufficient
to target the pool of Windows 10-like NVIDIA machines. (There are a few Windows
7-like NVIDIA bots in the pool, which necessitates the OS specifier.)

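As a rough illustration of how such parameters select hardware, the sketch
below matches a task's requested dimensions against the dimensions advertised
by a few bots. It is a simplified model, not Swarming's actual scheduler: the
bot names, the Windows 7 GPU string and the matching rule (every requested
value must appear among the values the bot advertises for that key) are
assumptions made for the example.

```python
def bot_matches(task_dimensions, bot_dimensions):
    """Return True if the bot advertises every value the task requests."""
    return all(
        value in bot_dimensions.get(key, [])
        for key, value in task_dimensions.items()
    )


# Dimensions requested by the waterfall bot (copied from the JSON above).
task_dimensions = {
    "gpu": "nvidia-quadro-p400-win10-stable",
    "os": "Windows-10",
    "pool": "chromium.tests.gpu",
}

# Hypothetical machines. Each key maps to the list of values a bot advertises.
bots = {
    "example-win10-nvidia": {
        "gpu": ["nvidia-quadro-p400-win10-stable"],
        "os": ["Windows", "Windows-10"],
        "pool": ["chromium.tests.gpu"],
    },
    "example-win7-nvidia": {
        "gpu": ["nvidia-quadro-p400-win7-stable"],  # hypothetical value
        "os": ["Windows", "Windows-7"],
        "pool": ["chromium.tests.gpu"],
    },
}

eligible = [name for name, dims in bots.items()
            if bot_matches(task_dimensions, dims)]
print(eligible)
```

In this model only the Windows 10 machine is eligible; a task that asked for
just `"os": "Windows"` would be eligible for both machines.
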
Details about the bots can be found on [chromium-swarm.appspot.com] and by
using `src/tools/luci-go/swarming`, for example `swarming bots`.
If you are authenticated with @google.com credentials you will be able to make
queries of the bots and see, for example, which GPUs are available.

[chromium-swarm.appspot.com]: https://chromium-swarm.appspot.com/

The waterfall bots run tests on a single GPU type in order to make it easier to
see regressions or flakiness that affect only a certain type of GPU.

Tryservers which include GPU tests, like `win10_chromium_x64_rel_ng`, on the
other hand run tests on more than one GPU type. As of this writing, the Windows
tryservers run tests on NVIDIA and AMD GPUs; the Mac tryservers run tests on
Intel and NVIDIA GPUs. The way these tryservers' tests are specified is simply
by *mirroring* how one or more waterfall bots work. This is an inherent
property of the [`chromium_trybot` recipe][chromium_trybot.py], which was
designed to eliminate differences in behavior between the tryservers and
waterfall bots. Since the tryservers mirror waterfall bots, if the waterfall
bot is working, the tryserver must almost inherently be working as well.

[chromium_trybot.py]: https://chromium.googlesource.com/chromium/tools/build/+/main/recipes/recipes/chromium_trybot.py

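The mirroring idea can be sketched as a lookup: a tryserver has no test list
of its own, and instead inherits the steps of each waterfall bot it mirrors.
The snippet below is only a conceptual model. The bot names, the table layout
and the helper `steps_for_trybot` are hypothetical and do not reflect the
actual recipe code.

```python
# Hypothetical mirror table: try bot -> waterfall bots it mirrors.
TRYBOT_MIRRORS = {
    "example_win_tryserver": [
        "Example Win Release (NVIDIA)",
        "Example Win Release (AMD)",
    ],
}

# Hypothetical waterfall configuration: waterfall bot -> test steps it runs.
WATERFALL_TESTS = {
    "Example Win Release (NVIDIA)": ["gl_tests", "angle_end2end_tests"],
    "Example Win Release (AMD)": ["gl_tests"],
}


def steps_for_trybot(trybot):
    """Return (waterfall bot, test) pairs the try bot inherits by mirroring."""
    return [
        (bot, test)
        for bot in TRYBOT_MIRRORS[trybot]
        for test in WATERFALL_TESTS[bot]
    ]


steps = steps_for_trybot("example_win_tryserver")
for bot, test in steps:
    print(f"{test} on {bot}")
```

Because the try bot's steps are derived from the waterfall tables, changing a
waterfall bot's test list changes the try bot in the same way, which is the
property the mirroring design is after.
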
There are some GPU configurations on the waterfall backed by only one machine,
or a very small number of machines in the Swarming pool. A few examples are:

<!-- XXX: update this list -->
*   [Mac Pro Release (AMD)](https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/Mac%20Pro%20FYI%20Release%20%28AMD%29)
*   [Linux Release (AMD R7 240)](https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/)

There are a couple of reasons to continue to support running tests on a
specific machine: it might be too expensive to deploy the required multiple
copies of said hardware, or the configuration might not be reliable enough to
begin scaling it up.

## Adding a new isolated test to the bots

Adding a new test step to the bots requires that the test run via an isolate.
Isolates describe both the binary and data dependencies of an executable, and
are the underpinning of how the Swarming system works. See the [LUCI]
documentation for background on [Isolates] and [Swarming].

[LUCI]: https://github.com/luci/luci-py
[Isolates]: https://github.com/luci/luci-py/blob/master/appengine/isolate/doc/README.md
[Swarming]: https://github.com/luci/luci-py/blob/master/appengine/swarming/doc/README.md

### Adding a new isolate

1.  Define your target using the `template("test")` template in
    [`src/testing/test.gni`][testing/test.gni]. See `test("gl_tests")` in
    [`src/gpu/BUILD.gn`][gpu/BUILD.gn] for an example. For a more complex
    example which invokes a series of scripts which finally launches the
    browser, see `telemetry_gpu_integration_test` in
    [`chrome/test/BUILD.gn`][chrome/test/BUILD.gn].
2.  Add an entry to
    [`src/testing/buildbot/gn_isolate_map.pyl`][gn_isolate_map.pyl] that refers
    to your target. Find a similar target to yours in order to determine the
    `type`. The type is referenced in [`src/tools/mb/mb.py`][mb.py].

[testing/test.gni]: https://chromium.googlesource.com/chromium/src/+/main/testing/test.gni
[gpu/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/main/gpu/BUILD.gn
[chrome/test/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/main/chrome/test/BUILD.gn
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/gn_isolate_map.pyl
[mb.py]: https://chromium.googlesource.com/chromium/src/+/main/tools/mb/mb.py

At this point you can build and upload your isolate to the isolate server.

See [Isolated Testing for SWEs] for the most up-to-date instructions. The
instructions here are a copy, and show how to run an isolate that's been
uploaded to the isolate server on your local machine rather than on Swarming.

[Isolated Testing for SWEs]: https://www.chromium.org/developers/testing/isolated-testing/for-swes

If `cd`'d into `src/`:

1.  `./tools/mb/mb.py isolate //out/Release [target name]`
    *   For example: `./tools/mb/mb.py isolate //out/Release angle_end2end_tests`
1.  `./tools/luci-go/isolate batcharchive -cas-instance chromium-swarm out/Release/[target name].isolated.gen.json`
    *   For example: `./tools/luci-go/isolate batcharchive -cas-instance chromium-swarm out/Release/angle_end2end_tests.isolated.gen.json`

See the section below on [isolate server credentials](#Isolate-server-credentials).

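If you run these two commands often, a small helper that builds them from a
target name can avoid typos. The sketch below only assembles and prints the
command lines shown above; the helper name `isolate_commands` and the default
output directory argument are assumptions made for the example, and nothing is
executed.

```python
import shlex


def isolate_commands(target, out_dir="out/Release"):
    """Build the two command lines from the steps above for one target."""
    return [
        ["./tools/mb/mb.py", "isolate", "//" + out_dir, target],
        [
            "./tools/luci-go/isolate", "batcharchive",
            "-cas-instance", "chromium-swarm",
            f"{out_dir}/{target}.isolated.gen.json",
        ],
    ]


commands = isolate_commands("angle_end2end_tests")
for argv in commands:
    # Print instead of running, so the commands can be reviewed first.
    print(shlex.join(argv))
```

To actually run them, each list could be passed to `subprocess.run(argv,
check=True)` from within `src/`.
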
### Adding your new isolate to the tests that are run on the bots

See [Adding new steps to the GPU bots] for details on this process.

[Adding new steps to the GPU bots]: gpu_testing.md#Adding-new-steps-to-the-GPU-Bots

## Relevant files that control the operation of the GPU bots

In the [`tools/build`][tools/build] workspace:

*   `recipes/recipe_modules/chromium_tests/`:
    *   [`chromium_gpu.py`][chromium_gpu.py] and
        [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] define the following for
        each builder and tester:
        *   How the workspace is checked out (e.g., this is where top-of-tree
            ANGLE is specified)
        *   The build configuration (e.g., this is where 32-bit vs. 64-bit is
            specified)
        *   Various gclient defines (like compiling in the hardware-accelerated
            video codecs, and enabling compilation of certain tests, like the
            dEQP tests, that can't be built on all of the Chromium builders)
        *   Note that the GN configuration of the bots is also controlled by
            [`mb_config.pyl`][mb_config.pyl] in the Chromium workspace; see
            below.
    *   [`trybots.py`][trybots.py] defines how try bots *mirror* one or more
        waterfall bots.
        *   The concept of try bots mirroring waterfall bots ensures there are
            no differences in behavior between the waterfall bots and the try
            bots. This helps ensure that a CL will not pass the commit queue
            and then break on the waterfall.
        *   This file defines the behavior of the following GPU-related try
            bots:
            *   `linux-rel`, `mac-rel`, `win10_chromium_x64_rel_ng` and
                `android-marshmallow-arm64-rel`, which run against every
                Chromium CL, and which mirror the behavior of bots on the
                chromium.gpu waterfall.
            *   The ANGLE try bots, which run against ANGLE CLs, and mirror the
                behavior of the chromium.gpu.fyi waterfall (including using
                top-of-tree ANGLE, and running additional tests not run by the
                regular Chromium try bots)
            *   The optional GPU try servers `linux_optional_gpu_tests_rel`,
                `mac_optional_gpu_tests_rel`, `win_optional_gpu_tests_rel` and
                `android_optional_gpu_tests_rel`, which are added automatically
                to CLs which modify a selected set of subdirectories and
                run some tests which can't be run on the regular Chromium try
                servers mainly due to lack of hardware capacity.
            *   Manual GPU trybots, starting with `gpu-try-` and `gpu-fyi-try-`
                prefixes, which can be added manually to CLs targeting a
                specific hardware configuration.

[tools/build]: https://chromium.googlesource.com/chromium/tools/build/
[chromium_gpu.py]: https://chromium.googlesource.com/chromium/tools/build/+/main/recipes/recipe_modules/chromium_tests/builders/chromium_gpu.py
[chromium_gpu_fyi.py]: https://chromium.googlesource.com/chromium/tools/build/+/main/recipes/recipe_modules/chromium_tests/builders/chromium_gpu_fyi.py
[trybots.py]: https://chromium.googlesource.com/chromium/tools/build/+/main/recipes/recipe_modules/chromium_tests/trybots.py

In the [`chromium/src`][chromium/src] workspace:

*   [`src/testing/buildbot`][src/testing/buildbot]:
    *   [`chromium.gpu.json`][chromium.gpu.json] and
        [`chromium.gpu.fyi.json`][chromium.gpu.fyi.json] define which steps are
        run on which bots. These files are autogenerated. Don't modify them
        directly!
    *   [`waterfalls.pyl`][waterfalls.pyl],
        [`test_suites.pyl`][test_suites.pyl], [`mixins.pyl`][mixins.pyl] and
        [`test_suite_exceptions.pyl`][test_suite_exceptions.pyl] define the
        configuration for the autogenerated JSON files above.
        Run [`generate_buildbot_json.py`][generate_buildbot_json.py] to
        generate the JSON files after you modify these pyl files.
    *   [`generate_buildbot_json.py`][generate_buildbot_json.py]
        *   The generator script for all the waterfalls, including
            `chromium.gpu.json` and `chromium.gpu.fyi.json`.
        *   See the [README for generate_buildbot_json.py] for documentation
            on this script and the descriptions of the waterfalls and test
            suites.
        *   When modifying this script, don't forget to also run it, to
            regenerate the JSON files. Don't worry; the presubmit step will
            catch this if you forget.
        *   See [Adding new steps to the GPU bots] for more details.
    *   [`gn_isolate_map.pyl`][gn_isolate_map.pyl] defines all of the isolates'
        behavior in the GN build.
*   [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
    *   Defines the GN arguments for all of the bots.
*   [`src/infra/config`][src/infra/config]:
    *   Definitions of how bots are organized on the waterfall,
        how builds are triggered, which VMs or machines are used for the
        builder itself, i.e. for compilation and scheduling swarmed tasks
        on GPU hardware. See
        [README.md](https://chromium.googlesource.com/chromium/src/+/main/infra/config/README.md)
        in this directory for up to date information.

[chromium/src]: https://chromium.googlesource.com/chromium/src/
[src/testing/buildbot]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot
[src/infra/config]: https://chromium.googlesource.com/chromium/src/+/main/infra/config
[chromium.gpu.json]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/chromium.gpu.json
[chromium.gpu.fyi.json]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/chromium.gpu.fyi.json
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/main/tools/mb/mb_config.pyl
[generate_buildbot_json.py]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/generate_buildbot_json.py
[mixins.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/mixins.pyl
[waterfalls.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/waterfalls.pyl
[test_suites.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/test_suites.pyl
[test_suite_exceptions.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/test_suite_exceptions.pyl
[README for generate_buildbot_json.py]: ../../testing/buildbot/README.md

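The `.pyl` files listed above are Python literals, so they can be inspected
with the standard library when you want to check what a waterfall currently
contains. The sketch below parses an inline string rather than a real file.
The excerpt is a simplified, hypothetical stand-in (the machine, mixin and
suite names are invented, and the real files contain more keys); see the
README for the authoritative format.

```python
import ast

# Simplified, hypothetical excerpt in the style of a .pyl file.
WATERFALLS_PYL = """
[
  {
    # .pyl files may contain comments like this one.
    'name': 'chromium.gpu.fyi',
    'machines': {
      'Example Release (NVIDIA)': {
        'mixins': ['example_nvidia_mixin'],
        'test_suites': {'gpu_telemetry_tests': 'example_telemetry_suite'},
      },
    },
  },
]
"""


def machines_by_waterfall(pyl_text):
    """Map each waterfall name to the sorted names of its machines."""
    waterfalls = ast.literal_eval(pyl_text.strip())
    return {w["name"]: sorted(w["machines"]) for w in waterfalls}


machines = machines_by_waterfall(WATERFALLS_PYL)
print(machines)
```

`ast.literal_eval` only accepts literals, so it is a safer choice than `eval`
for reading configuration of this kind.
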
In the [`infradata/config`][infradata/config] workspace (Google internal only,
sorry):

*   [`gpu.star`][gpu.star]
    *   Defines a `chromium.tests.gpu` Swarming pool which contains all of the
        specialized hardware, except some hardware shared with Chromium:
        for example, the Windows and Linux NVIDIA
        bots, the Windows AMD bots, and the MacBook Pros with NVIDIA and AMD
        GPUs. New GPU hardware should be added to this pool.
    *   Also defines the GCEs, Mac VMs and Mac machines used for CI builders
        on GPU and GPU.FYI waterfalls and trybots.
*   [`pools.cfg`][pools.cfg]
    *   Defines the Swarming pools for GCEs and Mac VMs used for manually
        triggered trybots.

[infradata/config]: https://chrome-internal.googlesource.com/infradata/config
[gpu.star]: https://chrome-internal.googlesource.com/infradata/config/+/main/configs/chromium-swarm/starlark/bots/chromium/gpu.star
[chromium.star]: https://chrome-internal.googlesource.com/infradata/config/+/main/configs/chromium-swarm/starlark/bots/chromium/chromium.star
[pools.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/main/configs/chromium-swarm/pools.cfg
[main.star]: https://chrome-internal.googlesource.com/infradata/config/+/main/main.star
[vms.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/main/configs/gce-provider/vms.cfg

## Walkthroughs of various maintenance scenarios

This section describes various common scenarios that might arise when
maintaining the GPU bots, and how they'd be addressed.

### How to add a new test or an entire new step to the bots

This is described in [Adding new tests to the GPU bots].

[Adding new tests to the GPU bots]: https://chromium.googlesource.com/chromium/src/+/main/docs/gpu/gpu_testing.md#Adding-New-Tests-to-the-GPU-Bots

### How to set up new virtual machine instances

The tests use virtual machines to build binaries and to trigger tests on
physical hardware. VMs don't run any tests themselves. There are 3 types of
bots:

*   Builders - these bots build test binaries, upload them to storage and
    trigger tester bots (see below). Builds must be done on the same OS on
    which the tests will run, except for Android tests, which are built on
    Linux.
*   Testers - these bots trigger tests to execute in Swarming and merge results
    from multiple shards. 2-core Linux GCEs are sufficient for this task.
*   Builder/testers - these are the combination of the above and have the same
    OS constraints as builders. All trybots are of this type, while for CI bots
    it is optional.

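The OS constraints in the list above can be written down as a small rule. This
is just a restatement of the prose for quick reference; the role strings and
the function name `vm_os_for` are hypothetical and are not taken from any
configuration file.

```python
def vm_os_for(role, test_os):
    """Return the OS a VM needs for a bot role and the OS the tests run on.

    role is one of "builder", "tester" or "builder/tester".
    """
    if role == "tester":
        # Testers only trigger Swarming tasks and merge results, so a small
        # Linux GCE is sufficient whatever OS the tests run on.
        return "Linux"
    if role in ("builder", "builder/tester"):
        # Builds happen on the OS under test, except Android, which is built
        # on Linux.
        return "Linux" if test_os == "Android" else test_os
    raise ValueError(f"unknown role: {role}")


print(vm_os_for("builder", "Windows"))
print(vm_os_for("builder/tester", "Android"))
print(vm_os_for("tester", "Mac"))
```
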
The process is:

1.  Follow [go/request-chrome-resources](go/request-chrome-resources) to get
    approval for the VMs. Use the `GPU` project resource group.
    See this [example ticket](http://crbug.com/1012805).
    You'll need to determine how many VMs are required, which OSes, how many
    cores and in which swarming pools they will be (see below for different
    scenarios).
    *   If setting up a new GPU hardware pool, some VMs will also be needed
        for manual trybots, usually 2 VMs as of this writing.
    *   Additional action is needed for Mac VMs: the GPU resource owner will
        assign the bug to Labs to deploy them. See this
        [example ticket](http://crbug.com/964355).
1.  Once the GCE resource request is approved or the Mac VMs are deployed, the
    VMs need to be added to the right Swarming pools in a CL in the
    [`infradata/config`][infradata/config] (Google internal) workspace.
    1.  GCEs for Windows CI builders and builder/testers should be added to
        the `luci-chromium-gpu-ci-win10-8` group in [`gpu.star`][gpu.star].
    1.  GCEs for Linux and Android CI builders and builder/testers should be
        added to the `luci-chromium-gpu-ci-xenial-8` group in
        [`gpu.star`][gpu.star].
    1.  VMs for Mac CI builders and builder/testers should be added to the
        `builderfull_gpu_ci_bots` group in [`gpu.star`][gpu.star].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1166889).
    1.  GCEs for CI testers for all OSes should be added to the
        `luci-chromium-gpu-ci-xenial-2` group in [`gpu.star`][gpu.star].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/2016410).
    1.  GCEs and VMs for CQ and optional CQ GPU trybots should be added to
        a corresponding `gpu_try_bots` group in [`gpu.star`][gpu.star].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1561384).
        These trybots are "builderful", i.e. these GCEs can't be shared among
        different bots. This is done in order to limit the number of concurrent
        builds on these bots (until [crbug.com/949379](crbug.com/949379) is
        fixed) to prevent oversubscribing GPU hardware.
        `win_optional_gpu_tests_rel` is an exception: its GCEs come from
        `luci-chromium-try-win10-*-8` groups in
        [`chromium.star`][chromium.star], see
        [CL](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1708723).
        This can cause oversubscription of Windows GPU hardware. However,
        Chrome Infra insisted on making this bot builderless due to frequent
        interruptions they get from limiting the number of concurrent builds on
        it; see the discussion in
        [CL](https://chromium-review.googlesource.com/c/chromium/src/+/1775098).
    1.  GCEs and VMs for manual GPU trybots should be added to a corresponding
        pool in "Manually-triggered GPU trybots" in [`gpu.star`][gpu.star].
        If adding a new pool, it should also be added to
        [`pools.cfg`][pools.cfg].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/2433332).
        This is a different mechanism to limit the load on GPU hardware,
        by having a small pool of GCEs which corresponds to some GPU hardware
        resource, and all trybots that target this GPU hardware compete for
        GCEs from this small pool.
    1.  Run [`main.star`][main.star] to regenerate
        `configs/chromium-swarm/bots.cfg` and `configs/gce-provider/vms.cfg`.
        Double-check your work there.
        Note that previously [`vms.cfg`][vms.cfg] had to be edited manually.
        Part of the difficulty was in choosing a zone. This should soon no
        longer be necessary per [crbug.com/942301](http://crbug.com/942301),
        but consult with the Chrome Infra team to find out which of the
        [zones](https://cloud.google.com/compute/docs/regions-zones/) has
        available capacity. This can also be checked on the viceroy
        [dashboard](https://viceroy.corp.google.com/chrome_infra/Quota/chrome?duration=7d).
    1.  Get this reviewed and landed. This step associates the VM or pool of
        VMs with the bot's name on the waterfall for "builderful" bots, or
        increases swarmed pool capacity for "builderless" bots.
        Note: CR+1 is not sticky in this repo, so you'll have to ping for
        re-review after every change, such as a rebase.

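For the CI cases in step 2, the choice of group can be summarized as a lookup
keyed on the bot's role and the OS under test. The group names below are the
ones listed above; the table layout, the role strings and the function
`ci_group_for` are hypothetical conveniences and are not part of `gpu.star`.
Try bots are left out because their groups are chosen per bot.

```python
# Groups for CI builders and builder/testers, keyed by the OS under test.
CI_BUILDER_GROUPS = {
    "Windows": "luci-chromium-gpu-ci-win10-8",
    "Linux": "luci-chromium-gpu-ci-xenial-8",
    "Android": "luci-chromium-gpu-ci-xenial-8",  # Android is built on Linux.
    "Mac": "builderfull_gpu_ci_bots",
}

# CI testers for all OSes share one group of small Linux GCEs.
CI_TESTER_GROUP = "luci-chromium-gpu-ci-xenial-2"


def ci_group_for(role, test_os):
    """Return the gpu.star group for a CI bot of the given role and OS."""
    if role == "tester":
        return CI_TESTER_GROUP
    if role in ("builder", "builder/tester"):
        return CI_BUILDER_GROUPS[test_os]
    raise ValueError(f"unknown role: {role}")


print(ci_group_for("builder", "Android"))
print(ci_group_for("tester", "Windows"))
```
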
### How to add a new tester bot to the chromium.gpu.fyi waterfall

When deploying a new GPU configuration, it should be added to the
chromium.gpu.fyi waterfall first. The chromium.gpu waterfall should be reserved
for those GPUs which are tested on the commit queue. (Some of the bots, namely
the Debug bots, violate this rule, though we should strive to eliminate these
differences.) Once the new configuration is ready to be fully deployed on
tryservers, bots can be added to the chromium.gpu waterfall, and the tryservers
changed to mirror them.

In order to add Release and Debug waterfall bots for a new configuration,
experience has shown that at least 4 physical machines are needed in the
swarming pool. The reason is that the tests all run in parallel on the Swarming
cluster, so the load induced on the swarming bots is higher than it would be
if the tests were run strictly serially.

With these prerequisites, these are the steps to add a new (swarmed) tester bot.
(Actually, a pair of bots: Release and Debug. If deploying just one or the
other, ignore the other configuration.) These instructions assume that you are
reusing one of the existing builders, like
[`GPU FYI Win Builder`][GPU FYI Win Builder].

1.  Work with the Chrome Infrastructure Labs team to get the (minimum 4)
    physical machines added to the Swarming pool. Use
    [chromium-swarm.appspot.com] or `src/tools/luci-go/swarming bots`
    to determine the PCI IDs of the GPUs in the bots. (These instructions will
    need to be updated for Android bots which don't have PCI buses.)

    1.  Make sure to add these new machines to the chromium.tests.gpu Swarming
        pool by creating a CL against [`gpu.star`][gpu.star] in the
        [`infradata/config`][infradata/config] (Google internal) workspace.
        Git configure your user.email to @google.com if necessary. Here is one
        [example CL](https://chrome-internal-review.googlesource.com/913528)
        and a
        [second example](https://chrome-internal-review.googlesource.com/1111456).

    1.  Run [`main.star`][main.star] to regenerate
        `configs/chromium-swarm/bots.cfg`. Double-check your work there.

1.  Allocate new virtual machines for the bots as described in [How to set up
    new virtual machine
    instances](#How-to-set-up-new-virtual-machine-instances).

1.  Create a CL in the Chromium workspace which does the following. Here's an
    [example CL](https://chromium-review.googlesource.com/c/chromium/src/+/1752291).
    1.  Adds the new machines to [`waterfalls.pyl`][waterfalls.pyl] directly or
        to [`mixins.pyl`][mixins.pyl], referencing the new mixin in
        [`waterfalls.pyl`][waterfalls.pyl].
        1.  The swarming dimensions are crucial. These must match the GPU and
            OS type of the physical hardware in the Swarming pool. This is what
            causes the VMs to spawn their tests on the correct hardware. Make
            sure to use the chromium.tests.gpu pool, and that the new machines
            were specifically added to that pool.
        1.  Make triply sure that there are no collisions between the new
            hardware you're adding and hardware already in the Swarming pool.
            For example, it used to be the case that all of the Windows NVIDIA
            bots ran the same OS version. Later, the Windows 8 flavor bots were
            added. In order to avoid accidentally running tests on Windows 8
            when Windows 7 was intended, the OS in the swarming dimensions of
            the Win7 bots had to be changed from `win` to
            `Windows-2008ServerR2-SP1` (the Win7-like flavor running in our
            data center). Similarly, the Win8 bots had to have a very precise
            OS description (`Windows-2012ServerR2-SP0`).
        1.  If you're deploying a new bot that's similar to another existing
            configuration, please search around in
            [`test_suite_exceptions.pyl`][test_suite_exceptions.pyl] for
            references to the other bot's name and see if your new bot needs
            to be added to any exclusion lists. For example, some of the tests
            don't run on certain Win bots because of missing OpenGL extensions.
    1.  Run [`generate_buildbot_json.py`][generate_buildbot_json.py] to
        regenerate `src/testing/buildbot/chromium.gpu.fyi.json`.
    1.  Updates [`ci.star`][ci.star] and its related generated files
        [`cr-buildbucket.cfg`][cr-buildbucket.cfg],
        [`luci-scheduler.cfg`][luci-scheduler.cfg], and
        [`luci-milo.cfg`][luci-milo.cfg]:
        *   Use the appropriate definition for the type of the bot being added,
            for example, `ci.gpu_fyi_thin_tester()` should be used for all CI
            tester bots on the GPU FYI waterfall.
        *   Make sure to set the `triggered_by` property to the builder which
            triggers the testers (like `'GPU FYI Win Builder'`).
        *   Include a `ci.console_view_entry` for the builder's
            `console_view_entry` argument. Look at the short names and
            categories to try and come up with a reasonable organization.
    1.  Run `main.star` in [`src/infra/config`][src/infra/config] to update the
        generated files. Double-check your work there.
    1.  If you were adding a new builder, you would need to also add the new
        machine to [`src/tools/mb/mb_config.pyl`][mb_config.pyl].

1. After the Chromium-side CL lands it will take some time for all of
   the configuration changes to be picked up by the system. The bot
   will probably be in a red or purple state, claiming that it can't
   find its configuration. (It might also be in an "empty" state, not
   running any jobs at all.)

1. *After* the Chromium-side CL lands and the bot is on the console, create a CL
   in the [`tools/build`][tools/build] workspace which does the
   following. Here's an [example
   CL](https://chromium-review.googlesource.com/1041145).
    1. Adds the new bot to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] in
       `recipes/recipe_modules/chromium_tests/builders/`. Make sure to set the
       `serialize_tests` property to `True`. This is specified for waterfall
       bots, but not trybots, and helps avoid overloading the physical
       hardware. Double-check the `BUILD_CONFIG` and `parent_buildername`
       properties for each. They must match the Release/Debug flavor of the
       builder, like `GPU FYI Win x64 Builder` vs.
       `GPU FYI Win x64 Builder (dbg)`.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
       the newly-deployed waterfall bot, so it knows which JSON file to load
       out of `src/testing/buildbot` and which entry to look at.
    1. Sometimes it is necessary to retrain recipe expectations
       (`recipes/recipes.py test train`). This is usually needed only
       if the bot adds untested code flow in a recipe, but it's something
       to watch out for if your CL fails presubmit for some reason.

1. Note that it is crucial that the bot be deployed before hooking it up in the
   `tools/build` workspace. In the new LUCI world, if the parent builder can't
   find its child testers to trigger, that's a hard error on the parent. This
   will cause the builders to fail. You can and should prepare the `tools/build`
   CL in advance, but make sure it doesn't land until the bot's on the console.

1. If the number of physical machines for the new bot permits, you should also
   add a manually-triggered trybot at the same time that the CI bot is added.
   This is described in [How to add a new manually-triggered trybot].

While the above instructions assume that an existing parent builder will be
used, a new one can be set up by performing a modified version of the steps:

1. Make a [`tools/build`][tools/build] CL that adds the config for *only* the
   new builder and land it.
1. Make and land a Chromium CL that makes the above changes in addition to the
   following:
    1. Add the new builder to the necessary `//infra/config` files in the same
       way as the tester.
    1. Add the new builder to [`src/tools/mb/mb_config.pyl`][mb_config.pyl].
1. Make a [`tools/build`][tools/build] CL that adds the config for *only* the
   new tester and land it.

Attempting to set up the builder/tester pair without first landing the
[`tools/build`][tools/build] CL for the new builder will result in things
breaking as seen in [this bug][misconfigured builder bug].

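
Before sending the Chromium-side CL, the search of
`test_suite_exceptions.pyl` for a similar bot's name can be scripted. The
following is a minimal sketch and not part of the Chromium tooling. It assumes
the `.pyl` file is a single Python literal that `ast.literal_eval` can load,
and the test names, keys, and bot name in `SAMPLE_PYL` are hypothetical
stand-ins for the real file's contents. Each printed tuple is a key path at
which the bot name occurs, which shows where a similar new bot may also need to
be listed.

```python
import ast


def find_bot_references(node, bot_name, path=()):
    """Yield key paths where bot_name appears as a dict key or string value."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == bot_name:
                yield path + (key,)
            yield from find_bot_references(value, bot_name, path + (key,))
    elif isinstance(node, (list, tuple)):
        for index, item in enumerate(node):
            yield from find_bot_references(item, bot_name, path + (index,))
    elif node == bot_name:
        yield path


# Hypothetical file contents, for illustration only. In practice, read the
# text of test_suite_exceptions.pyl from the checkout instead.
SAMPLE_PYL = """
{
  'some_gl_test': {
    'remove_from': ['Existing Win Bot (NVIDIA)'],
  },
  'another_test': {
    'modifications': {
      'Existing Win Bot (NVIDIA)': {'args': ['--example-flag']},
    },
  },
}
"""

exceptions = ast.literal_eval(SAMPLE_PYL.strip())
references = list(find_bot_references(exceptions, 'Existing Win Bot (NVIDIA)'))
for reference in references:
    print(reference)
```
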
[How to add a new manually-triggered trybot]: https://chromium.googlesource.com/chromium/src/+/main/docs/gpu/gpu_testing_bot_details.md#How-to-add-a-new-manually_triggered-trybot

[ci.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/subprojects/ci.star
[chromium.gpu.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/consoles/chromium.gpu.star
[chromium.gpu.fyi.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/consoles/chromium.gpu.fyi.star
[cr-buildbucket.cfg]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/generated/cr-buildbucket.cfg
[luci-scheduler.cfg]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/generated/luci-scheduler.cfg
[luci-milo.cfg]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/generated/luci-milo.cfg
[GPU FYI Win Builder]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/GPU%20FYI%20Win%20Builder
[misconfigured builder bug]: https://bugs.chromium.org/p/chromium/issues/detail?id=1163657

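
When preparing the `tools/build` CL for a new tester, the Release/Debug pairing
between `BUILD_CONFIG` and `parent_buildername` can be sanity-checked with a
few lines of Python. This is a rough sketch rather than anything the recipes
provide: the helper name is hypothetical, and it assumes the naming convention
suggested by the example above, in which Debug builders end in `(dbg)` and
Release builders do not.

```python
def parent_matches_build_config(build_config, parent_buildername):
    """Return True when the tester's flavor agrees with its parent builder.

    Assumes Debug builders carry a '(dbg)' suffix and Release builders do not.
    """
    parent_is_debug = parent_buildername.endswith('(dbg)')
    return (build_config == 'Debug') == parent_is_debug


release_ok = parent_matches_build_config('Release', 'GPU FYI Win x64 Builder')
debug_ok = parent_matches_build_config('Debug', 'GPU FYI Win x64 Builder (dbg)')
mismatch = parent_matches_build_config('Debug', 'GPU FYI Win x64 Builder')
print(release_ok, debug_ok, mismatch)
```

A `False` result, as in the last call, points at a Debug tester wired to a
Release builder (or the reverse), which is the mistake the double-check is
meant to catch.
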
### How to start running tests on a new GPU type on an existing try bot

Let's say that you want to cause the `win10_chromium_x64_rel_ng` try bot to run
tests on CoolNewGPUType in addition to the types it currently runs (as of this
writing only NVIDIA). To do this:

1. Make sure there is enough hardware capacity using the available tools to
   report utilization of the Swarming pool.
1. Deploy Release and Debug testers on the `chromium.gpu` waterfall, following
   the instructions for the `chromium.gpu.fyi` waterfall above. Make sure
   the flakiness on the new bots is comparable to existing `chromium.gpu` bots
   before proceeding.
1. Create a CL in the [`tools/build`][tools/build] workspace, adding the new
   Release tester to `win10_chromium_x64_rel_ng`'s `bot_ids` list
   in `recipes/recipe_modules/chromium_tests/trybots.py`. Rerun
   `recipes/recipes.py test train`.
1. Once the above CL lands, the commit queue will **immediately** start
   running tests on the CoolNewGPUType configuration. Be vigilant and make
   sure that tryjobs are green. If they are red for any reason, revert the CL
   and figure out offline what went wrong.

### How to add a new manually-triggered trybot

Manually-triggered trybots are needed for investigating failures on a GPU type
which doesn't have a corresponding CQ trybot (due to lack of GPU resources).
Even for GPU types that have CQ trybots, it is convenient to have
manually-triggered trybots as well, since the CQ trybot often runs on more than
one GPU type, or some test suites which run on the CI bot can be disabled on
the CQ trybot (when the CQ bot mirrors a
[fake bot](https://chromium.googlesource.com/chromium/src/+/main/docs/gpu/gpu_testing_bot_details.md#how-to-add-a-new-try-bot-that-runs-a-subset-of-tests-or-extra-tests)).
Thus, all CI bots in `chromium.gpu` and `chromium.gpu.fyi` have corresponding
manually-triggered trybots, except a few which don't have enough hardware
to support it. A manually-triggered trybot should be added at the same time
a CI bot is added.

Here are the steps to set up a new trybot which runs tests just on one
particular GPU type. Let's consider that we are adding a manually-triggered
trybot for the Win7 NVIDIA GPUs in Release mode. We will call the new bot
`gpu-fyi-try-win7-nvidia-rel-64`.

1. If there already exists some manually-triggered trybot which runs tests on
   the same group of machines (i.e. same GPU, OS and driver), the new trybot
   will have to share the VMs with it. Otherwise, create a new pool of VMs for
   the new hardware and allocate the VMs as described in
   [How to set up new virtual machine instances](#How-to-set-up-new-virtual-machine-instances),
   following the "Manually-triggered GPU trybots" instructions.

1. Create a CL in the Chromium workspace which does the following. Here's a
   [reference CL](https://chromium-review.googlesource.com/c/chromium/src/+/2191276)
   exemplifying the new "GCE pool per GPU hardware pool" way.
    1. Updates [`gpu.try.star`][gpu.try.star] and its related generated file
       [`cr-buildbucket.cfg`][cr-buildbucket.cfg]:
        * Add the new trybot with the right `builder` definition and VMs pool.
          For `gpu-fyi-try-win7-nvidia-rel-64` this would be
          `gpu_win_builder()` and `luci.chromium.gpu.win7.nvidia.try`.
    1. Run `main.star` in [`src/infra/config`][src/infra/config] to update the
       generated files. Double-check your work there.
    1. Adds the new trybot to [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
       and [`src/tools/mb/mb_config_buckets.pyl`][mb_config_buckets.pyl].
       Use the same mixin as the builder for the CI bot this trybot
       mirrors. In the case of `gpu-fyi-try-win7-nvidia-rel-64` this is
       `GPU FYI Win x64 Builder` and thus `gpu_fyi_tests_release_trybot`.
    1. Get this CL reviewed and landed.

1. Create a CL in the [`tools/build`][tools/build] workspace which does the
   following. Here's an [example
   CL](https://chromium-review.googlesource.com/c/chromium/tools/build/+/1979113).

    1. Adds the new trybot to a "Manually-triggered GPU trybots" section in
       `recipes/recipe_modules/chromium_tests/tests/trybots.py`. Create this
       section after the "Optional GPU bots" section for the appropriate
       tryserver (`tryserver.chromium.win`, `tryserver.chromium.mac`,
       `tryserver.chromium.linux`, `tryserver.chromium.android`). Have the bot
       mirror the appropriate waterfall bot; in this case, the buildername to
       mirror is `GPU FYI Win x64 Builder` and the tester is
       `Win7 FYI x64 Release (NVIDIA)`.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
       the newly-deployed trybot, so it knows which JSON file to load out of
       `src/testing/buildbot` and which entry to look at to understand which
       tests to run and on what physical hardware.
    1. It may be necessary to retrain recipe expectations for
       [`tools/build`][tools/build] workspace CLs
       (`recipes/recipes.py test train`). This shouldn't be necessary
       for just adding a manually triggered trybot, but it's something to
       watch out for if your CL fails presubmit for some reason.

At this point the new trybot should automatically show up in the
"Choose tryjobs" pop-up in the Gerrit UI, under the
`luci.chromium.try` heading, because it was deployed via LUCI. It
should be possible to send a CL to it.

(It should not be necessary to modify buildbucket.config as is
mentioned at the bottom of the "Choose tryjobs" pop-up. Contact the
chrome-infra team if this doesn't work as expected.)

[gpu.try.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/subprojects/gpu.try.star
[luci.chromium.try.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/consoles/luci.chromium.try.star
[tryserver.chromium.win.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/consoles/tryserver.chromium.win.star


### How to add a new try bot that runs a subset of tests or extra tests

Several projects (ANGLE, Dawn) run custom tests using the Chromium recipes. They
use try bot configs that run subsets of Chromium's tests or additional slower
tests that can't be run on the main CQ.

These try bots are a little different because they mirror waterfall bots that
don't actually exist. The waterfall bots' specifications exist only to tell
these try bots which tests to run.

Let's say that you intended to add a new such custom try bot on Windows. Call it
`win-myproject-rel` for example. You will need to add a "fake" mirror bot for
each GPU family on which you want to run the tests. For a GPU type of
"CoolNewGPUType" in this example you could add a "fake" bot named "MyProject GPU
Win10 Release (CoolNewGPUType)".

1. Allocate new virtual machines for the bots as described in
   [How to set up new virtual machine instances](#How-to-set-up-new-virtual-machine-instances).
1. Make sure there is enough hardware capacity using the available tools to
   report utilization of the Swarming pool.
1. Create a CL in the Chromium workspace that does the following. Here's an
   outdated [example CL](https://crrev.com/c/1554296).
    1. Add your new bot (for example, "MyProject GPU Win10 Release
       (CoolNewGPUType)") to the chromium.gpu.fyi waterfall in
       [`waterfalls.pyl`][waterfalls.pyl].
    1. Add your new bot to
       [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py]
       in the list in the `get_bots_that_do_not_actually_exist` section.
    1. Re-run
       [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py]
       to regenerate the JSON files.
    1. Update [`scheduler-noop-jobs.star`][scheduler-noop-jobs.star] to
       include "MyProject GPU Win10 Release (CoolNewGPUType)".
    1. Update [`try.star`][try.star] and desired consoles to include
       `win-myproject-rel`.
    1. Run `main.star` in [`src/infra/config`][src/infra/config] to update the
       generated files: [`luci-milo.cfg`][luci-milo.cfg],
       [`luci-scheduler.cfg`][luci-scheduler.cfg],
       [`cr-buildbucket.cfg`][cr-buildbucket.cfg]. Double-check your work
       there.
    1. Update [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
       to include `win-myproject-rel`.
1. *After* the Chromium-side CL lands and the bot is on the console, create a CL
   in the [`tools/build`][tools/build] workspace which does the
   following. Here's an [example CL](https://crrev.com/c/1554272).
    1. Adds "MyProject GPU Win10 Release
       (CoolNewGPUType)" to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] in
       `recipes/recipe_modules/chromium_tests/builders/`. You can copy a similar
       step.
    1. Adds `win-myproject-rel` to [`trybots.py`][trybots.py] in the same folder.
       This is where you associate "MyProject GPU Win10 Release
       (CoolNewGPUType)" with `win-myproject-rel`. See the sample CL for an example.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
       the newly-deployed waterfall bot, so it knows which JSON file to load
       out of `src/testing/buildbot` and which entry to look at.
1. After your CLs land you should be able to find and run `win-myproject-rel` on CLs
   using Choose Trybots in Gerrit.

[scheduler-noop-jobs.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/generators/scheduler-noop-jobs.star
[try.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/subprojects/try.star


### How to test and deploy a driver and/or OS update

Let's say that you want to roll out an update to the graphics drivers or the OS
on one of the configurations like the Linux NVIDIA bots. In order to verify
that the new driver or OS won't destabilize Chromium's commit queue,
it's necessary to run the new driver or OS on one of the waterfalls for a day
or two to make sure the tests are reliably green before rolling out the driver
or OS update. To do this:

1. Make sure that all of the current Swarming jobs for this OS and GPU
   configuration are targeted at the "stable" version of the driver and the OS
   in [`waterfalls.pyl`][waterfalls.pyl] and [`mixins.pyl`][mixins.pyl].
1. File a `Build Infrastructure` bug, component `Infra>Labs`, to have ~4 of
   the physical machines already in the Swarming pool upgraded to the new
   version of the driver or the OS.
1. If an "experimental" version of this bot doesn't yet exist, follow the
   instructions above for [How to add a new tester bot to the chromium.gpu.fyi
   waterfall](#How-to-add-a-new-tester-bot-to-the-chromium_gpu_fyi-waterfall)
   to deploy one.
1. Have this experimental bot target the new version of the driver or the OS
   in [`waterfalls.pyl`][waterfalls.pyl] and [`mixins.pyl`][mixins.pyl].
   [Sample CL][sample driver cl].
1. Hopefully, the new machine will pass the pixel tests. If it doesn't, then
   it'll be necessary to follow the instructions on
   [updating Gold baselines (step #4)][updating gold baselines].
1. Watch the new machine for a day or two to make sure it's stable.
1. When it is, add the experimental driver/OS to the `_stable` mixin using the
   swarming OR operator `|`. For example:

    ```
    'win10_intel_hd_630_stable': {
      'swarming': {
        'dimensions': {
          'gpu': '8086:5912-26.20.100.7870|8086:5912-26.20.100.8141',
          'os': 'Windows-10',
          'pool': 'chromium.tests.gpu',
        },
      },
    }
    ```

    This will cause tests triggered using the `_stable` mixin to run on either
    the old stable dimension or the experimental/new stable dimension.

    **NOTE** There is a hard cap of 8 combinations in swarming, so you can only
    use the OR operator in up to 3 dimensions if each dimension only has two
    options. More than two options per dimension is allowed as long as the total
    number of combinations is 8 or less.
1. After it lands, ask the Chrome Infrastructure Labs team to roll out the
   driver update across all of the similarly configured bots in the swarming
   pool.
1. If necessary, update pixel test expectations and remove the suppressions
   added above.
1. Remove the old driver or OS version from the `_stable` mixin, leaving just
   the new stable version.

Note that we leave the experimental bot in place. We could reclaim it, but it
seems worthwhile to continuously test the "next" version of graphics drivers as
well as the current stable ones.

[sample driver cl]: https://chromium-review.googlesource.com/c/chromium/src/+/1726875
[updating gold baselines]: http://go/gpu-pixel-wrangler-info#how-to-keep-the-bots-green

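
The 8-combination cap on OR'd swarming dimensions, mentioned in the `_stable`
mixin step of this section, is easy to check before sending a CL. The snippet
below is an illustrative sketch only: the helper name and the constant are made
up for this example, and it assumes every dimension value is a plain string
whose alternatives are separated by `|`, as in the mixin shown in this section.

```python
from math import prod

# Hard cap on OR'd dimension combinations, as stated in the note above.
SWARMING_MAX_COMBINATIONS = 8


def count_dimension_combinations(dimensions):
    """Multiply the number of `|`-separated options across all dimensions."""
    return prod(len(value.split('|')) for value in dimensions.values())


# Dimensions copied from the `win10_intel_hd_630_stable` example.
stable_dimensions = {
    'gpu': '8086:5912-26.20.100.7870|8086:5912-26.20.100.8141',
    'os': 'Windows-10',
    'pool': 'chromium.tests.gpu',
}

combinations = count_dimension_combinations(stable_dimensions)
within_cap = combinations <= SWARMING_MAX_COMBINATIONS
print(combinations, within_cap)  # prints "2 True"
```

With two options in each of three dimensions the product is exactly 8, which is
still allowed. A fourth two-option dimension would give 16 and exceed the cap.
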
## Credentials for various servers

Working with the GPU bots requires credentials to various services: the isolate
server, the swarming server, and cloud storage.

### Isolate server credentials

To upload and download isolates you must first authenticate to the isolate
server. From a Chromium checkout, run:

* `./src/tools/luci-go/isolate login`

This will open a web browser to complete the authentication flow. A @google.com
email address is required in order to properly authenticate.

To test your authentication, find a hash for a recent isolate. Consult the
instructions on [Running Binaries from the Bots Locally] to find a random hash
from a target like `gl_tests`. Then run the following:

[Running Binaries from the Bots Locally]: https://www.chromium.org/developers/testing/gpu-testing#TOC-Running-Binaries-from-the-Bots-Locally

### Swarming server credentials

The swarming server uses the same `auth.py` script as the isolate server. You
will need to authenticate if you want to manually download the results of
previous swarming jobs, trigger your own jobs, or run `swarming.py reproduce`
to re-run a remote job on your local workstation. Follow the instructions
above, replacing the service with `https://chromium-swarm.appspot.com`.

### Cloud storage credentials

Authentication to Google Cloud Storage is needed for a couple of reasons:
uploading pixel test results to the cloud, and potentially uploading and
downloading builds as well, at least in Debug mode. Use the copy of gsutil in
`depot_tools/third_party/gsutil/gsutil`, and follow the [Google Cloud Storage
instructions] to authenticate. You must use your @google.com email address and
be a member of the Chrome GPU team in order to receive read-write access to the
appropriate cloud storage buckets. Roughly:

1. Run `gsutil config`
2. Copy and paste the URL into your browser
3. Log in with your @google.com account
4. Allow the app to access the information it requests
5. Copy and paste the resulting key back into your terminal
6. Press "enter" when prompted for a project-id (i.e., leave it empty)

At this point you should be able to write to the cloud storage bucket.

Navigate to
<https://console.developers.google.com/storage/chromium-gpu-archive> to view
the contents of the cloud storage bucket.

[Google Cloud Storage instructions]: https://developers.google.com/storage/docs/gsutil