# GPU Bot Details

This page describes in detail how the GPU bots are set up, which files affect
their configuration, and how to both modify their behavior and add new bots.

[TOC]

## Overview of the GPU bots' setup

Chromium's GPU bots, unlike the majority of the project's test machines,
are physical pieces of hardware. When end users run the Chrome browser, they
are almost surely running it on a physical piece of hardware with a real
graphics processor. There are some portions of the code base which simply
cannot be exercised by running the browser in a virtual machine, or on a
software implementation of the underlying graphics libraries. The GPU bots were
developed and deployed in order to cover these code paths, and avoid
regressions that are otherwise inevitable in a project the size of the Chromium
browser.

The GPU bots are used on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls, and various tryservers, as described in [Using the GPU Bots].

[chromium.gpu]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console
[chromium.gpu.fyi]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console
[Using the GPU Bots]: gpu_testing.md#Using-the-GPU-Bots

All of the physical hardware for the bots lives in the Swarming pool, and most
of it in the chromium.tests.gpu Swarming pool. The waterfall bots are simply
virtual machines which spawn Swarming tasks with the appropriate tags to get
them to run on the desired GPU and operating system type. So, for example, the
[Win10 x64 Release (NVIDIA)] bot is actually a virtual machine which spawns all
of its jobs with the Swarming parameters:

[Win10 x64 Release (NVIDIA)]: https://ci.chromium.org/p/chromium/builders/ci/Win10%20x64%20Release%20%28NVIDIA%29

```json
{
    "gpu": "nvidia-quadro-p400-win10-stable",
    "os": "Windows-10",
    "pool": "chromium.tests.gpu"
}
```

Since the GPUs in the Swarming pool are mostly homogeneous, this is sufficient
to target the pool of Windows 10-like NVIDIA machines. (There are a few Windows
7-like NVIDIA bots in the pool, which necessitates the OS specifier.)

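As a rough illustration of why these three parameters are enough, the sketch
below models dimension matching: a task can run on a bot only if every
requested key/value pair appears among the values the bot advertises. The bot
records and the `matches` helper are hypothetical and written only for this
example; they are not the Swarming API, and the second bot's `gpu` and `os`
strings are invented.

```python
# Hypothetical model of Swarming dimension matching (not the real Swarming API).
# A bot advertises a set of values per dimension; a task requests one value
# per dimension and can run on any bot that advertises all of them.

TASK_DIMENSIONS = {
    "gpu": "nvidia-quadro-p400-win10-stable",
    "os": "Windows-10",
    "pool": "chromium.tests.gpu",
}

# Invented bot records, for illustration only.
BOTS = {
    "bot-a": {
        "gpu": {"nvidia-quadro-p400-win10-stable"},
        "os": {"Windows", "Windows-10"},
        "pool": {"chromium.tests.gpu"},
    },
    "bot-b": {
        "gpu": {"example-nvidia-gpu-win7"},  # hypothetical value
        "os": {"Windows", "Windows-7"},      # hypothetical value
        "pool": {"chromium.tests.gpu"},
    },
}


def matches(bot_dimensions, task_dimensions):
    """Return True if the bot advertises every requested key/value pair."""
    return all(
        value in bot_dimensions.get(key, set())
        for key, value in task_dimensions.items()
    )


eligible = sorted(
    name for name, dims in BOTS.items() if matches(dims, TASK_DIMENSIONS)
)
print(eligible)
```

In this model, dropping the `os` and `gpu` entries from the request would make
both bots eligible, which is the situation the OS specifier is meant to avoid.
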
Details about the bots can be found on [chromium-swarm.appspot.com] and by
using `src/tools/swarming_client/swarming.py`, for example `swarming.py bots`.
If you are authenticated with @google.com credentials you will be able to make
queries of the bots and see, for example, which GPUs are available.

[chromium-swarm.appspot.com]: https://chromium-swarm.appspot.com/

The waterfall bots run tests on a single GPU type in order to make it easier to
see regressions or flakiness that affect only a certain type of GPU.
'Mac FYI GPU ASAN Release' is an exception, running on both Intel and AMD GPUs.

The tryservers which include GPU tests, like `win10_chromium_x64_rel_ng`, on
the other hand, run tests on more than one GPU type. As of this writing, the
Windows tryservers run tests on NVIDIA and AMD GPUs; the Mac tryservers run
tests on Intel and NVIDIA GPUs. The way these tryservers' tests are specified is
simply by *mirroring* how one or more waterfall bots work. This is an inherent
property of the [`chromium_trybot` recipe][chromium_trybot.py], which was
designed to eliminate differences in behavior between the tryservers and
waterfall bots. Since the tryservers mirror waterfall bots, if the waterfall bot
is working, the tryserver must almost inherently be working as well.

[chromium_trybot.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/recipes/recipes/chromium_trybot.py

There are some GPU configurations on the waterfall backed by only one machine,
or a very small number of machines in the Swarming pool. A few examples are:

<!-- XXX: update this list -->
*   [Mac Pro Release (AMD)](https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/Mac%20Pro%20FYI%20Release%20%28AMD%29)
*   [Linux Release (AMD R7 240)](https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/)

There are a couple of reasons to continue to support running tests on a
specific machine: it might be too expensive to deploy the required multiple
copies of said hardware, or the configuration might not be reliable enough to
begin scaling it up.

## Adding a new isolated test to the bots

Adding a new test step to the bots requires that the test run via an isolate.
Isolates describe both the binary and data dependencies of an executable, and
are the underpinning of how the Swarming system works. See the [LUCI]
documentation for background on [Isolates] and [Swarming].

[LUCI]: https://github.com/luci/luci-py
[Isolates]: https://github.com/luci/luci-py/blob/master/appengine/isolate/doc/README.md
[Swarming]: https://github.com/luci/luci-py/blob/master/appengine/swarming/doc/README.md

### Adding a new isolate

1.  Define your target using the `template("test")` template in
    [`src/testing/test.gni`][testing/test.gni]. See `test("gl_tests")` in
    [`src/gpu/BUILD.gn`][gpu/BUILD.gn] for an example. For a more complex
    example which invokes a series of scripts which finally launches the
    browser, see `telemetry_gpu_integration_test` in
    [`chrome/test/BUILD.gn`][chrome/test/BUILD.gn].
2.  Add an entry to
    [`src/testing/buildbot/gn_isolate_map.pyl`][gn_isolate_map.pyl] that refers
    to your target. Find a similar target to yours in order to determine the
    `type`. The type is referenced in [`src/tools/mb/mb.py`][mb.py].

[testing/test.gni]: https://chromium.googlesource.com/chromium/src/+/master/testing/test.gni
[gpu/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/master/gpu/BUILD.gn
[chrome/test/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/master/chrome/test/BUILD.gn
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb.py]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb.py

At this point you can build and upload your isolate to the isolate server.

See [Isolated Testing for SWEs] for the most up-to-date instructions. The
instructions below are a copy that shows how to run an isolate that has been
uploaded to the isolate server on your local machine rather than on Swarming.

[Isolated Testing for SWEs]: https://www.chromium.org/developers/testing/isolated-testing/for-swes

If `cd`'d into `src/`:

1.  `./tools/mb/mb.py isolate //out/Release [target name]`
    *   For example: `./tools/mb/mb.py isolate //out/Release angle_end2end_tests`
1.  `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/[target name].isolated.gen.json`
    *   For example: `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/angle_end2end_tests.isolated.gen.json`
1.  This will write a hash to stdout. You can run it via:
    `python tools/swarming_client/run_isolated.py -I https://isolateserver.appspot.com -s [HASH] -- [any additional args for the isolate]`

See the section below on [isolate server credentials](#Isolate-server-credentials).

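If you run these steps often, the three command lines above can be assembled
programmatically. The following is only a sketch: the helper names are
hypothetical, it merely builds and prints the argument lists using the paths
and flags quoted above, and it does not invoke the tools or verify that they
exist in your checkout.

```python
import shlex

ISOLATE_SERVER = "https://isolateserver.appspot.com"


def isolate_commands(target, out_dir="out/Release"):
    """Return the mb.py and batcharchive command lines for a target."""
    return [
        ["./tools/mb/mb.py", "isolate", "//" + out_dir, target],
        ["python", "tools/swarming_client/isolate.py", "batcharchive",
         "-I", ISOLATE_SERVER,
         "%s/%s.isolated.gen.json" % (out_dir, target)],
    ]


def run_isolated_command(isolated_hash, extra_args=()):
    """Return the run_isolated.py command line for an uploaded isolate."""
    return ["python", "tools/swarming_client/run_isolated.py",
            "-I", ISOLATE_SERVER, "-s", isolated_hash, "--"] + list(extra_args)


# Print the commands in a copy-and-paste friendly, shell-quoted form.
for cmd in isolate_commands("angle_end2end_tests"):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The hash passed to `run_isolated_command` is the one the `batcharchive` step
writes to stdout; it is left as a parameter here rather than parsed.
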
### Adding your new isolate to the tests that are run on the bots

See [Adding new steps to the GPU bots] for details on this process.

[Adding new steps to the GPU bots]: gpu_testing.md#Adding-new-steps-to-the-GPU-Bots

## Relevant files that control the operation of the GPU bots

In the [`tools/build`][tools/build] workspace:

*   `recipes/recipe_modules/chromium_tests/`:
    *   [`chromium_gpu.py`][chromium_gpu.py] and
        [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] define the following for
        each builder and tester:
        *   How the workspace is checked out (e.g., this is where top-of-tree
            ANGLE is specified)
        *   The build configuration (e.g., this is where 32-bit vs. 64-bit is
            specified)
        *   Various gclient defines (like compiling in the hardware-accelerated
            video codecs, and enabling compilation of certain tests, like the
            dEQP tests, that can't be built on all of the Chromium builders)
        *   Note that the GN configuration of the bots is also controlled by
            [`mb_config.pyl`][mb_config.pyl] in the Chromium workspace; see below.
    *   [`trybots.py`][trybots.py] defines how try bots *mirror* one or more
        waterfall bots.
        *   The concept of try bots mirroring waterfall bots ensures there are
            no differences in behavior between the waterfall bots and the try
            bots. This helps ensure that a CL will not pass the commit queue
            and then break on the waterfall.
        *   This file defines the behavior of the following GPU-related try
            bots:
            *   `linux-rel`, `mac-rel`, `win10_chromium_x64_rel_ng` and
                `android-marshmallow-arm64-rel`, which run against every
                Chromium CL, and which mirror the behavior of bots on the
                chromium.gpu waterfall.
            *   The ANGLE try bots, which run against ANGLE CLs, and mirror the
                behavior of the chromium.gpu.fyi waterfall (including using
                top-of-tree ANGLE, and running additional tests not run by the
                regular Chromium try bots)
            *   The optional GPU try servers `linux_optional_gpu_tests_rel`,
                `mac_optional_gpu_tests_rel`, `win_optional_gpu_tests_rel` and
                `android_optional_gpu_tests_rel`, which are added automatically
                to CLs which modify a selected set of subdirectories and
                run some tests which can't be run on the regular Chromium try
                servers mainly due to lack of hardware capacity.
            *   Manual GPU trybots, starting with `gpu-try-` and `gpu-fyi-try-`
                prefixes, which can be added manually to CLs targeting a
                specific hardware configuration.

[tools/build]: https://chromium.googlesource.com/chromium/tools/build/
[chromium_gpu.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/recipes/recipe_modules/chromium_tests/builders/chromium_gpu.py
[chromium_gpu_fyi.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/recipes/recipe_modules/chromium_tests/builders/chromium_gpu_fyi.py
[trybots.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/recipes/recipe_modules/chromium_tests/trybots.py

In the [`chromium/src`][chromium/src] workspace:

*   [`src/testing/buildbot`][src/testing/buildbot]:
    *   [`chromium.gpu.json`][chromium.gpu.json] and
        [`chromium.gpu.fyi.json`][chromium.gpu.fyi.json] define which steps are
        run on which bots. These files are autogenerated. Don't modify them
        directly!
    *   [`waterfalls.pyl`][waterfalls.pyl],
        [`test_suites.pyl`][test_suites.pyl], [`mixins.pyl`][mixins.pyl] and
        [`test_suite_exceptions.pyl`][test_suite_exceptions.pyl] define the
        configuration for the autogenerated JSON files above.
        Run [`generate_buildbot_json.py`][generate_buildbot_json.py] to
        generate the JSON files after you modify these pyl files.
    *   [`generate_buildbot_json.py`][generate_buildbot_json.py]
        *   The generator script for all the waterfalls, including
            `chromium.gpu.json` and `chromium.gpu.fyi.json`.
        *   See the [README for generate_buildbot_json.py] for documentation
            on this script and the descriptions of the waterfalls and test
            suites.
        *   When modifying this script, don't forget to also run it, to
            regenerate the JSON files. Don't worry; the presubmit step will
            catch this if you forget.
        *   See [Adding new steps to the GPU bots] for more details.
    *   [`gn_isolate_map.pyl`][gn_isolate_map.pyl] defines all of the isolates'
        behavior in the GN build.
*   [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
    *   Defines the GN arguments for all of the bots.
*   [`src/infra/config`][src/infra/config]:
    *   Definitions of how bots are organized on the waterfall,
        how builds are triggered, and which VMs or machines are used for the
        builder itself, i.e. for compilation and scheduling swarmed tasks
        on GPU hardware. See
        [README.md](https://chromium.googlesource.com/chromium/src/+/master/infra/config/README.md)
        in this directory for up-to-date information.

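The `.pyl` files hold a single Python literal, so one plausible way to inspect
them from a script is `ast.literal_eval` from the standard library. The sketch
below parses an inline, made-up fragment rather than a real file; the keys
shown (`name`, `machines`, `mixins`, `test_suites`) and all values are
assumptions for illustration and may not match the schema described in the
README.

```python
import ast

# Made-up fragment in the style of a .pyl file; keys and values are illustrative.
EXAMPLE_PYL = """[
  {
    'name': 'example.waterfall',
    'machines': {
      'Example Tester': {
        'mixins': ['example_gpu_mixin'],
        'test_suites': {'gpu_telemetry_tests': 'example_suite'},
      },
    },
  },
]
"""


def load_pyl(text):
    """Parse .pyl-style text (a single Python literal) without executing code."""
    return ast.literal_eval(text)


waterfalls = load_pyl(EXAMPLE_PYL)
for waterfall in waterfalls:
    for machine, config in sorted(waterfall['machines'].items()):
        print(waterfall['name'], machine, config['mixins'])
```

A read-only script like this can help when searching for a bot's mixins, but
edits should still go through the `.pyl` files and the generator script.
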
[chromium/src]: https://chromium.googlesource.com/chromium/src/
[src/testing/buildbot]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot
[src/infra/config]: https://chromium.googlesource.com/chromium/src/+/master/infra/config
[chromium.gpu.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json
[chromium.gpu.fyi.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
[generate_buildbot_json.py]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/generate_buildbot_json.py
[mixins.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/mixins.pyl
[waterfalls.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/waterfalls.pyl
[test_suites.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/test_suites.pyl
[test_suite_exceptions.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/test_suite_exceptions.pyl
[README for generate_buildbot_json.py]: ../../testing/buildbot/README.md

In the [`infradata/config`][infradata/config] workspace (Google internal only,
sorry):

*   [`gpu.star`][gpu.star]
    *   Defines a `chromium.tests.gpu` Swarming pool which contains all of the
        specialized hardware, except some hardware shared with Chromium:
        for example, the Windows and Linux NVIDIA
        bots, the Windows AMD bots, and the MacBook Pros with NVIDIA and AMD
        GPUs. New GPU hardware should be added to this pool.
    *   Also defines the GCEs, Mac VMs and Mac machines used for CI builders
        on GPU and GPU.FYI waterfalls and trybots.
*   [`pools.cfg`][pools.cfg]
    *   Defines the Swarming pools for GCEs and Mac VMs used for manually
        triggered trybots.

[infradata/config]: https://chrome-internal.googlesource.com/infradata/config
[gpu.star]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/starlark/bots/chromium/gpu.star
[chromium.star]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/starlark/bots/chromium/chromium.star
[pools.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/pools.cfg
[main.star]: https://chrome-internal.googlesource.com/infradata/config/+/master/main.star
[vms.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/gce-provider/vms.cfg

## Walkthroughs of various maintenance scenarios

This section describes various common scenarios that might arise when
maintaining the GPU bots, and how they'd be addressed.

### How to add a new test or an entire new step to the bots

This is described in [Adding new tests to the GPU bots].

[Adding new tests to the GPU bots]: https://chromium.googlesource.com/chromium/src/+/master/docs/gpu/gpu_testing.md#Adding-New-Tests-to-the-GPU-Bots

### How to set up new virtual machine instances

The tests use virtual machines to build binaries and to trigger tests on
physical hardware. VMs don't run any tests themselves. There are 3 types of
bots:

*   Builders - these bots build test binaries, upload them to storage and
    trigger tester bots (see below). Builds must be done on the same OS on
    which the tests will run, except for Android tests, which are built on
    Linux.
*   Testers - these bots trigger tests to execute in Swarming and merge results
    from multiple shards. 2-core Linux GCEs are sufficient for this task.
*   Builder/testers - these are the combination of the above and have the same
    OS constraints as builders. All trybots are of this type, while for CI bots
    it is optional.

The process is:

1.  Follow [go/request-chrome-resources](go/request-chrome-resources) to get
    approval for the VMs. Use the `GPU` project resource group.
    See this [example ticket](http://crbug.com/1012805).
    You'll need to determine how many VMs are required, which OSes, how many
    cores and in which swarming pools they will be (see below for different
    scenarios).
    *   If setting up a new GPU hardware pool, some VMs will also be needed
        for manual trybots, usually 2 VMs as of this writing.
    *   Additional action is needed for Mac VMs: the GPU resource owner will
        assign the bug to Labs to deploy them. See this
        [example ticket](http://crbug.com/964355).
1.  Once the GCE resource request is approved / Mac VMs are deployed, the VMs
    need to be added to the right Swarming pools in a CL in the
    [`infradata/config`][infradata/config] (Google internal) workspace.
    1.  GCEs for Windows CI builders and builder/testers should be added to the
        `luci-chromium-gpu-ci-win10-8` group in [`gpu.star`][gpu.star].
    1.  GCEs for Linux and Android CI builders and builder/testers should be
        added to the `luci-chromium-gpu-ci-xenial-8` group in
        [`gpu.star`][gpu.star].
    1.  VMs for Mac CI builders and builder/testers should be added to the
        `builderfull_gpu_ci_bots` group in [`gpu.star`][gpu.star].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1166889).
    1.  GCEs for CI testers for all OSes should be added to the
        `luci-chromium-gpu-ci-xenial-2` group in [`gpu.star`][gpu.star].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/2016410).
    1.  GCEs and VMs for CQ and optional CQ GPU trybots should be added to
        a corresponding `gpu_try_bots` group in [`gpu.star`][gpu.star].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1561384).
        These trybots are "builderful", i.e. these GCEs can't be shared among
        different bots. This is done in order to limit the number of concurrent
        builds on these bots (until [crbug.com/949379](crbug.com/949379) is
        fixed) to prevent oversubscribing GPU hardware.
        `win_optional_gpu_tests_rel` is an exception: its GCEs come from
        `luci-chromium-try-win10-*-8` groups in
        [`chromium.star`][chromium.star], see
        [CL](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1708723).
        This can cause oversubscription of Windows GPU hardware; however,
        Chrome Infra insisted on making this bot builderless due to frequent
        interruptions they get from limiting the number of concurrent builds on
        it, see discussion in
        [CL](https://chromium-review.googlesource.com/c/chromium/src/+/1775098).
    1.  GCEs and VMs for manual GPU trybots should be added to a corresponding
        pool in "Manually-triggered GPU trybots" in [`gpu.star`][gpu.star].
        If adding a new pool, it should also be added to
        [`pools.cfg`][pools.cfg].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/2433332).
        This is a different mechanism to limit the load on GPU hardware:
        a small pool of GCEs corresponds to some GPU hardware resource, and
        all trybots that target this GPU hardware compete for GCEs from this
        small pool.
    1.  Run [`main.star`][main.star] to regenerate
        `configs/chromium-swarm/bots.cfg` and `configs/gce-provider/vms.cfg`.
        Double-check your work there.
        Note that previously [`vms.cfg`][vms.cfg] had to be edited manually.
        Part of the difficulty was in choosing a zone. This should soon no
        longer be necessary per [crbug.com/942301](http://crbug.com/942301),
        but consult with the Chrome Infra team to find out which of the
        [zones](https://cloud.google.com/compute/docs/regions-zones/) has
        available capacity. This can also be checked on the Viceroy
        [dashboard](https://viceroy.corp.google.com/chrome_infra/Quota/chrome?duration=7d).
    1.  Get this reviewed and landed. This step associates the VM or pool of VMs
        with the bot's name on the waterfall for "builderful" bots or increases
        swarmed pool capacity for "builderless" bots.
        Note: CR+1 is not sticky in this repo, so you'll have to ping for
        re-review after every change, such as a rebase.

### How to add a new tester bot to the chromium.gpu.fyi waterfall

When deploying a new GPU configuration, it should be added to the
chromium.gpu.fyi waterfall first. The chromium.gpu waterfall should be reserved
for those GPUs which are tested on the commit queue. (Some of the bots, namely
the Debug bots, violate this rule, though we should strive to eliminate these
differences.) Once the new configuration is ready to be fully deployed on
tryservers, bots can be added to the chromium.gpu waterfall, and the tryservers
changed to mirror them.

In order to add Release and Debug waterfall bots for a new configuration,
experience has shown that at least 4 physical machines are needed in the
swarming pool. The reason is that the tests all run in parallel on the Swarming
cluster, so the load induced on the swarming bots is higher than it would be
if the tests were run strictly serially.

With these prerequisites, these are the steps to add a new (swarmed) tester bot.
(Actually, a pair of bots: Release and Debug. If deploying just one or the
other, ignore the other configuration.) These instructions assume that you are
reusing one of the existing builders, like [`GPU FYI Win Builder`][GPU FYI Win
Builder].

1.  Work with the Chrome Infrastructure Labs team to get the (minimum 4)
    physical machines added to the Swarming pool. Use
    [chromium-swarm.appspot.com] or `src/tools/swarming_client/swarming.py bots`
    to determine the PCI IDs of the GPUs in the bots. (These instructions will
    need to be updated for Android bots which don't have PCI buses.)

    1.  Make sure to add these new machines to the chromium.tests.gpu Swarming
        pool by creating a CL against [`gpu.star`][gpu.star] in the
        [`infradata/config`][infradata/config] (Google internal) workspace.
        Configure your Git `user.email` to an @google.com address if necessary.
        Here is one
        [example CL](https://chrome-internal-review.googlesource.com/913528)
        and a
        [second example](https://chrome-internal-review.googlesource.com/1111456).

    1.  Run [`main.star`][main.star] to regenerate
        `configs/chromium-swarm/bots.cfg`. Double-check your work there.

1.  Allocate new virtual machines for the bots as described in [How to set up
    new virtual machine
    instances](#How-to-set-up-new-virtual-machine-instances).

1.  Create a CL in the Chromium workspace which does the following. Here's an
    [example CL](https://chromium-review.googlesource.com/c/chromium/src/+/1752291).
    1.  Adds the new machines to [`waterfalls.pyl`][waterfalls.pyl] directly or
        to [`mixins.pyl`][mixins.pyl], referencing the new mixin in
        [`waterfalls.pyl`][waterfalls.pyl].
        1.  The swarming dimensions are crucial. These must match the GPU and
            OS type of the physical hardware in the Swarming pool. This is what
            causes the VMs to spawn their tests on the correct hardware. Make
            sure to use the chromium.tests.gpu pool, and that the new machines
            were specifically added to that pool.
        1.  Make triply sure that there are no collisions between the new
            hardware you're adding and hardware already in the Swarming pool.
            For example, it used to be the case that all of the Windows NVIDIA
            bots ran the same OS version. Later, the Windows 8 flavor bots were
            added. In order to avoid accidentally running tests on Windows 8
            when Windows 7 was intended, the OS in the swarming dimensions of
            the Win7 bots had to be changed from `win` to
            `Windows-2008ServerR2-SP1` (the Win7-like flavor running in our
            data center). Similarly, the Win8 bots had to have a very precise
            OS description (`Windows-2012ServerR2-SP0`).
    1.  If you're deploying a new bot that's similar to another existing
        configuration, please search around in
        [`test_suite_exceptions.pyl`][test_suite_exceptions.pyl] for
        references to the other bot's name and see if your new bot needs
        to be added to any exclusion lists. For example, some of the tests
        don't run on certain Win bots because of missing OpenGL extensions.
    1.  Run [`generate_buildbot_json.py`][generate_buildbot_json.py] to
        regenerate `src/testing/buildbot/chromium.gpu.fyi.json`.
    1.  Updates [`ci.star`][ci.star] and its related generated files
        [`cr-buildbucket.cfg`][cr-buildbucket.cfg],
        [`luci-scheduler.cfg`][luci-scheduler.cfg], and
        [`luci-milo.cfg`][luci-milo.cfg]:
        *   Use the appropriate definition for the type of the bot being added,
            for example, `ci.gpu_fyi_thin_tester()` should be used for all CI
            tester bots on the GPU FYI waterfall.
        *   Make sure to set the `triggered_by` property to the builder which
            triggers the testers (like `'GPU FYI Win Builder'`).
        *   Include a `ci.console_view_entry` for the builder's
            `console_view_entry` argument. Look at the short names and
            categories to try and come up with a reasonable organization.
    1.  Run `main.star` in [`src/infra/config`][src/infra/config] to update the
        generated files. Double-check your work there.
    1.  If you were adding a new builder, you would need to also add the new
        machine to [`src/tools/mb/mb_config.pyl`][mb_config.pyl].

4361. After the Chromium-side CL lands it will take some time for all of
437 the configuration changes to be picked up by the system. The bot
Kenneth Russell4d1bb4482018-05-09 23:36:37438 will probably be in a red or purple state, claiming that it can't
439 find its configuration. (It might also be in an "empty" state, not
440 running any jobs at all.)
Kenneth Russell139881b2018-05-04 00:45:20441
Kenneth Russell4d1bb4482018-05-09 23:36:374421. *After* the Chromium-side CL lands and the bot is on the console, create a CL
443 in the [`tools/build`][tools/build] workspace which does the
Kenneth Russell139881b2018-05-04 00:45:20444 following. Here's an [example
445 CL](https://chromium-review.googlesource.com/1041145).
Yuly Novikov8e92b172020-02-07 17:40:12446 1. Adds the new bot to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] in
Yuly Novikov55b23a62020-10-02 18:23:43447 `recipes/recipe_modules/chromium_tests/builders/`. Make sure to set the
Kenneth Russell139881b2018-05-04 00:45:20448 `serialize_tests` property to `True`. This is specified for waterfall
449 bots, but not trybots, and helps avoid overloading the physical
450 hardware. Double-check the `BUILD_CONFIG` and `parent_buildername`
451 properties for each. They must match the Release/Debug flavor of the
452 builder, like `GPU FYI Win Builder` vs. `GPU FYI Win Builder (dbg)`.
453 1. Get this reviewed and landed. This step tells the Chromium recipe about
454 the newly-deployed waterfall bot, so it knows which JSON file to load
455 out of src/testing/buildbot and which entry to look at.
Yuly Novikov8e92b172020-02-07 17:40:12456 1. Sometimes it is necessary to retrain recipe expectations
Yuly Novikov55b23a62020-10-02 18:23:43457 (`recipes/recipes.py test train`). This is usually needed only
Yuly Novikov8e92b172020-02-07 17:40:12458 if the bot adds untested code flow in a recipe, but it's something
459 to watch out for if your CL fails presubmit for some reason.
Kenneth Russell139881b2018-05-04 00:45:20460
Kenneth Russell4d1bb4482018-05-09 23:36:374611. Note that it is crucial that the bot be deployed before hooking it up in the
462 tools/build workspace. In the new LUCI world, if the parent builder can't
463 find its child testers to trigger, that's a hard error on the parent. This
464 will cause the builders to fail. You can and should prepare the tools/build
465 CL in advance, but make sure it doesn't land until the bot's on the console.
Kai Ninomiyaa6429fb32018-03-30 01:30:56466
Yuly Novikov8e92b172020-02-07 17:40:124671. If the number of physical machines for the new bot permits, you should also
468 add a manually-triggered trybot at the same time that the CI bot is added.
469 This is described in [How to add a new manually-triggered trybot].
470
Brian Sheedy1ac3f672021-01-06 23:43:03471While the above instructions assume that an existing parent builder will be
472be used, a new one can be set up by performing a modified version of the steps:
473
4741. Make a [`tools/build`][tools/build] CL that adds the config for *only* the
475 new builder and land it.
4761. Make and land Chromium CL that makes the above changes in addition to the
477 following:
478 1. Add the new builder to the necessary `//infra/config` files in the same
479 way as the tester.
480 1. Add the new builder to [`src/tools/mb/mb_config.pyl`][mb_config.pyl].
4811. Make a [`tools/build`][tools/build] CL that adds the config for *only* the
482 new tester and land it.
483
484Attempting to set up the builder/tester pair without first landing the
485[`tools/build`][tools/build] CL for the new builder will result in things
486breaking as seen in [this bug][misconfigured builder bug].
487
Yuly Novikov8e92b172020-02-07 17:40:12488[How to add a new manually-triggered trybot]: https://chromium.googlesource.com/chromium/src/+/master/docs/gpu/gpu_testing_bot_details.md#How-to-add-a-new-manually_triggered-trybot
489
Brian Sheedya7bd47b2020-05-12 01:10:01490[ci.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/subprojects/ci.star
[chromium.gpu.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/consoles/chromium.gpu.star
[chromium.gpu.fyi.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/consoles/chromium.gpu.fyi.star
[cr-buildbucket.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/generated/cr-buildbucket.cfg
[luci-scheduler.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/generated/luci-scheduler.cfg
[luci-milo.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/generated/luci-milo.cfg
[GPU FYI Win Builder]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/GPU%20FYI%20Win%20Builder
[misconfigured builder bug]: https://bugs.chromium.org/p/chromium/issues/detail?id=1163657

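The step above that asks you to search `test_suite_exceptions.pyl` for
references to a similar bot can be partly automated. The following is a minimal
Python sketch, not part of the Chromium tooling. It assumes the `.pyl` file is a
single Python dictionary literal keyed by test suite name; the helper names
`mentions` and `suites_mentioning`, the bot names, and the sample data are all
hypothetical.

```python
import ast


def mentions(node, bot_name):
    """Return True if bot_name appears anywhere inside a nested literal."""
    if isinstance(node, str):
        return node == bot_name
    if isinstance(node, dict):
        return any(
            mentions(key, bot_name) or mentions(value, bot_name)
            for key, value in node.items())
    if isinstance(node, (list, tuple, set)):
        return any(mentions(item, bot_name) for item in node)
    return False


def suites_mentioning(pyl_text, bot_name):
    """List the top-level entries whose exceptions reference bot_name."""
    # Assumption: the file is one dictionary literal, so it can be parsed
    # without executing any code.
    exceptions = ast.literal_eval(pyl_text)
    return sorted(
        suite for suite, entry in exceptions.items()
        if mentions(entry, bot_name))


# Hypothetical sample data standing in for the real file's contents.
SAMPLE = """
{
  'some_gl_tests': {
    'remove_from': ['Existing Win Bot (NVIDIA)'],
  },
  'other_tests': {
    'modifications': {
      'Existing Linux Bot (NVIDIA)': {'args': ['--example-flag']},
    },
  },
}
"""

print(suites_mentioning(SAMPLE, 'Existing Win Bot (NVIDIA)'))
```

To use it against a checkout, read the contents of `test_suite_exceptions.pyl`
and pass them as `pyl_text`, with the name of the bot your new bot resembles as
`bot_name`. Each suite it reports is a place to check whether the new bot needs
the same exclusion.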
### How to start running tests on a new GPU type on an existing try bot

Let's say that you want to cause the `win10_chromium_x64_rel_ng` try bot to run
tests on CoolNewGPUType in addition to the types it currently runs (as of this
writing only NVIDIA). To do this:

1. Make sure there is enough hardware capacity using the available tools to
    report utilization of the Swarming pool.
1. Deploy Release and Debug testers on the `chromium.gpu` waterfall, following
    the instructions for the `chromium.gpu.fyi` waterfall above. Make sure
    the flakiness on the new bots is comparable to existing `chromium.gpu` bots
    before proceeding.
1. Create a CL in the [`tools/build`][tools/build] workspace, adding the new
    Release tester to `win10_chromium_x64_rel_ng`'s `bot_ids` list
    in `recipes/recipe_modules/chromium_tests/trybots.py`. Rerun
    `recipes/recipes.py test train`.
1. Once the above CL lands, the commit queue will **immediately** start
    running tests on the CoolNewGPUType configuration. Be vigilant and make
    sure that tryjobs are green. If they are red for any reason, revert the CL
    and figure out offline what went wrong.

### How to add a new manually-triggered trybot

Manually-triggered trybots are needed for investigating failures on a GPU type
which doesn't have a corresponding CQ trybot (due to lack of GPU resources).
Even for GPU types that have CQ trybots, it is convenient to have
manually-triggered trybots as well, since the CQ trybot often runs on more than
one GPU type, or some test suites which run on the CI bot may be disabled on
the CQ trybot (when the CQ bot mirrors a
[fake bot](https://chromium.googlesource.com/chromium/src/+/master/docs/gpu/gpu_testing_bot_details.md#how-to-add-a-new-try-bot-that-runs-a-subset-of-tests-or-extra-tests)).
Thus, all CI bots in `chromium.gpu` and `chromium.gpu.fyi` have corresponding
manually-triggered trybots, except a few which don't have enough hardware
to support it. A manually-triggered trybot should be added at the same time
a CI bot is added.

Here are the steps to set up a new trybot which runs tests just on one
particular GPU type. Let's consider that we are adding a manually-triggered
trybot for the Win7 NVIDIA GPUs in Release mode. We will call the new bot
`gpu-fyi-try-win7-nvidia-rel-64`.

1. If there already exists a manually-triggered trybot which runs tests on
    the same group of machines (i.e. same GPU, OS and driver), the new trybot
    will have to share the VMs with it. Otherwise, create a new pool of VMs for
    the new hardware and allocate the VMs as described in
    [How to set up new virtual machine instances](#How-to-set-up-new-virtual-machine-instances),
    following the "Manually-triggered GPU trybots" instructions.

1. Create a CL in the Chromium workspace which does the following. Here's a
    [reference CL](https://chromium-review.googlesource.com/c/chromium/src/+/2191276)
    exemplifying the new "GCE pool per GPU hardware pool" way.
    1. Updates [`gpu.try.star`][gpu.try.star] and its related generated file
        [`cr-buildbucket.cfg`][cr-buildbucket.cfg]:
        * Add the new trybot with the right `builder` definition and VM pool.
          For `gpu-fyi-try-win7-nvidia-rel-64` this would be
          `gpu_win_builder()` and `luci.chromium.gpu.win7.nvidia.try`.
    1. Run `main.star` in [`src/infra/config`][src/infra/config] to update the
        generated files. Double-check your work there.
    1. Adds the new trybot to [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
        and [`src/tools/mb/mb_config_buckets.pyl`][mb_config_buckets.pyl].
        Use the same mixin as the builder for the CI bot this trybot
        mirrors. In the case of `gpu-fyi-try-win7-nvidia-rel-64` the builder is
        `GPU FYI Win x64 Builder` and thus the mixin is
        `gpu_fyi_tests_release_trybot`.
    1. Get this CL reviewed and landed.

1. Create a CL in the [`tools/build`][tools/build] workspace which does the
    following. Here's an [example
    CL](https://chromium-review.googlesource.com/c/chromium/tools/build/+/1979113).

    1. Adds the new trybot to a "Manually-triggered GPU trybots" section in
        `recipes/recipe_modules/chromium_tests/tests/trybots.py`. Create this
        section after the "Optional GPU bots" section for the appropriate
        tryserver (`tryserver.chromium.win`, `tryserver.chromium.mac`,
        `tryserver.chromium.linux`, `tryserver.chromium.android`). Have the bot
        mirror the appropriate waterfall bot; in this case, the buildername to
        mirror is `GPU FYI Win x64 Builder` and the tester is
        `Win7 FYI x64 Release (NVIDIA)`.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
        the newly-deployed trybot, so it knows which JSON file to load out of
        `src/testing/buildbot` and which entry to look at to understand which
        tests to run and on what physical hardware.
    1. It may be necessary to retrain recipe expectations for
        [`tools/build`][tools/build] workspace CLs
        (`recipes/recipes.py test train`). This shouldn't be necessary
        for just adding a manually triggered trybot, but it's something to
        watch out for if your CL fails presubmit for some reason.

At this point the new trybot should automatically show up in the
"Choose tryjobs" pop-up in the Gerrit UI, under the
`luci.chromium.try` heading, because it was deployed via LUCI. It
should be possible to send a CL to it.

(It should not be necessary to modify buildbucket.config as is
mentioned at the bottom of the "Choose tryjobs" pop-up. Contact the
chrome-infra team if this doesn't work as expected.)

[gpu.try.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/subprojects/gpu.try.star
[luci.chromium.try.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/consoles/luci.chromium.try.star
[tryserver.chromium.win.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/consoles/tryserver.chromium.win.star

### How to add a new try bot that runs a subset of tests or extra tests

Several projects (ANGLE, Dawn) run custom tests using the Chromium recipes. They
use try bot configs that run subsets of Chromium or additional slower tests
that can't be run on the main CQ.

These try bots are a little different because they mirror waterfall bots that
don't actually exist. The waterfall bots' specifications exist only to tell
these try bots which tests to run.

Let's say that you intended to add a new such custom try bot on Windows. Call it
`win-myproject-rel` for example. You will need to add a "fake" mirror bot for
each GPU family on which you want to run the tests. For a GPU type of
"CoolNewGPUType" in this example you could add a "fake" bot named "MyProject GPU
Win10 Release (CoolNewGPUType)".

1. Allocate new virtual machines for the bots as described in
    [How to set up new virtual machine instances](#How-to-set-up-new-virtual-machine-instances).
1. Make sure there is enough hardware capacity using the available tools to
    report utilization of the Swarming pool.
1. Create a CL in the Chromium workspace that does the following. Here's an
    outdated [example CL](https://crrev.com/c/1554296).
    1. Add your new bot (for example, "MyProject GPU Win10 Release
        (CoolNewGPUType)") to the chromium.gpu.fyi waterfall in
        [`waterfalls.pyl`][waterfalls.pyl].
    1. Add your new bot to the list in the
        `get_bots_that_do_not_actually_exist` section of
        [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py].
    1. Re-run
        [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py]
        to regenerate the JSON files.
    1. Update [`scheduler-noop-jobs.star`][scheduler-noop-jobs.star] to
        include "MyProject GPU Win10 Release (CoolNewGPUType)".
    1. Update [`try.star`][try.star] and desired consoles to include
        `win-myproject-rel`.
    1. Run `main.star` in [`src/infra/config`][src/infra/config] to update the
        generated files: [`luci-milo.cfg`][luci-milo.cfg],
        [`luci-scheduler.cfg`][luci-scheduler.cfg],
        [`cr-buildbucket.cfg`][cr-buildbucket.cfg]. Double-check your work
        there.
    1. Update [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
        to include `win-myproject-rel`.
1. *After* the Chromium-side CL lands and the bot is on the console, create a CL
    in the [`tools/build`][tools/build] workspace which does the
    following. Here's an [example CL](https://crrev.com/c/1554272).
    1. Adds "MyProject GPU Win10 Release
        (CoolNewGPUType)" to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] in
        `recipes/recipe_modules/chromium_tests/builders/`. You can copy a similar
        step.
    1. Adds `win-myproject-rel` to [`trybots.py`][trybots.py] in the same folder.
        This is where you associate "MyProject GPU Win10 Release
        (CoolNewGPUType)" with `win-myproject-rel`. See the sample CL for an example.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
        the newly-deployed waterfall bot, so it knows which JSON file to load
        out of `src/testing/buildbot` and which entry to look at.
1. After your CLs land you should be able to find and run `win-myproject-rel` on CLs
    using Choose Trybots in Gerrit.

[scheduler-noop-jobs.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/generators/scheduler-noop-jobs.star
[try.star]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/subprojects/try.star

### How to test and deploy a driver and/or OS update

Let's say that you want to roll out an update to the graphics drivers or the OS
on one of the configurations like the Linux NVIDIA bots. In order to verify
that the new driver or OS won't destabilize Chromium's commit queue,
it's necessary to run the new driver or OS on one of the waterfalls for a day
or two to make sure the tests are reliably green before rolling out the driver
or OS update. To do this:

1. Make sure that all of the current Swarming jobs for this OS and GPU
    configuration are targeted at the "stable" version of the driver and the OS
    in [`waterfalls.pyl`][waterfalls.pyl] and [`mixins.pyl`][mixins.pyl].
1. File a `Build Infrastructure` bug, component `Infra>Labs`, to have ~4 of
    the physical machines already in the Swarming pool upgraded to the new
    version of the driver or the OS.
1. If an "experimental" version of this bot doesn't yet exist, follow the
    instructions above for [How to add a new tester bot to the chromium.gpu.fyi
    waterfall](#How-to-add-a-new-tester-bot-to-the-chromium_gpu_fyi-waterfall)
    to deploy one.
1. Have this experimental bot target the new version of the driver or the OS
    in [`waterfalls.pyl`][waterfalls.pyl] and [`mixins.pyl`][mixins.pyl].
    [Sample CL][sample driver cl].
1. Hopefully, the new machine will pass the pixel tests. If it doesn't, then
    it'll be necessary to follow the instructions on
    [updating Gold baselines (step #4)][updating gold baselines].
1. Watch the new machine for a day or two to make sure it's stable.
1. When it is, add the experimental driver/OS to the `_stable` mixin using the
    swarming OR operator `|`. For example:

    ```
    'win10_intel_hd_630_stable': {
      'swarming': {
        'dimensions': {
          'gpu': '8086:5912-26.20.100.7870|8086:5912-26.20.100.8141',
          'os': 'Windows-10',
          'pool': 'chromium.tests.gpu',
        },
      },
    }
    ```

    This will cause tests triggered using the `_stable` mixin to run on either
    the old stable dimension or the experimental/new stable dimension.

    **NOTE** There is a hard cap of 8 combinations in swarming, so you can only
    use the OR operator in up to 3 dimensions if each dimension only has two
    options. More than two options per dimension is allowed as long as the total
    number of combinations is 8 or less.
1. After it lands, ask the Chrome Infrastructure Labs team to roll out the
    driver update across all of the similarly configured bots in the swarming
    pool.
1. If necessary, update pixel test expectations and remove the suppressions
    added above.
1. Remove the old driver or OS version from the `_stable` mixin, leaving just
    the new stable version.

Note that we leave the experimental bot in place. We could reclaim it, but it
seems worthwhile to continuously test the "next" version of graphics drivers as
well as the current stable ones.

[sample driver cl]: https://chromium-review.googlesource.com/c/chromium/src/+/1726875
[updating gold baselines]: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/gpu/pixel_wrangling.md#how-to-keep-the-bots-green

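The cap on OR combinations described in the note above can be checked with a
few lines of Python before sending a mixin change for review. This is a minimal
sketch rather than anything Swarming itself provides. It assumes the number of
combinations is the product of the number of `|`-separated options in each
dimension, as the note implies; the names `count_combinations` and
`MAX_COMBINATIONS` are hypothetical.

```python
from math import prod

# Cap on dimension combinations, as stated in the note above.
MAX_COMBINATIONS = 8


def count_combinations(dimensions):
    """Return how many concrete dimension sets a mixin expands to.

    Each value may list alternatives separated by the OR operator `|`,
    so the total is the product of the option counts.
    """
    return prod(len(value.split('|')) for value in dimensions.values())


# Dimensions copied from the `win10_intel_hd_630_stable` example above.
stable = {
    'gpu': '8086:5912-26.20.100.7870|8086:5912-26.20.100.8141',
    'os': 'Windows-10',
    'pool': 'chromium.tests.gpu',
}

combinations = count_combinations(stable)
print(combinations, combinations <= MAX_COMBINATIONS)  # prints "2 True"
```

Three dimensions with two options each give 2 × 2 × 2 = 8 combinations, which
is exactly at the cap; adding a fourth two-option dimension gives 16 and
exceeds it.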
## Credentials for various servers

Working with the GPU bots requires credentials to various services: the isolate
server, the swarming server, and cloud storage.

### Isolate server credentials

To upload and download isolates you must first authenticate to the isolate
server. From a Chromium checkout, run:

* `./src/tools/swarming_client/auth.py login
  --service=https://isolateserver.appspot.com`

This will open a web browser to complete the authentication flow. A @google.com
email address is required in order to properly authenticate.

To test your authentication, find a hash for a recent isolate. Consult the
instructions on [Running Binaries from the Bots Locally] to find a random hash
from a target like `gl_tests`. Then run the following:

[Running Binaries from the Bots Locally]: https://www.chromium.org/developers/testing/gpu-testing#TOC-Running-Binaries-from-the-Bots-Locally

If authentication succeeded, this will silently download a file called
`delete_me` into the current working directory. If it failed, the script will
report multiple authentication errors. In this case, use the following command
to log out and then try again:

* `./src/tools/swarming_client/auth.py logout
  --service=https://isolateserver.appspot.com`

### Swarming server credentials

The swarming server uses the same `auth.py` script as the isolate server. You
will need to authenticate if you want to manually download the results of
previous swarming jobs, trigger your own jobs, or run `swarming.py reproduce`
to re-run a remote job on your local workstation. Follow the instructions
above, replacing the service with `https://chromium-swarm.appspot.com`.

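Because the isolate and swarming servers share the same `auth.py` script and
differ only in the `--service` URL, the login and logout commands above can be
generated from one small helper. The sketch below only builds the argument
lists and does not run anything. It assumes the script path and service URLs
quoted in this section are still valid in your checkout; the names
`auth_command` and `SERVICES` are hypothetical.

```python
import shlex

# Script path and service URLs as quoted in this section.
AUTH_SCRIPT = './src/tools/swarming_client/auth.py'
SERVICES = {
    'isolate': 'https://isolateserver.appspot.com',
    'swarming': 'https://chromium-swarm.appspot.com',
}


def auth_command(action, service):
    """Build the argv list for logging in to or out of one service."""
    if action not in ('login', 'logout'):
        raise ValueError('action must be "login" or "logout"')
    return [AUTH_SCRIPT, action, '--service=' + SERVICES[service]]


# Print the login command for each service in copy-pasteable form.
for name in SERVICES:
    print(shlex.join(auth_command('login', name)))
```

The returned list can be passed to `subprocess.run` from the directory that
contains `src`, or the printed line can be pasted into a terminal.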
### Cloud storage credentials

Authentication to Google Cloud Storage is needed for a couple of reasons:
uploading pixel test results to the cloud, and potentially uploading and
downloading builds as well, at least in Debug mode. Use the copy of gsutil in
`depot_tools/third_party/gsutil/gsutil`, and follow the [Google Cloud Storage
instructions] to authenticate. You must use your @google.com email address and
be a member of the Chrome GPU team in order to receive read-write access to the
appropriate cloud storage buckets. Roughly:

1. Run `gsutil config`
2. Copy/paste the URL into your browser
3. Log in with your @google.com account
4. Allow the app to access the information it requests
5. Copy-paste the resulting key back into your Terminal
6. Press "enter" when prompted for a project-id (i.e., leave it empty)

At this point you should be able to write to the cloud storage bucket.

Navigate to
<https://console.developers.google.com/storage/chromium-gpu-archive> to view
the contents of the cloud storage bucket.

[Google Cloud Storage instructions]: https://developers.google.com/storage/docs/gsutil