# GPU Bot Details

This page describes in detail how the GPU bots are set up, which files affect
their configuration, and how to both modify their behavior and add new bots.

[TOC]

## Overview of the GPU bots' setup

Chromium's GPU bots, unlike the majority of the project's test machines,
are physical pieces of hardware. When end users run the Chrome browser, they
are almost surely running it on a physical piece of hardware with a real
graphics processor. There are some portions of the code base which simply
cannot be exercised by running the browser in a virtual machine, or on a
software implementation of the underlying graphics libraries. The GPU bots were
developed and deployed in order to cover these code paths, and avoid
regressions that are otherwise inevitable in a project the size of the Chromium
browser.

The GPU bots are used on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls, and various tryservers, as described in [Using the GPU Bots].

[chromium.gpu]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console
[chromium.gpu.fyi]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console
[Using the GPU Bots]: gpu_testing.md#Using-the-GPU-Bots

All of the physical hardware for the bots lives in the Swarming pool, and most
of it in the chromium.tests.gpu Swarming pool. The waterfall bots are simply
virtual machines which spawn Swarming tasks with the appropriate tags to get
them to run on the desired GPU and operating system type. So, for example, the
[Win10 x64 Release (NVIDIA)] bot is actually a virtual machine which spawns all
of its jobs with the Swarming parameters:

[Win10 x64 Release (NVIDIA)]: https://ci.chromium.org/p/chromium/builders/ci/Win10%20x64%20Release%20%28NVIDIA%29

```json
{
    "gpu": "nvidia-quadro-p400-win10-stable",
    "os": "Windows-10",
    "pool": "chromium.tests.gpu"
}
```

Since the GPUs in the Swarming pool are mostly homogeneous, this is sufficient
to target the pool of Windows 10-like NVIDIA machines. (There are a few Windows
7-like NVIDIA bots in the pool, which necessitates the OS specifier.)
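
As a rough illustration of why these three parameters are enough, the sketch
below models dimension matching as a subset test: a task may run on a bot when
every value the task requests is among the values the bot advertises. The bot
names, the bot dimension values, and the `bot_matches` and `eligible_bots`
helpers are hypothetical and exist only for this example; the real scheduler is
part of the Swarming service.

```python
# Hypothetical model of Swarming dimension matching; the real scheduler lives
# in the Swarming service. Bot names and bot dimension values are made up.

# The task dimensions are the ones shown in the JSON above.
TASK_DIMENSIONS = {
    "gpu": "nvidia-quadro-p400-win10-stable",
    "os": "Windows-10",
    "pool": "chromium.tests.gpu",
}

# Each bot advertises a set of values per dimension. "10de" is NVIDIA's PCI
# vendor ID; which values real bots advertise is an assumption here.
BOTS = {
    "example-win10-nvidia": {
        "gpu": {"10de", "nvidia-quadro-p400-win10-stable"},
        "os": {"Windows", "Windows-10"},
        "pool": {"chromium.tests.gpu"},
    },
    "example-win7-nvidia": {
        "gpu": {"10de"},
        "os": {"Windows", "Windows-7"},
        "pool": {"chromium.tests.gpu"},
    },
}


def bot_matches(bot_dimensions, task_dimensions):
    """Return True if the bot advertises every value the task requests."""
    return all(
        value in bot_dimensions.get(key, set())
        for key, value in task_dimensions.items()
    )


def eligible_bots(bots, task_dimensions):
    """Return the sorted names of bots that can run the task."""
    return sorted(
        name for name, dims in bots.items() if bot_matches(dims, task_dimensions)
    )


if __name__ == "__main__":
    print(eligible_bots(BOTS, TASK_DIMENSIONS))
```

In this toy data, a task that asked only for `"gpu": "10de"` in the pool would
match both bots, which is the situation the OS specifier guards against.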

Details about the bots can be found on [chromium-swarm.appspot.com] and by
using `src/tools/luci-go/swarming`, for example `swarming bots`.
If you are authenticated with @google.com credentials you will be able to make
queries of the bots and see, for example, which GPUs are available.

[chromium-swarm.appspot.com]: https://chromium-swarm.appspot.com/

The waterfall bots run tests on a single GPU type in order to make it easier to
see regressions or flakiness that affect only a certain type of GPU.

The tryservers like `win10_chromium_x64_rel_ng` which include GPU tests, on the
other hand, run tests on more than one GPU type. As of this writing, the Windows
tryservers run tests on NVIDIA and AMD GPUs; the Mac tryservers run tests on
Intel and NVIDIA GPUs. The way these tryservers' tests are specified is simply
by *mirroring* how one or more waterfall bots work. This is an inherent
property of the [`chromium_trybot` recipe][chromium_trybot.py], which was
designed to eliminate differences in behavior between the tryservers and
waterfall bots. Since the tryservers mirror waterfall bots, if the waterfall
bot is working, the tryserver is almost certainly working as well.

[chromium_trybot.py]: https://chromium.googlesource.com/chromium/tools/build/+/main/recipes/recipes/chromium_trybot.py
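
The mirroring idea can be pictured with a small data model. This is only a
sketch under stated assumptions: the `Example ... (AMD)` bot name, the test
lists, and the `trybot_steps` helper are invented for illustration, and the
real mapping lives in the recipe code rather than in a table like this.

```python
# Hypothetical data model of "mirroring". The real mapping is defined in the
# chromium_trybot recipe, not in dictionaries like these.

# Tests each waterfall bot runs (illustrative lists only).
WATERFALL_TESTS = {
    "Win10 x64 Release (NVIDIA)": ["gl_tests", "angle_end2end_tests"],
    "Example Win10 Release (AMD)": ["gl_tests"],  # made-up bot name
}

# Waterfall bots each tryserver mirrors (illustrative).
TRYBOT_MIRRORS = {
    "win10_chromium_x64_rel_ng": [
        "Win10 x64 Release (NVIDIA)",
        "Example Win10 Release (AMD)",
    ],
}


def trybot_steps(trybot):
    """Return the (waterfall bot, test) pairs a tryserver inherits."""
    return [
        (bot, test)
        for bot in TRYBOT_MIRRORS[trybot]
        for test in WATERFALL_TESTS[bot]
    ]


if __name__ == "__main__":
    for bot, test in trybot_steps("win10_chromium_x64_rel_ng"):
        print(f"{test} on {bot}")
```

Because the tryserver has no test list of its own in this model, any change to
a waterfall bot's tests is picked up by the tryserver automatically.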

There are some GPU configurations on the waterfall backed by only one machine,
or a very small number of machines in the Swarming pool. A few examples are:

<!-- XXX: update this list -->
*   [Mac Pro Release (AMD)](https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/Mac%20Pro%20FYI%20Release%20%28AMD%29)
*   [Linux Release (AMD R7 240)](https://luci-milo.appspot.com/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28AMD%20R7%20240%29/)

There are a couple of reasons to continue to support running tests on a
specific machine: it might be too expensive to deploy the required multiple
copies of said hardware, or the configuration might not be reliable enough to
begin scaling it up.

## Adding a new isolated test to the bots

Adding a new test step to the bots requires that the test run via an isolate.
Isolates describe both the binary and data dependencies of an executable, and
are the underpinning of how the Swarming system works. See the [LUCI]
documentation for background on [Isolates] and [Swarming].

[LUCI]: https://github.com/luci/luci-py
[Isolates]: https://github.com/luci/luci-py/blob/master/appengine/isolate/doc/README.md
[Swarming]: https://github.com/luci/luci-py/blob/master/appengine/swarming/doc/README.md

### Adding a new isolate

1.  Define your target using the `template("test")` template in
    [`src/testing/test.gni`][testing/test.gni]. See `test("gl_tests")` in
    [`src/gpu/BUILD.gn`][gpu/BUILD.gn] for an example. For a more complex
    example which invokes a series of scripts which finally launches the
    browser, see `telemetry_gpu_integration_test` in
    [`chrome/test/BUILD.gn`][chrome/test/BUILD.gn].
2.  Add an entry to
    [`src/testing/buildbot/gn_isolate_map.pyl`][gn_isolate_map.pyl] that refers
    to your target. Find a similar target to yours in order to determine the
    `type`. The type is referenced in [`src/tools/mb/mb.py`][mb.py].

[testing/test.gni]: https://chromium.googlesource.com/chromium/src/+/main/testing/test.gni
[gpu/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/main/gpu/BUILD.gn
[chrome/test/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/main/chrome/test/BUILD.gn
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/gn_isolate_map.pyl
[mb.py]: https://chromium.googlesource.com/chromium/src/+/main/tools/mb/mb.py
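
To make the second step more concrete, here is a minimal sketch of what such an
entry might look like and how it can be inspected. It assumes that a `.pyl`
file is a Python literal and that an entry maps a target name to a `label` and
a `type`; the target `my_gpu_tests`, its label, and the chosen type are
placeholders, so copy the real values from a similar existing entry.

```python
import ast

# Assumed shape of a gn_isolate_map.pyl entry. The target name, label, and
# type below are placeholders for illustration, not a real entry.
EXAMPLE_PYL = """
{
  "my_gpu_tests": {
    "label": "//gpu:my_gpu_tests",
    "type": "console_test_launcher",
  },
}
"""


def load_isolate_map(text):
    """Parse .pyl text as a Python literal without executing any code."""
    return ast.literal_eval(text.strip())


def describe(isolate_map, target):
    """Summarize the entry for one target as a single line."""
    entry = isolate_map[target]
    return f"{target}: label={entry['label']} type={entry['type']}"


if __name__ == "__main__":
    print(describe(load_isolate_map(EXAMPLE_PYL), "my_gpu_tests"))
```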

At this point you can build and upload your isolate to the isolate server.

See [Isolated Testing for SWEs] for the most up-to-date instructions. The
instructions here are a copy, and show how to run an isolate that has been
uploaded to the isolate server on your local machine rather than on Swarming.

[Isolated Testing for SWEs]: https://www.chromium.org/developers/testing/isolated-testing/for-swes

If `cd`'d into `src/`:

1.  `./tools/mb/mb.py isolate //out/Release [target name]`
    *   For example: `./tools/mb/mb.py isolate //out/Release angle_end2end_tests`
1.  `./tools/luci-go/isolate batcharchive -cas-instance chromium-swarm out/Release/[target name].isolated.gen.json`
    *   For example: `./tools/luci-go/isolate batcharchive -cas-instance chromium-swarm out/Release/angle_end2end_tests.isolated.gen.json`

See the section below on [isolate server credentials](#Isolate-server-credentials).
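
If you run these two commands often, a small helper can assemble them for any
target. The sketch below only builds the command lines listed above; the
`isolate_commands` function and its `out_dir` parameter are hypothetical
conveniences, not part of `mb.py` or the `isolate` tool.

```python
import shlex


def isolate_commands(target, out_dir="out/Release"):
    """Build the two command lines listed above for one target.

    Both commands are meant to be run from the src/ directory.
    """
    return [
        ["./tools/mb/mb.py", "isolate", f"//{out_dir}", target],
        [
            "./tools/luci-go/isolate",
            "batcharchive",
            "-cas-instance",
            "chromium-swarm",
            f"{out_dir}/{target}.isolated.gen.json",
        ],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for command in isolate_commands("angle_end2end_tests"):
        print(shlex.join(command))
```

To execute the commands rather than print them, pass each list to
`subprocess.run(command, check=True)` with `src/` as the working directory.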

### Adding your new isolate to the tests that are run on the bots

See [Adding new steps to the GPU bots] for details on this process.

[Adding new steps to the GPU bots]: gpu_testing.md#Adding-new-steps-to-the-GPU-Bots

## Relevant files that control the operation of the GPU bots

In the [`tools/build`][tools/build] workspace:

*   `recipes/recipe_modules/chromium_tests/`:
    *   [`chromium_gpu.py`][chromium_gpu.py] and
        [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] define the following for
        each builder and tester:
        *   How the workspace is checked out (e.g., this is where top-of-tree
            ANGLE is specified)
        *   The build configuration (e.g., this is where 32-bit vs. 64-bit is
            specified)
        *   Various gclient defines (like compiling in the hardware-accelerated
            video codecs, and enabling compilation of certain tests, like the
            dEQP tests, that can't be built on all of the Chromium builders)
        *   Note that the GN configuration of the bots is also controlled by
            [`mb_config.pyl`][mb_config.pyl] in the Chromium workspace; see
            below.
    *   [`trybots.py`][trybots.py] defines how try bots *mirror* one or more
        waterfall bots.
        *   The concept of try bots mirroring waterfall bots ensures there are
            no differences in behavior between the waterfall bots and the try
            bots. This helps ensure that a CL will not pass the commit queue
            and then break on the waterfall.
        *   This file defines the behavior of the following GPU-related try
            bots:
            *   `linux-rel`, `mac-rel`, `win10_chromium_x64_rel_ng` and
                `android-marshmallow-arm64-rel`, which run against every
                Chromium CL, and which mirror the behavior of bots on the
                chromium.gpu waterfall.
            *   The ANGLE try bots, which run against ANGLE CLs, and mirror the
                behavior of the chromium.gpu.fyi waterfall (including using
                top-of-tree ANGLE, and running additional tests not run by the
                regular Chromium try bots)
            *   The optional GPU try servers `linux_optional_gpu_tests_rel`,
                `mac_optional_gpu_tests_rel`, `win_optional_gpu_tests_rel` and
                `android_optional_gpu_tests_rel`, which are added automatically
                to CLs which modify a selected set of subdirectories and
                run some tests which can't be run on the regular Chromium try
                servers mainly due to lack of hardware capacity.
            *   Manual GPU trybots, starting with `gpu-try-` and `gpu-fyi-try-`
                prefixes, which can be added manually to CLs targeting a
                specific hardware configuration.

[tools/build]: https://chromium.googlesource.com/chromium/tools/build/
[chromium_gpu.py]: https://chromium.googlesource.com/chromium/tools/build/+/main/recipes/recipe_modules/chromium_tests/builders/chromium_gpu.py
[chromium_gpu_fyi.py]: https://chromium.googlesource.com/chromium/tools/build/+/main/recipes/recipe_modules/chromium_tests/builders/chromium_gpu_fyi.py
[trybots.py]: https://chromium.googlesource.com/chromium/tools/build/+/main/recipes/recipe_modules/chromium_tests/trybots.py

In the [`chromium/src`][chromium/src] workspace:

*   [`src/testing/buildbot`][src/testing/buildbot]:
    *   [`chromium.gpu.json`][chromium.gpu.json] and
        [`chromium.gpu.fyi.json`][chromium.gpu.fyi.json] define which steps are
        run on which bots. These files are autogenerated. Don't modify them
        directly!
    *   [`waterfalls.pyl`][waterfalls.pyl],
        [`test_suites.pyl`][test_suites.pyl], [`mixins.pyl`][mixins.pyl] and
        [`test_suite_exceptions.pyl`][test_suite_exceptions.pyl] define the
        configuration for the autogenerated JSON files above.
        Run [`generate_buildbot_json.py`][generate_buildbot_json.py] to
        generate the JSON files after you modify these pyl files.
    *   [`generate_buildbot_json.py`][generate_buildbot_json.py]
        *   The generator script for all the waterfalls, including
            `chromium.gpu.json` and `chromium.gpu.fyi.json`.
        *   See the [README for generate_buildbot_json.py] for documentation
            on this script and the descriptions of the waterfalls and test
            suites.
        *   When modifying this script, don't forget to also run it, to
            regenerate the JSON files. Don't worry; the presubmit step will
            catch this if you forget.
        *   See [Adding new steps to the GPU bots] for more details.
    *   [`gn_isolate_map.pyl`][gn_isolate_map.pyl] defines all of the isolates'
        behavior in the GN build.
*   [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
    *   Defines the GN arguments for all of the bots.
*   [`src/infra/config`][src/infra/config]:
    *   Definitions of how bots are organized on the waterfall,
        how builds are triggered, and which VMs or machines are used for the
        builder itself, i.e. for compilation and scheduling swarmed tasks
        on GPU hardware. See
        [README.md](https://chromium.googlesource.com/chromium/src/+/main/infra/config/README.md)
        in this directory for up-to-date information.

[chromium/src]: https://chromium.googlesource.com/chromium/src/
[src/testing/buildbot]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot
[src/infra/config]: https://chromium.googlesource.com/chromium/src/+/main/infra/config
[chromium.gpu.json]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/chromium.gpu.json
[chromium.gpu.fyi.json]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/chromium.gpu.fyi.json
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/main/tools/mb/mb_config.pyl
[generate_buildbot_json.py]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/generate_buildbot_json.py
[mixins.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/mixins.pyl
[waterfalls.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/waterfalls.pyl
[test_suites.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/test_suites.pyl
[test_suite_exceptions.pyl]: https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/test_suite_exceptions.pyl
[README for generate_buildbot_json.py]: ../../testing/buildbot/README.md
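
One way to picture how the pyl files feed the generated JSON is to think of a
mixin as a named dictionary that is merged into a test entry. The sketch below
is a deliberately simplified model: the mixin name `example_win10_nvidia`, the
`shards` key, and the recursive merge rule in `apply_mixin` are assumptions
made for illustration and may not match what `generate_buildbot_json.py`
actually does. The README linked above is the authoritative description.

```python
import ast
import copy

# Simplified, assumed shape of a mixins.pyl file. The mixin name is made up;
# the dimension values are the ones used earlier on this page.
MIXINS_PYL = """
{
  'example_win10_nvidia': {
    'swarming': {
      'dimensions': {
        'gpu': 'nvidia-quadro-p400-win10-stable',
        'os': 'Windows-10',
        'pool': 'chromium.tests.gpu',
      },
    },
  },
}
"""


def apply_mixin(test, mixin):
    """Return a copy of `test` with `mixin` merged in, recursing into dicts.

    This merge rule is an assumption for illustration only.
    """
    merged = copy.deepcopy(test)
    for key, value in mixin.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_mixin(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


mixins = ast.literal_eval(MIXINS_PYL.strip())
test_entry = {'test': 'gl_tests', 'swarming': {'shards': 2}}  # hypothetical
expanded = apply_mixin(test_entry, mixins['example_win10_nvidia'])

if __name__ == "__main__":
    print(expanded)
```

The point of the model is that hardware details are written once, in the
mixin, and reused by every test entry that names it.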

In the [`infradata/config`][infradata/config] workspace (Google internal only,
sorry):

*   [`gpu.star`][gpu.star]
    *   Defines a `chromium.tests.gpu` Swarming pool which contains all of the
        specialized hardware (for example, the Windows and Linux NVIDIA
        bots, the Windows AMD bots, and the MacBook Pros with NVIDIA and AMD
        GPUs), except some hardware shared with Chromium. New GPU hardware
        should be added to this pool.
    *   Also defines the GCEs, Mac VMs and Mac machines used for CI builders
        on the GPU and GPU.FYI waterfalls and for trybots.
*   [`pools.cfg`][pools.cfg]
    *   Defines the Swarming pools for GCEs and Mac VMs used for manually
        triggered trybots.

[infradata/config]: https://chrome-internal.googlesource.com/infradata/config
[gpu.star]: https://chrome-internal.googlesource.com/infradata/config/+/main/configs/chromium-swarm/starlark/bots/chromium/gpu.star
[chromium.star]: https://chrome-internal.googlesource.com/infradata/config/+/main/configs/chromium-swarm/starlark/bots/chromium/chromium.star
[pools.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/main/configs/chromium-swarm/pools.cfg
[main.star]: https://chrome-internal.googlesource.com/infradata/config/+/main/main.star
[vms.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/main/configs/gce-provider/vms.cfg

## Walkthroughs of various maintenance scenarios

This section describes various common scenarios that might arise when
maintaining the GPU bots, and how they'd be addressed.

### How to add a new test or an entire new step to the bots

This is described in [Adding new tests to the GPU bots].

[Adding new tests to the GPU bots]: https://chromium.googlesource.com/chromium/src/+/main/docs/gpu/gpu_testing.md#Adding-New-Tests-to-the-GPU-Bots

### How to set up new virtual machine instances

The tests use virtual machines to build binaries and to trigger tests on
physical hardware. VMs don't run any tests themselves. There are 3 types of
bots:

*   Builders: these bots build test binaries, upload them to storage and
    trigger tester bots (see below). Builds must be done on the same OS on
    which the tests will run, except for Android tests, which are built on
    Linux.
*   Testers: these bots trigger tests to execute in Swarming and merge results
    from multiple shards. 2-core Linux GCEs are sufficient for this task.
*   Builder/testers: these are the combination of the above and have the same
    OS constraints as builders. All trybots are of this type, while for CI bots
    it is optional.
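
The OS rules for the three bot types can be summarized in a few lines. This is
a sketch only: the role strings and the `vm_os` function are hypothetical names
chosen for this example, and the OS names are informal labels rather than
Swarming dimension values.

```python
# Hypothetical helper that encodes the OS rules described in the list above.

def vm_os(role, test_os):
    """Return the OS of the VM needed for a bot of the given role.

    role: "builder", "tester", or "builder/tester" (labels for this sketch).
    test_os: the OS the tests will run on, e.g. "Windows", "Mac", "Linux",
        or "Android".
    """
    if role == "tester":
        # Testers only trigger Swarming tasks and merge shard results, so a
        # Linux VM is sufficient regardless of where the tests run.
        return "Linux"
    if role in ("builder", "builder/tester"):
        # Builds happen on the OS under test, except Android builds, which
        # are done on Linux.
        return "Linux" if test_os == "Android" else test_os
    raise ValueError(f"unknown role: {role}")


if __name__ == "__main__":
    for role in ("builder", "tester", "builder/tester"):
        print(role, vm_os(role, "Windows"))
```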

The process is:

1.  Follow [go/request-chrome-resources](go/request-chrome-resources) to get
    approval for the VMs. Use the `GPU` project resource group.
    See this [example ticket](http://crbug.com/1012805).
    You'll need to determine how many VMs are required, which OSes, how many
    cores and in which Swarming pools they will be (see below for different
    scenarios).
    *   If setting up a new GPU hardware pool, some VMs will also be needed
        for manual trybots, usually 2 VMs as of this writing.
    *   Additional action is needed for Mac VMs: the GPU resource owner will
        assign the bug to Labs to deploy them. See this
        [example ticket](http://crbug.com/964355).
1.  Once the GCE resource request is approved and any Mac VMs are deployed, the
    VMs need to be added to the right Swarming pools in a CL in the
    [`infradata/config`][infradata/config] (Google internal) workspace.
    1.  GCEs for Windows CI builders and builder/testers should be added to
        the `luci-chromium-gpu-ci-win10-8` group in [`gpu.star`][gpu.star].
    1.  GCEs for Linux and Android CI builders and builder/testers should be
        added to the `luci-chromium-gpu-ci-xenial-8` group in
        [`gpu.star`][gpu.star].
    1.  VMs for Mac CI builders and builder/testers should be added to
        the `builderfull_gpu_ci_bots` group in [`gpu.star`][gpu.star].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1166889).
    1.  GCEs for CI testers for all OSes should be added to
        the `luci-chromium-gpu-ci-xenial-2` group in [`gpu.star`][gpu.star].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/2016410).
    1.  GCEs and VMs for CQ and optional CQ GPU trybots should be added to
        a corresponding `gpu_try_bots` group in [`gpu.star`][gpu.star].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1561384).
        These trybots are "builderful", i.e. these GCEs can't be shared among
        different bots. This is done in order to limit the number of concurrent
        builds on these bots (until [crbug.com/949379](http://crbug.com/949379)
        is fixed) to prevent oversubscribing GPU hardware.
        `win_optional_gpu_tests_rel` is an exception: its GCEs come from
        `luci-chromium-try-win10-*-8` groups in
        [`chromium.star`][chromium.star], see
        [CL](https://chrome-internal-review.googlesource.com/c/infradata/config/+/1708723).
        This can cause oversubscription of Windows GPU hardware. However,
        Chrome Infra insisted on making this bot builderless due to frequent
        interruptions they get from limiting the number of concurrent builds on
        it; see the discussion in this
        [CL](https://chromium-review.googlesource.com/c/chromium/src/+/1775098).
    1.  GCEs and VMs for manual GPU trybots should be added to a corresponding
        pool in "Manually-triggered GPU trybots" in [`gpu.star`][gpu.star].
        If adding a new pool, it should also be added to
        [`pools.cfg`][pools.cfg].
        [Example](https://chrome-internal-review.googlesource.com/c/infradata/config/+/2433332).
        This is a different mechanism to limit the load on GPU hardware:
        a small pool of GCEs corresponds to some GPU hardware resource, and
        all trybots that target this GPU hardware compete for GCEs from this
        small pool.
    1.  Run [`main.star`][main.star] to regenerate
        `configs/chromium-swarm/bots.cfg` and `configs/gce-provider/vms.cfg`.
        Double-check your work there.
        Note that previously [`vms.cfg`][vms.cfg] had to be edited manually.
        Part of the difficulty was in choosing a zone. This should soon no
        longer be necessary per [crbug.com/942301](http://crbug.com/942301),
        but consult with the Chrome Infra team to find out which of the
        [zones](https://cloud.google.com/compute/docs/regions-zones/) has
        available capacity. This can also be checked on the Viceroy
        [dashboard](https://viceroy.corp.google.com/chrome_infra/Quota/chrome?duration=7d).
    1.  Get this reviewed and landed. This step associates the VM or pool of VMs
        with the bot's name on the waterfall for "builderful" bots, or increases
        swarmed pool capacity for "builderless" bots.
        Note: CR+1 is not sticky in this repo, so you'll have to ping for
        re-review after every change, such as a rebase.
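
For the CI cases, the choice of group in `gpu.star` reduces to a lookup on the
bot's role and the OS under test. The sketch below restates that rule; the
group names come from the steps above, while the role and OS strings and the
`ci_group` function are hypothetical labels for this example. Trybot groups are
left out because they depend on the specific bot.

```python
# Group names are taken from the steps above. The role and OS strings used as
# keys are hypothetical labels for this sketch.
CI_BUILDER_GROUPS = {
    "Windows": "luci-chromium-gpu-ci-win10-8",
    "Linux": "luci-chromium-gpu-ci-xenial-8",
    "Android": "luci-chromium-gpu-ci-xenial-8",
    "Mac": "builderfull_gpu_ci_bots",
}
CI_TESTER_GROUP = "luci-chromium-gpu-ci-xenial-2"


def ci_group(role, test_os):
    """Return the gpu.star group for a CI bot's VM.

    role: "builder", "builder/tester", or "tester".
    test_os: "Windows", "Linux", "Android", or "Mac".
    """
    if role == "tester":
        # CI testers for all OSes share one group of 2-core Linux GCEs.
        return CI_TESTER_GROUP
    if role in ("builder", "builder/tester"):
        return CI_BUILDER_GROUPS[test_os]
    raise ValueError(f"unknown role: {role}")


if __name__ == "__main__":
    print(ci_group("builder", "Android"))
```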

### How to add a new tester bot to the chromium.gpu.fyi waterfall

When deploying a new GPU configuration, it should be added to the
chromium.gpu.fyi waterfall first. The chromium.gpu waterfall should be reserved
for those GPUs which are tested on the commit queue. (Some of the bots violate
this rule, namely the Debug bots, though we should strive to eliminate these
differences.) Once the new configuration is ready to be fully deployed on
tryservers, bots can be added to the chromium.gpu waterfall, and the tryservers
changed to mirror them.

In order to add Release and Debug waterfall bots for a new configuration,
experience has shown that at least 4 physical machines are needed in the
Swarming pool. The reason is that the tests all run in parallel on the Swarming
cluster, so the load induced on the Swarming bots is higher than it would be
if the tests were run strictly serially.

With these prerequisites, these are the steps to add a new (swarmed) tester bot.
(Actually, a pair of bots: Release and Debug. If deploying just one or the
other, ignore the other configuration.) These instructions assume that you are
reusing one of the existing builders, like
[`GPU FYI Win Builder`][GPU FYI Win Builder].
Kai Ninomiya | a6429fb3 | 2018-03-30 01:30:56 | [diff] [blame] | 365 | |
1. Work with the Chrome Infrastructure Labs team to get the (minimum 4)
   physical machines added to the Swarming pool. Use
   [chromium-swarm.appspot.com] or `src/tools/luci-go/swarming bots`
   to determine the PCI IDs of the GPUs in the bots. (These instructions will
   need to be updated for Android bots which don't have PCI buses.)

1. Make sure to add these new machines to the chromium.tests.gpu Swarming
   pool by creating a CL against [`gpu.star`][gpu.star] in the
   [`infradata/config`][infradata/config] (Google internal) workspace.
   Set your Git `user.email` to an @google.com address if necessary. Here is
   one [example CL](https://chrome-internal-review.googlesource.com/913528)
   and a
   [second example](https://chrome-internal-review.googlesource.com/1111456).

1. Run [`main.star`][main.star] to regenerate
   `configs/chromium-swarm/bots.cfg`. Double-check your work there.

1. Allocate new virtual machines for the bots as described in [How to set up
   new virtual machine
   instances](#How-to-set-up-new-virtual-machine-instances).

1. Create a CL in the Chromium workspace which does the following. Here's an
   [example CL](https://chromium-review.googlesource.com/c/chromium/src/+/1752291).
    1. Adds the new machines to [`waterfalls.pyl`][waterfalls.pyl] directly or
       to [`mixins.pyl`][mixins.pyl], referencing the new mixin in
       [`waterfalls.pyl`][waterfalls.pyl].
        1. The swarming dimensions are crucial. These must match the GPU and
           OS type of the physical hardware in the Swarming pool. This is what
           causes the VMs to spawn their tests on the correct hardware. Make
           sure to use the chromium.tests.gpu pool, and that the new machines
           were specifically added to that pool.
        1. Make triply sure that there are no collisions between the new
           hardware you're adding and hardware already in the Swarming pool.
           For example, it used to be the case that all of the Windows NVIDIA
           bots ran the same OS version. Later, the Windows 8 flavor bots were
           added. In order to avoid accidentally running tests on Windows 8
           when Windows 7 was intended, the OS in the swarming dimensions of
           the Win7 bots had to be changed from `win` to
           `Windows-2008ServerR2-SP1` (the Win7-like flavor running in our
           data center). Similarly, the Win8 bots had to have a very precise
           OS description (`Windows-2012ServerR2-SP0`).
        1. If you're deploying a new bot that's similar to another existing
           configuration, please search around in
           [`test_suite_exceptions.pyl`][test_suite_exceptions.pyl] for
           references to the other bot's name and see if your new bot needs
           to be added to any exclusion lists. For example, some of the tests
           don't run on certain Win bots because of missing OpenGL extensions.
    1. Run [`generate_buildbot_json.py`][generate_buildbot_json.py] to
       regenerate `src/testing/buildbot/chromium.gpu.fyi.json`.
    1. Updates [`ci.star`][ci.star] and its related generated files
       [`cr-buildbucket.cfg`][cr-buildbucket.cfg],
       [`luci-scheduler.cfg`][luci-scheduler.cfg], and
       [`luci-milo.cfg`][luci-milo.cfg]:
        * Use the appropriate definition for the type of the bot being added,
          for example, `ci.gpu_fyi_thin_tester()` should be used for all CI
          tester bots on the GPU FYI waterfall.
        * Make sure to set the `triggered_by` property to the builder which
          triggers the testers (like `'GPU FYI Win Builder'`).
        * Include a `ci.console_view_entry` for the builder's
          `console_view_entry` argument. Look at the short names and
          categories to try and come up with a reasonable organization.
    1. Run `main.star` in [`src/infra/config`][src/infra/config] to update the
       generated files. Double-check your work there.
    1. If you were adding a new builder, you would need to also add the new
       machine to [`src/tools/mb/mb_config.pyl`][mb_config.pyl].

1. After the Chromium-side CL lands it will take some time for all of
   the configuration changes to be picked up by the system. The bot
   will probably be in a red or purple state, claiming that it can't
   find its configuration. (It might also be in an "empty" state, not
   running any jobs at all.)

1. *After* the Chromium-side CL lands and the bot is on the console, create a CL
   in the [`tools/build`][tools/build] workspace which does the
   following. Here's an [example
   CL](https://chromium-review.googlesource.com/1041145).
    1. Adds the new bot to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] in
       `recipes/recipe_modules/chromium_tests/builders/`. Make sure to set the
       `serialize_tests` property to `True`. This is specified for waterfall
       bots, but not trybots, and helps avoid overloading the physical
       hardware. Double-check the `BUILD_CONFIG` and `parent_buildername`
       properties for each. They must match the Release/Debug flavor of the
       builder, like `GPU FYI Win x64 Builder` vs.
       `GPU FYI Win x64 Builder (dbg)`.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
       the newly-deployed waterfall bot, so it knows which JSON file to load
       out of `src/testing/buildbot` and which entry to look at.
    1. Sometimes it is necessary to retrain recipe expectations
       (`recipes/recipes.py test train`). This is usually needed only
       if the bot adds untested code flow in a recipe, but it's something
       to watch out for if your CL fails presubmit for some reason.

1. Note that it is crucial that the bot be deployed before hooking it up in the
   tools/build workspace. In the new LUCI world, if the parent builder can't
   find its child testers to trigger, that's a hard error on the parent. This
   will cause the builders to fail. You can and should prepare the tools/build
   CL in advance, but make sure it doesn't land until the bot's on the console.

1. If the number of physical machines for the new bot permits, you should also
   add a manually-triggered trybot at the same time that the CI bot is added.
   This is described in [How to add a new manually-triggered trybot].

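The collision warning in the steps above can be illustrated with a small model
of dimension matching. This is only a sketch: it assumes that each Swarming bot
advertises a set of values per dimension and that a task is eligible for a bot
when every requested value is in that bot's set. The bot names, the GPU ID and
the exact value lists are hypothetical; only the two precise OS strings and the
pool name come from the text above.

```python
# Hypothetical bots; each advertises a set of values per dimension.
BOTS = {
    "win7-nvidia-bot": {
        "gpu": {"10de", "10de:104a"},  # illustrative PCI vendor and device IDs
        "os": {"win", "Windows-2008ServerR2-SP1"},
        "pool": {"chromium.tests.gpu"},
    },
    "win8-nvidia-bot": {
        "gpu": {"10de", "10de:104a"},
        "os": {"win", "Windows-2012ServerR2-SP0"},
        "pool": {"chromium.tests.gpu"},
    },
}


def matching_bots(requested, bots=BOTS):
    """Return the sorted names of bots that satisfy every requested dimension."""
    return sorted(
        name
        for name, advertised in bots.items()
        if all(value in advertised.get(key, set())
               for key, value in requested.items())
    )


# A loose OS dimension is satisfied by both flavors of bot (a collision).
loose = {"gpu": "10de:104a", "os": "win", "pool": "chromium.tests.gpu"}
# A precise OS dimension pins the tests to the intended hardware.
precise = dict(loose, os="Windows-2008ServerR2-SP1")

print(matching_bots(loose))
print(matching_bots(precise))
```

Running a check like this against the dimensions in a new mixin, before
sending the CL, is one way to spot a request that is looser than intended.
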
While the above instructions assume that an existing parent builder will be
used, a new one can be set up by performing a modified version of the steps:

1. Make a [`tools/build`][tools/build] CL that adds the config for *only* the
   new builder and land it.
1. Make and land a Chromium CL that makes the above changes in addition to the
   following:
    1. Add the new builder to the necessary `//infra/config` files in the same
       way as the tester.
    1. Add the new builder to [`src/tools/mb/mb_config.pyl`][mb_config.pyl].
1. Make a [`tools/build`][tools/build] CL that adds the config for *only* the
   new tester and land it.

Attempting to set up the builder/tester pair without first landing the
[`tools/build`][tools/build] CL for the new builder will result in things
breaking as seen in [this bug][misconfigured builder bug].

[How to add a new manually-triggered trybot]: https://chromium.googlesource.com/chromium/src/+/main/docs/gpu/gpu_testing_bot_details.md#How-to-add-a-new-manually_triggered-trybot

[ci.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/subprojects/ci.star
[chromium.gpu.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/consoles/chromium.gpu.star
[chromium.gpu.fyi.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/consoles/chromium.gpu.fyi.star
[cr-buildbucket.cfg]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/generated/cr-buildbucket.cfg
[luci-scheduler.cfg]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/generated/luci-scheduler.cfg
[luci-milo.cfg]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/generated/luci-milo.cfg
[GPU FYI Win Builder]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/GPU%20FYI%20Win%20Builder
[misconfigured builder bug]: https://bugs.chromium.org/p/chromium/issues/detail?id=1163657

### How to start running tests on a new GPU type on an existing try bot

Let's say that you want to cause the `win10_chromium_x64_rel_ng` try bot to run
tests on CoolNewGPUType in addition to the types it currently runs (as of this
writing only NVIDIA). To do this:

1. Use the available Swarming pool utilization tools to confirm that there is
   enough hardware capacity.
1. Deploy Release and Debug testers on the `chromium.gpu` waterfall, following
   the instructions for the `chromium.gpu.fyi` waterfall above. Make sure
   the flakiness on the new bots is comparable to existing `chromium.gpu` bots
   before proceeding.
1. Create a CL in the [`tools/build`][tools/build] workspace, adding the new
   Release tester to `win10_chromium_x64_rel_ng`'s `bot_ids` list
   in `recipes/recipe_modules/chromium_tests/trybots.py`. Rerun
   `recipes/recipes.py test train`.
1. Once the above CL lands, the commit queue will **immediately** start
   running tests on the CoolNewGPUType configuration. Be vigilant and make
   sure that tryjobs are green. If they are red for any reason, revert the CL
   and figure out offline what went wrong.

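As a rough illustration of the `bot_ids` step above, the sketch below models a
trybot entry as a list of mirrored builder/tester pairs and appends one more.
The dictionary layout, the key names other than `bot_ids`, and all builder and
tester names are assumptions made for this example; consult the actual
`trybots.py` for the real structure.

```python
# Hypothetical excerpt modelled on a trybot entry. Only the trybot name and
# the `bot_ids` key come from the text; everything else is illustrative.
TRYBOTS = {
    "win10_chromium_x64_rel_ng": {
        "bot_ids": [
            {"buildername": "GPU Win x64 Builder",
             "tester": "Win10 x64 Release (NVIDIA)"},
        ],
    },
}


def add_mirror(trybots, trybot, buildername, tester):
    """Append a builder/tester pair to a trybot's mirror list, once."""
    entry = {"buildername": buildername, "tester": tester}
    bot_ids = trybots[trybot]["bot_ids"]
    if entry in bot_ids:
        raise ValueError(f"{tester!r} is already mirrored by {trybot!r}")
    bot_ids.append(entry)
    return bot_ids


add_mirror(TRYBOTS, "win10_chromium_x64_rel_ng",
           "GPU Win x64 Builder", "Win10 x64 Release (CoolNewGPUType)")
for mirrored in TRYBOTS["win10_chromium_x64_rel_ng"]["bot_ids"]:
    print(mirrored["tester"])
```
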
### How to add a new manually-triggered trybot

Manually-triggered trybots are needed for investigating failures on a GPU type
which doesn't have a corresponding CQ trybot (due to lack of GPU resources).
Even for GPU types that have CQ trybots, it is convenient to have
manually-triggered trybots as well, since the CQ trybot often runs on more than
one GPU type, and some test suites which run on the CI bot can be disabled on
the CQ trybot (when the CQ bot mirrors a
[fake bot](https://chromium.googlesource.com/chromium/src/+/main/docs/gpu/gpu_testing_bot_details.md#how-to-add-a-new-try-bot-that-runs-a-subset-of-tests-or-extra-tests)).
Thus, all CI bots in `chromium.gpu` and `chromium.gpu.fyi` have corresponding
manually-triggered trybots, except a few which don't have enough hardware
to support it. A manually-triggered trybot should be added at the same time
a CI bot is added.

Here are the steps to set up a new trybot which runs tests just on one
particular GPU type. Let's consider that we are adding a manually-triggered
trybot for the Win7 NVIDIA GPUs in Release mode. We will call the new bot
`gpu-fyi-try-win7-nvidia-rel-64`.

1. If a manually-triggered trybot already exists which runs tests on
   the same group of machines (i.e. same GPU, OS and driver), the new trybot
   will have to share the VMs with it. Otherwise, create a new pool of VMs for
   the new hardware and allocate the VMs as described in
   [How to set up new virtual machine instances](#How-to-set-up-new-virtual-machine-instances),
   following the "Manually-triggered GPU trybots" instructions.

1. Create a CL in the Chromium workspace which does the following. Here's a
   [reference CL](https://chromium-review.googlesource.com/c/chromium/src/+/2191276)
   exemplifying the new "GCE pool per GPU hardware pool" way.
    1. Updates [`gpu.try.star`][gpu.try.star] and its related generated file
       [`cr-buildbucket.cfg`][cr-buildbucket.cfg]:
        * Add the new trybot with the right `builder` define and VMs pool.
          For `gpu-fyi-try-win7-nvidia-rel-64` this would be
          `gpu_win_builder()` and `luci.chromium.gpu.win7.nvidia.try`.
    1. Run `main.star` in [`src/infra/config`][src/infra/config] to update the
       generated files. Double-check your work there.
    1. Adds the new trybot to [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
       and [`src/tools/mb/mb_config_buckets.pyl`][mb_config_buckets.pyl].
       Use the same mixin as the builder for the CI bot this trybot
       mirrors. For `gpu-fyi-try-win7-nvidia-rel-64` the builder is
       `GPU FYI Win x64 Builder` and thus the mixin is
       `gpu_fyi_tests_release_trybot`.
    1. Get this CL reviewed and landed.

1. Create a CL in the [`tools/build`][tools/build] workspace which does the
   following. Here's an [example
   CL](https://chromium-review.googlesource.com/c/chromium/tools/build/+/1979113).

    1. Adds the new trybot to a "Manually-triggered GPU trybots" section in
       `recipes/recipe_modules/chromium_tests/tests/trybots.py`. Create this
       section after the "Optional GPU bots" section for the appropriate
       tryserver (`tryserver.chromium.win`, `tryserver.chromium.mac`,
       `tryserver.chromium.linux`, `tryserver.chromium.android`). Have the bot
       mirror the appropriate waterfall bot; in this case, the buildername to
       mirror is `GPU FYI Win x64 Builder` and the tester is
       `Win7 FYI x64 Release (NVIDIA)`.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
       the newly-deployed trybot, so it knows which JSON file to load out of
       `src/testing/buildbot` and which entry to look at to understand which
       tests to run and on what physical hardware.
    1. It may be necessary to retrain recipe expectations for
       [`tools/build`][tools/build] workspace CLs
       (`recipes/recipes.py test train`). This shouldn't be necessary
       for just adding a manually triggered trybot, but it's something to
       watch out for if your CL fails presubmit for some reason.

At this point the new trybot should automatically show up in the
"Choose tryjobs" pop-up in the Gerrit UI, under the
`luci.chromium.try` heading, because it was deployed via LUCI. It
should be possible to send a CL to it.

(It should not be necessary to modify buildbucket.config as is
mentioned at the bottom of the "Choose tryjobs" pop-up. Contact the
chrome-infra team if this doesn't work as expected.)

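The `mb_config.pyl` step above amounts to giving the trybot the same build
configuration as the CI builder it mirrors. The sketch below shows that lookup
on a simplified stand-in for the file. The nested-dictionary layout and the
choice of `tryserver.chromium.win` as the trybot's group are assumptions for
illustration; the builder, trybot and mixin names are the ones used in this
section.

```python
# Hypothetical, simplified stand-in for mb_config.pyl:
# builder group -> builder name -> config name. The real layout may differ.
MB_CONFIG = {
    "chromium.gpu.fyi": {
        "GPU FYI Win x64 Builder": "gpu_fyi_tests_release_trybot",
    },
    "tryserver.chromium.win": {},
}


def mirror_ci_config(mb_config, ci_group, ci_builder, try_group, trybot):
    """Give `trybot` the same config name as the CI builder it mirrors."""
    config = mb_config[ci_group][ci_builder]
    mb_config.setdefault(try_group, {})[trybot] = config
    return config


mirror_ci_config(MB_CONFIG, "chromium.gpu.fyi", "GPU FYI Win x64 Builder",
                 "tryserver.chromium.win", "gpu-fyi-try-win7-nvidia-rel-64")
print(MB_CONFIG["tryserver.chromium.win"])
```
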
[gpu.try.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/subprojects/gpu.try.star
[luci.chromium.try.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/consoles/luci.chromium.try.star
[tryserver.chromium.win.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/consoles/tryserver.chromium.win.star


### How to add a new try bot that runs a subset of tests or extra tests

Several projects (ANGLE, Dawn) run custom tests using the Chromium recipes. They
use try bot configs that run subsets of Chromium or additional slower tests
that can't be run on the main CQ.

These try bots are a little different because they mirror waterfall bots that
don't actually exist. The waterfall bots' specifications exist only to tell
these try bots which tests to run.

Let's say that you intend to add a new such custom try bot on Windows. Call it
`win-myproject-rel` for example. You will need to add a "fake" mirror bot for
each GPU family on which you want to run the tests. For a GPU type of
"CoolNewGPUType" in this example you could add a "fake" bot named "MyProject GPU
Win10 Release (CoolNewGPUType)".

1. Allocate new virtual machines for the bots as described in
   [How to set up new virtual machine instances](#How-to-set-up-new-virtual-machine-instances).
1. Use the available Swarming pool utilization tools to confirm that there is
   enough hardware capacity.
1. Create a CL in the Chromium workspace that does the following. Here's an
   outdated [example CL](https://crrev.com/c/1554296).
    1. Add your new bot (for example, "MyProject GPU Win10 Release
       (CoolNewGPUType)") to the chromium.gpu.fyi waterfall in
       [`waterfalls.pyl`][waterfalls.pyl].
    1. Add your new bot to the list returned by
       `get_bots_that_do_not_actually_exist` in
       [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py].
    1. Re-run
       [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py]
       to regenerate the JSON files.
    1. Update [`scheduler-noop-jobs.star`][scheduler-noop-jobs.star] to
       include "MyProject GPU Win10 Release (CoolNewGPUType)".
    1. Update [`try.star`][try.star] and desired consoles to include
       `win-myproject-rel`.
    1. Run `main.star` in [`src/infra/config`][src/infra/config] to update the
       generated files: [`luci-milo.cfg`][luci-milo.cfg],
       [`luci-scheduler.cfg`][luci-scheduler.cfg],
       [`cr-buildbucket.cfg`][cr-buildbucket.cfg]. Double-check your work
       there.
    1. Update [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
       to include `win-myproject-rel`.
1. *After* the Chromium-side CL lands and the bot is on the console, create a CL
   in the [`tools/build`][tools/build] workspace which does the
   following. Here's an [example CL](https://crrev.com/c/1554272).
    1. Adds "MyProject GPU Win10 Release
       (CoolNewGPUType)" to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] in
       `recipes/recipe_modules/chromium_tests/builders/`. You can copy a similar
       step.
    1. Adds `win-myproject-rel` to [`trybots.py`][trybots.py] in the same folder.
       This is where you associate "MyProject GPU Win10 Release
       (CoolNewGPUType)" with `win-myproject-rel`. See the sample CL for an example.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
       the newly-deployed waterfall bot, so it knows which JSON file to load
       out of `src/testing/buildbot` and which entry to look at.
1. After your CLs land you should be able to find and run `win-myproject-rel` on CLs
   using Choose Trybots in Gerrit.

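The reason a "fake" bot has to be registered explicitly can be sketched as a
consistency check. The code below is a simplified model, not the actual logic
of `generate_buildbot_json.py`: it assumes the generator expects every bot in
`waterfalls.pyl` to be either a real LUCI builder or declared as not actually
existing. The function name `unknown_bots` and the real tester name are
hypothetical.

```python
def unknown_bots(waterfall_bots, luci_builders, fake_bots):
    """Return waterfall bots that are neither LUCI builders nor declared fake."""
    return sorted(set(waterfall_bots) - set(luci_builders) - set(fake_bots))


waterfall_bots = [
    "Win10 FYI x64 Release (NVIDIA)",  # hypothetical real tester
    "MyProject GPU Win10 Release (CoolNewGPUType)",
]
luci_builders = ["Win10 FYI x64 Release (NVIDIA)"]

# Before the fake bot is registered, the check flags it.
print(unknown_bots(waterfall_bots, luci_builders, fake_bots=[]))

# After it is registered, nothing is flagged.
print(unknown_bots(
    waterfall_bots,
    luci_builders,
    fake_bots=["MyProject GPU Win10 Release (CoolNewGPUType)"],
))
```
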
[scheduler-noop-jobs.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/generators/scheduler-noop-jobs.star
[try.star]: https://chromium.googlesource.com/chromium/src/+/main/infra/config/subprojects/try.star


### How to test and deploy a driver and/or OS update

Let's say that you want to roll out an update to the graphics drivers or the OS
on one of the configurations like the Linux NVIDIA bots. In order to verify
that the new driver or OS won't destabilize Chromium's commit queue,
it's necessary to run the new driver or OS on one of the waterfalls for a day
or two to make sure the tests are reliably green before rolling out the driver
or OS update. To do this:

1. Make sure that all of the current Swarming jobs for this OS and GPU
   configuration are targeted at the "stable" version of the driver and the OS
   in [`waterfalls.pyl`][waterfalls.pyl] and [`mixins.pyl`][mixins.pyl].
1. File a `Build Infrastructure` bug, component `Infra>Labs`, to have ~4 of
   the physical machines already in the Swarming pool upgraded to the new
   version of the driver or the OS.
1. If an "experimental" version of this bot doesn't yet exist, follow the
   instructions above for [How to add a new tester bot to the chromium.gpu.fyi
   waterfall](#How-to-add-a-new-tester-bot-to-the-chromium_gpu_fyi-waterfall)
   to deploy one.
1. Have this experimental bot target the new version of the driver or the OS
   in [`waterfalls.pyl`][waterfalls.pyl] and [`mixins.pyl`][mixins.pyl].
   [Sample CL][sample driver cl].
1. Hopefully, the new machine will pass the pixel tests. If it doesn't, then
   it'll be necessary to follow the instructions on
   [updating Gold baselines (step #4)][updating gold baselines].
1. Watch the new machine for a day or two to make sure it's stable.
1. When it is, add the experimental driver/OS to the `_stable` mixin using the
   swarming OR operator `|`. For example:

   ```
   'win10_intel_hd_630_stable': {
     'swarming': {
       'dimensions': {
         'gpu': '8086:5912-26.20.100.7870|8086:5912-26.20.100.8141',
         'os': 'Windows-10',
         'pool': 'chromium.tests.gpu',
       },
     },
   }
   ```

   This will cause tests triggered using the `_stable` mixin to run on either
   the old stable dimension or the experimental/new stable dimension.

   **NOTE** There is a hard cap of 8 combinations in swarming, so you can only
   use the OR operator in up to 3 dimensions if each dimension only has two
   options. More than two options per dimension is allowed as long as the total
   number of combinations is 8 or less.
1. After it lands, ask the Chrome Infrastructure Labs team to roll out the
   driver update across all of the similarly configured bots in the swarming
   pool.
1. If necessary, update pixel test expectations and remove the suppressions
   added above.
1. Remove the old driver or OS version from the `_stable` mixin, leaving just
   the new stable version.

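The 8-combination cap from the note in the steps above is easy to check
mechanically: the number of combinations is the product of the number of
`|`-separated options in each dimension. The helper below is a minimal sketch
of that arithmetic; the function name and the second, over-the-cap mixin are
made up for illustration.

```python
from math import prod

MAX_COMBINATIONS = 8  # hard cap quoted in the note above


def count_combinations(dimensions):
    """Multiply the number of '|'-separated options across all dimensions."""
    return prod(len(value.split("|")) for value in dimensions.values())


# Dimensions from the `win10_intel_hd_630_stable` example: 2 x 1 x 1.
stable = {
    "gpu": "8086:5912-26.20.100.7870|8086:5912-26.20.100.8141",
    "os": "Windows-10",
    "pool": "chromium.tests.gpu",
}
# Hypothetical mixin with placeholder values: 3 x 2 x 2 options.
too_many = {
    "gpu": "a|b|c",
    "os": "x|y",
    "pool": "p|q",
}

for name, dims in (("stable", stable), ("too_many", too_many)):
    n = count_combinations(dims)
    print(name, n, "ok" if n <= MAX_COMBINATIONS else "exceeds cap")
```
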
Kenneth Russell | 9618adde | 2018-05-03 03:16:05 | [diff] [blame] | 714 | Note that we leave the experimental bot in place. We could reclaim it, but it |
| 715 | seems worthwhile to continuously test the "next" version of graphics drivers as |
| 716 | well as the current stable ones. |
Kai Ninomiya | a6429fb3 | 2018-03-30 01:30:56 | [diff] [blame] | 717 | |
Brian Sheedy | 1cea4d4 | 2019-08-12 18:09:49 | [diff] [blame] | 718 | [sample driver cl]: https://chromium-review.googlesource.com/c/chromium/src/+/1726875 |
Brian Sheedy | 5a4c0a39 | 2021-09-22 21:28:35 | [diff] [blame] | 719 | [updating gold baselines]: http://go/gpu-pixel-wrangler-info#how-to-keep-the-bots-green |
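
The 8-combination cap on OR-ed dimensions mentioned in the driver-update steps
comes down to simple arithmetic: the number of combinations is the product of
the option counts across all dimensions. The following is a minimal Python
sketch of that check, assuming dimension values use `|` as the OR separator as
in the `_stable` mixin example. The names `count_combinations` and
`SWARMING_MAX_COMBINATIONS` are hypothetical and are not part of any Chromium
or Swarming tooling.

```python
from math import prod

# Hard cap described in the note on OR-ed dimensions (hypothetical constant name).
SWARMING_MAX_COMBINATIONS = 8


def count_combinations(dimensions):
    """Return how many concrete dimension sets an OR-ed mixin expands to.

    Each value may list several options separated by '|'; the total is the
    product of the option counts of every dimension.
    """
    return prod(len(value.split('|')) for value in dimensions.values())


# Dimensions copied from the 'win10_intel_hd_630_stable' example.
dimensions = {
    'gpu': '8086:5912-26.20.100.7870|8086:5912-26.20.100.8141',
    'os': 'Windows-10',
    'pool': 'chromium.tests.gpu',
}

total = count_combinations(dimensions)
print(total, total <= SWARMING_MAX_COMBINATIONS)  # prints "2 True"
```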

## Credentials for various servers

Working with the GPU bots requires credentials to various services: the isolate
server, the swarming server, and cloud storage.

### Isolate server credentials

To upload and download isolates you must first authenticate to the isolate
server. From a Chromium checkout, run:

* `./src/tools/luci-go/isolate login`

This will open a web browser to complete the authentication flow. An
@google.com email address is required in order to properly authenticate.

To test your authentication, find a hash for a recent isolate. Consult the
instructions on [Running Binaries from the Bots Locally] to find a random hash
from a target like `gl_tests`. Then run the following:

[Running Binaries from the Bots Locally]: https://www.chromium.org/developers/testing/gpu-testing#TOC-Running-Binaries-from-the-Bots-Locally

### Swarming server credentials

The swarming server uses the same `auth.py` script as the isolate server. You
will need to authenticate if you want to manually download the results of
previous swarming jobs, trigger your own jobs, or run `swarming.py reproduce`
to re-run a remote job on your local workstation. Follow the instructions
above, replacing the service with `https://chromium-swarm.appspot.com`.

### Cloud storage credentials

Authentication to Google Cloud Storage is needed for a couple of reasons:
uploading pixel test results to the cloud, and potentially uploading and
downloading builds as well, at least in Debug mode. Use the copy of gsutil in
`depot_tools/third_party/gsutil/gsutil`, and follow the [Google Cloud Storage
instructions] to authenticate. You must use your @google.com email address and
be a member of the Chrome GPU team in order to receive read-write access to the
appropriate cloud storage buckets. Roughly:

1. Run `gsutil config`
2. Copy and paste the URL into your browser
3. Log in with your @google.com account
4. Allow the app to access the information it requests
5. Copy and paste the resulting key back into your terminal
6. Press "enter" when prompted for a project-id (i.e., leave it empty)

At this point you should be able to write to the cloud storage bucket.

Navigate to
<https://console.developers.google.com/storage/chromium-gpu-archive> to view
the contents of the cloud storage bucket.

[Google Cloud Storage instructions]: https://developers.google.com/storage/docs/gsutil
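
After completing the `gsutil config` steps, one way to sanity-check the
credentials is to list the bucket from a script. The following is a minimal
Python sketch, not an official tool. It assumes the bucket is named
`chromium-gpu-archive` (inferred from the console URL in this section), and the
helper names `build_ls_command` and `can_list_bucket` are hypothetical. It only
verifies that `gsutil ls` succeeds, which indicates read access; it does not
test write access. To use the copy of gsutil recommended above, pass
`depot_tools/third_party/gsutil/gsutil` as the `gsutil` argument.

```python
import subprocess

# Assumption: bucket name inferred from the console URL in this section.
BUCKET = 'gs://chromium-gpu-archive'


def build_ls_command(gsutil='gsutil', bucket=BUCKET):
    """Return the argv that lists the top level of the bucket."""
    return [gsutil, 'ls', bucket]


def can_list_bucket(gsutil='gsutil', bucket=BUCKET):
    """Return True if `gsutil ls` exits successfully for the bucket.

    Returns False when the gsutil binary cannot be found or when the
    command exits with a non-zero status (for example, missing credentials).
    """
    try:
        result = subprocess.run(
            build_ls_command(gsutil, bucket),
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return False
    return result.returncode == 0
```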