# GPU Testing

This set of pages documents the setup and operation of the GPU bots and try
servers, which verify the correctness of Chrome's graphically accelerated
rendering pipeline.

[TOC]

## Overview

The GPU bots run a different set of tests than the majority of the Chromium
test machines. The GPU bots specifically focus on tests which exercise the
graphics processor, and whose results are likely to vary between graphics card
vendors.

Most of the tests on the GPU bots are run via the [Telemetry framework].
Telemetry was originally conceived as a performance testing framework, but has
proven valuable for correctness testing as well. Telemetry directs the browser
to perform various operations, like page navigation and test execution, from
external scripts written in Python. The GPU bots launch the full Chromium
browser via Telemetry for the majority of the tests. Using the full browser to
execute tests, rather than smaller test harnesses, has yielded several
advantages: testing what is shipped, improved reliability, and improved
performance.

[Telemetry framework]: https://github.com/catapult-project/catapult/tree/master/telemetry

A subset of the tests, called "pixel tests", grab screen snapshots of the web
page in order to validate Chromium's rendering architecture end-to-end. Where
necessary, GPU-specific results are maintained for these tests. Some of these
tests verify just a few pixels, using handwritten code, in order to use the
same validation for all brands of GPUs.

The GPU bots use the Chrome infrastructure team's [recipe framework], and
specifically the [`chromium`][recipes/chromium] and
[`chromium_trybot`][recipes/chromium_trybot] recipes, to describe what tests to
execute. Compared to the legacy master-side buildbot scripts, recipes make it
easy to add new steps to the bots, change the bots' configuration, and run the
tests locally in the same way that they are run on the bots. Additionally, the
`chromium` and `chromium_trybot` recipes make it possible to send try jobs which
add new steps to the bots. This single capability is a huge step forward from
the previous configuration where new steps were added blindly, and could cause
failures on the tryservers. For more details about the configuration of the
bots, see the [GPU bot details].

[recipe framework]: https://chromium.googlesource.com/external/github.com/luci/recipes-py/+/master/doc/user_guide.md
[recipes/chromium]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium.py
[recipes/chromium_trybot]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py
[GPU bot details]: gpu_testing_bot_details.md

The physical hardware for the GPU bots lives in the Swarming pool\*. The
Swarming infrastructure ([new docs][new-testing-infra], [older but currently
more complete docs][isolated-testing-infra]) provides many benefits:

* Increased parallelism for the tests; all steps for a given tryjob or
  waterfall build run in parallel.
* Simpler scaling: just add more hardware in order to get more capacity. No
  manual configuration or distribution of hardware needed.
* Easier to run certain tests only on certain operating systems or types of
  GPUs.
* Easier to add new operating systems or types of GPUs.
* Clearer description of the binary and data dependencies of the tests. If
  they run successfully locally, they'll run successfully on the bots.

(\* All but a few one-off GPU bots are in the swarming pool. The exceptions to
the rule are described in the [GPU bot details].)

The bots on the [chromium.gpu.fyi] waterfall are configured to always test
top-of-tree ANGLE. This setup is done with a few lines of code in the
[tools/build workspace]; search the code for "angle".

These aspects of the bots are described in more detail below, and in linked
pages. There is a [presentation][bots-presentation] which gives a brief
overview of this documentation and links back to various portions.

<!-- XXX: broken link -->
[new-testing-infra]: https://github.com/luci/luci-py/wiki
[isolated-testing-infra]: https://www.chromium.org/developers/testing/isolated-testing/infrastructure
[chromium.gpu]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console
[chromium.gpu.fyi]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console
[tools/build workspace]: https://code.google.com/p/chromium/codesearch#chromium/build/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
[bots-presentation]: https://docs.google.com/presentation/d/1BC6T7pndSqPFnituR7ceG7fMY7WaGqYHhx5i9ECa8EI/edit?usp=sharing

## Fleet Status

Please see the [GPU Pixel Wrangling instructions] for links to dashboards
showing the status of various bots in the GPU fleet.

[GPU Pixel Wrangling instructions]: pixel_wrangling.md#Fleet-Status

## Using the GPU Bots

Most Chromium developers interact with the GPU bots in two ways:

1. Observing the bots on the waterfalls.
2. Sending try jobs to them.

The GPU bots are grouped on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls. Their current status can be easily observed there.

To send try jobs, you must first upload your CL to the code review server. Then
either click the "CQ dry run" link, or run the following from the command line:

```sh
git cl try
```

Either way, your job is sent to the default set of try servers.

The GPU tests are part of the default set for Chromium CLs, and are run as part
of the following tryservers' jobs:

* [linux-rel], formerly on the `tryserver.chromium.linux` waterfall
* [mac-rel], formerly on the `tryserver.chromium.mac` waterfall
* [win7-rel], formerly on the `tryserver.chromium.win` waterfall

[linux-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-rel?limit=100
[mac-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac-rel?limit=100
[win7-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7-rel?limit=100

Scan down through the steps looking for the text "GPU"; that identifies those
tests run on the GPU bots. For each test the "trigger" step can be ignored; the
step further down for the test of the same name contains the results.

It's usually not necessary to explicitly send try jobs just to verify the GPU
tests. If you want to, you must invoke "git cl try" separately for each
tryserver you want to use, for example:

```sh
git cl try -b linux-rel
git cl try -b mac-rel
git cl try -b win7-rel
```

Alternatively, the Gerrit UI can be used to send a patch set to these try
servers.

Three optional tryservers are also available which run additional tests. As of
this writing, they run longer-running tests that can't be run against all
Chromium CLs due to lack of hardware capacity. They are included automatically
for code changes to certain sub-directories.

* [linux_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall
* [mac_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall
* [win_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall

[linux_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel
[mac_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_optional_gpu_tests_rel
[win_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_optional_gpu_tests_rel
[luci.chromium.try]: https://ci.chromium.org/p/chromium/g/luci.chromium.try/builders

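To request one of these optional tryservers explicitly, name it with `-b`, in
the same way as shown above. A minimal sketch (the builder names are the ones
listed above; pick the platform you care about):

```sh
git cl try -b win_optional_gpu_tests_rel
```
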
Tryservers for the [ANGLE project] are also present on the
[tryserver.chromium.angle] waterfall. These are invoked from the Gerrit user
interface. They are configured similarly to the tryservers for regular Chromium
patches, and run the same tests that are run on the [chromium.gpu.fyi]
waterfall, in the same way (e.g., against ToT ANGLE).

If you find it necessary to try patches against other sub-repositories than
Chromium (`src/`) and ANGLE (`src/third_party/angle/`), please
[file a bug](http://crbug.com/new) with component Internals\>GPU\>Testing.

[ANGLE project]: https://chromium.googlesource.com/angle/angle/+/master/README.md
[tryserver.chromium.angle]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
[file a bug]: http://crbug.com/new

## Running the GPU Tests Locally

All of the GPU tests running on the bots can be run locally from a Chromium
build. Many of the tests are simple executables:

* `angle_unittests`
* `gl_tests`
* `gl_unittests`
* `tab_capture_end2end_tests`

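These can be built and run like any other Chromium test binary. A minimal
sketch, assuming a GN build directory of `out/Release` (the directory name and
test filter are illustrative):

```sh
# Build one of the GTest-based targets and run a subset of its tests.
# These tests need a working GPU/display on the local machine.
autoninja -C out/Release gl_tests
out/Release/gl_tests --gtest_filter='GLTest.*'
```
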
Some run only on the chromium.gpu.fyi waterfall, either because there isn't
enough machine capacity at the moment, or because they're closed-source tests
which aren't allowed to run on the regular Chromium waterfalls:

* `angle_deqp_gles2_tests`
* `angle_deqp_gles3_tests`
* `angle_end2end_tests`
* `audio_unittests`

The remaining GPU tests are run via Telemetry. In order to run them, just
build the `chrome` target and then invoke
`src/content/test/gpu/run_gpu_integration_test.py` with the appropriate
argument. The tests this script can invoke are in
`src/content/test/gpu/gpu_tests/`. For example:

* `run_gpu_integration_test.py context_lost --browser=release`
* `run_gpu_integration_test.py pixel --browser=release`
* `run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=1.0.2`
* `run_gpu_integration_test.py maps --browser=release`
* `run_gpu_integration_test.py screenshot_sync --browser=release`
* `run_gpu_integration_test.py trace_test --browser=release`

If you're testing on Android and have built and deployed
`ChromePublic.apk` to the device, use `--browser=android-chromium` to
invoke it.

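A sketch of the Android flow, assuming an Android GN build in `out/Android`
(the output directory name is illustrative):

```sh
# Build and install ChromePublic.apk, then point the harness at it.
autoninja -C out/Android chrome_public_apk
out/Android/bin/chrome_public_apk install
python content/test/gpu/run_gpu_integration_test.py pixel --browser=android-chromium
```
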
Kai Ninomiyaa6429fb32018-03-30 01:30:56202**Note:** If you are on Linux and see this test harness exit immediately with
203`**Non zero exit code**`, it's probably because of some incompatible Python
204packages being installed. Please uninstall the `python-egenix-mxdatetime` and
Kenneth Russellfa3ffde2018-10-24 21:24:38205`python-logilab-common` packages in this case; see [Issue
206716241](http://crbug.com/716241). This should not be happening any more since
207the GPU tests were switched to use the infra team's `vpython` harness.
Kai Ninomiyaa6429fb32018-03-30 01:30:56208
Kenneth Russellfa3ffde2018-10-24 21:24:38209You can run a subset of tests with this harness:
Kai Ninomiyaa6429fb32018-03-30 01:30:56210
211* `run_gpu_integration_test.py webgl_conformance --browser=release
212 --test-filter=conformance_attribs`
213
Figuring out the exact command line that was used to invoke the test on the
bots can be a little tricky. The bots all run their tests via Swarming and
isolates, meaning that the invocation of a step like `[trigger]
webgl_conformance_tests on NVIDIA GPU...` will look like:

* `python -u
  'E:\b\build\slave\Win7_Release__NVIDIA_\build\src\tools\swarming_client\swarming.py'
  trigger --swarming https://chromium-swarm.appspot.com
  --isolate-server https://isolateserver.appspot.com
  --priority 25 --shards 1 --task-name 'webgl_conformance_tests on NVIDIA GPU...'`

You can figure out the additional command line arguments that were passed to
each test on the bots by examining the trigger step and searching for the
argument separator (<code> -- </code>). For a recent invocation of
`webgl_conformance_tests`, this looked like:

* `webgl_conformance --show-stdout '--browser=release' -v
  '--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc'
  '--isolated-script-test-output=${ISOLATED_OUTDIR}/output.json'`

You can leave off the `--isolated-script-test-output` argument, because that's
used only by wrapper scripts, so this would leave a full command line of:

* `run_gpu_integration_test.py
  webgl_conformance --show-stdout '--browser=release' -v
  '--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc'`

The Maps test requires you to authenticate to cloud storage in order to access
the Web Page Replay archive containing the test. See [Cloud Storage Credentials]
for documentation on setting this up.

[Cloud Storage Credentials]: gpu_testing_bot_details.md#Cloud-storage-credentials

### Running the pixel tests locally

The pixel tests run in a few different modes:

* The waterfall bots generate reference images into cloud storage, and pass
  the `--upload-refimg-to-cloud-storage` command line argument.
* The trybots use the reference images that were generated by the waterfall
  bots. They pass the `--download-refimg-from-cloud-storage` command line
  argument, as well as other needed ones like `--refimg-cloud-storage-bucket`
  and `--os-type`.
* When run locally, the pixel tests generate *reference* images into
  `src/content/test/data/gpu/gpu_reference/` the first time they are run. On
  subsequent runs, if tests fail, failure images are placed into
  `src/content/test/data/gpu/generated`.

It's possible to make your local pixel tests download the reference images from
cloud storage, if your workstation has the same OS and GPU type as one of the
bots on the waterfall, and you pass the `--download-refimg-from-cloud-storage`,
`--refimg-cloud-storage-bucket`, `--os-type` and `--build-revision` command line
arguments.

Example command line for running the pixel tests locally on a desktop
platform, where the Chrome build is in `out/Release`:

* `run_gpu_integration_test.py pixel --browser=release`

Running against a connected Android device where `ChromePublic.apk` has
already been deployed:

* `run_gpu_integration_test.py pixel --browser=android-chromium`

You can run a subset of the pixel tests via the `--test-filter` argument, which
takes a regex:

* `run_gpu_integration_test.py pixel --browser=release --test-filter=Pixel_WebGL`
* `run_gpu_integration_test.py pixel --browser=release --test-filter=\(Pixel_WebGL2\|Pixel_GpuRasterization_BlueBox\)`

A more complete example command line for Android:

* `run_gpu_integration_test.py pixel --show-stdout --browser=android-chromium
  -v --passthrough --extra-browser-args='--enable-logging=stderr
  --js-flags=--expose-gc' --refimg-cloud-storage-bucket
  chromium-gpu-archive/reference-images --os-type android
  --download-refimg-from-cloud-storage`

## Running Binaries from the Bots Locally

Any binary run remotely on a bot can also be run locally, assuming the local
machine loosely matches the architecture and OS of the bot.

The easiest way to do this is to find the ID of the swarming task and use
`swarming.py reproduce` to re-run it:

* `./src/tools/swarming_client/swarming.py reproduce -S https://chromium-swarm.appspot.com [task ID]`

The task ID can be found in the stdio for the "trigger" step for the test. For
example, look at a recent build from the [Mac Release (Intel)] bot, and
look at the `gl_unittests` step. You will see something like:

[Mac Release (Intel)]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Release%20%28Intel%29/

```
Triggered task: gl_unittests on Intel GPU on Mac/Mac-10.12.6/[TRUNCATED_ISOLATE_HASH]/Mac Release (Intel)/83664
To collect results, use:
  swarming.py collect -S https://chromium-swarm.appspot.com --json /var/folders/[PATH_TO_TEMP_FILE].json
Or visit:
  https://chromium-swarm.appspot.com/user/task/[TASK_ID]
```

There is a difference between the isolate's hash and Swarming's task ID. Make
sure you use the task ID and not the isolate's hash.

As of this writing, there seems to be a
[bug](https://github.com/luci/luci-py/issues/250)
when attempting to re-run the Telemetry based GPU tests in this way. For the
time being, this can be worked around by instead downloading the contents of
the isolate. To do so, look more deeply into the trigger step's log:

* <code>python -u
  /b/build/slave/Mac_10_10_Release__Intel_/build/src/tools/swarming_client/swarming.py
  trigger [...more args...] --tag data:[ISOLATE_HASH] [...more args...]
  [ISOLATE_HASH] -- **[...TEST_ARGS...]**</code>

As of this writing, the isolate hash appears twice in the command line. To
download the isolate's contents into directory `foo` (note, this is in the
"Help" section associated with the page for the isolate's task, but I'm not
sure whether that's accessible only to Google employees or all members of the
chromium.org organization):

* `python isolateserver.py download -I https://isolateserver.appspot.com
  --namespace default-gzip -s [ISOLATE_HASH] --target foo`

`isolateserver.py` will tell you the approximate command line to use. You
should concatenate the `TEST_ARGS` shown in bold above with
`isolateserver.py`'s recommendation. The `ISOLATED_OUTDIR` variable can be
safely replaced with `/tmp`.

Note that `isolateserver.py` downloads a large number of files (everything
needed to run the test) and may take a while. There is a way to use
`run_isolated.py` to achieve the same result, but as of this writing, there
were problems doing so, so this procedure is not documented at this time.

Before attempting to download an isolate, you must ensure you have permission
to access the isolate server. Full instructions can be [found
here][isolate-server-credentials]. For most cases, you can simply run:

* `./src/tools/swarming_client/auth.py login
  --service=https://isolateserver.appspot.com`

The above link requires that you log in with your @google.com credentials. It's
not known at the present time whether this works with @chromium.org accounts.
Email kbr@ if you try this and find it doesn't work.

[isolate-server-credentials]: gpu_testing_bot_details.md#Isolate-server-credentials

## Running Locally Built Binaries on the GPU Bots

See the [Swarming documentation] for instructions on how to upload your
binaries to the isolate server and trigger execution on Swarming.

Be sure to use the correct Swarming dimensions for your desired GPU, e.g.
"1002:6613" rather than "AMD Radeon R7 240 (1002:6613)", which is how it
appears on the Swarming task page. You can query the bots in the Chrome-GPU
pool to find the correct dimensions:

* `python tools\swarming_client\swarming.py bots -S chromium-swarm.appspot.com -d pool Chrome-GPU`

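Once your isolate is uploaded, a trigger invocation might look roughly like the
following sketch; the isolate hash, dimension values, and test arguments are
placeholders, and the full upload-and-trigger workflow is covered by the
[Swarming documentation]:

```sh
python tools/swarming_client/swarming.py trigger \
    --swarming https://chromium-swarm.appspot.com \
    --isolate-server https://isolateserver.appspot.com \
    --dimension pool Chrome-GPU \
    --dimension gpu 1002:6613 \
    --dimension os Windows-10 \
    [ISOLATE_HASH] -- [TEST_ARGUMENTS]
```
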
[Swarming documentation]: https://www.chromium.org/developers/testing/isolated-testing/for-swes#TOC-Run-a-test-built-locally-on-Swarming

## Moving Test Binaries from Machine to Machine

To create a zip archive of your personal Chromium build plus all of
the Telemetry-based GPU tests' dependencies, which you can then move
to another machine for testing:

1. Build Chrome (into `out/Release` in this example).
1. `python tools/mb/mb.py zip out/Release/ telemetry_gpu_integration_test out/telemetry_gpu_integration_test.zip`

Then copy `telemetry_gpu_integration_test.zip` to another machine. Unzip
it, and cd into the resulting directory. Invoke
`content/test/gpu/run_gpu_integration_test.py` as above.

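A sketch of the flow on the destination machine (the archive is assumed to
unpack into a single directory; adjust paths as needed):

```sh
unzip telemetry_gpu_integration_test.zip -d telemetry_gpu_integration_test
cd telemetry_gpu_integration_test
python content/test/gpu/run_gpu_integration_test.py pixel --browser=release
```
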
This workflow has been tested successfully on Windows with a
statically-linked Release build of Chrome.

Note: on one macOS machine, this command failed because of a broken
`strip-json-comments` symlink in
`src/third_party/catapult/common/node_runner/node_runner/node_modules/.bin`. Deleting
that symlink allowed it to proceed.

Note also: on the same macOS machine, with a component build, this
command failed to zip up a working Chromium binary. The browser failed
to start with the following error:

`[0626/180440.571670:FATAL:chrome_main_delegate.cc(1057)] Check failed: service_manifest_data_pack_.`

In a pinch, this command could be used to bundle up everything, but
the "out" directory could be deleted from the resulting zip archive,
and the Chromium binaries moved over to the target machine. Then the
command line arguments `--browser=exact --browser-executable=[path]`
can be used to launch that specific browser.

See the [user guide for mb](../../tools/mb/docs/user_guide.md#mb-zip), the
meta-build system, for more details.

Kai Ninomiyaa6429fb32018-03-30 01:30:56409## Adding New Tests to the GPU Bots
410
411The goal of the GPU bots is to avoid regressions in Chrome's rendering stack.
412To that end, let's add as many tests as possible that will help catch
413regressions in the product. If you see a crazy bug in Chrome's rendering which
414would be easy to catch with a pixel test running in Chrome and hard to catch in
415any of the other test harnesses, please, invest the time to add a test!
416
417There are a couple of different ways to add new tests to the bots:
418
4191. Adding a new test to one of the existing harnesses.
4202. Adding an entire new test step to the bots.
421
422### Adding a new test to one of the existing test harnesses
423
424Adding new tests to the GTest-based harnesses is straightforward and
425essentially requires no explanation.
426
427As of this writing it isn't as easy as desired to add a new test to one of the
428Telemetry based harnesses. See [Issue 352807](http://crbug.com/352807). Let's
429collectively work to address that issue. It would be great to reduce the number
430of steps on the GPU bots, or at least to avoid significantly increasing the
431number of steps on the bots. The WebGL conformance tests should probably remain
432a separate step, but some of the smaller Telemetry based tests
433(`context_lost_tests`, `memory_test`, etc.) should probably be combined into a
434single step.
435
436If you are adding a new test to one of the existing tests (e.g., `pixel_test`),
437all you need to do is make sure that your new test runs correctly via isolates.
438See the documentation from the GPU bot details on [adding new isolated
Daniel Bratellf73f0df2018-09-24 13:52:49439tests][new-isolates] for the gn args and authentication needed to upload
Kai Ninomiyaa6429fb32018-03-30 01:30:56440isolates to the isolate server. Most likely the new test will be Telemetry
441based, and included in the `telemetry_gpu_test_run` isolate. You can then
442invoke it via:
443
444* `./src/tools/swarming_client/run_isolated.py -s [HASH]
445 -I https://isolateserver.appspot.com -- [TEST_NAME] [TEST_ARGUMENTS]`
446
447[new-isolates]: gpu_testing_bot_details.md#Adding-a-new-isolated-test-to-the-bots
448
### Adding new steps to the GPU Bots

The tests that are run by the GPU bots are described by a couple of JSON files
in the Chromium workspace:

* [`chromium.gpu.json`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json)
* [`chromium.gpu.fyi.json`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json)

These files are autogenerated by the following script:

* [`generate_buildbot_json.py`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/generate_buildbot_json.py)

This script is documented in
[`testing/buildbot/README.md`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/README.md). The
JSON files are parsed by the chromium and chromium_trybot recipes, and describe
two basic types of tests:

* GTests: those which use the Googletest and Chromium's `base/test/launcher/`
  frameworks.
* Isolated scripts: tests whose initial entry point is a Python script which
  follows a simple convention of command line argument parsing.

The majority of the GPU tests, however, are:

* Telemetry based tests: an isolated script test which is built on the
  Telemetry framework and which launches the entire browser.

A prerequisite of adding a new test to the bots is that that test [run via
isolates][new-isolates]. Once that is done, modify `test_suites.pyl` to add the
test to the appropriate set of bots. Be careful when adding large new test steps
to all of the bots, because the GPU bots are a limited resource and do not
currently have the capacity to absorb large new test suites. It is safer to get
new tests running on the chromium.gpu.fyi waterfall first, and expand from there
to the chromium.gpu waterfall (which will also make them run against every
Chromium CL by virtue of the `linux-rel`, `mac-rel`, `win7-rel` and
`android-marshmallow-arm64-rel` tryservers' mirroring of the bots on this
waterfall – so be careful!).

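After editing `test_suites.pyl` (or the other `.pyl` files in
`testing/buildbot/`), regenerate the JSON files and include them in your CL. A
minimal sketch of that workflow, assuming the script's default behavior of
rewriting all of the generated JSON files:

```sh
cd testing/buildbot
python generate_buildbot_json.py
git diff --stat   # the regenerated chromium.gpu*.json files should show up here
```
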
Tryjobs which add new test steps to the `chromium.gpu.json` file will run those
new steps during the tryjob, which helps ensure that the new test won't break
once it starts running on the waterfall.

Tryjobs which modify `chromium.gpu.fyi.json` can be sent to the
`win_optional_gpu_tests_rel`, `mac_optional_gpu_tests_rel` and
`linux_optional_gpu_tests_rel` tryservers to help ensure that they won't
break the FYI bots.

## Debugging Pixel Test Failures on the GPU Bots

If pixel tests fail on the bots, the stdout will contain text like:

`See http://chromium-browser-gpu-tests.commondatastorage.googleapis.com/view_test_results.html?[HASH]`

This link contains all of the failing tests' generated and reference
images, and is useful for figuring out exactly what went wrong. [Issue
898649](http://crbug.com/898649) tracks improving this user interface,
so that the failures can be surfaced directly in the build logs rather
than having to dig through stdout.

## Updating and Adding New Pixel Tests to the GPU Bots

Adding new pixel tests which require reference images is a slightly more
complex process than adding other kinds of tests which can validate their own
correctness. There are a few reasons for this.

* Reference image based pixel tests require different golden images for
  different combinations of operating system, GPU, driver version, OS
  version, and occasionally other variables.
* The reference images must be generated by the main waterfall. The try
  servers are not allowed to produce new reference images, only consume them.
  The reason for this is that a patch sent to the try servers might cause an
  incorrect reference image to be generated. For this reason, the main
  waterfall bots upload reference images to cloud storage, and the try
  servers download them and verify their results against them.
* The try servers will fail if they run a pixel test requiring a reference
  image that doesn't exist in cloud storage. This is deliberate, but needs
  more thought; see [Issue 349262](http://crbug.com/349262).

If a reference image based pixel test's result is going to change because of a
change in a third party repository (e.g. in ANGLE), updating the reference
images is a slightly tricky process. Here's how to do it:

* Mark the pixel test as failing **without a platform condition** in the
  [pixel test]'s [test expectations]
* Commit the change to the third party repository, etc., which will change the
  test's results
* Note that without the failure expectation, this commit would turn some bots
  red, e.g. an ANGLE change will turn the chromium.gpu.fyi bots red
* Wait for the third party repository to roll into Chromium
* Commit a change incrementing the revision number associated with the test
  in the [test pages]
* Commit a second change removing the failure expectation, once all of the
  bots on the main waterfall have generated new reference images. This change
  should go through the commit queue cleanly.

When adding a brand new pixel test that uses a reference image, the steps are
similar, but simpler:

* In the same commit which introduces the new test, mark the pixel test as
  failing **without a platform condition** in the [pixel test]'s [test
  expectations]
* Wait for the reference images to be produced by all of the GPU bots on the
  waterfalls (see the `chromium-gpu-archive/reference-images`
  [cloud storage bucket])
* Commit a change un-marking the test as failing
553
Xianzhu Wang519b3a4a2018-10-18 04:41:05554When making a Chromium-side (including Blink which is now in the same Chromium
555repository) change which changes the pixel tests' results:
Kai Ninomiyaa6429fb32018-03-30 01:30:56556
Xianzhu Wang519b3a4a2018-10-18 04:41:05557* In your CL, both mark the pixel test as failing **without platform
558 condition** in the [pixel test]'s [test expectations] and increment the
559 test's version number associated with the test in the [test pages]
Kai Ninomiyaa6429fb32018-03-30 01:30:56560* After your CL lands, land another CL removing the failure expectations. If
561 this second CL goes through the commit queue cleanly, you know reference
562 images were generated properly.
563
564In general, when adding a new pixel test, it's better to spot check a few
565pixels in the rendered image rather than using a reference image per platform.
566The [GPU rasterization test] is a good example of a recently added test which
567performs such spot checks.
568
Xianzhu Wang519b3a4a2018-10-18 04:41:05569[pixel test]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_test_pages.py
Rakib M. Hasan2046a052019-05-13 23:33:15570[test expectations]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/test_expectations/pixel_expectations.txt
Xianzhu Wang519b3a4a2018-10-18 04:41:05571[test pages]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/pixel_test_pages.py
Kai Ninomiyaa6429fb32018-03-30 01:30:56572[cloud storage bucket]: https://console.developers.google.com/storage/chromium-gpu-archive/reference-images
573<!-- XXX: old link -->
574[GPU rasterization test]: http://src.chromium.org/viewvc/chrome/trunk/src/content/test/gpu/gpu_tests/gpu_rasterization.py

## Stamping out Flakiness

It's critically important to aggressively investigate and eliminate the root
cause of any flakiness seen on the GPU bots. The bots have been known to run
reliably for days at a time, and any flaky failures that are tolerated on the
bots translate directly into instability of the browser experienced by
customers. Critical bugs in subsystems like WebGL, affecting high-profile
products like Google Maps, have escaped notice in the past because the bots
were unreliable. After much re-work, the GPU bots are now among the most
reliable automated test machines in the Chromium project. Let's keep them that
way.

Flakiness affecting the GPU tests can come in from highly unexpected sources.
Here are some examples:

* Intermittent pixel_test failures on Linux where the captured pixels were
  black, caused by the Display Power Management System (DPMS) kicking in.
  Disabled the X server's built-in screen saver on the GPU bots in response.
* GNOME dbus-related deadlocks causing intermittent timeouts ([Issue
  309093](http://crbug.com/309093) and related bugs).
* Windows Audio system changes causing intermittent assertion failures in the
  browser ([Issue 310838](http://crbug.com/310838)).
* Enabling assertion failures in the C++ standard library on Linux causing
  random assertion failures ([Issue 328249](http://crbug.com/328249)).
* V8 bugs causing random crashes of the Maps pixel test (V8 issues
  [3022](https://code.google.com/p/v8/issues/detail?id=3022),
  [3174](https://code.google.com/p/v8/issues/detail?id=3174)).
* TLS changes causing random browser process crashes ([Issue
  264406](http://crbug.com/264406)).
* Isolated test execution flakiness caused by failures to reliably clean up
  temporary directories ([Issue 340415](http://crbug.com/340415)).
* The Telemetry-based WebGL conformance suite caught a bug in the memory
  allocator on Android not caught by any other bot ([Issue
  347919](http://crbug.com/347919)).
* context_lost test failures caused by the compositor's retry logic ([Issue
  356453](http://crbug.com/356453)).
* Multiple bugs in Chromium's support for lost contexts causing flakiness of
  the context_lost tests ([Issue 365904](http://crbug.com/365904)).
* Maps test timeouts caused by Content Security Policy changes in Blink
  ([Issue 395914](http://crbug.com/395914)).
* Weak pointer assertion failures in various webgl\_conformance\_tests caused
  by changes to the media pipeline ([Issue 399417](http://crbug.com/399417)).
* A change to a default WebSocket timeout in Telemetry causing intermittent
  failures to run all WebGL conformance tests on the Mac bots ([Issue
  403981](http://crbug.com/403981)).
* Chrome leaking suspended sub-processes on Windows, apparently a preexisting
  race condition that suddenly showed up ([Issue
  424024](http://crbug.com/424024)).
* Changes to Chrome's cross-context synchronization primitives causing the
  wrong tiles to be rendered ([Issue 584381](http://crbug.com/584381)).
* A bug in V8's handling of array literals causing flaky failures of
  texture-related WebGL 2.0 tests ([Issue 606021](http://crbug.com/606021)).
* Assertion failures in sync point management related to lost contexts that
  exposed a real correctness bug ([Issue 606112](http://crbug.com/606112)).
* A bug in glibc's `sem_post`/`sem_wait` primitives breaking V8's parallel
  garbage collection ([Issue 609249](http://crbug.com/609249)).
* A change to Blink's memory purging primitive which caused intermittent
  timeouts of WebGL conformance tests on all platforms ([Issue
  840988](http://crbug.com/840988)).

If you notice flaky test failures either on the GPU waterfalls or try servers,
please file bugs right away with the component Internals>GPU>Testing and
include links to the failing builds and copies of the logs, since the logs
expire after a few days. [GPU pixel wranglers] should give the highest priority
to eliminating flakiness on the tree.

[GPU pixel wranglers]: pixel_wrangling.md