Add some speed documentation:
* README.md, which is intended to be the main landing site for
go/chrome-speed
* how_does_chrome_measure_performance.md, which matches our original
outline pretty closely.
* addressing_performance_regressions.md, which had a few differences
from our original outline:
- I felt it was better to put everything in one doc, since I do stuff
like refer to the bisect bot output in multiple sections.
- I added a LOT more detail than I think we originally had slated.
- I added a "it's not my cl!" section.
There's still a long way to go, but I think this is a good base.
[email protected],[email protected]
Review-Url: https://codereview.chromium.org/2943013003
Cr-Commit-Position: refs/heads/master@{#480118}
diff --git a/docs/speed/README.md b/docs/speed/README.md
new file mode 100644
index 0000000..2aaa2c9
--- /dev/null
+++ b/docs/speed/README.md
@@ -0,0 +1,33 @@
+# Chrome Speed
+
+## Contact information
+
+ * **Contact**: TBD
+ * **Escalation**: [email protected] (PM), [email protected] (TPM),
+ [email protected] (eng director)
+ * **[File a bug](https://bugs.chromium.org/p/chromium/issues/entry?template=Speed%20Bug)**
+ * **Regression postmortem**: [template](https://docs.google.com/document/d/1fvfhFNOoUL9rB0XAEe1MYefyM_9yriR1IPjdxdm7PaQ/edit?disco=AAAABKdHwCg)
+
+## User Docs
+
+ * [How does Chrome measure performance?](how_does_chrome_measure_performance.md)
+ * [My CL caused a performance regression! What do I do?](addressing_performance_regressions.md)
+ * [I want Chrome to have better performance](help_improve_performance.md)
+ * [Perf sheriffing documentation](perf_regression_sheriffing.md)
+ * [I want to add tests or platforms to the perf waterfall](adding_tests_bots.md)
+ * [I'm looking for more information on the Speed Program](speed_program.md)
+
+## Core Teams and Work
+
+ * **[Performance tracks](performance_tracks.md)**: Most of the performance
+ work on Chrome is organized into these tracks.
+ * **[Chrome Speed Operations](https://docs.google.com/document/d/1_qx6TV_N20V3bF3TQW74_9kluIJXAwkB8Lj2dLBRsRc/edit)**: provides the benchmarks, infrastructure, and
+ releasing oversight to track regressions.
+ <!--- TODO: General discussion: chrome-speed-operations mailing list link -->
+ <!--- TODO: Tracking releases and regressions: chrome-speed-releasing mailing list link -->
+ * Benchmark-specific discussion: [email protected]
+ <!--- TODO: Requests for new benchmarks: chrome-benchmarking-request mailing list link -->
+ * Performance dashboard, bisect, try jobs: [email protected]
+ * **[Chrome Speed Metrics](https://docs.google.com/document/d/1wBT5fauGf8bqW2Wcg2A5Z-3_ZvgPhE8fbp1Xe6xfGRs/edit#heading=h.8ieoiiwdknwt)**:
+ provides a set of high-quality metrics that represent real-world user
+ experience, and exposes these metrics to both Chrome and Web Developers.
+ * General discussion: [email protected]
+ * The actual metrics: [tracking](https://docs.google.com/spreadsheets/d/1gY5hkKPp8RNVqmOw1d-bo-f9EXLqtq4wa3Z7Q8Ek9Tk/edit#gid=0)
\ No newline at end of file
diff --git a/docs/speed/addressing_performance_regressions.md b/docs/speed/addressing_performance_regressions.md
new file mode 100644
index 0000000..dd75ce1
--- /dev/null
+++ b/docs/speed/addressing_performance_regressions.md
@@ -0,0 +1,214 @@
+# Addressing Performance Regressions
+
+The bisect bot just picked your CL as the culprit in a performance regression
+and assigned a bug to you! What should you do? Read on...
+
+## About our performance tests
+
+The [chromium.perf waterfall](perf_waterfall.md) is a continuous build which
+runs performance tests on dozens of devices across Windows, Mac, Linux, and
+Android Chrome and WebView. Often, a performance regression only affects a
+certain type of hardware or a certain operating system, which may be different
+from what you tested locally before landing your CL.
+
+Each test has an owner, named in
+[this spreadsheet](https://docs.google.com/spreadsheets/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0),
+who you can cc on a performance bug if you have questions.
+
+## Understanding the bisect results
+
+The bisect bot spits out a comment on the bug that looks like this:
+
+```
+=== BISECT JOB RESULTS ===
+Perf regression found with culprit
+
+Suspected Commit
+Author : Your Name
+Commit : 15092e9195954cbc331cd58e344d0895fe03d0cd
+Date : Wed Jun 14 03:09:47 2017
+Subject: Your CL Description.
+
+Bisect Details
+Configuration: mac_pro_perf_bisect
+Benchmark : system_health.common_desktop
+Metric : timeToFirstContentfulPaint_avg/load_search/load_search_taobao
+Change : 15.25% | 1010.02 -> 1164.04
+
+Revision Result N
+chromium@479147 1010.02 +- 1535.41 14 good
+chromium@479209 699.332 +- 1282.01 6 good
+chromium@479240 383.617 +- 917.038 6 good
+chromium@479255 649.186 +- 1896.26 14 good
+chromium@479262 788.828 +- 1897.91 14 good
+chromium@479268 880.727 +- 2235.29 21 good
+chromium@479269 886.511 +- 1150.91 6 good
+chromium@479270 1164.04 +- 979.746 14 bad <--
+
+To Run This Test
+src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.search.taobao system_health.common_desktop
+```
+
+There's a lot of information packed in that bug comment! Here's a breakdown:
+
+ * **What regressed exactly?** The comment gives you several details:
+ * **The benchmark that regressed**: Under `Bisect Details`, you can see
+ `Benchmark :`. In this case, the `system_health.common_desktop`
+ benchmark regressed.
+ * **What platform did it regress on?** Under `Configuration`, you can find
+ some details on the bot that regressed. In this example, it is a Mac Pro.
+ * **How do I run that locally?** Follow the instructions under
+ `To Run This Test`. But be aware that if it regressed on Android and
+ you're developing on Windows, you may not be able to reproduce locally.
+ (See Debugging regressions below)
+ * **What is this testing?** Generally the metric
+ (`timeToFirstContentfulPaint_avg`) gives some information. If you're not
+ familiar, you can cc the [benchmark owner](https://docs.google.com/spreadsheets/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0)
+ to ask for help.
+ * **How severe is this regression?** There are different axes on which to
+ answer that question:
+ * **How much did performance regress?** The bisect bot answers this both
+ in relative terms (`Change : 15.25%`) and absolute terms
+ (`1010.02 -> 1164.04`). To understand the absolute terms, you'll need
+ to look at the units on the performance graphs linked in comment #1
+ of the bug (`https://chromeperf.appspot.com/group_report?bug_id=XXX`).
+ In this example, the units are milliseconds; the time to load taobao
+ regressed from ~1.01 seconds to ~1.16 seconds (see the sketch below
+ this list).
+ * **How widespread is the regression?** The graphs linked in comment #1
+ of the bug (`https://chromeperf.appspot.com/group_report?bug_id=XXX`)
+ will give you an idea how widespread the regression is. The `Bot`
+ column shows all the different bots the regression occurred on, and the
+ `Test` column shows the metrics it regressed on. Often, the same metric
+ is gathered on many different web pages. If you see a long list of
+ pages, it's likely that the regression affects most pages; if it's
+ short, your regression may be an edge case.
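+
+The relative and absolute numbers in the `Change :` line of the bisect
+comment are related by simple arithmetic. Here is a quick sketch (plain
+Python, using only the numbers from the example output above) of how they
+fit together:
+
+```
+# Derive the "Change" line from the example bisect output above.
+# Values are in milliseconds, per the units on the dashboard graphs.
+good = 1010.02  # value at the last good revision (chromium@479147)
+bad = 1164.04   # value at the first bad revision (chromium@479270)
+
+relative_change = (bad - good) / good * 100
+print(f"Change : {relative_change:.2f}% | {good} -> {bad}")
+# -> Change : 15.25% | 1010.02 -> 1164.04  (~1.01 s -> ~1.16 s)
+```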
+
+## Debugging regressions
+
+ * **How do I run the test locally?** Follow the instructions under
+ `To Run This Test` in the bisect comment. But be aware that regressions
+ are often hardware and/or platform-specific.
+ * **What do I do if I don't have the right hardware to test locally?** If
+ you don't have a local machine that matches the specs of the hardware that
+ regressed, you can run a perf tryjob on the same lab machines that ran the
+ bisect that blamed your CL.
+ [Here are the instructions for perf tryjobs](perf_trybots.md).
+ Drop the `perf_bisect` from the bot name and substitute dashes for
+ underscores to get the trybot name (`mac_pro_perf_bisect` -> `mac-pro`
+ in the example above; a sketch of this renaming follows this list).
+ * **Can I get a trace?** For most metrics, yes. Here are the steps:
+ 1. Click on the `All graphs for this bug` link in comment #1. It should
+ look like this:
+ `https://chromeperf.appspot.com/group_report?bug_id=XXXX`
+ 2. Select a bot/test combo that looks like what the bisect bot originally
+ caught. You might want to look through various regressions for a really
+ large increase.
+ 3. On the graph, click on the exclamation point icon at the regression, and
+ a tooltip comes up. There is a "trace" link in the tooltip; click it to
+ open the trace that was recorded during the performance test.
+ * **Wait, what's a trace?** See the
+ [documentation on tracing](https://www.chromium.org/developers/how-tos/trace-event-profiling-tool)
+ to learn how to use traces to debug performance issues.
+ * **Are there debugging tips specific to certain benchmarks?**
+ * **[Memory](https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md)**
+ * **[Android binary size](apk_size_regressions.md)**
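+
+As a side note, the trybot renaming described above (drop `perf_bisect`
+from the bot name, then substitute dashes for underscores) can be
+sketched as below; the list of names in the
+[perf tryjob instructions](perf_trybots.md) is authoritative:
+
+```
+def bisect_bot_to_trybot(bisect_bot_name):
+    """Maps a bisect bot name to a perf trybot name (a rough sketch)."""
+    return bisect_bot_name.replace("_perf_bisect", "").replace("_", "-")
+
+assert bisect_bot_to_trybot("mac_pro_perf_bisect") == "mac-pro"
+```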
+
+## If you don't believe your CL could be the cause
+
+There are some clear reasons to believe the bisect bot made a mistake:
+
+ * Your CL changes a test or some code that isn't compiled on the platform
+ that regressed.
+ * Your CL is completely unrelated to the metric that regressed.
+ * You looked at the numbers the bisect spit out (see example above; the first
+ column is the revision, the second column is the value at that revision,
+ and the third column is the standard deviation), and:
+ * The change attributed to your CL seems well within the noise (see
+ the sketch after this list), or
+ * The change at your CL is an improvement (for example, the metric is bytes
+ of memory used, and the value goes **down** at your CL) or
+ * The change is far smaller than what's reported in the bug summary (for
+ example, the bug says there is a 15% memory regression but the bisect
+ found that your CL increases memory by 0.77%)
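+
+For the "within the noise" check, a rough first pass is to compare the
+change at your CL against the standard deviations in the bisect output.
+A sketch (plain Python, using the two revisions around the culprit in
+the example above):
+
+```
+# Each row of the bisect output is: mean +- standard deviation, N runs.
+before_mean, before_stddev = 886.511, 1150.91  # chromium@479269 (good)
+after_mean, after_stddev = 1164.04, 979.746    # chromium@479270 (bad)
+
+delta = after_mean - before_mean
+print(f"Change of {delta:.1f} ms; standard deviations are "
+      f"{before_stddev:.0f} and {after_stddev:.0f} ms")
+# Change of 277.5 ms; standard deviations are 1151 and 980 ms
+# Note: a delta below the per-run spread can still be a real regression;
+# the bisect gains confidence by repeating each revision (the N column).
+```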
+
+Do the following:
+
+ * Add a comment to the bug explaining why you believe your CL is not the
+ cause of the regression.
+ * **Unassign yourself from the bug**. This lets our triage process know that
+ you are not actively working on the bug.
+ * Kick off another bisect. You can do this by:
+ 1. Click on the `All graphs for this bug` link in comment #1. It should
+ look like this:
+ `https://chromeperf.appspot.com/group_report?bug_id=XXXX`
+ 2. Sign in to the dashboard with your chromium.org account in the upper
+ right corner.
+ 3. Select a bot/test combo that looks like what the bisect bot originally
+ caught. You might want to look through various regressions for a really
+ clear increase.
+ 4. On the graph, click on the exclamation point icon at the regression, and
+ a tooltip comes up. Click the `Bisect` button on the tooltip.
+
+
+## If you believe the regression is justified
+
+Sometimes you are aware that your CL caused a performance regression, but you
+believe the CL should be landed as-is anyway. Chrome's
+[core principles](https://www.chromium.org/developers/core-principles) state:
+
+> If you make a change that regresses measured performance, you will be required to fix it or revert.
+
+**It is your responsibility to justify the regression.** You must add a comment
+on the bug explaining your justification clearly before WontFix-ing.
+
+Here are some common justification scenarios:
+
+ * **Your change regresses this metric, but is a net positive for performance.**
+ There are a few ways to demonstrate that this is true:
+ * **Use benchmark results.** If your change has a positive impact, there
+ should be clear improvements detected in benchmarks. You can look at all
+ the changes (positive and negative) the perf dashboard detected by
+ entering the commit position of a change into this URL (see the sketch
+ after this list):
+ `https://chromeperf.appspot.com/group_report?rev=YOUR_COMMIT_POS_HERE`
+ The changes listed are detected over a CL range, and may not all be
+ attributable to your CL. You can bisect any of these to find whether
+ your CL caused the improvement, just as you can bisect to find whether
+ it caused the regression.
+ * **Use finch trial results.** There are some types of changes that cannot
+ be measured well in benchmarks. If you believe your case falls into this
+ category, you can show that end users are not affected via a finch trial.
+ See the "End-user metrics" section of
+ [How does Chrome measure performance](how_does_chrome_measure_performance.md)
+ * **Your change is a critical correctness or security fix.**
+ It's true that sometimes something was "fast" because it was implemented
+ incorrectly. In this case, a justification should clarify the performance
+ cost we are paying for the fix and why it is worth it. Some things to
+ include:
+ * **What did the benchmark regression cost?** Look at the
+ list of regressions in bug comment 1:
+ `https://chromeperf.appspot.com/group_report?bug_id=XXXX`
+ What is the absolute cost (5MiB RAM? 200ms on page load?)
+ How many pages regressed? How many platforms?
+ * **What do we gain?** It could be something like:
+ * Reduced code complexity
+ * Optimal code or UI correctness
+ * Additional security
+ * Knowledge via an experiment
+ * Marketing - something good for users
+ * **Is there a more performant way to solve the problem?**
+ The [benchmark owner](https://docs.google.com/spreadsheets/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0)
+ can generally give you an idea how much work it would take to make a
+ similarly-sized performance gain. For example, it might take 1.5
+ engineering years to save 3MiB of RAM on Android; could you solve the
+ problem in a way that takes less memory than that in less than 1.5 years?
+ * **This performance metric is incorrect.** Not all tests are perfect. It's
+ possible that your change did not regress performance, and only appears to
+ be a problem because the test is measuring incorrectly. If this is the
+ case, you must explain clearly what the issue with the test is, and why you
+ believe your change is performance neutral. Please include data from traces
+ or other performance tools to clarify your claim.
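+
+For reference, all of the dashboard links on this page are instances of
+one URL pattern. A small sketch (plain Python; `bug_id` and `rev` are
+the two query parameters shown in the examples above):
+
+```
+BASE = "https://chromeperf.appspot.com/group_report"
+
+def graphs_for_bug(bug_id):
+    """All graphs the dashboard associated with a regression bug."""
+    return f"{BASE}?bug_id={bug_id}"
+
+def changes_at_commit(commit_pos):
+    """All improvements and regressions detected at a commit position."""
+    return f"{BASE}?rev={commit_pos}"
+```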
+
+**In all cases,** make sure to cc the [benchmark owner](https://docs.google.com/spreadsheets/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0)
+when writing a justification and WontFix-ing a bug. If you cannot come to an
+agreement with the benchmark owner, you can escalate to [email protected],
+the owner of speed releasing.
\ No newline at end of file
diff --git a/docs/speed/how_does_chrome_measure_performance.md b/docs/speed/how_does_chrome_measure_performance.md
new file mode 100644
index 0000000..6b61984
--- /dev/null
+++ b/docs/speed/how_does_chrome_measure_performance.md
@@ -0,0 +1,63 @@
+# How Chrome Measures Performance
+
+Chrome collects performance data both in the lab and from end users. There are
+thousands of individual metrics. This is an overview of how to sort through
+them at a high level.
+
+## Tracks and Metrics
+
+At a high level, performance work in Chrome is categorized into **tracks**,
+like loading, memory, and power. Each track has high-level metrics associated
+with it.
+
+ * **[An overview of tracks](performance_tracks.md)**: lists the tracks and key contact points.
+ * **[Speed Launch Metrics](https://docs.google.com/document/d/1Ww487ZskJ-xBmJGwPO-XPz_QcJvw-kSNffm0nPhVpj8/edit)**:
+ the important high-level metrics we measure for each track.
+
+## Laboratory Metrics
+
+Chrome has multiple performance labs in which benchmarks are run on continuous
+builds to pinpoint performance regressions down to individual changelists.
+
+### The chromium.perf lab
+
+The main lab for performance monitoring is chromium.perf. It continuously tests
+chromium commits and is monitored by the perf sheriff rotation.
+
+ * **[What is the perf waterfall?](perf_waterfall.md)** An overview of the
+ waterfall that runs the continuous build.
+ * **[How telemetry works](https://github.com/catapult-project/catapult/blob/master/telemetry/README.md)**:
+ An overview of telemetry, our performance testing harness.
+ * **[How perf bisects work](bisects.md)**: An overview of the bisect bots,
+ which narrow down regressions over a CL range to a specific commit.
+ * **Benchmarks**
+ * **[Benchmark Policy](https://docs.google.com/document/d/1ni2MIeVnlH4bTj4yvEDMVNxgL73PqK_O9_NUm3NW3BA/edit)**:
+ An overview of the benchmark harnesses available in Chrome, and how to
+ find the right place to add a new test case.
+ * **[System health benchmarks](https://docs.google.com/document/d/1BM_6lBrPzpMNMtcyi2NFKGIzmzIQ1oH3OlNG27kDGNU/edit?ts=57e92782)**:
+ The system health benchmarks measure the speed launch metrics on
+ real-world web use scenarios.
+ * **[How to run on perf trybots](perf_trybots.md)**: Have an unsubmitted
+ CL and want to run benchmarks on it? Need to try a variety of hardware and
+ operating systems? Use the perf trybots.
+ * **[How to run telemetry locally](https://github.com/catapult-project/catapult/blob/master/telemetry/docs/run_benchmarks_locally.md)**:
+ Instructions on running telemetry benchmarks on your local machine
+ (a minimal invocation sketch follows this list).
+ * **[List of platforms in the lab](perf_lab_platforms.md)**: Devices,
+ configurations, and OSes the chromium.perf lab tests on.
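+
+As a minimal sketch, a local run looks like the following (a trimmed-down
+version of the `To Run This Test` command in
+[addressing_performance_regressions.md](addressing_performance_regressions.md);
+`--browser=release` assumes you have a local release build of Chrome):
+
+```
+import subprocess
+
+# Run a single system health story locally with Telemetry.
+subprocess.run([
+    "src/tools/perf/run_benchmark", "-v", "--browser=release",
+    "--story-filter=load.search.taobao", "system_health.common_desktop",
+], check=True)
+```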
+
+### Other performance labs
+
+There are several other performance labs for specialized use:
+
+ * **[Lab Spotlight: AV Lab (Googlers only)](http://goto.google.com/av-analysis-service)**:
+ Learn all about audio/video quality testing.
+ * **[Lab Spotlight: Cluster telemetry](https://docs.google.com/document/d/1GhqosQcwsy6F-eBAmFn_ITDF7_Iv_rY9FhCKwAnk9qQ/edit)**:
+ Need to run a performance test over thousands of pages? Check out cluster
+ telemetry!
+
+## End-user metrics
+
+The **[Speed Launch Metrics](https://docs.google.com/document/d/1Ww487ZskJ-xBmJGwPO-XPz_QcJvw-kSNffm0nPhVpj8/edit)**
+doc explains metrics available in UMA for end user performance. If you want to
+test how your change impacts these metrics for end users, you'll probably want
+to **[Run a Finch Trial](http://goto.google.com/finch101)**.
\ No newline at end of file