Add some speed documentation:
* README.md, which is intended to be the main landing site for
go/chrome-speed
* how_does_chrome_measure_performance.md, which matches our original
outline pretty closely.
* addressing_performance_regressions.md, which had a few differences
from our original outline:
- I felt it was better to put everything in one doc, since I do stuff
like refer to the bisect bot output in multiple sections.
- I added a LOT more detail than I think we originally had slated.
- I added a "it's not my cl!" section.
There's still a long way to go, but I think this is a good base.
[email protected],[email protected]
Review-Url: https://codereview.chromium.org/2943013003
Cr-Commit-Position: refs/heads/master@{#480118}
diff --git a/docs/speed/README.md b/docs/speed/README.md
new file mode 100644
index 0000000..2aaa2c9
--- /dev/null
+++ b/docs/speed/README.md
@@ -0,0 +1,33 @@
+# Chrome Speed
+
+## Contact information
+
+ * **Contact**: TBD
+ * **Escalation**: [email protected] (PM), [email protected] (TPM),
+ [email protected] (eng director)
+ * **[File a bug](https://bugs.chromium.org/p/chromium/issues/entry?template=Speed%20Bug)**
+ * **Regression postmortem**: [template](https://docs.google.com/document/d/1fvfhFNOoUL9rB0XAEe1MYefyM_9yriR1IPjdxdm7PaQ/edit?disco=AAAABKdHwCg)
+
+## User Docs
+
+ * [How does Chrome measure performance?](how_does_chrome_measure_performance.md)
+ * [My CL caused a performance regression! What do I do?](addressing_performance_regressions.md)
+ * [I want Chrome to have better performance](help_improve_performance.md)
+ * [Perf sheriffing documentation](perf_regression_sheriffing.md)
+ * [I want to add tests or platforms to the perf waterfall](adding_tests_bots.md)
+ * [I'm looking for more information on the Speed Program](speed_program.md)
+
+## Core Teams and Work
+
+ * **[Performance tracks](performance_tracks.md)**: Most of the performance
+ work on Chrome is organized into these tracks.
+ * **[Chrome Speed Operations](https://docs.google.com/document/d/1_qx6TV_N20V3bF3TQW74_9kluIJXAwkB8Lj2dLBRsRc/edit)**: provides the benchmarks, infrastructure, and
+ releasing oversight to track regressions.
+ <!--- TODO: General discussion: chrome-speed-operations mailing list link -->
+ <!--- TODO: Tracking releases and regressions: chrome-speed-releasing mailing list link -->
+ * Benchmark-specific discussion: [email protected]
+ <!--- TODO: Requests for new benchmarks: chrome-benchmarking-request mailing list link -->
+ * Performance dashboard, bisect, try jobs: [email protected]
+ * **[Chrome Speed Metrics](https://docs.google.com/document/d/1wBT5fauGf8bqW2Wcg2A5Z-3_ZvgPhE8fbp1Xe6xfGRs/edit#heading=h.8ieoiiwdknwt)**:
+ provides a set of high-quality metrics that represent real-world user
+ experience, and exposes these metrics to both Chrome and Web Developers.
+ * General discussion: [email protected]
+ * The actual metrics: [tracking](https://docs.google.com/spreadsheets/d/1gY5hkKPp8RNVqmOw1d-bo-f9EXLqtq4wa3Z7Q8Ek9Tk/edit#gid=0)
\ No newline at end of file
diff --git a/docs/speed/addressing_performance_regressions.md b/docs/speed/addressing_performance_regressions.md
new file mode 100644
index 0000000..dd75ce1
--- /dev/null
+++ b/docs/speed/addressing_performance_regressions.md
@@ -0,0 +1,214 @@
+# Addressing Performance Regressions
+
+The bisect bot just picked your CL as the culprit in a performance regression
+and assigned a bug to you! What should you do? Read on...
+
+## About our performance tests
+
+The [chromium.perf waterfall](perf_waterfall.md) is a continuous build which
+runs performance tests on dozens of devices across Windows, Mac, Linux, and
+Android Chrome and WebView. Often, a performance regression only affects a
+certain type of hardware or a certain operating system, which may be different
+from what you tested locally before landing your CL.
+
+Each test has an owner, named in
+[this spreadsheet](https://docs.google.com/spreadsheets/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0),
+who you can cc on a performance bug if you have questions.
+
+## Understanding the bisect results
+
+The bisect bot spits out a comment on the bug that looks like this:
+
+```
+=== BISECT JOB RESULTS ===
+Perf regression found with culprit
+
+Suspected Commit
+Author : Your Name
+Commit : 15092e9195954cbc331cd58e344d0895fe03d0cd
+Date : Wed Jun 14 03:09:47 2017
+Subject: Your CL Description.
+
+Bisect Details
+Configuration: mac_pro_perf_bisect
+Benchmark : system_health.common_desktop
+Metric : timeToFirstContentfulPaint_avg/load_search/load_search_taobao
+Change : 15.25% | 1010.02 -> 1164.04
+
+Revision Result N
+chromium@479147 1010.02 +- 1535.41 14 good
+chromium@479209 699.332 +- 1282.01 6 good
+chromium@479240 383.617 +- 917.038 6 good
+chromium@479255 649.186 +- 1896.26 14 good
+chromium@479262 788.828 +- 1897.91 14 good
+chromium@479268 880.727 +- 2235.29 21 good
+chromium@479269 886.511 +- 1150.91 6 good
+chromium@479270 1164.04 +- 979.746 14 bad <--
+
+To Run This Test
+src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=load.search.taobao system_health.common_desktop
+```
+
+There's a lot of information packed in that bug comment! Here's a breakdown:
+
+ * **What regressed exactly?** The comment gives you several details:
+ * **The benchmark that regressed**: Under `Bisect Details`, you can see
+ `Benchmark :`. In this case, the `system_health.common_desktop`
+ benchmark regressed.
+ * **What platform did it regress on?** Under `Configuration`, you can find
+ some details on the bot that regressed. In this example, it is a Mac Pro.
+ * **How do I run that locally?** Follow the instructions under
+ `To Run This Test`. But be aware that if it regressed on Android and
+ you're developing on Windows, you may not be able to reproduce locally.
+ (See Debugging regressions below)
+ * **What is this testing?** Generally the metric
+ (`timeToFirstContentfulPaint_avg`) gives some information. If you're not
+ familiar, you can cc the [benchmark owner](https://docs.google.com/spreadsheets/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0)
+ to ask for help.
+ * **How severe is this regression?** There are different axes on which to
+ answer that question:
+ * **How much did performance regress?** The bisect bot answers this both
+ in relative terms (`Change : 15.25%`) and absolute terms
+ (`1010.02 -> 1164.04`). To understand the absolute terms, you'll need
+ to look at the units on the performance graphs linked in comment #1
+ of the bug (`https://chromeperf.appspot.com/group_report?bug_id=XXX`).
+ In this example, the units are milliseconds; the time to load taobao
+ regressed from ~1.01 seconds to ~1.16 seconds (see the sketch below
+ this list).
+ * **How widespread is the regression?** The graphs linked in comment #1
+ of the bug (`https://chromeperf.appspot.com/group_report?bug_id=XXX`)
+ will give you an idea how widespread the regression is. The `Bot`
+ column shows all the different bots the regression occurred on, and the
+ `Test` column shows the metrics it regressed on. Often, the same metric
+ is gathered on many different web pages. If you see a long list of
+ pages, it's likely that the regression affects most pages; if it's
+ short, your regression may be an edge case.
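+
+The relative and absolute numbers in the `Change :` line of the bisect
+comment are related by simple arithmetic. Here is a quick sketch (plain
+Python, using only the numbers from the example output above) of how they
+fit together:
+
+```
+# Derive the "Change" line from the example bisect output above.
+# Values are in milliseconds, per the units on the dashboard graphs.
+good = 1010.02  # value at the last good revision (chromium@479147)
+bad = 1164.04   # value at the first bad revision (chromium@479270)
+
+relative_change = (bad - good) / good * 100
+print(f"Change : {relative_change:.2f}% | {good} -> {bad}")
+# -> Change : 15.25% | 1010.02 -> 1164.04  (~1.01 s -> ~1.16 s)
+```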
+
+## Debugging regressions
+
+ * **How do I run the test locally?** Follow the instructions under
+ `To Run This Test` in the bisect comment. But be aware that regressions
+ are often hardware and/or platform-specific.
+ * **What do I do if I don't have the right hardware to test locally?** If
+ you don't have a local machine that matches the specs of the hardware that
+ regressed, you can run a perf tryjob on the same lab machines that ran the
+ bisect that blamed your CL.
+ [Here are the instructions for perf tryjobs](perf_trybots.md).
+ Drop the `perf_bisect` from the bot name and substitute dashes for
+ underscores to get the trybot name (`mac_pro_perf_bisect` -> `mac-pro`
+ in the example above; a sketch of this renaming follows this list).
+ * **Can I get a trace?** For most metrics, yes. Here are the steps:
+ 1. Click on the `All graphs for this bug` link in comment #1. It should
+ look like this:
+ `https://chromeperf.appspot.com/group_report?bug_id=XXXX`
+ 2. Select a bot/test combo that looks like what the bisect bot originally
+ caught. You might want to look through various regressions for a really
+ large increase.
+ 3. On the graph, click on the exclamation point icon at the regression, and
+ a tooltip comes up. There is a "trace" link in the tooltip; click it to
+ open the trace that was recorded during the performance test.
+ * **Wait, what's a trace?** See the
+ [documentation on tracing](https://www.chromium.org/developers/how-tos/trace-event-profiling-tool)
+ to learn how to use traces to debug performance issues.
+ * **Are there debugging tips specific to certain benchmarks?**
+ * **[Memory](https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md)**
+ * **[Android binary size](apk_size_regressions.md)**
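+
+As a side note, the trybot renaming described above (drop `perf_bisect`
+from the bot name, then substitute dashes for underscores) can be
+sketched as below; the list of names in the
+[perf tryjob instructions](perf_trybots.md) is authoritative:
+
+```
+def bisect_bot_to_trybot(bisect_bot_name):
+    """Maps a bisect bot name to a perf trybot name (a rough sketch)."""
+    return bisect_bot_name.replace("_perf_bisect", "").replace("_", "-")
+
+assert bisect_bot_to_trybot("mac_pro_perf_bisect") == "mac-pro"
+```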
+
+## If you don't believe your CL could be the cause
+
+There are some clear reasons to believe the bisect bot made a mistake:
+
+ * Your CL changes a test or some code that isn't compiled on the platform
+ that regressed.
+ * Your CL is completely unrelated to the metric that regressed.
+ * You looked at the numbers the bisect spit out (see example above; the first
+ column is the revision, the second column is the value at that revision,
+ and the third column is the standard deviation), and:
+ * The change attributed to your CL seems well within the noise (see
+ the sketch after this list), or
+ * The change at your CL is an improvement (for example, the metric is bytes
+ of memory used, and the value goes **down** at your CL) or
+ * The change is far smaller than what's reported in the bug summary (for
+ example, the bug says there is a 15% memory regression but the bisect
+ found that your CL increases memory by 0.77%)
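+
+For the "within the noise" check, a rough first pass is to compare the
+change at your CL against the standard deviations in the bisect output.
+A sketch (plain Python, using the two revisions around the culprit in
+the example above):
+
+```
+# Each row of the bisect output is: mean +- standard deviation, N runs.
+before_mean, before_stddev = 886.511, 1150.91  # chromium@479269 (good)
+after_mean, after_stddev = 1164.04, 979.746    # chromium@479270 (bad)
+
+delta = after_mean - before_mean
+print(f"Change of {delta:.1f} ms; standard deviations are "
+      f"{before_stddev:.0f} and {after_stddev:.0f} ms")
+# Change of 277.5 ms; standard deviations are 1151 and 980 ms
+# Note: a delta below the per-run spread can still be a real regression;
+# the bisect gains confidence by repeating each revision (the N column).
+```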
+
+Do the following:
+
+ * Add a comment to the bug explaining why you believe your CL is not the
+ cause of the regression.
+ * **Unassign yourself from the bug**. This lets our triage process know that
+ you are not actively working on the bug.
+ * Kick off another bisect. You can do this by:
+ 1. Click on the `All graphs for this bug` link in comment #1. It should
+ look like this:
+ `https://chromeperf.appspot.com/group_report?bug_id=XXXX`
+ 2. Sign in to the dashboard with your chromium.org account in the upper
+ right corner.
+ 3. Select a bot/test combo that looks like what the bisect bot originally
+ caught. You might want to look through various regressions for a really
+ clear increase.
+ 4. On the graph, click on the exclamation point icon at the regression, and
+ a tooltip comes up. Click the `Bisect` button on the tooltip.
+
+
+## If you believe the regression is justified
+
+Sometimes you are aware that your CL caused a performance regression, but you
+believe the CL should be landed as-is anyway. Chrome's
+[core principles](https://www.chromium.org/developers/core-principles) state:
+
+> If you make a change that regresses measured performance, you will be required to fix it or revert.
+
+**It is your responsibility to justify the regression.** You must add a comment
+on the bug explaining your justification clearly before WontFix-ing.
+
+Here are some common justification scenarios:
+
+ * **Your change regresses this metric, but is a net positive for performance.**
+ There are a few ways to demonstrate that this is true:
+ * **Use benchmark results.** If your change has a positive impact, there
+ should be clear improvements detected in benchmarks. You can look at all
+ the changes (positive and negative) the perf dashboard detected by
+ entering the commit position of a change into this URL (see the sketch
+ after this list):
+ `https://chromeperf.appspot.com/group_report?rev=YOUR_COMMIT_POS_HERE`
+ The changes listed are detected over a CL range, and may not all be
+ attributable to your CL. You can bisect any of these to find whether
+ your CL caused the improvement, just as you can bisect to find whether
+ it caused the regression.
+ * **Use finch trial results.** There are some types of changes that cannot
+ be measured well in benchmarks. If you believe your case falls into this
+ category, you can show that end users are not affected via a finch trial.
+ See the "End-user metrics" section of
+ [How does Chrome measure performance](how_does_chrome_measure_performance.md)
+ * **Your change is a critical correctness or security fix.**
+ It's true that sometimes something was "fast" because it was implemented
+ incorrectly. In this case, a justification should clarify the performance
+ cost we are paying for the fix and why it is worth it. Some things to
+ include:
+ * **What did the benchmark regression cost?** Look at the
+ list of regressions in bug comment 1:
+ `https://chromeperf.appspot.com/group_report?bug_id=XXXX`
+ What is the absolute cost (5MiB RAM? 200ms on page load?)
+ How many pages regressed? How many platforms?
+ * **What do we gain?** It could be something like:
+ * Reduced code complexity
+ * Optimal code or UI correctness
+ * Additional security
+ * Knowledge via an experiment
+ * Marketing - something good for users
+ * **Is there a more performant way to solve the problem?**
+ The [benchmark owner](https://docs.google.com/spreadsheets/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0)
+ can generally give you an idea how much work it would take to make a
+ similarly-sized performance gain. For example, it might take 1.5
+ engineering years to save 3MiB of RAM on Android; could you solve the
+ problem in a way that takes less memory than that in less than 1.5 years?
+ * **This performance metric is incorrect.** Not all tests are perfect. It's
+ possible that your change did not regress performance, and only appears to
+ be a problem because the test is measuring incorrectly. If this is the
+ case, you must explain clearly what the issue with the test is, and why you
+ believe your change is performance neutral. Please include data from traces
+ or other performance tools to clarify your claim.
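+
+For reference, all of the dashboard links on this page are instances of
+one URL pattern. A small sketch (plain Python; `bug_id` and `rev` are
+the two query parameters shown in the examples above):
+
+```
+BASE = "https://chromeperf.appspot.com/group_report"
+
+def graphs_for_bug(bug_id):
+    """All graphs the dashboard associated with a regression bug."""
+    return f"{BASE}?bug_id={bug_id}"
+
+def changes_at_commit(commit_pos):
+    """All improvements and regressions detected at a commit position."""
+    return f"{BASE}?rev={commit_pos}"
+```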
+
+**In all cases,** make sure to cc the [benchmark owner](https://docs.google.com/spreadsheets/d/1xaAo0_SU3iDfGdqDJZX_jRV0QtkufwHUKH3kQKF3YQs/edit#gid=0)
+when writing a justification and WontFix-ing a bug. If you cannot come to an
+agreement with the benchmark owner, you can escalate to [email protected],
+the owner of speed releasing.
\ No newline at end of file
diff --git a/docs/speed/how_does_chrome_measure_performance.md b/docs/speed/how_does_chrome_measure_performance.md
new file mode 100644
index 0000000..6b61984
--- /dev/null
+++ b/docs/speed/how_does_chrome_measure_performance.md
@@ -0,0 +1,63 @@
+# How Chrome Measures Performance
+
+Chrome collects performance data both in the lab and from end users. There are
+thousands of individual metrics. This is an overview of how to sort through
+them at a high level.
+
+## Tracks and Metrics
+
+At a high level, performance work in Chrome is categorized into **tracks**,
+like loading, memory, and power. Each track has high-level metrics associated
+with it.
+
+ * **[An overview of tracks](performance_tracks.md)**: lists the tracks and key contact points.
+ * **[Speed Launch Metrics](https://docs.google.com/document/d/1Ww487ZskJ-xBmJGwPO-XPz_QcJvw-kSNffm0nPhVpj8/edit)**:
+ the important high-level metrics we measure for each track.
+
+## Laboratory Metrics
+
+Chrome has multiple performance labs in which benchmarks are run on continuous
+builds to pinpoint performance regressions down to individual changelists.
+
+### The chromium.perf lab
+
+The main lab for performance monitoring is chromium.perf. It continuously tests
+chromium commits and is monitored by the perf sheriff rotation.
+
+ * **[What is the perf waterfall?](perf_waterfall.md)** An overview of the
+ waterfall that runs the continuous build.
+ * **[How telemetry works](https://github.com/catapult-project/catapult/blob/master/telemetry/README.md)**:
+ An overview of telemetry, our performance testing harness.
+ * **[How perf bisects work](bisects.md)**: An overview of the bisect bots,
+ which narrow down regressions over a CL range to a specific commit.
+ * **Benchmarks**
+ * **[Benchmark Policy](https://docs.google.com/document/d/1ni2MIeVnlH4bTj4yvEDMVNxgL73PqK_O9_NUm3NW3BA/edit)**:
+ An overview of the benchmark harnesses available in Chrome, and how to
+ find the right place to add a new test case.
+ * **[System health benchmarks](https://docs.google.com/document/d/1BM_6lBrPzpMNMtcyi2NFKGIzmzIQ1oH3OlNG27kDGNU/edit?ts=57e92782)**:
+ The system health benchmarks measure the speed launch metrics on
+ real-world web use scenarios.
+ * **[How to run on perf trybots](perf_trybots.md)**: Have an unsubmitted
+ CL and want to run benchmarks on it? Need to try a variety of hardware and
+ operating systems? Use the perf trybots.
+ * **[How to run telemetry locally](https://github.com/catapult-project/catapult/blob/master/telemetry/docs/run_benchmarks_locally.md)**:
+ Instructions on running telemetry benchmarks on your local machine
+ (a minimal invocation sketch follows this list).
+ * **[List of platforms in the lab](perf_lab_platforms.md)**: Devices,
+ configurations, and OSes the chromium.perf lab tests on.
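+
+As a minimal sketch, a local run looks like the following (a trimmed-down
+version of the `To Run This Test` command in
+[addressing_performance_regressions.md](addressing_performance_regressions.md);
+`--browser=release` assumes you have a local release build of Chrome):
+
+```
+import subprocess
+
+# Run a single system health story locally with Telemetry.
+subprocess.run([
+    "src/tools/perf/run_benchmark", "-v", "--browser=release",
+    "--story-filter=load.search.taobao", "system_health.common_desktop",
+], check=True)
+```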
+
+### Other performance labs
+
+There are several other performance labs for specialized use:
+
+ * **[Lab Spotlight: AV Lab (Googlers only)](http://goto.google.com/av-analysis-service)**:
+ Learn all about audio/video quality testing.
+ * **[Lab Spotlight: Cluster telemetry](https://docs.google.com/document/d/1GhqosQcwsy6F-eBAmFn_ITDF7_Iv_rY9FhCKwAnk9qQ/edit)**:
+ Need to run a performance test over thousands of pages? Check out cluster
+ telemetry!
+
+## End-user metrics
+
+The **[Speed Launch Metrics](https://docs.google.com/document/d/1Ww487ZskJ-xBmJGwPO-XPz_QcJvw-kSNffm0nPhVpj8/edit)**
+doc explains metrics available in UMA for end user performance. If you want to
+test how your change impacts these metrics for end users, you'll probably want
+to **[Run a Finch Trial](http://goto.google.com/finch101)**.
\ No newline at end of file