testing/perfdocs/generated/perf-sheriffing.rst

======================
Performance Sheriffing
======================

.. contents::
    :depth: 3

1 Overview
----------

Performance sheriffs are responsible for making sure that performance changes in Firefox are detected
and dealt with. They look at data and performance metrics produced by the performance testing frameworks,
find regressions, determine the root cause, and file bugs to track all issues. The workflow we
follow is shown in the flowchart below.

1.1 Flowchart
~~~~~~~~~~~~~

.. image:: ./flowchart.png
   :alt: Sheriffing Workflow Flowchart
   :align: center

The workflow of a sheriff consists of backfilling jobs to get the data, investigating that data, filing
bugs or linking improvements based on the data, and following up with developers if needed.

1.2 Contacts and the Team
~~~~~~~~~~~~~~~~~~~~~~~~~
What can you do if you have an urgent issue and need help?

If you have a question about a bug that was filed and assigned to you, reach out on Matrix to the sheriff who
filed the bug. If a performance sheriff is not responsive, or you have a question about a bug,
send a message to the `performance sheriffs Matrix channel <https://chat.mozilla.org/#/room/#perfsheriffs:mozilla.org>`_
and tag the sheriff. If no one responds, you can message any of the following people directly
on Slack or Matrix:

- `@afinder <https://people.mozilla.org/p/afinder>`_
- `@alexandrui <https://people.mozilla.org/p/alexandrui>`_
- `@andra <https://people.mozilla.org/p/andraesanu>`_
- `@andrej <https://people.mozilla.org/p/andrej>`_
- `@beatrice <https://people.mozilla.org/p/bacasandrei>`_
- `@sparky <https://people.mozilla.org/p/sparky>`_ (reach out only if all others are unreachable)

All of the team is in EET (Eastern European Time) except for @andrej and @sparky, who are in EST (Eastern Standard Time).

1.3 Regression and Improvement Definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Whenever we get a performance change, we classify it as one of two things: a regression (worse performance) or
an improvement (better performance).
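
As a minimal sketch of this classification, the snippet below labels a change from a pair of before/after values. The ``classify_change`` function and its ``lower_is_better`` flag are hypothetical names used only for illustration. Whether a lower value is better depends on the metric, and this is not how Perfherder itself is implemented.

```python
def classify_change(old_value: float, new_value: float, lower_is_better: bool = True) -> str:
    """Label a change as a regression or an improvement (illustrative only)."""
    if new_value == old_value:
        return "no change"
    got_lower = new_value < old_value
    # The change is an improvement when the value moved in the "better" direction.
    return "improvement" if got_lower == lower_is_better else "regression"


# A load-time style metric (lower is better) that went up.
print(classify_change(100.0, 110.0))  # regression
# A score style metric (higher is better) that went up.
print(classify_change(100.0, 110.0, lower_is_better=False))  # improvement
```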

2 How to Investigate Alerts
---------------------------
In this section we will go over how performance sheriffs investigate alerts.

2.1 Filtering and Reading Alerts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On the `Perfherder <https://treeherder.mozilla.org/perfherder/alerts>`_ alerts page you should see something like the following:

.. image:: ./Alerts_view.png
  :alt: Alerts View
  :align: center

After accessing the Perfherder alerts page, make sure the filter (located in the top middle of the screenshot below)
is set to show the correct alerts for sheriffing. New alerts can be found by selecting
the **untriaged** option in the left-most dropdown, as shown in that screenshot:

.. image:: ./Alerts_view_toolbar.png
  :alt: Alerts View Toolbar
  :align: center

The rest of the controls, from left to right, are as follows:

- **Testing harness**: altering this will take you to alerts generated on different harnesses
- **The filter input**: type some text and press Enter to narrow down the alerts view
- **"Hide downstream / reassigned to / invalid"**: enable this (recommended) to reduce clutter on the page
- **"My alerts"**: only shows alerts assigned to you

Below is a screenshot of an alert:

.. image:: ./single_alert.png
  :alt: Single Alert
  :align: center

You can identify an alert by its bold title, which reads "Alert #XXXXX". Each alert groups
summaries of tests, and those tests:

- Can run on different platforms
- Can share a suite name (like tp5o)
- Measure various metrics
- Share the same framework

Going from left to right through the columns inside an alert, starting with the test, we have:

- A blue hyperlink that links to the test documentation (if available)
- The **platform** operating system
- **Information** about the historical data distribution of that test
- Tags and options related to the test

2.2 Regressions vs Improvements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The first thing to note about how we investigate alerts is that **we prioritize handling regressions**! Unlike
**improvements**, regressions ship bugs to users that, if not addressed, make our products worse and drive users away.
After acknowledging an alert:

- Regressions go through multiple status changes (TODO: link to sections with multiple status changes) until they are finally resolved
- Improvements have a single status: improvement

2.3 Framework Thresholds
~~~~~~~~~~~~~~~~~~~~~~~~
Different frameworks test different things, so the threshold at which an alert is triggered and considered
a performance change depends on the harness:

- AWSY >= 0.25%
- Build metrics installer size >= 100kb
- Talos, Browsertime, Build Metrics >= 2%
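
The percentage thresholds above can be sketched as a simple lookup. This is an illustration only: the ``THRESHOLDS_PERCENT`` table, the ``percent_change`` and ``exceeds_threshold`` helpers, and the lowercase framework keys are hypothetical names, not Perfherder's code. The installer-size threshold is left out because it is an absolute size rather than a percentage.

```python
# Percentage thresholds taken from the list above; the keys are hypothetical identifiers.
THRESHOLDS_PERCENT = {
    "awsy": 0.25,
    "talos": 2.0,
    "browsertime": 2.0,
    "build_metrics": 2.0,
}


def percent_change(old_value: float, new_value: float) -> float:
    """Return the magnitude of the change relative to the old value, in percent."""
    if old_value == 0:
        raise ValueError("old_value must be non-zero to compute a percentage")
    return abs(new_value - old_value) / abs(old_value) * 100.0


def exceeds_threshold(framework: str, old_value: float, new_value: float) -> bool:
    """Check whether a change is large enough to count for the given framework."""
    return percent_change(old_value, new_value) >= THRESHOLDS_PERCENT[framework]


# The same 0.5% change, checked against two different frameworks.
print(exceeds_threshold("awsy", 1000.0, 1005.0))   # True: above the 0.25% AWSY threshold
print(exceeds_threshold("talos", 1000.0, 1005.0))  # False: below the 2% Talos threshold
```

Under these assumptions, the same relative change can count as a performance change for one harness and not for another, which is why the framework has to be known before judging a change.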