Transforms
==========

Many task kinds generate tasks by a process of transforming job descriptions
into task definitions.  The basic operation is simple, although the sequence of
transforms applied for a particular kind may not be!

Overview
--------

To begin, a kind implementation generates a collection of items; see
:doc:`loading`.  The items are simply Python dictionaries, and describe
"semantically" what the resulting task or tasks should do.

The kind also defines a sequence of transformations.  These are applied, in
order, to each item.  Early transforms might apply default values or break
items up into smaller items (for example, chunking a test suite).  Later
transforms rewrite the items entirely, with the final result being a task
definition.

Transform Functions
...................

Each transformation looks like this:

.. code-block:: python

    from taskgraph.transforms.base import TransformSequence

    transforms = TransformSequence()


    @transforms.add
    def transform_an_item(config, items):
        """This transform ..."""  # always a docstring!
        for item in items:
            # ... inspect or modify ``item`` here ...
            yield item

The ``config`` argument is a Python object containing useful configuration for
the kind.  It is an instance of
:class:`taskgraph.transforms.base.TransformConfig` (or of a subclass), which
specifies a few of its attributes.  Kinds may subclass it and add additional
attributes if necessary.

While most transforms yield one item for each item consumed, this is not always
true: items that are not yielded are effectively filtered out.  Yielding
multiple items for each consumed item implements item duplication; this is how
test chunking is accomplished, for example.
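
Filtering and duplication can be illustrated with plain generator functions
chained by hand.  This is a minimal sketch: the item fields ``name``,
``enabled``, ``chunks`` and ``this-chunk`` are hypothetical, chosen only for the
example, and ``config`` is passed as ``None`` because these transforms do not
use it.

.. code-block:: python

    def drop_disabled(config, items):
        """Filter: an item that is not yielded is dropped from the output."""
        for item in items:
            if item.get("enabled", True):
                yield item


    def split_chunks(config, items):
        """Duplicate: yield one copy of each item per chunk."""
        for item in items:
            for this_chunk in range(1, item.get("chunks", 1) + 1):
                chunked = dict(item)  # shallow copy, so chunks do not share a dict
                chunked["this-chunk"] = this_chunk
                chunked["name"] = "{}-{}".format(item["name"], this_chunk)
                yield chunked


    # Hypothetical item fields, for illustration only.
    items = [
        {"name": "mochitest", "chunks": 3},
        {"name": "reftest", "enabled": False},
        {"name": "xpcshell"},
    ]

    # Apply the transforms in order: each one consumes the previous one's output.
    result = list(split_chunks(None, drop_disabled(None, items)))
    print([item["name"] for item in result])
    # ['mochitest-1', 'mochitest-2', 'mochitest-3', 'xpcshell-1']

The disabled item never reaches ``split_chunks``, and the single
``mochitest`` item becomes three.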

The ``transforms`` object is an instance of
:class:`taskgraph.transforms.base.TransformSequence`, which serves as a simple
mechanism to combine a sequence of transforms into one.
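
The composition can be pictured with the simplified stand-in below.
``MiniTransformSequence`` is hypothetical and is not the real class.  It
assumes only what this section states: ``add`` registers a transform (and works
as a decorator), and applying the sequence feeds each transform's output into
the next, in registration order.  The ``tier`` and ``label`` fields are also
made up for the example.

.. code-block:: python

    class MiniTransformSequence:
        """Hypothetical, simplified stand-in; not the real TransformSequence."""

        def __init__(self):
            self._transforms = []

        def add(self, func):
            # Register the transform and return it unchanged, so this works
            # as a decorator.
            self._transforms.append(func)
            return func

        def __call__(self, config, items):
            # Chain the generators: each transform consumes the previous output.
            for transform in self._transforms:
                items = transform(config, items)
            return items


    transforms = MiniTransformSequence()


    @transforms.add
    def set_defaults(config, items):
        """Apply a default value (hypothetical ``tier`` field)."""
        for item in items:
            item.setdefault("tier", 1)
            yield item


    @transforms.add
    def add_label(config, items):
        """Derive a label from fields set by earlier transforms."""
        for item in items:
            item["label"] = "{}-tier{}".format(item["name"], item["tier"])
            yield item


    output = list(transforms(None, [{"name": "build"}, {"name": "lint", "tier": 2}]))
    print([item["label"] for item in output])  # ['build-tier1', 'lint-tier2']

Because each transform is a generator, calling the sequence does no work by
itself; the items flow through every stage only when the final result is
consumed (here by ``list``).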

Schemas
.......

The items used in transforms are validated against some simple schemas at
various points in the transformation process.  These schemas accomplish two
things: they provide a place to add comments about the meaning of each field,
and they enforce that the fields are actually used in the documented fashion.
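
The two roles can be sketched with a tiny, stdlib-only validator.  This is a
hypothetical stand-in, not the schema machinery used in the tree: each field
carries a description (the "place to add comments"), and validation rejects
items that use fields in an undocumented way (missing, unknown, or of the wrong
type).  ``SIMPLE_SCHEMA`` and ``validate`` are names invented for the example.

.. code-block:: python

    # Hypothetical, stdlib-only stand-in for a real schema:
    # field name -> (required?, expected type, description).
    SIMPLE_SCHEMA = {
        "name": (True, str, "Unique name of the item within the kind."),
        "chunks": (False, int, "Number of chunks to split the item into."),
        "description": (False, str, "Human-readable description."),
    }


    def validate(item, schema, where):
        """Raise ValueError unless ``item`` conforms to ``schema``; return it."""
        for field, (required, _, _) in schema.items():
            if required and field not in item:
                raise ValueError("{}: missing required field {!r}".format(where, field))
        for field, value in item.items():
            if field not in schema:
                raise ValueError("{}: unknown field {!r}".format(where, field))
            expected = schema[field][1]
            if not isinstance(value, expected):
                raise ValueError(
                    "{}: field {!r} should be {}, got {!r}".format(
                        where, field, expected.__name__, value
                    )
                )
        return item


    validate({"name": "mochitest", "chunks": 3}, SIMPLE_SCHEMA, "test description")
    try:
        # A typo ("chunk" instead of "chunks") is caught instead of ignored.
        validate({"name": "mochitest", "chunk": 3}, SIMPLE_SCHEMA, "test description")
    except ValueError as exc:
        print(exc)  # test description: unknown field 'chunk'

Rejecting unknown fields is what turns a silent typo into an immediate error
during graph generation.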

Keyed By
........

Several fields in the input items can be "keyed by" another value in the item.
For example, a test description's chunks may be keyed by ``test-platform``.
In the item, this looks like:

.. code-block:: yaml

    chunks:
        by-test-platform:
            linux64/debug: 12
            linux64/opt: 8
            android.*: 14
            default: 10

This is a simple but powerful way to encode business rules in the items
provided as input to the transforms, rather than expressing those rules in the
transforms themselves.  If you are implementing a new business rule, prefer
this mode where possible.  The structure is easily resolved to a single value
using :func:`taskgraph.transforms.base.resolve_keyed_by`.

An exact match is used immediately.  If no exact match is found, each
alternative is treated as a regular expression, matched against the whole
value.  Thus ``android.*`` would match ``android-api-16/debug``.  If nothing
matches as a regular expression, but there is a ``default`` alternative, it is
used.  Otherwise, an exception is raised and graph generation stops.
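
The lookup order above can be sketched as follows.  ``resolve_keyed_by_sketch``
is a hypothetical simplification, not the real ``resolve_keyed_by``: it
resolves a single ``by-*`` level, the caller passes in the value being keyed on
directly, and it does not try to mirror how the real function treats nested or
ambiguous matches.

.. code-block:: python

    import re


    def resolve_keyed_by_sketch(value, key_value):
        """Resolve one ``by-*`` level of ``value`` against ``key_value``.

        Hypothetical simplification: the caller passes the value being keyed
        on (for example the item's test platform) directly.
        """
        if not (isinstance(value, dict) and len(value) == 1):
            return value  # a plain value, not keyed-by
        by_key, alternatives = next(iter(value.items()))
        if not by_key.startswith("by-"):
            return value
        # 1. An exact match is used immediately.
        if key_value in alternatives:
            return alternatives[key_value]
        # 2. Otherwise each alternative is a regular expression matched
        #    against the whole value.
        for pattern, result in alternatives.items():
            if pattern != "default" and re.fullmatch(pattern, key_value):
                return result
        # 3. Fall back to ``default``; with no default, fail loudly.
        if "default" in alternatives:
            return alternatives["default"]
        raise KeyError("no {} alternative matches {!r}".format(by_key, key_value))


    chunks = {
        "by-test-platform": {
            "linux64/debug": 12,
            "linux64/opt": 8,
            "android.*": 14,
            "default": 10,
        }
    }
    print(resolve_keyed_by_sketch(chunks, "linux64/opt"))  # 8: exact match
    print(resolve_keyed_by_sketch(chunks, "android-api-16/debug"))  # 14: regex
    print(resolve_keyed_by_sketch(chunks, "windows10-64/opt"))  # 10: default

Using ``re.fullmatch`` rather than ``re.match`` is what makes each pattern
apply to the whole value, as described above.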

Organization
------------

Task creation operates broadly in a few phases, with the interfaces of those
phases defined by schemas.  The process begins with the raw data structures
parsed from the YAML files in the kind configuration.  This data can be
processed by kind-specific transforms, resulting in a "test description" for
test jobs.  For non-test jobs, the next step is a "job description".  These
transformations may also "duplicate" tasks, for example to implement chunking
or several variations of the same task.

In any case, shared transforms then convert this into a "task description",
which the task-generation transforms then convert into a task definition
suitable for ``queue.createTask``.

Test Descriptions
-----------------

Test descriptions specify how to run a unittest or talos test.  They aim to
describe this abstractly, although in many cases the unique nature of
invocation on different platforms leaves a lot of specific behavior in the test
description, divided by ``by-test-platform``.

Test descriptions are validated to conform to the schema in
``taskcluster/taskgraph/transforms/tests.py``.  This schema is extensively
documented and is the primary reference for anyone modifying tests.

The output of ``tests.py`` is a task description.  Test dependencies are
produced in the form of a dictionary mapping dependency name to task label.

Job Descriptions
----------------

A job description says what to run in the task.  It is a combination of a
``run`` section and all of the fields from a task description.  The run section
has a ``using`` property that defines how this task should be run; for example,
``mozharness`` to run a mozharness script, or ``mach`` to run a mach command.
The remainder of the run section is specific to the run-using implementation.

The effect of a job description is to say "run this thing on this worker".  The
job description must contain enough information about the worker to identify
the workerType and the implementation (docker-worker, generic-worker, etc.).
Alternatively, job descriptions can specify the ``platforms`` field in
conjunction with the ``by-platform`` key to specify multiple workerTypes and
implementations.  Any other task-description information is passed along
verbatim, although it is augmented by the run-using implementation.
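
The dispatch on ``run.using`` can be sketched as a registry of handler
functions keyed by name.  Everything in this sketch is hypothetical: the
``register_run_using`` decorator, the handler body, and the ``run.mach`` field
are illustrative only.  The real run-using implementations and their schemas
live in ``taskcluster/taskgraph/transforms/job``.

.. code-block:: python

    # Hypothetical registry: ``run.using`` name -> handler function.
    RUN_USING = {}


    def register_run_using(name):
        """Hypothetical decorator registering a handler for ``run.using == name``."""
        def wrap(func):
            RUN_USING[name] = func
            return func
        return wrap


    @register_run_using("mach")
    def mach_handler(job, task):
        # Hypothetical: turn the ``run.mach`` string into a worker command line.
        task["worker"]["command"] = ["./mach"] + job["run"]["mach"].split()


    def job_to_task_description(job):
        """Convert a job description into a task description (sketch only)."""
        # Every field except ``run`` is passed along verbatim...
        task = {key: value for key, value in job.items() if key != "run"}
        task["worker"] = dict(task.get("worker", {}))  # copy before augmenting
        # ...and then augmented by the implementation named in ``run.using``.
        RUN_USING[job["run"]["using"]](job, task)
        return task


    job = {
        "description": "Run eslint",
        "worker": {"implementation": "docker-worker"},
        "run": {"using": "mach", "mach": "lint -l eslint"},
    }
    task = job_to_task_description(job)
    print(task["worker"])
    # {'implementation': 'docker-worker', 'command': ['./mach', 'lint', '-l', 'eslint']}

Keeping the handlers in a registry means adding a new ``using`` value touches
only its own module, not the code that walks the job descriptions.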

The run-using implementations are all located in
``taskcluster/taskgraph/transforms/job``, along with the schemas for their
implementations.  Those well-commented source files are the canonical
documentation for what constitutes a job description.

The following ``run-using`` implementations are available:

  * ``hazard``
  * ``mach``
  * ``mozharness``
  * ``mozharness-test``
  * ``run-task``
  * ``spidermonkey`` or ``spidermonkey-package`` or ``spidermonkey-mozjs-crate`` or ``spidermonkey-rust-bindings``
  * ``debian-package``
  * ``toolchain-script``
  * ``always-optimized``
  * ``fetch-url``
  * ``python-test``

Task Descriptions
-----------------

Every kind needs to create tasks, and all of those tasks have some things in
common.  They all run on one of a small set of worker implementations, each
with its own idiosyncrasies.  And they all report to TreeHerder in a similar
way.

The transforms in ``taskcluster/taskgraph/transforms/task.py`` implement
this common functionality.  They expect a "task description", and produce a
task definition.  The schema for a task description is defined at the top of
``task.py``, with copious comments.  Go forth and read it now!

In general, the task-description transforms handle functionality that is common
to all Gecko tasks.  While the schema is the definitive reference, the
functionality includes:

* TreeHerder metadata

* Build index routes

* Information about the projects on which this task should run

* Optimizations

* Defaults for ``expires-after`` and ``deadline-after``, based on project

* Worker configuration

The parts of the task description that are specific to a worker implementation
are isolated in a ``task_description['worker']`` object which has an
``implementation`` property naming the worker implementation.  Each worker
implementation has its own section of the schema describing the fields it
expects.  Thus the transforms that produce a task description must be aware of
the worker implementation to be used, but need not be aware of the details of
its payload format.
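
Because the worker-specific fields sit behind ``worker['implementation']``,
payload construction can dispatch on that one property.  The sketch below is
hypothetical: ``register_payload_builder``, the builder functions, and the
payload field names (``image``, ``command``, ``docker-image``) are illustrative
placeholders, not the real payload formats, which are defined by the
worker-specific sections of the schema in ``task.py``.

.. code-block:: python

    # Hypothetical registry: worker implementation name -> payload builder.
    PAYLOAD_BUILDERS = {}


    def register_payload_builder(name):
        """Hypothetical decorator registering a builder for one implementation."""
        def wrap(func):
            PAYLOAD_BUILDERS[name] = func
            return func
        return wrap


    @register_payload_builder("docker-worker")
    def build_docker_payload(worker):
        # Illustrative field names only; not the real docker-worker payload.
        return {"image": worker["docker-image"], "command": worker["command"]}


    @register_payload_builder("generic-worker")
    def build_generic_payload(worker):
        # Illustrative field names only; not the real generic-worker payload.
        return {"command": worker["command"]}


    def build_payload(task_description):
        worker = task_description["worker"]
        # This lookup is the only place that branches on the implementation.
        return PAYLOAD_BUILDERS[worker["implementation"]](worker)


    task_description = {
        "label": "example-task",
        "worker": {
            "implementation": "docker-worker",
            "docker-image": "ubuntu:18.04",
            "command": ["echo", "hello"],
        },
    }
    print(build_payload(task_description))
    # {'image': 'ubuntu:18.04', 'command': ['echo', 'hello']}

The code producing ``task_description`` had to choose ``docker-worker`` and
supply its fields, but it never builds the payload dictionary itself.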

The ``task.py`` file also contains a dictionary mapping Treeherder group
symbols to group names.  Feel free to add groups to this dictionary as
necessary.

Signing Descriptions
--------------------

Signing kinds are passed a single dependent job (from their kind dependency) to
act on.

The transforms in ``taskcluster/taskgraph/transforms/signing.py`` implement
this common functionality.  They expect a "signing description", and produce a
task definition.  The schema for a signing description is defined at the top of
``signing.py``, with copious comments.

In particular, you define a set of upstream artifact URLs (pointing at the
dependent task) and can optionally provide a dependency name (defaulting to
``build``) for use in ``task-reference``/``artifact-reference``.  You also need
to provide the signing formats to use.
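
To make the ``task-reference`` idea concrete, here is a simplified sketch of
how a reference such as ``<build>`` might be resolved once the dependency's
task ID is known.  It assumes that in a ``task-reference`` string, ``<name>``
stands for the task ID of the dependency called ``name``.  The function name,
the task ID, and the ``queue.example.invalid`` URL are hypothetical, and this
sketch does not cover ``artifact-reference``.

.. code-block:: python

    import re


    def resolve_task_reference(value, dependencies):
        """Replace each ``<name>`` in ``value`` with that dependency's task ID.

        Hypothetical simplification of task-reference resolution.
        """
        def substitute(match):
            name = match.group(1)
            if name not in dependencies:
                raise KeyError("unknown dependency {!r} in {!r}".format(name, value))
            return dependencies[name]

        return re.sub(r"<([^>]+)>", substitute, value)


    # Hypothetical: the dependent task is registered under the default name
    # ``build``, and its task ID is already known.
    dependencies = {"build": "abc123"}
    upstream_url = {
        "task-reference": "https://queue.example.invalid/task/<build>/artifacts/target.zip"
    }
    print(resolve_task_reference(upstream_url["task-reference"], dependencies))
    # https://queue.example.invalid/task/abc123/artifacts/target.zip

Referring to the dependency by name rather than by task ID is what lets the
signing description be written before the upstream task's ID exists.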

More Detail
-----------

The source files provide lots of additional detail, both in the code itself and
in the comments and docstrings.  For the next level of detail beyond this file,
consult the transform source under ``taskcluster/taskgraph/transforms``.