========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*.  A pattern may
represent a single architectural instruction or a group of related
instructions, depending on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits
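
As a sketch of how a pattern's fixed bits reduce to this pair, the C below converts a 32-bit pattern string into *fixedmask* and *fixedbits* and then applies the test above.  ``parse_fixedbits`` and ``pattern_matches`` are hypothetical names used only for illustration.  The sketch accepts only ``0``, ``1``, ``.``, ``-`` and spaces, so patterns containing fields such as ``rt:5`` are outside its scope.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical helper: convert a pattern written most-significant bit
    * first (spaces ignored) into the fixedmask/fixedbits pair.  '0' and
    * '1' are fixed bits; '.' and '-' are "don't care" and stay out of the
    * mask.  Returns false unless exactly 32 valid bit characters appear. */
   static bool parse_fixedbits(const char *pat, uint32_t *fixedmask,
                               uint32_t *fixedbits)
   {
       uint32_t mask = 0, bits = 0;
       int n = 0;

       for (; *pat != '\0'; pat++) {
           char c = *pat;

           if (c == ' ') {
               continue;
           }
           if (n == 32 || (c != '0' && c != '1' && c != '.' && c != '-')) {
               return false;
           }
           mask <<= 1;
           bits <<= 1;
           if (c == '0' || c == '1') {
               mask |= 1;
               bits |= (uint32_t)(c == '1');
           }
           n++;
       }
       if (n != 32) {
           return false;
       }
       *fixedmask = mask;
       *fixedbits = bits;
       return true;
   }

   /* The matching condition from the text. */
   static bool pattern_matches(uint32_t insn, uint32_t fixedmask,
                               uint32_t fixedbits)
   {
       return (insn & fixedmask) == fixedbits;
   }

For the PA-RISC ``nop`` pattern shown under Pattern Groups, this yields a mask of 0xfc00ffff and bits of 0x08000240, which agrees with the three nested tests in the generated code at the end of this document.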

Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator.  Typical examples are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( field )* ( !function=identifier )?
  field         := unnamed_field | named_field
  unnamed_field := number ':' ( 's' )? number
  named_field   := identifier ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field.  If the 's' is
present, the field is considered signed.
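
A minimal sketch of what an *unnamed_field* denotes, written as C helpers.  The names ``extract`` and ``sextract`` follow the notation of the field examples table below.  They are illustrative stand-ins rather than decodetree's own helpers (QEMU provides ``extract32()`` and ``sextract32()`` in ``include/qemu/bitops.h`` for this purpose).  The sketch assumes ``0 < len < 32`` and ``pos + len <= 32``.

.. code-block:: c

   #include <stdint.h>

   /* Unsigned field of 'len' bits whose least-significant bit is 'pos'
    * (bit 0 is the least-significant bit of the instruction). */
   static uint32_t extract(uint32_t insn, int pos, int len)
   {
       return (insn >> pos) & ((1u << len) - 1);
   }

   /* Signed field: the same bits, sign-extended from the field's top bit.
    * Widening to int64_t keeps the arithmetic free of overflow. */
   static int32_t sextract(uint32_t insn, int pos, int len)
   {
       int64_t v = extract(insn, pos, len);

       if (v & ((int64_t)1 << (len - 1))) {
           v -= (int64_t)1 << len;     /* top bit set: value is negative */
       }
       return (int32_t)v;
   }

So ``%disp 0:s16`` applied to an instruction whose low 16 bits are 0xffff gives -1, whereas an unsigned ``0:16`` field would give 65535.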

A *named_field* refers to some other field in the instruction pattern
or format.  Regardless of the length of the other field where it is
defined, it will be inserted into this field with the specified
signedness and bit width.

Field definitions that involve loops (i.e. where a field is defined
directly or indirectly in terms of itself) are errors.

A format can include fields that refer to named fields that are
defined in the instruction pattern(s) that use the format.
Conversely, an instruction pattern can include fields that refer to
named fields that are defined in the format it uses.  However, you
cannot currently do both at once (i.e. pattern P uses format F; F has
a field A that refers to a named field B that is defined in P, and P
has a field C that refers to a named field D that is defined in F).

If multiple ``fields`` are present, they are concatenated, with the
first field occupying the most-significant bits of the result.
In this way one can define disjoint fields.

If ``!function`` is specified, the concatenated result is passed through the
named function, taking and returning an integral value.

One may use ``!function`` with zero ``fields``.  This case is called
a *parameter*, and the named function is only passed the ``DisasContext``
and returns an integral value extracted from there.

A field with no ``fields`` and no ``!function`` is an error.
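
The following sketch shows the shape of a field function and of a parameter.  It assumes, as the description of parameters suggests, that a field's function receives the ``DisasContext`` in addition to the extracted value.  ``times_4``, ``vec_len_param``, the ``vec_len`` member and the field ``%disp_x4`` (imagined as ``%disp_x4 0:s16 !function=times_4``) are all hypothetical.  A real ``DisasContext`` is defined by each target.

.. code-block:: c

   #include <stdint.h>

   /* Minimal stand-in for a target's DisasContext (hypothetical member). */
   typedef struct DisasContext {
       int vec_len;
   } DisasContext;

   /* Sign-extended bits 15..0, i.e. the raw "0:s16" part of the field. */
   static int32_t disp16(uint32_t insn)
   {
       int32_t v = (int32_t)(insn & 0xffff);

       return v >= 0x8000 ? v - 0x10000 : v;
   }

   /* Hypothetical !function: scale a word displacement to bytes. */
   static int times_4(DisasContext *ctx, int x)
   {
       (void)ctx;              /* available, but not needed here */
       return x * 4;
   }

   /* Hypothetical parameter (a !function with zero fields): the value
    * comes from the DisasContext, not from the instruction. */
   static int vec_len_param(DisasContext *ctx)
   {
       return ctx->vec_len;
   }

   /* What decoding "%disp_x4 0:s16 !function=times_4" amounts to. */
   static int decode_disp_x4(DisasContext *ctx, uint32_t insn)
   {
       return times_4(ctx, disp16(insn));
   }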

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+
| %sz_imm 10:2 sz:3         | expand_sz_imm(extract(i, 10, 2) << 3 |      |
|   !function=expand_sz_imm |               extract(a->sz, 0, 3))         |
+---------------------------+---------------------------------------------+
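
The ``%imm9`` and ``%disp12`` rows can be checked with a small C sketch.  The function names are hypothetical.  Rather than shifting a negative ``sextract`` result left (undefined behaviour in C), the sketch concatenates the raw bits and sign-extends the 12-bit result once.  This gives the same value as the table's expression because the signed piece is the first, most-significant one.

.. code-block:: c

   #include <stdint.h>

   /* Unsigned field of 'len' bits starting at bit 'pos' (0 < len < 32). */
   static uint32_t extract(uint32_t insn, int pos, int len)
   {
       return (insn >> pos) & ((1u << len) - 1);
   }

   /* Sign-extend the low 'len' bits of v (0 < len < 32). */
   static int32_t sign_extend(uint32_t v, int len)
   {
       int64_t x = v & ((1u << len) - 1);

       if (x & ((int64_t)1 << (len - 1))) {
           x -= (int64_t)1 << len;
       }
       return (int32_t)x;
   }

   /* %imm9 16:6 10:3
    * The first field (bits 21..16) becomes bits 8..3 of the result and
    * the second field (bits 12..10) becomes bits 2..0. */
   static uint32_t field_imm9(uint32_t insn)
   {
       return (extract(insn, 16, 6) << 3) | extract(insn, 10, 3);
   }

   /* %disp12 0:s1 1:1 2:10
    * 1 + 1 + 10 = 12 bits; the leading 's' piece makes the whole
    * concatenation signed. */
   static int32_t field_disp12(uint32_t insn)
   {
       uint32_t raw = (extract(insn, 0, 1) << 11)
                    | (extract(insn, 1, 1) << 10)
                    | extract(insn, 2, 10);

       return sign_extend(raw, 12);
   }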

Argument Sets
=============

Syntax::

  args_def    := '&' identifier ( args_elt )+ ( !extern )?
  args_elt    := identifier (':' identifier)?

Each *args_elt* defines an argument within the argument set.
If the form of the *args_elt* contains a colon, the first
identifier is the argument name and the second identifier is
the argument type.  If the colon is missing, the argument
type will be ``int``.

Each argument set will be rendered as a C structure ``arg_$name``,
with one member for each argument in the set.

If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments.  This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.

Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset
  &longldst   reg base offset:int64_t
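
The following sketch shows roughly what the three example sets turn into, plus the shared-helper idea described above.  The structure layout follows the rules given here (one member per argument, ``int`` unless a type is given), but the exact code the generator emits may differ.  The signature of ``gen_logic_insn``, the ``op_and``/``op_or`` helpers and the toy ``DisasContext`` with a ``regs`` array are hypothetical.  The toy computes results directly, where a real translator would generate code for them.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* &reg3 ra rb rc: every argument defaults to int. */
   typedef struct {
       int ra;
       int rb;
       int rc;
   } arg_reg3;

   /* &loadstore reg base offset */
   typedef struct {
       int reg;
       int base;
       int offset;
   } arg_loadstore;

   /* &longldst reg base offset:int64_t: the explicit type is kept. */
   typedef struct {
       int reg;
       int base;
       int64_t offset;
   } arg_longldst;

   /* Toy translation context, standing in for the target's DisasContext. */
   typedef struct DisasContext {
       uint64_t regs[32];
   } DisasContext;

   typedef uint64_t logic_op(uint64_t, uint64_t);

   static uint64_t op_and(uint64_t x, uint64_t y) { return x & y; }
   static uint64_t op_or(uint64_t x, uint64_t y) { return x | y; }

   /* Hypothetical shared helper: works for any pattern whose operands
    * land in an arg_reg3. */
   static bool gen_logic_insn(DisasContext *ctx, const arg_reg3 *a,
                              logic_op *op)
   {
       ctx->regs[a->rc] = op(ctx->regs[a->ra], ctx->regs[a->rb]);
       return true;
   }

   static bool trans_AND(DisasContext *ctx, arg_reg3 *a)
   {
       return gen_logic_insn(ctx, a, op_and);
   }

   static bool trans_OR(DisasContext *ctx, arg_reg3 *a)
   {
       return gen_logic_insn(ctx, a, op_or);
   }

Because both translators receive an ``arg_reg3``, the helper needs no knowledge of which pattern matched.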


Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care.  The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the CPU and will not be specified.

A *field_elt* describes a simple field given only its width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.

If any *fixedbit_elt* or *field_elt* appears, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

A *field_ref* incorporates a field by reference.  This is the only way to
add a complex field to a format.  A field may be renamed in the process
via assignment to another identifier.  This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set.  If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaved with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5
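
To make the implied bit positions concrete, here is a hand-written C equivalent of extracting the fields of the two formats above.  The structure and function names are illustrative; decodetree chooses its own names for inferred argument sets.  The bit ranges follow from counting widths down from bit 31, e.g. for ``@opr``: 6 + 5 + 5 + 3 + 1 + 7 + 5 = 32 bits.

.. code-block:: c

   #include <stdint.h>

   /* Unsigned field of 'len' bits starting at bit 'pos' (0 < len < 32). */
   static uint32_t extract(uint32_t insn, int pos, int len)
   {
       return (insn >> pos) & ((1u << len) - 1);
   }

   /* Illustrative argument structures inferred from the two formats. */
   typedef struct { int ra, rb, rc; } arg_opr;
   typedef struct { int ra, lit, rc; } arg_opi;

   /* @opr  ...... ra:5  rb:5  ...   0   .......  rc:5
    * bits  31-26  25-21 20-16 15-13 12  11-5     4-0   */
   static void extract_opr(arg_opr *a, uint32_t insn)
   {
       a->ra = (int)extract(insn, 21, 5);
       a->rb = (int)extract(insn, 16, 5);
       a->rc = (int)extract(insn, 0, 5);
   }

   /* @opi  ...... ra:5  lit:8  1   .......  rc:5
    * bits  31-26  25-21 20-13  12  11-5     4-0   */
   static void extract_opi(arg_opi *a, uint32_t insn)
   {
       a->ra = (int)extract(insn, 21, 5);
       a->lit = (int)extract(insn, 13, 8);
       a->rc = (int)extract(insn, 0, 5);
   }

The fixed ``0`` and ``1`` at bit 12 are not extracted; they become part of the *fixedmask* and *fixedbits* of every pattern that uses the format.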

Patterns
========

Syntax::

  pat_def      := identifier ( pat_elt )+
  pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref      := '@' identifier
  const_elt    := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value.  This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.
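
A sketch of the idea behind *const_elt*, using two hypothetical patterns that share an argument set ``&shift rd rs sh``.  ``shl_n`` decodes ``sh`` from bits 15..11, while ``shl_1`` spells those bits as ``00001`` in its *fixedbit_elt* and supplies ``sh=1`` as a constant.  Every ``shl_1`` encoding also matches ``shl_n``, so the two would need to sit in an overlap group with ``shl_1`` first.  Both extraction functions are hand-written illustrations; the point is that one translator can then serve both patterns.

.. code-block:: c

   #include <stdint.h>

   /* &shift rd rs sh */
   typedef struct {
       int rd;
       int rs;
       int sh;
   } arg_shift;

   /* Hypothetical:  shl_n  000001 rd:5 rs:5 sh:5 ...........  &shift */
   static void extract_shl_n(arg_shift *a, uint32_t insn)
   {
       a->rd = (int)((insn >> 21) & 0x1f);
       a->rs = (int)((insn >> 16) & 0x1f);
       a->sh = (int)((insn >> 11) & 0x1f);
   }

   /* Hypothetical:  shl_1  000001 rd:5 rs:5 00001 ...........  &shift sh=1
    * Bits 15..11 are fixed bits here, so sh is not extracted; the
    * const_elt fills in the value those bits encode. */
   static void extract_shl_1(arg_shift *a, uint32_t insn)
   {
       a->rd = (int)((insn >> 21) & 0x1f);
       a->rs = (int)((insn >> 16) & 0x1f);
       a->sh = 1;
   }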

The decoder will call a translator function, named ``trans_`` followed by
the pattern name, for each pattern matched.

Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr)

and::

  trans_addl_i(ctx, &arg_opi)
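
Putting formats and patterns together, the sketch below hand-decodes the two ``addl`` patterns and calls their translators.  It uses the two-argument ``trans_`` form that appears in the generated code at the end of this document.  The argument structure names, the toy ``DisasContext`` and the simplified arithmetic (no 32-bit truncation or sign extension, as a real ADDL would perform) are assumptions made for illustration.  The combined mask includes bit 12, which the formats fix to 0 (``@opr``) or 1 (``@opi``); that bit is what distinguishes the two otherwise identical patterns.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Toy stand-in for the target's DisasContext. */
   typedef struct DisasContext {
       uint64_t regs[32];
   } DisasContext;

   /* Illustrative argument structures for @opr and @opi. */
   typedef struct { int ra, rb, rc; } arg_opr;
   typedef struct { int ra, lit, rc; } arg_opi;

   static bool trans_addl_r(DisasContext *ctx, arg_opr *a)
   {
       /* Simplified: plain 64-bit add of two registers. */
       ctx->regs[a->rc] = ctx->regs[a->ra] + ctx->regs[a->rb];
       return true;
   }

   static bool trans_addl_i(DisasContext *ctx, arg_opi *a)
   {
       /* Simplified: register plus the 8-bit literal. */
       ctx->regs[a->rc] = ctx->regs[a->ra] + (uint64_t)a->lit;
       return true;
   }

   static bool decode(DisasContext *ctx, uint32_t insn)
   {
       /* Pattern bits 31..26 = 010000 and 11..5 = 0000000, plus bit 12
        * from the format: 0 for @opr, 1 for @opi. */
       switch (insn & 0xfc001fe0u) {
       case 0x40000000u: {
           arg_opr a = {
               .ra = (int)((insn >> 21) & 0x1f),
               .rb = (int)((insn >> 16) & 0x1f),
               .rc = (int)(insn & 0x1f),
           };
           return trans_addl_r(ctx, &a);
       }
       case 0x40001000u: {
           arg_opi a = {
               .ra = (int)((insn >> 21) & 0x1f),
               .lit = (int)((insn >> 13) & 0xff),
               .rc = (int)(insn & 0x1f),
           };
           return trans_addl_i(ctx, &a);
       }
       }
       return false;           /* no pattern matched */
   }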

Pattern Groups
==============

Syntax::

  group            := overlap_group | no_overlap_group
  overlap_group    := '{' ( pat_def | group )+ '}'
  no_overlap_group := '[' ( pat_def | group )+ ']'

A *group* begins with a lone open-brace or open-bracket, with all
subsequent lines indented two spaces, and ends with a lone
close-brace or close-bracket.  Groups may be nested, increasing the
required indentation of the lines within the nested group to two
spaces per nesting level.

Patterns within overlap groups are allowed to overlap.  Conflicts are
resolved by selecting the patterns in order.  If all of the fixedbits
for a pattern match, its translate function will be called.  If the
translate function returns false, then subsequent patterns within the
group will be tried.

Patterns within no-overlap groups are not allowed to overlap, just
the same as ungrouped patterns.  Thus no-overlap groups are intended
to be nested inside overlap groups.

The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects
and may be specialized.  When the *rt* field is zero, the output
is discarded and so the instruction has no effect.  When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }
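
The generated code above tries the candidates in source order and falls through whenever a translator returns false.  The same behaviour can be sketched as a loop over a table in which the nested PA-RISC group is flattened in source order.  The table layout, the ``PICK_*`` enumeration and the policy of ``trans_copy`` declining when source and destination coincide are hypothetical, chosen only to exercise the fall-through.  To stay short, the toy translators receive the raw instruction instead of an argument structure.  The masks come from the patterns: ``nop`` fixes bits 31..26 and 15..0, ``copy`` fixes bits 31..21 and 15..5, and ``or`` fixes only bits 31..26 and 11..5.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Which toy translator accepted the instruction. */
   enum pick { PICK_NONE, PICK_NOP, PICK_COPY, PICK_OR };

   typedef struct DisasContext {
       enum pick chosen;
   } DisasContext;

   typedef bool trans_fn(DisasContext *ctx, uint32_t insn);

   typedef struct {
       uint32_t fixedmask;
       uint32_t fixedbits;
       trans_fn *trans;
   } Pattern;

   static bool trans_nop(DisasContext *ctx, uint32_t insn)
   {
       (void)insn;
       ctx->chosen = PICK_NOP;
       return true;
   }

   /* Toy policy: decline when r1 == rt, so the next pattern is tried. */
   static bool trans_copy(DisasContext *ctx, uint32_t insn)
   {
       uint32_t r1 = (insn >> 16) & 0x1f;
       uint32_t rt = insn & 0x1f;

       if (r1 == rt) {
           return false;
       }
       ctx->chosen = PICK_COPY;
       return true;
   }

   static bool trans_or(DisasContext *ctx, uint32_t insn)
   {
       (void)insn;
       ctx->chosen = PICK_OR;
       return true;
   }

   /* The PA-RISC group, flattened in source order. */
   static const Pattern or_group[] = {
       { 0xfc00ffffu, 0x08000240u, trans_nop },   /* nop  */
       { 0xffe0ffe0u, 0x08000240u, trans_copy },  /* copy */
       { 0xfc000fe0u, 0x08000240u, trans_or },    /* or   */
   };

   /* Try each matching pattern in order; the first translator that
    * returns true ends the search. */
   static bool decode_or_group(DisasContext *ctx, uint32_t insn)
   {
       for (size_t i = 0; i < sizeof(or_group) / sizeof(or_group[0]); i++) {
           const Pattern *p = &or_group[i];

           if ((insn & p->fixedmask) == p->fixedbits && p->trans(ctx, insn)) {
               return true;
           }
       }
       return false;
   }

Flattening preserves the semantics here because nested overlap groups are still tried in the order in which their patterns appear.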