docs/devel/decodetree.rst

   1 ========================
   2 Decodetree Specification
   3 ========================
   4
   5 A *decodetree* is built from instruction *patterns*.  A pattern may
   6 represent a single architectural instruction or a group of such
   7 instructions, depending on what is convenient for further processing.
   8
   9 Each pattern has both *fixedbits* and *fixedmask*, the combination of which
  10 describes the condition under which the pattern is matched::
  11
  12   (insn & fixedmask) == fixedbits
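As a minimal sketch of this test in C, the helper below applies the mask and compares the result. The name ``pattern_matches`` is hypothetical and is not something the generator is claimed to emit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when insn satisfies the match condition. */
static bool pattern_matches(uint32_t insn, uint32_t fixedmask,
                            uint32_t fixedbits)
{
    return (insn & fixedmask) == fixedbits;
}
```

For an illustrative pattern that fixes only a 6-bit opcode of ``010000`` in the top bits, fixedmask would be ``0xfc000000`` and fixedbits ``0x40000000``; every other bit of the insn is then free to vary.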
  13
  14 Each pattern may have *fields*, which are extracted from the insn and
  15 passed along to the translator.  Examples of such are registers,
  16 immediates, and sub-opcodes.
  17
  18 In support of patterns, one may declare *fields*, *argument sets*, and
  19 *formats*, each of which may be re-used to simplify further definitions.
  20
  21 Fields
  22 ======
  23
  24 Syntax::
  25
  26   field_def     := '%' identifier ( unnamed_field )* ( !function=identifier )?
  27   unnamed_field := number ':' ( 's' )? number
  28
  29 For *unnamed_field*, the first number is the least-significant bit position
  30 of the field and the second number is the length of the field.  If the 's' is
  31 present, the field is considered signed.  If multiple ``unnamed_fields`` are
  32 present, they are concatenated.  In this way one can define disjoint fields.
  33
  34 If ``!function`` is specified, the concatenated result is passed through the
  35 named function, taking and returning an integral value.
  36
  37 One may use ``!function`` with zero ``unnamed_fields``.  This case is called
  38 a *parameter*, and the named function is only passed the ``DisasContext``
  39 and returns an integral value extracted from there.
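The two uses of ``!function`` might be backed by C helpers along these lines. This is only a sketch: the ``DisasContext`` layout and the names ``times_4`` and ``current_mode`` are hypothetical, and passing the context as the first argument of the field function is an assumption about how the generator invokes it.

```c
/* Hypothetical translator state; real targets define their own. */
typedef struct DisasContext {
    int mode;
} DisasContext;

/* Field function: receives the concatenated field value and transforms it. */
static int times_4(DisasContext *ctx, int x)
{
    (void)ctx;              /* the context is not needed for this one */
    return x * 4;
}

/* Parameter function: no instruction bits are involved, only the context. */
static int current_mode(DisasContext *ctx)
{
    return ctx->mode;
}
```

Under these assumptions, a field declared with ``!function=times_4`` would yield four times its extracted value, while a declaration with ``!function=current_mode`` and no ``unnamed_fields`` would yield whatever the context currently holds.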
  40
  41 A field with no ``unnamed_fields`` and no ``!function`` is an error.
  42
  43 FIXME: the fields of the structure into which this result will be stored
  44 are restricted to ``int``, which means that we cannot expand 64-bit items.
  45
  46 Field examples:
  47
  48 +---------------------------+---------------------------------------------+
  49 | Input                     | Generated code                              |
  50 +===========================+=============================================+
  51 | %disp   0:s16             | sextract(i, 0, 16)                          |
  52 +---------------------------+---------------------------------------------+
  53 | %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
  54 +---------------------------+---------------------------------------------+
  55 | %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
  56 |                           |    extract(i, 1, 1) << 10 |                 |
  57 |                           |    extract(i, 2, 10)                        |
  58 +---------------------------+---------------------------------------------+
  59 | %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
  60 |   !function=expand_shimm8 |               extract(i, 13, 1))            |
  61 +---------------------------+---------------------------------------------+
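The first three rows of the table can be tried out with the sketch below. Here ``extract`` and ``sextract`` are local stand-ins written for this example, not the helpers the generated code actually links against, and the ``field_*`` wrappers are hypothetical names. Note from the table that the first-listed ``unnamed_field`` ends up in the most-significant bits of the concatenated result.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned bitfield of `len` bits whose least-significant bit is `pos`. */
static uint32_t extract(uint32_t value, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    return (value >> pos) & (~0u >> (32 - len));
}

/* Same field, sign-extended from its top bit (assumes two's complement). */
static int32_t sextract(uint32_t value, int pos, int len)
{
    uint32_t field = extract(value, pos, len);
    uint32_t sign = 1u << (len - 1);
    return (int32_t)((field ^ sign) - sign);
}

/* %disp 0:s16 */
static int field_disp(uint32_t insn)
{
    return sextract(insn, 0, 16);
}

/* %imm9 16:6 10:3 -- the first-listed piece lands in the high bits */
static int field_imm9(uint32_t insn)
{
    return (int)(extract(insn, 16, 6) << 3 | extract(insn, 10, 3));
}

/*
 * %disp12 0:s1 1:1 2:10 -- multiply rather than shift the signed piece,
 * because left-shifting a negative value is undefined in ISO C.
 */
static int field_disp12(uint32_t insn)
{
    return sextract(insn, 0, 1) * (1 << 11)
         | (int)(extract(insn, 1, 1) << 10)
         | (int)extract(insn, 2, 10);
}
```

Only the leading piece of a multi-part field carries the sign here; the remaining pieces are extracted unsigned and OR-ed in below it.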
  62
  63 Argument Sets
  64 =============
  65
  66 Syntax::
  67
  68   args_def    := '&' identifier ( args_elt )+ ( !extern )?
  69   args_elt    := identifier
  70
  71 Each *args_elt* defines an argument within the argument set.
  72 Each argument set will be rendered as a C structure "arg_$name"
  73 with each of the fields being one of the member arguments.
  74
  75 If ``!extern`` is specified, the backing structure is assumed
  76 to have been already declared, typically via a second decoder.
  77
  78 Argument sets are useful when one wants to define helper functions
  79 for the translator functions that can perform operations on a common
  80 set of arguments.  This can ensure, for instance, that the ``AND``
  81 pattern and the ``OR`` pattern put their operands into the same named
  82 structure, so that a common ``gen_logic_insn`` may be able to handle
  83 the operations common between the two.
  84
  85 Argument set examples::
  86
  87   &reg3       ra rb rc
  88   &loadstore  reg base offset
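As a rough illustration of the sharing described above, the sketch below renders ``&reg3`` as a C structure and routes two translator functions through one helper. Everything apart from the ``arg_reg3`` layout is hypothetical: the toy ``DisasContext`` holds a register array and the helper computes the result directly, whereas a real translator would emit code instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rendering of "&reg3 ra rb rc": one int member per argument. */
typedef struct {
    int ra;
    int rb;
    int rc;
} arg_reg3;

/* Hypothetical stand-in for translator state: a tiny register file. */
typedef struct DisasContext {
    uint32_t reg[32];
} DisasContext;

typedef uint32_t LogicFn(uint32_t, uint32_t);

static uint32_t op_and(uint32_t a, uint32_t b) { return a & b; }
static uint32_t op_or(uint32_t a, uint32_t b)  { return a | b; }

/* Shared helper: usable by any pattern that decodes into arg_reg3. */
static bool gen_logic_insn(DisasContext *ctx, arg_reg3 *a, LogicFn *fn)
{
    ctx->reg[a->rc] = fn(ctx->reg[a->ra], ctx->reg[a->rb]);
    return true;
}

static bool trans_AND(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, op_and);
}

static bool trans_OR(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, op_or);
}
```

Because both patterns decode into the same structure type, the helper needs no knowledge of which instruction it is serving.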
  89
  90
  91 Formats
  92 =======
  93
  94 Syntax::
  95
  96   fmt_def      := '@' identifier ( fmt_elt )+
  97   fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  98   fixedbit_elt := [01.-]+
  99   field_elt    := identifier ':' 's'? number
 100   field_ref    := '%' identifier | identifier '=' '%' identifier
 101   args_ref     := '&' identifier
 102
 103 Defining a format is a handy way to avoid replicating groups of fields
 104 across many instruction patterns.
 105
 106 A *fixedbit_elt* describes a contiguous sequence of bits that must
 107 be 1, 0, or don't care.  The difference between '.' and '-'
 108 is that '.' means that the bit will be covered with a field or a
 109 final 0 or 1 from the pattern, and '-' means that the bit is really
 110 ignored by the CPU and will not be specified.
 111
 112 A *field_elt* describes a simple field only given a width; the position of
 113 the field is implied by its position with respect to other *fixedbit_elt*
 114 and *field_elt*.
 115
 116 If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
 117 Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.
 118
 119 A *field_ref* incorporates a field by reference.  This is the only way to
 120 add a complex field to a format.  A field may be renamed in the process
 121 via assignment to another identifier.  This is intended to allow the
 122 same argument set to be used with disjoint named fields.
 123
 124 A single *args_ref* may specify an argument set to use for the format.
 125 The set of fields in the format must be a subset of the arguments in
 126 the argument set.  If an argument set is not specified, one will be
 127 inferred from the set of fields.
 128
 129 It is recommended, but not required, that all *field_ref* and *args_ref*
 130 appear at the end of the line, not interleaving with *fixedbit_elt* or
 131 *field_elt*.
 132
 133 Format examples::
 134
 135   @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
 136   @opi    ...... ra:5 lit:8    1 ....... rc:5
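To illustrate how field positions follow from the layout, the sketch below hand-derives the bit positions for the two formats above, reading each line from bit 31 on the left down to bit 0 on the right. The structure and function names are illustrative and are not claimed to match what the generator emits.

```c
#include <stdint.h>

typedef struct { int ra; int rb;  int rc; } arg_opr;
typedef struct { int ra; int lit; int rc; } arg_opi;

/* Unsigned field of `len` bits (len < 32) starting at bit `pos`. */
static int extract(uint32_t insn, int pos, int len)
{
    return (int)((insn >> pos) & ((1u << len) - 1));
}

/* @opr  ...... ra:5 rb:5 ... 0 ....... rc:5 */
static void extract_opr(arg_opr *a, uint32_t insn)
{
    a->ra = extract(insn, 21, 5);   /* bits 25..21, after six '.' */
    a->rb = extract(insn, 16, 5);   /* bits 20..16 */
    a->rc = extract(insn, 0, 5);    /* bits  4..0  */
}

/* @opi  ...... ra:5 lit:8 1 ....... rc:5 */
static void extract_opi(arg_opi *a, uint32_t insn)
{
    a->ra  = extract(insn, 21, 5);  /* bits 25..21 */
    a->lit = extract(insn, 13, 8);  /* bits 20..13 */
    a->rc  = extract(insn, 0, 5);   /* bits  4..0  */
}
```

Both formats account for all 32 bits (6+5+5+3+1+7+5 and 6+5+8+1+7+5), which is what the rule about defining every bit requires.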
 137
 138 Patterns
 139 ========
 140
 141 Syntax::
 142
 143   pat_def      := identifier ( pat_elt )+
 144   pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
 145   fmt_ref      := '@' identifier
 146   const_elt    := identifier '=' number
 147
 148 The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
 149 A pattern that does not specify a named format will have one inferred
 150 from a referenced argument set (if present) and the set of fields.
 151
 152 A *const_elt* allows an argument to be set to a constant value.  This may
 153 come in handy when fields overlap between patterns and one has to
 154 include the values in the *fixedbit_elt* instead.
 155
 156 The decoder will call a translator function for each pattern matched.
 157
 158 Pattern examples::
 159
 160   addl_r   010000 ..... ..... .... 0000000 ..... @opr
 161   addl_i   010000 ..... ..... .... 0000000 ..... @opi
 162
 163 which will, in part, invoke::
 164
 165   trans_addl_r(ctx, &arg_opr, insn)
 166
 167 and::
 168
 169   trans_addl_i(ctx, &arg_opi, insn)
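A hand-written approximation of the resulting dispatch is sketched below. The mask and case values are derived by hand from the two patterns and their formats (the opcode in bits 31..26, the fixed bit 12 contributed by the format, and the zero bits 11..5). The ``DisasContext`` contents and the recording translator functions are hypothetical, and the trailing ``insn`` argument is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int ra; int rb;  int rc; } arg_opr;
typedef struct { int ra; int lit; int rc; } arg_opi;

enum { NONE, ADDL_R, ADDL_I };

/* Hypothetical context that just records which translator ran. */
typedef struct DisasContext {
    int last;
    int operand;    /* rb for addl_r, lit for addl_i */
} DisasContext;

static int extract(uint32_t insn, int pos, int len)
{
    return (int)((insn >> pos) & ((1u << len) - 1));
}

static bool trans_addl_r(DisasContext *ctx, arg_opr *a)
{
    ctx->last = ADDL_R;
    ctx->operand = a->rb;
    return true;
}

static bool trans_addl_i(DisasContext *ctx, arg_opi *a)
{
    ctx->last = ADDL_I;
    ctx->operand = a->lit;
    return true;
}

static bool decode(DisasContext *ctx, uint32_t insn)
{
    arg_opr r;
    arg_opi i;

    switch (insn & 0xfc001fe0) {
    case 0x40000000:    /* 010000 ..... ..... ... 0 0000000 ..... */
        r.ra = extract(insn, 21, 5);
        r.rb = extract(insn, 16, 5);
        r.rc = extract(insn, 0, 5);
        return trans_addl_r(ctx, &r);
    case 0x40001000:    /* 010000 ..... ........ 1 0000000 ..... */
        i.ra = extract(insn, 21, 5);
        i.lit = extract(insn, 13, 8);
        i.rc = extract(insn, 0, 5);
        return trans_addl_i(ctx, &i);
    }
    return false;       /* no pattern matched */
}
```

The two patterns differ only in bit 12, which comes from the format rather than from the pattern line itself, so that bit has to be part of the mask for the decoder to tell them apart.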
 170
 171 Pattern Groups
 172 ==============
 173
 174 Syntax::
 175
 176   group            := overlap_group | no_overlap_group
 177   overlap_group    := '{' ( pat_def | group )+ '}'
 178   no_overlap_group := '[' ( pat_def | group )+ ']'
 179
 180 A *group* begins with a lone open-brace or open-bracket, with all
 181 subsequent lines indented two spaces, and ending with a lone
 182 close-brace or close-bracket.  Groups may be nested, increasing the
 183 required indentation of the lines within the nested group to two
 184 spaces per nesting level.
 185
 186 Patterns within overlap groups are allowed to overlap.  Conflicts are
 187 resolved by selecting the patterns in order.  If all of the fixedbits
 188 for a pattern match, its translate function will be called.  If the
 189 translate function returns false, then subsequent patterns within the
 190 group will be tried in turn.
 191
 192 Patterns within no-overlap groups are not allowed to overlap, just
 193 the same as ungrouped patterns.  Thus no-overlap groups are intended
 194 to be nested inside overlap groups.
 195
 196 The following example from PA-RISC shows specialization of the *or*
 197 instruction::
 198
 199   {
 200     {
 201       nop   000010 ----- ----- 0000 001001 0 00000
 202       copy  000010 00000 r1:5  0000 001001 0 rt:5
 203     }
 204     or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
 205   }
 206
 207 When the *cf* field is zero, the instruction has no side effects,
 208 and may be specialized.  When the *rt* field is zero, the output
 209 is discarded and so the instruction has no effect.  When the *rt2*
 210 field is zero, the operation is ``reg[r1] | 0`` and so encodes
 211 the canonical register copy operation.
 212
 213 The output from the generator might look like::
 214
 215   switch (insn & 0xfc000fe0) {
 216   case 0x08000240:
 217     /* 000010.. ........ ....0010 010..... */
 218     if ((insn & 0x0000f000) == 0x00000000) {
 219         /* 000010.. ........ 00000010 010..... */
 220         if ((insn & 0x0000001f) == 0x00000000) {
 221             /* 000010.. ........ 00000010 01000000 */
 222             extract_decode_Fmt_0(&u.f_decode0, insn);
 223             if (trans_nop(ctx, &u.f_decode0)) return true;
 224         }
 225         if ((insn & 0x03e00000) == 0x00000000) {
 226             /* 00001000 000..... 00000010 010..... */
 227             extract_decode_Fmt_1(&u.f_decode1, insn);
 228             if (trans_copy(ctx, &u.f_decode1)) return true;
 229         }
 230     }
 231     extract_decode_Fmt_2(&u.f_decode2, insn);
 232     if (trans_or(ctx, &u.f_decode2)) return true;
 233     return false;
 234   }
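To experiment with the ordering rules, the generated fragment above can be wrapped into a self-contained function. In this sketch the translator functions are stubs that only record which pattern was selected and the argument extraction is omitted. ``DisasContext``, the ``reject_copy`` flag, and the ``enc_or`` encoder are hypothetical devices for demonstrating the fall-through when a translate function returns false.

```c
#include <stdbool.h>
#include <stdint.h>

enum { SEL_NONE, SEL_NOP, SEL_COPY, SEL_OR };

/* Hypothetical context: remembers the chosen pattern. */
typedef struct DisasContext {
    int selected;
    bool reject_copy;   /* make trans_copy decline, for demonstration */
} DisasContext;

static bool trans_nop(DisasContext *ctx)
{
    ctx->selected = SEL_NOP;
    return true;
}

static bool trans_copy(DisasContext *ctx)
{
    if (ctx->reject_copy) {
        return false;   /* decoder moves on to the next pattern */
    }
    ctx->selected = SEL_COPY;
    return true;
}

static bool trans_or(DisasContext *ctx)
{
    ctx->selected = SEL_OR;
    return true;
}

static bool decode(DisasContext *ctx, uint32_t insn)
{
    switch (insn & 0xfc000fe0) {
    case 0x08000240:
        if ((insn & 0x0000f000) == 0) {         /* cf == 0 */
            if ((insn & 0x0000001f) == 0) {     /* rt == 0 */
                if (trans_nop(ctx)) return true;
            }
            if ((insn & 0x03e00000) == 0) {     /* rt2 == 0 */
                if (trans_copy(ctx)) return true;
            }
        }
        return trans_or(ctx);                   /* general case */
    }
    return false;
}

/* Hypothetical encoder used only to build test instructions. */
static uint32_t enc_or(unsigned rt2, unsigned r1, unsigned cf, unsigned rt)
{
    return 0x08000240u | rt2 << 21 | r1 << 16 | cf << 12 | rt;
}
```

With ``reject_copy`` set, an encoding that would otherwise be treated as *copy* falls through to *or*, mirroring the rule that later patterns in an overlap group are tried when an earlier translate function returns false.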