docs/AMDGPUOperandSyntax.rst

   1 =====================================
   2 Syntax of AMDGPU Instruction Operands
   3 =====================================
   4
   5 .. contents::
   6    :local:
   7
   8 Conventions
   9 ===========
  10
  11 The following notation is used throughout this document:
  12
  13     =================== =============================================================================
  14     Notation            Description
  15     =================== =============================================================================
  16     {0..N}              Any integer value in the range from 0 to N (inclusive).
  17     <x>                 Syntax and meaning of *x* are explained elsewhere.
  18     =================== =============================================================================
  19
  20 .. _amdgpu_syn_operands:
  21
  22 Operands
  23 ========
  24
  25 .. _amdgpu_synid_v:
  26
  27 v
  28 -
  29
  30 Vector registers. There are 256 32-bit vector registers.
  31
  32 A sequence of *vector* registers may be used to operate on more than 32 bits of data.
  33
  34 The assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* registers.
  35
  36     =================================================== ====================================================================
  37     Syntax                                              Description
  38     =================================================== ====================================================================
  39     **v**\<N>                                           A single 32-bit *vector* register.
  40
  41                                                         *N* must be a decimal
  42                                                         :ref:`integer number<amdgpu_synid_integer_number>`.
  43     **v[**\ <N>\ **]**                                  A single 32-bit *vector* register.
  44
  45                                                         *N* may be specified as an
  46                                                         :ref:`integer number<amdgpu_synid_integer_number>`
  47                                                         or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
  48     **v[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *vector* registers.
  49
  50                                                         *N* and *K* may be specified as
  51                                                         :ref:`integer numbers<amdgpu_synid_integer_number>`
  52                                                         or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
  53     **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *vector* registers.
  54
  55                                                         Register indices must be specified as decimal
  56                                                         :ref:`integer numbers<amdgpu_synid_integer_number>`.
  57     =================================================== ====================================================================
  58
  59 Note: *N* and *K* must satisfy the following conditions:
  60
  61 * *N* <= *K*.
  62 * 0 <= *N* <= 255.
  63 * 0 <= *K* <= 255.
  64 * *K-N+1* must be equal to 1, 2, 3, 4, 8 or 16.
  65
  66 Examples:
  67
  68 .. parsed-literal::
  69
  70   v255
  71   v[0]
  72   v[0:1]
  73   v[1:1]
  74   v[0:3]
  75   v[2*2]
  76   v[1-1:2-1]
  77   [v252]
  78   [v252,v253,v254,v255]
  79
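The conditions on *N* and *K* listed above can be sketched as a small check. This is only an illustration in Python, not the assembler's implementation; the name ``is_valid_vector_range`` is hypothetical.

```python
# Hypothetical helper (not part of the assembler): checks the documented
# conditions on N and K for an operand written as v[N:K].
VALID_VECTOR_SEQUENCE_SIZES = {1, 2, 3, 4, 8, 16}


def is_valid_vector_range(n: int, k: int) -> bool:
    """Return True if v[n:k] satisfies the conditions listed above."""
    # Both indices must name one of the 256 vector registers.
    if not (0 <= n <= 255 and 0 <= k <= 255):
        return False
    # The first index must not exceed the last one.
    if n > k:
        return False
    # Only sequences of 1, 2, 3, 4, 8 or 16 registers are supported.
    return (k - n + 1) in VALID_VECTOR_SEQUENCE_SIZES


if __name__ == "__main__":
    for n, k in [(0, 3), (1, 1), (252, 255), (0, 4)]:
        print(f"v[{n}:{k}] valid: {is_valid_vector_range(n, k)}")
```

Under these rules ``v[0:4]`` is rejected because it names five registers, which is not a supported sequence size.
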
  80 .. _amdgpu_synid_nsa:
  81
  82 GFX10 *Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*:
  83
  84     ===================================== =================================================
  85     Syntax                                Description
  86     ===================================== =================================================
  87     **[Vm**, \ **Vn**, ... **Vk**\ **]**  A sequence of 32-bit *vector* registers.
  88                                           Each register may be specified using a syntax
  89                                           defined :ref:`above<amdgpu_synid_v>`.
  90
  91                                           In contrast with standard syntax, registers
  92                                           in *NSA* sequence are not required to have
  93                                           consecutive indices. Moreover, the same register
  94                                           may appear in the list more than once.
  95     ===================================== =================================================
  96
  97 Examples:
  98
  99 .. parsed-literal::
 100
 101   [v32,v1,v[2]]
 102   [v[32],v[1:1],[v2]]
 103   [v4,v4,v4,v4]
 104
 105 .. _amdgpu_synid_s:
 106
 107 s
 108 -
 109
 110 Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:
 111
 112     ======= ============================
 113     GPU     Number of *scalar* registers
 114     ======= ============================
 115     GFX7    104
 116     GFX8    102
 117     GFX9    102
 118     GFX10   106
 119     ======= ============================
 120
 121 A sequence of *scalar* registers may be used to operate on more than 32 bits of data.
 122 The assembler currently supports sequences of 1, 2, 4, 8 and 16 *scalar* registers.
 123
 124 Pairs of *scalar* registers must be even-aligned (the first register must be even).
 125 Sequences of 4 or more *scalar* registers must be quad-aligned (the index of the first register must be a multiple of 4).
 126
 127     ======================================================== ====================================================================
 128     Syntax                                                   Description
 129     ======================================================== ====================================================================
 130     **s**\ <N>                                               A single 32-bit *scalar* register.
 131
 132                                                              *N* must be a decimal
 133                                                              :ref:`integer number<amdgpu_synid_integer_number>`.
 134
 135     **s[**\ <N>\ **]**                                       A single 32-bit *scalar* register.
 136
 137                                                              *N* may be specified as an
 138                                                              :ref:`integer number<amdgpu_synid_integer_number>`
 139                                                              or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 140     **s[**\ <N>:<K>\ **]**                                   A sequence of (\ *K-N+1*\ ) *scalar* registers.
 141
 142                                                              *N* and *K* may be specified as
 143                                                              :ref:`integer numbers<amdgpu_synid_integer_number>`
 144                                                              or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
 145
 146     **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]**       A sequence of (\ *K-N+1*\ ) *scalar* registers.
 147
 148                                                              Register indices must be specified as decimal
 149                                                              :ref:`integer numbers<amdgpu_synid_integer_number>`.
 150     ======================================================== ====================================================================
 151
 152 Note: *N* and *K* must satisfy the following conditions:
 153
 154 * *N* must be properly aligned based on sequence size.
 155 * *N* <= *K*.
 156 * 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
 157 * 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
 158 * *K-N+1* must be equal to 1, 2, 4, 8 or 16.
 159
 160 Examples:
 161
 162 .. parsed-literal::
 163
 164   s0
 165   s[0]
 166   s[0:1]
 167   s[1:1]
 168   s[0:3]
 169   s[2*2]
 170   s[1-1:2-1]
 171   [s4]
 172   [s4,s5,s6,s7]
 173
 174 Examples of *scalar* registers with an invalid alignment:
 175
 176 .. parsed-literal::
 177
 178   s[1:2]
 179   s[2:5]
 180
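The size, range and alignment rules for *scalar* register sequences can be sketched as follows. This is an illustrative Python check under the rules stated above, not the assembler's code; ``is_valid_scalar_range`` and ``SCALAR_REGISTER_COUNT`` are hypothetical names, and the register counts are copied from the table in this section.

```python
# Register counts copied from the table in this section.
SCALAR_REGISTER_COUNT = {"GFX7": 104, "GFX8": 102, "GFX9": 102, "GFX10": 106}


def is_valid_scalar_range(n: int, k: int, gpu: str) -> bool:
    """Return True if s[n:k] satisfies the documented conditions for `gpu`."""
    smax = SCALAR_REGISTER_COUNT[gpu]
    # N <= K, and both indices must be below SMAX.
    if not (0 <= n <= k < smax):
        return False
    size = k - n + 1
    if size not in (1, 2, 4, 8, 16):
        return False
    if size == 1:
        return True  # a single register needs no alignment
    # Pairs are even-aligned; sequences of 4 or more are quad-aligned.
    alignment = 2 if size == 2 else 4
    return n % alignment == 0


if __name__ == "__main__":
    for n, k in [(0, 1), (0, 3), (1, 2), (2, 5)]:
        print(f"s[{n}:{k}] valid on GFX9: {is_valid_scalar_range(n, k, 'GFX9')}")
```

The two invalid examples above fail this check for the documented reasons: ``s[1:2]`` is a pair that starts at an odd index, and ``s[2:5]`` is a sequence of four that does not start at a multiple of 4.
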
 181 .. _amdgpu_synid_trap:
 182
 183 trap
 184 ----
 185
 186 A set of trap handler registers:
 187
 188 * :ref:`ttmp<amdgpu_synid_ttmp>`
 189 * :ref:`tba<amdgpu_synid_tba>`
 190 * :ref:`tma<amdgpu_synid_tma>`
 191
 192 .. _amdgpu_synid_ttmp:
 193
 194 ttmp
 195 ----
 196
 197 Trap handler temporary scalar registers, 32 bits wide.
 198 The number of available *ttmp* registers depends on the GPU:
 199
 200     ======= ===========================
 201     GPU     Number of *ttmp* registers
 202     ======= ===========================
 203     GFX7    12
 204     GFX8    12
 205     GFX9    16
 206     GFX10   16
 207     ======= ===========================
 208
 209 A sequence of *ttmp* registers may be used to operate on more than 32 bits of data.
 210 The assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers.
 211
 212 Pairs of *ttmp* registers must be even-aligned (the first register must be even).
 213 Sequences of 4 or more *ttmp* registers must be quad-aligned (the index of the first register must be a multiple of 4).
 214
 215     ============================================================= ====================================================================
 216     Syntax                                                        Description
 217     ============================================================= ====================================================================
 218     **ttmp**\ <N>                                                 A single 32-bit *ttmp* register.
 219
 220                                                                   *N* must be a decimal
 221                                                                   :ref:`integer number<amdgpu_synid_integer_number>`.
 222     **ttmp[**\ <N>\ **]**                                         A single 32-bit *ttmp* register.
 223
 224                                                                   *N* may be specified as an
 225                                                                   :ref:`integer number<amdgpu_synid_integer_number>`
 226                                                                   or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 227     **ttmp[**\ <N>:<K>\ **]**                                     A sequence of (\ *K-N+1*\ ) *ttmp* registers.
 228
 229                                                                   *N* and *K* may be specified as
 230                                                                   :ref:`integer numbers<amdgpu_synid_integer_number>`
 231                                                                   or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
 232     **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]**   A sequence of (\ *K-N+1*\ ) *ttmp* registers.
 233
 234                                                                   Register indices must be specified as decimal
 235                                                                   :ref:`integer numbers<amdgpu_synid_integer_number>`.
 236     ============================================================= ====================================================================
 237
 238 Note: *N* and *K* must satisfy the following conditions:
 239
 240 * *N* must be properly aligned based on sequence size.
 241 * *N* <= *K*.
 242 * 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
 243 * 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
 244 * *K-N+1* must be equal to 1, 2, 4, 8 or 16.
 245
 246 Examples:
 247
 248 .. parsed-literal::
 249
 250   ttmp0
 251   ttmp[0]
 252   ttmp[0:1]
 253   ttmp[1:1]
 254   ttmp[0:3]
 255   ttmp[2*2]
 256   ttmp[1-1:2-1]
 257   [ttmp4]
 258   [ttmp4,ttmp5,ttmp6,ttmp7]
 259
 260 Examples of *ttmp* registers with an invalid alignment:
 261
 262 .. parsed-literal::
 263
 264   ttmp[1:2]
 265   ttmp[2:5]
 266
 267 .. _amdgpu_synid_tba:
 268
 269 tba
 270 ---
 271
 272 Trap base address, 64 bits wide. Holds the pointer to the current trap handler program.
 273
 274     ================== ======================================================================= =============
 275     Syntax             Description                                                             Availability
 276     ================== ======================================================================= =============
 277     tba                64-bit *trap base address* register.                                    GFX7, GFX8
 278     [tba]              64-bit *trap base address* register (an SP3 syntax).                    GFX7, GFX8
 279     [tba_lo,tba_hi]    64-bit *trap base address* register (an SP3 syntax).                    GFX7, GFX8
 280     ================== ======================================================================= =============
 281
 282 High and low 32 bits of *trap base address* may be accessed as separate registers:
 283
 284     ================== ======================================================================= =============
 285     Syntax             Description                                                             Availability
 286     ================== ======================================================================= =============
 287     tba_lo             Low 32 bits of *trap base address* register.                            GFX7, GFX8
 288     tba_hi             High 32 bits of *trap base address* register.                           GFX7, GFX8
 289     [tba_lo]           Low 32 bits of *trap base address* register (an SP3 syntax).            GFX7, GFX8
 290     [tba_hi]           High 32 bits of *trap base address* register (an SP3 syntax).           GFX7, GFX8
 291     ================== ======================================================================= =============
 292
 293 Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9 and GFX10,
 294 but *tba* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
 295
 296 .. _amdgpu_synid_tma:
 297
 298 tma
 299 ---
 300
 301 Trap memory address, 64 bits wide.
 302
 303     ================= ======================================================================= ==================
 304     Syntax            Description                                                             Availability
 305     ================= ======================================================================= ==================
 306     tma               64-bit *trap memory address* register.                                  GFX7, GFX8
 307     [tma]             64-bit *trap memory address* register (an SP3 syntax).                  GFX7, GFX8
 308     [tma_lo,tma_hi]   64-bit *trap memory address* register (an SP3 syntax).                  GFX7, GFX8
 309     ================= ======================================================================= ==================
 310
 311 High and low 32 bits of *trap memory address* may be accessed as separate registers:
 312
 313     ================= ======================================================================= ==================
 314     Syntax            Description                                                             Availability
 315     ================= ======================================================================= ==================
 316     tma_lo            Low 32 bits of *trap memory address* register.                          GFX7, GFX8
 317     tma_hi            High 32 bits of *trap memory address* register.                         GFX7, GFX8
 318     [tma_lo]          Low 32 bits of *trap memory address* register (an SP3 syntax).          GFX7, GFX8
 319     [tma_hi]          High 32 bits of *trap memory address* register (an SP3 syntax).         GFX7, GFX8
 320     ================= ======================================================================= ==================
 321
 322 Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9 and GFX10,
 323 but *tma* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
 324
 325 .. _amdgpu_synid_flat_scratch:
 326
 327 flat_scratch
 328 ------------
 329
 330 Flat scratch address, 64 bits wide. Holds the base address of scratch memory.
 331
 332     ================================== ================================================================
 333     Syntax                             Description
 334     ================================== ================================================================
 335     flat_scratch                       64-bit *flat scratch* address register.
 336     [flat_scratch]                     64-bit *flat scratch* address register (an SP3 syntax).
 337     [flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an SP3 syntax).
 338     ================================== ================================================================
 339
 340 High and low 32 bits of *flat scratch* address may be accessed as separate registers:
 341
 342     ========================= =========================================================================
 343     Syntax                    Description
 344     ========================= =========================================================================
 345     flat_scratch_lo           Low 32 bits of *flat scratch* address register.
 346     flat_scratch_hi           High 32 bits of *flat scratch* address register.
 347     [flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an SP3 syntax).
 348     [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an SP3 syntax).
 349     ========================= =========================================================================
 350
 351 .. _amdgpu_synid_xnack:
 352
 353 xnack
 354 -----
 355
 356 Xnack mask, 64 bits wide. Holds a 64-bit mask of which threads
 357 received an *XNACK* due to a vector memory operation.
 358
 359 .. WARNING:: GFX7 does not support the *xnack* feature. For availability of this feature in other GPUs, refer to :ref:`this table<amdgpu-processors>`.
 360
 361 \
 362
 363     ============================== =====================================================
 364     Syntax                         Description
 365     ============================== =====================================================
 366     xnack_mask                     64-bit *xnack mask* register.
 367     [xnack_mask]                   64-bit *xnack mask* register (an SP3 syntax).
 368     [xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an SP3 syntax).
 369     ============================== =====================================================
 370
 371 High and low 32 bits of *xnack mask* may be accessed as separate registers:
 372
 373     ===================== ==============================================================
 374     Syntax                Description
 375     ===================== ==============================================================
 376     xnack_mask_lo         Low 32 bits of *xnack mask* register.
 377     xnack_mask_hi         High 32 bits of *xnack mask* register.
 378     [xnack_mask_lo]       Low 32 bits of *xnack mask* register (an SP3 syntax).
 379     [xnack_mask_hi]       High 32 bits of *xnack mask* register (an SP3 syntax).
 380     ===================== ==============================================================
 381
 382 .. _amdgpu_synid_vcc:
 383 .. _amdgpu_synid_vcc_lo:
 384
 385 vcc
 386 ---
 387
 388 Vector condition code, 64 bits wide. A bit mask with one bit per thread;
 389 it holds the result of a vector compare operation.
 390
 391 Note that GFX10 H/W does not use the high 32 bits of *vcc* in *wave32* mode.
 392
 393     ================ =========================================================================
 394     Syntax           Description
 395     ================ =========================================================================
 396     vcc              64-bit *vector condition code* register.
 397     [vcc]            64-bit *vector condition code* register (an SP3 syntax).
 398     [vcc_lo,vcc_hi]  64-bit *vector condition code* register (an SP3 syntax).
 399     ================ =========================================================================
 400
 401 High and low 32 bits of *vector condition code* may be accessed as separate registers:
 402
 403     ================ =========================================================================
 404     Syntax           Description
 405     ================ =========================================================================
 406     vcc_lo           Low 32 bits of *vector condition code* register.
 407     vcc_hi           High 32 bits of *vector condition code* register.
 408     [vcc_lo]         Low 32 bits of *vector condition code* register (an SP3 syntax).
 409     [vcc_hi]         High 32 bits of *vector condition code* register (an SP3 syntax).
 410     ================ =========================================================================
 411
 412 .. _amdgpu_synid_m0:
 413
 414 m0
 415 --
 416
 417 A 32-bit memory register. It has various uses,
 418 including register indexing and bounds checking.
 419
 420     =========== ===================================================
 421     Syntax      Description
 422     =========== ===================================================
 423     m0          A 32-bit *memory* register.
 424     [m0]        A 32-bit *memory* register (an SP3 syntax).
 425     =========== ===================================================
 426
 427 .. _amdgpu_synid_exec:
 428
 429 exec
 430 ----
 431
 432 Execute mask, 64 bits wide. A bit mask with one bit per thread,
 433 which is applied to vector instructions and controls which threads execute
 434 and which ignore the instruction.
 435
 436 Note that GFX10 H/W does not use the high 32 bits of *exec* in *wave32* mode.
 437
 438     ===================== =================================================================
 439     Syntax                Description
 440     ===================== =================================================================
 441     exec                  64-bit *execute mask* register.
 442     [exec]                64-bit *execute mask* register (an SP3 syntax).
 443     [exec_lo,exec_hi]     64-bit *execute mask* register (an SP3 syntax).
 444     ===================== =================================================================
 445
 446 High and low 32 bits of *execute mask* may be accessed as separate registers:
 447
 448     ===================== =================================================================
 449     Syntax                Description
 450     ===================== =================================================================
 451     exec_lo               Low 32 bits of *execute mask* register.
 452     exec_hi               High 32 bits of *execute mask* register.
 453     [exec_lo]             Low 32 bits of *execute mask* register (an SP3 syntax).
 454     [exec_hi]             High 32 bits of *execute mask* register (an SP3 syntax).
 455     ===================== =================================================================
 456
 457 .. _amdgpu_synid_vccz:
 458
 459 vccz
 460 ----
 461
 462 A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
 463
 464 Note: when GFX10 operates in *wave32* mode, this register reflects the state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
 465
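The relationship between *vcc*, its 32-bit halves and *vccz* can be modeled in a few lines. This is a rough Python model of the description above, not of the hardware; the function names and the ``wave32`` parameter are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_lo_hi(value: int) -> tuple[int, int]:
    """Split a 64-bit register value into its (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def vccz(vcc: int, wave32: bool = False) -> bool:
    """Model of the vccz flag: True when the relevant vcc bits are all zero."""
    vcc_lo, vcc_hi = split_lo_hi(vcc)
    if wave32:
        # In wave32 mode only vcc_lo is taken into account.
        return vcc_lo == 0
    return vcc_lo == 0 and vcc_hi == 0


if __name__ == "__main__":
    value = 1 << 40  # a single bit set in the high half
    print(split_lo_hi(value))
    print(vccz(value), vccz(value, wave32=True))
```

In this model, a value with bits set only in the high half clears *vccz* in the default mode but leaves it set in *wave32* mode.
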
 466 .. _amdgpu_synid_execz:
 467
 468 execz
 469 -----
 470
 471 A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.
 472
 473 Note: when GFX10 operates in *wave32* mode, this register reflects the state of :ref:`exec_lo<amdgpu_synid_exec>`.
 474
 475 .. _amdgpu_synid_scc:
 476
 477 scc
 478 ---
 479
 480 A single bit flag indicating the result of a scalar compare operation.
 481
 482 .. _amdgpu_synid_lds_direct:
 483
 484 lds_direct
 485 ----------
 486
 487 A special operand which supplies a 32-bit value
 488 fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.
 489
 490 .. _amdgpu_synid_null:
 491
 492 null
 493 ----
 494
 495 This is a special operand which may be used as a source or a destination.
 496
 497 When used as a destination, the result of the operation is discarded.
 498
 499 When used as a source, it supplies a zero value.
 500
 501 GFX10 only.
 502
 503 .. WARNING:: Due to a H/W bug, this operand cannot be used with VALU instructions in the first generation of GFX10.
 504
 505 .. _amdgpu_synid_constant:
 506
 507 inline constant
 508 ---------------
 509
 510 An *inline constant* is an integer or a floating-point value encoded as a part of an instruction.
 511 Compare *inline constants* with :ref:`literals<amdgpu_synid_literal>`.
 512
 513 Inline constants include:
 514
 515 * :ref:`iconst<amdgpu_synid_iconst>`
 516 * :ref:`fconst<amdgpu_synid_fconst>`
 517 * :ref:`ival<amdgpu_synid_ival>`
 518
 519 If a number may be encoded as either
 520 a :ref:`literal<amdgpu_synid_literal>` or
 521 a :ref:`constant<amdgpu_synid_constant>`,
 522 the assembler selects the latter encoding because it is more efficient.
 523
 524 .. _amdgpu_synid_iconst:
 525
 526 iconst
 527 ~~~~~~
 528
 529 An :ref:`integer number<amdgpu_synid_integer_number>` or
 530 an :ref:`absolute expression<amdgpu_synid_absolute_expression>`
 531 encoded as an *inline constant*.
 532
 533 Only a small fraction of integer numbers may be encoded as *inline constants*.
 534 They are enumerated in the table below.
 535 Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
 536
 537     ================================== ====================================
 538     Value                              Note
 539     ================================== ====================================
 540     {0..64}                            Non-negative integer inline constants.
 541     {-16..-1}                          Negative integer inline constants.
 542     ================================== ====================================
 543
 544 .. WARNING:: GFX7 does not support inline constants for *f16* operands.
 545
 546 .. _amdgpu_synid_fconst:
 547
 548 fconst
 549 ~~~~~~
 550
 551 A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
 552 encoded as an *inline constant*.
 553
 554 Only a small fraction of floating-point numbers may be encoded as *inline constants*.
 555 They are enumerated in the table below.
 556 Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
 557
 558     ===================== ===================================================== ==================
 559     Value                 Note                                                  Availability
 560     ===================== ===================================================== ==================
 561     0.0                   The same as integer constant 0.                       All GPUs
 562     0.5                   Floating-point constant 0.5                           All GPUs
 563     1.0                   Floating-point constant 1.0                           All GPUs
 564     2.0                   Floating-point constant 2.0                           All GPUs
 565     4.0                   Floating-point constant 4.0                           All GPUs
 566     -0.5                  Floating-point constant -0.5                          All GPUs
 567     -1.0                  Floating-point constant -1.0                          All GPUs
 568     -2.0                  Floating-point constant -2.0                          All GPUs
 569     -4.0                  Floating-point constant -4.0                          All GPUs
 570     0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8, GFX9, GFX10
 571     0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8, GFX9, GFX10
 572     0.15915494309189532   1.0/(2.0*pi).                                         GFX8, GFX9, GFX10
 573     ===================== ===================================================== ==================
 574
 575 .. WARNING:: GFX7 does not support inline constants for *f16* operands.
 576
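The choice between an *inline constant* and a *literal* can be sketched as a lookup against the *iconst* and *fconst* tables above. This is a simplified Python illustration: the names are hypothetical, and the GPU-specific ``1.0/(2.0*pi)`` entries and operand-size rules are deliberately left out.

```python
# Floating-point values available as inline constants on all GPUs
# (taken from the fconst table; the 1/(2*pi) entries are omitted here).
INLINE_FLOATS = {0.0, 0.5, 1.0, 2.0, 4.0, -0.5, -1.0, -2.0, -4.0}


def is_inline_integer(value: int) -> bool:
    """Integers in {-16..64} may be encoded as inline constants."""
    return -16 <= value <= 64


def is_inline_float(value: float) -> bool:
    """Comparison is by numeric value only; operand size is not modeled."""
    return value in INLINE_FLOATS


def pick_encoding(value) -> str:
    """Prefer the inline constant encoding whenever it is available."""
    if isinstance(value, int):
        inline = is_inline_integer(value)
    else:
        inline = is_inline_float(value)
    return "inline constant" if inline else "literal"


if __name__ == "__main__":
    for v in (64, 65, -16, -17, 0.5, 3.0):
        print(v, "->", pick_encoding(v))
```

Values outside the two tables, such as ``65`` or ``3.0``, fall back to the *literal* encoding in this sketch.
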
 577 .. _amdgpu_synid_ival:
 578
 579 ival
 580 ~~~~
 581
 582 A symbolic operand encoded as an *inline constant*.
 583 These operands provide read-only access to H/W registers.
 584
 585     ======================== ================================================ =============
 586     Syntax                   Note                                             Availability
 587     ======================== ================================================ =============
 588     shared_base              Base address of shared memory region.            GFX9, GFX10
 589     shared_limit             Address of the end of shared memory region.      GFX9, GFX10
 590     private_base             Base address of private memory region.           GFX9, GFX10
 591     private_limit            Address of the end of private memory region.     GFX9, GFX10
 592     pops_exiting_wave_id     A dedicated counter for POPS.                    GFX9, GFX10
 593     ======================== ================================================ =============
 594
 595 .. _amdgpu_synid_literal:
 596
 597 literal
 598 -------
 599
 600 A *literal* is a 64-bit value encoded as a separate 32-bit dword in the instruction stream.
 601 Compare *literals* with :ref:`inline constants<amdgpu_synid_constant>`.
 602
 603 If a number may be encoded as either
 604 a :ref:`literal<amdgpu_synid_literal>` or
 605 an :ref:`inline constant<amdgpu_synid_constant>`,
 606 the assembler selects the latter encoding because it is more efficient.
 607
 608 Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
 609 :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
 610 :ref:`absolute expressions<amdgpu_synid_absolute_expression>` or
 611 :ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`.
 612
 613 An instruction may use only one literal, but several operands may refer to the same literal.
 614
 615 .. _amdgpu_synid_uimm8:
 616
 617 uimm8
 618 -----
 619
 620 An 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
 621 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 622 The value must be in the range 0..0xFF.
 623
 624 .. _amdgpu_synid_uimm32:
 625
 626 uimm32
 627 ------
 628
 629 A 32-bit :ref:`integer number<amdgpu_synid_integer_number>`
 630 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 631 The value must be in the range 0..0xFFFFFFFF.
 632
 633 .. _amdgpu_synid_uimm20:
 634
 635 uimm20
 636 ------
 637
 638 A 20-bit :ref:`integer number<amdgpu_synid_integer_number>`
 639 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 640
 641 The value must be in the range 0..0xFFFFF.
 642
 643 .. _amdgpu_synid_uimm21:
 644
 645 uimm21
 646 ------
 647
 648 A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
 649 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 650
 651 The value must be in the range 0..0x1FFFFF.
 652
 653 .. WARNING:: The assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
 654
 655 .. _amdgpu_synid_simm21:
 656
 657 simm21
 658 ------
 659
 660 A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`
 661 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 662
 663 The value must be in the range -0x100000..0x0FFFFF.
 664
 665 .. WARNING:: Assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
 666
 667 .. _amdgpu_synid_off:
 668
 669 off
 670 ---
 671
 672 A special entity which indicates that the value of this operand is not used.
 673
 674     ================================== ===================================================
 675     Syntax                             Description
 676     ================================== ===================================================
 677     off                                Indicates an unused operand.
 678     ================================== ===================================================
 679
 680
 681 .. _amdgpu_synid_number:
 682
 683 Numbers
 684 =======
 685
 686 .. _amdgpu_synid_integer_number:
 687
 688 Integer Numbers
 689 ---------------
 690
 691 Integer numbers are 64 bits wide.
 692 They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>`
 693 as described :ref:`here<amdgpu_synid_int_conv>`.
 694
 695 Integer numbers may be specified in binary, octal, hexadecimal and decimal formats:
 696
 697     ============ =============================== ========
 698     Format       Syntax                          Example
 699     ============ =============================== ========
 700     Decimal      [-]?[1-9][0-9]*                 -1234
 701     Binary       [-]?0b[01]+                     0b1010
 702     Octal        [-]?0[0-7]+                     010
 703     Hexadecimal  [-]?0x[0-9a-fA-F]+              0xff
 704     \            [-]?[0x]?[0-9][0-9a-fA-F]*[hH]  0ffh
 705     ============ =============================== ========
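
For instance, following the table above, each operand below denotes the value 10.
The instruction is chosen only for illustration:

.. parsed-literal::

    s_mov_b32 s0, 10       // decimal
    s_mov_b32 s0, 0b1010   // binary
    s_mov_b32 s0, 012      // octal
    s_mov_b32 s0, 0xa      // hexadecimal
    s_mov_b32 s0, 0ah      // hexadecimal with an 'h' suffix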
 706
 707 .. _amdgpu_synid_floating-point_number:
 708
 709 Floating-Point Numbers
 710 ----------------------
 711
 712 All floating-point numbers are handled as double (64 bits wide).
 713 They are converted to
 714 :ref:`expected operand type<amdgpu_syn_instruction_type>`
 715 as described :ref:`here<amdgpu_synid_fp_conv>`.
 716
 717 Floating-point numbers may be specified in hexadecimal and decimal formats:
 718
 719     ============ ======================================================== ====================== ====================
 720     Format       Syntax                                                   Examples               Note
 721     ============ ======================================================== ====================== ====================
 722     Decimal      [-]?[0-9]*[.]?[0-9]*([eE][+-]?[0-9]*)?                   -1.234, 234e2          Must include either
 723                                                                                                  a decimal separator
 724                                                                                                  or an exponent.
 725     Hexadecimal  [-]?0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+  -0x1afp-10, 0x.1afp10
 726     ============ ======================================================== ====================== ====================
 727
 728 .. _amdgpu_synid_expression:
 729
 730 Expressions
 731 ===========
 732
 733 An expression is evaluated to a 64-bit integer.
 734 Note that floating-point expressions are not supported.
 735
 736 There are two kinds of expressions:
 737
 738 * :ref:`Absolute<amdgpu_synid_absolute_expression>`.
 739 * :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.
 740
 741 .. _amdgpu_synid_absolute_expression:
 742
 743 Absolute Expressions
 744 --------------------
 745
 746 The value of an absolute expression does not change after program relocation.
 747 Absolute expressions must not include unassigned or relocatable values
 748 such as labels.
 749
 750 Absolute expressions are evaluated to 64-bit integer values and converted to
 751 :ref:`expected operand type<amdgpu_syn_instruction_type>`
 752 as described :ref:`here<amdgpu_synid_int_conv>`.
 753
 754 Examples:
 755
 756 .. parsed-literal::
 757
 758     x = -1
 759     y = x + 10
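
A symbol defined by an absolute expression may then be used as an operand.
In this sketch (the instruction is an arbitrary choice) *y* evaluates to 9:

.. parsed-literal::

    x = -1
    y = x + 10
    s_mov_b32 s0, y   // same as: s_mov_b32 s0, 9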
 760
 761 .. _amdgpu_synid_relocatable_expression:
 762
 763 Relocatable Expressions
 764 -----------------------
 765
 766 The value of a relocatable expression depends on program relocation.
 767
 768 Note that use of relocatable expressions is limited to branch targets
 769 and 32-bit integer operands.
 770
 771 A relocatable expression is evaluated to a 64-bit integer value
 772 which depends on operand kind and :ref:`relocation type<amdgpu-relocation-records>`
 773 of symbol(s) used in the expression. For example, if an instruction refers to a label,
 774 this reference is evaluated to an offset from the address after the instruction
 775 to the label address:
 776
 777 .. parsed-literal::
 778
 779     label:
 780     v_add_co_u32_e32 v0, vcc, label, v1  // 'label' operand is evaluated to -4
 781
 782 Note that values of relocatable expressions are usually unknown at assembly time;
 783 they are resolved later by a linker and converted to
 784 :ref:`expected operand type<amdgpu_syn_instruction_type>`
 785 as described :ref:`here<amdgpu_synid_rl_conv>`.
 786
 787 Operands and Operations
 788 -----------------------
 789
 790 Expressions are composed of 64-bit integer operands and operations.
 791 Operands include :ref:`integer numbers<amdgpu_synid_integer_number>`
 792 and :ref:`symbols<amdgpu_synid_symbol>`.
 793
 794 Expressions may also use "." which is a reference to the current PC (program counter).
 795
 796 :ref:`Unary<amdgpu_synid_expression_un_op>` and :ref:`binary<amdgpu_synid_expression_bin_op>`
 797 operations produce 64-bit integer results.
 798
 799 Syntax of Expressions
 800 ---------------------
 801
 802 The syntax of expressions is shown below::
 803
 804     expr ::= expr binop expr | primaryexpr ;
 805
 806     primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;
 807
 808     binop ::= '&&'
 809             | '||'
 810             | '|'
 811             | '^'
 812             | '&'
 813             | '!'
 814             | '=='
 815             | '!='
 816             | '<>'
 817             | '<'
 818             | '<='
 819             | '>'
 820             | '>='
 821             | '<<'
 822             | '>>'
 823             | '+'
 824             | '-'
 825             | '*'
 826             | '/'
 827             | '%' ;
 828
 829     unop ::= '~'
 830            | '+'
 831            | '-'
 832            | '!' ;
 833
 834 .. _amdgpu_synid_expression_bin_op:
 835
 836 Binary Operators
 837 ----------------
 838
 839 Binary operators are described in the following table.
 840 They operate on and produce 64-bit integers.
 841 Operators with higher priority are performed first.
 842
 843     ========== ========= ===============================================
 844     Operator   Priority  Meaning
 845     ========== ========= ===============================================
 846        \*         5      Integer multiplication.
 847        /          5      Integer division.
 848        %          5      Integer signed remainder.
 849        \+         4      Integer addition.
 850        \-         4      Integer subtraction.
 851        <<         3      Integer shift left.
 852        >>         3      Logical shift right.
 853        ==         2      Equality comparison.
 854        !=         2      Inequality comparison.
 855        <>         2      Inequality comparison.
 856        <          2      Signed less than comparison.
 857        <=         2      Signed less than or equal comparison.
 858        >          2      Signed greater than comparison.
 859        >=         2      Signed greater than or equal comparison.
 860       \|          1      Bitwise or.
 861        ^          1      Bitwise xor.
 862        &          1      Bitwise and.
 863        &&         0      Logical and.
 864        ||         0      Logical or.
 865     ========== ========= ===============================================
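
A few assignments illustrating the priorities listed above. The symbol names are
hypothetical, and parentheses may always be used to make grouping explicit:

.. parsed-literal::

    a = 1 + 2 * 3      // 7: multiplication is performed before addition
    b = (1 + 2) * 3    // 9: parentheses override the priority
    c = 17 % 5         // 2
    d = (1 << 4) - 1   // 15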
 866
 867 .. _amdgpu_synid_expression_un_op:
 868
 869 Unary Operators
 870 ---------------
 871
 872 Unary operators are described in the following table.
 873 They operate on and produce 64-bit integers.
 874
 875     ========== ===============================================
 876     Operator   Meaning
 877     ========== ===============================================
 878        !       Logical negation.
 879        ~       Bitwise negation.
 880        \+      Integer unary plus.
 881        \-      Integer unary minus.
 882     ========== ===============================================
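
A short sketch of unary operators applied to 64-bit values; the symbol names
are hypothetical:

.. parsed-literal::

    a = ~0             // all 64 bits set, i.e. -1
    b = -a             // 1
    c = +b             // 1
    d = ~0xf & 0xff    // 0xf0: '~' applies to 0xf before '&' is evaluated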
 883
 884 .. _amdgpu_synid_symbol:
 885
 886 Symbols
 887 -------
 888
 889 A symbol is a named 64-bit integer value, representing a relocatable
 890 address or an absolute (non-relocatable) number.
 891
 892 Symbol names have the following syntax:
 893     ``[a-zA-Z_.][a-zA-Z0-9_$.@]*``
 894
 895 The table below provides several examples of syntax used for symbol definition.
 896
 897     ================ ==========================================================
 898     Syntax           Meaning
 899     ================ ==========================================================
 900     .globl <S>       Declares a global symbol S without assigning it a value.
 901     .set <S>, <E>    Assigns the value of an expression E to a symbol S.
 902     <S> = <E>        Assigns the value of an expression E to a symbol S.
 903     <S>:             Declares a label S and assigns it the current PC value.
 904     ================ ==========================================================
 905
 906 A symbol may be used before it is declared or assigned;
 907 unassigned symbols are assumed to be PC-relative.
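
The forms from the table may be combined as in the following sketch.
All symbol names are hypothetical and the instruction is an arbitrary choice:

.. parsed-literal::

    .globl entry            // declares a global symbol; no value is assigned yet
    .set stride, 4          // stride = 4
    count = stride * 2      // count = 8
    entry:                  // label; assigned the current PC value
    s_mov_b32 s0, count     // uses the absolute symbol as an operand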
 908
 909 Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.
 910
 911 .. _amdgpu_synid_conv:
 912
 913 Type and Size Conversion
 914 ========================
 915
 916 This section describes what happens when a 64-bit
 917 :ref:`integer number<amdgpu_synid_integer_number>`, a
 918 :ref:`floating-point number<amdgpu_synid_floating-point_number>` or an
 919 :ref:`expression<amdgpu_synid_expression>`
 920 is used for an operand which has a different type or size.
 921
 922 .. _amdgpu_synid_int_conv:
 923
 924 Conversion of Integer Values
 925 ----------------------------
 926
 927 Instruction operands may be specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>` or
 928 :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. These values are converted to
 929 the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
 930
 931 1. *Validation*. Assembler checks if the input value may be truncated without loss to the required *truncation width*
 932 (see the table below). There are two cases when this operation is enabled:
 933
 934     * The truncated bits are all 0.
 935     * The truncated bits are all 1 and the value after truncation has its MSB set.
 936
 937 In all other cases assembler triggers an error.
 938
 939 2. *Conversion*. The input value is converted to the expected type as described in the table below.
 940 Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W (or both).
 941
 942     ============== ================= =============== ====================================================================
 943     Expected type  Truncation Width  Conversion      Description
 944     ============== ================= =============== ====================================================================
 945     i16, u16, b16  16                num.u16         Truncate to 16 bits.
 946     i32, u32, b32  32                num.u32         Truncate to 32 bits.
 947     i64            32                {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits.
 948     u64, b64       32                {0,num.u32}     Truncate to 32 bits and then zero-extend the result to 64 bits.
 949     f16            16                num.u16         Use low 16 bits as an f16 value.
 950     f32            32                num.u32         Use low 32 bits as an f32 value.
 951     f64            32                {num.u32,0}     Use low 32 bits of the number as high 32 bits
 952                                                      of the result; low 32 bits of the result are zeroed.
 953     ============== ================= =============== ====================================================================
 954
 955 Examples of enabled conversions:
 956
 957 .. parsed-literal::
 958
 959     // GFX9
 960
 961     v_add_u16 v0, -1, 0                   // src0 = 0xFFFF
 962     v_add_f16 v0, -1, 0                   // src0 = 0xFFFF (NaN)
 963                                           //
 964     v_add_u32 v0, -1, 0                   // src0 = 0xFFFFFFFF
 965     v_add_f32 v0, -1, 0                   // src0 = 0xFFFFFFFF (NaN)
 966                                           //
 967     v_add_u16 v0, 0xff00, v0              // src0 = 0xff00
 968     v_add_u16 v0, 0xffffffffffffff00, v0  // src0 = 0xff00
 969     v_add_u16 v0, -256, v0                // src0 = 0xff00
 970                                           //
 971     s_bfe_i64 s[0:1], 0xffefffff, s3      // src0 = 0xffffffffffefffff
 972     s_bfe_u64 s[0:1], 0xffefffff, s3      // src0 = 0x00000000ffefffff
 973     v_ceil_f64_e32 v[0:1], 0xffefffff     // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
 974                                           //
 975     x = 0xffefffff                        //
 976     s_bfe_i64 s[0:1], x, s3               // src0 = 0xffffffffffefffff
 977     s_bfe_u64 s[0:1], x, s3               // src0 = 0x00000000ffefffff
 978     v_ceil_f64_e32 v[0:1], x              // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
 979
 980 Examples of disabled conversions:
 981
 982 .. parsed-literal::
 983
 984     // GFX9
 985
 986     v_add_u16 v0, 0x1ff00, v0               // truncated bits are not all 0 or 1
 987     v_add_u16 v0, 0xffffffffffff00ff, v0    // truncated bits do not match MSB of the result
 988
 989 .. _amdgpu_synid_fp_conv:
 990
 991 Conversion of Floating-Point Values
 992 -----------------------------------
 993
 994 Instruction operands may be specified as 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
 995 These values are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
 996
 997 1. *Validation*. Assembler checks if the input f64 number can be converted
 998 to the *required floating-point type* (see the table below) without overflow or underflow.
 999 Loss of precision is allowed. If this conversion is not possible, assembler triggers an error.
1000
1001 2. *Conversion*. The input value is converted to the expected type as described in the table below.
1002 Depending on operand kind, this is performed by either assembler or AMDGPU H/W (or both).
1003
1004     ============== ================ ================= =================================================================
1005     Expected type  Required FP Type Conversion        Description
1006     ============== ================ ================= =================================================================
1007     i16, u16, b16  f16              f16(num)          Convert to f16 and use bits of the result as an integer value.
1008     i32, u32, b32  f32              f32(num)          Convert to f32 and use bits of the result as an integer value.
1009     i64, u64, b64  \-               \-                Conversion disabled.
1010     f16            f16              f16(num)          Convert to f16.
1011     f32            f32              f32(num)          Convert to f32.
1012     f64            f64              {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
1013                                                       zero-fill low 32 bits of the result.
1014
1015                                                       Note that the result may differ from the original number.
1016     ============== ================ ================= =================================================================
1017
1018 Examples of enabled conversions:
1019
1020 .. parsed-literal::
1021
1022     // GFX9
1023
1024     v_add_f16 v0, 1.0, 0        // src0 = 0x3C00 (1.0)
1025     v_add_u16 v0, 1.0, 0        // src0 = 0x3C00
1026                                 //
1027     v_add_f32 v0, 1.0, 0        // src0 = 0x3F800000 (1.0)
1028     v_add_u32 v0, 1.0, 0        // src0 = 0x3F800000
1029
1030                                 // src0 before conversion:
1031                                 //   1.7976931348623157e308 = 0x7fefffffffffffff
1032                                 // src0 after conversion:
1033                                 //   1.7976922776554302e308 = 0x7fefffff00000000
1034     v_ceil_f64 v[0:1], 1.7976931348623157e308
1035
1036     v_add_f16 v1, 65500.0, v2   // ok for f16.
1037     v_add_f32 v1, 65600.0, v2   // ok for f32, but would result in overflow for f16.
1038
1039 Examples of disabled conversions:
1040
1041 .. parsed-literal::
1042
1043     // GFX9
1044
1045     v_add_f16 v1, 65600.0, v2    // overflow
1046
1047 .. _amdgpu_synid_rl_conv:
1048
1049 Conversion of Relocatable Values
1050 --------------------------------
1051
1052 :ref:`Relocatable expressions<amdgpu_synid_relocatable_expression>`
1053 may be used with 32-bit integer operands and jump targets.
1054
1055 When the value of a relocatable expression is resolved by a linker, it is
1056 converted as needed and truncated to the operand size. The conversion depends
1057 on :ref:`relocation type<amdgpu-relocation-records>` and operand kind.
1058
1059 For example, when a 32-bit operand of an instruction refers to a relocatable expression *expr*,
1060 this reference is evaluated to a 64-bit offset from the address after the
1061 instruction to the address being referenced, *counted in bytes*.
1062 Then the value is truncated to 32 bits and encoded as a literal:
1063
1064 .. parsed-literal::
1065
1066     expr = .
1067     v_add_co_u32_e32 v0, vcc, expr, v1  // 'expr' operand is evaluated to -4
1068                                         // and then truncated to 0xFFFFFFFC
1069
1070 As another example, when a branch instruction refers to a label,
1071 this reference is evaluated to an offset from the address after the
1072 instruction to the label address, *counted in dwords*.
1073 Then the value is truncated to 16 bits:
1074
1075 .. parsed-literal::
1076
1077     label:
1078     s_branch label  // 'label' operand is evaluated to -1 and truncated to 0xFFFF