docs/GlobalISel.rst

   1 ============================
   2 Global Instruction Selection
   3 ============================
   4
   5 .. contents::
   6    :local:
   7    :depth: 1
   8
   9 .. warning::
  10    This document is a work in progress.  It reflects the current state of the
  11    implementation, as well as open design and implementation issues.
  12
  13 Introduction
  14 ============
  15
  16 GlobalISel is a framework that provides a set of reusable passes and utilities
  17 for instruction selection --- translation from LLVM IR to target-specific
  18 Machine IR (MIR).
  19
  20 GlobalISel is intended to be a replacement for SelectionDAG and FastISel, to
  21 solve three major problems:
  22
  23 * **Performance** --- SelectionDAG introduces a dedicated intermediate
  24   representation, which has a compile-time cost.
  25
  26   GlobalISel directly operates on the post-isel representation used by the
  27   rest of the code generator, MIR.
  28   It does require extensions to that representation to support arbitrary
  29   incoming IR: :ref:`gmir`.
  30
  31 * **Granularity** --- SelectionDAG and FastISel operate on individual basic
  32   blocks, losing some global optimization opportunities.
  33
  34   GlobalISel operates on the whole function.
  35
  36 * **Modularity** --- SelectionDAG and FastISel are radically different and share
  37   very little code.
  38
  39   GlobalISel is built in a way that enables code reuse. For instance, both the
  40   optimized and fast selectors share the :ref:`pipeline`, and targets can
  41   configure that pipeline to better suit their needs.
  42
  43
  44 .. _gmir:
  45
  46 Generic Machine IR
  47 ==================
  48
  49 Machine IR operates on physical registers, register classes, and (mostly)
  50 target-specific instructions.
  51
  52 To bridge the gap with LLVM IR, GlobalISel introduces "generic" extensions to
  53 Machine IR:
  54
  55 .. contents::
  56    :local:
  57
  58 ``NOTE``:
  59 The generic MIR (GMIR) representation still contains references to IR
  60 constructs (such as ``GlobalValue``).  Removing those should let us write more
  61 accurate tests, or delete IR after building the initial MIR.  However, it is
  62 not part of the GlobalISel effort.
  63
  64 .. _gmir-instructions:
  65
  66 Generic Instructions
  67 --------------------
  68
  69 The main addition is support for pre-isel generic machine instructions (e.g.,
  70 ``G_ADD``).  Like other target-independent instructions (e.g., ``COPY`` or
  71 ``PHI``), these are available on all targets.
  72
  73 ``TODO``:
  74 While we're progressively adding instructions, one kind in particular exposes
  75 interesting problems: compares and how to represent condition codes.
  76 Some targets (x86, ARM) have generic comparisons setting multiple flags,
  77 which are then used by predicated variants.
  78 Others (IR) specify the predicate in the comparison and users just get a single
  79 bit.  SelectionDAG uses SETCC/CONDBR vs BR_CC (and similar for select) to
  80 represent this.
  81
  82 The ``MachineIRBuilder`` class wraps the ``MachineInstrBuilder`` and provides
  83 a convenient way to create these generic instructions.
  84
  85 .. _gmir-gvregs:
  86
  87 Generic Virtual Registers
  88 -------------------------
  89
  90 Generic instructions operate on a new kind of register: "generic" virtual
  91 registers.  As opposed to non-generic vregs, they are not assigned a Register
  92 Class.  Instead, generic vregs have a :ref:`gmir-llt`, and can be assigned
  93 a :ref:`gmir-regbank`.
  94
  95 ``MachineRegisterInfo`` tracks the same information that it does for
  96 non-generic vregs (e.g., use-def chains).  Additionally, it also tracks the
  97 :ref:`gmir-llt` of the register, and, instead of the ``TargetRegisterClass``,
  98 its :ref:`gmir-regbank`, if any.
  99
 100 For simplicity, most generic instructions only accept generic vregs:
 101
 102 * instead of immediates, they use a gvreg defined by an instruction
 103   materializing the immediate value (see :ref:`irtranslator-constants`).
 104 * instead of physical register, they use a gvreg defined by a ``COPY``.
 105
 106 ``NOTE``:
 107 We started with an alternative representation, where MRI tracks a size for
 108 each gvreg, and instructions have lists of types.
 109 That had two flaws: the type and size are redundant, and there was no generic
 110 way of getting a given operand's type (as there was no 1:1 mapping between
 111 instruction types and operands).
 112 We considered putting the type in some variant of MCInstrDesc instead:
 113 See `PR26576 <http://llvm.org/PR26576>`_: [GlobalISel] Generic MachineInstrs
 114 need a type but this increases the memory footprint of the related objects
 115
 116 .. _gmir-regbank:
 117
 118 Register Bank
 119 -------------
 120
 121 A Register Bank is a set of register classes defined by the target.
 122 A bank has a size, which is the maximum store size of all covered classes.
 123
 124 In general, cross-class copies inside a bank are expected to be cheaper than
 125 copies across banks.  They are also coalesceable by the register coalescer,
 126 whereas cross-bank copies are not.
 127
 128 Also, equivalent operations can be performed on different banks using different
 129 instructions.
 130
 131 For example, X86 can be seen as having 3 main banks: general-purpose, x87, and
 132 vector (which could be further split into a bank per domain for single vs
 133 double precision instructions).
 134
 135 Register banks are described by a target-provided API,
 136 :ref:`RegisterBankInfo <api-registerbankinfo>`.
 137
 138 .. _gmir-llt:
 139
 140 Low Level Type
 141 --------------
 142
 143 Additionally, every generic virtual register has a type, represented by an
 144 instance of the ``LLT`` class.
 145
 146 Like ``EVT``/``MVT``/``Type``, it has no distinction between unsigned and signed
 147 integer types.  Furthermore, it also has no distinction between integer and
 148 floating-point types: it mainly conveys absolutely necessary information, such
 149 as size and number of vector lanes:
 150
 151 * ``sN`` for scalars
 152 * ``pN`` for pointers
 153 * ``<N x sM>`` for vectors
 154 * ``unsized`` for labels, etc..
 155
 156 ``LLT`` is intended to replace the usage of ``EVT`` in SelectionDAG.
 157
 158 Here are some LLT examples and their ``EVT`` and ``Type`` equivalents:
 159
 160    =============  =========  ======================================
 161    LLT            EVT        IR Type
 162    =============  =========  ======================================
 163    ``s1``         ``i1``     ``i1``
 164    ``s8``         ``i8``     ``i8``
 165    ``s32``        ``i32``    ``i32``
 166    ``s32``        ``f32``    ``float``
 167    ``s17``        ``i17``    ``i17``
 168    ``s16``        N/A        ``{i8, i8}``
 169    ``s32``        N/A        ``[4 x i8]``
 170    ``p0``         ``iPTR``   ``i8*``, ``i32*``, ``%opaque*``
 171    ``p2``         ``iPTR``   ``i8 addrspace(2)*``
 172    ``<4 x s32>``  ``v4f32``  ``<4 x float>``
 173    ``s64``        ``v1f64``  ``<1 x double>``
 174    ``<3 x s32>``  ``v3i32``  ``<3 x i32>``
 175    ``unsized``    ``Other``  ``label``
 176    =============  =========  ======================================
 177
 178
 179 Rationale: instructions already encode a specific interpretation of types
 180 (e.g., ``add`` vs. ``fadd``, or ``sdiv`` vs. ``udiv``).  Also encoding that
 181 information in the type system requires introducing bitcast with no real
 182 advantage for the selector.
 183
 184 Pointer types are distinguished by address space.  This matches IR, as opposed
 185 to SelectionDAG where address space is an attribute on operations.
 186 This representation better supports pointers having different sizes depending
 187 on their addressspace.
 188
 189 ``NOTE``:
 190 Currently, LLT requires at least 2 elements in vectors, but some targets have
 191 the concept of a '1-element vector'.  Representing them as their underlying
 192 scalar type is a nice simplification.
 193
 194 ``TODO``:
 195 Currently, non-generic virtual registers, defined by non-pre-isel-generic
 196 instructions, cannot have a type, and thus cannot be used by a pre-isel generic
 197 instruction.  Instead, they are given a type using a COPY.  We could relax that
 198 and allow types on all vregs: this would reduce the number of MI required when
 199 emitting target-specific MIR early in the pipeline.  This should purely be
 200 a compile-time optimization.
 201
 202 .. _pipeline:
 203
 204 Core Pipeline
 205 =============
 206
 207 There are four required passes, regardless of the optimization mode:
 208
 209 .. contents::
 210    :local:
 211
 212 Additional passes can then be inserted at higher optimization levels or for
 213 specific targets. For example, to match the current SelectionDAG set of
 214 transformations: MachineCSE and a better MachineCombiner between every pass.
 215
 216 ``NOTE``:
 217 In theory, not all passes are always necessary.
 218 As an additional compile-time optimization, we could skip some of the passes by
 219 setting the relevant MachineFunction properties.  For instance, if the
 220 IRTranslator did not encounter any illegal instruction, it would set the
 221 ``legalized`` property to avoid running the :ref:`milegalizer`.
 222 Similarly, we considered specializing the IRTranslator per-target to directly
 223 emit target-specific MI.
 224 However, we instead decided to keep the core pipeline simple, and focus on
 225 minimizing the overhead of the passes in the no-op cases.
 226
 227
 228 .. _irtranslator:
 229
 230 IRTranslator
 231 ------------
 232
 233 This pass translates the input LLVM IR ``Function`` to a GMIR
 234 ``MachineFunction``.
 235
 236 ``TODO``:
 237 This currently doesn't support the more complex instructions, in particular
 238 those involving control flow (``switch``, ``invoke``, ...).
 239 For ``switch`` in particular, we can initially use the ``LowerSwitch`` pass.
 240
 241 .. _api-calllowering:
 242
 243 API: CallLowering
 244 ^^^^^^^^^^^^^^^^^
 245
 246 The ``IRTranslator`` (using the ``CallLowering`` target-provided utility) also
 247 implements the ABI's calling convention by lowering calls, returns, and
 248 arguments to the appropriate physical register usage and instruction sequences.
 249
 250 .. _irtranslator-aggregates:
 251
 252 Aggregates
 253 ^^^^^^^^^^
 254
 255 Aggregates are lowered to a single scalar vreg.
 256 This differs from SelectionDAG's multiple vregs via ``GetValueVTs``.
 257
 258 ``TODO``:
 259 As some of the bits are undef (padding), we should consider augmenting the
 260 representation with additional metadata (in effect, caching computeKnownBits
 261 information on vregs).
 262 See `PR26161 <http://llvm.org/PR26161>`_: [GlobalISel] Value to vreg during
 263 IR to MachineInstr translation for aggregate type
 264
 265 .. _irtranslator-constants:
 266
 267 Constant Lowering
 268 ^^^^^^^^^^^^^^^^^
 269
 270 The ``IRTranslator`` lowers ``Constant`` operands into uses of gvregs defined
 271 by ``G_CONSTANT`` or ``G_FCONSTANT`` instructions.
 272 Currently, these instructions are always emitted in the entry basic block.
 273 In a ``MachineFunction``, each ``Constant`` is materialized by a single gvreg.
 274
 275 This is beneficial as it allows us to fold constants into immediate operands
 276 during :ref:`instructionselect`, while still avoiding redundant materializations
 277 for expensive non-foldable constants.
 278 However, this can lead to unnecessary spills and reloads in an -O0 pipeline, as
 279 these vregs can have long live ranges.
 280
 281 ``TODO``:
 282 We're investigating better placement of these instructions, in fast and
 283 optimized modes.
 284
 285
 286 .. _milegalizer:
 287
 288 Legalizer
 289 ---------
 290
 291 This pass transforms the generic machine instructions such that they are legal.
 292
 293 A legal instruction is defined as:
 294
 295 * **selectable** --- the target will later be able to select it to a
 296   target-specific (non-generic) instruction.
 297
 298 * operating on **vregs that can be loaded and stored** -- if necessary, the
 299   target can select a ``G_LOAD``/``G_STORE`` of each gvreg operand.
 300
 301 As opposed to SelectionDAG, there are no legalization phases.  In particular,
 302 'type' and 'operation' legalization are not separate.
 303
 304 Legalization is iterative, and all state is contained in GMIR.  To maintain the
 305 validity of the intermediate code, instructions are introduced:
 306
 307 * ``G_MERGE_VALUES`` --- concatenate multiple registers of the same
 308   size into a single wider register.
 309
 310 * ``G_UNMERGE_VALUES`` --- extract multiple registers of the same size
 311   from a single wider register.
 312
 313 * ``G_EXTRACT`` --- extract a simple register (as contiguous sequences of bits)
 314   from a single wider register.
 315
 316 As they are expected to be temporary byproducts of the legalization process,
 317 they are combined at the end of the :ref:`milegalizer` pass.
 318 If any remain, they are expected to always be selectable, using loads and stores
 319 if necessary.
 320
 321 The legality of an instruction may only depend on the instruction itself and
 322 must not depend on any context in which the instruction is used. However, after
 323 deciding that an instruction is not legal, using the context of the instruction
 324 to decide how to legalize the instruction is permitted. As an example, if we
 325 have a ``G_FOO`` instruction of the form::
 326
 327   %1:_(s32) = G_CONSTANT i32 1
 328   %2:_(s32) = G_FOO %0:_(s32), %1:_(s32)
 329
 330 it's impossible to say that G_FOO is legal iff %1 is a ``G_CONSTANT`` with
 331 value ``1``. However, the following::
 332
 333   %2:_(s32) = G_FOO %0:_(s32), i32 1
 334
 335 can say that it's legal iff operand 2 is an immediate with value ``1`` because
 336 that information is entirely contained within the single instruction.
 337
 338 .. _api-legalizerinfo:
 339
 340 API: LegalizerInfo
 341 ^^^^^^^^^^^^^^^^^^
 342
 343 The recommended [#legalizer-legacy-footnote]_ API looks like this::
 344
 345   getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL, G_AND, G_OR, G_XOR, G_SHL})
 346       .legalFor({s32, s64, v2s32, v4s32, v2s64})
 347       .clampScalar(0, s32, s64)
 348       .widenScalarToNextPow2(0)
 349       .clampNumElements(0, v2s32, v4s32)
 350       .clampNumElements(0, v2s64, v2s64)
 351       .moreElementsToNextPow2(0);
 352
 353 and describes a set of rules by which we can either declare an instruction legal
 354 or decide which action to take to make it more legal.
 355
 356 At the core of this ruleset is the ``LegalityQuery`` which describes the
 357 instruction. We use a description rather than the instruction to both allow other
 358 passes to determine legality without having to create an instruction and also to
 359 limit the information available to the predicates to that which is safe to rely
 360 on. Currently, the information available to the predicates that determine
 361 legality contains:
 362
 363 * The opcode for the instruction
 364
 365 * The type of each type index (see ``type0``, ``type1``, etc.)
 366
 367 * The size in bytes and atomic ordering for each MachineMemOperand
 368
 369 Rule Processing and Declaring Rules
 370 """""""""""""""""""""""""""""""""""
 371
 372 The ``getActionDefinitionsBuilder`` function generates a ruleset for the given
 373 opcode(s) that rules can be added to. If multiple opcodes are given, they are
 374 all permanently bound to the same ruleset. The rules in a ruleset are executed
 375 from top to bottom and will start again from the top if an instruction is
 376 legalized as a result of the rules. If the ruleset is exhausted without
 377 satisfying any rule, then it is considered unsupported.
 378
 379 When it doesn't declare the instruction legal, each pass over the rules may
 380 request that one type changes to another type. Sometimes this can cause multiple
 381 types to change but we avoid this as much as possible as making multiple changes
 382 can make it difficult to avoid infinite loops where, for example, narrowing one
 383 type causes another to be too small and widening that type causes the first one
 384 to be too big.
 385
 386 In general, it's advisable to declare instructions legal as close to the top of
 387 the rule as possible and to place any expensive rules as low as possible. This
 388 helps with performance as testing for legality happens more often than
 389 legalization and legalization can require multiple passes over the rules.
 390
 391 As a concrete example, consider the rule::
 392
 393   getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL, G_AND, G_OR, G_XOR, G_SHL})
 394       .legalFor({s32, s64, v2s32, v4s32, v2s64})
 395       .clampScalar(0, s32, s64)
 396       .widenScalarToNextPow2(0);
 397
 398 and the instruction::
 399
 400   %2:_(s7) = G_ADD %0:_(s7), %1:_(s7)
 401
 402 this doesn't meet the predicate for the :ref:`.legalFor() <legalfor>` as ``s7``
 403 is not one of the listed types so it falls through to the
 404 :ref:`.clampScalar() <clampscalar>`. It does meet the predicate for this rule
 405 as the type is smaller than the ``s32`` and this rule instructs the legalizer
 406 to change type 0 to ``s32``. It then restarts from the top. This time it does
 407 satisfy ``.legalFor()`` and the resulting output is::
 408
 409   %3:_(s32) = G_ANYEXT %0:_(s7)
 410   %4:_(s32) = G_ANYEXT %1:_(s7)
 411   %5:_(s32) = G_ADD %3:_(s32), %4:_(s32)
 412   %2:_(s7) = G_TRUNC %5:_(s32)
 413
 414 where the ``G_ADD`` is legal and the other instructions are scheduled for
 415 processing by the legalizer.
 416
 417 Rule Actions
 418 """"""""""""
 419
 420 There are various rule factories that append rules to a ruleset but they have a
 421 few actions in common:
 422
 423 .. _legalfor:
 424
 425 * ``legalIf()``, ``legalFor()``, etc. declare an instruction to be legal if the
 426   predicate is satisfied.
 427
 428 * ``narrowScalarIf()``, ``narrowScalarFor()``, etc. declare an instruction to be illegal
 429   if the predicate is satisfied and indicates that narrowing the scalars in one
 430   of the types to a specific type would make it more legal. This action supports
 431   both scalars and vectors.
 432
 433 * ``widenScalarIf()``, ``widenScalarFor()``, etc. declare an instruction to be illegal
 434   if the predicate is satisfied and indicates that widening the scalars in one
 435   of the types to a specific type would make it more legal. This action supports
 436   both scalars and vectors.
 437
 438 * ``fewerElementsIf()``, ``fewerElementsFor()``, etc. declare an instruction to be
 439   illegal if the predicate is satisfied and indicates reducing the number of
 440   vector elements in one of the types to a specific type would make it more
 441   legal. This action supports vectors.
 442
 443 * ``moreElementsIf()``, ``moreElementsFor()``, etc. declare an instruction to be illegal
 444   if the predicate is satisfied and indicates increasing the number of vector
 445   elements in one of the types to a specific type would make it more legal.
 446   This action supports vectors.
 447
 448 * ``lowerIf()``, ``lowerFor()``, etc. declare an instruction to be illegal if the
 449   predicate is satisfied and indicates that replacing it with equivalent
 450   instruction(s) would make it more legal. Support for this action differs for
 451   each opcode.
 452
 453 * ``libcallIf()``, ``libcallFor()``, etc. declare an instruction to be illegal if the
 454   predicate is satisfied and indicates that replacing it with a libcall would
 455   make it more legal. Support for this action differs for
 456   each opcode.
 457
 458 * ``customIf()``, ``customFor()``, etc. declare an instruction to be illegal if the
 459   predicate is satisfied and indicates that the backend developer will supply
 460   a means of making it more legal.
 461
 462 * ``unsupportedIf()``, ``unsupportedFor()``, etc. declare an instruction to be illegal
 463   if the predicate is satisfied and indicates that there is no way to make it
 464   legal and the compiler should fail.
 465
 466 * ``fallback()`` falls back on an older API and should only be used while porting
 467   existing code from that API.
 468
 469 Rule Predicates
 470 """""""""""""""
 471
 472 The rule factories also have predicates in common:
 473
 474 * ``legal()``, ``lower()``, etc. are always satisfied.
 475
 476 * ``legalIf()``, ``narrowScalarIf()``, etc. are satisfied if the user-supplied
 477   ``LegalityPredicate`` function returns true. This predicate has access to the
 478   information in the ``LegalityQuery`` to make its decision.
 479   User-supplied predicates can also be combined using ``all(P0, P1, ...)``.
 480
 481 * ``legalFor()``, ``narrowScalarFor()``, etc. are satisfied if the type matches one in
 482   a given set of types. For example ``.legalFor({s16, s32})`` declares the
 483   instruction legal if type 0 is either s16 or s32. Additional versions for two
 484   and three type indices are generally available. For these, all the type
 485   indices considered together must match all the types in one of the tuples. So
 486   ``.legalFor({{s16, s32}, {s32, s64}})`` will only accept ``{s16, s32}``, or
 487   ``{s32, s64}`` but will not accept ``{s16, s64}``.
 488
 489 * ``legalForTypesWithMemSize()``, ``narrowScalarForTypesWithMemSize()``, etc. are
 490   similar to ``legalFor()``, ``narrowScalarFor()``, etc. but additionally require a
 491   MachineMemOperand to have a given size in each tuple.
 492
 493 * ``legalForCartesianProduct()``, ``narrowScalarForCartesianProduct()``, etc. are
 494   satisfied if each type index matches one element in each of the independent
 495   sets. So ``.legalForCartesianProduct({s16, s32}, {s32, s64})`` will accept
 496   ``{s16, s32}``, ``{s16, s64}``, ``{s32, s32}``, and ``{s32, s64}``.
 497
 498 Composite Rules
 499 """""""""""""""
 500
 501 There are some composite rules for common situations built out of the above facilities:
 502
 503 * ``widenScalarToNextPow2()`` is like ``widenScalarIf()`` but is satisfied iff the type
 504   size in bits is not a power of 2 and selects a target type that is the next
 505   largest power of 2.
 506
 507 .. _clampscalar:
 508
 509 * ``minScalar()`` is like ``widenScalarIf()`` but is satisfied iff the type
 510   size in bits is smaller than the given minimum and selects the minimum as the
 511   target type. Similarly, there is also a ``maxScalar()`` for the maximum and a
 512   ``clampScalar()`` to do both at once.
 513
 514 * ``minScalarSameAs()`` is like ``minScalar()`` but the minimum is taken from another
 515   type index.
 516
 517 * ``moreElementsToNextMultiple()`` is like ``moreElementsToNextPow2()`` but is based on
 518   multiples of X rather than powers of 2.
 519
 520 Other Information
 521 """""""""""""""""
 522
 523 ``TODO``:
 524 An alternative worth investigating is to generalize the API to represent
 525 actions using ``std::function`` that implements the action, instead of explicit
 526 enum tokens (``Legal``, ``WidenScalar``, ...).
 527
 528 ``TODO``:
 529 Moreover, we could use TableGen to initially infer legality of operation from
 530 existing patterns (as any pattern we can select is by definition legal).
 531 Expanding that to describe legalization actions is a much larger but
 532 potentially useful project.
 533
 534 .. rubric:: Footnotes
 535
 536 .. [#legalizer-legacy-footnote] An API is broadly similar to
 537    SelectionDAG/TargetLowering is available but is not recommended as a more
 538    powerful API is available.
 539
 540 .. _min-legalizerinfo:
 541
 542 Minimum Rule Set
 543 ^^^^^^^^^^^^^^^^
 544
 545 GlobalISel's legalizer has a great deal of flexibility in how a given target
 546 shapes the GMIR that the rest of the backend must handle. However, there are
 547 a small number of requirements that all targets must meet.
 548
 549 Before discussing the minimum requirements, we'll need some terminology:
 550
 551 Producer Type Set
 552   The set of types which is the union of all possible types produced by at
 553   least one legal instruction.
 554
 555 Consumer Type Set
 556   The set of types which is the union of all possible types consumed by at
 557   least one legal instruction.
 558
 559 Both sets are often identical but there's no guarantee of that. For example,
 560 it's not uncommon to be unable to consume s64 but still be able to produce it
 561 for a few specific instructions.
 562
 563 Minimum Rules For Scalars
 564 """""""""""""""""""""""""
 565
 566 * G_ANYEXT must be legal for all inputs from the producer type set and all larger
 567   outputs from the consumer type set.
 568 * G_TRUNC must be legal for all inputs from the producer type set and all
 569   smaller outputs from the consumer type set.
 570
 571 G_ANYEXT, and G_TRUNC have mandatory legality since the GMIR requires a means to
 572 connect operations with different type sizes. They are usually trivial to support
 573 since G_ANYEXT doesn't define the value of the additional bits and G_TRUNC is
 574 discarding bits. The other conversions can be lowered into G_ANYEXT/G_TRUNC
 575 with some additional operations that are subject to further legalization. For
 576 example, G_SEXT can lower to::
 577
 578   %1 = G_ANYEXT %0
 579   %2 = G_CONSTANT ...
 580   %3 = G_SHL %1, %2
 581   %4 = G_ASHR %3, %2
 582
 583 and the G_CONSTANT/G_SHL/G_ASHR can further lower to other operations or target
 584 instructions. Similarly, G_FPEXT has no legality requirement since it can lower
 585 to a G_ANYEXT followed by a target instruction.
 586
 587 G_MERGE_VALUES and G_UNMERGE_VALUES do not have legality requirements since the
 588 former can lower to G_ANYEXT and some other legalizable instructions, while the
 589 latter can lower to some legalizable instructions followed by G_TRUNC.
 590
 591 Minimum Legality For Vectors
 592 """"""""""""""""""""""""""""
 593
 594 Within the vector types, there aren't any defined conversions in LLVM IR as
 595 vectors are often converted by reinterpreting the bits or by decomposing the
 596 vector and reconstituting it as a different type. As such, G_BITCAST is the
 597 only operation to account for. We generally don't require that it's legal
 598 because it can usually be lowered to COPY (or to nothing using
 599 replaceAllUses()). However, there are situations where G_BITCAST is non-trivial
 600 (e.g. little-endian vectors of big-endian data such as on big-endian MIPS MSA and
 601 big-endian ARM NEON, see `_i_bitcast`). To account for this G_BITCAST must be
 602 legal for all type combinations that change the bit pattern in the value.
 603
 604 There are no legality requirements for G_BUILD_VECTOR, or G_BUILD_VECTOR_TRUNC
 605 since these can be handled by:
 606 * Declaring them legal.
 607 * Scalarizing them.
 608 * Lowering them to G_TRUNC+G_ANYEXT and some legalizable instructions.
 609 * Lowering them to target instructions which are legal by definition.
 610
 611 The same reasoning also allows G_UNMERGE_VALUES to lack legality requirements
 612 for vector inputs.
 613
 614 Minimum Legality for Pointers
 615 """""""""""""""""""""""""""""
 616
 617 There are no minimum rules for pointers since G_INTTOPTR and G_PTRTOINT can
 618 be selected to a COPY from register class to another by the legalizer.
 619
 620 Minimum Legality For Operations
 621 """""""""""""""""""""""""""""""
 622
 623 The rules for G_ANYEXT, G_MERGE_VALUES, G_BITCAST, G_BUILD_VECTOR,
 624 G_BUILD_VECTOR_TRUNC, G_CONCAT_VECTORS, G_UNMERGE_VALUES, G_PTRTOINT, and
 625 G_INTTOPTR have already been noted above. In addition to those, the following
 626 operations have requirements:
 627
 628 * At least one G_IMPLICIT_DEF must be legal. This is usually trivial as it
 629   requires no code to be selected.
 630 * G_PHI must be legal for all types in the producer and consumer typesets. This
 631   is usually trivial as it requires no code to be selected.
 632 * At least one G_FRAME_INDEX must be legal
 633 * At least one G_BLOCK_ADDR must be legal
 634
 635 There are many other operations you'd expect to have legality requirements but
 636 they can be lowered to target instructions which are legal by definition.
 637
 638 .. _regbankselect:
 639
 640 RegBankSelect
 641 -------------
 642
 643 This pass constrains the :ref:`gmir-gvregs` operands of generic
 644 instructions to some :ref:`gmir-regbank`.
 645
 646 It iteratively maps instructions to a set of per-operand bank assignment.
 647 The possible mappings are determined by the target-provided
 648 :ref:`RegisterBankInfo <api-registerbankinfo>`.
 649 The mapping is then applied, possibly introducing ``COPY`` instructions if
 650 necessary.
 651
 652 It traverses the ``MachineFunction`` top down so that all operands are already
 653 mapped when analyzing an instruction.
 654
 655 This pass could also remap target-specific instructions when beneficial.
 656 In the future, this could replace the ExeDepsFix pass, as we can directly
 657 select the best variant for an instruction that's available on multiple banks.
 658
 659 .. _api-registerbankinfo:
 660
 661 API: RegisterBankInfo
 662 ^^^^^^^^^^^^^^^^^^^^^
 663
 664 The ``RegisterBankInfo`` class describes multiple aspects of register banks.
 665
 666 * **Banks**: ``addRegBankCoverage`` --- which register bank covers each
 667   register class.
 668
 669 * **Cross-Bank Copies**: ``copyCost`` --- the cost of a ``COPY`` from one bank
 670   to another.
 671
 672 * **Default Mapping**: ``getInstrMapping`` --- the default bank assignments for
 673   a given instruction.
 674
 675 * **Alternative Mapping**: ``getInstrAlternativeMapping`` --- the other
 676   possible bank assignments for a given instruction.
 677
 678 ``TODO``:
 679 All this information should eventually be static and generated by TableGen,
 680 mostly using existing information augmented by bank descriptions.
 681
 682 ``TODO``:
 683 ``getInstrMapping`` is currently separate from ``getInstrAlternativeMapping``
 684 because the latter is more expensive: as we move to static mapping info,
 685 both methods should be free, and we should merge them.
 686
 687 .. _regbankselect-modes:
 688
 689 RegBankSelect Modes
 690 ^^^^^^^^^^^^^^^^^^^
 691
 692 ``RegBankSelect`` currently has two modes:
 693
 694 * **Fast** --- For each instruction, pick a target-provided "default" bank
 695   assignment.  This is the default at -O0.
 696
 697 * **Greedy** --- For each instruction, pick the cheapest of several
 698   target-provided bank assignment alternatives.
 699
 700 We intend to eventually introduce an additional optimizing mode:
 701
 702 * **Global** --- Across multiple instructions, pick the cheapest combination of
 703   bank assignments.
 704
 705 ``NOTE``:
 706 On AArch64, we are considering using the Greedy mode even at -O0 (or perhaps at
 707 backend -O1):  because :ref:`gmir-llt` doesn't distinguish floating point from
 708 integer scalars, the default assignment for loads and stores is the integer
 709 bank, introducing cross-bank copies on most floating point operations.
 710
 711
 712 .. _instructionselect:
 713
 714 InstructionSelect
 715 -----------------
 716
 717 This pass transforms generic machine instructions into equivalent
 718 target-specific instructions.  It traverses the ``MachineFunction`` bottom-up,
 719 selecting uses before definitions, enabling trivial dead code elimination.
 720
 721 .. _api-instructionselector:
 722
 723 API: InstructionSelector
 724 ^^^^^^^^^^^^^^^^^^^^^^^^
 725
 726 The target implements the ``InstructionSelector`` class, containing the
 727 target-specific selection logic proper.
 728
 729 The instance is provided by the subtarget, so that it can specialize the
 730 selector by subtarget feature (with, e.g., a vector selector overriding parts
 731 of a general-purpose common selector).
 732 We might also want to parameterize it by MachineFunction, to enable selector
 733 variants based on function attributes like optsize.
 734
 735 The simple API consists of:
 736
 737   .. code-block:: c++
 738
 739     virtual bool select(MachineInstr &MI)
 740
 741 This target-provided method is responsible for mutating (or replacing) a
 742 possibly-generic MI into a fully target-specific equivalent.
 743 It is also responsible for doing the necessary constraining of gvregs into the
 744 appropriate register classes as well as passing through COPY instructions to
 745 the register allocator.
 746
 747 The ``InstructionSelector`` can fold other instructions into the selected MI,
 748 by walking the use-def chain of the vreg operands.
 749 As GlobalISel is Global, this folding can occur across basic blocks.
 750
 751 SelectionDAG Rule Imports
 752 ^^^^^^^^^^^^^^^^^^^^^^^^^
 753
 754 TableGen will import SelectionDAG rules and provide the following function to
 755 execute them:
 756
 757   .. code-block:: c++
 758
 759     bool selectImpl(MachineInstr &MI)
 760
 761 The ``--stats`` option can be used to determine what proportion of rules were
 762 successfully imported. The easiest way to use this is to copy the
 763 ``-gen-globalisel`` tablegen command from ``ninja -v`` and modify it.
 764
 765 Similarly, the ``--warn-on-skipped-patterns`` option can be used to obtain the
 766 reasons that rules weren't imported. This can be used to focus on the most
 767 important rejection reasons.
 768
 769 PatLeaf Predicates
 770 ^^^^^^^^^^^^^^^^^^
 771
 772 PatLeafs cannot be imported because their C++ is implemented in terms of
 773 ``SDNode`` objects. PatLeafs that handle immediate predicates should be
 774 replaced by ``ImmLeaf``, ``IntImmLeaf``, or ``FPImmLeaf`` as appropriate.
 775
 776 There's no standard answer for other PatLeafs. Some standard predicates have
 777 been baked into TableGen but this should not generally be done.
 778
 779 Custom SDNodes
 780 ^^^^^^^^^^^^^^
 781
 782 Custom SDNodes should be mapped to Target Pseudos using ``GINodeEquiv``. This
 783 will cause the instruction selector to import them but you will also need to
 784 ensure the target pseudo is introduced to the MIR before the instruction
 785 selector. Any preceding pass is suitable but the legalizer will be a
 786 particularly common choice.
 787
 788 ComplexPatterns
 789 ^^^^^^^^^^^^^^^
 790
 791 ComplexPatterns cannot be imported because their C++ is implemented in terms of
 792 ``SDNode`` objects. GlobalISel versions should be defined with
 793 ``GIComplexOperandMatcher`` and mapped to ComplexPattern with
 794 ``GIComplexPatternEquiv``.
 795
 796 The following predicates are useful for porting ComplexPattern:
 797
 798 * isBaseWithConstantOffset() - Check for base+offset structures
 799 * isOperandImmEqual() - Check for a particular constant
 800 * isObviouslySafeToFold() - Check for reasons an instruction can't be sunk and folded into another.
 801
 802 There are some important points for the C++ implementation:
 803
 804 * Don't modify MIR in the predicate
 805 * Renderer lambdas should capture by value to avoid use-after-free. They will be used after the predicate returns.
 806 * Only create instructions in a renderer lambda. GlobalISel won't clean up things you create but don't use.
 807
 808
 809 .. _maintainability:
 810
 811 Maintainability
 812 ===============
 813
 814 .. _maintainability-iterative:
 815
 816 Iterative Transformations
 817 -------------------------
 818
 819 Passes are split into small, iterative transformations, with all state
 820 represented in the MIR.
 821
 822 This differs from SelectionDAG (in particular, the legalizer) using various
 823 in-memory side-tables.
 824
 825
 826 .. _maintainability-mir:
 827
 828 MIR Serialization
 829 -----------------
 830
 831 .. FIXME: Update the MIRLangRef to include GMI additions.
 832
 833 :ref:`gmir` is serializable (see :doc:`MIRLangRef`).
 834 Combined with :ref:`maintainability-iterative`, this enables much finer-grained
 835 testing, rather than requiring large and fragile IR-to-assembly tests.
 836
 837 The current "stage" in the :ref:`pipeline` is represented by a set of
 838 ``MachineFunctionProperties``:
 839
 840 * ``legalized``
 841 * ``regBankSelected``
 842 * ``selected``
 843
 844
 845 .. _maintainability-verifier:
 846
 847 MachineVerifier
 848 ---------------
 849
 850 The pass approach lets us use the ``MachineVerifier`` to enforce invariants.
 851 For instance, a ``regBankSelected`` function may not have gvregs without
 852 a bank.
 853
 854 ``TODO``:
 855 The ``MachineVerifier`` being monolithic, some of the checks we want to do
 856 can't be integrated to it:  GlobalISel is a separate library, so we can't
 857 directly reference it from CodeGen.  For instance, legality checks are
 858 currently done in RegBankSelect/InstructionSelect proper.  We could #ifdef out
 859 the checks, or we could add some sort of verifier API.
 860
 861
 862 .. _progress:
 863
 864 Progress and Future Work
 865 ========================
 866
 867 The initial goal is to replace FastISel on AArch64.  The next step will be to
 868 replace SelectionDAG as the optimized ISel.
 869
 870 ``NOTE``:
 871 While we iterate on GlobalISel, we strive to avoid affecting the performance of
 872 SelectionDAG, FastISel, or the other MIR passes.  For instance, the types of
 873 :ref:`gmir-gvregs` are stored in a separate table in ``MachineRegisterInfo``,
 874 that is destroyed after :ref:`instructionselect`.
 875
 876 .. _progress-fastisel:
 877
 878 FastISel Replacement
 879 --------------------
 880
 881 For the initial FastISel replacement, we intend to fallback to SelectionDAG on
 882 selection failures.
 883
 884 Currently, compile-time of the fast pipeline is within 1.5x of FastISel.
 885 We're optimistic we can get to within 1.1/1.2x, but beating FastISel will be
 886 challenging given the multi-pass approach.
 887 Still, supporting all IR (via a complete legalizer) and avoiding the fallback
 888 to SelectionDAG in the worst case should enable better amortized performance
 889 than SelectionDAG+FastISel.
 890
 891 ``NOTE``:
 892 We considered never having a fallback to SelectionDAG, instead deciding early
 893 whether a given function is supported by GlobalISel or not.  The decision would
 894 be based on :ref:`milegalizer` queries.
 895 We abandoned that for two reasons:
 896 a) on IR inputs, we'd need to basically simulate the :ref:`irtranslator`;
 897 b) to be robust against unforeseen failures and to enable iterative
 898 improvements.
 899
 900 .. _progress-targets:
 901
 902 Support For Other Targets
 903 -------------------------
 904
 905 In parallel, we're investigating adding support for other - ideally quite
 906 different - targets.  For instance, there is some initial AMDGPU support.
 907
 908
 909 .. _porting:
 910
 911 Porting GlobalISel to A New Target
 912 ==================================
 913
 914 There are four major classes to implement by the target:
 915
 916 * :ref:`CallLowering <api-calllowering>` --- lower calls, returns, and arguments
 917   according to the ABI.
 918 * :ref:`RegisterBankInfo <api-registerbankinfo>` --- describe
 919   :ref:`gmir-regbank` coverage, cross-bank copy cost, and the mapping of
 920   operands onto banks for each instruction.
 921 * :ref:`LegalizerInfo <api-legalizerinfo>` --- describe what is legal, and how
 922   to legalize what isn't.
 923 * :ref:`InstructionSelector <api-instructionselector>` --- select generic MIR
 924   to target-specific MIR.
 925
 926 Additionally:
 927
 928 * ``TargetPassConfig`` --- create the passes constituting the pipeline,
 929   including additional passes not included in the :ref:`pipeline`.
 930
 931 .. _other_resources:
 932
 933 Resources
 934 =========
 935
 936 * `Global Instruction Selection - A Proposal by Quentin Colombet @LLVMDevMeeting 2015 <https://www.youtube.com/watch?v=F6GGbYtae3g>`_
 937 * `Global Instruction Selection - Status by Quentin Colombet, Ahmed Bougacha, and Tim Northover @LLVMDevMeeting 2016 <https://www.youtube.com/watch?v=6tfb344A7w8>`_
 938 * `GlobalISel - LLVM's Latest Instruction Selection Framework by Diana Picus @FOSDEM17 <https://www.youtube.com/watch?v=d6dF6E4BPeU>`_
 939 * GlobalISel: Past, Present, and Future by Quentin Colombet and Ahmed Bougacha @LLVMDevMeeting 2017
 940 * Head First into GlobalISel by Daniel Sanders, Aditya Nandakumar, and Justin Bogner @LLVMDevMeeting 2017