1 =====================================
3 =====================================
10 This is a work in progress. Compatibility across LLVM releases is not
18 LLVM coroutines are functions that have one or more `suspend points`_.
19 When a suspend point is reached, the execution of a coroutine is suspended and
20 control is returned back to its caller. A suspended coroutine can be resumed
21 to continue execution from the last suspend point or it can be destroyed.
23 In the following example, we call function `f` (which may or may not be a
24 coroutine itself) that returns a handle to a suspended coroutine
25 (**coroutine handle**) that is used by `main` to resume the coroutine twice and
32 %hdl = call i8* @f(i32 4)
33 call void @llvm.coro.resume(i8* %hdl)
34 call void @llvm.coro.resume(i8* %hdl)
35 call void @llvm.coro.destroy(i8* %hdl)
41 In addition to the function stack frame which exists when a coroutine is
42 executing, there is an additional region of storage that contains objects that
43 keep the coroutine state when a coroutine is suspended. This region of storage
44 is called **coroutine frame**. It is created when a coroutine is called and
45 destroyed when a coroutine runs to completion or destroyed by a call to
46 the `coro.destroy`_ intrinsic.
48 An LLVM coroutine is represented as an LLVM function that has calls to
49 `coroutine intrinsics`_ defining the structure of the coroutine.
50 After lowering, a coroutine is split into several
51 functions that represent three different ways of how control can enter the
54 1. a ramp function, which represents an initial invocation of the coroutine that
55 creates the coroutine frame and executes the coroutine code until it
56 encounters a suspend point or reaches the end of the function;
58 2. a coroutine resume function that is invoked when the coroutine is resumed;
60 3. a coroutine destroy function that is invoked when the coroutine is destroyed.
62 .. note:: Splitting out resume and destroy functions are just one of the
63 possible ways of lowering the coroutine. We chose it for initial
64 implementation as it matches closely the mental model and results in
70 Coroutine Representation
71 ------------------------
73 Let's look at an example of an LLVM coroutine with the behavior sketched
74 by the following pseudo-code.
81 <suspend> // returns a coroutine handle on first suspend
85 This coroutine calls some function `print` with value `n` as an argument and
86 suspends execution. Every time this coroutine resumes, it calls `print` again with an argument one bigger than the last time. This coroutine never completes by itself and must be destroyed explicitly. If we use this coroutine with
87 a `main` shown in the previous section. It will call `print` with values 4, 5
88 and 6 after which the coroutine will be destroyed.
90 The LLVM IR for this coroutine looks like this:
94 define i8* @f(i32 %n) {
96 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
97 %size = call i32 @llvm.coro.size.i32()
98 %alloc = call i8* @malloc(i32 %size)
99 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc)
102 %n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
103 %inc = add nsw i32 %n.val, 1
104 call void @print(i32 %n.val)
105 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
106 switch i8 %0, label %suspend [i8 0, label %loop
107 i8 1, label %cleanup]
109 %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
110 call void @free(i8* %mem)
113 %unused = call i1 @llvm.coro.end(i8* %hdl, i1 false)
117 The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is
118 lowered to a constant representing the size required for the coroutine frame.
119 The `coro.begin`_ intrinsic initializes the coroutine frame and returns the
120 coroutine handle. The second parameter of `coro.begin` is given a block of memory
121 to be used if the coroutine frame needs to be allocated dynamically.
122 The `coro.id`_ intrinsic serves as coroutine identity useful in cases when the
123 `coro.begin`_ intrinsic get duplicated by optimization passes such as
126 The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
127 given the coroutine handle, returns a pointer of the memory block to be freed or
128 `null` if the coroutine frame was not allocated dynamically. The `cleanup`
129 block is entered when coroutine runs to completion by itself or destroyed via
130 call to the `coro.destroy`_ intrinsic.
132 The `suspend` block contains code to be executed when coroutine runs to
133 completion or suspended. The `coro.end`_ intrinsic marks the point where
134 a coroutine needs to return control back to the caller if it is not an initial
135 invocation of the coroutine.
137 The `loop` blocks represents the body of the coroutine. The `coro.suspend`_
138 intrinsic in combination with the following switch indicates what happens to
139 control flow when a coroutine is suspended (default case), resumed (case 0) or
142 Coroutine Transformation
143 ------------------------
145 One of the steps of coroutine lowering is building the coroutine frame. The
146 def-use chains are analyzed to determine which objects need be kept alive across
147 suspend points. In the coroutine shown in the previous section, use of virtual register
148 `%n.val` is separated from the definition by a suspend point, therefore, it
149 cannot reside on the stack frame since the latter goes away once the coroutine
150 is suspended and control is returned back to the caller. An i32 slot is
151 allocated in the coroutine frame and `%n.val` is spilled and reloaded from that
154 We also store addresses of the resume and destroy functions so that the
155 `coro.resume` and `coro.destroy` intrinsics can resume and destroy the coroutine
156 when its identity cannot be determined statically at compile time. For our
157 example, the coroutine frame will be:
161 %f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 }
163 After resume and destroy parts are outlined, function `f` will contain only the
164 code responsible for creation and initialization of the coroutine frame and
165 execution of the coroutine until a suspend point is reached:
169 define i8* @f(i32 %n) {
171 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
172 %alloc = call noalias i8* @malloc(i32 24)
173 %0 = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc)
174 %frame = bitcast i8* %0 to %f.frame*
175 %1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0
176 store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
177 %2 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 1
178 store void (%f.frame*)* @f.destroy, void (%f.frame*)** %2
180 %inc = add nsw i32 %n, 1
181 %inc.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
182 store i32 %inc, i32* %inc.spill.addr
183 call void @print(i32 %n)
188 Outlined resume part of the coroutine will reside in function `f.resume`:
192 define internal fastcc void @f.resume(%f.frame* %frame.ptr.resume) {
194 %inc.spill.addr = getelementptr %f.frame, %f.frame* %frame.ptr.resume, i64 0, i32 2
195 %inc.spill = load i32, i32* %inc.spill.addr, align 4
196 %inc = add i32 %n.val, 1
197 store i32 %inc, i32* %inc.spill.addr, align 4
198 tail call void @print(i32 %inc)
202 Whereas function `f.destroy` will contain the cleanup code for the coroutine:
206 define internal fastcc void @f.destroy(%f.frame* %frame.ptr.destroy) {
208 %0 = bitcast %f.frame* %frame.ptr.destroy to i8*
209 tail call void @free(i8* %0)
213 Avoiding Heap Allocations
214 -------------------------
216 A particular coroutine usage pattern, which is illustrated by the `main`
217 function in the overview section, where a coroutine is created, manipulated and
218 destroyed by the same calling function, is common for coroutines implementing
219 RAII idiom and is suitable for allocation elision optimization which avoid
220 dynamic allocation by storing the coroutine frame as a static `alloca` in its
223 In the entry block, we will call `coro.alloc`_ intrinsic that will return `true`
224 when dynamic allocation is required, and `false` if dynamic allocation is
230 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
231 %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id)
232 br i1 %need.dyn.alloc, label %dyn.alloc, label %coro.begin
234 %size = call i32 @llvm.coro.size.i32()
235 %alloc = call i8* @CustomAlloc(i32 %size)
238 %phi = phi i8* [ null, %entry ], [ %alloc, %dyn.alloc ]
239 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %phi)
241 In the cleanup block, we will make freeing the coroutine frame conditional on
242 `coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null`
243 thus skipping the deallocation code:
248 %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
249 %need.dyn.free = icmp ne i8* %mem, null
250 br i1 %need.dyn.free, label %dyn.free, label %if.end
252 call void @CustomFree(i8* %mem)
257 With allocations and deallocations represented as described as above, after
258 coroutine heap allocation elision optimization, the resulting main will be:
264 call void @print(i32 4)
265 call void @print(i32 5)
266 call void @print(i32 6)
270 Multiple Suspend Points
271 -----------------------
273 Let's consider the coroutine that has more than one suspend point:
286 Matching LLVM code would look like (with the rest of the code remaining the same
287 as the code in the previous section):
292 %n.addr = phi i32 [ %n, %entry ], [ %inc, %loop.resume ]
293 call void @print(i32 %n.addr) #4
294 %2 = call i8 @llvm.coro.suspend(token none, i1 false)
295 switch i8 %2, label %suspend [i8 0, label %loop.resume
296 i8 1, label %cleanup]
298 %inc = add nsw i32 %n.addr, 1
299 %sub = xor i32 %n.addr, -1
300 call void @print(i32 %sub)
301 %3 = call i8 @llvm.coro.suspend(token none, i1 false)
302 switch i8 %3, label %suspend [i8 0, label %loop
303 i8 1, label %cleanup]
305 In this case, the coroutine frame would include a suspend index that will
306 indicate at which suspend point the coroutine needs to resume. The resume
307 function will use an index to jump to an appropriate basic block and will look
312 define internal fastcc void @f.Resume(%f.Frame* %FramePtr) {
314 %index.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 2
315 %index = load i8, i8* %index.addr, align 1
316 %switch = icmp eq i8 %index, 0
317 %n.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 3
318 %n = load i32, i32* %n.addr, align 4
319 br i1 %switch, label %loop.resume, label %loop
322 %sub = xor i32 %n, -1
323 call void @print(i32 %sub)
326 %inc = add nsw i32 %n, 1
327 store i32 %inc, i32* %n.addr, align 4
328 tail call void @print(i32 %inc)
332 %storemerge = phi i8 [ 0, %loop ], [ 1, %loop.resume ]
333 store i8 %storemerge, i8* %index.addr, align 1
337 If different cleanup code needs to get executed for different suspend points,
338 a similar switch will be in the `f.destroy` function.
342 Using suspend index in a coroutine state and having a switch in `f.resume` and
343 `f.destroy` is one of the possible implementation strategies. We explored
344 another option where a distinct `f.resume1`, `f.resume2`, etc. are created for
345 every suspend point, and instead of storing an index, the resume and destroy
346 function pointers are updated at every suspend. Early testing showed that the
347 current approach is easier on the optimizer than the latter so it is a
348 lowering strategy implemented at the moment.
350 Distinct Save and Suspend
351 -------------------------
353 In the previous example, setting a resume index (or some other state change that
354 needs to happen to prepare a coroutine for resumption) happens at the same time as
355 a suspension of a coroutine. However, in certain cases, it is necessary to control
356 when coroutine is prepared for resumption and when it is suspended.
358 In the following example, a coroutine represents some activity that is driven
359 by completions of asynchronous operations `async_op1` and `async_op2` which get
360 a coroutine handle as a parameter and resume the coroutine once async
361 operation is finished.
368 async_op1(<coroutine-handle>); // will resume once async_op1 completes
373 async_op2(<coroutine-handle>); // will resume once async_op2 completes
380 In this case, coroutine should be ready for resumption prior to a call to
381 `async_op1` and `async_op2`. The `coro.save`_ intrinsic is used to indicate a
382 point when coroutine should be ready for resumption (namely, when a resume index
383 should be stored in the coroutine frame, so that it can be resumed at the
384 correct resume point):
389 %save1 = call token @llvm.coro.save(i8* %hdl)
390 call void @async_op1(i8* %hdl)
391 %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
392 switch i8 %suspend1, label %suspend [i8 0, label %resume1
393 i8 1, label %cleanup]
395 %save2 = call token @llvm.coro.save(i8* %hdl)
396 call void @async_op2(i8* %hdl)
397 %suspend2 = call i1 @llvm.coro.suspend(token %save2, i1 false)
398 switch i8 %suspend1, label %suspend [i8 0, label %resume2
399 i8 1, label %cleanup]
401 .. _coroutine promise:
406 A coroutine author or a frontend may designate a distinguished `alloca` that can
407 be used to communicate with the coroutine. This distinguished alloca is called
408 **coroutine promise** and is provided as the second parameter to the
409 `coro.id`_ intrinsic.
411 The following coroutine designates a 32 bit integer `promise` and uses it to
412 store the current value produced by a coroutine.
416 define i8* @f(i32 %n) {
418 %promise = alloca i32
419 %pv = bitcast i32* %promise to i8*
420 %id = call token @llvm.coro.id(i32 0, i8* %pv, i8* null, i8* null)
421 %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id)
422 br i1 %need.dyn.alloc, label %dyn.alloc, label %coro.begin
424 %size = call i32 @llvm.coro.size.i32()
425 %alloc = call i8* @malloc(i32 %size)
428 %phi = phi i8* [ null, %entry ], [ %alloc, %dyn.alloc ]
429 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %phi)
432 %n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
433 %inc = add nsw i32 %n.val, 1
434 store i32 %n.val, i32* %promise
435 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
436 switch i8 %0, label %suspend [i8 0, label %loop
437 i8 1, label %cleanup]
439 %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
440 call void @free(i8* %mem)
443 %unused = call i1 @llvm.coro.end(i8* %hdl, i1 false)
447 A coroutine consumer can rely on the `coro.promise`_ intrinsic to access the
454 %hdl = call i8* @f(i32 4)
455 %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
456 %promise.addr = bitcast i8* %promise.addr.raw to i32*
457 %val0 = load i32, i32* %promise.addr
458 call void @print(i32 %val0)
459 call void @llvm.coro.resume(i8* %hdl)
460 %val1 = load i32, i32* %promise.addr
461 call void @print(i32 %val1)
462 call void @llvm.coro.resume(i8* %hdl)
463 %val2 = load i32, i32* %promise.addr
464 call void @print(i32 %val2)
465 call void @llvm.coro.destroy(i8* %hdl)
469 After example in this section is compiled, result of the compilation will be:
475 tail call void @print(i32 4)
476 tail call void @print(i32 5)
477 tail call void @print(i32 6)
487 A coroutine author or a frontend may designate a particular suspend to be final,
488 by setting the second argument of the `coro.suspend`_ intrinsic to `true`.
489 Such a suspend point has two properties:
491 * it is possible to check whether a suspended coroutine is at the final suspend
492 point via `coro.done`_ intrinsic;
494 * a resumption of a coroutine stopped at the final suspend point leads to
495 undefined behavior. The only possible action for a coroutine at a final
496 suspend point is destroying it via `coro.destroy`_ intrinsic.
498 From the user perspective, the final suspend point represents an idea of a
499 coroutine reaching the end. From the compiler perspective, it is an optimization
500 opportunity for reducing number of resume points (and therefore switch cases) in
503 The following is an example of a function that keeps resuming the coroutine
504 until the final suspend point is reached after which point the coroutine is
511 %hdl = call i8* @f(i32 4)
514 call void @llvm.coro.resume(i8* %hdl)
515 %done = call i1 @llvm.coro.done(i8* %hdl)
516 br i1 %done, label %end, label %while
518 call void @llvm.coro.destroy(i8* %hdl)
522 Usually, final suspend point is a frontend injected suspend point that does not
523 correspond to any explicitly authored suspend point of the high level language.
524 For example, for a Python generator that has only one suspend point:
526 .. code-block:: python
532 Python frontend would inject two more suspend points, so that the actual code
537 void* coroutine(int n) {
539 <designate current_value to be coroutine promise>
540 <SUSPEND> // injected suspend point, so that the coroutine starts suspended
541 for (int i = 0; i < n; ++i) {
542 current_value = i; <SUSPEND>; // corresponds to "yield i"
544 <SUSPEND final=true> // injected final suspend point
547 and python iterator `__next__` would look like:
551 int __next__(void* hdl) {
553 if (coro.done(hdl)) throw StopIteration();
554 return *(int*)coro.promise(hdl, 4, false);
560 Coroutine Manipulation Intrinsics
561 ---------------------------------
563 Intrinsics described in this section are used to manipulate an existing
564 coroutine. They can be used in any function which happen to have a pointer
565 to a `coroutine frame`_ or a pointer to a `coroutine promise`_.
569 'llvm.coro.destroy' Intrinsic
570 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
577 declare void @llvm.coro.destroy(i8* <handle>)
582 The '``llvm.coro.destroy``' intrinsic destroys a suspended
588 The argument is a coroutine handle to a suspended coroutine.
593 When possible, the `coro.destroy` intrinsic is replaced with a direct call to
594 the coroutine destroy function. Otherwise it is replaced with an indirect call
595 based on the function pointer for the destroy function stored in the coroutine
596 frame. Destroying a coroutine that is not suspended leads to undefined behavior.
600 'llvm.coro.resume' Intrinsic
601 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
605 declare void @llvm.coro.resume(i8* <handle>)
610 The '``llvm.coro.resume``' intrinsic resumes a suspended coroutine.
615 The argument is a handle to a suspended coroutine.
620 When possible, the `coro.resume` intrinsic is replaced with a direct call to the
621 coroutine resume function. Otherwise it is replaced with an indirect call based
622 on the function pointer for the resume function stored in the coroutine frame.
623 Resuming a coroutine that is not suspended leads to undefined behavior.
627 'llvm.coro.done' Intrinsic
628 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
632 declare i1 @llvm.coro.done(i8* <handle>)
637 The '``llvm.coro.done``' intrinsic checks whether a suspended coroutine is at
638 the final suspend point or not.
643 The argument is a handle to a suspended coroutine.
648 Using this intrinsic on a coroutine that does not have a `final suspend`_ point
649 or on a coroutine that is not suspended leads to undefined behavior.
653 'llvm.coro.promise' Intrinsic
654 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
658 declare i8* @llvm.coro.promise(i8* <ptr>, i32 <alignment>, i1 <from>)
663 The '``llvm.coro.promise``' intrinsic obtains a pointer to a
664 `coroutine promise`_ given a coroutine handle and vice versa.
669 The first argument is a handle to a coroutine if `from` is false. Otherwise,
670 it is a pointer to a coroutine promise.
672 The second argument is an alignment requirements of the promise.
673 If a frontend designated `%promise = alloca i32` as a promise, the alignment
674 argument to `coro.promise` should be the alignment of `i32` on the target
675 platform. If a frontend designated `%promise = alloca i32, align 16` as a
676 promise, the alignment argument should be 16.
677 This argument only accepts constants.
679 The third argument is a boolean indicating a direction of the transformation.
680 If `from` is true, the intrinsic returns a coroutine handle given a pointer
681 to a promise. If `from` is false, the intrinsics return a pointer to a promise
682 from a coroutine handle. This argument only accepts constants.
687 Using this intrinsic on a coroutine that does not have a coroutine promise
688 leads to undefined behavior. It is possible to read and modify coroutine
689 promise of the coroutine which is currently executing. The coroutine author and
690 a coroutine user are responsible to makes sure there is no data races.
697 define i8* @f(i32 %n) {
699 %promise = alloca i32
700 %pv = bitcast i32* %promise to i8*
701 ; the second argument to coro.id points to the coroutine promise.
702 %id = call token @llvm.coro.id(i32 0, i8* %pv, i8* null, i8* null)
704 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc)
706 store i32 42, i32* %promise ; store something into the promise
713 %hdl = call i8* @f(i32 4) ; starts the coroutine and returns its handle
714 %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
715 %promise.addr = bitcast i8* %promise.addr.raw to i32*
716 %val = load i32, i32* %promise.addr ; load a value from the promise
717 call void @print(i32 %val)
718 call void @llvm.coro.destroy(i8* %hdl)
722 .. _coroutine intrinsics:
724 Coroutine Structure Intrinsics
725 ------------------------------
726 Intrinsics described in this section are used within a coroutine to describe
727 the coroutine structure. They should not be used outside of a coroutine.
731 'llvm.coro.size' Intrinsic
732 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
735 declare i32 @llvm.coro.size.i32()
736 declare i64 @llvm.coro.size.i64()
741 The '``llvm.coro.size``' intrinsic returns the number of bytes
742 required to store a `coroutine frame`_.
752 The `coro.size` intrinsic is lowered to a constant representing the size of
757 'llvm.coro.begin' Intrinsic
758 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
761 declare i8* @llvm.coro.begin(token <id>, i8* <mem>)
766 The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.
771 The first argument is a token returned by a call to '``llvm.coro.id``'
772 identifying the coroutine.
774 The second argument is a pointer to a block of memory where coroutine frame
775 will be stored if it is allocated dynamically.
780 Depending on the alignment requirements of the objects in the coroutine frame
781 and/or on the codegen compactness reasons the pointer returned from `coro.begin`
782 may be at offset to the `%mem` argument. (This could be beneficial if
783 instructions that express relative access to data can be more compactly encoded
784 with small positive and negative offsets).
786 A frontend should emit exactly one `coro.begin` intrinsic per coroutine.
790 'llvm.coro.free' Intrinsic
791 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
794 declare i8* @llvm.coro.free(token %id, i8* <frame>)
799 The '``llvm.coro.free``' intrinsic returns a pointer to a block of memory where
800 coroutine frame is stored or `null` if this instance of a coroutine did not use
801 dynamically allocated memory for its coroutine frame.
806 The first argument is a token returned by a call to '``llvm.coro.id``'
807 identifying the coroutine.
809 The second argument is a pointer to the coroutine frame. This should be the same
810 pointer that was returned by prior `coro.begin` call.
812 Example (custom deallocation function):
813 """""""""""""""""""""""""""""""""""""""
818 %mem = call i8* @llvm.coro.free(token %id, i8* %frame)
819 %mem_not_null = icmp ne i8* %mem, null
820 br i1 %mem_not_null, label %if.then, label %if.end
822 call void @CustomFree(i8* %mem)
827 Example (standard deallocation functions):
828 """"""""""""""""""""""""""""""""""""""""""
833 %mem = call i8* @llvm.coro.free(token %id, i8* %frame)
834 call void @free(i8* %mem)
839 'llvm.coro.alloc' Intrinsic
840 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
843 declare i1 @llvm.coro.alloc(token <id>)
848 The '``llvm.coro.alloc``' intrinsic returns `true` if dynamic allocation is
849 required to obtain a memory for the coroutine frame and `false` otherwise.
854 The first argument is a token returned by a call to '``llvm.coro.id``'
855 identifying the coroutine.
860 A frontend should emit at most one `coro.alloc` intrinsic per coroutine.
861 The intrinsic is used to suppress dynamic allocation of the coroutine frame
870 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
871 %dyn.alloc.required = call i1 @llvm.coro.alloc(token %id)
872 br i1 %dyn.alloc.required, label %coro.alloc, label %coro.begin
875 %frame.size = call i32 @llvm.coro.size()
876 %alloc = call i8* @MyAlloc(i32 %frame.size)
880 %phi = phi i8* [ null, %entry ], [ %alloc, %coro.alloc ]
881 %frame = call i8* @llvm.coro.begin(token %id, i8* %phi)
885 'llvm.coro.noop' Intrinsic
886 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
889 declare i8* @llvm.coro.noop()
894 The '``llvm.coro.noop``' intrinsic returns an address of the coroutine frame of
895 a coroutine that does nothing when resumed or destroyed.
905 This intrinsic is lowered to refer to a private constant coroutine frame. The
906 resume and destroy handlers for this frame are empty functions that do nothing.
907 Note that in different translation units llvm.coro.noop may return different pointers.
911 'llvm.coro.frame' Intrinsic
912 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
915 declare i8* @llvm.coro.frame()
920 The '``llvm.coro.frame``' intrinsic returns an address of the coroutine frame of
921 the enclosing coroutine.
931 This intrinsic is lowered to refer to the `coro.begin`_ instruction. This is
932 a frontend convenience intrinsic that makes it easier to refer to the
937 'llvm.coro.id' Intrinsic
938 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
941 declare token @llvm.coro.id(i32 <align>, i8* <promise>, i8* <coroaddr>,
947 The '``llvm.coro.id``' intrinsic returns a token identifying a coroutine.
952 The first argument provides information on the alignment of the memory returned
953 by the allocation function and given to `coro.begin` by the first argument. If
954 this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*).
955 This argument only accepts constants.
957 The second argument, if not `null`, designates a particular alloca instruction
958 to be a `coroutine promise`_.
960 The third argument is `null` coming out of the frontend. The CoroEarly pass sets
961 this argument to point to the function this coro.id belongs to.
963 The fourth argument is `null` before coroutine is split, and later is replaced
964 to point to a private global constant array containing function pointers to
965 outlined resume and destroy parts of the coroutine.
971 The purpose of this intrinsic is to tie together `coro.id`, `coro.alloc` and
972 `coro.begin` belonging to the same coroutine to prevent optimization passes from
973 duplicating any of these instructions unless entire body of the coroutine is
976 A frontend should emit exactly one `coro.id` intrinsic per coroutine.
980 'llvm.coro.end' Intrinsic
981 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
984 declare i1 @llvm.coro.end(i8* <handle>, i1 <unwind>)
989 The '``llvm.coro.end``' marks the point where execution of the resume part of
990 the coroutine should end and control should return to the caller.
996 The first argument should refer to the coroutine handle of the enclosing
997 coroutine. A frontend is allowed to supply null as the first parameter, in this
998 case `coro-early` pass will replace the null with an appropriate coroutine
1001 The second argument should be `true` if this coro.end is in the block that is
1002 part of the unwind sequence leaving the coroutine body due to an exception and
1007 The purpose of this intrinsic is to allow frontends to mark the cleanup and
1008 other code that is only relevant during the initial invocation of the coroutine
1009 and should not be present in resume and destroy parts.
1011 This intrinsic is lowered when a coroutine is split into
1012 the start, resume and destroy parts. In the start part, it is a no-op,
1013 in resume and destroy parts, it is replaced with `ret void` instruction and
1014 the rest of the block containing `coro.end` instruction is discarded.
1015 In landing pads it is replaced with an appropriate instruction to unwind to
1016 caller. The handling of coro.end differs depending on whether the target is
1017 using landingpad or WinEH exception model.
1019 For landingpad based exception model, it is expected that frontend uses the
1020 `coro.end`_ intrinsic as follows:
1022 .. code-block:: llvm
1025 %InResumePart = call i1 @llvm.coro.end(i8* null, i1 true)
1026 br i1 %InResumePart, label %eh.resume, label %cleanup.cont
1029 ; rest of the cleanup
1032 %exn = load i8*, i8** %exn.slot, align 8
1033 %sel = load i32, i32* %ehselector.slot, align 4
1034 %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
1035 %lpad.val29 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
1036 resume { i8*, i32 } %lpad.val29
1038 The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions,
1039 thus leading to immediate unwind to the caller, whereas in start function it
1040 is replaced with ``False``, thus allowing to proceed to the rest of the cleanup
1041 code that is only needed during initial invocation of the coroutine.
1043 For Windows Exception handling model, a frontend should attach a funclet bundle
1044 referring to an enclosing cleanuppad as follows:
1046 .. code-block:: llvm
1049 %tok = cleanuppad within none []
1050 %unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ]
1051 cleanupret from %tok unwind label %RestOfTheCleanup
1053 The `CoroSplit` pass, if the funclet bundle is present, will insert
1054 ``cleanupret from %tok unwind to caller`` before
1055 the `coro.end`_ intrinsic and will remove the rest of the block.
1057 The following table summarizes the handling of `coro.end`_ intrinsic.
1059 +--------------------------+-------------------+-------------------------------+
1060 | | In Start Function | In Resume/Destroy Functions |
1061 +--------------------------+-------------------+-------------------------------+
1062 |unwind=false | nothing |``ret void`` |
1063 +------------+-------------+-------------------+-------------------------------+
1064 | | WinEH | nothing |``cleanupret unwind to caller``|
1065 |unwind=true +-------------+-------------------+-------------------------------+
1066 | | Landingpad | nothing | nothing |
1067 +------------+-------------+-------------------+-------------------------------+
1072 'llvm.coro.suspend' Intrinsic
1073 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1076 declare i8 @llvm.coro.suspend(token <save>, i1 <final>)
1081 The '``llvm.coro.suspend``' marks the point where execution of the coroutine
1082 need to get suspended and control returned back to the caller.
1083 Conditional branches consuming the result of this intrinsic lead to basic blocks
1084 where coroutine should proceed when suspended (-1), resumed (0) or destroyed
1090 The first argument refers to a token of `coro.save` intrinsic that marks the
1091 point when coroutine state is prepared for suspension. If `none` token is passed,
1092 the intrinsic behaves as if there were a `coro.save` immediately preceding
1093 the `coro.suspend` intrinsic.
1095 The second argument indicates whether this suspension point is `final`_.
1096 The second argument only accepts constants. If more than one suspend point is
1097 designated as final, the resume and destroy branches should lead to the same
1100 Example (normal suspend point):
1101 """""""""""""""""""""""""""""""
1103 .. code-block:: llvm
1105 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
1106 switch i8 %0, label %suspend [i8 0, label %resume
1107 i8 1, label %cleanup]
1109 Example (final suspend point):
1110 """"""""""""""""""""""""""""""
1112 .. code-block:: llvm
1115 %s.final = call i8 @llvm.coro.suspend(token none, i1 true)
1116 switch i8 %s.final, label %suspend [i8 0, label %trap
1117 i8 1, label %cleanup]
1119 call void @llvm.trap()
1125 If a coroutine that was suspended at the suspend point marked by this intrinsic
1126 is resumed via `coro.resume`_ the control will transfer to the basic block
1127 of the 0-case. If it is resumed via `coro.destroy`_, it will proceed to the
1128 basic block indicated by the 1-case. To suspend, coroutine proceed to the
1131 If suspend intrinsic is marked as final, it can consider the `true` branch
1132 unreachable and can perform optimizations that can take advantage of that fact.
1136 'llvm.coro.save' Intrinsic
1137 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1140 declare token @llvm.coro.save(i8* <handle>)
1145 The '``llvm.coro.save``' marks the point where a coroutine need to update its
1146 state to prepare for resumption to be considered suspended (and thus eligible
1152 The first argument points to a coroutine handle of the enclosing coroutine.
1157 Whatever coroutine state changes are required to enable resumption of
1158 the coroutine from the corresponding suspend point should be done at the point
1159 of `coro.save` intrinsic.
1164 Separate save and suspend points are necessary when a coroutine is used to
1165 represent an asynchronous control flow driven by callbacks representing
1166 completions of asynchronous operations.
1168 In such a case, a coroutine should be ready for resumption prior to a call to
1169 `async_op` function that may trigger resumption of a coroutine from the same or
1170 a different thread possibly prior to `async_op` call returning control back
1173 .. code-block:: llvm
1175 %save1 = call token @llvm.coro.save(i8* %hdl)
1176 call void @async_op1(i8* %hdl)
1177 %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
1178 switch i8 %suspend1, label %suspend [i8 0, label %resume1
1179 i8 1, label %cleanup]
1183 'llvm.coro.param' Intrinsic
1184 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1187 declare i1 @llvm.coro.param(i8* <original>, i8* <copy>)
1192 The '``llvm.coro.param``' is used by a frontend to mark up the code used to
1193 construct and destruct copies of the parameters. If the optimizer discovers that
1194 a particular parameter copy is not used after any suspends, it can remove the
1195 construction and destruction of the copy by replacing corresponding coro.param
1196 with `i1 false` and replacing any use of the `copy` with the `original`.
1201 The first argument points to an `alloca` storing the value of a parameter to a
1204 The second argument points to an `alloca` storing the value of the copy of that
1210 The optimizer is free to always replace this intrinsic with `i1 true`.
1212 The optimizer is also allowed to replace it with `i1 false` provided that the
1213 parameter copy is only used prior to control flow reaching any of the suspend
1214 points. The code that would be DCE'd if the `coro.param` is replaced with
1215 `i1 false` is not considered to be a use of the parameter copy.
1217 The frontend can emit this intrinsic if its language rules allow for this
1222 Consider the following example. A coroutine takes two parameters `a` and `b`
1223 that has a destructor and a move constructor.
1227 struct A { ~A(); A(A&&); bool foo(); void bar(); };
1229 task<int> f(A a, A b) {
1234 co_await read_async(); // introduces suspend point
1238 Note that, uses of `b` is used after a suspend point and thus must be copied
1239 into a coroutine frame, whereas `a` does not have to, since it never used
1242 A frontend can create parameter copies for `a` and `b` as follows:
1244 .. code-block:: text
1246 task<int> f(A a', A b') {
1249 // move parameters to its copies
1250 if (coro.param(a', a)) A::A(a, A&& a');
1251 if (coro.param(b', b)) A::A(b, A&& b');
1253 // destroy parameters copies
1254 if (coro.param(a', a)) A::~A(a);
1255 if (coro.param(b', b)) A::~A(b);
1258 The optimizer can replace coro.param(a',a) with `i1 false` and replace all uses
1259 of `a` with `a'`, since it is not used after suspend.
1261 The optimizer must replace coro.param(b', b) with `i1 true`, since `b` is used
1262 after suspend and therefore, it has to reside in the coroutine frame.
1264 Coroutine Transformation Passes
1265 ===============================
1268 The pass CoroEarly lowers coroutine intrinsics that hide the details of the
1269 structure of the coroutine frame, but, otherwise not needed to be preserved to
1270 help later coroutine passes. This pass lowers `coro.frame`_, `coro.done`_,
1271 and `coro.promise`_ intrinsics.
1277 The pass CoroSplit buides coroutine frame and outlines resume and destroy parts
1278 into separate functions.
1282 The pass CoroElide examines if the inlined coroutine is eligible for heap
1283 allocation elision optimization. If so, it replaces
1284 `coro.begin` intrinsic with an address of a coroutine frame placed on its caller
1285 and replaces `coro.alloc` and `coro.free` intrinsics with `false` and `null`
1286 respectively to remove the deallocation code.
1287 This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
1288 calls to resume and destroy functions for a particular coroutine where possible.
1292 This pass runs late to lower all coroutine related intrinsics not replaced by
1295 Areas Requiring Attention
1296 =========================
1297 #. A coroutine frame is bigger than it could be. Adding stack packing and stack
1298 coloring like optimization on the coroutine frame will result in tighter
1301 #. Take advantage of the lifetime intrinsics for the data that goes into the
1302 coroutine frame. Leave lifetime intrinsics as is for the data that stays in
1305 #. The CoroElide optimization pass relies on coroutine ramp function to be
1306 inlined. It would be beneficial to split the ramp function further to
1307 increase the chance that it will get inlined into its caller.
1309 #. Design a convention that would make it possible to apply coroutine heap
1310 elision optimization across ABI boundaries.
1312 #. Cannot handle coroutines with `inalloca` parameters (used in x86 on Windows).
1314 #. Alignment is ignored by coro.begin and coro.free intrinsics.
1316 #. Make required changes to make sure that coroutine optimizations work with
1319 #. More tests, more tests, more tests