Notes on the external ABI presented by libgomp. This ought to get
transformed into proper documentation at some point.

Implementing MASTER construct

    if (omp_get_thread_num () == 0)
      block

Alternatively, we generate two copies of the parallel subfunction
and only include this block in the version run by the master thread.
Surely that's not worthwhile though...

Implementing CRITICAL construct

Without a specified name, use

    void GOMP_critical_start (void);
    void GOMP_critical_end (void);

so that we don't get COPY relocations from libgomp to the main
application.

With a specified name, use omp_set_lock and omp_unset_lock with
name being transformed into a variable declared like

    omp_lock_t gomp_critical_user_<name>
    __attribute__((common))

Ideally the ABI would specify that all zero is a valid unlocked
state, and so we wouldn't actually need to initialize this at
startup.

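For illustration, a named critical section

    #pragma omp critical (foo)
      body;

would then expand to something like this (a sketch; the lock
variable follows the mangling described above):

    omp_lock_t gomp_critical_user_foo __attribute__((common));

    omp_set_lock (&gomp_critical_user_foo);
    body;
    omp_unset_lock (&gomp_critical_user_foo);
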
Implementing ATOMIC construct

The target should implement the __sync builtins.

Failing that we could add

    void GOMP_atomic_enter (void)
    void GOMP_atomic_exit (void)

which reuses the regular lock code, but with yet another lock
object private to the library.

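For illustration, with the __sync builtins available,

    #pragma omp atomic
    x += y;

can (for an integral x) expand directly to

    __sync_fetch_and_add (&x, y);
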
Implementing FLUSH construct

Expands to the __sync_synchronize builtin.

Implementing BARRIER construct

    void GOMP_barrier (void)

Implementing THREADPRIVATE construct

In _most_ cases we can map this directly to __thread. Except
that OMP allows constructors for C++ objects. We can either
refuse to support this (how often is it used?) or we can
implement something akin to .ctors.

Even more ideally, this ctor feature is handled by extensions
to the main pthreads library. Failing that, we can have a set
of entry points to register ctor functions to be called.

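For illustration, the simple case (no constructor involved) maps as

    int x;
    #pragma omp threadprivate(x)

    /* becomes */

    __thread int x;
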
Implementing PRIVATE clause

In association with a PARALLEL, or within the lexical extent
of a PARALLEL block, the variable becomes a local variable in
the parallel subfunction.

In association with FOR or SECTIONS blocks, create a new
automatic variable within the current function. This preserves
the semantic of new variable creation.

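For illustration, the PARALLEL case (a sketch using the
subfunction scheme described under PARALLEL below):

    int x;
    #pragma omp parallel private(x)
      body;

    /* becomes */

    void subfunction (void *data)
    {
      int x;    /* private copy; never initialized from the outer x */
      body;
    }
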
Implementing FIRSTPRIVATE, LASTPRIVATE, COPYIN, COPYPRIVATE clauses

Seems simple enough for PARALLEL blocks. Create a private
struct for communicating between parent and subfunction.
In the parent, copy in values for scalar and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types. In the
subfunction, copy the value into the local variable.

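For illustration, firstprivate(x) on a PARALLEL might look like
this (the struct and field names are invented for the sketch):

    struct omp_data_s { int x; };

    void subfunction (void *data)
    {
      struct omp_data_s *d = data;
      int x = d->x;    /* copy the value into the local variable */
      body;
    }

    struct omp_data_s data;
    data.x = x;        /* parent copies the value in */
    GOMP_parallel_start (subfunction, &data, 0);
    subfunction (&data);
    GOMP_parallel_end ();
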
Not clear at all what to do with bare FOR or SECTION blocks.
The only thing I can figure is that we do something like

    #pragma omp for firstprivate(x) lastprivate(y)
    for (int i = 0; i < n; ++i)
      body;

which becomes

    {
      int x = x, y;

      // for stuff

      if (i == n)
        y = y;
    }

where the "x=x" and "y=y" assignments actually have different
uids for the two variables, i.e. not something you could write
directly in C. Presumably this only makes sense if the "outer"
x and y are global variables.

COPYPRIVATE would work the same way, except the structure
broadcast would have to happen via SINGLE machinery instead.

Implementing REDUCTION clause

The private struct mentioned above should have a pointer to
an array of the type of the variable, indexed by the thread's
team_id. The thread stores its final value into the array,
and after the barrier the master thread iterates over the
array to collect the values.

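For illustration, reduction(+:sum) might then look like this
(a sketch; names invented, team_id taken as the thread number):

    void subfunction (void *data)
    {
      struct omp_data_s *d = data;    /* holds long *sum_array */
      long sum = 0;
      /* ... the loop accumulates into the private sum ... */
      d->sum_array[omp_get_thread_num ()] = sum;
      GOMP_barrier ();
      /* team_id 0 then sums sum_array[0..nthreads-1] */
    }
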
Implementing PARALLEL construct

    #pragma omp parallel
    {
      body;
    }

becomes

    void subfunction (void *data)
    {
      use data;
      body;
    }

    setup data;
    GOMP_parallel_start (subfunction, &data, num_threads);
    subfunction (&data);
    GOMP_parallel_end ();

    void GOMP_parallel_start (void (*fn)(void *), void *data,
                              unsigned num_threads)

The FN argument is the subfunction to be run in parallel.

The DATA argument is a pointer to a structure used to
communicate data in and out of the subfunction, as discussed
above wrt FIRSTPRIVATE et al.

The NUM_THREADS argument is 1 if an IF clause is present
and false, or the value of the NUM_THREADS clause, if
present, or 0.

The function needs to create the appropriate number of
threads and/or launch them from the dock. It needs to
create the team structure and assign team ids.

    void GOMP_parallel_end (void)

Tears down the team and returns us to the previous
omp_in_parallel() state.

Implementing FOR construct

    #pragma omp parallel for
    for (i = lb; i <= ub; i++)
      body;

becomes

    void subfunction (void *data)
    {
      long _s0, _e0;
      while (GOMP_loop_static_next (&_s0, &_e0))
      {
        long _e1 = _e0, i;
        for (i = _s0; i < _e1; i++)
          body;
      }
      GOMP_loop_end_nowait ();
    }

    GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
    subfunction (NULL);
    GOMP_parallel_end ();

    #pragma omp for schedule(runtime)
    for (i = 0; i < n; i++)
      body;

becomes

    {
      long i, _s0, _e0;
      if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
        do {
          long _e1 = _e0;
          for (i = _s0; i < _e1; i++)
            body;
        } while (GOMP_loop_runtime_next (&_s0, &_e0));
      GOMP_loop_end ();
    }

Note that while it looks like there is trickiness to propagating
a non-constant STEP, there isn't really. We're explicitly allowed
to evaluate it as many times as we want, and any variables involved
should automatically be handled as PRIVATE or SHARED like any other
variables. So the expression should remain evaluable in the
subfunction. We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we can also not if we like.

If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
able to get away with no work-sharing context at all, since we can
simply perform the arithmetic directly in each thread to divide up
the iterations. Which would mean that we wouldn't need to call any
of these routines.

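For illustration, the per-thread arithmetic could be a plain
blocked partition (a sketch; the actual division used may differ):

    /* TID and NTHREADS from omp_get_thread_num/omp_get_num_threads.  */
    long q = n / nthreads, r = n % nthreads;
    long _s0 = q * tid + (tid < r ? tid : r);
    long _e0 = _s0 + q + (tid < r ? 1 : 0);
    for (i = _s0; i < _e0; i++)
      body;
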
There are separate routines for handling loops with an ORDERED
clause. Bookkeeping for that is non-trivial...

Implementing ORDERED construct

    void GOMP_ordered_start (void)
    void GOMP_ordered_end (void)

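Presumably the ORDERED block inside the loop body would simply
bracket its contents with these calls:

    GOMP_ordered_start ();
    stmt;
    GOMP_ordered_end ();
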
Implementing SECTIONS construct

    #pragma omp sections
    {
      #pragma omp section
      stmt1;
      #pragma omp section
      stmt2;
      #pragma omp section
      stmt3;
    }

becomes

    for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
      switch (i)
        {
        case 1:
          stmt1;
          break;
        case 2:
          stmt2;
          break;
        case 3:
          stmt3;
          break;
        }
    GOMP_barrier ();

Implementing SINGLE construct

    #pragma omp single
    {
      body;
    }

becomes

    if (GOMP_single_start ())
      body;
    GOMP_barrier ();

    #pragma omp single copyprivate(x)
      body;

becomes

    datap = GOMP_single_copy_start ();
    if (datap == NULL)
      {
        body;
        data.x = x;
        GOMP_single_copy_end (&data);
      }
    else
      x = datap->x;
    GOMP_barrier ();