bugs/issue-d0797684fabf05af24e73639e0ce5e30a145a3c5.yaml

   1 --- !ditz.rubyforge.org,2008-03-06/issue
   2 title: "Split both data and code into two sections (e.g., by file separation): common (or shared) and client (i.e., host)."
   3 desc: |-
   4   Rather than pursuing the current approach of having host code incrementally (or
   5   so) feeding source into an aggregate of compilable text, instead isolate the
   6   shared part of code from that specific only to the client, so that this shared
   7   part can be processed (at compile time) into representations appropriate for
   8   both client (i.e., OpenCL "host") and any compute servers (i.e., OpenCL
   9   devices).
  10
  11   In this way, the question of exactly how inclusions and compilation (for
  12   processing by OpenCL) should occur can be worked out independently from the
  13   normal processes of declaring data structures and code.
  14
  15   A possible drawback to the above might be limiting the extensibility into
  16   further clients of this library (either client libraries or end-user
  17   applications), but it should still be possible to provide for this while
  18   preserving the above separation internally (e.g., by offering routines for this
  19   purpose).
  20
  21   Another way of viewing this is that the fundamental functionality available
  22   within and outside of the library need not change, but that source might be
  23   rearranged to allow for a more straightforward representation for the sake of
  24   internal implementation of the library, possibly with external benefits as well
  25   (e.g., if the API header were adjusted to expose a limited subset API for use
  26   with compute servers (devices)).
  27 type: :task
  28 component: libale
  29 release: 0.0.0
  30 reporter: David Hilvert <dhilvert@auricle.dyndns.org>
  31 status: :unstarted
  32 disposition:
  33 creation_time: 2009-10-10 23:26:08.168811 Z
  34 references: []
  35
  36 id: d0797684fabf05af24e73639e0ce5e30a145a3c5
  37 log_events:
  38 - - 2009-10-10 23:26:47.592939 Z
  39   - David Hilvert <dhilvert@auricle.dyndns.org>
  40   - created
  41   - Add task for splitting between common (client and compute server) and client-specific code.
  42 - - 2009-10-10 23:44:24.896284 Z
  43   - David Hilvert <dhilvert@auricle.dyndns.org>
  44   - commented
  45   - |-
  46     Note that exposing a limited subset API for use with compute devices might
  47     well be more properly viewed as a bug than as a feature (although it could be
  48     the latter).  In any case, it is likely that this will occur for the convenience
  49     of implementation, since the types (or some subset) will be needed on the side of
  50     the compute server, and adding #ifdefs (or #ifs or so) to the header is likely
  51     the most convenient manner of accomplishing this.
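
          As a rough sketch of the #ifdef approach described above (the
          sketch_* names are hypothetical and not part of Libale; the only
          outside assumption is __OPENCL_VERSION__, which OpenCL C compilers
          predefine), a header might expose the shared types to both sides
          while hiding host-only routines from the compute device:

          ```c
          /* Hypothetical shared header sketch; the type and function names
           * are illustrative only and are not part of the Libale API. */
          #ifndef ALE_SHARED_SKETCH_H
          #define ALE_SHARED_SKETCH_H

          /* Shared subset: plain data types needed by client and server. */
          typedef struct {
              float r, g, b;
          } sketch_pixel_t;

          #ifndef __OPENCL_VERSION__
          /* Client-only part: hidden when compiled as OpenCL C, since the
           * compute device has no calloc(). */
          #include <stdlib.h>

          static sketch_pixel_t *sketch_pixel_alloc(size_t count) {
              return (sketch_pixel_t *) calloc(count, sizeof(sketch_pixel_t));
          }
          #endif /* !__OPENCL_VERSION__ */

          #endif /* ALE_SHARED_SKETCH_H */
          ```

          On the client, sketch_pixel_alloc() is available as usual; a device
          build of the same text would see only the structure definition.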
  52 - - 2009-10-10 23:57:18.054878 Z
  53   - David Hilvert <dhilvert@auricle.dyndns.org>
  54   - commented
  55   - |-
  56     Other resources to look at in this area (client/server separation), but
  57     probably for implementations far more general than currently appearing within
  58     Libale, would include CORBA at least, and possibly things such as dbus, to the
  59     extent that they specify client/server relationships.  (X might also be
  60     relevant, especially as it predominantly involves such relationships within the
  61     same machine.)  One area to look at, for example, would be the IDL of CORBA.
  62     It should be noted, however, that since everything within Libale is currently
  63     implemented internally in C, all of the above would probably be overkill at the
  64     moment.  In the case of CORBA, there is a data parallel spec that could be
  65     looked at (again, probably for something more general than would be needed
  66     here).
  67
  68     All of the above might be useful were there to be interest in directly
  69     accessing the Libale server side (OpenCL compute contexts) from external
  70     libraries or applications.
  71
  72     Relevant links might include:
  73
  74     http://www.omg.org/technology/documents/formal/data_parallel.htm
  75     http://dbus.freedesktop.org/doc/dbus-faq.html
  76     http://developer.gnome.org/doc/guides/corba/html/book1.html
  77 - - 2009-10-11 01:02:22.744954 Z
  78   - David Hilvert <dhilvert@auricle.dyndns.org>
  79   - commented
  80   - |-
  81     For specifics of OpenCL capabilities for the purposes of splitting code, the
  82     spec would be the thing to reference.  E.g., something will have to be devised
  83     for sorting and aggregating sources into a program, which could probably be
  84     done explicitly via a separate include file (much as ALE's d2.cc and d3.cc
  85     aggregate namespace files in a manner recognizing dependencies).  Beyond this,
  86     nothing new should be incurred from the proposed change.  (E.g., recursion is
  87     not supported, but this introduces no new obstacles under the proposed
  88     approach.)  Latest spec as of 16 May 2009 is here:
  89
  90     http://www.khronos.org/registry/cl/specs/opencl-1.0.43.pdf
  91
   92     For a slide presentation of why GPU and compiler technology don't always mesh
  93     very well, the following might be of interest (although much of this will
  94     ideally be addressed by OpenCL):
  95
  96     http://www.cs.unc.edu/Events/Conferences/GP2/slides/Cooper.pdf
  97
  98     The following is a much better overview of current technologies:
  99
 100     http://www.hpccommunity.org/blogs/bearcat/multi-core-gpu-background-77/
 101
 102     Unfortunately, none of the above address the issue of sharing of data
 103     structures or objects in a constructive manner.  Fortunately, all seem to
 104     suggest OpenCL (or something like it) as the most feasible current approach.
 105 - - 2009-10-11 04:23:45.681282 Z
 106   - David Hilvert <dhilvert@auricle.dyndns.org>
 107   - commented
 108   - |-
 109     Consider that, in addition to client-specific code, and shared code, it would
 110     probably be worthwhile to also provide for server-specific code (i.e., code
 111     specific to the compute device), as various idioms appropriate for the compute
 112     device (e.g., 'kernel' and other attributes, OpenCL built-in functions, etc.)
 113     may not be appropriate for compilation targeted to the client (OpenCL host).
 114 - - 2009-10-11 04:34:21.081366 Z
 115   - David Hilvert <dhilvert@auricle.dyndns.org>
 116   - commented
 117   - |-
 118     Note that separation of client, server, and common code could be achieved by
 119     directory (e.g., server/, client/, common/), filename component (e.g., -server,
 120     -client, -common), or by preprocessor conditionals (as will likely be done for
 121     the API header file), among other methods.
 122
 123     Other names for labeling server code might include 'kernels', 'device'
 124     (which?), or 'cl'; other names for client code might include 'host' (after
 125     OpenCL).
 126
 127     The client/server terminology seems to follow the X Window System model, which
 128     might be appropriate, especially if computation is eventually distributed over
 129     cluster or grid platforms.
 130 - - 2009-10-11 12:45:25.752974 Z
 131   - David Hilvert <dhilvert@auricle.dyndns.org>
 132   - commented
 133   - |-
 134     As a refinement to the above suggestion, consider the following:
 135
 136     If maintenance of a separate file for explicitly indicating dependencies among
 137     sources is necessary (as it would be for the proposed shared-code approach
 138     under a library, such as OpenCL, that does not currently provide for separate
 139     translation units), then such dependencies could also be indicated by calling a
 140     sequence of initializations, one per file, according to the method of
 141     incremental aggregation currently conceived (via ale_eval or equivalent).  The
  142     most obvious difference in implementation between the current approach (of
 143     aggregation of code text at runtime) and the above-proposed approach (of
 144     aggregation of code text at compile time) is that in the current approach, code
 145     sharing is limited mostly to data structures (via the current macro
 146     definitions), as no provision is currently made for sharing other kinds of
 147     code, whereas the above-proposed approach would immediately provide for
 148     complete sharing (via common files).
 149
 150     Unfortunately, it's not immediately clear that the above-proposed approach is
 151     better than an approach that does not share code.  In particular, the current
 152     choice of data-type sharing over code sharing seems, on the face of it, a good
 153     one.  Much of the code between host (or client) and device (or server) domains
 154     is not naturally shared, whereas data types can be.  To establish a large
 155     shared body of code would require that a fair amount of code be usable in both
 156     domains.  (E.g., simple routines for accessing data in a canonical way, or
 157     preserving some invariant during data modification, might be appropriate for
 158     sharing.) If, on the other hand, the degree of sharing is small, then adding a
 159     new structure of managing sharing through files might be more complex than
 160     necessary.  Furthermore, if compilation of server code occurs at run-time (as
 161     may be common under most OpenCL installations), then excessive sharing of code
 162     might impose a run-time penalty.
 163
 164     Given this, the best argument for sharing of code via files might be a more
 165     natural syntax for data structure definition.  In particular, it would no
 166     longer be necessary to define these via the macro processor.  Furthermore, the
 167     sorts of lightweight operations (simple access and modification) indicated
 168     above would now be possible.  Another advantage to having a separate file for
 169     shared code would be avoiding the need for separate consideration of each data
 170     structure (or each macro declaration).
 171
 172     Instead of something like this:
 173
 174     foo.c:
 175
 176     ALE_...(... data_struct_1 ...)
 177     ALE_...(... data_struct_2 ...)
 178     ALE_...(... data_struct_3 ...)
 179
 180     void init_foo() {
 181         ale_eval( ... data_struct_1 ... data_struct_2 ... data_struct_3 ...)
 182     }
 183
 184     bar.c:
 185
 186     init_foo();
 187
 188     We get the more natural representation:
 189
 190     foo-shared.c:
 191
 192     struct N1 { ... };
 193     struct N2 { ... };
 194     struct N3 { ... };
 195
 196     libale-server.c:
 197
 198     Where Nx may include a macro element (e.g., ALE_...(data_struct_1)), and where
 199     libale-server may ultimately be processed, if desired, by a call to ale_eval().
 200
 201     Hence there is a consolidation of data structure definitions on the server
 202     side, in addition to provisions for code sharing.  Maintaining a wrapped
 203     runtime option for structure definition (via ale_eval or otherwise) would
 204     probably be good, however, for the purposes of supporting client code or client
 205     libraries.
 206 - - 2009-10-11 13:10:51.957739 Z
 207   - David Hilvert <dhilvert@auricle.dyndns.org>
 208   - commented
 209   - |-
 210     Note that the empty line following 'libale-server.c' in my most recent comment
 211     should have read '#include "foo-shared.c"'.
 212 - - 2009-10-11 14:28:17.840394 Z
 213   - David Hilvert <dhilvert@auricle.dyndns.org>
 214   - commented
 215   - |-
 216     Consider that, if preprocessor macros were used for implementing the sorts of
 217     methods for accessing and modifying data described, then it should be possible
 218     to implement these in a manner efficient for both client and server
 219     implementations, hence strengthening the case for a body of shared code (in
 220     addition to shared data types) between client and server.
 221
 222     (Note from examples in the OpenCL spec that preprocessor conditionals can be
 223     used a fair amount for kernel definitions.  This sort of code is probably the
 224     sort that would not be good for sharing with the client.  On the other hand, a
 225     kernel could efficiently make use of macros defined in a separate, shared
 226     file.)
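
          A minimal sketch of such shared access macros follows.  The
          SKETCH_* names and the row-major layout are assumptions made for
          illustration and do not come from Libale:

          ```c
          /* Hypothetical shared macros; names and row-major layout are
           * assumptions for illustration, not Libale definitions. */
          #define SKETCH_PIXEL_INDEX(width, x, y)  ((y) * (width) + (x))
          #define SKETCH_PIXEL_AT(data, width, x, y) \
              ((data)[SKETCH_PIXEL_INDEX((width), (x), (y))])

          /* Client-side use; a kernel could apply the same macros to a
           * __global buffer, since they expand to plain index arithmetic. */
          static float sketch_sum_row(const float *data, int width, int y) {
              float sum = 0.0f;
              int x;

              for (x = 0; x < width; x++)
                  sum += SKETCH_PIXEL_AT(data, width, x, y);
              return sum;
          }
          ```

          Because the macros expand inline, neither side pays a call
          overhead, which is the efficiency argument made above.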
 227 - - 2009-10-11 15:27:41.066320 Z
 228   - David Hilvert <dhilvert@auricle.dyndns.org>
 229   - commented
 230   - |-
 231     Consider that the most straightforward approach to partition and naming of
 232     files would probably be the following:
 233
 234     Create a new header file for source files currently defining shared data types,
 235     and move the structure part of the definition into the header file.  Header
 236     file naming follows the usual C convention, so that, e.g., image.c will now
 237     have an associated file image.h.  (Since certain data type details, such as
 238     memory allocation, are currently executed only by the host, these need not be
 239     moved.)  One benefit of such division may be relieving src/libale.h of its duty
 240     as a repository of miscellaneous definitions.
 241
 242     Name files containing code specific to a compute device with the qualifier -cl,
 243     so that, e.g., image.c gains a counterpart image-cl.c.  (This bit is fairly
 244     obvious, and not immediately relevant to the question of file separation of
 245     shared code.)  Handling of the transfer of .h and -cl.c files to the OpenCL API
 246     will likely occur through a process including compile-time translation into a
 247     form allowing use with either direct OpenCL API calls or (more likely) a Libale
 248     intermediate call (e.g., ale_eval), as well as the actual execution of such
 249     calls at runtime.  Hence, a further translation step beyond standard C
 250     preprocessing should probably be added at compile time, perhaps very simple
 251     (e.g., converting a file to a quoted C string, and either including this string
 252     within some new named function or assigning it to a variable that can then be
 253     referenced by other library code).
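
          The "file to quoted C string" step might look roughly like the
          following.  This is a simplified sketch: sketch_quote_source is a
          hypothetical helper, not a Libale routine, and it escapes only
          quotes, backslashes, and newlines:

          ```c
          #include <stddef.h>

          /* Hypothetical helper: writes `src` into `out` as a double-quoted
           * C string literal, escaping quotes, backslashes and newlines.
           * Returns the number of characters written (excluding the
           * terminating NUL), or 0 if `out` is too small. */
          static size_t sketch_quote_source(const char *src, char *out,
                                            size_t out_size) {
              size_t n = 0;

              if (out_size < 3)
                  return 0;       /* need room for at least "" and NUL */
              out[n++] = '"';

              for (; *src != '\0'; src++) {
                  /* Reserve space for a two-character escape, the closing
                   * quote, and the terminating NUL. */
                  if (n + 4 > out_size)
                      return 0;
                  switch (*src) {
                  case '"':  out[n++] = '\\'; out[n++] = '"';  break;
                  case '\\': out[n++] = '\\'; out[n++] = '\\'; break;
                  case '\n': out[n++] = '\\'; out[n++] = 'n';  break;
                  default:   out[n++] = *src;                  break;
                  }
              }

              out[n++] = '"';
              out[n] = '\0';
              return n;
          }
          ```

          A build step could run something like this over each .h and -cl.c
          file and emit the result as the initializer of a named variable,
          which library code would then hand to ale_eval or to the OpenCL API
          at runtime.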
 254 - - 2009-10-11 15:55:54.298156 Z
 255   - David Hilvert <dhilvert@auricle.dyndns.org>
 256   - commented
 257   - |-
 258     Consider that, since things like ale_eval definitions might occur within code
 259     evaluated at runtime (e.g., defining MAP_PIXEL and such for image code), such
 260     code could either be stored separately (e.g., image-ale.c separate from
 261     image-cl.c), or the two could be stored together (e.g., in the case that the
 262     file will be processed by ale_eval; an appropriate name might be image-rt.c,
 263     for runtime).
 264
 265     In the best of scenarios, it will be possible to implement MAP_PIXEL directly
 266     in the OpenCL language, so that the above is not necessary.  If this is not
 267     sufficient, however, some additional layer (e.g., m4 or a similar processor)
 268     might be used by ale_eval.
 269 - - 2009-10-17 01:31:26.090103 Z
 270   - David Hilvert <dhilvert@auricle.dyndns.org>
 271   - commented
 272   - |-
 273     Consider the following fairly straightforward argument in favor of the proposed
 274     .h (header) separation, in addition to .c source separation:
 275
 276     As previously conceived, the only separation in files for the CL program would
 277     be as a .c file, with, e.g., structure definition occurring via variables or
 278     functions generated by macros.  The revised approach (using an additional .h
 279     file) may be seen as more convenient, as macros are not required for this, as
 280     outlined in earlier comments.  This is, however, the lesser concern at the
 281     moment.
 282
 283     More to the point, compilation is currently failing on the src/align.c file, with
  284     the first error occurring at a structure that references an elem_bounds_t type
 285     from the transformation class.  In order to reference this structure from align.c,
 286     an inclusion of some sort will be necessary.
 287
 288     Preserving the current file structure, options would be either of include/ale.h
  289     and src/libale.h, but neither of these is ideal.  Better would be to associate
 290     the structure definition more tightly; else, structure definitions would be
 291     distributed somewhat haphazardly between these files and the .c files according
 292     to whether they were used elsewhere.
 293
 294     (Note that there are currently efforts for automatic parallelization within
 295     GCC, under the Graphite framework, but that efforts toward integration with an
 296     OpenCL backend are apparently not being made at the moment.  One possibility
  297     would be to use either something similar to OpenMP (as Graphite may?), an
 298     extension to this, or a separate syntax (as I had suggested on the mailing
 299     list) to indicate areas that should be parallelized for an OpenCL backend
 300     specifically.)
 301
 302     (A further alternative would be to await a hardware (e.g., Larrabee) or
 303     software (e.g., OpenCL alternative or successor) solution more suited to
 304     automatic methods.  Binary translation based on QEMU might be one very general
 305     possibility, but probably more challenging than even a compiler [e.g., GCC]
 306     approach, the advantage of these over the current approach being largely
 307     cleanness and generality, which might lead to greater maintainability in the
 308     long run.)
 309 - - 2009-10-17 02:28:18.322606 Z
 310   - David Hilvert <dhilvert@auricle.dyndns.org>
 311   - commented
 312   - |-
 313     The following might be worth looking at:
 314
 315     http://portal.acm.org/citation.cfm?id=1504194
 316 - - 2009-10-18 01:36:20.217266 Z
 317   - David Hilvert <dhilvert@auricle.dyndns.org>
 318   - commented
 319   - |-
 320     Rather lengthy discussion regarding OpenCL and OpenMP can be found here:
 321
 322     http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=119644
 323
 324     This discussion references the earlier-linked paper.  There is also an
 325     (apparently) proprietary compiler linked from this discussion having a CUDA
 326     (also proprietary) back-end, but nothing obvious currently available with an
 327     OpenCL back-end, or otherwise suitable as a standard replacement for direct
 328     OpenCL coding.
 329 - - 2009-10-20 14:41:03.658982 Z
 330   - David Hilvert <dhilvert@auricle.dyndns.org>
 331   - commented
 332   - |-
 333     Note that kernel and hardware mechanisms (esp. page faulting and caching) are
 334     probably the correct approach to pointer sharing, and that it would probably be
 335     best in the long run for either (a) GPUs to adopt a hardware memory management
 336     approach, or (b) GPUs to be sufficiently integrated with CPUs such that memory
 337     management can be shared between the two.  (One might think that Intel would
 338     capitalize on this idea sooner rather than later -- they seem to be postponing
 339     this until Larrabee finally ships instead of planning something sooner; despite
  340     their existing integrated graphics line, they have as yet made no announcements
 341     on GPGPU advances taking advantage of more uniform memory management between
 342     the processors.)
 343
 344     Given all of the above, while a compiler solution is likely possible to some
 345     extent, and might work for ALE, the difficulty of working out special cases
 346     such as void pointers and casting suggests that finding a long-term maintainer
 347     for such a solution within the GCC maintainer community might be a bit
 348     difficult.  (Where void pointers and casting are exactly the sorts of things
 349     that hardware facilities could trivially cope with, but compiler features might
 350     struggle with.)  For now, management of pointers within Libale via the OpenCL
 351     API might continue to be acceptable, but a hardware solution should probably be
 352     looked for in the long run.
 353 git_branch: