docs/OpenCLTODOList.txt

Gromacs – OpenCL Porting
TODO List

TABLE OF CONTENTS
1. KNOWN LIMITATIONS
2. CODE IMPROVEMENTS
3. ENHANCEMENTS
4. OPTIMIZATIONS
5. OTHER NOTES
6. TESTED CONFIGURATIONS

1. KNOWN LIMITATIONS
   =================
- Currently there are no known limitations.

2. CODE IMPROVEMENTS
   =================
- Errors returned by OpenCL functions are handled with assert calls, which
  abort the run and are compiled out when NDEBUG is defined. Replace them with
  proper error reporting.
  See also Issue #6 - https://github.com/StreamComputing/gromacs/issues/6

- clCreateBuffer is always called with the CL_MEM_READ_WRITE flag. Pass only
  the flags that reflect how each buffer is actually used. For example, if the
  device only reads from a buffer, use CL_MEM_READ_ONLY.
  See also Issue #13 - https://github.com/StreamComputing/gromacs/issues/13

- The data structures shared between the OpenCL host and device are defined
  twice: once in the host code and once in the device code. Move them to a
  single file that both the host and the device code include.
  See also Issue #16 - https://github.com/StreamComputing/gromacs/issues/16

- Quite a few error conditions are unhandled; they are marked with TODO
  comments in several files

- The fields of the gmx_device_info_t struct need documentation
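
Sketch for the error-handling item above. One option is a check that reports
the failing call and its status code and returns a value the caller can act
on, instead of aborting. The names ocl_status_string, ocl_check and OCL_CHECK
are hypothetical and not existing GROMACS code. The status codes are restated
locally (the values mirror CL/cl.h) only so that the sketch compiles without
the OpenCL headers.

```c
#include <stdio.h>

/* Local copies of a few OpenCL status codes. The values mirror CL/cl.h and
 * are restated here only so the sketch builds without the OpenCL SDK. */
enum {
    OCL_SUCCESS                       = 0,
    OCL_DEVICE_NOT_FOUND              = -1,
    OCL_MEM_OBJECT_ALLOCATION_FAILURE = -4,
    OCL_OUT_OF_RESOURCES              = -5,
    OCL_OUT_OF_HOST_MEMORY            = -6,
    OCL_INVALID_VALUE                 = -30
};

/* Hypothetical helper: map a status code to a readable name. */
static const char *ocl_status_string(int status)
{
    switch (status)
    {
        case OCL_SUCCESS:                       return "CL_SUCCESS";
        case OCL_DEVICE_NOT_FOUND:              return "CL_DEVICE_NOT_FOUND";
        case OCL_MEM_OBJECT_ALLOCATION_FAILURE: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case OCL_OUT_OF_RESOURCES:              return "CL_OUT_OF_RESOURCES";
        case OCL_OUT_OF_HOST_MEMORY:            return "CL_OUT_OF_HOST_MEMORY";
        case OCL_INVALID_VALUE:                 return "CL_INVALID_VALUE";
        default:                                return "unknown OpenCL status";
    }
}

/* Hypothetical helper: report a failed call on stderr and return nonzero so
 * that the caller can release resources and propagate the error. */
static int ocl_check(int status, const char *call, const char *file, int line)
{
    if (status == OCL_SUCCESS)
    {
        return 0;
    }
    fprintf(stderr, "%s:%d: %s failed: %s (%d)\n",
            file, line, call, ocl_status_string(status), status);
    return 1;
}

/* Evaluates the call once and records its source text and location. */
#define OCL_CHECK(call) ocl_check((call), #call, __FILE__, __LINE__)
```

With the real headers, a call that returns a status, such as clFinish, could
be wrapped as "if (OCL_CHECK(clFinish(queue))) { ... }", with the cleanup and
error propagation placed in the branch.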
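
Sketch for the clCreateBuffer item above. One way to keep the flags in line
with the intended use is to state the use explicitly and derive the flag from
it. buffer_usage_t and buffer_flags_for are hypothetical names. The fallback
#defines restate the values from CL/cl.h only so that the sketch compiles
without the OpenCL headers; real code would include CL/cl.h and return
cl_mem_flags.

```c
/* Fallback definitions mirroring CL/cl.h, used only when the OpenCL headers
 * are not included. */
#ifndef CL_MEM_READ_WRITE
#define CL_MEM_READ_WRITE (1UL << 0)
#define CL_MEM_WRITE_ONLY (1UL << 1)
#define CL_MEM_READ_ONLY  (1UL << 2)
#endif

/* Hypothetical description of how kernels access a buffer. */
typedef enum {
    BUF_DEVICE_READS,       /* kernels only read the buffer      */
    BUF_DEVICE_WRITES,      /* kernels only write the buffer     */
    BUF_DEVICE_READS_WRITES /* kernels read and write the buffer */
} buffer_usage_t;

/* Hypothetical helper: pick the access flag that matches the usage. */
static unsigned long buffer_flags_for(buffer_usage_t usage)
{
    switch (usage)
    {
        case BUF_DEVICE_READS:  return CL_MEM_READ_ONLY;
        case BUF_DEVICE_WRITES: return CL_MEM_WRITE_ONLY;
        default:                return CL_MEM_READ_WRITE;
    }
}
```

The returned value would be passed as the flags argument of clCreateBuffer,
combined with any host-pointer flags the call site needs.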

3. ENHANCEMENTS
   ============
- Implement OpenCL kernels for Intel GPUs

- Implement OpenCL kernels for Intel CPUs

- Improve GPU device sorting in detect_gpus
  See also Issue #64 - https://github.com/StreamComputing/gromacs/issues/64

- Implement warp-independent kernels
  See also Issue #66 - https://github.com/StreamComputing/gromacs/issues/66

- Have one OpenCL program object per OpenCL kernel
  See also Issue #86 - https://github.com/StreamComputing/gromacs/issues/86

- Consider parallelising JIT compilation of programs across CPU cores to
  improve startup time

- Reconsider caching JIT artefacts to improve startup time

4. OPTIMIZATIONS
   =============
- Define nbparam fields as constants when building the OpenCL kernels
  See also Issue #87 - https://github.com/StreamComputing/gromacs/issues/87

- Fix the tabulated Ewald kernel, which has the potential to be faster than
  the analytical Ewald kernel
  See also Issue #65 - https://github.com/StreamComputing/gromacs/issues/65

- Evaluate the impact of gpu_min_ci_balanced_factor on performance for AMD
  See also Issue #69 - https://github.com/StreamComputing/gromacs/issues/69

- Update ocl_pmalloc to allocate page-locked memory
  See also Issue #90 - https://github.com/StreamComputing/gromacs/issues/90

- Update the kernel for 128/256 threads/block
  See also Issue #92 - https://github.com/StreamComputing/gromacs/issues/92

- Update the kernels to use OpenCL 2.0 work-group functions if they prove
  to bring a significant speedup.
  See also Issue #93 - https://github.com/StreamComputing/gromacs/issues/93

- Update the kernels to use fixed-precision accumulation for force and energy
  values, if this implementation is faster and does not affect precision.
  See also Issue #94 - https://github.com/StreamComputing/gromacs/issues/94
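
Sketch for the nbparam item above. Values that stay fixed for a run can be
passed to the OpenCL compiler as -D options so that the kernels see them as
compile-time constants. The struct, its fields and the macro names
(EWALD_BETA, RCOULOMB_SQ) are hypothetical; the real nbparam structure has
different contents.

```c
#include <stdio.h>

/* Hypothetical subset of the non-bonded parameters. */
typedef struct {
    float ewald_beta;
    float rcoulomb_sq;
} nbparam_subset_t;

/* Format the parameters as -D build options. The %e format always prints a
 * decimal point and an exponent, so appending "f" yields a valid float
 * literal. Returns 0 on success, -1 if the buffer is too small. */
static int format_nbparam_defines(const nbparam_subset_t *p,
                                  char *buf, size_t size)
{
    int n = snprintf(buf, size,
                     "-DEWALD_BETA=%.9ef -DRCOULOMB_SQ=%.9ef",
                     (double)p->ewald_beta, (double)p->rcoulomb_sq);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string would be passed as the options argument of
clBuildProgram. The trade-off is that the program has to be rebuilt whenever
one of these values changes.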
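
Sketch for the fixed-precision accumulation item above. Each value is scaled
to a 64-bit integer before it is added, and because integer addition is
associative the sum does not depend on the order of accumulation. This is
host-side C that only illustrates the arithmetic, not a kernel. The scale of
32 fractional bits and all function names are assumptions; a real kernel would
choose the scale from the expected range of the forces and energies.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed scale: 2^32, i.e. 32 fractional bits. */
#define FIXED_SCALE 4294967296.0

/* Convert to fixed point. Truncates toward zero; values too large for
 * int64_t after scaling are not handled in this sketch. */
static int64_t float_to_fixed(float value)
{
    return (int64_t)((double)value * FIXED_SCALE);
}

/* Convert a fixed-point sum back to floating point. */
static float fixed_to_float(int64_t value)
{
    return (float)((double)value / FIXED_SCALE);
}

/* Accumulate in fixed point; the result is independent of the order of
 * the inputs as long as the sum does not overflow. */
static int64_t accumulate_fixed(const float *values, size_t count)
{
    int64_t sum = 0;
    for (size_t i = 0; i < count; i++)
    {
        sum += float_to_fixed(values[i]);
    }
    return sum;
}
```

For the inputs 1e8, 1 and -1e8, a single-precision running sum can lose the 1
depending on the order, while the fixed-point sum converts back to 1 for any
order. Values smaller than 2^-32 in magnitude are truncated to zero, which is
the kind of precision effect the item asks to evaluate.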

5. OTHER NOTES
   ===========
- NVIDIA GPUs are all handled the same way, regardless of compute capability

- Because of an unfixed bug in the tabulated kernels, the current
  implementation always uses the analytical kernels and never the tabulated
  ones
  See also Issue #65 - https://github.com/StreamComputing/gromacs/issues/65

- Unlike the CUDA version, the OpenCL implementation uses normal buffers
  instead of textures
  See also Issue #88 - https://github.com/StreamComputing/gromacs/issues/88