Public Git Hosting - gromacs.git/commit

commit	22118220401cee6f51d49c0a034e9fe5b4ba4260
author	Jonathan Vincent <jvincent@nvidia.com>
	Wed, 2 Jan 2019 16:59:18 +0000 (08:59 -0800)
committer	Szilárd Páll <pall.szilard@gmail.com>
	Wed, 30 Oct 2019 12:21:23 +0000 (13:21 +0100)
tree	e5c0bcc234a57da7ecbefc1a80569dfbc4c2caec
parent	2430a7a9a7f026eb723ff68315f366b086190431

Update PME CUDA spread/gather

Adds additional templated variants of the CUDA spread and
gather kernels. These allow the use of 4 threads per atom instead of
16, and allow the spline data to be recalculated in the spread
instead of saved to global memory and reloaded.

These combinations give 4 different kernels that can be called,
depending on which is most appropriate for the problem size and
hardware (to be decided heuristically). By default the existing method
is used (16 threads per atom, saving and reloading of spline data).
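As a rough illustration of how two boolean template parameters yield
four kernel variants chosen at run time, here is a minimal host-side
C++ sketch. The names KernelConfig, spreadVariant and
selectSpreadVariant are hypothetical and are not taken from the
GROMACS sources; the real kernels are CUDA __global__ functions, which
this sketch stands in for with plain functions.

```cpp
// Describes which variant was instantiated (illustrative only).
struct KernelConfig
{
    int  threadsPerAtom;
    bool recalculateSplines;
};

// Hypothetical stand-in for a templated spread/gather kernel. Both
// choices are compile-time constants, so each of the four
// instantiations is a separate function.
template<bool useFourThreadsPerAtom, bool recalculateSplines>
KernelConfig spreadVariant()
{
    return { useFourThreadsPerAtom ? 4 : 16, recalculateSplines };
}

using VariantFn = KernelConfig (*)();

// Maps run-time flags onto one of the four compile-time variants.
// Passing (false, false) selects the default described above:
// 16 threads per atom, spline data saved and reloaded.
VariantFn selectSpreadVariant(bool useFourThreadsPerAtom, bool recalculateSplines)
{
    if (useFourThreadsPerAtom)
    {
        if (recalculateSplines)
        {
            return &spreadVariant<true, true>;
        }
        return &spreadVariant<true, false>;
    }
    if (recalculateSplines)
    {
        return &spreadVariant<false, true>;
    }
    return &spreadVariant<false, false>;
}
```

A caller would decide the two flags (for example from a heuristic on
problem size and hardware, which this commit leaves open) and then
invoke the returned function pointer.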

Added an option to disable the preloading of charges and
coordinates into shared memory, in which case each thread
deals with a single atom.

Removed the (currently disabled) PME_GPU_PARALLEL_SPLINE=1 code
path.

Refs #2792 #3185 #3186 #3187 #3188

Change-Id: Ia48d8eb63e38d0d23eefd755dcc228ff9b66d3e6

src/gromacs/ewald/pme_calculate_splines.cuh	[new file with mode: 0644]
src/gromacs/ewald/pme_gather.cu
src/gromacs/ewald/pme_gpu_constants.h
src/gromacs/ewald/pme_gpu_internal.cpp
src/gromacs/ewald/pme_gpu_program_impl.cu
src/gromacs/ewald/pme_gpu_program_impl.h
src/gromacs/ewald/pme_gpu_program_impl_ocl.cpp
src/gromacs/ewald/pme_spread.cu