Add SIMD intrinsics version of simple update
To get better performance in cases where the compiler can't vectorize
the simple leap-frog integrator loop, and to reduce the cache pressure
caused by invMassPerDim, this change introduces a SIMD intrinsics
version of the simple leap-frog update for the case without pressure
coupling and with a single T-scale factor.
To achieve this, md->invmass now uses the aligned allocation policy
and is padded by GMX_REAL_MAX_SIMD_WIDTH elements.
Asserts have been added to check for the padding.
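
For illustration only, a minimal sketch of why the padding matters. It
is not the code in this change: c_simdWidth, paddedAtomCount and
leapFrogSimpleChunked are hypothetical names, the inner lane loop is a
plain C++ stand-in for one SIMD register, and alignment is left out.
The sketch also assumes x, v and f are padded in the same way, which
this message does not state. The point is that the last chunk reads
past the atom count, so the buffer must be padded and the padding is
asserted up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GMX_REAL_MAX_SIMD_WIDTH.
constexpr std::size_t c_simdWidth = 4;

// Size of a per-atom buffer padded by one full SIMD width.
std::size_t paddedAtomCount(std::size_t numAtoms)
{
    return numAtoms + c_simdWidth;
}

// Simple leap-frog step: v = tScale*v + f*invMass*dt, then x = x + v*dt.
// x, v and f hold 3 reals per atom; invMass holds 1 real per atom.
void leapFrogSimpleChunked(std::size_t               numAtoms,
                           float                     dt,
                           float                     tScale,
                           const std::vector<float>& invMass,
                           const std::vector<float>& f,
                           std::vector<float>&       v,
                           std::vector<float>&       x)
{
    // Check the padding up front: the loop below processes whole chunks.
    assert(invMass.size() >= paddedAtomCount(numAtoms));
    assert(f.size() >= 3 * paddedAtomCount(numAtoms));
    assert(v.size() >= 3 * paddedAtomCount(numAtoms));
    assert(x.size() >= 3 * paddedAtomCount(numAtoms));

    for (std::size_t a = 0; a < numAtoms; a += c_simdWidth)
    {
        // One chunk stands in for one SIMD register of invMass values.
        // The last chunk may extend past numAtoms into the padding.
        for (std::size_t lane = 0; lane < c_simdWidth; lane++)
        {
            const float im = invMass[a + lane];
            for (std::size_t d = 0; d < 3; d++)
            {
                const std::size_t i = 3 * (a + lane) + d;
                v[i] = tScale * v[i] + f[i] * im * dt;
                x[i] = x[i] + v[i] * dt;
            }
        }
    }
}
```

With zero-initialized padding the extra lanes compute harmless zeros,
so the loop needs no remainder handling or masking.
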
Change-Id: I98f766e32adc292403782dc67f941a816609e304