Avoid a CPU memory fence and better order memory barriers relative to accesses