Use SSE to process 4 samples at once in the non-HRTF direct mix, instead of using it to apply a matrix row
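
A minimal sketch of the idea, not the actual mixer code: it assumes one gain per output channel and accumulates a scaled input into an output buffer, four floats per iteration. The name mix_row_sse and its parameters are hypothetical and do not come from the source.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: dst[i] += src[i] * gain for count samples. */
static void mix_row_sse(float *dst, const float *src, float gain, size_t count)
{
    /* Broadcast the gain to all four lanes. */
    const __m128 gain4 = _mm_set1_ps(gain);
    size_t i = 0;

    /* Vector loop: four samples per iteration. Unaligned loads/stores are
     * used so the sketch makes no assumption about buffer alignment. */
    for(; i + 4 <= count; i += 4)
    {
        const __m128 in = _mm_loadu_ps(&src[i]);
        __m128 out = _mm_loadu_ps(&dst[i]);
        out = _mm_add_ps(out, _mm_mul_ps(in, gain4));
        _mm_storeu_ps(&dst[i], out);
    }

    /* Scalar tail for the remaining 0-3 samples. */
    for(; i < count; i++)
        dst[i] += src[i] * gain;
}
```

Vectorizing across samples keeps all four lanes busy regardless of the channel count, whereas vectorizing across a matrix row only fills the lanes when the row has a multiple of four entries.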