[AArch64] Implement vmul<q>_lane<q>_<fsu><16,32,64> intrinsics in C