Add ARM assembler version of dsp_apply_gain(). Speeds up this routine by 30-40% on PP502x.