Optimise _krb5_n_fold() a bit.
All in lib/krb5/n-fold.c:
1. eliminate malloc/free from rr13(): it is called in a tight
loop and always needs a temporary buffer of the same size.
2. eliminate memcpy(3) from rr13() by bouncing back and forth
between two buffers buf1, buf2 instead of performing the
calculation into a tmp buffer and memcpy(3)ing the result
back into buf.
3. eliminate code paths from rr13() that I can see will never be
taken but that I'm guessing the compiler can't prove dead, i.e.
i. now that we're no longer using malloc(3), rr13()
cannot fail, so make it void and avoid the if in
the calling routine checking its error code. In
case you ask, yes, this made the tests run a little
faster,
ii. rr13() has code for being passed a number of bits
not divisible by 8 but _krb5_n_fold() only ever passes
a multiple of 8 (an int * 8). So, we eliminate this
conditional and the associated code.
4. we make rr13() take 2 destination buffers and write the result
into both of them; we use this to eliminate another memcpy(3)
from the calling routine. This appears to make it a bit faster
as well.
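
For illustration only, a minimal sketch of the ideas in points 1, 2 and 4.
This is not the code in lib/krb5/n-fold.c. The names rotate_right_13(),
fill_with_rotations() and MAX_BYTES are hypothetical, and the sketch assumes
a whole number of bytes (point 3.ii) and a small fixed upper bound on the
input length so that the two bounce buffers can live on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed upper bound on the input length, for this sketch only. */
#define MAX_BYTES 64

/*
 * Rotate the nbytes-long bit string in src right by 13 bits (bit 0 is
 * the most significant bit of src[0]) and store the result in BOTH dst1
 * and dst2. No allocation, so it cannot fail and returns void.
 * src must not alias dst1 or dst2.
 */
static void
rotate_right_13(uint8_t *dst1, uint8_t *dst2, const uint8_t *src,
                size_t nbytes)
{
    size_t i;

    assert(nbytes > 0);
    for (i = 0; i < nbytes; i++) {
        /* 13 bits = 1 byte + 5 bits: output byte i takes the low 5 bits
         * of input byte i-2 and the high 3 bits of input byte i-1,
         * with both indices taken modulo nbytes. */
        uint8_t hi = src[(i + 2 * nbytes - 2) % nbytes];
        uint8_t lo = src[(i + nbytes - 1) % nbytes];

        dst1[i] = dst2[i] = (uint8_t)((hi << 3) | (lo >> 5));
    }
}

/*
 * Hypothetical caller: write `count` successive rotations of `in`, back
 * to back, into `out` (count * nbytes bytes). The two local buffers swap
 * roles every round, so no result is ever copied back, and the second
 * destination writes straight into `out`, so no memcpy is needed there.
 */
static void
fill_with_rotations(uint8_t *out, const uint8_t *in, size_t nbytes,
                    size_t count)
{
    uint8_t a[MAX_BYTES], b[MAX_BYTES];
    uint8_t *cur = a, *next = b, *swap;
    size_t k;

    assert(nbytes > 0 && nbytes <= MAX_BYTES);
    memcpy(cur, in, nbytes);              /* one copy up front */
    for (k = 0; k < count; k++) {
        rotate_right_13(next, out + k * nbytes, cur, nbytes);
        swap = cur; cur = next; next = swap;   /* bounce the buffers */
    }
}
```

With a 2-byte input of 0x80 0x00, two rounds leave 0x00 0x04 0x00 0x20 in
out: the single set bit moves from position 0 to 13, then to 26 mod 16 = 10.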