Fix race condition and use less memory in mono_lookup_icall_symbol. (#14532)
The function was racy because it initialized its data on demand, a pattern that is often racy.
The rewrite reduces the result to a single pointer that is swapped in atomically. The atomic swap implies a full barrier, and readers reach the data only through that pointer (a data dependency), so the race is gone.
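For illustration, the single-pointer publication pattern described above can be sketched roughly as follows. This is not the code from the change: it uses C11 `<stdatomic.h>` as a stand-in for whatever atomic helpers the runtime provides, and `table_t`, `build_table`, and `get_table` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical lazily built table; stands in for the real lookup data. */
typedef struct {
    int n;
    int values[4];
} table_t;

/* The only shared state: one pointer, NULL until a table is published. */
static _Atomic(table_t *) published_table = NULL;

/* Build a complete table privately; nothing else can see it yet. */
static table_t *build_table(void) {
    table_t *t = malloc(sizeof *t);
    if (!t)
        return NULL;
    t->n = 4;
    for (int i = 0; i < t->n; i++)
        t->values[i] = i * 10;
    return t;
}

/* Return the shared table, building and publishing it on first use. */
static table_t *get_table(void) {
    table_t *t = atomic_load(&published_table);
    if (t)
        return t;

    table_t *fresh = build_table();
    if (!fresh)
        return NULL;

    /* Publish only if nobody else has; sequentially consistent by default. */
    table_t *expected = NULL;
    if (atomic_compare_exchange_strong(&published_table, &expected, fresh))
        return fresh;

    /* Another thread won the race: discard our copy and use the winner's. */
    assert(expected != NULL);
    free(fresh);
    return expected;
}
```

Under these assumptions, two threads may both build a table, but only one pointer is ever published and every caller ends up with the same fully initialized table.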
It could also be fixed by copying and sorting an array of pairs and publishing one pointer to that copy.
However, to save memory, I instead use indirect data: an array of uint16 indices into the original data. That is a slight memory vs. time tradeoff: less memory, at the cost of an extra indirection per access.
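A minimal sketch of the indirect-index idea, assuming a small read-only table that must stay in its original order. The `entry_t` layout, the integer `key`, and the names `sorted_index`, `build_index`, and `lookup` are hypothetical and not taken from the change; the sort uses the standard `qsort`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical original data, left unsorted and never copied. */
typedef struct {
    unsigned key;
    const char *symbol;
} entry_t;

static const entry_t entries[] = {
    { 40, "d" }, { 10, "a" }, { 30, "c" }, { 20, "b" },
};
#define ENTRY_COUNT (sizeof entries / sizeof entries[0])

/* Two bytes per entry, instead of a sorted copy of every pair. */
static uint16_t sorted_index[ENTRY_COUNT];

/* qsort comparator: order the indices by the keys they refer to. */
static int compare_index(const void *a, const void *b) {
    unsigned ka = entries[*(const uint16_t *)a].key;
    unsigned kb = entries[*(const uint16_t *)b].key;
    return (ka > kb) - (ka < kb); /* explicit comparison, no subtraction overflow */
}

static void build_index(void) {
    assert(ENTRY_COUNT <= UINT16_MAX); /* indices must fit in uint16 */
    for (size_t i = 0; i < ENTRY_COUNT; i++)
        sorted_index[i] = (uint16_t)i;
    qsort(sorted_index, ENTRY_COUNT, sizeof sorted_index[0], compare_index);
}

/* Binary search through the index; each probe costs one extra indirection. */
static const entry_t *lookup(unsigned key) {
    size_t lo = 0, hi = ENTRY_COUNT;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const entry_t *e = &entries[sorted_index[mid]];
        if (e->key == key)
            return e;
        if (e->key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```

In this sketch the index costs two bytes per entry, where a sorted copy of the pairs would cost a full `entry_t` per entry.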
Yes, there is debugging code left in, under #if. I prefer to leave it, both as evidence that I tested the change and to keep it somewhat, if not ideally, testable in the future.
The code was also comparing pointers in an unnecessary, somewhat risky way that was probably OK here; I fixed that too.
The result is also marginally smaller and faster: the old code inlined something like bubblesort, and the new code reuses qsort.