[netcore][llvm] Implement Sse1-4.2 subsets used by corlib (#18103)
* Implement SSE41 subset used by corlib
* Implement Sse2.MoveMask
* Only works with LLVM
* Implement Sse3 and Ssse3 subsets used by corlib
* Implement a few SSE1 methods
* Address feedback; also implement Sse.Add/Subtract
* Fix build
* Implement Sse.Multiply, Sse.Store
* Implement Sse.CompareNotEqual
* Implement Sse.MoveScalar
* Finish SSE1 corlib subset
* Implement Sse.LoadVector128, Sse.Shuffle
* Sse.Shuffle cleanup
* Implement Sse2 APIs
* More of SSE2: LoadAlignedVector128, Compare*
* Implement Sse2.Unpack* and Sse2.StoreScalar
* Implement Sse2.PackUnsignedSaturate
* Implement Sse2.ShiftRightLogical
* Implement Sse2.Shuffle
* Implement Vector128<T>.Zero
* Fix CreateScalarUnsafe
* Implement Vector128.As*, Fix Vector128.CreateScalarUnsafe
* Fix failures
* Fix failures
* Fix Sse.MoveMask
* Remove redundant null checks
* Fix AOT failures
* Fix compilation warning
* Rename create_vector_mask_*
* Fix failures found via tests
* Index in Sse41.Insert has to be a constant
* Add local tests for Mono
* Update tests (cleanup)
* Code cleanup
* test
* Fix typo
* Clean up
* Clean up
* Implement And, AndNot, Or, Xor, Divide for Sse1
* Cleanup
* Fix build
* Limit emit_vector128 to LLVM
* Enable IsSupported for corlib
* Fix build on wasm (#ifdef issue)
* Don't intrinsify Vector256
* Address feedback