x86: Add SSE2 optimized chacha20
commite169aff0e9aacdcf466357247f1759f2c84b7fe4
authorAdhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Thu, 21 Jul 2022 13:05:03 +0000 (21 10:05 -0300)
committerAdhemerval Zanella <adhemerval.zanella@linaro.org>
Fri, 22 Jul 2022 14:58:27 +0000 (22 11:58 -0300)
treedf2413bece130a87d8b1825aef6dd4e6eea14213
parent4c128c7823e5a19058589cfac42aa96de3e15430
x86: Add SSE2 optimized chacha20

It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-ssse3.S.  It replaces the ROTATE_SHUF_2 (which
uses pshufb) by ROTATE2 and thus making the original implementation
SSE2.

As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               443.11
arc4random_buf(16) [single-thread]       552.27
arc4random_buf(32) [single-thread]       626.86
arc4random_buf(48) [single-thread]       649.81
arc4random_buf(64) [single-thread]       663.95
arc4random_buf(80) [single-thread]       674.78
arc4random_buf(96) [single-thread]       675.17
arc4random_buf(112) [single-thread]      680.69
arc4random_buf(128) [single-thread]      683.20
-----------------------------------------------

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

Checked on x86_64-linux-gnu.
LICENSES
sysdeps/x86_64/Makefile
sysdeps/x86_64/chacha20-amd64-sse2.S [new file with mode: 0644]
sysdeps/x86_64/chacha20_arch.h [new file with mode: 0644]