I have a small list of 16-bit integers.
My list is limited to 32 numbers.
I would like to use AVX2 with 256-bit registers (__m256i).
The list is stored in two __m256i registers, which together offer 32 slots of 16 bits each.
The value 0 means that a slot is free.
Here are the operations I want to optimize:
- free all slots containing a value greater than x
- remove a value x
- insert values: given some int16 values (no more than 8), insert them into empty slots of the list. For example, 4 5 8 0 0 0 0 means that we want to insert the values 4, 5 and 8 into the list.
- gather values: move all non-zero values to the left, so that the free slots end up on the right
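
To make the first two operations concrete, here is a minimal sketch of how they could look with compare-and-mask intrinsics. The names List32, load_list, store_list, free_greater_than and remove_value are hypothetical, made up for illustration. It assumes the values are signed int16 (so "greater than" is a signed comparison) and that remove clears every slot equal to x, not just the first one. The AVX2_FN macro is only there so the file can be built with GCC or Clang without passing -mavx2.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

#if defined(__GNUC__) || defined(__clang__)
#define AVX2_FN __attribute__((target("avx2")))  // build without -mavx2
#else
#define AVX2_FN
#endif

// Hypothetical container: 32 int16_t slots in two 256-bit registers, 0 = free.
struct List32 { __m256i lo, hi; };

// Fill the list from 32 contiguous int16_t values (unaligned is fine).
AVX2_FN void load_list(List32& l, const int16_t* src) {
    assert(src != nullptr);
    l.lo = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src));
    l.hi = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src + 16));
}

// Write the 32 slots back to memory.
AVX2_FN void store_list(const List32& l, int16_t* dst) {
    assert(dst != nullptr);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst), l.lo);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + 16), l.hi);
}

// Free every slot whose value is greater than x (signed comparison).
// cmpgt yields 0xFFFF in matching lanes; andnot(mask, v) = (~mask) & v
// therefore zeroes exactly those lanes and keeps the others.
AVX2_FN void free_greater_than(List32& l, int16_t x) {
    const __m256i vx = _mm256_set1_epi16(x);
    l.lo = _mm256_andnot_si256(_mm256_cmpgt_epi16(l.lo, vx), l.lo);
    l.hi = _mm256_andnot_si256(_mm256_cmpgt_epi16(l.hi, vx), l.hi);
}

// Free every slot equal to x (all occurrences, an assumption of this sketch).
AVX2_FN void remove_value(List32& l, int16_t x) {
    const __m256i vx = _mm256_set1_epi16(x);
    l.lo = _mm256_andnot_si256(_mm256_cmpeq_epi16(l.lo, vx), l.lo);
    l.hi = _mm256_andnot_si256(_mm256_cmpeq_epi16(l.hi, vx), l.hi);
}
```

Both operations are branch-free and cost one compare plus one andnot per register. If the values are meant to be unsigned, the comparison would need a different formulation, since AVX2 has no unsigned 16-bit greater-than.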
We assume that enough free slots are available for the insertion, so there is no need to check for that.
The non-zero values do not need to be packed on the left after every operation; the gather operation takes care of that when needed.
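
For gather and insert, one possible approach (a sketch, not necessarily the fastest) is to left-pack each group of 8 values with _mm_shuffle_epi8 and a 256-entry lookup table, writing the groups back to back into a small buffer. All names here (List32, PackTable, pack8, pack_all, gather, insert_values) are hypothetical. The insert shown takes a shortcut that is my own assumption: it gathers first and then appends the new values right after the last used slot, which presumes the values to insert are zero-padded at the end, as in the example above.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

#if defined(__GNUC__) || defined(__clang__)
#define AVX2_FN __attribute__((target("avx2")))  // build without -mavx2
#else
#define AVX2_FN
#endif

// Hypothetical container: 32 int16_t slots in two 256-bit registers, 0 = free.
struct List32 { __m256i lo, hi; };

AVX2_FN void load_list(List32& l, const int16_t* src) {
    assert(src != nullptr);
    l.lo = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src));
    l.hi = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src + 16));
}

AVX2_FN void store_list(const List32& l, int16_t* dst) {
    assert(dst != nullptr);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst), l.lo);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + 16), l.hi);
}

// Entry m moves the 16-bit lanes whose bit is set in m to the front, in order.
// Unused byte indices are 0x80, which _mm_shuffle_epi8 turns into zero.
struct PackTable {
    alignas(16) uint8_t idx[256][16];
    uint8_t count[256];
    PackTable() {
        for (int m = 0; m < 256; ++m) {
            int out = 0;
            for (int lane = 0; lane < 8; ++lane) {
                if (m & (1 << lane)) {
                    idx[m][2 * out] = static_cast<uint8_t>(2 * lane);
                    idx[m][2 * out + 1] = static_cast<uint8_t>(2 * lane + 1);
                    ++out;
                }
            }
            for (int b = 2 * out; b < 16; ++b) idx[m][b] = 0x80;
            count[m] = static_cast<uint8_t>(out);
        }
    }
};
static const PackTable kPack;

// Left-pack 8 lanes, store all 16 bytes at dst, return the non-zero count.
static AVX2_FN int pack8(__m128i v, int16_t* dst) {
    const __m128i isZero = _mm_cmpeq_epi16(v, _mm_setzero_si128());
    // Narrow 0xFFFF/0x0000 lanes to bytes so movemask gives one bit per lane.
    const int zeroMask = _mm_movemask_epi8(_mm_packs_epi16(isZero, isZero)) & 0xFF;
    const int keep = ~zeroMask & 0xFF;
    const __m128i shuf =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(kPack.idx[keep]));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), _mm_shuffle_epi8(v, shuf));
    return kPack.count[keep];
}

// Pack the four 8-lane groups back to back into buf (must be zero-filled).
static AVX2_FN int pack_all(const List32& l, int16_t* buf) {
    int n = 0;
    n += pack8(_mm256_castsi256_si128(l.lo), buf + n);
    n += pack8(_mm256_extracti128_si256(l.lo, 1), buf + n);
    n += pack8(_mm256_castsi256_si128(l.hi), buf + n);
    n += pack8(_mm256_extracti128_si256(l.hi, 1), buf + n);
    return n;
}

// Move all non-zero values to the left; returns how many there are.
AVX2_FN int gather(List32& l) {
    alignas(32) int16_t buf[40] = {0};
    const int n = pack_all(l, buf);
    load_list(l, buf);
    return n;
}

// Insert up to 8 values (non-zero first, zero-padded). Gathers as a side effect.
// No capacity check: the caller guarantees enough free slots.
AVX2_FN void insert_values(List32& l, const int16_t* vals) {
    alignas(32) int16_t buf[40] = {0};  // 8 extra lanes absorb the padding zeros
    const int n = pack_all(l, buf);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(buf + n),
                     _mm_loadu_si128(reinterpret_cast<const __m128i*>(vals)));
    load_list(l, buf);
}
```

The round trip through a stack buffer keeps the code short, at the price of a store-to-load dependency. The table costs about 4 KB; an insert that fills the existing holes in place, without gathering, would need a different technique.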
Please propose a working example in C++ using intrinsics. AVX-512 is not allowed; only 256-bit registers may be used.