I learned something interesting today: gcc doesn't know that memalign() returns aligned pointers (notice all the movdqus). This caused some very confusing benchmark results 😆. The solution was to use aligned_alloc() or __builtin_assume_aligned().
godbolt.org/z/j6vboqcME
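
A minimal sketch of both fixes, not the code behind the link. The helper names `alloc_floats` and `add_floats` and the 16-byte alignment are assumptions for illustration, and whether the compiler actually switches to aligned loads depends on the gcc version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n floats on a 16-byte boundary with
 * C11 aligned_alloc() instead of memalign(). */
float *alloc_floats(size_t n)
{
    /* Round the size up to a multiple of the alignment, which C11
     * originally required of aligned_alloc(). */
    size_t bytes = (n * sizeof(float) + 15) & ~(size_t)15;
    return aligned_alloc(16, bytes);
}

/* Hypothetical helper: dst[i] += src[i], with an explicit alignment
 * promise for pointers whose origin the compiler cannot see. */
void add_floats(float *dst, const float *src, size_t n)
{
    /* The promise must be true; breaking it is undefined behavior. */
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);

    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);

    for (size_t i = 0; i < n; i++)
        d[i] += s[i];
}
```

Either fix alone should be enough: aligned_alloc() at the allocation site, or __builtin_assume_aligned() where the pointer is used. Comparing the assembly with and without them at -O2 or -O3 is the way to check.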

