The compiler automatically strip-mines your loop and generates a cleanup loop for the leftover iterations. This means you do not need to unroll your loops by hand, and, in most cases, leaving them in their original form also enables more vectorization.
Before Vectorization |
---|
`i=0; while(i<n) { /* Original loop code */ a[i]=b[i]+c[i]; ++i; }` |
After Vectorization |
---|
`/* The vectorizer generates the following two loops */ i=0; while(i<(n-n%4)) { /* Vector strip-mined loop; subscript [i:i+3] denotes SIMD execution */ a[i:i+3]=b[i:i+3]+c[i:i+3]; i=i+4; } while(i<n) { /* Scalar clean-up loop */ a[i]=b[i]+c[i]; ++i; }` |
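
The transformation in the table can be written out by hand in plain C to see that it computes the same result as the original loop. The following is a minimal sketch, not the compiler's actual output: the function names `add_scalar` and `add_strip_mined` and the `VLEN` constant are hypothetical, the element type is assumed to be `float`, and the inner `lane` loop merely stands in for what would be a single SIMD instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vector width in elements (hypothetical; matches the n%4 in the table). */
#define VLEN 4

/* Original scalar loop. */
void add_scalar(float *a, const float *b, const float *c, size_t n) {
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + c[i];
}

/* Hand-written equivalent of the strip-mined form. */
void add_strip_mined(float *a, const float *b, const float *c, size_t n) {
    size_t i = 0;
    size_t main_end = n - n % VLEN;  /* largest multiple of VLEN that is <= n */
    assert(main_end % VLEN == 0);

    /* Strip-mined loop: each iteration handles one strip of VLEN elements.
       The inner loop stands in for a single SIMD add over a[i:i+3]. */
    for (; i < main_end; i += VLEN)
        for (size_t lane = 0; lane < VLEN; ++lane)
            a[i + lane] = b[i + lane] + c[i + lane];

    /* Scalar clean-up loop: the remaining n % VLEN elements. */
    for (; i < n; ++i)
        a[i] = b[i] + c[i];
}
```

With `n = 10`, for instance, the strip-mined loop runs twice (elements 0 to 7) and the clean-up loop handles elements 8 and 9. When `n` is smaller than `VLEN`, only the clean-up loop executes.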