Vinayak - Shenoy - Vectorization Methods
ENG20DS0047
Vectorization methods
1. VECTOR INTRINSICS
2. ASSEMBLER CODE FOR VECTORIZATION
Vector intrinsics are low-level
programming constructs that
provide direct access to
CPU-specific vector instructions.
implementation
1. Intel x86 vector intrinsics version of the enhanced precision Kahan sum
Methodology:
->Header Inclusion: The code includes the header <x86intrin.h>, which provides access to
SIMD (Single Instruction, Multiple Data) instructions for x86 architectures, such as Intel's AVX.
->Static Variables: It declares a static array sum[4] with alignment of 64 bytes, which will
be used to store intermediate sums during the vectorized summation.
->Initialization: Inside the function, it initializes local variables for the sum and the error
correction term using AVX instructions. It also sets up a loop to process the array in chunks
of 4 elements at a time.
->Vectorized Loop: The main loop processes the array elements in parallel using AVX
instructions. It loads 4 double values from the var array, performs the Kahan summation
algorithm on them, and updates the local sum and correction terms.
->Storing Results: After the loop, it stores the final sum and correction terms into the sum
array.
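The steps above can be sketched roughly as follows. This is a minimal illustration, not the original slide code: the function name kahan_sum_avx, the helper kahan_step, the separate corr array, the unaligned load, and the scalar handling of leftover elements are all assumptions made here. The target("avx") attribute (GCC/Clang) is used so the sketch compiles without -mavx; the CPU must still support AVX at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>  /* AVX intrinsics; <x86intrin.h> also provides them on GCC/Clang */

/* Hypothetical helper: one scalar Kahan step, adding x into *sum
   while carrying the rounding error in *corr. */
static void kahan_step(double *sum, double *corr, double x)
{
    double corrected = x + *corr;          /* apply the carried correction */
    double new_sum = *sum + corrected;     /* low-order bits may be lost here */
    *corr = corrected - (new_sum - *sum);  /* recover what was lost */
    *sum = new_sum;
}

/* Hypothetical name: Kahan sum with four independent AVX lanes. */
__attribute__((target("avx")))
double kahan_sum_avx(const double *var, long ncells)
{
    assert(var != NULL && ncells >= 0);

    /* 64-byte aligned scratch arrays for the per-lane results
       (local here; the slides describe a static sum[4]). */
    double sum[4]  __attribute__((aligned(64)));
    double corr[4] __attribute__((aligned(64)));

    __m256d local_sum  = _mm256_setzero_pd();
    __m256d local_corr = _mm256_setzero_pd();

    long i = 0;
    for (; i + 4 <= ncells; i += 4) {
        /* Unaligned load, so var needs no special alignment. */
        __m256d v         = _mm256_loadu_pd(&var[i]);
        __m256d corrected = _mm256_add_pd(v, local_corr);
        __m256d new_sum   = _mm256_add_pd(local_sum, corrected);
        local_corr = _mm256_sub_pd(corrected,
                                   _mm256_sub_pd(new_sum, local_sum));
        local_sum  = new_sum;
    }

    /* Store the per-lane sums and corrections. */
    _mm256_store_pd(sum, local_sum);
    _mm256_store_pd(corr, local_corr);

    /* Combine the four lanes and any leftover elements with scalar Kahan steps. */
    double total = 0.0, c = 0.0;
    for (int k = 0; k < 4; k++) {
        kahan_step(&total, &c, sum[k]);
        kahan_step(&total, &c, corr[k]);
    }
    for (; i < ncells; i++)
        kahan_step(&total, &c, var[i]);

    return total;
}
```

Each lane runs its own Kahan sum, so the order of additions differs from the scalar algorithm and the result may differ from it in the last bits. Compiling with options such as -ffast-math may allow the compiler to remove the correction term, so this sketch assumes default floating-point semantics.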
implementation
The second implementation of the Kahan sum algorithm uses GCC vector extensions, which
provide a portable way to express SIMD operations in code. Unlike the Intel AVX intrinsics, which
are tied to x86, GCC vector extensions can be used on the various architectures that GCC supports.
->Vector Type Definition: Inside the function, it defines a vector type vec4d using GCC vector
extensions. This type represents a vector of four doubles.
->Initialization: It initializes local variables for the sum and the error correction term as vectors
of zeros.
->Vectorized Loop: The main loop processes the array elements in chunks of four doubles at a
time. It loads these elements into vector variables, performs the Kahan summation algorithm on
them, and updates the local sum and correction terms.
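A minimal sketch of these steps with GCC vector extensions might look like the following. The function name kahan_sum_gcc_vec, the helper kahan_accumulate, the memcpy-based load, and the scalar handling of the final lanes and leftover elements are assumptions made for illustration; only the vec4d type name comes from the description above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: one scalar Kahan step, adding x into *sum
   while carrying the rounding error in *corr. */
static void kahan_accumulate(double *sum, double *corr, double x)
{
    double corrected = x + *corr;
    double new_sum = *sum + corrected;
    *corr = corrected - (new_sum - *sum);
    *sum = new_sum;
}

/* Hypothetical name: Kahan sum using GCC vector extensions (GCC or Clang). */
double kahan_sum_gcc_vec(const double *var, long ncells)
{
    assert(var != NULL && ncells >= 0);

    /* Vector of four doubles; the compiler maps it to whatever SIMD
       registers the target offers, or splits it if none are wide enough. */
    typedef double vec4d __attribute__((vector_size(4 * sizeof(double))));

    vec4d local_sum  = {0.0, 0.0, 0.0, 0.0};
    vec4d local_corr = {0.0, 0.0, 0.0, 0.0};

    long i = 0;
    for (; i + 4 <= ncells; i += 4) {
        vec4d v;
        memcpy(&v, &var[i], sizeof v);  /* load without alignment assumptions */

        /* Ordinary operators act element-wise on vector types. */
        vec4d corrected = v + local_corr;
        vec4d new_sum   = local_sum + corrected;
        local_corr = corrected - (new_sum - local_sum);
        local_sum  = new_sum;
    }

    /* Combine the four lanes and any leftover elements with scalar Kahan steps. */
    double total = 0.0, c = 0.0;
    for (int k = 0; k < 4; k++) {
        kahan_accumulate(&total, &c, local_sum[k]);
        kahan_accumulate(&total, &c, local_corr[k]);
    }
    for (; i < ncells; i++)
        kahan_accumulate(&total, &c, var[i]);

    return total;
}
```

Because the arithmetic is written with plain +, - and vector subscripts, the same source should build for non-x86 targets as well, which is the portability advantage the slide describes.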
Vector Intrinsics - 3rd implementation
Implementation of the Kahan sum using C++ vector intrinsics