Vinayak - Shenoy - Vectorization Methods

vectorization

Uploaded by

Nandini Hazarika


Vinayak Mohith Shenoy Y

ENG20DS0047

Vectorization methods
1. Vector intrinsics
2. Assembler code for vectorization
Vector intrinsics are low-level programming constructs that provide direct access to CPU-specific vector instructions.
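As a minimal illustration (not taken from the slides), the function below adds two arrays of doubles using x86 SSE2 intrinsics from `<emmintrin.h>`. SSE2 is chosen here as an assumption because every x86-64 processor supports it, so no extra compiler flags are needed; the function name `add_arrays_sse2` is hypothetical. Each intrinsic corresponds to one vector instruction.

```c
#include <emmintrin.h>   /* SSE2 intrinsics: __m128d holds two doubles */

/* Hypothetical example: c[i] = a[i] + b[i], two doubles per vector instruction. */
void add_arrays_sse2(const double *a, const double *b, double *c, long n) {
    long i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(&a[i]);          /* load 2 doubles, unaligned is OK */
        __m128d vb = _mm_loadu_pd(&b[i]);
        _mm_storeu_pd(&c[i], _mm_add_pd(va, vb));  /* one ADDPD adds both lanes */
    }
    for (; i < n; i++)                              /* scalar loop for a leftover element */
        c[i] = a[i] + b[i];
}
```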

Vector intrinsics help us handle troublesome loops that the compiler fails to vectorize on its own.

One such troublesome loop is the Kahan sum, a method used to minimize the accumulation of rounding errors when summing a sequence of floating-point numbers.
In standard floating-point arithmetic, small rounding errors can accumulate, especially when adding many numbers of widely varying magnitudes. These errors can significantly affect the accuracy of the final result, particularly in numerical computations where precision is crucial.
The Kahan sum keeps a separate correction term that captures the low-order bits lost in each addition and feeds them back into the next one.
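For reference before the vectorized versions, here is a plain scalar sketch of the Kahan update in C (not from the slides). The names `naive_sum` and `kahan_sum` are hypothetical, and the code assumes IEEE double arithmetic compiled without `-ffast-math`, which would allow the compiler to simplify the correction term away.

```c
/* Hypothetical helper: plain left-to-right summation for comparison. */
double naive_sum(const double *var, long ncells) {
    double sum = 0.0;
    for (long i = 0; i < ncells; i++)
        sum += var[i];                 /* low-order bits of small terms are rounded away */
    return sum;
}

/* Hypothetical helper: scalar Kahan (compensated) summation. */
double kahan_sum(const double *var, long ncells) {
    double sum = 0.0;
    double correction = 0.0;           /* holds the negated bits lost so far */
    for (long i = 0; i < ncells; i++) {
        double y = var[i] - correction;    /* corrected next term */
        double t = sum + y;                /* low-order bits of y may be lost here */
        correction = (t - sum) - y;        /* recover what was lost (with opposite sign) */
        sum = t;
    }
    return sum;
}
```

For example, summing 1.0 followed by ten copies of 1e-16, `naive_sum` returns exactly 1.0 because each tiny term is rounded away, while `kahan_sum` returns a value close to 1.000000000000001.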
Vector Intrinsics - 1st implementation
1. Intel x86 vector intrinsics version of the enhanced precision Kahan sum

Methodology:

->Header Inclusion: The code includes the header <x86intrin.h>, which provides access to
SIMD (Single Instruction, Multiple Data) instructions for x86 architectures, such as Intel's AVX.

->Static Variables: It declares a static array sum[4] with alignment of 64 bytes, which will
be used to store intermediate sums during the vectorized summation.

->Function Definition: The function do_kahan_sum_v takes a pointer to an array of doubles (var) and the number of elements in the array (ncells) as inputs. It returns the final sum calculated using the Kahan summation algorithm.
->Initialization: Inside the function, it initializes local variables for the sum and the error
correction term using AVX instructions. It also sets up a loop to process the array in chunks
of 4 elements at a time.

->Vectorized Loop: The main loop processes the array elements in parallel using AVX
instructions. It loads 4 double values from the var array, performs the Kahan summation
algorithm on them, and updates the local sum and correction terms.

->Storing Results: After the loop, it stores the final sum and correction terms into the sum
array.

->Sequential Summation: Finally, it performs a sequential summation of the four elements in the sum array to obtain the final result, accounting for any remaining error correction terms.

->Return: The function returns the calculated final sum.
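The steps above can be sketched as follows. This is an illustration written for this summary, not the original slide code. It keeps the names `do_kahan_sum_v`, `var`, `ncells` and the 64-byte-aligned static `sum[4]`, and assumes: a second static array `corr[4]` for the lane corrections, unaligned loads, a scalar tail loop when `ncells` is not a multiple of 4, and GCC or Clang on x86-64. The `target("avx")` attribute lets the function use AVX without compiling the whole file with `-mavx`, but the CPU must still support AVX. The helper `kahan_add` is hypothetical.

```c
#include <x86intrin.h>

/* 64-byte-aligned scratch arrays for the four lane results (corr[4] is an assumption). */
static double sum[4]  __attribute__((aligned(64)));
static double corr[4] __attribute__((aligned(64)));

/* Hypothetical helper: one scalar Kahan step, used for the final reduction and the tail. */
static void kahan_add(double *s, double *c, double x) {
    double y = x - *c;
    double t = *s + y;
    *c = (t - *s) - y;
    *s = t;
}

__attribute__((target("avx")))
double do_kahan_sum_v(const double *var, long ncells) {
    __m256d local_sum  = _mm256_setzero_pd();   /* four partial sums */
    __m256d local_corr = _mm256_setzero_pd();   /* four correction terms */

    long i = 0;
    for (; i + 4 <= ncells; i += 4) {
        __m256d x = _mm256_loadu_pd(&var[i]);       /* load 4 doubles */
        __m256d y = _mm256_sub_pd(x, local_corr);   /* corrected next terms */
        __m256d t = _mm256_add_pd(local_sum, y);    /* new partial sums */
        local_corr = _mm256_sub_pd(_mm256_sub_pd(t, local_sum), y);
        local_sum = t;
    }

    /* Aligned stores are valid because both arrays are 64-byte aligned. */
    _mm256_store_pd(sum, local_sum);
    _mm256_store_pd(corr, local_corr);

    /* Sequential Kahan reduction of lane sums, negated corrections, then any tail. */
    double s = 0.0, c = 0.0;
    for (int k = 0; k < 4; k++) kahan_add(&s, &c, sum[k]);
    for (int k = 0; k < 4; k++) kahan_add(&s, &c, -corr[k]);
    for (; i < ncells; i++) kahan_add(&s, &c, var[i]);
    return s;
}
```

Because `sum` and `corr` are static, this sketch is not thread-safe; local aligned arrays would avoid that at no real cost.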

Vector Intrinsics - 2nd implementation
The second implementation of the Kahan sum algorithm uses GCC vector extensions, which provide a portable way to express SIMD operations in code. Unlike Intel's AVX intrinsics, which are specific to x86 processors, GCC vector extensions can be used across the many architectures supported by the GCC compiler.

Changes in Methodology from the previous implementation:

->Vector Type Definition: Inside the function, it defines a vector type vec4d using GCC vector
extensions. This type represents a vector of four doubles.

->Initialization: It initializes local variables for the sum and the error correction term as vectors
of zeros.

->Vectorized Loop: The main loop processes the array elements in chunks of four doubles at a
time. It loads these elements into vector variables, performs the Kahan summation algorithm on
them, and updates the local sum and correction terms.
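A comparable sketch with GCC vector extensions (also accepted by Clang), again written for illustration rather than copied from the slides. As described above, the `vec4d` type is defined inside the function. The function name `do_kahan_sum_gcc_v` and the helper `kahan_step` are hypothetical, and loading through `memcpy` is an assumption that sidesteps alignment requirements.

```c
#include <string.h>

/* Hypothetical helper: one scalar Kahan step. */
static void kahan_step(double *s, double *c, double x) {
    double y = x - *c;
    double t = *s + y;
    *c = (t - *s) - y;
    *s = t;
}

/* Hypothetical name: Kahan sum written with GCC vector extensions. */
double do_kahan_sum_gcc_v(const double *var, long ncells) {
    /* Four doubles; arithmetic operators act element-wise and [] reads one lane. */
    typedef double vec4d __attribute__((vector_size(4 * sizeof(double))));

    vec4d local_sum  = {0.0, 0.0, 0.0, 0.0};   /* four partial sums */
    vec4d local_corr = {0.0, 0.0, 0.0, 0.0};   /* four correction terms */

    long i = 0;
    for (; i + 4 <= ncells; i += 4) {
        vec4d x;
        memcpy(&x, &var[i], sizeof x);         /* load four doubles, any alignment */
        vec4d y = x - local_corr;              /* corrected next terms */
        vec4d t = local_sum + y;               /* new partial sums */
        local_corr = (t - local_sum) - y;      /* lost bits, per lane */
        local_sum = t;
    }

    /* Sequential Kahan reduction of lane sums, negated corrections, then any tail. */
    double s = 0.0, c = 0.0;
    for (int k = 0; k < 4; k++) kahan_step(&s, &c, local_sum[k]);
    for (int k = 0; k < 4; k++) kahan_step(&s, &c, -local_corr[k]);
    for (; i < ncells; i++) kahan_step(&s, &c, var[i]);
    return s;
}
```

Because `vec4d` is a compiler type rather than an x86 intrinsic, the same source compiles for other targets; where 256-bit registers are not available, GCC synthesizes the operations using narrower instructions.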
Vector Intrinsics - 3rd implementation
Implementation of the Kahan sum using C++ vector intrinsics

The AVX implementation maps directly onto hardware-specific SIMD instructions and the GCC implementation onto compiler-provided vector types, while Agner Fog's library provides a higher-level C++ interface for SIMD operations, making the enhanced-precision Kahan sum portable across platforms.

Agner Fog's C++ Vector Class Library:

Provides an abstraction layer for SIMD operations in C++.

Offers SIMD performance for numerical computations such as the enhanced-precision Kahan sum.

Designed for cross-platform compatibility.

Requires downloading the library and including its headers in the C++ code.

Assembler coding for vectorization

Writing vector assembly instructions offers the potential for maximum performance optimization.
Requires an in-depth understanding of various vector instructions and
their performance characteristics across different processors.
However, programmers without that expertise will often achieve better performance using vector intrinsics.
Vector assembly code is less portable and only compatible with specific
processor architectures.
Due to these limitations and complexity, writing vector assembly
instructions is rarely recommended for most applications.
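To show what hand-written vector assembly looks like, here is a small sketch using GCC/Clang extended inline assembly with the SSE2 `addpd` instruction (x86 only, AT&T syntax). The function `add2_asm` is hypothetical; an AVX version would use `vaddpd` on `ymm` registers. The hard-coded register names and instruction choice tie this code to one architecture, which is the portability cost described above.

```c
/* Hypothetical example: out[0..1] = a[0..1] + b[0..1] with one ADDPD instruction. */
void add2_asm(const double *a, const double *b, double *out) {
    __asm__ volatile (
        "movupd (%1), %%xmm0\n\t"    /* xmm0 = a[0], a[1] (unaligned load) */
        "movupd (%2), %%xmm1\n\t"    /* xmm1 = b[0], b[1] */
        "addpd  %%xmm1, %%xmm0\n\t"  /* xmm0 += xmm1, both lanes at once */
        "movupd %%xmm0, (%0)\n\t"    /* store the two results to out */
        :                            /* no register outputs */
        : "r"(out), "r"(a), "r"(b)   /* %0 = out, %1 = a, %2 = b */
        : "xmm0", "xmm1", "memory"); /* registers and memory this asm modifies */
}
```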
Thank you