TP1
Exercise 1:
• This exercise aims to explore the impact of memory access strides on the performance of
a C program.
• The following program allocates an array of doubles, initializes it to 1.0, and then performs
a summation while traversing the array with different strides.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define MAX_STRIDE 20
int main()
{
    int N = 1000000;
    double *a;
    printf("stride, sum, time (msec), rate (MB/s)\n");
    /* ... (remainder of the program not shown) ... */
}
Compilation
• Compile the program with -O0 (no optimization):
gcc -O0 -o stride stride.c
After unrolling
int i;
for (i = 0; i + 3 < N; i += 4) {
    sum += arr[i] + arr[i + 1] + arr[i + 2] + arr[i + 3];
}
for (; i < N; i++)   /* leftover elements when N is not a multiple of 4 */
    sum += arr[i];
Expected Output
• The following figures show an example of expected results (results may vary):
Figure 1: CPU time vs. Stride size (left), Memory bandwidth vs. Stride size (right)
Exercise 2:
• Write mxm.c to implement the standard matrix multiplication using three nested loops.
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++)
            c[i][j] += a[i][k] * b[k][j];
• Modify the loop order (for example, swap the j and k loops to obtain the i-k-j order) to improve cache usage and performance.
• Compute the execution time and memory bandwidth for both versions and compare the
results.
• Explain the output.
Exercise 3:
• Write mxm_bloc.c for block matrix multiplication.
• Compute the CPU time and memory bandwidth for different block sizes.
• Determine the optimal block size. Explain why it is the best choice.
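A common rule of thumb, only an approximation, is to choose B so that one block from each of the three matrices fits in the cache at the same time. The calculation below assumes 8-byte doubles and, purely for illustration, a cache of C = 32 KiB:

```latex
3 \cdot B^2 \cdot 8\ \text{bytes} \le C
\quad\Longrightarrow\quad
B \le \sqrt{\frac{C}{24}} = \sqrt{\frac{32768}{24}} \approx 36.9
```

Under these assumptions, block sizes around 32 are natural candidates to test. The measured optimum may still differ, because other cache levels, the TLB and the compiler's optimizations also play a role.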
Compilation
• Compile the program using:
gcc -O2 -o mxm_block mxm_bloc.c
• Compare the CPU time and bandwidth for each block size.
• Identify the optimal block size and justify why it provides the best performance.
Instructions
• Modify the standard matrix multiplication algorithm to process submatrices (blocks) instead of individual elements.
• Use three nested loops over the blocks (each enclosing the usual three loops over the elements of a block), so that matrix elements are accessed in blocks of size B × B.
Expected Output
• The following figures show an example of expected results (results may vary):
Figure 2: CPU time vs. block size (left), Memory bandwidth vs. block size (right)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 5
/* ... */
    // Copy values
    memcpy(copy, arr, size * sizeof(int));
    return copy;
}
// Main function
int main() {
    /* ... */
    free_memory(array);
    /* ... */
}
• Modify the program to fix memory leaks and re-run Valgrind to verify.