Mohammed VI Polytechnic University

TP1 - Optimizing Memory Access

Imad Kissami
February 13, 2025

Exercise 1:
• This exercise aims to explore the impact of memory access strides on the performance of
a C program.

• The following program allocates an array of doubles, initializes it to 1.0, and then performs
a summation while traversing the array with different strides.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_STRIDE 20

int main(void)
{
    int N = 1000000;
    double *a;
    double sum, rate, msec, start, end;

    a = malloc(N * MAX_STRIDE * sizeof(double));
    if (a == NULL) {
        fprintf(stderr, "Memory allocation failed\n");
        return EXIT_FAILURE;
    }

    for (int i = 0; i < N * MAX_STRIDE; i++)
        a[i] = 1.;

    printf("stride, sum, time (msec), rate (MB/s)\n");

    for (int i_stride = 1; i_stride <= MAX_STRIDE; i_stride++)
    {
        sum = 0.0;
        start = (double)clock() / CLOCKS_PER_SEC;

        // Read N elements, i_stride apart
        for (int i = 0; i < N * i_stride; i += i_stride)
            sum += a[i];

        end = (double)clock() / CLOCKS_PER_SEC;

        msec = (end - start) * 1000.0; // Time in milliseconds
        // Rate of useful data (N doubles read), in MB/s
        rate = sizeof(double) * N * (1000.0 / msec) / (1024 * 1024);

        printf("%d, %f, %f, %f\n", i_stride, sum, msec, rate);
    }

    free(a);
    return 0;
}

Compilation
• Compile the program with -O0 (no optimization):
gcc -O0 -o stride stride.c

• Compile the program with -O2 (level 2 optimization):

gcc -O2 -o stride stride.c

Optimizations discussed in this exercise:

– Loop optimizations: partial loop unrolling.

Before unrolling:
for (int i = 0; i < N; i++) {
    sum += arr[i];
}

After unrolling by 4 (this form assumes N is a multiple of 4):
for (int i = 0; i < N; i += 4) {
    sum += arr[i] + arr[i + 1] + arr[i + 2] + arr[i + 3];
}

– Instruction scheduling: reordering instructions to improve pipeline efficiency.

Before reordering:
MUL R1, R2, R3 ; Multiply (long latency)
ADD R4, R1, R5 ; Add (depends on R1)
SUB R6, R7, R8 ; Independent subtraction

After reordering:
MUL R1, R2, R3 ; Multiply (long latency)
SUB R6, R7, R8 ; Independent subtraction (executed while MUL is running)
ADD R4, R1, R5 ; Add (by now, R1 is ready)
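
The unrolled loop shown earlier handles only array lengths that are a multiple of 4. A minimal sketch of a hand-unrolled sum with a cleanup loop for the leftover elements is given here; the name sum_unrolled4 is hypothetical and not part of the handout, and the code a compiler actually generates may look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the handout): sums n doubles,
   unrolled by 4, with a cleanup loop for the leftover elements. */
double sum_unrolled4(const double *arr, size_t n)
{
    assert(arr != NULL || n == 0);

    double sum = 0.0;
    size_t i = 0;

    /* Main body: four elements per iteration. */
    for (; i + 4 <= n; i += 4)
        sum += arr[i] + arr[i + 1] + arr[i + 2] + arr[i + 3];

    /* Remainder: at most three iterations. */
    for (; i < n; i++)
        sum += arr[i];

    return sum;
}
```

Unrolling reduces the number of loop-condition checks and branches per element, which is one reason the -O0 and -O2 timings can differ.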

Execution and Analysis

• Run the code using -O0 and -O2 for different strides.
• Compare the execution times and bandwidths.
• Discuss the impact of loop unrolling and instruction scheduling.
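
When comparing strides, it can help to estimate how much memory each run actually pulls through the cache. The sketch below counts the distinct cache lines read for a given stride. It assumes a 64-byte cache line (common on x86-64, but check your CPU) and an array that starts on a line boundary; cache_lines_touched and LINE_BYTES are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache-line size in bytes; this is an assumption, check your CPU. */
#define LINE_BYTES 64

/* Number of distinct cache lines read when visiting n doubles at
   indices 0, stride, 2*stride, ... of a line-aligned array. */
size_t cache_lines_touched(size_t n, size_t stride)
{
    assert(stride > 0);
    if (n == 0)
        return 0;

    /* Doubles per line: 8 when sizeof(double) == 8. */
    size_t per_line = LINE_BYTES / sizeof(double);

    if (stride >= per_line)
        return n;                       /* every access lands on a new line */

    size_t last_index = (n - 1) * stride;
    return last_index / per_line + 1;   /* all lines from 0 to the last one */
}
```

Under these assumptions, with N = 1000000 a stride of 1 reads 125000 lines, while any stride of 8 or more reads 1000000 lines. The program's rate counts only the N useful doubles, so the reported bandwidth is expected to drop as the stride grows toward 8 and then level off.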

Expected Output
• The following figures show an example of expected results (results may vary):

Figure 1: CPU time vs. Stride size (left), Memory bandwidth vs. Stride size (right)

Exercise 2:
• Write mxm.c to implement the standard matrix multiplication using three nested loops.
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++)
            c[i][j] += a[i][k] * b[k][j];

• Swap the j and k loops (loop order i, k, j) so that the innermost loop traverses b and c along a row, which optimizes cache usage and improves performance.
• Compute the execution time and memory bandwidth for both versions and compare the
results.
• Explain the output.
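
One possible form of the reordered loop nest is sketched below. It is an illustration, not the reference solution: it assumes n x n matrices stored row-major in flat arrays (the handout uses c[i][j] notation), and the name mxm_ikj is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: C += A * B for n x n row-major matrices in flat arrays.
   Loop order i-k-j: the inner loop walks b and c along a row (stride 1). */
void mxm_ikj(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];      /* constant during the j loop */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

In the original i-j-k order the inner loop reads b[k][j] with k varying, which jumps a full row (n doubles) between accesses; in the i-k-j order all inner-loop accesses are contiguous.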

Exercise 3:
• Write mxm_bloc.c for block matrix multiplication.

• Compute the CPU time and memory bandwidth for different block sizes.

• Determine the optimal block size. Explain why it is the best choice.

Compilation
• Compile the program using:
gcc -O2 -o mxm_block mxm_bloc.c

Execution and Analysis

• Run the program with different block sizes.

• Compare the CPU time and bandwidth for each block size.

• Identify the optimal block size and justify why it provides the best performance.

Instructions
• Modify the standard matrix multiplication algorithm to process submatrices (blocks) instead of individual elements.

• Use three nested loops, but ensure that matrix elements are accessed in blocks of size B x
B.

• Follow this structure for blocking:

– Divide matrices A, B, and C into blocks of size B x B.

– Compute the result for each block before moving to the next.
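
The blocking structure described above can be sketched as follows. This is one common formulation under stated assumptions, not the reference solution: three loops step over tiles and three inner loops work inside a tile, matrices are n x n and stored row-major in flat arrays, and the names mxm_block, min_sz and bs (the block size, called B in the text) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Sketch: C += A * B, n x n row-major, processed in bs x bs tiles. */
void mxm_block(size_t n, size_t bs,
               const double *a, const double *b, double *c)
{
    assert(bs > 0 && a != NULL && b != NULL && c != NULL);

    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                /* Clamp tile bounds so n need not be a multiple of bs. */
                size_t i_end = min_sz(ii + bs, n);
                size_t k_end = min_sz(kk + bs, n);
                size_t j_end = min_sz(jj + bs, n);

                /* Multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

A reasonable starting point when searching for the best block size is one where the three tiles in use fit in cache together, that is, roughly 3 * bs * bs * sizeof(double) bytes no larger than the cache level being targeted.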

Expected Output
• The following figures show an example of expected results (results may vary):

Figure 2: CPU time vs. Block size (left), Memory bandwidth vs. Block size (right)

Exercise 4: Memory Management and Debugging with Valgrind

• Analyze the following C program, which allocates, initializes, prints, and duplicates an
array.

Code to Analyze (memory_debug.c)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 5

// Function to allocate an array of integers
int *allocate_array(int size) {
    int *arr = (int *) malloc(size * sizeof(int));
    if (!arr) {
        fprintf(stderr, "Memory allocation failed\n");
        exit(EXIT_FAILURE);
    }
    return arr;
}

// Function to initialize the array with values
void initialize_array(int *arr, int size) {
    if (!arr) return; // Avoid segmentation fault
    for (int i = 0; i < size; i++) {
        arr[i] = i * 10;
    }
}

// Function to print the array
void print_array(int *arr, int size) {
    if (!arr) return; // Avoid segmentation fault
    printf("Array elements: ");
    for (int i = 0; i < size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

// Function to create a duplicate of the array
int *duplicate_array(int *arr, int size) {
    if (!arr) return NULL;

    int *copy = (int *) malloc(size * sizeof(int));

    if (!copy) {
        fprintf(stderr, "Memory allocation failed\n");
        exit(EXIT_FAILURE);
    }

    // Copy values
    memcpy(copy, arr, size * sizeof(int));

    return copy;
}

// Function to free the allocated memory (deliberate memory leak left)
void free_memory(int *arr) {
    // Add a line here that frees the memory to fix the leak
}

// Main function
int main() {
    int *array = allocate_array(SIZE);

    initialize_array(array, SIZE);
    print_array(array, SIZE);

    // Creating a duplicate array
    int *array_copy = duplicate_array(array, SIZE);
    print_array(array_copy, SIZE);

    // Free memory (deliberate error: forgetting to free `array_copy`)
    free_memory(array);

    return 0; // Memory leak on purpose
}

Compilation and Execution

• Compile the program with debugging symbols:
gcc -g -o memory_debug memory_debug.c

• Run the program using Valgrind to check for memory leaks:

valgrind --leak-check=full --track-origins=yes ./memory_debug

• Use Valgrind’s Memcheck tool to detect memory leaks.

• Modify the program to fix memory leaks and re-run Valgrind to verify.
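
One possible shape of the fix is sketched below, assuming the only problem is that neither array is ever released. free_memory matches the handout's signature; copy_and_release is a hypothetical driver that mirrors main() so the pairing of allocations and frees is easy to see.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One possible fix: release the block passed in.
   free(NULL) is a no-op, so no NULL check is needed. */
void free_memory(int *arr)
{
    free(arr);
}

/* Hypothetical driver mirroring main(): every malloc is paired with a free.
   Returns the last element of the copy so the result can be checked,
   or -1 if an allocation fails. */
int copy_and_release(int size)
{
    assert(size > 0);

    int *array = malloc((size_t)size * sizeof *array);
    int *array_copy = malloc((size_t)size * sizeof *array_copy);
    if (!array || !array_copy) {
        free_memory(array);
        free_memory(array_copy);
        return -1;
    }

    for (int i = 0; i < size; i++)
        array[i] = i * 10;
    memcpy(array_copy, array, (size_t)size * sizeof *array);

    int last = array_copy[size - 1];

    free_memory(array);        /* the original array */
    free_memory(array_copy);   /* the duplicate that main() never frees */
    return last;
}
```

In memory_debug.c the equivalent change is to fill in free_memory and call it for both array and array_copy before returning from main. With every block released, the Valgrind heap summary should report that all heap blocks were freed.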
