Parallel Programming Basics
Jimmy Hu
Target Audience
 People interested in parallel programming
 People who want to know how to improve the performance of their code
 People who want to know how to get the (possible) peak performance from their computer
(Many techniques / methods are available for reaching peak performance; they are outside the
scope of this discussion)
 People who want to know how I use my computers / servers (X
2
Outline
 Why parallel programming?
 What is parallel programming?
 How to perform parallel programming (in C++ / Matlab / C#)
 Conclusion / Further Discussions
3
Why Parallel Programming?
4
 Consider the following C++ code. What is its output?
#include <cstddef>
#include <iostream>
#include <ranges>
#include <vector>
int main()
{
    std::vector<int> test_vector = {1, 2, 3};
    int a = 3;
    // Add a to every element
    for(std::size_t i = 0; i < std::ranges::size(test_vector); ++i)
    {
        test_vector[i] = test_vector[i] + a;
    }
    // Print every element
    for(std::size_t i = 0; i < std::ranges::size(test_vector); ++i)
    {
        std::cout << test_vector[i] << ", ";
    }
    return 0;
}
Why Parallel Programming?
5
 Answer to the question “What is the output of the C++ code?”: the program prints
4, 5, 6,
// https://godbolt.org/z/fb7TdT495
// https://godbolt.org/z/3Kj1azb4h
Why Parallel Programming?
6
 Code Structure: the program consists of three parts
int main()
{
    // Part 1: Variable Initialization
    std::vector<int> test_vector = {1, 2, 3};
    int a = 3;
    // Part 2: Data Processing / Calculation
    for(std::size_t i = 0; i < std::ranges::size(test_vector); ++i)
    {
        test_vector[i] = test_vector[i] + a;
    }
    // Part 3: Output
    for(std::size_t i = 0; i < std::ranges::size(test_vector); ++i)
    {
        std::cout << test_vector[i] << ", ";
    }
    return 0;
}
Why Parallel Programming?
9
 In this simple example, the calculation part is just an addition: each element of test_vector is increased by a, which gives the output 4, 5, 6,
Why Parallel Programming?
10
 What happens when the operation is more complicated?
int main()
{
    std::vector<Frame> image_frames = {img1, img2, img3};
    auto results = std::vector<Features>(3);
    for(int i = 0; i < std::ranges::size(image_frames); ++i)
    {
        results[i] = feature_extraction(image_frames[i]);
    }
    …
    return 0;
}
This example calls a function named “feature_extraction” once per frame; each call may take much longer than a single addition.
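To make the example above concrete, here is a minimal sketch of the sequential loop. The slides do not define Frame, Features, or the body of feature_extraction, so the definitions below are hypothetical stand-ins chosen only for illustration. The point is that each call works on its own frame, and the loop still handles the frames one after another.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-ins: the slides do not define Frame or Features.
using Frame = std::vector<int>;   // pretend each frame is a list of pixel values
struct Features
{
    long long pixel_sum = 0;      // a made-up "feature", for illustration only
};

// Hypothetical feature extractor: stands for any expensive, self-contained computation.
Features feature_extraction(const Frame& frame)
{
    return Features{std::accumulate(frame.begin(), frame.end(), 0LL)};
}

// Sequential version: the frames are processed strictly one after another.
std::vector<Features> extract_all(const std::vector<Frame>& image_frames)
{
    auto results = std::vector<Features>(image_frames.size());
    for (std::size_t i = 0; i < image_frames.size(); ++i)
    {
        results[i] = feature_extraction(image_frames[i]);
    }
    return results;
}
```

Calling extract_all on three frames performs three calls to feature_extraction in sequence, so the total time is roughly the sum of the three individual times.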
Why Parallel Programming?
11
[Figure: Without parallel programming, Dish A, Dish B, Dish C, … are prepared one after another. With parallel programming, Dish A, Dish B and Dish C are prepared at the same time.]
Icon is from https://www.hiclipart.com/free-transparent-background-png-clipart-iuxpq/download
Why Parallel Programming?
12
[Figure: Without parallel programming, Task A, Task B, Task C, … run one after another. With parallel programming, Task A, Task B and Task C run at the same time.]
Icon is from https://www.flaticon.com/free-icon/cpu_1250593
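The dish / task picture can be sketched in C++ with std::async. This is only an illustration: the prepare function and its fixed duration are hypothetical, and std::async is used here simply because it is a compact way to start independent tasks, not because the slides prescribe it.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical task: pretend that preparing one dish takes a fixed amount of time.
std::string prepare(const std::string& name, std::chrono::milliseconds duration)
{
    std::this_thread::sleep_for(duration);
    return name + " done";
}

// Without parallel programming: the dishes are prepared one after another.
std::vector<std::string> prepare_sequentially(std::chrono::milliseconds duration)
{
    return {prepare("Dish A", duration), prepare("Dish B", duration), prepare("Dish C", duration)};
}

// With parallel programming: each dish is started as its own asynchronous task,
// and the results are collected once all three have been started.
std::vector<std::string> prepare_in_parallel(std::chrono::milliseconds duration)
{
    auto a = std::async(std::launch::async, prepare, "Dish A", duration);
    auto b = std::async(std::launch::async, prepare, "Dish B", duration);
    auto c = std::async(std::launch::async, prepare, "Dish C", duration);
    return {a.get(), b.get(), c.get()};
}
```

Both functions return the same three results. The sequential version waits for three durations in a row, while the parallel version can overlap the three waits when the system runs the tasks concurrently.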
The Steps of Execution
13
 Let’s review the previous simple case. How is the program executed?
 First, test_vector is initialized:
test_vector: 1 2 3
 Next, a is initialized:
a: 3
 Then, does the execution run sequentially? Yes, the first loop updates one element at a time:
test_vector[0] = 1 + 3 = 4, so test_vector becomes 4 2 3
test_vector[1] = 2 + 3 = 5, so test_vector becomes 4 5 3
test_vector[2] = 3 + 3 = 6, so test_vector becomes 4 5 6
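The sequential steps can be reproduced with a small sketch that records the state of the vector after every iteration. The function name trace_sequential_add is hypothetical and exists only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Adds a to every element in order and records a snapshot of the whole
// vector after each iteration, mirroring the step-by-step slides.
std::vector<std::vector<int>> trace_sequential_add(std::vector<int> test_vector, int a)
{
    std::vector<std::vector<int>> snapshots;
    for (std::size_t i = 0; i < test_vector.size(); ++i)
    {
        test_vector[i] = test_vector[i] + a;
        snapshots.push_back(test_vector);   // state after iteration i
    }
    return snapshots;
}
```

For test_vector = {1, 2, 3} and a = 3 the snapshots are {4, 2, 3}, {4, 5, 3} and {4, 5, 6}: exactly one element changes per step.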
The Concept of Parallelization
21
 Setting the sequential approach aside, is it possible to speed this up?
 Why not make the program run in parallel (let the operations run simultaneously)?
[Figure: test_vector = 1 2 3 and a = 3 3 3 are added element by element at the same time, giving the new test_vector = 4 5 6]
 How can this be done in our program? Solution: Parallel Programming!
The Concept of Parallelization
23
 Enabling parallelization
 Tools in C++:
- OpenMP
- TBB (Threading Building Blocks)
- std::thread
- Execution Policy in STL
 Tools in Matlab (parfor)
 Tools in C# (Parallel.For)
Parallelization Implementation
24
 Enabling parallelization with OpenMP
#include <iostream>
#include <vector>
#include <omp.h>
int main()
{
    auto test_vector = std::vector<int>{1, 2, 3};
    int a = 3;
    // Ask OpenMP to distribute the loop iterations across threads
    #pragma omp parallel for
    for(int i = 0; i < test_vector.size(); i++)
    {
        test_vector[i] = test_vector[i] + a;
    }
    for(int i = 0; i < test_vector.size(); ++i)
    {
        std::cout << test_vector[i] << ", ";
    }
    return 0;
}
Conceptually, the three iterations of the first loop are carried out at the same time:
auto test_vector = std::vector<int>{1, 2, 3};
int a = 3;
// These three statements run simultaneously
test_vector[0] = test_vector[0] + a;    test_vector[1] = test_vector[1] + a;    test_vector[2] = test_vector[2] + a;
for(int i = 0; i < test_vector.size(); ++i)
{
    std::cout << test_vector[i] << ", ";
}
return 0;
Parallelization Implementation
25
 Enabling parallelization with OpenMP / TBB
OpenMP version:
// https://godbolt.org/z/szMc4jbqn
// https://godbolt.org/z/haM1qd6eY
#include <iostream>
#include <vector>
#include <omp.h>
int main()
{
    auto test_vector = std::vector<int>{1, 2, 3};
    int a = 3;
    // Ask OpenMP to distribute the loop iterations across threads
    #pragma omp parallel for
    for(int i = 0; i < test_vector.size(); ++i)
    {
        test_vector[i] = test_vector[i] + a;
    }
    for(int i = 0; i < test_vector.size(); ++i)
    {
        std::cout << test_vector[i] << ", ";
    }
    return 0;
}
TBB version:
// https://godbolt.org/z/dcssoWj8K
#include <iostream>
#include <vector>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
int main()
{
    auto test_vector = std::vector<int>{1, 2, 3};
    int a = 3;
    // TBB splits the index range [0, size) into sub-ranges and
    // calls the lambda once for each sub-range
    tbb::parallel_for( tbb::blocked_range<int>(0,test_vector.size()),
    [&](const tbb::blocked_range<int>& r)
    {
        for (int i=r.begin(); i<r.end(); ++i)
        {
            test_vector[i] = test_vector[i] + a;
        }
    });
    for(int i = 0; i < test_vector.size(); ++i)
    {
        std::cout << test_vector[i] << ", ";
    }
    return 0;
}
Parallelization Implementation
26
 Enabling parallelization with OpenMP / TBB
In the TBB version, the second argument passed to tbb::parallel_for is a lambda function: the part that starts with [&] and contains the inner for loop.
Parallelization Methods Comparison
27
 Comparing OpenMP / TBB and std::thread
// The following code is an example of std::thread
// https://stackoverflow.com/a/11229853/6667035
// https://godbolt.org/z/YeY9d4EeP
#include <string>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>
void task1(std::vector<int> input) // The function we want to execute on the new thread.
{
    for(int i = 0; i < input.size(); i++)
    {
        std::cout << "output from task1 function: " << input[i];
    }
}
void function1()
{
    auto test_vector1 = std::vector<int>(100);
    std::iota(test_vector1.begin(), test_vector1.end(), 1); // Fill with 1, 2, ..., 100
    int sum = 0;
    for(int i = 0; i < test_vector1.size(); i++)
    {
        sum += test_vector1[i];
    }
    std::cout << sum << "\n";
}
int main()
{
    auto test_vector = std::vector<int>(100);
    std::iota(test_vector.begin(), test_vector.end(), 1);
    std::thread t1(task1, test_vector); // Start task1 on a new thread (test_vector is copied)
    function1();                        // Meanwhile, run function1 on the main thread
    t1.join();                          // Wait for task1 to finish
    return 0;
}
std::thread Concept
28
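 Comparing OpenMP / TBB and std::thread
[Figure: the main function starts the task1 function on a new thread and then calls the function1 function; task1 and function1 form the parallel part, which finishes before main ends.]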
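For comparison with the OpenMP / TBB loops, the same element-wise addition can be written by hand with std::thread. This is a minimal sketch under stated assumptions: parallel_add and its thread_count parameter are hypothetical names, and the simple chunking scheme is only one possible way to divide the work. OpenMP and TBB make these decisions for you.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Adds a to every element of data. The index range is split into contiguous
// chunks and each chunk is handled by its own std::thread.
void parallel_add(std::vector<int>& data, int a, std::size_t thread_count)
{
    thread_count = std::max<std::size_t>(1, thread_count);
    // Ceiling division: every chunk has at most this many elements
    const std::size_t chunk = (data.size() + thread_count - 1) / thread_count;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk)
    {
        const std::size_t end = std::min(begin + chunk, data.size());
        // Each thread touches a disjoint index range, so no locking is needed.
        workers.emplace_back([&data, a, begin, end]
        {
            for (std::size_t i = begin; i < end; ++i)
            {
                data[i] += a;
            }
        });
    }
    for (auto& worker : workers)
    {
        worker.join();   // wait for every chunk before returning
    }
}
```

With std::thread, creating the threads, dividing the work and joining are all written explicitly, whereas the OpenMP version needs a single pragma.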
Execution Policy in STL
29
 Since C++17, STL algorithms accept an execution policy, for example:
 std::execution::par (the algorithm may run in parallel)
 std::execution::seq (the algorithm runs sequentially)
// https://en.cppreference.com/w/cpp/algorithm/transform
// https://godbolt.org/z/bY14q1z3K
#include <algorithm>
#include <cctype>
#include <execution>
#include <iomanip>
#include <iostream>
#include <string>
#include <thread>
int main()
{
    std::string g {"hello"};
    std::for_each(std::execution::par, g.begin(), g.end(), [](char& c) // modify in-place
    {
        c = std::toupper(static_cast<unsigned char>(c));
    });
    std::cout << "g = " << std::quoted(g) << '\n';
    return 0;
}
Parallelization in Matlab
30
 Documentation for the parfor function: https://www.mathworks.com/help/parallel-computing/parfor.html
Parallelization in Matlab
31
 parfor usage example
% https://www.mathworks.com/help/parallel-computing/parfor.html
Program without parfor:
tic
n = 200;
A = 500;
a = zeros(1,n);
for i = 1:n
    a(i) = max(abs(eig(rand(A))));
end
toc
Elapsed time is 31.935373 seconds.
Program with parfor:
tic
n = 200;
A = 500;
a = zeros(1,n);
parfor i = 1:n
    a(i) = max(abs(eig(rand(A))));
end
toc
Elapsed time is 10.760068 seconds.
In this run, parfor reduced the elapsed time from about 31.9 seconds to about 10.8 seconds, which is roughly three times faster.
Parallelization in C#
32
 Documentation for the Parallel.For function: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/data-parallelism-task-parallel-library
Parallelization in C#
33
 Parallel.For usage example
Program without Parallel.For:
using System;
using System.Threading.Tasks;
public class ParallelTest
{
    public static void Main(string[] args)
    {
        for(int i = 0; i < 10; i++)
        {
            Console.WriteLine (i + "\n");
        }
    }
}
Program with Parallel.For:
using System;
using System.Threading.Tasks;
public class ParallelTest
{
    public static void Main(string[] args)
    {
        // The iterations may run on different threads,
        // so the order of the output lines is not guaranteed
        Parallel.For(0, 10, i =>
        {
            Console.WriteLine (i + "\n");
        }); // Parallel.For
    }
}
In the second program, the third argument of Parallel.For (the part starting with i =>) is a lambda function that serves as the loop body.
Concept of Parallelizability
35
 What is the limitation of parallelization?
Answer: The operations to be parallelized must be independent of each other!
What does “independent” mean?
Let’s look at a dependent case first:
A → B → C
Operations A, B and C cannot be run in parallel, because operation B needs the output of A and
operation C needs the output of B!
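The difference between independent and dependent operations can be sketched with two loops. Both functions are hypothetical examples written for this illustration: the first repeats the addition used throughout the slides, and the second is a running total in which every step needs the previous result.

```cpp
#include <cstddef>
#include <vector>

// Independent: iteration i reads and writes only element i, so the iterations
// could run in any order, or all at the same time.
std::vector<int> add_constant(std::vector<int> values, int a)
{
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        values[i] = values[i] + a;
    }
    return values;
}

// Dependent: iteration i needs the result of iteration i - 1 (like B needing
// the output of A), so this loop, as written, cannot simply be split across threads.
std::vector<int> running_total(std::vector<int> values)
{
    for (std::size_t i = 1; i < values.size(); ++i)
    {
        values[i] = values[i] + values[i - 1];
    }
    return values;
}
```

Adding a parallel-for directive to the first loop is safe, while doing the same to the second loop would let an iteration read a neighbour that has not been updated yet.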
Conclusion / Further Discussions
39
 Parallelization can improve performance when it is used properly
 Parallelization can achieve higher utilization of computers / computing devices
 Are there any disadvantages to using parallelization?
 Memory usage issue
Ad

More Related Content

Similar to ParallelProgrammingBasics_v2.pdf (20)

How to write clean tests
How to write clean testsHow to write clean tests
How to write clean tests
Danylenko Max
 
Pro Java Fx – Developing Enterprise Applications
Pro Java Fx – Developing Enterprise ApplicationsPro Java Fx – Developing Enterprise Applications
Pro Java Fx – Developing Enterprise Applications
Stephen Chin
 
C++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical ReviewerC++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical Reviewer
Andrey Karpov
 
CP 04.pptx
CP 04.pptxCP 04.pptx
CP 04.pptx
RehmanRasheed3
 
Working effectively with legacy code
Working effectively with legacy codeWorking effectively with legacy code
Working effectively with legacy code
ShriKant Vashishtha
 
Some stuff about C++ and development
Some stuff about C++ and developmentSome stuff about C++ and development
Some stuff about C++ and development
Jon Jagger
 
OOP program questions with answers
OOP program questions with answersOOP program questions with answers
OOP program questions with answers
Quratulain Naqvi
 
Ch7
Ch7Ch7
Ch7
sanya6900
 
Ch7
Ch7Ch7
Ch7
Ramesh Ankathi
 
Oops lab manual2
Oops lab manual2Oops lab manual2
Oops lab manual2
Mouna Guru
 
Object oriented programming system with C++
Object oriented programming system with C++Object oriented programming system with C++
Object oriented programming system with C++
msharshitha03s
 
Lecture2.ppt
Lecture2.pptLecture2.ppt
Lecture2.ppt
TarekHemdan3
 
How to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJITHow to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJIT
Egor Bogatov
 
Array programs
Array programsArray programs
Array programs
ALI RAZA
 
Object Oriented Design and Programming Unit-04
Object Oriented Design and Programming Unit-04Object Oriented Design and Programming Unit-04
Object Oriented Design and Programming Unit-04
Sivakumar M
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
application developer
 
C and Data structure lab manual ECE (2).pdf
C and Data structure lab manual ECE (2).pdfC and Data structure lab manual ECE (2).pdf
C and Data structure lab manual ECE (2).pdf
janakim15
 
Code quailty metrics demystified
Code quailty metrics demystifiedCode quailty metrics demystified
Code quailty metrics demystified
Jeroen Resoort
 
L3. Operators in JS, CSE 202, BN11.pdf JavaScript
L3. Operators in JS, CSE 202, BN11.pdf JavaScriptL3. Operators in JS, CSE 202, BN11.pdf JavaScript
L3. Operators in JS, CSE 202, BN11.pdf JavaScript
SauravBarua11
 
What's New in C++ 11/14?
What's New in C++ 11/14?What's New in C++ 11/14?
What's New in C++ 11/14?
Dina Goldshtein
 
How to write clean tests
How to write clean testsHow to write clean tests
How to write clean tests
Danylenko Max
 
Pro Java Fx – Developing Enterprise Applications
Pro Java Fx – Developing Enterprise ApplicationsPro Java Fx – Developing Enterprise Applications
Pro Java Fx – Developing Enterprise Applications
Stephen Chin
 
C++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical ReviewerC++ Code as Seen by a Hypercritical Reviewer
C++ Code as Seen by a Hypercritical Reviewer
Andrey Karpov
 
Working effectively with legacy code
Working effectively with legacy codeWorking effectively with legacy code
Working effectively with legacy code
ShriKant Vashishtha
 
Some stuff about C++ and development
Some stuff about C++ and developmentSome stuff about C++ and development
Some stuff about C++ and development
Jon Jagger
 
OOP program questions with answers
OOP program questions with answersOOP program questions with answers
OOP program questions with answers
Quratulain Naqvi
 
Oops lab manual2
Oops lab manual2Oops lab manual2
Oops lab manual2
Mouna Guru
 
Object oriented programming system with C++
Object oriented programming system with C++Object oriented programming system with C++
Object oriented programming system with C++
msharshitha03s
 
How to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJITHow to add an optimization for C# to RyuJIT
How to add an optimization for C# to RyuJIT
Egor Bogatov
 
Array programs
Array programsArray programs
Array programs
ALI RAZA
 
Object Oriented Design and Programming Unit-04
Object Oriented Design and Programming Unit-04Object Oriented Design and Programming Unit-04
Object Oriented Design and Programming Unit-04
Sivakumar M
 
C and Data structure lab manual ECE (2).pdf
C and Data structure lab manual ECE (2).pdfC and Data structure lab manual ECE (2).pdf
C and Data structure lab manual ECE (2).pdf
janakim15
 
Code quailty metrics demystified
Code quailty metrics demystifiedCode quailty metrics demystified
Code quailty metrics demystified
Jeroen Resoort
 
L3. Operators in JS, CSE 202, BN11.pdf JavaScript
L3. Operators in JS, CSE 202, BN11.pdf JavaScriptL3. Operators in JS, CSE 202, BN11.pdf JavaScript
L3. Operators in JS, CSE 202, BN11.pdf JavaScript
SauravBarua11
 
What's New in C++ 11/14?
What's New in C++ 11/14?What's New in C++ 11/14?
What's New in C++ 11/14?
Dina Goldshtein
 

More from Chen-Hung Hu (12)

淺談電腦檔案系統概念
淺談電腦檔案系統概念淺談電腦檔案系統概念
淺談電腦檔案系統概念
Chen-Hung Hu
 
【智慧核心-CPU】第三節:負數、小數的修正機制
【智慧核心-CPU】第三節:負數、小數的修正機制【智慧核心-CPU】第三節:負數、小數的修正機制
【智慧核心-CPU】第三節:負數、小數的修正機制
Chen-Hung Hu
 
【智慧核心-CPU】第二節:正整數進位制的轉換-編碼
【智慧核心-CPU】第二節:正整數進位制的轉換-編碼【智慧核心-CPU】第二節:正整數進位制的轉換-編碼
【智慧核心-CPU】第二節:正整數進位制的轉換-編碼
Chen-Hung Hu
 
漫談七段顯示器
漫談七段顯示器漫談七段顯示器
漫談七段顯示器
Chen-Hung Hu
 
BJT Transistor分壓偏壓電路分析
BJT Transistor分壓偏壓電路分析BJT Transistor分壓偏壓電路分析
BJT Transistor分壓偏壓電路分析
Chen-Hung Hu
 
淺談類比-數位轉換器
淺談類比-數位轉換器淺談類比-數位轉換器
淺談類比-數位轉換器
Chen-Hung Hu
 
感光元件及其相關迴路之研究 --以光敏電阻為例
感光元件及其相關迴路之研究 --以光敏電阻為例感光元件及其相關迴路之研究 --以光敏電阻為例
感光元件及其相關迴路之研究 --以光敏電阻為例
Chen-Hung Hu
 
穩壓元件及其相關迴路之研究 --以可調式輸出電源供應器為例
穩壓元件及其相關迴路之研究 --以可調式輸出電源供應器為例穩壓元件及其相關迴路之研究 --以可調式輸出電源供應器為例
穩壓元件及其相關迴路之研究 --以可調式輸出電源供應器為例
Chen-Hung Hu
 
Adc0804及其相關迴路之研究
Adc0804及其相關迴路之研究Adc0804及其相關迴路之研究
Adc0804及其相關迴路之研究
Chen-Hung Hu
 
可調式電源供應器之研究
可調式電源供應器之研究可調式電源供應器之研究
可調式電源供應器之研究
Chen-Hung Hu
 
HC 05藍芽模組連線
HC 05藍芽模組連線HC 05藍芽模組連線
HC 05藍芽模組連線
Chen-Hung Hu
 
自動功因改善裝置之研究
自動功因改善裝置之研究自動功因改善裝置之研究
自動功因改善裝置之研究
Chen-Hung Hu
 
淺談電腦檔案系統概念
淺談電腦檔案系統概念淺談電腦檔案系統概念
淺談電腦檔案系統概念
Chen-Hung Hu
 
【智慧核心-CPU】第三節:負數、小數的修正機制
【智慧核心-CPU】第三節:負數、小數的修正機制【智慧核心-CPU】第三節:負數、小數的修正機制
【智慧核心-CPU】第三節:負數、小數的修正機制
Chen-Hung Hu
 
【智慧核心-CPU】第二節:正整數進位制的轉換-編碼
【智慧核心-CPU】第二節:正整數進位制的轉換-編碼【智慧核心-CPU】第二節:正整數進位制的轉換-編碼
【智慧核心-CPU】第二節:正整數進位制的轉換-編碼
Chen-Hung Hu
 
漫談七段顯示器
漫談七段顯示器漫談七段顯示器
漫談七段顯示器
Chen-Hung Hu
 
BJT Transistor分壓偏壓電路分析
BJT Transistor分壓偏壓電路分析BJT Transistor分壓偏壓電路分析
BJT Transistor分壓偏壓電路分析
Chen-Hung Hu
 
淺談類比-數位轉換器
淺談類比-數位轉換器淺談類比-數位轉換器
淺談類比-數位轉換器
Chen-Hung Hu
 
感光元件及其相關迴路之研究 --以光敏電阻為例
感光元件及其相關迴路之研究 --以光敏電阻為例感光元件及其相關迴路之研究 --以光敏電阻為例
感光元件及其相關迴路之研究 --以光敏電阻為例
Chen-Hung Hu
 
穩壓元件及其相關迴路之研究 --以可調式輸出電源供應器為例
穩壓元件及其相關迴路之研究 --以可調式輸出電源供應器為例穩壓元件及其相關迴路之研究 --以可調式輸出電源供應器為例
穩壓元件及其相關迴路之研究 --以可調式輸出電源供應器為例
Chen-Hung Hu
 
Adc0804及其相關迴路之研究
Adc0804及其相關迴路之研究Adc0804及其相關迴路之研究
Adc0804及其相關迴路之研究
Chen-Hung Hu
 
可調式電源供應器之研究
可調式電源供應器之研究可調式電源供應器之研究
可調式電源供應器之研究
Chen-Hung Hu
 
HC 05藍芽模組連線
HC 05藍芽模組連線HC 05藍芽模組連線
HC 05藍芽模組連線
Chen-Hung Hu
 
自動功因改善裝置之研究
自動功因改善裝置之研究自動功因改善裝置之研究
自動功因改善裝置之研究
Chen-Hung Hu
 
Ad

Recently uploaded (20)

Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
Compiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptxCompiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptx
RushaliDeshmukh2
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Infopitaara
 
The Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLabThe Gaussian Process Modeling Module in UQLab
The Gaussian Process Modeling Module in UQLab
Journal of Soft Computing in Civil Engineering
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Process Parameter Optimization for Minimizing Springback in Cold Drawing Proc...
Journal of Soft Computing in Civil Engineering
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
lecture5.pptxJHKGJFHDGTFGYIUOIUIPIOIPUOHIYGUYFGIH
lecture5.pptxJHKGJFHDGTFGYIUOIUIPIOIPUOHIYGUYFGIHlecture5.pptxJHKGJFHDGTFGYIUOIUIPIOIPUOHIYGUYFGIH
lecture5.pptxJHKGJFHDGTFGYIUOIUIPIOIPUOHIYGUYFGIH
Abodahab
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITYADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ijscai
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Journal of Soft Computing in Civil Engineering
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G..."Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
Infopitaara
 
Raish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdfRaish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdf
RaishKhanji
 
Smart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptxSmart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptx
rushikeshnavghare94
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
Compiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptxCompiler Design_Lexical Analysis phase.pptx
Compiler Design_Lexical Analysis phase.pptx
RushaliDeshmukh2
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Infopitaara
 
Level 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical SafetyLevel 1-Safety.pptx Presentation of Electrical Safety
Level 1-Safety.pptx Presentation of Electrical Safety
JoseAlbertoCariasDel
 
Artificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptxArtificial Intelligence (AI) basics.pptx
Artificial Intelligence (AI) basics.pptx
aditichinar
 
Data Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptxData Structures_Searching and Sorting.pptx
Data Structures_Searching and Sorting.pptx
RushaliDeshmukh2
 
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
211421893-M-Tech-CIVIL-Structural-Engineering-pdf.pdf
inmishra17121973
 
lecture5.pptxJHKGJFHDGTFGYIUOIUIPIOIPUOHIYGUYFGIH
lecture5.pptxJHKGJFHDGTFGYIUOIUIPIOIPUOHIYGUYFGIHlecture5.pptxJHKGJFHDGTFGYIUOIUIPIOIPUOHIYGUYFGIH
lecture5.pptxJHKGJFHDGTFGYIUOIUIPIOIPUOHIYGUYFGIH
Abodahab
 
introduction to machine learining for beginers
introduction to machine learining for beginersintroduction to machine learining for beginers
introduction to machine learining for beginers
JoydebSheet
 
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITYADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ADVXAI IN MALWARE ANALYSIS FRAMEWORK: BALANCING EXPLAINABILITY WITH SECURITY
ijscai
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
Degree_of_Automation.pdf for Instrumentation and industrial specialist
Degree_of_Automation.pdf for  Instrumentation  and industrial specialistDegree_of_Automation.pdf for  Instrumentation  and industrial specialist
Degree_of_Automation.pdf for Instrumentation and industrial specialist
shreyabhosale19
 
Machine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptxMachine learning project on employee attrition detection using (2).pptx
Machine learning project on employee attrition detection using (2).pptx
rajeswari89780
 
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G..."Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
"Feed Water Heaters in Thermal Power Plants: Types, Working, and Efficiency G...
Infopitaara
 
Raish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdfRaish Khanji GTU 8th sem Internship Report.pdf
Raish Khanji GTU 8th sem Internship Report.pdf
RaishKhanji
 
Smart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptxSmart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptx
rushikeshnavghare94
 
Mathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdfMathematical foundation machine learning.pdf
Mathematical foundation machine learning.pdf
TalhaShahid49
 
Ad

ParallelProgrammingBasics_v2.pdf

  • 2. Target Audience  People interests parallel programming topic  People wants to know how to improve the performance of their code  People wants to know how to acquire the (possible) peak performance from their computer (There are a bunch of techniques / methods available for reaching peak performance and this kind of things is out of the range of our discussion)  Someone wants to know the way that I am using my computers / servers (X 2
  • 3. Outline  Why parallel programming?  What is parallel programming?  How to perform parallel programming (in C++ / Matlab / C#)  Conclusion / Further Discussions 3
  • 4. Why Parallel Programming? 4  Please check the following C++ code, what’s the output? int main() { std::vector<int> test_vector = {1, 2, 3}; int a = 3; for(int i = 0; i < std::ranges::size(test_vector); ++i) { test_vector[i] = test_vector[i] + a; } for(int i = 0; i < std::ranges::size(test_vector); ++i) { std::cout << test_vector[i] << ", "; } return 0; }
  • 5. Why Parallel Programming? 5  Answer of the question “Please check the following C++ code, what’s the output?” // https://godbolt.org/z/fb7TdT495 // https://godbolt.org/z/3Kj1azb4h int main() { std::vector<int> test_vector = {1, 2, 3}; int a = 3; for(int i = 0; i < std::ranges::size(test_vector); ++i) { test_vector[i] = test_vector[i] + a; } for(int i = 0; i < std::ranges::size(test_vector); ++i) { std::cout << test_vector[i] << ", "; } return 0; } 4, 5, 6,
  • 6. Why Parallel Programming? Code Structure (same code as slide 5). Part 1: Variable Initialization (std::vector<int> test_vector = {1, 2, 3}; int a = 3;)
  • 7. Why Parallel Programming? Code Structure (same code as slide 5). Part 2: Data Processing / Calculation (the first for loop: test_vector[i] = test_vector[i] + a;)
  • 8. Why Parallel Programming? Code Structure (same code as slide 5). Part 3: Output (the second for loop: std::cout << test_vector[i] << ", ";)
  • 9. Why Parallel Programming? In this simple example, the calculation part is a single addition per element (test_vector[i] + a), and the output is 4, 5, 6,
  • 10. Why Parallel Programming? What happens when the operation is more complicated? int main() { std::vector<Frame> image_frames = {img1, img2, img3}; auto results = std::vector<Features>(3); for(int i = 0; i < std::ranges::size(image_frames); ++i) { results[i] = feature_extraction(image_frames[i]); } … return 0; } This example calls a function named “feature_extraction” once per frame.
  • 11. Why Parallel Programming? [Figure: without parallel programming, Dish A, Dish B, Dish C, … are prepared one after another; with parallel programming, Dish A, Dish B and Dish C are prepared at the same time. Icon is from https://www.hiclipart.com/free-transparent-background-png-clipart-iuxpq/download]
  • 12. Why Parallel Programming? [Figure: without parallel programming, Task A, Task B, Task C, … run one after another; with parallel programming, Task A, Task B and Task C run at the same time. Icon is from https://www.flaticon.com/free-icon/cpu_1250593]
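The "Task A / Task B / Task C" picture can be tried out on the feature_extraction loop from slide 10. The following is only a minimal sketch using std::async. The slides do not define Frame, Features or feature_extraction, so the definitions below are hypothetical stand-ins (a frame is a list of pixel values and its "feature" is their sum), and extract_all is a helper name introduced here for illustration.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-ins: the slides do not define these types.
using Frame = std::vector<int>;   // a frame is modelled as a list of pixel values
using Features = long long;       // the "feature" is modelled as one number

// Hypothetical stand-in for the real, more expensive function.
Features feature_extraction(const Frame& frame)
{
    return std::accumulate(frame.begin(), frame.end(), 0LL);
}

// Launches one asynchronous task per frame, then collects the results in order.
std::vector<Features> extract_all(const std::vector<Frame>& frames)
{
    std::vector<std::future<Features>> pending;
    pending.reserve(frames.size());
    for (const Frame& frame : frames)
    {
        // std::launch::async asks for the call to run on its own thread.
        pending.push_back(std::async(std::launch::async,
                                     [&frame] { return feature_extraction(frame); }));
    }

    std::vector<Features> results;
    results.reserve(pending.size());
    for (auto& task : pending)
    {
        results.push_back(task.get());   // get() waits until that task is done
    }
    assert(results.size() == frames.size());
    return results;
}
```

Calling extract_all on three frames starts three tasks that may run at the same time. Each task reads only its own frame, so the calls do not interfere with one another.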
  • 13. The Steps of Execution: Let’s review the previous simple case. How is the program executed? int main() { std::vector<int> test_vector = {1, 2, 3}; int a = 3; for(int i = 0; i < std::ranges::size(test_vector); ++i) { test_vector[i] = test_vector[i] + a; } for(int i = 0; i < std::ranges::size(test_vector); ++i) { std::cout << test_vector[i] << ", "; } return 0; }
  • 14. The Steps of Execution (same code as slide 13): test_vector = [1, 2, 3]
  • 15. The Steps of Execution (same code as slide 13): test_vector = [1, 2, 3], a = 3
  • 16. The Steps of Execution (same code as slide 13): test_vector = [1, 2, 3], a = 3. Then, does the execution run sequentially?
  • 17. The Steps of Execution (same code as slide 13): test_vector = [1, 2, 3], a = 3; first step: test_vector[0] + a = 4
  • 18. The Steps of Execution (same code as slide 13): test_vector = [4, 2, 3], a = 3; second step: test_vector[1] + a = 5
  • 19. The Steps of Execution (same code as slide 13): test_vector = [4, 5, 3], a = 3; third step: test_vector[2] + a = 6
  • 20. The Steps of Execution (same code as slide 13): test_vector = [4, 5, 6], a = 3
  • 21. The Concept of Parallelization: Apart from running sequentially, is it possible to speed this up? Why not make the program run in parallel, so that the operations execute simultaneously? [Figure: each element of test_vector (1, 2, 3) is added to its own copy of a (3, 3, 3) at the same time, giving the new test_vector (4, 5, 6).]
  • 22. The Concept of Parallelization: How can this be done in our program? Solution: parallel programming! [Figure: each element of test_vector (1, 2, 3) is added to its own copy of a (3, 3, 3) at the same time, giving the new test_vector (4, 5, 6).]
  • 23. The Concept of Parallelization: Enabling parallelization. Tools in C++: OpenMP; TBB (Threading Building Blocks); std::thread; execution policies in the STL. Tools in Matlab. Tools in C#.
  • 24. Parallelization Implementation: Enabling parallelization with OpenMP #include <omp.h> int main() { auto test_vector = std::vector<int>{1, 2, 3}; int a = 3; #pragma omp parallel for for(int i = 0; i < test_vector.size(); i++) { test_vector[i] = test_vector[i] + a; } for(int i = 0; i < test_vector.size(); ++i) { std::cout << test_vector[i] << ", "; } return 0; } Conceptually, the parallel loop corresponds to three independent statements: auto test_vector = std::vector<int>{1, 2, 3}; int a = 3; test_vector[0] = test_vector[0] + a; test_vector[1] = test_vector[1] + a; test_vector[2] = test_vector[2] + a; for(int i = 0; i < test_vector.size(); ++i) { std::cout << test_vector[i] << ", "; } return 0;
  • 25. Parallelization Implementation: Enabling parallelization with OpenMP / TBB. OpenMP version: // https://godbolt.org/z/szMc4jbqn // https://godbolt.org/z/haM1qd6eY #include <omp.h> int main() { auto test_vector = std::vector<int>{1, 2, 3}; int a = 3; #pragma omp parallel for for(int i = 0; i < test_vector.size(); ++i) { test_vector[i] = test_vector[i] + a; } for(int i = 0; i < test_vector.size(); ++i) { std::cout << test_vector[i] << ", "; } return 0; } TBB version: // https://godbolt.org/z/dcssoWj8K #include <tbb/parallel_for.h> int main() { auto test_vector = std::vector<int>{1, 2, 3}; int a = 3; tbb::parallel_for( tbb::blocked_range<int>(0,test_vector.size()), [&](tbb::blocked_range<int> r) { for (int i=r.begin(); i<r.end(); ++i) { test_vector[i] = test_vector[i] + a; } }); for(int i = 0; i < test_vector.size(); ++i) { std::cout << test_vector[i] << ", "; } return 0; }
  • 26. Parallelization Implementation: Enabling parallelization with OpenMP / TBB (same code as slide 25; https://godbolt.org/z/haM1qd6eY for OpenMP, https://godbolt.org/z/dcssoWj8K for TBB). Note that the second argument of tbb::parallel_for, [&](tbb::blocked_range<int> r) { … }, is a lambda function.
  • 27. Parallelization Methods Comparison: Comparing OpenMP / TBB and std::thread. The following code is an example of std::thread (https://stackoverflow.com/a/11229853/6667035, https://godbolt.org/z/YeY9d4EeP):
      #include <string>
      #include <iostream>
      #include <numeric>
      #include <thread>
      #include <vector>

      // The function we want to execute on the new thread.
      void task1(std::vector<int> input)
      {
          for(int i = 0; i < input.size(); i++)
          {
              std::cout << "output from task1 function: " << input[i];
          }
      }

      // Runs on the main thread while task1 runs on the new thread.
      void function1()
      {
          auto test_vector1 = std::vector<int>(100);
          std::iota(test_vector1.begin(), test_vector1.end(), 1);  // fill with 1, 2, ..., 100
          int sum = 0;
          for(int i = 0; i < test_vector1.size(); i++)
          {
              sum += test_vector1[i];
          }
          std::cout << sum << "\n";
      }

      int main()
      {
          auto test_vector = std::vector<int>(100);
          std::iota(test_vector.begin(), test_vector.end(), 1);
          std::thread t1(task1, test_vector);  // start task1 on a new thread
          function1();                         // meanwhile, run function1 here
          t1.join();                           // wait for task1 to finish
          return 0;
      }
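  • 28. std::thread Concept: Comparing OpenMP / TBB and std::thread. [Figure: the main function starts the task1 function on a new thread and then calls the function1 function; task1 and function1 form the parallel part, which ends when main joins the thread.]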
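To compare with the OpenMP and TBB versions, the "add a to every element" loop can also be split by hand across several std::thread objects. This is a minimal sketch and not code from the slides: the function name add_scalar_threads and the thread_count parameter are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Adds `a` to every element of `data`, giving each thread one contiguous chunk.
// The name and the thread_count parameter are illustrative assumptions.
void add_scalar_threads(std::vector<int>& data, int a, std::size_t thread_count)
{
    assert(thread_count > 0);
    if (data.empty())
    {
        return;
    }

    // Never start more threads than there are elements.
    const std::size_t used = std::min(thread_count, data.size());
    // Chunk size, rounded up so that every element is covered.
    const std::size_t chunk = (data.size() + used - 1) / used;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk)
    {
        const std::size_t end = std::min(begin + chunk, data.size());
        // Each thread only touches indices in [begin, end), so no two threads
        // ever write to the same element.
        workers.emplace_back([&data, a, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
            {
                data[i] = data[i] + a;
            }
        });
    }

    for (std::thread& worker : workers)
    {
        worker.join();   // the parallel part ends here
    }
}
```

With std::thread the chunking and joining are written explicitly, whereas `#pragma omp parallel for` and tbb::parallel_for take care of that work for you.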
  • 29. Execution Policy in STL: Execution policies are available since C++17, for example std::execution::par and std::execution::seq (https://en.cppreference.com/w/cpp/algorithm/transform, https://godbolt.org/z/bY14q1z3K):
      #include <algorithm>
      #include <cctype>
      #include <execution>
      #include <iomanip>
      #include <iostream>
      #include <string>
      #include <thread>

      int main()
      {
          std::string g {"hello"};
          // std::execution::par allows the elements to be processed in parallel.
          std::for_each(std::execution::par, g.begin(), g.end(), [](char& c) // modify in-place
          {
              c = std::toupper(static_cast<unsigned char>(c));
          });
          std::cout << "g = " << std::quoted(g) << '\n';
          return 0;
      }
  • 30. Parallelization in Matlab 30  Document of parfor function usage
  • 31. Parallelization in Matlab: parfor usage example (https://www.mathworks.com/help/parallel-computing/parfor.html)
      Program without parfor:
          tic
          n = 200;
          A = 500;
          a = zeros(1,n);
          for i = 1:n
              a(i) = max(abs(eig(rand(A))));
          end
          toc
          Elapsed time is 31.935373 seconds.
      Program with parfor:
          tic
          n = 200;
          A = 500;
          a = zeros(1,n);
          parfor i = 1:n
              a(i) = max(abs(eig(rand(A))));
          end
          toc
          Elapsed time is 10.760068 seconds.
  • 32. Parallelization in C#: Documentation of Parallel.For usage: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/data-parallelism-task-parallel-library
  • 33. Parallelization in C#: Parallel.For usage example
      Program without Parallel.For:
          using System;
          using System.Threading.Tasks;
          public class ParallelTest
          {
              public static void Main(string[] args)
              {
                  for(int i = 0; i < 10; i++)
                  {
                      Console.WriteLine(i + "\n");
                  }
              }
          }
      Program with Parallel.For (the second argument pair ends with a lambda function, i => { … }):
          using System;
          using System.Threading.Tasks;
          public class ParallelTest
          {
              public static void Main(string[] args)
              {
                  Parallel.For(0, 10, i =>
                  {
                      Console.WriteLine(i + "\n");
                  }); // Parallel.For
              }
          }
  • 35. Concept of Parallelizability: What limits parallelization?
  • 36. Concept of Parallelizability: What limits parallelization? Answer: the operations to be parallelized must be independent of each other. What does “independent” mean?
  • 37. Concept of Parallelizability: What limits parallelization? Answer: the operations to be parallelized must be independent of each other. What does “independent” mean? Let’s look at a dependent case first: A → B → C
  • 38. Concept of Parallelizability: What limits parallelization? Answer: the operations to be parallelized must be independent of each other. What does “independent” mean? Let’s look at a dependent case first: A → B → C. Operations A, B and C cannot run in parallel, because B needs the output of A and C needs the output of B.
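The difference between independent and dependent operations can be made concrete by running the same loop body in a different order. The sketch below is an illustration added here, not code from the slides: add_scalar, running_sum and the order parameter are hypothetical names. An independent loop gives the same result in any order, which is what makes it safe to run its iterations at the same time; a dependent loop does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent: output[i] depends only on input[i], like test_vector[i] + a.
// `order` lists the indices in the order in which they are processed.
std::vector<int> add_scalar(const std::vector<int>& input, int a,
                            const std::vector<std::size_t>& order)
{
    std::vector<int> output(input.size(), 0);
    for (std::size_t i : order)
    {
        assert(i < input.size());
        output[i] = input[i] + a;
    }
    return output;
}

// Dependent: output[i] needs output[i - 1], like B needing the output of A.
std::vector<int> running_sum(const std::vector<int>& input,
                             const std::vector<std::size_t>& order)
{
    std::vector<int> output(input.size(), 0);
    for (std::size_t i : order)
    {
        assert(i < input.size());
        // If step i - 1 has not run yet, this reads a value that is not ready.
        const int previous = (i == 0) ? 0 : output[i - 1];
        output[i] = previous + input[i];
    }
    return output;
}
```

For the input {1, 2, 3}, add_scalar with a = 3 yields {4, 5, 6} whether the indices are visited as 0, 1, 2 or as 2, 1, 0. running_sum yields {1, 3, 6} in forward order but {1, 2, 3} in reverse order, because each step then runs before the step it depends on.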
  • 39. Conclusion / Further Discussions: Parallelization can improve performance when it is used properly. Parallelization can make better use of computers / computing devices. Are there any disadvantages to using parallelization?
  • 40. Conclusion / Further Discussions: Parallelization can improve performance when it is used properly. Parallelization can make better use of computers / computing devices. Are there any disadvantages to using parallelization? One of them is memory usage.