SlideShare a Scribd company logo
Multicore   Birgit Plötzeneder, 11/24/10
Intro (Why?) Architecture Languages OMP MPI Tools
Darling,  I shrunk the  computer.  * copyright by Prof. Erik Hagersten /   Uppsala, who does   awesome work Signal propagation delay » transistor delay Not enough ILP  for more transistors Power consumption
O RLY? You want FASTER code. NOW. - prefetching  - high comp load  - image/video  - fun
 
Intel Core 2 Quad
AMD  Shanghai (K10)
Intel  Dunnington (Xeon 74xx)
Intel  i7
AMD  Magny Cours
The Secret..
Moving from  1 core  to   4 cores  can give you a factor of
Moving from m emory  to   L 1   can give you a factor of
Disabling  the L2  cache  will  reduce  system performance   more than   disabling  a second CPU core    of a dual-core processor …
* see Iris Christadler, LRZ
OMP and MPI
Program start : only  master   thread  runs Parallel region : team of worker  threads is generated (“fork”)  Threads  synchronize  when leaving parallel region (“join”)  OpenMP-Concept
A First Program
Work-sharing constructs omp for or omp do sections  single  master
Data sharing attribute clauses shared :  visible and accessible by all threads simultaneously. Default  (!i) . a[i]=a[i-1].. private :  each thread will have a local copy, value is not maintained for use outside  firstprivate :  like private except initialized to original value. lastprivate :  like private except original value is updated after construct. reduction  (->reduction ops)
Scheduling clauses   schedule(type, chunk): static dynamic guided
Other clauses critical :  executed by only one thread at a time atomic :  similar to critical section, but may be  better ordered :  executed in the order in which iterations would be executed in a sequential loop b arrier nowait
Using clauses
 
MPI-Concept mpicc <options>  prog.c mpirun  -arch <architecture>  -np<np>  prog
MPI
MPI
MPI program: 6 basic calls MPI messages Communicators MPI MPI_INIT MPI_COMM_RANK MPI_COMM_SIZE MPI_SEND MPI_RECV MPI_FINALIZE data (startbuf, count, datatype) envelope (destination/source, tag, communicatior)
Communication  modes Collective vs P2P One2All, All2All, All2One Blocking / Nonblocking Synchronous / Asynchronous
Communication  modes synchronous mode (&quot;safest&quot;): Is the receiver ready? ready mode (lowest system overhead)- only if there is a receiver waiting (streaming) buffered mode (decouples sender from receiver), buffer size, buffer attachment! standard mode
  Communication Mode Blocking  Routines Non-Blocking Routines   synchronous MPI_SSEND MPI_ISSEND   ready MPI_RSEND MPI_IRSEND   buffered MPI_BSEND MPI_IBSEND   standard MPI_SEND MPI_ISEND   MPI_RECV MPI_IRECV   MPI_SENDRECV   MPI_SENDRECV_REPLACE
Collective   communication Barrier Broadcast Gather Scatter Reduction
gpro f valgrind PAPI
PAPI PAPI  is  a library  that monitors hardware events when a program runs.  Papiex  is  a tool  that makes it easy to  get access to performance counters using PAPI .* * http://icl.cs.utk.edu/papi / papiex –e  <EVENT>  ./my_prog  (to turn of optimizations (use the flag  -O0) for some tests)
Profilers Two Types Statistical   Profilers Event Based Profilers Statistical Profiling : Interrupts at  random intervals  and records which program instruction the CPU is executing. Event Based Profiling : Interrupts triggered by hardware counter events are recorded. Measuring profiles affects performance. Still a lot of data saved.
Tracing Wrappers for function calls (for example MPI_Recv) Records  when  a function was called and  with what parameters Which nodes exchanged messages, message size… Can affect performance
Intel tracing tools Marmot   MPI correctness and portability checker  MpiP   -  http://mpip.sourceforge.net/
Extrae + Paraver module add paraver mpi2prv  -f  TRACE.mpits -o MPImatrix.prv v Scalasca Screenshots and examples of profilers/tracing tools  available – but not on the internet. v
This talk was given to the TumFUG Linux/Unix-User group at the TU München. Contact me via  [email_address] You may use the pictures of the processors (not the screenshots, not the overview pic which I only adapted), but please do notify and credit me accordingly.  Some of the code was copy-pasted from Wikipedia. I've removed copy-right problematic parts.
Ad

More Related Content

What's hot (20)

OpenMp
OpenMpOpenMp
OpenMp
Neel Bhad
 
MPI in TNT for parallel processing
MPI in TNT for parallel processingMPI in TNT for parallel processing
MPI in TNT for parallel processing
Martín Morales
 
Introduction to MPI
Introduction to MPIIntroduction to MPI
Introduction to MPI
Akhila Prabhakaran
 
Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)
Akhila Prabhakaran
 
openmp
openmpopenmp
openmp
Neel Bhad
 
Intro to OpenMP
Intro to OpenMPIntro to OpenMP
Intro to OpenMP
jbp4444
 
Move Message Passing Interface Applications to the Next Level
Move Message Passing Interface Applications to the Next LevelMove Message Passing Interface Applications to the Next Level
Move Message Passing Interface Applications to the Next Level
Intel® Software
 
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
OpenEBS
 
Parallel program design
Parallel program designParallel program design
Parallel program design
ZongYing Lyu
 
Numba Overview
Numba OverviewNumba Overview
Numba Overview
stan_seibert
 
OpenMP
OpenMPOpenMP
OpenMP
Eric Cheng
 
Introduction to OpenMP
Introduction to OpenMPIntroduction to OpenMP
Introduction to OpenMP
Akhila Prabhakaran
 
ORTE - OCERA Real Time ethernet
ORTE - OCERA Real Time ethernetORTE - OCERA Real Time ethernet
ORTE - OCERA Real Time ethernet
Alexandre Chatiron
 
Open mp
Open mpOpen mp
Open mp
Gopi Saiteja
 
Open mp intro_01
Open mp intro_01Open mp intro_01
Open mp intro_01
Oleg Nazarevych
 
Open mp directives
Open mp directivesOpen mp directives
Open mp directives
Prabhakaran V M
 
OpenMP
OpenMPOpenMP
OpenMP
mohammadradpour
 
Return Oriented Programming
Return Oriented ProgrammingReturn Oriented Programming
Return Oriented Programming
UTD Computer Security Group
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...
Marina Kolpakova
 
OpenMP Tutorial for Beginners
OpenMP Tutorial for BeginnersOpenMP Tutorial for Beginners
OpenMP Tutorial for Beginners
Dhanashree Prasad
 
MPI in TNT for parallel processing
MPI in TNT for parallel processingMPI in TNT for parallel processing
MPI in TNT for parallel processing
Martín Morales
 
Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)
Akhila Prabhakaran
 
Intro to OpenMP
Intro to OpenMPIntro to OpenMP
Intro to OpenMP
jbp4444
 
Move Message Passing Interface Applications to the Next Level
Move Message Passing Interface Applications to the Next LevelMove Message Passing Interface Applications to the Next Level
Move Message Passing Interface Applications to the Next Level
Intel® Software
 
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
Dynamic Instrumentation- OpenEBS Golang Meetup July 2017
OpenEBS
 
Parallel program design
Parallel program designParallel program design
Parallel program design
ZongYing Lyu
 
ORTE - OCERA Real Time ethernet
ORTE - OCERA Real Time ethernetORTE - OCERA Real Time ethernet
ORTE - OCERA Real Time ethernet
Alexandre Chatiron
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...
Marina Kolpakova
 
OpenMP Tutorial for Beginners
OpenMP Tutorial for BeginnersOpenMP Tutorial for Beginners
OpenMP Tutorial for Beginners
Dhanashree Prasad
 

Viewers also liked (20)

Chapt 01 Assembly Language
Chapt 01 Assembly LanguageChapt 01 Assembly Language
Chapt 01 Assembly Language
Hamza Akram
 
F-F-Fiddle Assembly Instructions
F-F-Fiddle Assembly InstructionsF-F-Fiddle Assembly Instructions
F-F-Fiddle Assembly Instructions
dsp39
 
Lec 04 intro assembly
Lec 04 intro assemblyLec 04 intro assembly
Lec 04 intro assembly
Abdul Khan
 
IMSDB COMMAND CODES - PROC OPTIONS
IMSDB COMMAND CODES - PROC OPTIONSIMSDB COMMAND CODES - PROC OPTIONS
IMSDB COMMAND CODES - PROC OPTIONS
Srinimf-Slides
 
Mips1
Mips1Mips1
Mips1
Stefano Salvatori
 
Session 1
Session 1Session 1
Session 1
pham vu
 
C command
C commandC command
C command
Asif Ali Raza
 
Intro to assembly language
Intro to assembly languageIntro to assembly language
Intro to assembly language
United International University
 
Assembly Language Lecture 3
Assembly Language Lecture 3Assembly Language Lecture 3
Assembly Language Lecture 3
Motaz Saad
 
Microprocessor chapter 9 - assembly language programming
Microprocessor  chapter 9 - assembly language programmingMicroprocessor  chapter 9 - assembly language programming
Microprocessor chapter 9 - assembly language programming
Wondeson Emeye
 
Assembly Language Lecture 4
Assembly Language Lecture 4Assembly Language Lecture 4
Assembly Language Lecture 4
Motaz Saad
 
Assembly Language Lecture 2
Assembly Language Lecture 2Assembly Language Lecture 2
Assembly Language Lecture 2
Motaz Saad
 
Assembly language programming_fundamentals 8086
Assembly language programming_fundamentals 8086Assembly language programming_fundamentals 8086
Assembly language programming_fundamentals 8086
Shehrevar Davierwala
 
Assembly Language Lecture 5
Assembly Language Lecture 5Assembly Language Lecture 5
Assembly Language Lecture 5
Motaz Saad
 
Part I:Introduction to assembly language
Part I:Introduction to assembly languagePart I:Introduction to assembly language
Part I:Introduction to assembly language
Ahmed M. Abed
 
Assembly Language Basics
Assembly Language BasicsAssembly Language Basics
Assembly Language Basics
Education Front
 
Unix command-line tools
Unix command-line toolsUnix command-line tools
Unix command-line tools
Eric Wilson
 
Assembly Language Lecture 1
Assembly Language Lecture 1Assembly Language Lecture 1
Assembly Language Lecture 1
Motaz Saad
 
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMINGChapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
Frankie Jones
 
Computer languages 11
Computer languages 11Computer languages 11
Computer languages 11
Muhammad Ramzan
 
Chapt 01 Assembly Language
Chapt 01 Assembly LanguageChapt 01 Assembly Language
Chapt 01 Assembly Language
Hamza Akram
 
F-F-Fiddle Assembly Instructions
F-F-Fiddle Assembly InstructionsF-F-Fiddle Assembly Instructions
F-F-Fiddle Assembly Instructions
dsp39
 
Lec 04 intro assembly
Lec 04 intro assemblyLec 04 intro assembly
Lec 04 intro assembly
Abdul Khan
 
IMSDB COMMAND CODES - PROC OPTIONS
IMSDB COMMAND CODES - PROC OPTIONSIMSDB COMMAND CODES - PROC OPTIONS
IMSDB COMMAND CODES - PROC OPTIONS
Srinimf-Slides
 
Session 1
Session 1Session 1
Session 1
pham vu
 
Assembly Language Lecture 3
Assembly Language Lecture 3Assembly Language Lecture 3
Assembly Language Lecture 3
Motaz Saad
 
Microprocessor chapter 9 - assembly language programming
Microprocessor  chapter 9 - assembly language programmingMicroprocessor  chapter 9 - assembly language programming
Microprocessor chapter 9 - assembly language programming
Wondeson Emeye
 
Assembly Language Lecture 4
Assembly Language Lecture 4Assembly Language Lecture 4
Assembly Language Lecture 4
Motaz Saad
 
Assembly Language Lecture 2
Assembly Language Lecture 2Assembly Language Lecture 2
Assembly Language Lecture 2
Motaz Saad
 
Assembly language programming_fundamentals 8086
Assembly language programming_fundamentals 8086Assembly language programming_fundamentals 8086
Assembly language programming_fundamentals 8086
Shehrevar Davierwala
 
Assembly Language Lecture 5
Assembly Language Lecture 5Assembly Language Lecture 5
Assembly Language Lecture 5
Motaz Saad
 
Part I:Introduction to assembly language
Part I:Introduction to assembly languagePart I:Introduction to assembly language
Part I:Introduction to assembly language
Ahmed M. Abed
 
Assembly Language Basics
Assembly Language BasicsAssembly Language Basics
Assembly Language Basics
Education Front
 
Unix command-line tools
Unix command-line toolsUnix command-line tools
Unix command-line tools
Eric Wilson
 
Assembly Language Lecture 1
Assembly Language Lecture 1Assembly Language Lecture 1
Assembly Language Lecture 1
Motaz Saad
 
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMINGChapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING
Frankie Jones
 
Ad

Similar to Multicore (20)

Best Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing ClustersBest Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing Clusters
Intel® Software
 
25-MPI-OpenMP.pptx
25-MPI-OpenMP.pptx25-MPI-OpenMP.pptx
25-MPI-OpenMP.pptx
GopalPatidar13
 
Unmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeUnmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/Invoke
Dmitri Nesteruk
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMP
Anil Bohare
 
Debugging Python with gdb
Debugging Python with gdbDebugging Python with gdb
Debugging Python with gdb
Roman Podoliaka
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Dr. Fabio Baruffa
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)
Amin Astaneh
 
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
Igalia
 
Burst Buffer: From Alpha to Omega
Burst Buffer: From Alpha to OmegaBurst Buffer: From Alpha to Omega
Burst Buffer: From Alpha to Omega
George Markomanolis
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
Tulipp. Eu
 
Containerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor toolContainerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor tool
Ganesan Narayanasamy
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Intel® Software
 
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSBuilding a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Fernando Luiz Cola
 
BUD17-300: Journey of a packet
BUD17-300: Journey of a packetBUD17-300: Journey of a packet
BUD17-300: Journey of a packet
Linaro
 
Performance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4SPerformance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4S
Ganesan Narayanasamy
 
Linux multiplexing
Linux multiplexingLinux multiplexing
Linux multiplexing
Mark Veltzer
 
Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8
AbdullahMunir32
 
Challenges in GPU compilers
Challenges in GPU compilersChallenges in GPU compilers
Challenges in GPU compilers
AnastasiaStulova
 
Virtual platform
Virtual platformVirtual platform
Virtual platform
sean chen
 
Best Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing ClustersBest Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing Clusters
Intel® Software
 
Unmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeUnmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/Invoke
Dmitri Nesteruk
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMP
Anil Bohare
 
Debugging Python with gdb
Debugging Python with gdbDebugging Python with gdb
Debugging Python with gdb
Roman Podoliaka
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Dr. Fabio Baruffa
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)
Amin Astaneh
 
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
Igalia
 
Burst Buffer: From Alpha to Omega
Burst Buffer: From Alpha to OmegaBurst Buffer: From Alpha to Omega
Burst Buffer: From Alpha to Omega
George Markomanolis
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
Tulipp. Eu
 
Containerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor toolContainerizing HPC and AI applications using E4S and Performance Monitor tool
Containerizing HPC and AI applications using E4S and Performance Monitor tool
Ganesan Narayanasamy
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Intel® Software
 
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSBuilding a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Fernando Luiz Cola
 
BUD17-300: Journey of a packet
BUD17-300: Journey of a packetBUD17-300: Journey of a packet
BUD17-300: Journey of a packet
Linaro
 
Performance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4SPerformance Evaluation using TAU Performance System and E4S
Performance Evaluation using TAU Performance System and E4S
Ganesan Narayanasamy
 
Linux multiplexing
Linux multiplexingLinux multiplexing
Linux multiplexing
Mark Veltzer
 
Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8
AbdullahMunir32
 
Challenges in GPU compilers
Challenges in GPU compilersChallenges in GPU compilers
Challenges in GPU compilers
AnastasiaStulova
 
Virtual platform
Virtual platformVirtual platform
Virtual platform
sean chen
 
Ad

More from Birgit Plötzeneder (13)

Datentypen LabVIEW
Datentypen LabVIEWDatentypen LabVIEW
Datentypen LabVIEW
Birgit Plötzeneder
 
Instant Insanity
Instant Insanity Instant Insanity
Instant Insanity
Birgit Plötzeneder
 
Messen mit LabVIEW - Organisatorisches
Messen mit LabVIEW - Organisatorisches Messen mit LabVIEW - Organisatorisches
Messen mit LabVIEW - Organisatorisches
Birgit Plötzeneder
 
Messen mit LabVIEW - Block 1
Messen mit LabVIEW - Block 1 Messen mit LabVIEW - Block 1
Messen mit LabVIEW - Block 1
Birgit Plötzeneder
 
Füllstand
FüllstandFüllstand
Füllstand
Birgit Plötzeneder
 
Füllstand
FüllstandFüllstand
Füllstand
Birgit Plötzeneder
 
Some random graphs for network models - Birgit Plötzeneder
Some random graphs for network models -  Birgit PlötzenederSome random graphs for network models -  Birgit Plötzeneder
Some random graphs for network models - Birgit Plötzeneder
Birgit Plötzeneder
 

Recently uploaded (20)

AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 

Multicore

  • 1. Multicore Birgit Plötzeneder, 11/24/10
  • 2. Intro (Why?) Architecture Languages OMP MPI Tools
  • 3. Darling, I shrunk the computer. * copyright by Prof. Erik Hagersten / Uppsala, who does awesome work Signal propagation delay » transistor delay Not enough ILP for more transistors Power consumption
  • 4. O RLY? You want FASTER code. NOW. - prefetching - high comp load - image/video - fun
  • 5.  
  • 7. AMD Shanghai (K10)
  • 8. Intel Dunnington (Xeon 74xx)
  • 10. AMD Magny Cours
  • 12. Moving from 1 core to 4 cores can give you a factor of
  • 13. Moving from m emory to L 1 can give you a factor of
  • 14. Disabling the L2 cache will reduce system performance more than disabling a second CPU core of a dual-core processor …
  • 15. * see Iris Christadler, LRZ
  • 17. Program start : only master thread runs Parallel region : team of worker threads is generated (“fork”) Threads synchronize when leaving parallel region (“join”) OpenMP-Concept
  • 19. Work-sharing constructs omp for or omp do sections single master
  • 20. Data sharing attribute clauses shared : visible and accessible by all threads simultaneously. Default (!i) . a[i]=a[i-1].. private : each thread will have a local copy, value is not maintained for use outside firstprivate : like private except initialized to original value. lastprivate : like private except original value is updated after construct. reduction (->reduction ops)
  • 21. Scheduling clauses schedule(type, chunk): static dynamic guided
  • 22. Other clauses critical : executed by only one thread at a time atomic : similar to critical section, but may be better ordered : executed in the order in which iterations would be executed in a sequential loop b arrier nowait
  • 24.  
  • 25. MPI-Concept mpicc <options> prog.c mpirun -arch <architecture> -np<np> prog
  • 26. MPI
  • 27. MPI
  • 28. MPI program: 6 basic calls MPI messages Communicators MPI MPI_INIT MPI_COMM_RANK MPI_COMM_SIZE MPI_SEND MPI_RECV MPI_FINALIZE data (startbuf, count, datatype) envelope (destination/source, tag, communicatior)
  • 29. Communication modes Collective vs P2P One2All, All2All, All2One Blocking / Nonblocking Synchronous / Asynchronous
  • 30. Communication modes synchronous mode (&quot;safest&quot;): Is the receiver ready? ready mode (lowest system overhead)- only if there is a receiver waiting (streaming) buffered mode (decouples sender from receiver), buffer size, buffer attachment! standard mode
  • 31.   Communication Mode Blocking Routines Non-Blocking Routines   synchronous MPI_SSEND MPI_ISSEND   ready MPI_RSEND MPI_IRSEND   buffered MPI_BSEND MPI_IBSEND   standard MPI_SEND MPI_ISEND   MPI_RECV MPI_IRECV   MPI_SENDRECV   MPI_SENDRECV_REPLACE
  • 32. Collective communication Barrier Broadcast Gather Scatter Reduction
  • 34. PAPI PAPI is a library that monitors hardware events when a program runs. Papiex is a tool that makes it easy to get access to performance counters using PAPI .* * http://icl.cs.utk.edu/papi / papiex –e <EVENT> ./my_prog (to turn of optimizations (use the flag -O0) for some tests)
  • 35. Profilers Two Types Statistical Profilers Event Based Profilers Statistical Profiling : Interrupts at random intervals and records which program instruction the CPU is executing. Event Based Profiling : Interrupts triggered by hardware counter events are recorded. Measuring profiles affects performance. Still a lot of data saved.
  • 36. Tracing Wrappers for function calls (for example MPI_Recv) Records when a function was called and with what parameters Which nodes exchanged messages, message size… Can affect performance
  • 37. Intel tracing tools Marmot MPI correctness and portability checker MpiP - http://mpip.sourceforge.net/
  • 38. Extrae + Paraver module add paraver mpi2prv -f TRACE.mpits -o MPImatrix.prv v Scalasca Screenshots and examples of profilers/tracing tools available – but not on the internet. v
  • 39. This talk was given to the TumFUG Linux/Unix-User group at the TU München. Contact me via [email_address] You may use the pictures of the processors (not the screenshots, not the overview pic which I only adapted), but please do notify and credit me accordingly. Some of the code was copy-pasted from Wikipedia. I've removed copy-right problematic parts.