The document discusses parallel programming and message passing as a parallel programming model. It provides examples of using MPI (Message Passing Interface) and MapReduce frameworks for parallel programming. Some key applications discussed are financial risk assessment, molecular dynamics simulations, rendering animation, and web indexing. Challenges of parallel programming include potential slowdown due to overhead and the limit on parallel speedup imposed by a program's sequential fraction (Amdahl's law).
Everything You Need to Know About the Intel® MPI Library - Intel® Software
The document discusses tuning the Intel MPI library. It begins with an introduction to factors that impact MPI performance like CPUs, memory, network speed and job size. It notes that MPI libraries must make choices that may not be optimal for all applications. The document then outlines its plan to cover basic tuning techniques like profiling, hostfiles and process placement, as well as intermediate topics like point-to-point optimization and collective tuning. The goal is to help reduce time and memory usage of MPI applications.
OpenMP is a framework for parallel programming that utilizes shared memory multiprocessing. It allows users to split their programs into threads that can run simultaneously across multiple processors or processor cores. OpenMP uses compiler directives, runtime libraries, and environment variables to implement parallel regions, shared memory, and thread synchronization. It is commonly used with C/C++ and Fortran to parallelize loops and speed up computationally intensive programs. A real experiment showed a nested for loop running 3.4x faster when parallelized with OpenMP compared to running sequentially.
C++ and OpenMP can be used together to create fast and maintainable parallel programs. However, there are some challenges to parallelizing C++ code using OpenMP due to inconsistencies between the C++ and OpenMP specifications. Objects used in OpenMP clauses like shared, private, and firstprivate require special handling of constructors, destructors, and assignment operators. Parallelizing C++ loops can also be problematic if the loop index is not an integer type or if the loop uses STL iterators. STL containers introduce additional issues for parallelization related to initialization and data distribution across processors.
The document discusses setting up a 4-node MPI Raspberry Pi cluster and Hadoop cluster. It describes the hardware and software needed for the MPI cluster, including 4 Raspberry Pi 3 boards, Ethernet cables, micro SD cards, and MPI software. It also provides an overview of Hadoop, a framework for distributed storage and processing of big data, noting its origins from Google papers and use by companies like Amazon, Facebook, and Netflix.
Migration To Multi Core - Parallel Programming Models - Zvi Avraham
The document discusses multi-core and many-core processors and parallel programming models. It provides an overview of hardware trends including increasing numbers of cores in CPUs and GPUs. It also covers parallel programming approaches like shared memory, message passing, data parallelism and task parallelism. Specific APIs discussed include Win32 threads, OpenMP, and Intel TBB.
The document provides an introduction to OpenMP, which is an application programming interface for explicit, portable, shared-memory parallel programming in C/C++ and Fortran. OpenMP consists of compiler directives, runtime calls, and environment variables that are supported by major compilers. It is designed for multi-processor and multi-core shared memory machines, where parallelism is accomplished through threads. Programmers have full control over parallelization through compiler directives that control how the program works, including forking threads, work sharing, synchronization, and data environment.
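To make the fork-join idea concrete, here is a minimal sketch of a C program with a single parallel region (my own illustration, not code from the summarized document):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    /* The master thread forks a team of threads here; the block runs once per thread. */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        printf("hello from thread %d of %d\n", tid, nthreads);
    }   /* implicit join: the team synchronizes at the end of the parallel region */
    return 0;
}
```

Compiled with an OpenMP-capable compiler (for example, gcc -fopenmp), the thread count can be controlled with the OMP_NUM_THREADS environment variable.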
Concurrent Programming OpenMP @ Distributed System Discussion - CherryBerry2
This powerpoint presentation discusses OpenMP, a programming interface that allows for parallel programming on shared memory architectures. It covers the basic architecture of OpenMP, its core elements like directives and runtime routines, advantages like portability, and disadvantages like potential synchronization bugs. Examples are provided of using OpenMP directives to parallelize a simple "Hello World" program across multiple threads. Fine-grained and coarse-grained parallelism are also defined.
OpenMP is a portable programming model that allows for parallel programming on shared memory architectures. It utilizes multithreading and shared memory to parallelize serial programs. OpenMP uses compiler directives, runtime libraries, and environment variables to parallelize loops and sections of code. It uses a fork-join model where the master thread forks additional threads to run portions of the program concurrently using shared memory. OpenMP provides a way to incrementally parallelize programs and is supported across many platforms.
TNT (a widely used program for phylogenetic analysis) already uses PVM (Parallel Virtual Machine) to handle parallel jobs. However:
- MPI remains the dominant model used in high-performance computing today and has become a de facto standard for communication among the processes of a parallel program.
- Modern supercomputers, such as compute clusters, typically run such programs.
The project goal was to integrate MPI into TNT alongside PVM, letting the user choose one or the other; syntax and commands are practically the same.
The Message Passing Interface (MPI) allows parallel applications to communicate between processes using message passing. MPI programs initialize and finalize a communication environment, and most communication occurs through point-to-point send and receive operations between processes. Collective communication routines like broadcast, scatter, and gather allow all processes to participate in the communication.
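As a rough illustration of these ideas (a generic sketch, not code from the summarized document), an MPI program typically initializes the environment, does point-to-point and collective communication, and finalizes:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, value = 0;
    MPI_Init(&argc, &argv);                       /* initialize the communication environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* point-to-point send */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);             /* collective: broadcast from rank 0 */
    printf("rank %d of %d has value %d\n", rank, size, value);

    MPI_Finalize();                               /* finalize the environment */
    return 0;
}
```

Run with, for example, mpirun -np 4 ./a.out: ranks 0 and 1 exchange a value point-to-point, and the broadcast then shares rank 0's value with every process.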
OpenMP is an application programming interface that supports multi-platform shared memory parallel programming in C/C++ and Fortran. The OpenMP API was first released in 1997 with specifications for Fortran and later expanded to include C/C++. Version 3.0 of OpenMP, released in 2008, introduced tasks and task constructs to the API. OpenMP uses compiler directives to define parallel regions that can be executed concurrently by multiple threads, allowing for nested parallelism. It supports dynamic allocation of threads but leaves input/output and memory consistency handling to the programmer.
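A hedged sketch of the task construct added in OpenMP 3.0, using the classic recursive Fibonacci example (my own illustration, not from the summarized document):

```c
#include <omp.h>
#include <stdio.h>

long fib(long n) {
    long x, y;
    if (n < 2) return n;
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait          /* wait for both child tasks before combining */
    return x + y;
}

int main(void) {
    long result;
    #pragma omp parallel
    {
        #pragma omp single        /* one thread seeds the task tree; others execute tasks */
        result = fib(20);
    }
    printf("fib(20) = %ld\n", result);
    return 0;
}
```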
An introduction to the OpenMP parallel programming model.
From the Scalable Computing Support Center at Duke University (http://wiki.duke.edu/display/scsc)
Move Message Passing Interface Applications to the Next Level - Intel® Software
Explore techniques to reduce and remove message passing interface (MPI) parallelization costs, with practical examples and demonstrated performance improvements.
Dynamic Instrumentation - OpenEBS Golang Meetup July 2017 - OpenEBS
The slides were presented by Jeffry Molanus, CTO of OpenEBS, at a Golang Meetup. OpenEBS is open source cloud-native storage that delivers storage and storage services to containerized environments, allowing stateful workloads to be managed more like stateless containers. OpenEBS storage services include per-container (or pod) QoS SLAs, tiering and replica policies across AZs and environments, and predictable, scalable performance. The stated vision is simple: let storage and storage services for persistent workloads be so fully integrated into the environment, and hence managed automatically, that they almost disappear into the background as just another infrastructure service that works.
The document discusses parallel program design and parallel programming techniques. It introduces parallel algorithm design based on four steps: partitioning, communication, agglomeration, and mapping. It also covers parallel programming tools including pthreads, OpenMP, and MPI. Common parallel constructs like private, shared, barrier, and reduction are explained. Examples of parallel programs using pthreads and OpenMP are provided.
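As a minimal illustration of the partitioning and mapping steps using pthreads (my own sketch; the thread count and array size are arbitrary choices):

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000000

static double a[N];
static double partial[NTHREADS];

/* Each thread sums its own contiguous partition of the array. */
static void *worker(void *arg) {
    long id = (long)arg;
    long lo = id * (N / NTHREADS);
    long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS;
    double s = 0.0;
    for (long i = lo; i < hi; i++) s += a[i];
    partial[id] = s;
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (long i = 0; i < N; i++) a[i] = 1.0;
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);   /* wait for each worker, then combine results */
        total += partial[t];
    }
    printf("sum = %f\n", total);
    return 0;
}
```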
Numba is a just-in-time compiler for Python that can optimize numerical code to achieve speeds comparable to C/C++ without requiring the user to write C/C++ code. It works by compiling Python functions to optimized machine code using type information. Numba supports NumPy arrays and common mathematical functions. It can automatically optimize loops and compile functions for CPU or GPU execution. Numba allows users to write high-performance numerical code in Python without sacrificing readability or development speed.
This document discusses utilizing multicore processors with OpenMP. It provides an overview of OpenMP, including that it is an industry standard for parallel programming in C/C++ that supports parallelizing loops and tasks. Examples are given of using OpenMP to parallelize particle system position calculation and collision detection across multiple threads. Performance tests on dual-core and triple-core systems show speedups of 2-5x from using OpenMP. Some limitations of OpenMP are also outlined.
This document provides an overview of parallel programming with OpenMP. It discusses how OpenMP allows users to incrementally parallelize serial C/C++ and Fortran programs by adding compiler directives and library functions. OpenMP is based on the fork-join model where all programs start as a single thread and additional threads are created for parallel regions. Core OpenMP elements include parallel regions, work-sharing constructs like #pragma omp for to parallelize loops, and clauses to control data scoping. The document provides examples of using OpenMP for tasks like matrix-vector multiplication and numerical integration. It also covers scheduling, handling race conditions, and other runtime functions.
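A small sketch of the numerical-integration example mentioned above, using a work-sharing loop with a reduction clause (the interval count is an arbitrary choice of mine):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    const long n = 100000000;
    const double h = 1.0 / n;
    double sum = 0.0;

    /* Midpoint-rule integration of 4/(1+x^2) on [0,1]; the reduction avoids a race on sum. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        double x = (i + 0.5) * h;
        sum += 4.0 / (1.0 + x * x);
    }
    printf("pi ~= %.12f\n", sum * h);
    return 0;
}
```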
ORTE implements the RTPS communications model for embedded systems, running on a standard UDP/IP stack. It provides a publish-subscribe middleware interface that handles network communication tasks, allowing publishers and subscribers to label messages with topics rather than node addresses. The Shape Demo application uses the ORTE and Qt libraries to demonstrate real-time publish-subscribe capabilities by transferring shape data between publisher and subscriber nodes configured through a graphical interface. ORTE is implemented as a set of manager, application, writer, and reader objects and supports Linux, RTLinux, and Windows platforms.
This document discusses shared-memory parallel programming using OpenMP. It begins with an overview of OpenMP and the shared-memory programming model. It then covers key OpenMP constructs for parallelizing loops, including the parallel for pragma and clauses for declaring private variables. It also discusses managing shared data with critical sections and reductions. The document provides several techniques for improving performance, such as loop inversions, if clauses, and dynamic scheduling.
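To illustrate the if clause and dynamic scheduling mentioned above, a hedged sketch (the threshold and chunk size are arbitrary values of mine):

```c
#include <omp.h>

void scale_shift(double *a, int n) {
    /* Only parallelize when the problem is large enough to amortize thread startup;
       schedule(dynamic) balances iterations whose cost may vary. */
    #pragma omp parallel for if(n > 10000) schedule(dynamic, 64)
    for (int i = 0; i < n; i++) {
        a[i] = a[i] * 2.0 + 1.0;   /* placeholder per-element work */
    }
}
```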
OpenMP directives are used to parallelize sequential programs. The key directives discussed include:
1. Parallel and parallel for to execute loops or code blocks across multiple threads.
2. Sections and parallel sections to execute different code blocks simultaneously in parallel across threads.
3. Critical to ensure a code block is only executed by one thread at a time for mutual exclusion.
4. Single to restrict a code block to only be executed by one thread.
OpenMP makes it possible to easily convert sequential programs to leverage multiple threads and processors through directives like these.
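For example, a minimal sketch combining several of these directives (my own illustration, not code from the summarized document):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        #pragma omp sections      /* each section may run on a different thread */
        {
            #pragma omp section
            printf("section A on thread %d\n", omp_get_thread_num());
            #pragma omp section
            printf("section B on thread %d\n", omp_get_thread_num());
        }

        #pragma omp single        /* executed by exactly one thread */
        printf("single block on thread %d\n", omp_get_thread_num());

        #pragma omp critical      /* one thread at a time: mutual exclusion */
        printf("critical block on thread %d\n", omp_get_thread_num());
    }
    return 0;
}
```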
OpenMP is a tool for parallel programming using shared memory multiprocessing. It allows users to split their program into threads that can run simultaneously on multiple processors. OpenMP uses compiler directives to indicate which parts of a program should be run in parallel. It is simple to use as it does not require extensive code changes, and works across platforms supporting C, C++, and Fortran. An experiment showed a sequential program taking 3347.68 ms to run versus 983.576 ms when parallelized using OpenMP, demonstrating its ability to speed up programs by distributing work across multiple threads and processors.
Introduction to return oriented programming. Explanation of how to use instruction sequences already existing in an executable's memory space to manipulate control flow without injecting external payload.
Pragmatic optimization in modern programming - modern computer architecture c... - Marina Kolpakova
There are three key aspects of computer architecture: instruction set architecture, microarchitecture, and hardware design. Modern architectures aim to either hide latency or maximize throughput. Reduced instruction set computers (RISC) became popular due to simpler decoding and pipelining allowing higher clock speeds. While complex instruction set computers (CISC) focused on code density, RISC architectures are now dominant due to their efficiency. Very long instruction word (VLIW) and vector processors targeted specialized workloads but their concepts influence modern designs. Load-store RISC architectures with fixed-width instructions and minimal addressing modes provide an optimal balance between performance and efficiency.
OpenMP is an API used for multi-threaded parallel programming on shared memory machines. It uses compiler directives, runtime libraries and environment variables. OpenMP supports C/C++ and Fortran. The programming model uses a fork-join execution model with explicit parallelism defined by the programmer. Compiler directives like #pragma omp parallel are used to define parallel regions. Work is shared between threads using constructs like for, sections and tasks. Synchronization is implemented using barriers, critical sections and locks.
The document summarizes key concepts from Chapter 1 of an assembly language textbook, including:
1) It introduces assembly language and discusses how it relates to both machine language and higher-level languages like C++ and Java.
2) It describes the virtual machine concept and different levels of abstraction in a computer system from high-level languages down to digital logic.
3) It covers data representation in computers including binary, hexadecimal, integer, and character representation and conversions between numbering systems.
4) It discusses Boolean logic and operations like NOT, AND, OR used to design computer hardware and software.
The document provides an introduction to assembly language programming including:
- The basic elements of assembly language such as instructions, directives, constants, identifiers, and comments.
- A flat memory program template that includes TITLE, MODEL, STACK, DATA, CODE, and other directives.
- An example program that adds and subtracts integers and calls a procedure to display registers.
- An overview of the assemble-link-debug cycle used to develop assembly language programs.
This document provides information about IMSDB command codes and processing options. It lists the command codes A through Z, describing what each command code does when accessing an IMS database. It also indicates which command codes can be used with different processing options like GU, GHU, GN, etc. Finally, it defines some common processing options like G, R, I, D, and describes special options for DEDB and Fast Path databases like GON, GONP, GOT, and GOTP.
The document discusses MIPS assembly language instructions and programming. It describes basic instructions like add, sub, load, and store. It also covers assembler directives, addressing modes, control structures like branches, procedures, and examples like printing numbers and modifying arrays.
C was created in the early 1970s and is widely used for systems programming like operating systems and utilities. The document discusses the basics of C including its origins, typical uses, program structure using keywords, comments, functions and libraries. It also covers flowcharts and pseudocode as ways to represent algorithms and solve problems through structured programming techniques like conditionals and loops.
1) The document discusses different levels of programming languages including machine language, assembly language, and high-level languages. Assembly language uses symbolic instructions that directly correspond to machine language instructions.
2) It describes the components of the Intel 8086 processor including its 16-bit registers like the accumulator, base, count, and data registers as well as its segment, pointer, index, and status flag registers.
3) Binary numbers can be represented in signed magnitude, one's complement, or two's complement form. Two's complement is commonly used in modern computers as it allows for efficient addition and subtraction of binary numbers.
The document discusses fundamentals of assembly language including instruction execution and addressing, directives, procedures, data types, arithmetic instructions, and a practice problem. It explains that instructions are translated to object code while directives control assembly but generate no machine code. Common directives include TITLE, STACK, DATA, CODE, and PROC. Data can be defined using directives like DB, DW, DD, DQ. Instructions like ADD, SUB, MUL, DIV perform arithmetic calculations on registers and memory.
Microprocessor chapter 9 - assembly language programming - Wondeson Emeye
This document provides an overview of assembly language programming concepts for the 8086 processor. It discusses variables which are stored in registers, assignment using MOV instructions, input/output using INT 21h to call operating system functions and pass parameters in registers, and complete program examples that demonstrate displaying characters, reading input, and terminating programs. It also provides sample programs and exercises for students to practice core concepts like loops, conditional jumps, arithmetic operations on numbers in various formats.
The document discusses fundamentals of assembly language including data types, operands, data transfer instructions like MOV, arithmetic instructions like ADD and SUB, and addressing modes. It provides examples of assembly language code to perform operations like copying a string, converting between Celsius and Fahrenheit, and using various addressing modes.
This document discusses assembly language fundamentals and MS-DOS functions using software interrupts. It covers the INT instruction, interrupt vector table, common interrupts like INT 10h for video and INT 21h for MS-DOS services. It provides examples of using INT 21h functions for input/output, including reading/writing characters and strings, reading the date/time, and displaying the date and time. The document is intended as an overview and introduction to assembly language and MS-DOS function calls.
This document provides an introduction to assembly language programming fundamentals. It discusses machine languages and low-level languages. It also covers data representation and numbering systems. Key assembly language concepts like instructions format, directives, procedures, macros and input/output are described. Examples are given to illustrate variables, assignment, conditional jumps, loops and other common programming elements in assembly language.
This document discusses assembly language fundamentals including conditional processing, status flags, Boolean and comparison instructions, conditional jumps, and conditional structures. It provides examples of how to implement if-else statements, while loops, and switch selections using assembly language instructions and directives. It also explains how MASM generates conditional jump code for decision directives based on operand types.
Part I: Introduction to assembly language - Ahmed M. Abed
This document provides an overview of assembly language for the x86 architecture. It discusses what assembly language is, why it is used, basic concepts like data sizes, and details of the x86 architecture like its modes of operation and basic program execution registers including general purpose registers, segment registers, the EFLAGS register, and status flags.
This document outlines the basics of assembly language, including basic elements, statements, program data, variables, constants, instructions, translation to assembly language, and program structure. It discusses statement syntax, valid names, operation and operand fields. It also covers common instructions like MOV, ADD, SUB, INC, DEC, and NEG. Finally, it discusses program segments, memory models, and how to define the data, stack, and code segments.
The document discusses several common Unix command line utilities for text processing and file searching:
- find - Searches for files and directories based on various criteria like name, type, size, and modification time. Results can be piped to xargs to perform actions.
- grep - Searches files for text patterns. Has options for case-insensitive, recursive, and whole word searches.
- sed - Stream editor for modifying text, especially useful for find-and-replace. Can capture groups and perform transformations.
The document discusses assembly language instruction addressing and execution. It covers loading an *.exe program by accessing it from disk and storing it in memory segments. The boot process and loading of an *.exe file is explained. Examples are provided to illustrate instruction execution and addressing, showing how the instruction address is determined from segment registers and offsets.
Chapter 3 INSTRUCTION SET AND ASSEMBLY LANGUAGE PROGRAMMING - Frankie Jones
3.1 UNDERSTANDING INSTRUCTION SET AND ASSEMBLY LANGUAGE
3.1.1 Define instruction set, machine and assembly language
3.1.2 Describe features and architectures of various type of microprocessor
3.1.3 Describe the Addressing Modes
3.2 APPLY ASSEMBLY LANGUAGE
3.2.1 Write simple program in assembly language
3.2.2 Tool in analyzing and debugging assembly language program
Computer languages allow humans to communicate with computers through programming. There are different types of computer languages at different levels of abstraction from machine language up to high-level languages. High-level languages are closer to human language while low-level languages are closer to machine-readable code. Programs written in high-level languages require compilers or interpreters to convert them to machine-readable code that can be executed by computers.
Best Practices and Performance Studies for High-Performance Computing Clusters - Intel® Software
The document discusses best practices and a performance study of HPC clusters. It covers system configuration and tuning, building applications, Intel Xeon processors, efficient execution methods, tools for boosting performance, and application performance highlights using HPL and HPCG benchmarks. The document contains agenda items, market share data, typical BIOS settings, compiler flags, MPI usage, and performance results from single node and cluster runs of the benchmarks.
This document discusses MPI (Message Passing Interface) and OpenMP for parallel programming. MPI is a standard for message passing parallel programs that requires explicit communication between processes. It provides functions for point-to-point and collective communication. OpenMP is a specification for shared memory parallel programming that uses compiler directives to parallelize loops and sections of code. It provides constructs for work sharing, synchronization, and managing shared memory between threads. The document compares the two approaches and provides examples of simple MPI and OpenMP programs.
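The two models can also be combined; a hedged sketch of a hybrid MPI+OpenMP program (my own illustration, not an example from the summarized document):

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* One MPI process per node, OpenMP threads inside each process. */
int main(int argc, char **argv) {
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0, global = 0.0;
    #pragma omp parallel for reduction(+:local)   /* threads share the per-process work */
    for (int i = 0; i < 1000000; i++)
        local += 1.0;                             /* placeholder per-element work */

    /* Combine the per-process partial sums across ranks. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("global sum = %f\n", global);
    MPI_Finalize();
    return 0;
}
```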
Parallelization of Coupled Cluster Code with OpenMP - Anil Bohare
This document discusses parallelizing a Coupled Cluster Singles and Doubles (CCSD) molecular dynamics application code using OpenMP to reduce its execution time on multi-core systems. Specifically, it identifies compute-intensive loops in the CCSD code for parallelization with OpenMP directives like PARALLEL DO. Performance evaluations show the optimized OpenMP version achieves a 35.66% reduction in wall clock time as the number of cores increases, demonstrating the effectiveness of the parallelization approach. Further improvements could involve a hybrid OpenMP-MPI model.
A PyConUA 2017 talk on debugging Python programs with gdb.
A blog post version: http://podoliaka.org/2016/04/10/debugging-cpython-gdb/
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures - Dr. Fabio Baruffa
In the framework of the Intel Parallel Computing Centre at the Research Campus Garching in Munich, our group at LRZ presents recent results on performance optimization of Gadget-3, a widely used community code for computational astrophysics. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm and focus on threading parallelism optimization, change of the data layout into Structure of Arrays (SoA), compiler auto-vectorization and algorithmic improvements in the particle sorting. We measure lower execution time and improved threading scalability both on Intel Xeon (2.6× on Ivy Bridge) and Xeon Phi (13.7× on Knights Corner) systems. First tests on second generation Xeon Phi (Knights Landing) demonstrate the portability of the devised optimization solutions to upcoming architectures.
Preparing to program Aurora at Exascale - Early experiences and future direct... - inside-BigData.com
In this deck from IWOCL / SYCLcon 2020, Hal Finkel from Argonne National Laboratory presents: Preparing to program Aurora at Exascale - Early experiences and future directions.
"Argonne National Laboratory’s Leadership Computing Facility will be home to Aurora, our first exascale supercomputer. Aurora promises to take scientific computing to a whole new level, and scientists and engineers from many different fields will take advantage of Aurora’s unprecedented computational capabilities to push the boundaries of human knowledge. In addition, Aurora’s support for advanced machine-learning and big-data computations will enable scientific workflows incorporating these techniques along with traditional HPC algorithms. Programming the state-of-the-art hardware in Aurora will be accomplished using state-of-the-art programming models. Some of these models, such as OpenMP, are long-established in the HPC ecosystem. Other models, such as Intel’s oneAPI, based on SYCL, are relatively-new models constructed with the benefit of significant experience. Many applications will not use these models directly, but rather, will use C++ abstraction libraries such as Kokkos or RAJA. Python will also be a common entry point to high-performance capabilities. As we look toward the future, features in the C++ standard itself will become increasingly relevant for accessing the extreme parallelism of exascale platforms.
This presentation will summarize the experiences of our team as we prepare for Aurora, exploring how to port applications to Aurora’s architecture and programming models, and distilling the challenges and best practices we’ve developed to date. oneAPI/SYCL and OpenMP are both critical models in these efforts, and while the ecosystem for Aurora has yet to mature, we’ve already had a great deal of success. Importantly, we are not passive recipients of programming models developed by others. Our team works not only with vendor-provided compilers and tools, but also develops improved open-source LLVM-based technologies that feed both open-source and vendor-provided capabilities. In addition, we actively participate in the standardization of OpenMP, SYCL, and C++. To conclude, I’ll share our thoughts on how these models can best develop in the future to support exascale-class systems."
Watch the video: https://wp.me/p3RLHQ-lPT
Learn more: https://www.iwocl.org/iwocl-2020/conference-program/
and
https://www.anl.gov/topic/aurora
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Linux Server Deep Dives (DrupalCon Amsterdam) - Amin Astaneh
Over the past few years the Linux kernel has gained features that allow us to learn more about what's really happening on our servers and the applications that run on them.
This talk will explore how these new features, particularly perf_events and ebpf, enable us to answer questions about what a Drupal site is doing in real time beyond what the standard logs, server performance tools, and even strace will reveal. Attendees will be provided a brief introduction to example uses of these tools to diagnose performance problems.
This talk is intended for attendees that are familiar with Linux, the command line, and have used host observability tools in the past (top, netstat, etc).
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77) - Igalia
By Andy Wingo.
Snabb is an open-source toolkit for building fast, flexible network functions. Since its beginnings in 2012, Snabb has seen some modest deployment success ranging from simple one-off diagnosis tools to border routers that process all IPv4 traffic for entire countries. This talk will give an introduction to Snabb. After going over Snabb's fundamental components and how they combine, the talk will move on to examples of how network engineers are taking advantage of Snabb in practice, mentioning a few of the many open-source network functions built on Snabb.
(c) RIPE 77
15 - 19 October 2018
Amsterdam, Netherlands
https://ripe77.ripe.net
This is the material of a Burst Buffer training that was presented for the early users program. It covers an introduction to parallel I/O, Lustre, the Darshan tool, Burst Buffer, and optimization parameters for MPI I/O.
Full video: https://youtu.be/8zLcZmiTweg
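As a rough illustration of the MPI I/O topic covered in the training (a generic sketch, not taken from the slides; the file name and block size are arbitrary):

```c
#include <mpi.h>

/* Collective MPI I/O: each rank writes its own block of a shared file. */
int main(int argc, char **argv) {
    int rank;
    MPI_File fh;
    int buf[4];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < 4; i++) buf[i] = rank * 4 + i;

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* Collective write: the offset is rank-dependent so blocks do not overlap. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * sizeof(buf), buf, 4, MPI_INT,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
```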
This document summarizes a workshop on the Tulipp project, which aims to develop ubiquitous low-power image processing platforms. The workshop covered shortcomings of existing platforms, introduced the Maestro real-time operating system as the reference platform, and described the concept of the Tulipp project to provide an operating system and tools to support heterogeneous architectures including FPGA and multi-core processors. Attendees participated in hands-on labs demonstrating how to build applications with Maestro, leverage OpenMP for parallelism, and use SDSoC tools to automatically accelerate functions in FPGA hardware.
Containerizing HPC and AI applications using E4S and Performance Monitor tool - Ganesan Narayanasamy
The DOE Exascale Computing Project (ECP) Software Technology focus area is developing an HPC software ecosystem that will enable the efficient and performant execution of exascale applications. Through the Extreme-scale Scientific Software Stack (E4S) [https://e4s.io], it is developing a comprehensive and coherent software stack that will enable application developers to productively write highly parallel applications that can portably target diverse exascale architectures. E4S provides both source builds through the Spack platform and a set of containers that feature a broad collection of HPC software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users. It provides container images, build manifests, and turn-key, from-source builds of popular HPC software packages developed as Software Development Kits (SDKs). This effort includes a broad range of areas including programming models and runtimes (MPICH, Kokkos, RAJA, OpenMPI), development tools (TAU, HPCToolkit, PAPI), math libraries (PETSc, Trilinos), data and visualization tools (Adios, HDF5, Paraview), and compilers (LLVM), all available through the Spack package manager. It will describe the community engagements and interactions that led to the many artifacts produced by E4S. It will introduce the E4S containers are being deployed at the HPC systems at DOE national laboratories using Singularity, Shifter, and Charliecloud container runtimes.
This talk will describe how E4S can support the OpenPOWER platform with NVIDIA GPUs.
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors - Intel® Software
The second-generation Intel® Xeon Phi™ processor offers new and enhanced features that provide significant performance gains in modernized code. For this lab, we pair these features with Intel® Software Development Products and methodologies to enable developers to gain insights on application behavior and to find opportunities to optimize parallelism, memory, and vectorization features.
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS - Fernando Luiz Cola
This document discusses developing embedded solutions using asymmetric multiprocessing (AMP) architectures. It provides an overview of AMP vs symmetric multiprocessing (SMP), examples of AMP applications, and the NXP I.MX7 dual-core processor architecture. It then demonstrates inter-processor communication between Linux on an ARM Cortex-A7 core and FreeRTOS on a Cortex-M4 core using RPMSG. Finally, it shows an example Qt application running on Linux that receives sensor data from FreeRTOS via RPMSG and displays it in real-time charts.
"Session ID: BUD17-300
Session Name: Journey of a packet - BUD17-300
Speaker: Maxim Uvarov
Track: LNG
★ Session Summary ★
Describes step by step which components a packet goes through, and details the cases where components are implemented in hardware or in software. Attendees will get a definitive presentation for understanding the fundamental differences from DPDK and how ODP solves both low-end and high-end networking issues.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/bud17/bud17-300/
Presentation: https://www.slideshare.net/linaroorg/bud17300-journey-of-a-packet
Video: https://youtu.be/wRZXw_xBT20
---------------------------------------------------
★ Event Details ★
Linaro Connect Budapest 2017 (BUD17)
6-10 March 2017
Corinthia Hotel, Budapest,
Erzsébet krt. 43-49,
1073 Hungary
---------------------------------------------------
Keyword: packet, LNG
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
The DOE Exascale Computing Project (ECP) Software Technology focus area is developing an HPC software ecosystem that will enable the efficient and performant execution of exascale applications. Through the Extreme-scale Scientific Software Stack (E4S), it is developing a comprehensive and coherent software stack that will enable application developers to productively write highly parallel applications that can portably target diverse exascale architectures, including IBM OpenPOWER systems with NVIDIA GPUs. E4S features a broad collection of HPC software packages, including the TAU Performance System(R) for performance evaluation of HPC and AI/ML codes. TAU is a versatile profiling and tracing toolkit that supports performance engineering of codes written for CPUs and GPUs and has support for most IBM platforms.
This talk will give an overview of TAU and E4S and how developers can use these tools to analyze the performance of their codes. TAU supports transparent instrumentation of codes without modifying the application binary. The talk will describe TAU's support for CUDA, OpenACC, pthread, OpenMP, Kokkos, and MPI applications, its use with Python-based frameworks such as TensorFlow and PyTorch, and the use of TAU in E4S containers using Docker and Singularity runtimes under ppc64le. E4S provides both source builds through the Spack platform and a set of containers that feature a broad collection of HPC software packages. E4S exists to accelerate the development, deployment, and use of HPC software, lowering the barriers for HPC users.
Linux provides powerful multiplexing capabilities through file descriptors and APIs like epoll. Multiplexing allows a single thread to handle multiple I/O operations simultaneously. File descriptors can represent network sockets, pipes, timers, signals and more. The epoll API in particular provides efficient waiting on large numbers of file descriptors in kernel space. This allows applications to achieve high concurrency with fewer threads than alternative approaches like multi-threading.
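A minimal sketch of the epoll pattern described above (stdin stands in for an arbitrary socket, pipe, or timer descriptor; the timeout is an arbitrary value):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

/* One thread waits on many file descriptors through a single epoll instance. */
int main(void) {
    int epfd = epoll_create1(0);
    if (epfd < 0) { perror("epoll_create1"); return 1; }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = STDIN_FILENO;                 /* register stdin for readability */
    epoll_ctl(epfd, EPOLL_CTL_ADD, STDIN_FILENO, &ev);

    struct epoll_event ready[16];
    int n = epoll_wait(epfd, ready, 16, 5000); /* block up to 5 s for readiness events */
    for (int i = 0; i < n; i++)
        printf("fd %d is readable\n", ready[i].data.fd);

    close(epfd);
    return 0;
}
```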
This document provides an overview of parallel and distributed computing using GPUs. It discusses GPU architecture and how GPUs are designed for massively parallel processing using hundreds of smaller cores compared to CPUs which use 4-8 larger cores. The document also covers GPU memory hierarchy, programming GPUs using OpenCL, and key concepts like work items, work groups, and occupancy which is keeping GPU compute units busy with work to process.
The document discusses challenges in GPU compilers. It begins with introductions and abbreviations. It then outlines the topics to be covered: a brief history of GPUs, what makes GPUs special, how to program GPUs, writing a GPU compiler including front-end, middle-end, and back-end aspects, and a few words about graphics. Key points are that GPUs are massively data-parallel, execute instructions in lockstep, and require supporting new language features like OpenCL as well as optimizing for and mapping to the GPU hardware architecture.
This document discusses building a virtual platform for the OpenRISC architecture using SystemC and transaction-level modeling. It covers setting up the toolchain, writing test programs, and simulating the platform using event-driven or cycle-accurate simulation with Icarus Verilog or the Vorpsoc simulator. The virtual platform allows fast development and debugging of OpenRISC code without requiring physical hardware.
Some random graphs for network models - Birgit Plötzeneder
The document summarizes several random graph models:
1) Erdős and Rényi proposed connecting nodes with probability p, resulting in bell-shaped degree distributions.
2) Watts and Strogatz modeled small-world networks by rewiring edges in a ring lattice with probability p, finding short paths like social networks.
3) Barabási and Albert grew networks by preferentially attaching new nodes to popular existing nodes, producing scale-free networks with power-law degree distributions and hubs.
3. Darling, I shrunk the computer.* (* copyright Prof. Erik Hagersten / Uppsala, who does awesome work.) Signal propagation delay » transistor delay; not enough ILP for more transistors; power consumption.
4. O RLY? You want FASTER code. NOW. Prefetching; high computational load; image/video; fun.
17. OpenMP concept: at program start only the master thread runs; in a parallel region a team of worker threads is generated ("fork"); threads synchronize when leaving the parallel region ("join").
20. Data-sharing attribute clauses:
- shared: visible and accessible by all threads simultaneously; the default (but not the loop index i), e.g. a[i] = a[i-1].
- private: each thread has a local copy; the value is not maintained for use outside the construct.
- firstprivate: like private, but initialized to the original value.
- lastprivate: like private, but the original value is updated after the construct.
- reduction: combines per-thread copies with a reduction operator.
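A hedged sketch (my own, not from the slides) showing these clauses on a simple loop:

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    int shared_total = 0;       /* combined via reduction: one result seen by every thread */
    int seed = 7;               /* firstprivate: each thread starts from this value */
    int last_i = -1;            /* lastprivate: takes the value from the final iteration */

    #pragma omp parallel for firstprivate(seed) lastprivate(last_i) \
                             reduction(+:shared_total)
    for (int i = 0; i < 100; i++) {
        int tmp = seed + i;     /* tmp is private because it is declared inside the loop */
        shared_total += tmp;
        last_i = i;
    }
    printf("total=%d last_i=%d\n", shared_total, last_i);
    return 0;
}
```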
22. Other clauses:
- critical: executed by only one thread at a time.
- atomic: similar to a critical section, but may perform better.
- ordered: executed in the order in which iterations would run in a sequential loop.
- barrier, nowait.
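A small sketch (mine, not from the slides) combining atomic, critical, nowait, and barrier:

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    int hits = 0;
    double maxval = 0.0;

    #pragma omp parallel
    {
        #pragma omp for nowait              /* threads leave this loop without waiting */
        for (int i = 0; i < 1000; i++) {
            #pragma omp atomic              /* cheap update of a single scalar */
            hits++;
        }

        #pragma omp critical                /* arbitrary block, one thread at a time */
        {
            double v = omp_get_thread_num() * 1.5;
            if (v > maxval) maxval = v;
        }

        #pragma omp barrier                 /* explicit synchronization point */
    }
    printf("hits=%d maxval=%f\n", hits, maxval);
    return 0;
}
```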
29. Communication modes: collective vs. point-to-point (one-to-all, all-to-all, all-to-one); blocking / non-blocking; synchronous / asynchronous.
30. Communication modes:
- synchronous mode ("safest"): is the receiver ready?
- ready mode (lowest system overhead): only if there is a receiver already waiting (streaming).
- buffered mode (decouples sender from receiver): mind the buffer size and buffer attachment!
- standard mode.
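A hedged sketch (mine, not from the slides) of standard, synchronous, and buffered sends; buffered mode requires attaching a user buffer first:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, size, msg = 123;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            MPI_Send (&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* standard mode */
            MPI_Ssend(&msg, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);   /* synchronous: completes once the receive has started */

            int bufsize = MPI_BSEND_OVERHEAD + sizeof(int);
            void *buf = malloc(bufsize);
            MPI_Buffer_attach(buf, bufsize);                     /* buffered mode needs an attached buffer */
            MPI_Bsend(&msg, 1, MPI_INT, 1, 2, MPI_COMM_WORLD);
            MPI_Buffer_detach(&buf, &bufsize);
            free(buf);
        } else if (rank == 1) {
            for (int tag = 0; tag < 3; tag++)
                MPI_Recv(&msg, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }
    MPI_Finalize();
    return 0;
}
```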
34. PAPI: PAPI is a library that monitors hardware events while a program runs. Papiex is a tool that makes it easy to access performance counters using PAPI.* (* http://icl.cs.utk.edu/papi/) Usage: papiex -e <EVENT> ./my_prog (for some tests, turn off optimizations with the flag -O0).
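A hedged sketch of reading counters directly from a program, assuming the classic PAPI high-level calls PAPI_start_counters/PAPI_stop_counters (newer PAPI releases replace these with the PAPI_hl_* region API):

```c
#include <papi.h>
#include <stdio.h>

int main(void) {
    int events[2] = { PAPI_TOT_INS, PAPI_TOT_CYC };  /* instructions and cycles */
    long long counts[2];
    double x = 0.0;

    if (PAPI_start_counters(events, 2) != PAPI_OK) {
        fprintf(stderr, "PAPI_start_counters failed\n");
        return 1;
    }
    for (int i = 0; i < 1000000; i++)                /* the measured kernel */
        x += i * 0.5;
    PAPI_stop_counters(counts, 2);

    printf("instructions=%lld cycles=%lld (x=%f)\n", counts[0], counts[1], x);
    return 0;
}
```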
35. Profilers: two types, statistical profilers and event-based profilers. Statistical profiling: interrupts the program at random intervals and records which instruction the CPU is executing. Event-based profiling: interrupts triggered by hardware counter events are recorded. Measuring profiles affects performance, and a lot of data is still saved.
36. Tracing: wrappers for function calls (for example MPI_Recv) record when a function was called and with what parameters, which nodes exchanged messages, message sizes, and so on. Tracing can affect performance.
38. Extrae + Paraver: module add paraver; mpi2prv -f TRACE.mpits -o MPImatrix.prv. Also Scalasca. Screenshots and examples of profiling/tracing tools are available, but not on the internet.
39. This talk was given to the TumFUG Linux/Unix user group at TU München. Contact me via [email_address]. You may use the pictures of the processors (not the screenshots, and not the overview picture, which I only adapted), but please notify and credit me accordingly. Some of the code was copy-pasted from Wikipedia. I've removed copyright-problematic parts.