
Modern C++

Programming
20. Performance Optimization I
Basic Concepts

Federico Busato
2023-11-14
Table of Contents

1 Introduction
Moore’s Law
Moore’s Law Limitations
Reasons for Optimizing
2 Basic Concepts
Asymptotic Complexity
Time-Memory Trade-off
Developing Cycle
Amdahl's Law
Throughput, Bandwidth, Latency
Performance Bounds
Arithmetic Intensity 1/50
Table of Contents

3 Basic Architecture Concepts


Instruction-Level Parallelism
Little’s Law
Data-Level Parallelism (DLP)
Thread-Level Parallelism (TLP)
Single Instruction Multiple Threads (SIMT)
RISC, CISC Instruction Sets

4 Memory Hierarchy
Memory Hierarchy Concepts
Memory Locality

2/50
Introduction
Performance and Technological Progress

3/50
Performance and Technological Progress

4/50
Moore’s Law 1/2

“The number of transistors incorporated in a chip will approximately


double every 24 months.” (40% per year)
Gordon Moore, Intel co-founder

5/50
Moore’s Law 2/2

Moore's Law is not (yet) dead, but the same trend does not hold for clock
frequency, single-thread performance, power consumption, and cost. How can we
provide value?

6/50
Single-Thread Performance Trend

A Look Back at Single-Threaded CPU Performance


7/50
Herb Sutter, The Free Lunch Is Over
Moore’s Law Limitations 1/2

Higher performance over time is not merely dictated by the number of transistors.
Specific hardware improvements, software engineering, and algorithms play a crucial
role in driving computer performance.

8/50
Moore’s Law Limitations - Two Examples 2/2

Specialized Hardware
Reduced precision, matrix multiplication engine, and sparsity provided orders
of magnitude performance improvement for AI applications

Forget Moore’s Law. Algorithms drive technology forward


“Algorithmic improvements make more efficient use of existing resources and allow
computers to do a task faster, cheaper, or both. Think of how easy the smaller MP3
format made music storage and transfer. That compression was because of an algorithm.”

• Forget Moore’s Law


• What will drive computer performance after Moore's law?
• Heeding Huang's Law
9/50
Reasons for Optimizing

• In the first decades of computing, performance was extremely limited. Low-level
optimizations were essential to fully exploit the hardware

• Modern systems provide much higher performance, but we can no longer rely on
hardware improvements in the short term

• Performance and efficiency add market value (a fast program for a given task), e.g.
search, page loading, etc.

• Optimized code uses fewer resources, e.g. in a program that runs on a server for
months or years, a small reduction in execution time or power consumption
translates into large savings
10/50
Software Optimization is Complex

from "Speed is Found in the Minds of People",


Andrei Alexandrescu, CppCon 2019 11/50
References

• Optimized C++, Kurt Guntheroth

• Awesome C/C++ performance optimization resources, Bartlomiej Filipek

• Optimizing C++, wikibook

• Optimizing software in C++, Agner Fog

• Hacker's Delight (2nd ed.), Henry S. Warren

• Algorithmica: Algorithms for Modern Hardware

• What scientists must know about hardware to write fast code


12/50
Basic Concepts
Asymptotic Complexity 1/2

Asymptotic analysis estimates the execution time or memory usage as a
function of the input size (the order of growth)
The asymptotic behavior is opposed to a low-level analysis of the code
(instruction/loop counting and weighting, cache accesses, etc.)

Drawbacks:
• The worst case is not the average case
• Asymptotic complexity does not consider small inputs (think of insertion sort;
see the sketch below)
• The hidden constant can be relevant in practice
• Asymptotic complexity does not consider instruction costs and hardware details
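To make the insertion-sort remark concrete, here is a minimal sketch of the hybrid strategy used by real std::sort implementations (the cutoff value is a hypothetical choice, not from the slides): below a small size, the O(n²) algorithm wins thanks to its tiny constant factor.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cutoff: real implementations typically use 16-32 elements
constexpr std::size_t kSmallInput = 32;

void hybrid_sort(std::vector<int>& v) {
    if (v.size() <= kSmallInput) {
        // insertion sort: O(n^2), but low constant factor and cache friendly
        for (std::size_t i = 1; i < v.size(); ++i)
            for (std::size_t j = i; j > 0 && v[j - 1] > v[j]; --j)
                std::swap(v[j - 1], v[j]);
    } else {
        std::sort(v.begin(), v.end()); // O(n log n) for larger inputs
    }
}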
13/50
Asymptotic Complexity 2/2

Be aware that only real-world problems with a small asymptotic complexity or a small
size can be solved in a time acceptable to users
Three examples:

• Sorting: O(n log n), try to sort an array of a few billion elements

• Diameter of a (sparse) graph: O(V²), even for graphs with just a few hundred
thousand vertices it becomes impractical without advanced techniques

• Matrix multiplication: O(N³), even for small sizes N (e.g. 8K, 16K), it requires
special accelerators (e.g. GPU, TPU, etc.) to achieve acceptable performance

14/50
Time-Memory Trade-off

The time-memory trade-off is a way of solving a problem or calculation in less time


by using more storage space (less often the opposite direction)

Examples:

• Memoization (e.g. used in dynamic programming): returning the cached result
when the same inputs occur again (see the sketch below)
• Hash table: number of entries vs. efficiency
• Lookup tables: precomputed data instead of branches
• Uncompressed data: bitmap image vs. jpeg
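A minimal memoization sketch (Fibonacci is used purely as an illustration): the cache trades extra memory for time by storing the results of previous calls.

#include <cstdint>
#include <unordered_map>

// Memoized Fibonacci: O(n) extra memory in exchange for an exponential time saving
std::uint64_t fib(int n) {
    static std::unordered_map<int, std::uint64_t> cache;
    if (n < 2)
        return static_cast<std::uint64_t>(n);
    auto it = cache.find(n);
    if (it != cache.end())
        return it->second;                      // result already computed: reuse it
    std::uint64_t result = fib(n - 1) + fib(n - 2);
    cache[n] = result;                          // compute once, store for later calls
    return result;
}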

15/50
Developing Cycle 1/3

“If you’re not writing a program, don’t use a programming language”


Leslie Lamport, Turing Award

“First solve the problem, then write the code”

“Inside every large program is an algorithm trying to get out”


Tony Hoare, Turing Award

“Premature optimization is the root of all evil”


Donald Knuth, Turing Award

“Code for correctness first, then optimize!”


16/50
Developing Cycle 2/3

17/50
Developing Cycle 3/3

• One of the most important phases of the optimization cycle is
application profiling for finding the regions of code that are critical for
performance (hotspots)
→ Expensive code regions (absolute)
→ Code regions executed many times (cumulative)

• Most of the time, there is no perfect algorithm for all cases (e.g.
insertion, merge, radix sort). Optimizing also means finding the right
heuristics for different program inputs/platforms rather than modifying the
existing code

18/50
Amdahl's Law 1/3

Amdahl's Law
Amdahl's Law expresses the maximum overall improvement achievable by improving a
particular part of a system

Observation: the performance of any system is constrained by the speed of its
slowest component

S: the improvement factor (speedup) applied to the portion P of the system

19/50
Amdahl's Law 2/3

Overall Improvement = 1 / ((1 - P) + P / S)

P\S     25%     50%     75%     2x      3x      4x      5x      10x     ∞

10%     1.02x   1.03x   1.04x   1.05x   1.07x   1.08x   1.09x   1.10x   1.11x
20%     1.04x   1.07x   1.09x   1.11x   1.15x   1.18x   1.19x   1.22x   1.25x
30%     1.06x   1.11x   1.15x   1.18x   1.25x   1.29x   1.31x   1.37x   1.43x
40%     1.09x   1.15x   1.20x   1.25x   1.36x   1.43x   1.47x   1.56x   1.67x
50%     1.11x   1.20x   1.27x   1.33x   1.50x   1.60x   1.66x   1.82x   2.00x
60%     1.14x   1.25x   1.35x   1.43x   1.67x   1.82x   1.92x   2.17x   2.50x
70%     1.16x   1.30x   1.43x   1.54x   1.88x   2.10x   2.27x   2.70x   3.33x
80%     1.19x   1.36x   1.52x   1.67x   2.14x   2.50x   2.78x   3.57x   5.00x
90%     1.22x   1.43x   1.63x   1.82x   2.50x   3.08x   3.57x   5.26x   10.00x
20/50
Amdahl's Law 3/3

Note: s is the portion of the system that cannot be improved
21/50
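A tiny sketch of the formula in code (the example numbers are illustrative):

#include <cstdio>

// Amdahl's Law: overall speedup when a fraction P of the runtime
// is improved by a factor S
double overall_improvement(double P, double S) {
    return 1.0 / ((1.0 - P) + P / S);
}

int main() {
    // e.g. speeding up 80% of a program by 4x yields only ~2.5x overall
    std::printf("%.2fx\n", overall_improvement(0.8, 4.0));
}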


Throughput, Bandwidth, Latency

The throughput is the rate at which operations are performed


Peak throughput:
(CPU speed in Hz) x (CPU instructions per cycle) x
(number of CPU cores) x (number of CPUs per node)
NOTE: modern processors have more than one computation unit

The memory bandwidth is the amount of data that can be loaded from or stored into
a particular memory space per unit of time
Peak bandwidth:
(Frequency in Hz) x (Bus width in bits / 8) x (Pump rate, memory-type multiplier)

The latency is the amount of time needed for an operation to complete
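As a worked example of the peak bandwidth formula (the DDR4-3200 numbers below are illustrative, not taken from the slides):

#include <cstdio>

int main() {
    // Peak bandwidth = frequency x (bus width in bits / 8) x pump rate
    // Illustrative DDR4-3200 channel: 1600 MHz clock, 64-bit bus, double data rate
    double frequency_hz = 1.6e9;
    double bus_bytes    = 64.0 / 8.0;
    double pump_rate    = 2.0;          // DDR transfers data twice per clock
    double peak_gb_s    = frequency_hz * bus_bytes * pump_rate / 1e9;
    std::printf("Peak bandwidth: %.1f GB/s per channel\n", peak_gb_s); // 25.6 GB/s
}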


22/50
Performance Bounds 1/2

The performance of a program is bounded by one or more aspects of its computation.


This is also strictly related to the underlying hardware

• Memory-bound. The program spends its time primarily in performing memory


accesses. The performance is limited by the memory bandwidth (rarely
memory-bound also refers to the amount of memory available)

• Compute-bound (Math-bound). The program spends its time primarily in


computing arithmetic instructions. The performance is limited by the speed of the
CPU

23/50
Performance Bounds 2/2

• Latency-bound. The program spends its time primarily waiting for data to be
ready (instruction/memory dependencies). The performance is limited by the
latency of the CPU/memory

• I/O Bound. The program spends its time primarily in performing I/O operations
(network, user input, storage, etc.). The performance is limited by the speed of
the I/O subsystem

24/50
Arithmetic Intensity 1/2

Arithmetic Intensity
Arithmetic/Operational Intensity is the ratio of total operations to total data
movement (bytes or words)

The naive matrix multiplication algorithm requires 2 · N³ floating-point operations*
(multiplication + addition), while it involves 3 · N² · 4B of data movement
(read A, read B, write C, with 4-byte elements)
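One possible form of the naive algorithm whose operations are being counted (single precision, row-major):

// Naive matrix multiplication: 2*N^3 floating-point operations over
// three N*N single-precision (4-byte) matrices
void matmul(const float* A, const float* B, float* C, int N) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < N; k++)
                sum += A[i * N + k] * B[k * N + j];  // one multiply + one add
            C[i * N + j] = sum;
        }
}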


25/50
* What Is a Flop?
Arithmetic Intensity 2/2

R = ops / bytes = 2n³ / (12n²) = n / 6

which means that for every byte accessed, the algorithm performs n/6 operations →
compute-bound

N        Operations    Data Movement   Ratio   Exec. Time

512      268 · 10⁶     3 MB            85      2 ms
1024     2 · 10⁹       12 MB           170     21 ms
2048     17 · 10⁹      50 MB           341     170 ms
4096     137 · 10⁹     201 MB          682     1.3 s
8192     1 · 10¹²      806 MB          1365    11 s
16384    9 · 10¹²      3 GB            2730    90 s

A modern CPU performs about 100 GFLOP/s and has about 50 GB/s of memory bandwidth
26/50
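A small sketch that turns the numbers above into a bound check (the 100 GFLOP/s and 50 GB/s figures are the assumed machine balance):

#include <cstdio>

int main() {
    double peak_flops      = 100e9;                       // assumed: 100 GFLOP/s
    double peak_bandwidth  = 50e9;                        // assumed: 50 GB/s
    double machine_balance = peak_flops / peak_bandwidth; // 2 FLOP per byte

    int    n         = 1024;
    double intensity = n / 6.0;                           // FLOP per byte of naive matmul
    std::printf("n = %d is %s-bound\n", n,
                intensity > machine_balance ? "compute" : "memory");
}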
Basic Architecture
Concepts
Instruction-Level Parallelism (ILP) 1/3

Modern processor architectures are deeply pipelined → superscalar processor


Instruction-Level Parallelism (ILP) is a measure of how many instructions in a
computer program can be executed simultaneously by issuing independent instructions
in sequence (out-of-order )
Instruction pipelining is a technique for implementing ILP within a single processor

27/50
Instruction-Level Parallelism (ILP) 2/3

Microarchitecture   Pipeline stages

Core                14
Bonnell             16
Sandy Bridge        14
Silvermont          14 to 17
Haswell             14
Skylake             14
Kaby Lake           14

The pipeline efficiency is affected by


• Instruction stalls, e.g. cache miss, an execution unit not available, etc.
• Bad speculation, branch misprediction
28/50
Instruction-Level Parallelism (ILP) 3/3

for (int i = 0; i < N; i++) // with no optimizations, the loop


C[i] = A[i] * B[i]; // is executed in sequence

can be rewritten as:


for (int i = 0; i < N; i += 4) { // four independent multiplications
C[i] = A[i] * B[i]; // per iteration
C[i + 1] = A[i + 1] * B[i + 1]; // A, B, C do not alias each other
C[i + 2] = A[i + 2] * B[i + 2];
C[i + 3] = A[i + 3] * B[i + 3];
}

29/50
ILP and Little’s Law

Little's Law expresses the relation between latency and throughput. The
throughput λ of a system is equal to the number of elements in the system divided by
the average time (latency) W spent by each element in the system:

L = λ · W  →  λ = L / W
• L: average number of customers in a store
• λ: arrival rate (throughput)
• W : average time spent (latency )
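Applied to the memory subsystem (the latency and bandwidth numbers below are illustrative), Little's Law tells how many requests must be in flight to sustain a target throughput:

#include <cstdio>

int main() {
    // Little's Law: L = lambda * W
    double target_bandwidth = 50e9;     // lambda: assumed 50 GB/s
    double memory_latency   = 100e-9;   // W: assumed 100 ns
    double bytes_in_flight  = target_bandwidth * memory_latency;  // L = 5000 bytes
    std::printf("%.0f bytes (~%.0f cache lines of 64 B) must be in flight\n",
                bytes_in_flight, bytes_in_flight / 64.0);
}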

30/50
Data-Level Parallelism (DLP)

Data-Level Parallelism (DLP) refers to the execution of the same operation on


multiple data in parallel
Vector processors or array processors provide SIMD (Single Instruction-Multiple Data)
or vector instructions for exploiting data-level parallelism

The popular vector instruction sets are:


MMX MultiMedia eXtension. 64-bit width (Intel, AMD)
SSE (SSE2, SSE3, SSE4) Streaming SIMD Extensions. 128-bit width (Intel, AMD)
AVX (AVX, AVX2, AVX-512) Advanced Vector Extensions. 256/512-bit width (Intel, AMD)
NEON Media Processing Engine. 128-bit width (ARM)
SVE (SVE, SVE2) Scalable Vector Extension. 128-2048 bit width (ARM)
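A minimal DLP sketch with AVX2 intrinsics (it assumes an AVX2-capable CPU, compilation with -mavx2, and an array size that is a multiple of 8):

#include <immintrin.h>

// Element-wise multiplication with 256-bit AVX2 vectors: 8 floats per instruction
void multiply(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // load 8 floats from a
        __m256 vb = _mm256_loadu_ps(b + i);  // load 8 floats from b
        __m256 vc = _mm256_mul_ps(va, vb);   // 8 multiplications in a single instruction
        _mm256_storeu_ps(c + i, vc);         // store 8 results to c
    }
}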
31/50
Thread-Level Parallelism (TLP)

A thread is a single sequential execution flow within a program with its state
(instructions, data, PC, register state, and so on)

Thread-level parallelism (TLP) refers to the execution of separate computation
threads on different processing units (e.g. CPU cores)
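A minimal TLP sketch with std::thread, splitting the same element-wise multiplication across the available hardware cores (error handling and load balancing omitted):

#include <algorithm>
#include <thread>
#include <vector>

// Divide an element-wise multiplication among all hardware threads
void parallel_multiply(const float* a, const float* b, float* c, int n) {
    int num_threads = std::max(1u, std::thread::hardware_concurrency());
    int chunk       = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; t++) {
        int begin = t * chunk;
        int end   = std::min(n, begin + chunk);
        workers.emplace_back([=] {              // each thread processes its own range
            for (int i = begin; i < end; i++)
                c[i] = a[i] * b[i];
        });
    }
    for (auto& w : workers)
        w.join();                               // wait for all threads to complete
}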

32/50
Single Instruction Multiple Threads (SIMT)

An alternative approach to the classical data-level parallelism is Single Instruction


Multiple Threads (SIMT), where multiple threads execute the same instruction
simultaneously, with each thread operating on different data.
GPUs are successful examples of SIMT architectures.

SIMT can be thought of as an evolution of SIMD (Single Instruction Multiple Data).


SIMD requires that all data processed by the instruction be of the same type and
requires no dependencies or inter-thread communication. On the other hand, SIMT is
more flexible and does not have these restrictions. Each thread has access to its own
memory and can operate independently.

33/50
RISC, CISC Instruction Sets

The Instruction Set Architecture (ISA) is an abstract model of the CPU to


represent its behavior. It consists of addressing modes, instructions, data types,
registers, memory architecture, interrupts, etc.
It does not define how an instruction is processed

The microarchitecture (µarch) is the implementation of an ISA which includes


pipelines, caches, etc.

34/50
CISC

Complex Instruction Set Computer (CISC)


• Complex instructions for special tasks even if used infrequently
• Assembly instructions follow software. Little compiler effort for translating
high-level language into assembly
• Initially designed for saving cost of computer memory and disk storage (1960)
• High number of instructions with different size
• Instructions require complex micro-ops decoding (translation) for exploiting ILP
• Multiple low-level instructions per clock but with high latency
Hardware implications
• High number of transistors
• Extra logic for decoding. Heat dissipation
• Hard to scale
35/50
RISC

Reduced Instruction Set Computer (RISC)


• Simple instructions
• Small number of instructions with fixed size
• 1 clock per instruction
• Assembly instructions do not follow software
• No instruction decoding
Hardware implications
• High ILP, easy to schedule
• Small number of transistors
• Little power consumption
• Easy to scale
36/50
Instruction Set Comparison

x86 instruction set:

MOV AX, 15   ; AH = 00, AL = 0Fh
AAA          ; AH = 01, AL = 05
RET

ARM instruction set:

MOV R3, #10
AND R2, R0, #0xF
CMP R2, R3
IT LT
BLT elsebranch
ADD R2, #6
ADD R1, #1
elsebranch:
END

37/50
ARM vs x86: What’s the difference between the two processor architectures?
CISC vs. RISC

• Hardware market:
- RISC (ARM, IBM): Qualcomm Snapdragon, Amazon Graviton, Nvidia Grace,
Nintendo Switch, Fujitsu Fugaku, Apple M1, Apple iPhone/iPod/Mac, Tesla
Full Self-Driving Chip, PowerPC
- CISC (Intel, AMD): all x86-64 processors

• Software market:
- RISC : Android, Linux, Apple OS, Windows
- CISC : Windows, Linux

• Power consumption:
- CISC : Intel i5 10th Generation: 64W
- RISC : Arm-based smartphone < 5W
38/50
ARM Quote

“Incidentally, the first ARM1 chips required so little power, when the
first one from the factory was plugged into the development system to
test it, the microprocessor immediately sprung to life by drawing current
from the IO interface – before its own power supply could be properly
connected.”

Happy birthday, ARM1. It is 35 years since Britain's Acorn RISC Machine chip
sipped power for the first time
39/50
Memory Hierarchy
The Von Neumann Bottleneck

Access to memory dominates other costs in a processor

40/50
The Von Neumann Bottleneck

The efficiency of computer architectures is limited by the Memory Wall


problem, namely that memory is the slowest part of the system

Moving data to and from main memory consumes the vast majority of time and
energy of the system

41/50
Memory Hierarchy

42/50
Memory Hierarchy 1/3

Modern architectures rely on a complex memory hierarchy (primary memory, caches,
registers, scratchpad memory, etc.). Each level has different characteristics and
constraints (size, latency, bandwidth, concurrent accesses, etc.)

1 byte of RAM (1946) IBM 5MB hard drive (1956)

43/50
twitter.com/MIT CSAIL
Memory Hierarchy 2/3

Source:
“Accelerating Linear Algebra on Small Matrices from Batched BLAS to Large Scale Solvers”,
ICL, University of Tennessee 44/50
Memory Hierarchy 3/3

Intel Alder Lake 12th-gen Core-i9-12900k (Q1’21) + DDR4-3733 example:

Hierarchy level     Size     Latency      Latency Ratio   Bandwidth    Bandwidth Ratio

L1 cache            192 KB   1 ns         1.0x            1,600 GB/s   1.0x
L2 cache            1.5 MB   3 ns         3x              1,200 GB/s   1.3x
L3 cache            12 MB    6 - 20 ns    6-20x           900 GB/s     1.7x
DRAM                /        50 - 90 ns   50-90x          80 GB/s      20x
SSD Disk (swap)     /        70 µs        10⁵ x           2 GB/s       800x
HDD Disk (swap)     /        10 ms        10⁷ x           2 GB/s       800x

• en.wikichip.org/wiki/WikiChip
• Memory Bandwidth Napkin Math
45/50
Memory Hierarchy Concepts 1/4

A cache is a small and fast memory located close to the processor that stores
frequently used instructions and data. It is part of the processor package and takes 40
to 60 percent of the chip area
Characteristics and content:
Registers Program counter (PC), General purpose registers, Instruction Register
(IR), etc.
L1 Cache Instruction cache and data cache, private/exclusive per CPU core,
located on-chip
L2 Cache Private/exclusive per single CPU core or a cluster of cores, located
off-chip
L3 Cache Shared between all cores and located off-chip (e.g. motherboard), up to
128/256 MB
46/50
Memory Hierarchy Concepts 2/4

47/50
Memory Hierarchy Concepts 3/4

A cache line or cache block is the unit of data transfer between the cache and main
memory, namely the memory is loaded at the granularity of a cache line
The typical size of the cache line is 64 bytes. A cache line can be further organized in
banks or sectors
Cache access type:

Hot Closest-processor cached, L1

Warm L2 or L3 caches

Cold First load, cache empty

48/50
Memory Hierarchy Concepts 4/4

• A cache hit occurs when requested data is found in the cache memory

• The cache hit rate is the number of cache hits divided by the number of memory
requests

• A cache miss occurs when requested data is not found in the cache memory

• The miss penalty refers to the extra time required to load the data into the cache
from main memory when a cache miss occurs (see the sketch below)

• A page fault occurs when requested data is in the process address space, but it
is not currently located in main memory (swap/pagefile)

• Page thrashing occurs when page faults are frequent and the OS spends
significant time swapping data in and out of physical RAM
49/50
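The hit time, hit rate, and miss penalty combine into the standard average memory access time formula (not stated on the slide; the numbers below are illustrative):

#include <cstdio>

int main() {
    // AMAT = hit time + miss rate * miss penalty
    double hit_time_ns     = 1.0;    // e.g. L1 latency
    double miss_rate       = 0.05;   // 5% of accesses miss
    double miss_penalty_ns = 80.0;   // e.g. DRAM access on a miss
    double amat_ns         = hit_time_ns + miss_rate * miss_penalty_ns;
    std::printf("AMAT: %.1f ns\n", amat_ns);   // 5.0 ns: a few misses dominate
}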
Memory Locality

• Spatial Locality refers to the use of data elements within relatively close
storage locations, e.g. scanning arrays in increasing order, or matrices by row
(see the sketch below). It involves mechanisms such as memory prefetching and
access granularity.
When spatial locality is low, many words in the cache line are not used

• Temporal Locality refers to the reuse of the same data within a relatively
small time duration and, as a consequence, exploits the lower levels of the memory
hierarchy (caches), e.g. multiple sparse accesses.
Heavily used memory locations can be accessed more quickly than less heavily
used locations
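A minimal sketch of spatial locality (assuming a row-major matrix): summing by rows touches consecutive addresses and uses every word of each cache line, while summing by columns strides N*4 bytes per access and wastes most of each loaded line.

// Row-major traversal: consecutive addresses, good spatial locality
double sum_by_rows(const float* m, int N) {
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i * N + j];
    return sum;
}

// Column-major traversal of the same data: strided accesses, poor spatial locality
double sum_by_cols(const float* m, int N) {
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += m[i * N + j];
    return sum;
}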

50/50
