Parallel Distributed Computing

Chapter 2

GeForce 7800 GTX
Board Details
• SLI connector
• Single-slot cooling
• sVideo TV out
• DVI x 2
• 16x PCI-Express
• 256MB/256-bit GDDR3 at 600 MHz (8 pieces of 8Mx32)
GeForce 7800 GTX
GPU Details
302 million transistors
430 MHz core clock
256-bit memory interface
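
The board figures above (256-bit memory interface, 600 MHz) allow a rough estimate of peak memory bandwidth. The sketch below assumes that 600 MHz is the memory clock and that the memory transfers data twice per clock (double data rate). Neither assumption is stated on the slide, and the function name is illustrative.

```python
def peak_memory_bandwidth_gbs(bus_width_bits: int,
                              memory_clock_mhz: float,
                              transfers_per_clock: int = 2) -> float:
    """Estimate peak bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 is an assumption (double data rate).
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


bandwidth = peak_memory_bandwidth_gbs(bus_width_bits=256, memory_clock_mhz=600)
print(f"Estimated peak bandwidth: {bandwidth:.1f} GB/s")
```

This is an upper bound; sustained bandwidth depends on access patterns and is lower in practice.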

Notable Functionality
• Non-power-of-two textures with mipmaps
• Floating-point (fp16) blending and filtering
• sRGB color space texture filtering and frame buffer blending
• Vertex textures
• 16x anisotropic texture filtering
• Dynamic vertex and fragment branching
• Double-rate depth/stencil-only rendering
• Early depth/stencil culling
• Transparency antialiasing
GeForce 7800 GTX
Parallelism
• 8 vertex engines
• Z-cull, triangle setup, and raster
• Shader instruction dispatch
• 24 fragment shaders
• Fragment crossbar
• 16 raster operation pipelines
• 4 memory partitions
GeForce Graphics Pipeline
Separate dedicated units
[Pipeline diagram: CPU, Vertex Engine, Setup, Raster, Fragment Shader, Raster Ops, Frame Buffer, with Z Cull and Texture units alongside]
GeForce Graphics Pipeline
Vertex Engine
Vertex pulling
Vector floating-point instructions
Dynamic branching
Vertex texture
Vertex stream frequency

GeForce Graphics Pipeline
Setup
Prepare triangle for rasterization
215M triangles/sec setup

GeForce Graphics Pipeline
Raster
Compute coverage
Points, lines, and triangles
Rotated grid multisampling

GeForce Graphics Pipeline
Z Cull

Discard fragments early based on Z

Up to 64 pixels/clock
Multisampled: 256 samples/clock
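
The idea of discarding fragments early based on Z can be sketched in a few lines. This is a simplified per-pixel illustration, not the hardware's method: it assumes a "less than" depth test where smaller z is closer, and the function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

Fragment = Tuple[int, int, float]  # (x, y, z)


def early_z_cull(fragments: List[Fragment],
                 depth_buffer: Dict[Tuple[int, int], float]) -> List[Fragment]:
    """Drop fragments that are already hidden, before any shading work.

    Assumes smaller z means closer to the viewer. Pixels with no stored
    depth are treated as infinitely far away. The depth buffer is not
    updated here; the final depth test is left to a later stage.
    """
    survivors = []
    for x, y, z in fragments:
        if z < depth_buffer.get((x, y), float("inf")):
            survivors.append((x, y, z))
    return survivors


depth = {(0, 0): 0.5, (1, 0): 0.2}
incoming = [(0, 0, 0.3), (1, 0, 0.9), (2, 0, 0.7)]
visible = early_z_cull(incoming, depth)
```

Only the surviving fragments would go on to the fragment shader, which is where the savings come from.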

GeForce Graphics Pipeline
Fragment Shader
User-programmed fragment coloring
Dynamic branching
Long shaders
Multiple render targets
fp16 and fp32 vectors

GeForce Graphics Pipeline
Texture
fp16 and sRGB filtering
16x anisotropic filtering
Non-power-of-two mipmapping
Shadow maps, cube maps, and 3D
Floating-point textures

GeForce Graphics Pipeline
Raster Operations
2x and 4x multisampling
fp16 and sRGB blending
Multiple render targets
Color and depth compression
Double-speed depth/stencil only

Single GeForce 7800
Vertex Unit
Vertex Processing Engine
• MIMD Architecture
• Dual Issue
• Low-penalty branching
• Shader Model 3.0
• 32 vector registers
• 512 static instructions per program
• Indexed input and output registers
Vertex Texture Fetch
• Non-stalling
• Up to 4 texture units
• Unlimited fetches
• Mipmapping, no filtering
[Block diagram: Primitive Assembly + Attribute Processing; Vertex Texture Fetch; FP32 Scalar Unit; FP32 Vector Unit; Branch Unit; Texture Cache; Viewport Processing; To Setup]
Vertex Texturing Example
[Figure: a vertex program reads a height field texture and turns a flat tessellated mesh into a displaced mesh]
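
The vertex texturing example can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the height field is a small 2D list, the fetch picks the nearest texel (in line with the "no filtering" note for vertex texture fetch), and each vertex moves along its normal. All names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def fetch_height(height_field: List[List[float]], u: float, v: float) -> float:
    """Nearest-texel fetch for (u, v) in [0, 1]; no filtering is applied."""
    rows, cols = len(height_field), len(height_field[0])
    x = min(int(u * cols), cols - 1)
    y = min(int(v * rows), rows - 1)
    return height_field[y][x]


def displace_vertex(position: Vec3, normal: Vec3, uv: Tuple[float, float],
                    height_field: List[List[float]], scale: float = 1.0) -> Vec3:
    """Move a vertex along its normal by the fetched height times scale."""
    h = fetch_height(height_field, uv[0], uv[1]) * scale
    return (position[0] + normal[0] * h,
            position[1] + normal[1] * h,
            position[2] + normal[2] * h)


heights = [[0.0, 1.0],
           [2.0, 3.0]]
flat_vertex = (0.75, 0.0, 0.25)   # a vertex on the flat mesh
up = (0.0, 1.0, 0.0)              # flat mesh normal
displaced = displace_vertex(flat_vertex, up, (0.75, 0.25), heights)
```

Running the same function over every vertex of the flat tessellated mesh yields the displaced mesh.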
Vertex Textures for Dynamic Displacement Mapping
[Figure: side-by-side comparison, without vertex textures and with vertex textures]
Images used with permission from Pacific Fighters. © 2004 Developed by 1C:Maddox Games. All rights reserved. © 2004 Ubi Soft Entertainment.
Vertex Textures to Drive Particle Systems
◼ Render-to-texture
  • Simulation runs in floating-point frame buffer, also usable as texture
◼ Vertex textures
  • Determines particle location with vertex texture fetch
Single GeForce 7800
Fragment Shader Pipeline
Texture Processor
• 16 texture units
• 1 texture fetch at full speed
• Bilinear or tri-linear filtering
• 16x anisotropic filtering
• Floating-point (fp16) texture filtering
Shader Unit 1
• 4 MULs + RCP
• Dual Issue
• Texture address calculation
• Fast fp16 normalize
• Free: negate, abs, condition codes
Shader Unit 2
• 4 MADs or DP4
• Dual Issue
• Free: negate, abs, condition codes
[Block diagram: Input Fragment Data; Texture Data; FP32 Shader Unit 1; FP32 Shader Unit 2; Texture Processor; Texture Cache; Branch Processor; Fixed-function Fog Unit; Output Shaded Fragments]
Operations Per Fragment Shader Pass
• Shader Unit 1: 4 components, 1 op / component, 4 ops / fragment per pass; or 1 texture / fragment at full speed per pass
• Shader Unit 2: 4 components, 1 op / component, 4 ops / fragment per pass
• Total: 8 operations / fragment per pass
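
Combining the per-pass figure with numbers given earlier in the deck (24 fragment shaders, 430 MHz core clock) gives a rough peak operation rate. The sketch assumes each fragment shader completes one pass per core clock, which the slides do not state, so treat the result as an upper bound.

```python
FRAGMENT_SHADERS = 24       # from the parallelism slide
SHADER_UNITS_PER_PIPE = 2   # Shader Unit 1 and Shader Unit 2
OPS_PER_UNIT_PER_PASS = 4   # 4 components, 1 op per component
CORE_CLOCK_MHZ = 430        # from the GPU details slide

ops_per_fragment_pass = SHADER_UNITS_PER_PIPE * OPS_PER_UNIT_PER_PASS
ops_per_clock = FRAGMENT_SHADERS * ops_per_fragment_pass

# Assumption: one pass per pipeline per core clock.
peak_gops = ops_per_clock * CORE_CLOCK_MHZ * 1e6 / 1e9

print(f"{ops_per_fragment_pass} ops per fragment per pass")
print(f"{ops_per_clock} ops per clock across all fragment shaders")
print(f"about {peak_gops:.2f} billion ops per second at peak")
```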

Fragment Shader
Component Co-issue
◼ Use 4 components various ways
  • RGBA all together
  • RGB and A
  • RG and BA
◼ Both shader units
◼ Two operations per shader unit
[Diagram: Shader Unit 1 splits R G B A between Operation 1 and Operation 2; Shader Unit 2 splits R G B A between Operation 3 and Operation 4]
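
Co-issue can be pictured as one shader unit applying two different operations to disjoint groups of components in the same pass. The sketch below models that with plain Python tuples. The `co_issue` helper and the example operations are hypothetical and only illustrate the 3+1 and 2+2 splits.

```python
from typing import Callable, Tuple

Vec4 = Tuple[float, float, float, float]


def co_issue(rgba: Vec4, split: int,
             op_first: Callable[[float], float],
             op_second: Callable[[float], float]) -> Vec4:
    """Apply op_first to the first `split` components and op_second to the rest.

    split=3 models "RGB and A"; split=2 models the two-and-two split;
    split=4 models "RGBA all together" (op_second is unused).
    """
    if split not in (2, 3, 4):
        raise ValueError("split must be 2, 3, or 4")
    first = tuple(op_first(c) for c in rgba[:split])
    second = tuple(op_second(c) for c in rgba[split:])
    return first + second


color = (0.2, 0.4, 0.6, 0.5)
# One pass: double the color channels while inverting alpha.
rgb_and_a = co_issue(color, 3, lambda c: c * 2.0, lambda a: 1.0 - a)
# One pass: different operations on the first and last two components.
two_and_two = co_issue(color, 2, lambda c: c * 2.0, lambda c: -c)
```

With two shader units each able to do this, up to four distinct operations can touch a fragment in one pass.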
Single GeForce 7800
Raster Operations Pipeline
Functionality
• OpenEXR floating-point blending
• sRGB blending
• 4x rotated grid multisampling
• Lossless color and depth compression
• Multiple render targets
[Block diagram: Input Shaded Fragment Data; Pixel Crossbar Interconnect; Multisample Antialiasing; Depth Compression and Color Compression; Depth Raster Operations and Color Raster Operations; Memory Frame Buffer Partition]
GeForce 7800
Transparency Antialiasing
[Figure: conventional 4x antialiasing versus transparency antialiasing, both with alpha-tested content]
Scalable Link Interface (SLI)
◼ Gang two GeForce 6600, 6800, or 7800 graphics boards together
  • Can almost double your performance
[Figure: two 6800 Ultras joined by the SLI connector]
SLI Rendering Modes
◼ Split Frame Rendering (SFR)
  • One GPU renders top of screen; other renders the bottom
  • Scales fragment processing but not vertex processing
◼ Alternate Frame Rendering (AFR)
  • Scales both vertex and fragment processing
  • Adds frame latency
  • Rendering must be free of CPU synchronization
◼ SLI Antialiasing: SLI8x and SLI16x
  • Better antialiasing quality rather than performance
  • Each card renders with slightly different sub-pixel offset
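
The difference between SFR and AFR comes down to how work is assigned to the two GPUs. The sketch below shows one way to express that assignment. The function names and the fixed 50/50 split are assumptions for illustration; a real driver may place the split line elsewhere.

```python
def sfr_gpu_for_scanline(y: int, screen_height: int, split: float = 0.5) -> int:
    """Split Frame Rendering: GPU 0 gets the top of the screen, GPU 1 the bottom.

    `split` is the fraction of the screen given to GPU 0 (assumed fixed here).
    """
    return 0 if y < screen_height * split else 1


def afr_gpu_for_frame(frame_index: int, gpu_count: int = 2) -> int:
    """Alternate Frame Rendering: whole frames alternate between GPUs."""
    return frame_index % gpu_count


# SFR: both GPUs work on every frame, each on its own scanlines.
sfr_assignment = [sfr_gpu_for_scanline(y, 8) for y in range(8)]
# AFR: each GPU renders complete frames, so a frame finishes one GPU-frame later.
afr_assignment = [afr_gpu_for_frame(f) for f in range(6)]
```

In SFR both GPUs still process the frame's geometry, which is why only fragment work scales; in AFR each GPU handles a full frame on its own, so vertex work scales too.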
Current High-end “Fermi” GPU
◼ Current high-end graphics card
◼ 512 graphics “cores”
◼ 1.5 GB memory
◼ System power: 600W
◼ OpenGL 4.2 / DirectX 11 functionality
High-level “Fermi” Architecture
◼ GF100
◼ Four Graphics Processing Clusters (GPCs)
  • Each is a self-contained graphics pipeline
  • Smaller chips have fewer GPCs
◼ Shared L2 cache
◼ 6 memory controllers
  • 1.5 GB
Inside Each Graphics Processing Cluster
◼ Raster engine
◼ Four SMs
  • Streaming Multiprocessor
◼ Texture fetch resources
◼ Tessellation and vertex processing resources
  • Polymorph Engine
Streaming Multiprocessor (SM)
◼ Multi-processor execution unit
  • 32 scalar processor cores
  • Warp is a unit of thread execution of up to 32 threads
◼ Two workloads
  • Graphics
    ◼ Vertex shader
    ◼ Tessellation
    ◼ Geometry shader
    ◼ Fragment shader
  • Compute
OpenGL Pipeline Programmable Domains run on Unified Hardware
◼ Unified Streaming Processor Array (SPA) architecture means same capabilities for all domains
  • Plus tessellation + compute (not shown below)
[Pipeline diagram: GPU Front End; Vertex Assembly; Primitive Assembly; Clipping, Setup, and Rasterization; Raster Operations. The Vertex Program, Primitive Program, and Fragment Program stages can be unified hardware. Attribute Fetch, Parameter Buffer Read, Texture Fetch, and Framebuffer Access go through the Memory Interface.]
Dual Warp Scheduling

32 threads launch!
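
Grouping threads into warps of up to 32, and handing warps to two schedulers, can be sketched as below. The pairing policy shown is a deliberate simplification and an assumption: it takes warps two at a time in order, whereas real schedulers pick among whichever warps are ready. All names are illustrative.

```python
from typing import List, Tuple

WARP_SIZE = 32  # a warp holds up to 32 threads


def split_into_warps(thread_count: int) -> List[Tuple[int, int]]:
    """Return (first_thread_id, active_threads) for each warp."""
    return [(first, min(WARP_SIZE, thread_count - first))
            for first in range(0, thread_count, WARP_SIZE)]


def dual_issue_order(warp_count: int) -> List[Tuple[int, ...]]:
    """Pair warps two at a time, as if two schedulers each picked one per step.

    Hypothetical in-order policy, used only to illustrate dual scheduling.
    """
    return [tuple(range(i, min(i + 2, warp_count)))
            for i in range(0, warp_count, 2)]


warps = split_into_warps(100)            # the last warp is only partly full
schedule = dual_issue_order(len(warps))
```

A thread count that is not a multiple of 32 leaves the last warp partly empty, which is why such launch sizes waste some execution lanes.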
Shader or CUDA Core,
Same Unit but Two Personalities
◼ Execution unit
  • Scalar floating-point
  • Scalar integer
Levels of Caching in Fermi GPU
◼ 12 KB L1 texture cache
  • Per texture unit
◼ 64 KB cache per SM
  • Split into dedicated 16 KB or 48 KB load/store cache
  • Shared memory gets the other 48 KB or 16 KB
◼ L2 unifies texture cache, raster operation cache, and internal buffering in prior generation
  • 768 KB
  • Read / write
  • Fully coherent
Cache Use Strategies in Fermi GPU
◼ Pipeline stages can communicate efficiently through GPU’s L1 and L2 caches
  • Buffering between stages stays all on chip
  • Only vertex, texel, and pixel read/writes need to go to DRAM
Vertex and Tessellation Processing Tasks
◼ Fixed-function graphics engines
  • Pull attributes and assemble vertex
  • Manage tessellation control and domain shader evaluation
  • Viewport transform
  • Attribute setup of plane equations for rasterization
  • Stream out vertices into buffers
Rasterization Tasks
◼ Turns primitives into fragments
  • Computes edge equations
  • Two-stage rasterization
    ◼ Coarse raster finds tiles the primitive could be in
    ◼ Fine raster evaluates sample positions within tiles
  • Zcull efficiently eliminates occluded fragments
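
The edge-equation and two-stage steps above can be sketched in software. This is a simplified illustration, not the hardware algorithm: the coarse stage here uses a bounding-box overlap test as a conservative stand-in for "tiles the primitive could be in", the fine stage tests one sample at each pixel center, and triangles are assumed counter-clockwise. All names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge equation: >= 0 when p lies on the inner side of edge a->b (CCW)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(tri: List[Point], p: Point) -> bool:
    """A sample is covered when it is inside all three edges."""
    return (edge(tri[0], tri[1], p) >= 0 and
            edge(tri[1], tri[2], p) >= 0 and
            edge(tri[2], tri[0], p) >= 0)


def rasterize(tri: List[Point], width: int, height: int,
              tile: int = 4) -> List[Tuple[int, int]]:
    """Return the pixels whose centers are covered by the triangle."""
    min_x, max_x = min(p[0] for p in tri), max(p[0] for p in tri)
    min_y, max_y = min(p[1] for p in tri), max(p[1] for p in tri)
    fragments = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            # Coarse raster: skip tiles that cannot overlap the triangle.
            if tx > max_x or tx + tile < min_x or ty > max_y or ty + tile < min_y:
                continue
            # Fine raster: evaluate the sample position of each pixel in the tile.
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    if covers(tri, (x + 0.5, y + 0.5)):
                        fragments.append((x, y))
    return fragments


triangle = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]
fragments = rasterize(triangle, width=16, height=16)
```

The coarse stage pays off on large render targets, where most tiles are rejected with a single test instead of one test per sample.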
[Figure sequence from Metro 2033, © THQ and 4A Games: base input mesh; apply Phong tessellation; add displacement mapping]

GPUs as Compute Nodes
◼ Architecture of GPU has evolved into a high-performance, high-bandwidth compute node
[Figure: compute form factors: small form factor; integrated CPU-GPU servers & blades; OEM CPU server + compute 1U; workstations with 2 to 4 Tesla GPUs]
Compute Programming Model
◼ Cooperative Thread Array (CTA)
  • Single Program, Multiple Data
  • Organized in 1D, 2D, or 3D
◼ Programming APIs
  • CUDA, OpenCL, DirectCompute
◼ APIs + language = parallel processing system
  • OpenGL or Direct3D through shaders
    ◼ Cg, HLSL, GLSL