Introduction To Interactive 3D CG Note

Interactive 3D Graphics

Course

Last Updated March 7, 2022

Prerequisites:

No experience required

This course will teach you the principles of 3D computer graphics: meshes, transforms,
lighting, animation, and making interactive 3D applications run in a browser.

Introduction

First demo shown - three.js WebGL materials demo(opens in a new tab) with texture from Humus(opens
in a new tab).
Second demo - three.js WebGL materials bumpmap skin demo(opens in a new tab) with Lee Perry-Smith
head(opens in a new tab).

If you have problems seeing these demos on your browser, please check out the WebGL
Troubleshooting page(opens in a new tab). Still stuck? We'll work more on getting you set up in a few
lessons from now.

You may want to visit Eric Haines' blog - Realtime Rendering(opens in a new tab) - or Twitter feed -
@pointinpolygon(opens in a new tab).

This page(opens in a new tab) shows what browsers support WebGL.

Interactive 3D Rendering

Photographs throughout this course are from Wikimedia Commons(opens in a new tab), unless
otherwise noted. Building(opens in a new tab), XYZ(opens in a new tab), interactive(opens in a new tab).

You can try the brain program(opens in a new tab) yourself. Chrome is the best browser for it. If it
doesn’t work for you, don’t worry - the next lesson will help guide you through setting up WebGL on
your machine.

WebGL Setup

To see if your machine is set up right, try this site(opens in a new tab). If your machine doesn’t show a
spinning cube, read our help page(opens in a new tab) or go to this page(opens in a new tab) and look
under “Implementations”. For Safari on the Mac, follow these instructions(opens in a new tab).

An excellent summary of what supports WebGL is here(opens in a new tab). Some graphics cards are
blacklisted because their drivers are old, never to be updated, and won’t work with WebGL. See this
page(opens in a new tab) for more information. It’s possible to override the blacklisting in Firefox, see
this article(opens in a new tab) - this might be an acceptable “if all else fails” solution, since this course
will be using fairly vanilla WebGL features. Google Chrome has blacklisted XP, so there’s a similar
workaround (opens in a new tab). If all else fails, try different browsers, as they have different
limitations.
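If you would rather check your setup with your own page instead of the test sites above, here is a minimal spinning-cube sketch of my own (not part of the course materials). It assumes the three.js library has already been loaded and is available as the global THREE:

// Minimal three.js "spinning cube" setup check.
// Assumes three.js is already loaded and exposed as the global THREE.
var scene = new THREE.Scene();
var camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 100);
camera.position.z = 5;

var renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// A lit cube: the Lambert material needs a light in the scene to be visible.
var cube = new THREE.Mesh(
    new THREE.BoxGeometry(1, 1, 1),
    new THREE.MeshLambertMaterial({ color: 0x8844aa }));
scene.add(cube);

var light = new THREE.DirectionalLight(0xffffff, 1.0);
light.position.set(1, 2, 3);
scene.add(light);

// If WebGL is working, you should see the cube slowly spinning.
function animate() {
    requestAnimationFrame(animate);
    cube.rotation.y += 0.01;
    renderer.render(scene, camera);
}
animate();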

Interactivity and FPS


The Wikipedia page on motion blur(opens in a new tab) gives a start on the topic. You can see a little bit of motion blur in this demo(opens in a new tab). Go to view 4 (hit the 4 key) and toggle motion blur on and off with the “b” key. The ground will be blurrier as you move when motion blur is on.

WARNING! The demo on the next page has my voice blaring out at a loud level. Be ready to turn down
your volume.

FPS and Refresh Rate

The Wikipedia page on motion blur(opens in a new tab) gives a start on the topic.

Some applications will aim to avoid a rate between 30 and 60 FPS, since then the frame rate doesn’t
align with the refresh rate. This video(opens in a new tab) explains in detail how this mismatch can
cause a sense that the interaction with the application is not smooth. That said, many games simply
strive for as fast a rate as possible.

Math Refresher


This question is simplifying the situation. The 50 Hz rate is actually the interlaced field update rate for
European TV, the frame rate is then 25 Hz. Here we're interested in the time between field refreshes at
50 Hz. If you want to learn more about this aspect of television broadcasting, see this page (opens in a
new tab). In the field of computer graphics we typically don't have interlacing, so this distinction does
not exist.
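To make the arithmetic concrete, here is a quick sketch of my own (not from the course) of the time available per refresh or per frame at a few common rates:

// Time per refresh or frame = 1000 ms divided by the rate in Hz (or FPS).
function msPerFrame(rateHz) {
    return 1000 / rateHz;
}

console.log(msPerFrame(50));  // 20 ms between field refreshes (European TV)
console.log(msPerFrame(60));  // ~16.7 ms per frame at 60 FPS
console.log(msPerFrame(30));  // ~33.3 ms per frame at 30 FPS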


The Eye
See the Wikipedia article on the eye(opens in a new tab) for some truly amazing facts about different
types of eyes.

Incognito(opens in a new tab) is a great book on the brain. The first part is all about how the brain
interprets what the eye sees.

Seeing Is Believing

You can find the original images for the illusion here http://persci.mit.edu/gallery/checkershadow(opens
in a new tab), along with a full explanation of how it works.

There’s also a clever video http://www.youtube.com/watch?v=z9Sen1HTu5o(opens in a new tab) showing the effect.

I love optical illusions; my favorite sites include Michael Bach’s collection http://www.michaelbach.de/ot/(opens in a new tab), Kitaoka’s works http://www.ritsumei.ac.jp/~akitaoka/saishin-e.html(opens in a new tab), and these wallpapers http://www.flickr.com/photos/w00kie/sets/180637/show/(opens in a new tab). You can also lose a day wandering through the “Mighty Optical Illusions” blog http://www.moillusions.com/(opens in a new tab).

Eye vs. Camera

For this question, compare the human visual system (eye and brain together) to the lens mechanism of a
camera.

Eyes are fascinating organs, especially since there are a wide range of designs. See the Wikipedia article
on the eye http://en.wikipedia.org/wiki/Eye(opens in a new tab) for some truly amazing knowledge.

Real-Time Rendering

Tracking the latest developments in interactive rendering techniques



Do you spell these two words correctly?

We all have dumb little blind spots. As a kid, I thought “Achilles” was pronounced “a-chi-elz” and,
heaven knows how, “etiquette” was somehow “eh-teak”. When you say goofy things to other people,
someone eventually corrects you. However, if most of the people around you are making the same
mistake (I’m sorry, “nuclear” is not pronounced “new-cue-lar”, it just ain’t so), the error never gets
corrected. I’ve already mentioned the faux pas of pronouncing SIGGRAPH as “see-graph”, which seems
to be popular among non-researchers (well, admittedly there’s no “correct” pronunciation on that one,
it’s just that when the conference was small and mostly researchers that “sih-graph” was the way to say
it. If the majority now say “see-graph”, so be it – you then identify yourself as a general attendee or a
sales person and I can feel superior to you for no valid reason, thanks).

Certain spelling errors persist in computer graphics, perhaps because it’s more work to give feedback on
writing mistakes. We also see others make the same mistakes and assume they’re correct. So, here are
the two I believe are the most popular goofs in computer graphics (and I can attest that I used to make
them myself, once upon a time):

Tesselation – that’s incorrect, it’s “tessellation”. By all rules of English, this word truly should have just
one “l”: relation, violation, adulation, ululation, emulation, and on and on, they have just one “l”. The
only exceptions I could find with two “l”s were “collation”, “illation” (what the heck is that?), and a word
starting with “fe” (I don’t want this post to get filtered).

The word “tessellation” is derived from “tessella” (plural “tessellae”), which is a small piece of stone or
glass used in a mosaic. It’s the diminutive of “tessera”, which can also mean a small tablet or block used
as a ticket or token (but “tessella” is never a small ticket). Whatever. In Ionic Greek “tesseres” means
“four”, so “tessella” makes sense as being a small four-sided thing. For me, knowing that “tessella” is
from the ancient Greek word for a piece in a mosaic somehow helps me to catch my spelling of it –
maybe it will work for you. I know that in typing “tessella” in this post I still first put a single “l”
numerous times, that’s what English tells me to do.

Google test: searching on “tessellation” on Google gives 2,580,000 pages. Searching on “tesselation -
tessellation”, which gives only pages with the misspelled version, gives 1,800,000 pages. It’s nice to see
that the correct spelling still outnumbers the incorrect, but the race is on. That said, this sort of test is
accurate to within say plus or minus say 350%. If you search on “tessellation -tesselation”, which should
give a smaller number of pages (subtracting out those that I assume say “‘tesselation’ is a misspelling of
‘tessellation'” or that reference a paper with “tesselation” in the title), you get 8,450,000! How you can
get more than 3 times as many pages as just searching on “tessellation” is a mystery. Finally, searching
on “tessellation tesselation”, both words on the same page, gives 3,150,000 results. Makes me want to
go count those pages by hand. No it doesn’t.
One other place to search is the ACM Digital Library. There are 2,973 entries with “tessellation” in them,
375 with “tesselation”. To search just computer graphics publications, GRAPHBIB is a bit clunky but will
do: 89 hits for “tessellation”, 18 hits for the wrong one. Not terrible, but that’s still a solid 20% incorrect.

Frustrum – that’s incorrect, it’s “frustum” (plural “frusta”, which even looks wrong to me – I want to say
“frustra”). The word means a (finite) cone or pyramid with the tip chopped off, and we use it (always) to
mean the pyramidal volume in graphics. I don’t know why the extra “r” got into this word for some
people (myself included). Maybe it’s because the word then sort-of rhymes with itself, the “ru” from the
first part mirrored in the second. But “frustra” looks even more correct to me, no idea why. Maybe it’s
that it rolls off the tongue better.

Morgan McGuire pointed this one out to me as the most common misspelling he sees. As a professor,
he no doubt spends more time teaching about frusta than tessellations. Using the wildly-inaccurate
Google test, there are 673,000 frustum pages and 363,000 “frustrum -frustum” pages. And, confusingly,
again, 2,100,000 “frustum -frustrum” pages, more than three times as many as pages as just “frustum”.
Please explain, someone. For the digital library, 1,114 vs. 53. For GRAPHBIB I was happy to see 42 hits vs.
just 1 hit (“General Clipping on an Oblique Viewing Frustrum”).

So the frustum misspell looks like one that is less likely at the start and is almost gone by the time
practitioners are publishing articles, vs. the tessellation misspell, which appears to have more staying
power.

Addenda: Aaron Hertzmann notes that the US and Britain double their letters differently (“calliper”?
That’s just unnatural, Brits). He also notes the Oxford English Dictionary says about tessellate: “(US also
tesselate)”. Which actually is fine with me, except for the fact that Microsoft Word, Google’s
spellchecker, and even this blog’s software flags “tesselate” as a misspelling. If only we had the
equivalent of the Académie française to decide how we all should spell (on second thought, no).

Spike Hughes notes: “I think the answer for ‘frustrum’ is that it starts out like ‘frustrate’ (and indeed,
seems logically related: the pyramid WANTS to go all the way to the eye point, but is frustrated by the
near-plane).” This makes a lot of sense to me, and would explain why “frustra” feels even more correct.
Maybe that’s the mnemonic aid, like how with “it’s” vs. “its” there’s “It’s a wise dog that knows its own
fleas”. You don’t have to remember the spelling of each “its”, just remember that they differ; then
knowing “it’s” is “it is” means you can derive that the possessive “its” doesn’t have an apostrophe. Or
something. So maybe, “Don’t get frustrated when drawing a frustum”, remembering that they differ.
Andrew Glassner offers: “There’s no rum in a frustum,” because the poor thing has the top chopped off,
so all the rum we poured inside has evaporated.

3D Scene
You can look at the demo shown in the video by going to this link(opens in a new tab), letting it load and
then clicking “Start”.

For an in-depth overview of how three.js labels elements in a scene, see this page(opens in a new tab).

How Many Pixels?

In case you're not watching the video and just doing the quiz, the answer should really be in terms of
pixels per second. Also, please don't use commas or periods in your answer, just numerals.
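As a sketch of the kind of arithmetic the quiz expects (the resolution and frame rate below are made-up example values, not the quiz's numbers):

// Pixels drawn per second = pixels per frame * frames per second.
var width = 1920;          // example horizontal resolution (assumed)
var height = 1080;         // example vertical resolution (assumed)
var framesPerSecond = 60;  // example frame rate (assumed)

var pixelsPerFrame = width * height;                     // 2073600
var pixelsPerSecond = pixelsPerFrame * framesPerSecond;  // 124416000
console.log(pixelsPerSecond);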


how many photons emitted from a light bulb

Factual Questions

Sigene (Charter Member), Sep 2005

Let’s say it’s a standard office-type fluorescent bulb, about 4 ft long. I don’t know how many watts… how about whatever is standard, 40? 60? How many photons would be emitted by such a white light source per second?

Is it easy to figure out? If so, how would one do this?

Estimates are fine, I just want a general ballpark number that has some credible thought behind it.

For a rough ballpark estimate: if your bulb uses 40 watts, that’s 40 Joules in a second. A
photon of visible light has a wavelength of around 500 nanometres; since the energy of
a photon is given by E = hc/λ, where h is Planck’s constant, c is the speed of light, and λ
is the wavelength, each photon has… ::: calculates ::: about 4 x 10^-19 Joules.
So to emit 40 Joules worth of light in a second, the bulb would have to emit about
10^20 photons in a second.

This is a very rough calculation, of course. To refine it, you’d have to take the efficiency
of the bulb into account (not all 40 watts go into light energy), and look at the spectrum
of the bulb to figure out exactly what the average photon energy is. I suspect that these
factors might cause the final answer to differ by a factor of 10 or so, but not much more
than that.
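Here is the same ballpark estimate written out as a small script (my own check of the numbers above; the 40 W and 500 nm figures come from the post, and the bulb is assumed to convert all of its power into light at that wavelength):

// Rough photon count: N = P / E_photon, with E_photon = h * c / wavelength.
var h = 6.626e-34;        // Planck's constant, J*s
var c = 3.0e8;            // speed of light, m/s
var wavelength = 500e-9;  // assumed average wavelength, 500 nm
var power = 40;           // bulb power in watts (J/s), assumed all emitted as light

var photonEnergy = h * c / wavelength;        // ~4.0e-19 J per photon
var photonsPerSecond = power / photonEnergy;  // ~1.0e20 photons per second
console.log(photonEnergy, photonsPerSecond);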

Actually, incandescent bulbs are easier. You can ignore the variations in emissivity of
the tungsten filament and treat it like a pure blackbody radiator. According to the RCA
Electro-Optics Handbook, the spectral radiance in photons per second is
n(lambda) = 2c / (lambda^4 * (exp(h*nu/(k*T)) - 1)) photons/sec-m^2-steradian-m
(p. 36) That gives you the number of photons per wavelength increment. You gotta
integrate over wavelengths to get the total number of photons.

It’s easier to get the radiant emittance integrated over all wavelengths and angles in
terms of power rather than photons – it follows the simple Stefan-Boltzmann equation
M (Watts per sq. meter) = sigma * T^4

sigma is 5.6697 x 10^-8 Watt/m^2-K^4



This site has a table of the photon emission rates of various lamps expressed in
microeinsteins per second. An Einstein represents 1 mole (6.02 x 10^23) of
photons, so a microeinstein is 6.02 x 10^17 photons. Being a plant site, the
values in the table only count photosynthetically active radiation, so there’ll be some
photons missed at the red and blue ends of the spectrum:

The standard measure that quantifies the energy available for photosynthesis is “Photosynthetic
Active Radiation” (aka “Photosynthetic Available Radiation”) or PAR. Contrary to the lumen
measure that takes into account the human eye response, PAR is an unweighted measure. It
accounts with equal weight for all the output a light source emits in the wavelength range
between 400 and 700 nm. PAR also differs from the lumen in the fact that it is not a direct
measure of energy. It is expressed in “number of photons per second”, whose relationship with
“energy per second” (power) is intermediated by the spectral curve of the light source. One
cannot be directly converted into the other without the spectral curve.

A 40 watt cool white fluorescent puts out about 42.4 microeinsteins per second. That’s
2.55 x 10^19 photons per second.

So far everyone has based their answer on the electrical power input to the bulb. As soon as I
can get around to it I’ll try to figure out how many photons are in the light output in lumens.

You’re referring to efficacy, which is usually an empirically derived number for real
world purposes, but can be theoretically approximated in many cases. Here’s a page
with some good random lighting links and info.

The page states a 100W incandescent has an efficacy of 17.1 lumens/watt.

I suspect that these factors [inefficiency and spectral distribution] might cause the final answer to
differ by a factor of 10 or so, but not much more than that.

Interestingly, when you take inefficiency into account, you’ll get more photons, not less.
Almost all of the energy which goes into a bulb comes out as light. It’s just that only a
small amount of it is visible light. Most of the energy is in infrared light, which has less
energy per photon. So you’ll need more photons total to carry the same amount of
energy.


Most packaging gives the light output in lumens for the bulb. The OP asked how many
photons of light were put out by a light bulb, not how many photons would be contained
in the power input.

So far I’ve managed to dig up that 1 candela is 1/683 W per unit solid angle, and a 1 candela
source emits 4*Pi lumens. Now as soon as I can figure out just what unit solid angle they are talking
about (1 steradian?) the rest is a downhill pull.

But all of the power output of an incandescent is in photons. The OP wasn’t clear about
‘visible light’ or not.

It’s still a worthy calculation if you’re doing it

The candela takes into account the sensitivity of the eye. From here.

Hyperphysics:

The candela is the luminous intensity, in a given direction, of a source that emits monochromatic
radiation of frequency 540 x 10^12 hertz and that has a radiant intensity in that direction of 1/683
watt per steradian.

The page on the lumen (see ‘light’, then ‘light intensity’) also explains the ‘solid angle’
thing with a diagram.

The OP wasn’t clear about ‘visible light’ or not.

Would it be asking too much for both photons of visible light and all photons?

Your question is complicated by the fact that photons of different color (wavelength) have
different energies. So you can’t simply convert output in Watts to numbers of photons without
knowing how many photons of which color are present. In the case of blackbody radiation the
relative components of each color are well known, but in other cases, like a fluorescent lamp
with visible phosphors, the problem becomes more complex. You gotta know how much light of
each color is present.

But all of the power output of an incandescent is in photons.


No, some of it will be in the form of heat, through conduction and convection.

Almost all of the energy which goes into a bulb comes out as light. It’s just that only a small
amount of it is visible light. Most of the energy is in infrared light, which has less energy per
photon.

Is this true for fluorescent bulbs as well, or just incandescent?

It’ll depend more on the housing, shade, and the like than on the bulb technology itself.
Any energy which goes into the bulb will either turn directly into light or into heat (for an
incandescent, it’s all into heat). The energy which went into heat will then leave the bulb
through one of three mechanisms: Conduction, convection, or radiation. For most light
fixtures, radiation would be the dominant form of heat transfer, which would mean that
most of the heat energy is leaving via photons (this is in fact the only mechanism by
which incandescent bulbs produce photons). What frequencies these photons are
produced at will depend on the temperature; for ordinary incandescent bulb
temperatures, most of them are infrared.


I’ve looked around some but I haven’t found any data on how much of the heat is
convected away. But you are pretty close to right that the majority is radiated in one
form or another. Even some of that which is conducted away is emitted as radiation. I’ll
withdraw my objection to using power input as a measure of how many photons come
from a light bulb.

Continuing what David Simmons & Chronos were just discussing, the filament gives off
almost 100% of its energy input as photons. Very little heat gets conducted back into
the base of the bulb down the filament supports. Now when those photons hit the
frosted glass globe, a bunch of them get absorbed and converted to heat that convects
or conducts.

So the answer depends a bunch on whether we’re talking about filament output or bulb
output.
History of the Teapot
I confirmed with Jim Blinn on August 11, 2015, that the teapot was squished because it looked nicer.
Picture here(opens in a new tab) of Blinn with a 3D printed teapot, at SIGGRAPH 2015.

There are good articles about the history of the teapot by Frank Crow(opens in a new tab), S.J.
Baker(opens in a new tab), and on Wikipedia(opens in a new tab). There are a number of iconic models
and images in computer graphics(opens in a new tab). Some famous models can be found here(opens in
a new tab) and here(opens in a new tab); my own teapot code is available(opens in a new tab). Teapots
still rule over all, with their own fan club(opens in a new tab) and teapot sightings page(opens in a new
tab). I photographed the whole collection(opens in a new tab). Oh, and Pixar made a short(opens in a
new tab).

The demos shown can be run in your browser: teapot(opens in a new tab), teaspoon(opens in a new
tab), and teacup(opens in a new tab).

The teapot sketch(opens in a new tab) is courtesy of Martin Newell, who is working to put it onto
Wikimedia Commons. The teapotahedron image is courtesy of Erin Shaw. The teapot photos are from
here(opens in a new tab) and here(opens in a new tab) on Wikimedia Commons.
Utah teapot

From Wikipedia, the free encyclopedia

A 3D STL model of the teapot

A 2008 rendering of the Utah teapot model

The Utah teapot, or the Newell teapot, is one of the standard reference test models in 3D modeling and
an in-joke[1] within the computer graphics community. It is a mathematical model of an ordinary
Melitta-brand teapot that appears solid with a nearly rotationally symmetrical body. Using a teapot
model is considered the 3D equivalent of a "Hello, World!" program, a way to create an easy 3D scene
with a somewhat complex model acting as the basic geometry for a scene with a light setup. Some
programming libraries, such as the OpenGL Utility Toolkit,[2] even have functions dedicated to drawing
teapots.

The teapot model was created in 1975 by early computer graphics researcher Martin Newell, a member
of the pioneering graphics program at the University of Utah.[3] It was one of the first to be modeled
using Bézier curves rather than precisely measured.

History

The actual Melitta teapot that Martin Newell modelled, displayed at the Computer History Museum in
Mountain View, California (1990–present)

A scan of the original diagram Martin Newell drew up to plan the Utah Teapot before inputting it
digitally. Image courtesy of the Computer History Museum.


For his work, Newell needed a simple mathematical model of a familiar object. His wife, Sandra Newell,
suggested modelling their tea set since they were sitting down for tea at the time. He sketched the
teapot free-hand using graph paper and a pencil.[4] Following that, he went back to the computer
laboratory and edited Bézier control points on a Tektronix storage tube, again by hand. [citation needed]

The teapot shape contained a number of elements that made it ideal for the graphics experiments of the
time: it was round, contained saddle points, had a genus greater than zero because of the hole in the
handle, could project a shadow on itself, and could be displayed accurately without a surface texture.

Newell made the mathematical data that described the teapot's geometry (a set of three-dimensional
coordinates) publicly available, and soon other researchers began to use the same data for their
computer graphics experiments. These researchers needed something with roughly the same
characteristics that Newell had, and using the teapot data meant they did not have to laboriously enter
geometric data for some other object. Although technical progress has meant that the act of rendering
the teapot is no longer the challenge it was in 1975, the teapot continued to be used as a reference
object for increasingly advanced graphics techniques.

Over the following decades, editions of computer graphics journals (such as the ACM SIGGRAPH's
quarterly) regularly featured versions of the teapot: faceted or smooth-shaded, wireframe, bumpy,
translucent, refractive, even leopard-skin and furry teapots were created.

Having no surface to represent its base, the original teapot model was not intended to be seen from
below. Later versions of the data set fixed this.

The real teapot is 33% taller (ratio 4:3)[5] than the computer model. Jim Blinn stated that he scaled the
model on the vertical axis during a demo in the lab to demonstrate that they could manipulate it. They
preferred the appearance of this new version and decided to save the file out of that preference.[6]

Versions of the teapot model — or sample scenes containing it — are distributed with or freely available
for nearly every current rendering and modelling program and even many graphic APIs, including
AutoCAD, Houdini, Lightwave 3D, MODO, POV-Ray, 3ds Max, and the OpenGL and Direct3D helper
libraries. Some RenderMan-compliant renderers support the teapot as a built-in geometry by calling
RiGeometry("teapot", RI_NULL). Along with the expected cubes and spheres, the GLUT library even
provides the function glutSolidTeapot() as a graphics primitive, as does its Direct3D counterpart D3DX
(D3DXCreateTeapot()). While D3DX for Direct3D 11 does not provide this functionality anymore, it is
supported in the DirectX Tool Kit.[7] Mac OS X Tiger and Leopard also include the teapot as part of
Quartz Composer; Leopard's teapot supports bump mapping. BeOS and Haiku include a small demo of a
rotating 3D teapot, intended to show off the platform's multimedia facilities.

Teapot scenes are commonly used for renderer self-tests and benchmarks.[8][9]

Original teapot model

The original, physical teapot was purchased from ZCMI (a department store in Salt Lake City) in 1974. It
was donated to the Boston Computer Museum in 1984, where it was on display until 1990. It now
resides in the ephemera collection at the Computer History Museum in Mountain View, California where
it is catalogued as "Teapot used for Computer Graphics rendering" and bears the catalogue number
X00398.1984.[10] The original teapot the Utah teapot was based on used to be available from Friesland
Porzellan, once part of the German Melitta group.[11][12] Originally it was given the rather plain name
Haushaltsteekanne ('household teapot');[13] the company only found out about their product's
reputation in 2017, whereupon they officially renamed it "Utah Teapot". It was available in three
different sizes and various colors; the one Martin Newell had used is the white "1,4L Utah Teapot".[14]

Appearances

"The Six Platonic Solids", an image that humorously adds the Utah teapot to the five standard Platonic
solids

One famous ray-traced image, by James Arvo and David Kirk in 1987,[15] shows six stone columns, five
of which are surmounted by the Platonic solids (tetrahedron, cube, octahedron, dodecahedron,
icosahedron). The sixth column supports a teapot.[16] The image is titled "The Six Platonic Solids", with
Arvo and Kirk calling the teapot "the newly discovered Teapotahedron".[15] This image appeared on the
covers of several books and computer graphic journals.

The Utah teapot sometimes appears in the "Pipes" screensaver shipped with Microsoft Windows,[17]
but only in versions prior to Windows XP, and has been included in the "polyhedra" XScreenSaver hack
since 2008.[18]
Jim Blinn (in one of his "Project MATHEMATICS!" videos) proves an amusing (but trivial) version of the
Pythagorean theorem: construct a (2D) teapot on each side of a right triangle and the area of the teapot
on the hypotenuse is equal to the sum of the areas of the teapots on the other two sides.[19]

Vulkan and OpenGL graphics APIs feature the Utah teapot along with the Stanford dragon and the
Stanford bunny on their badges.[20]

With the advent of the first computer-generated short films, and later full-length feature films, it has
become an in-joke to hide the Utah teapot in films' scenes.[21] For example, in the movie Toy Story, the
Utah teapot appears in a short tea-party scene. The teapot also appears in The Simpsons episode
"Treehouse of Horror VI" in which Homer discovers the "third dimension."[22] In The Sims 2, a picture of
the Utah teapot is one of the paintings available to buy in-game, titled "Handle and Spout".

An origami version of the teapot, folded by Tomohiro Tachi, was shown at the Tikotin Museum of
Japanese Art in Israel in a 2007–2008 exhibit.[23]

'Smithfield Utah' public sculpture in Dublin, Ireland

In Oct 2021 "Smithfield Utah" by Alan Butler which was inspired by the Utah teapot was unveiled in
Dublin, Ireland.[24][25]

In The Amazing Digital Circus episode "Candy Carrier Chaos!", the floating blue Utah teapots can be seen
after Pomni and Gummigoo clipped under the map out of bounds.

OBJ conversion

Although the original tea set by Newell can be downloaded directly, this tea set is specified using a set of
Bézier patches in a custom format, which can be difficult to import directly into many popular 3D
modeling applications. As such, a tessellated conversion of the dataset in the popular OBJ file format can
be useful. One such conversion of the complete Newell teaset is available on the University of Utah
website.
3D printing

Through 3D printing, the Utah Teapot has come full circle from being a computer model based on an
actual teapot to being an actual teapot based on the computer model. It is widely available in many
renderings in different materials from small plastic knick-knacks to a fully functional ceramic teapot. It is
sometimes intentionally rendered as a low poly object to celebrate its origin as a computer model.
[citation needed]

In 2009, a Belgian design studio, Unfold, 3D printed the Utah Teapot in ceramic with the objective of
returning the iconographic teapot to its roots as a piece of functional dishware while showing its status
as an icon of the digital world.[26]

In 2015, the California-based company Emerging Objects followed suit, but this time printed the teapot,
along with teacups and teaspoons, out of actual tea.[27]

Gallery

Wireframe Utah teapot on a 1960s-era Calcomp plotter

The Utah teapot

Environment mapping on the teapot

See also

List of common 3D test models

3DBenchy

Cornell box

Stanford bunny
Stanford dragon

Suzanne (3D model)

List of filmmaker's signatures

Lenna

References

Dunietz, Jesse (February 29, 2016). "The Most Important Object In Computer Graphics History Is This
Teapot". Nautilus. Retrieved March 3, 2019.

Mark Kilgard (February 23, 1996). "11.9 glutSolidTeapot, glutWireTeapot". www.opengl.org. Retrieved
October 7, 2011.

Torrence, Ann (2006). "Martin Newell's original teapot: Copyright restrictions prevent ACM from
providing the full text for this work". ACM SIGGRAPH 2006 Teapot on - SIGGRAPH '06. p. 29.
doi:10.1145/1180098.1180128. ISBN 978-1-59593-364-5. S2CID 23272447. Article No. 29.

"The Utah Teapot - CHM Revolution". Computer History Museum. Retrieved March 20, 2016.

"The Utah Teapot". www.holmes3d.net. Retrieved July 10, 2021.

Seymour, Mike (July 25, 2012). "Founders Series: Industry Legend Jim Blinn". fxguide.com. Archived
from the original on July 29, 2012. Retrieved April 15, 2015.

"DirectX Tool Kit". GitHub. November 29, 2022.

Wald, Ingo; Benthin, Carsten; Slusallek, Philipp (2002). "A Simple and Practical Method for Interactive
Ray Tracing of Dynamic Scenes" (PDF). Technical Report, Computer Graphics Group. Saarland University.
Archived from the original (PDF) on March 23, 2012.

Klimaszewski, K.; Sederberg, T.W. (1997). "Faster ray tracing using adaptive grids". IEEE Computer
Graphics and Applications. 17 (1): 42–51. doi:10.1109/38.576857. S2CID 29664150.

Original Utah Teapot at the Computer History Museum. September 28, 2001.

Sander, Antje; Siems, Maren; Wördemann, Wilfried; Meyer, Stefan; Janssen, Nina (2015). Siems, Maren
(ed.). Melitta und Friesland Porzellan - 60 Jahre Keramikherstellung in Varel [Melitta and Friesland
Porzellan - 60 years manufacturing of ceramics in Varel]. Schloss Museum Jever [de] (in German). Vol.
Jever Heft 33 (1 ed.). Oldenburg, Germany: Isensee Verlag [de]. ISBN 978-3-7308-1177-1. Begleitkatalog
zur Ausstellung: Jeverland - in Ton gebrannt. (48 pages)
Friesland Porzellan [@FrieslandPorzel] (March 24, 2017). "The original Utah Teapot was always
produced by Friesland. We were part of the Melitta Group once, thats right. Got yours already?" (Tweet)
– via Twitter.

"Eine Teekanne als Filmstar" (in German). Radio Bremen. Archived from the original on April 1, 2019.
Retrieved March 1, 2019.

"Teekanne 1,4l Weiß Utah Teapot" (in German). Friesland Versand GmbH. Archived from the original on
March 29, 2023. Retrieved November 15, 2023.

Arvo, James; Kirk, David (1987). "Fast ray tracing by ray classification". ACM SIGGRAPH Computer
Graphics. 21 (4): 55–64. doi:10.1145/37402.37409.

Carlson, Wayne (2007). "A Critical History of Computer Graphics and Animation". OSU.edu. Archived
from the original on February 12, 2012. Retrieved April 15, 2015.

"Windows NT Easter Egg – Pipes Screensaver". The Easter Egg Archive. Retrieved May 5, 2018.

"changelog (Added the missing Utah Teapotahedron to polyhedra)". Xscreensaver. August 10, 2008.

Project Mathematica: Theorem Of Pythagoras. NASA. 1988. Event occurs at 14:00. Retrieved July 28,
2015 – via archive.org.

Rob Williams (March 8, 2018). "Khronos Group Announces Vulkan 1.1". Techgage Networks. Retrieved
January 18, 2020.

"Tempest in a Teapot". Continuum. Winter 2006–2007. Archived from the original on July 12, 2014.

"Pacific Data Images – Homer3". Archived from the original on July 24, 2008.

"Tomohiro Tachi". Treasures of Origami Art. Tikotin Museum of Japanese Art. August 17, 2007.
Retrieved June 18, 2021.

"Dublin City Council commission of public sculpture for Smithfield Square" (PDF). Retrieved April 23,
2023.

"Central Area: Smithfield Square Lower – Sculpture Dublin". Retrieved April 23, 2023.

"Utanalog, Ceramic Utah Teapot". Unfold Design Studio. October 28, 2009. Retrieved May 12, 2015.

Virginia San Fratello & Ronald Rael (2015). "The Utah Tea Set". Emerging Objects. Retrieved May 12,
2015.

External links

Wikimedia Commons has media related to Utah teapot.


Image of Utah teapot at the Computer History Museum

Newell's teapot sketch at the Computer History Museum

S.J. Baker's History of the teapot Archived November 20, 2014, at the Wayback Machine, including patch
data

Teapot history and images, from A Critical History of Computer Graphics and Animation (Wayback
Machine copy)

WebGL teapot demonstration

History of the Teapot video from Udacity's online Interactive 3D Graphics course

The World's Most Famous Teapot - Tom Scott explains the story of Martin Newell's digital creation
(YouTube)

Standard Procedural Databases

by Eric Haines et al.

balls gears mount rings teapot tetra tree

(images rendered with POV-Ray 3.1)

This is the code described in:

Eric Haines, "A Proposal for Standard Graphics Environments," IEEE Computer Graphics and
Applications, 7(11), Nov. 1987, p. 3-5.

You can download the latest version of the SPD (currently 3.14), and also view the original IEEE CG&A
article from Nov. 1987. The code is on Github.

This software package is not copyrighted and can be used freely. All source is in K&R vanilla C (though
ANSI headers can be enabled) and has been used on many systems.

For a newer set of more realistic environments for benchmarking ray tracers (or renderers in general),
see BART: A Benchmark for Animated Ray Tracing. The focus is software that generates an animated set
of frames for a ray tracer to render. These scenes use an NFF-like language (AFF), and the authors
provide a number of tools for parsing and visualization.

This software is meant to act as a set of basic test images for ray tracing algorithms. The programs
generate databases of objects which are fairly familiar and "standard" to the graphics community, such
as the teapot, a fractal mountain, a tree, a recursively built tetrahedral structure, etc. I originally created
them for my own testing of ray tracing efficiency schemes. Since their first release other researchers
have used them to test new algorithms. In this way, research on algorithmic improvements can be
compared in a more standardized fashion. If one researcher ray-traces a car, another a tree, the
question arises, "How many cars to the tree?" With these databases we may be comparing oranges and
apples, but it's better than comparing oranges and orangutans. Using these statistics along with the
same scenes allows us to compare results in a more meaningful way.
Another interesting use for the SPD has been noted: debugging. By comparing the images and the
statistics with the output of your own ray tracer, you can detect program errors. For example, "mount"
is useful for checking if refraction rays are generated correctly, and "balls" (a.k.a. "sphereflake") can
check for the correctness of eye and reflection rays.

The images for these databases and other information about them can be found in A Proposal for
Standard Graphics Environments, IEEE Computer Graphics and Applications, vol. 7, no. 11, November
1987, pp. 3-5. See IEEE CG&A, vol. 8, no. 1, January 1988, p. 18 for the correct image of the tree
database (the only difference is that the sky is blue, not orange). The teapot database was added later.

The Neutral File Format (NFF) is the default output format from SPD programs. This format is trivial to
parse (if you can use sscanf, you can parse it), and each type of object is defined in human terms (e.g. a
cone is defined by two endpoints and radii). The basic shapes supported are polygon and polygon patch
(normal per vertex), cylinder, cone, and sphere. Note that there are primitives supported within the SPD
which are not part of NFF, e.g. heightfield, NURBS, and torus, so more elaborate programs can be
written. If a format does not support a given primitive, the primitive is tessellated and output as
polygons.
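As an illustration of how simple parsing NFF can be, here is a sketch of my own (not part of the SPD distribution) that pulls out just the sphere entries; it assumes the sphere line is a lowercase "s" followed by the center coordinates and the radius:

// Parse NFF sphere lines of the (assumed) form: "s centerX centerY centerZ radius".
function parseNffSpheres(text) {
    var spheres = [];
    text.split("\n").forEach(function (line) {
        var parts = line.trim().split(/\s+/);
        if (parts[0] === "s" && parts.length === 5) {
            spheres.push({
                center: [parseFloat(parts[1]), parseFloat(parts[2]), parseFloat(parts[3])],
                radius: parseFloat(parts[4])
            });
        }
    });
    return spheres;
}

// Example with hypothetical data: one unit sphere at the origin.
console.log(parseNffSpheres("s 0 0 0 1"));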

Ares Lagae has written libnff, a modern C++ library for parsing NFF that also supports conversion to
Wavefront OBJ.

I converted the sphereflake demo to a more modern form, for RTX hardware. The code is in the DXR-
Sphereflake directory in this code base (which now won't run, because Falcor has changed). My blog
post is here, gallery here, and longer NVIDIA post here.

Other output formats are supported:

POV-Ray 1.0

POV-Ray 2.0 to 2.2

POV-Ray 3.1
Polyray 1.4 to 1.6

Vivid 2.0

QRT 1.5

Rayshade 4.0.6

RTrace 8.0.0

Art 2.3 (from Vort)

RenderMan RIB

AutoCAD DXF [object data only]

Wavefront OBJ format (polygons only)

RenderWare RWX script file

Apple 3DMF

VRML 1.0

VRML 2.0

Alexander Enzmann receives most of the credit for creating the various file format output routines,
along with many others who contributed.

There are also reader programs for the various formats. Currently the following formats can be read and
converted:

NFF

DXF (just 3DFACEs)

OBJ

This makes the NFF format a nice, simple language for quickly creating models (whether by hand or by
program), as any NFF file can be converted to many different formats. Warnings:

The conversions tend to be verbose in many cases (e.g. there is currently no code in place to group
polygons of the same material into polygon mesh primitives used in some formats).

No real tessellation of polygons is done when needed for conversion; all that happens is that polygon
fans are created.
You might find the images you obtain are mirror reversed with some formats (e.g. VRML 2.0 files).

The Graphics Gems V code distribution has a simple z-buffer renderer by Raghu Karinthi, using NFF as
the input language.

On hashing: a sore point in mount.c, the fractal mountain generator, has been its hashing function. Mark
VandeWettering has provided a great hashing function by Bob Jenkins. To show what a difference it
makes, check out images of models made with the original hash function with a large size factor,
replacement hash function I wrote (still no cigar), and Jenkins' hash function.

For more information on the SPD, see the README.txt file included in the distribution.

Compatibility Notes

Linux

On some (all?) versions of gcc on Linux, the following correction to the code is necessary:

libinf.c, line 33:

FILE *gOutfile = stdout;

change to

FILE *gOutfile = NULL;

Research Works using SPD

Timing comparisons for the various scenes using a wide variety of free software ray tracers are
summarized in The Ray Tracing News, 3(1) (many), 6(2), 6(3), 8(3), and 10(3). Here are some research
works which have used the SPD to benchmark their ray tracers (please let me know of others; you can
always search Google Scholar for more):

Kay, Timothy L. and James T. Kajiya, "Ray Tracing Complex Scenes," Computer Graphics (SIGGRAPH '86
Proceedings), 20(4), Aug. 1986, p. 269-78.
Arvo, James and David Kirk, "Fast Ray Tracing by Ray Classification," Computer Graphics (SIGGRAPH '87
Proceedings) 21(4), July 1987, p. 55-64. Also in Tutorial: Computer Graphics: Image Synthesis, Computer
Society Press, Washington, 1988, pp. 196-205. Predates SPD, uses recursive tetrahedron.

Subramanian, K.R., "Fast Ray Tracing Using K-D Trees," Master's Thesis, Dept. of Computer Sciences,
Univ. of Texas at Austin, Dec. 1987. Uses balls, tetra, tree.

Fussell, Donald and K.R. Subramanian "Fast Ray Tracing Using K-D Trees," Technical Report TR-88-07,
Dept. of Computer Sciences, Univ. of Texas at Austin March 1988. Uses balls, tetra, tree.

Salmon, John and Jeffrey Goldsmith "A Hypercube Ray-Tracer," Proceedings of the Third Conference on
Hypercube Computers and Applications , 1988. Uses balls and mountain.

Bouatouch, Kadi and Thierry Priol, "Parallel Space Tracing: An Experience on an iPSC Hypercube," ed. N.
Magnenat-Thalmann and D. Thalmann, New Trends in Computer Graphics (Proceedings of CG
International '88), Springer-Verlag, New York, 1988, p. 170-87. Uses balls.

Priol, Thierry and Kadi Bouatouch, "Experimenting with a Parallel Ray-Tracing Algorithm on a Hypercube
Machine," Eurographics '88, Elsevier Science Publishers, Amsterdam, North-Holland, Sept. 1988, p. 243-
59. Uses balls.

Devillers, Olivier, "The Macro-Regions: an Efficient Space Subdivision Structure for Ray Tracing,"
Eurographics '89, Elsevier Science Publishers, Amsterdam, North-Holland, Sept. 1989, p. 27-38, 541.
(revised version of Technical Report 88-13, Laboratoire d'Informatique de l'Ecole Normale Superieure,
Paris, France, Nov. 1988). Uses balls, tetra.

Priol, Thierry and Kadi Bouatouch, "Static Load Balancing for a Parallel Ray Tracing on a MIMD
Hypercube," The Visual Computer, 5(1/2), March 1989, p. 109-19. Uses balls.

Green, Stuart A. and D.J. Paddon, "Exploiting Coherence for Multiprocessor Ray Tracing," IEEE Computer
Graphics and Applications, 9(6), Nov. 1989, p. 12-26. Uses balls, mount, rings, tetra.

Green, Stuart A. and D.J. Paddon, "A Highly Flexible Multiprocessor Solution for Ray Tracing," The Visual
Computer, 6(2), March 1990, p. 62-73. Uses balls, mount, rings, tetra.

Dauenhauer, David Elliot and Sudhanshu Kumar Semwal, "Approximate Ray Tracing," Proceedings of
Graphics Interface '90, Canadian Information Processing Society, Toronto, Ontario, May 1990, p. 75-82.
Uses balls, gears, tetra.

Badouel, Didier, Kadi Bouatouch, Thierry Priol, "Ray Tracing on Distributed Memory Parallel Computers:
Strategies for Distributing Computations and Data," SIGGRAPH '90 Parallel Algorithms and Architecture
for 3D Image Generation course notes, 1990. Uses mountain, rings, teapot, tetra.

Spackman, John, "Scene Decompositions for Accelerated Ray Tracing". Ph.D. Thesis, The University of
Bath, UK, 1990. Available as Bath Computer Science Technical Report 90/33.
Green, Stuart A., Parallel Processing for Computer Graphics, MIT Press/Pitman Publishing, Cambridge,
Mass./London, 1991. Uses balls, mount, rings, tetra.

Subramanian, K.R. and Donald S. Fussell, "Automatic Termination Criteria for Ray Tracing Hierarchies,"
Proceedings of Graphics Interface '91, Canadian Information Processing Society, Toronto, Ontario, June
1991, p. 93-100. Uses balls, tetra.

Spackman, John N., "The SMART Navigation of a Ray Through an Oct-tree," Computers and Graphics,
vol. 15, no. 2, June 1991, p. 185-194. Code for the ray tracer is available.

Fournier, Alain and Pierre Poulin, "A Ray Tracing Accelerator Based on a Hierarchy of 1D Sorted Lists,"
Proceedings of Graphics Interface '93, Canadian Information Processing Society, Toronto, Ontario, May
1993, p. 53-61. Uses balls, gears, tetra, tree.

Simiakakis, George, and A. Day, "Five-dimensional Adaptive Subdivision for Ray Tracing," Computer
Graphics Forum, 13(2), June 1994, p. 133-140. Uses balls, gears, mount, teapot, tetra, tree.

Matthew Quail, "Space-Time Ray-Tracing using Ray Classification," Thesis project for B.S. with Honours,
Dept. of Computing, School of Maths, Physics, Computing and Electronics, Macquarie University. Uses
mount.

Klimaszewski, Krzysztof and Thomas W. Sederberg, "Faster Ray Tracing Using Adaptive Grids," IEEE
Computer Graphics and Applications 17(1), Jan/Feb 1997, p. 42-51. Uses balls.

Havran, Vlastimil, Tomas Kopal, Jiri Bittner, and Jiri Zara, "Fast robust BSP tree traversal algorithm for ray
tracing," Journal of Graphics Tools, 2(4):15-24, 1997. Uses balls, gears, mount, and tetra.

Nakamaru, Koji and Yoshio Ohno, "Breadth-First Ray Tracing Utilizing Uniform Spatial Subdivision," IEEE
Transactions on Visualization and Computer Graphics, 3(4), Oct-Dec 1997, p. 316-328.

Havran, Vlastimil, Jiri Bittner, and J. Zara, "Ray Tracing with Rope Trees," Proceedings of SCCG'98
Conference, pp. 130-139, April 1998. Uses 5 normal SPD.

Sanna, A., P. Montuschi and M. Rossi, "A Flexible Algorithm for Multiprocessor Ray Tracing,", The
Computer Journal, 41(7), pp. 503-516, 1998. Uses spheres.

Müller, Gordon and Dieter W. Fellner, "Hybrid Scene Structuring with Application to Ray Tracing,"
Proceedings of International Conference on Visual Computing (ICVC'99), Goa, India, Feb. 1999, pp. 19-
26. Uses balls, lattice, tree.

Havran, Vlastimil, and Jiri Bittner, "Rectilinear BSP Trees for Preferred Ray Sets," Proceedings of SCCG'99
conference, pp. 171-179, April/May 1999. Uses lattice, rings, tree.

Havran, Vlastimil and Filip Sixta "Comparison of Hierarchical Grids," Ray Tracing News, 12(1), June 25,
1999. Uses all SPD. Additional statistics are available at this site
Havran, Vlastimil, "A Summary of Octree Ray Traversal Algorithms," Ray Tracing News, 12(2), December
21, 1999. Uses all SPD. Additional statistics are available at this site

Havran, Vlastimil, Jan Prikryl, and Werner Purgathofer, "Statistical Comparison of Ray-Shooting Efficiency
Schemes," Technical Report/TR-186-2-00-14, Technische Universität Wien, Institut für Computergraphik
und Algorithmen, 4 July 2000. Uses all SPD.

Havran, Vlastimil, "Heuristic Ray Shooting Algorithms", Ph.D. Thesis, Czech Technical University,
November 2000. Uses all SPD.

Koji Nakamaru and Yoshio Ohno. "Enhanced breadth-first ray tracing," Journal of Graphics Tools,
6(4):13-28, 2001. Uses all SPD. Renderings include up to a billion primitives.

Simiakakis, George, Th. Theoharis and A. M. Day, "Parallel Ray Tracing with 5D Adaptive Subdivision,"
WSCG 2001 Conference Proceedings, 2001. Uses 5 normal SPD plus teapot.

Havran, Vlastimil and Jiri Bittner: "On Improving KD-Trees for Ray Shooting", Proceedings of WSCG'2002
conference, pp. 209-217, February 2002. Also see Libor Dachs' ray tracing visualization

Simple Materials

Photos from Wikimedia Commons: shiny ball(opens in a new tab), glass ball(opens
in a new tab) and light bulb(opens in a new tab).

Try this demo out for spheres: http://mrdoob.github.com/three.js/examples/webgl_materials.html(opens in a
new tab). Notice how some of the spheres respond to the light moving through
the world.
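The difference you see comes from the material type. As a sketch (my own example, not the demo's actual code, and assuming three.js is loaded as THREE and a scene object already exists), an unlit material ignores lights entirely while a lit material responds to them:

// Two spheres with the same color: one ignores lights, one responds to them.
var geometry = new THREE.SphereGeometry(1, 32, 16);

var unlitSphere = new THREE.Mesh(geometry,
    new THREE.MeshBasicMaterial({ color: 0x2194ce }));   // flat color, no lighting
var litSphere = new THREE.Mesh(geometry,
    new THREE.MeshLambertMaterial({ color: 0x2194ce })); // shaded by the scene's lights

unlitSphere.position.x = -1.5;
litSphere.position.x = 1.5;
scene.add(unlitSphere);  // "scene" is assumed to exist already
scene.add(litSphere);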
I should note that what I’m describing is a typical desktop or laptop computer’s
GPU. Portable devices such as smart phones and tablets will usually use tile-based
rendering instead.

There’s a brief explanation of this algorithm here(opens in a new tab).

The good news is that even this type of architecture can still be controlled by
WebGL.

This question needs a rewording: instead of "At what rate...", please change that
to "Once the pipeline is full, how often do boxes come off this pipeline?" My
apologies for the confusion. Also, note that it's a true pipeline. One cutter does
his cuts and then passes it on to the next cutter, once every five seconds. That
next cutter cuts and passes it to the folder in the next five seconds, while at the
same time the first cutter is now doing cuts on a new box.
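In other words, once the pipeline is full, throughput is set by the slowest stage, not by the total time one box spends in the pipeline. A quick worked example of my own, using three hypothetical stages of five seconds each:

// Three-stage pipeline, 5 seconds per stage (hypothetical numbers).
var stageTimes = [5, 5, 5];

// Latency: total time for one box to travel through every stage.
var latency = stageTimes.reduce(function (a, b) { return a + b; }, 0);  // 15 seconds

// Throughput: once full, a finished box comes off every time the slowest stage completes.
var secondsPerBox = Math.max.apply(null, stageTimes);  // 5 seconds per box

console.log(latency, secondsPerBox);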


GPU and pipeline related articles: https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
Stalling and Starving

Bottleneck: the slowest stage in the GPU pipeline. Because every result must pass through it, the
bottleneck determines how often finished output comes off the end of the pipeline, and so it sets the
overall speed of processing.

Stalling: a stage has finished its work but cannot hand the result to the next stage, because that next
stage is still busy. The finished stage sits idle, blocked by the slower stage downstream.

Starving: a stage has nothing to work on because the stage before it has not yet delivered any data. A
fast stage sitting after a slow one spends part of its time starved, waiting for input.

Techniques to avoid bottlenecks in GPU processing in a pipeline: First in first out


and Unified Shaders.
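
As a toy model of those three terms (the stage names and times below are invented, and the model is deliberately oversimplified - a steady stream of work and no buffering between stages):

// Toy model: per-stage time to process one batch, in milliseconds (invented numbers).
var stages = [
    { name: 'application', time: 2 },
    { name: 'geometry',    time: 3 },
    { name: 'rasterizer',  time: 7 },   // slowest stage: the bottleneck
    { name: 'pixel',       time: 4 }
];

// Output can only appear as often as the bottleneck finishes.
var bottleneck = stages.reduce( function ( a, b ) { return a.time >= b.time ? a : b; } );
console.log( 'bottleneck: ' + bottleneck.name + ' -> one result every ' + bottleneck.time + ' ms' );

var b = stages.indexOf( bottleneck );
stages.forEach( function ( s, i ) {
    if ( i < b ) console.log( s.name + ' stalls  ~' + ( bottleneck.time - s.time ) + ' ms per batch (done, but cannot hand work on)' );
    if ( i > b ) console.log( s.name + ' starves ~' + ( bottleneck.time - s.time ) + ' ms per batch (waiting for work to arrive)' );
} );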

Painter’s algorithm: draw objects from back to front, based on their distance from the camera.
The most distant object is drawn first, then the next closer one, and so on; the closest object is
drawn last, so it ends up on top. The painter’s algorithm has a well-known flaw: objects can
overlap cyclically, so that part of one object is in front of another while a different part of it is
behind, and then no single drawing order is correct. See the image of the three triangles below.

Z-buffer: along with a color, each pixel also stores the distance from the camera of the surface
drawn there, called the z-depth. When a new surface is drawn at that pixel, its z-depth is
compared against the stored value, and the pixel is overwritten only if the new surface is closer.
The z-depth is indexed from 0.0 (near) to 1.0 (far).
I should note that I'm using a z-depth of 0.0 to 1.0 here for simplicity - it's also what DirectX
uses. In Unit 7 we'll see that WebGL uses -1.0 to 1.0; this is just a change in offset and scale,
the idea's the same.
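
Here is a small sketch of the depth test itself (not course code; the buffer size and colors are made up), showing why draw order stops mattering once a z-buffer is involved:

// Toy z-buffer: one depth value and one color per pixel.
var width = 4, height = 4;
var zbuffer     = new Float32Array( width * height ).fill( 1.0 );   // 1.0 = as far away as possible
var colorbuffer = new Array( width * height ).fill( 0x000000 );     // start with a black background

// Called for every fragment (candidate pixel) a rasterized triangle produces.
function drawFragment( x, y, z, color ) {
    var i = y * width + x;
    if ( z < zbuffer[ i ] ) {       // closer than what is already stored there?
        zbuffer[ i ]     = z;       // remember the new, closer depth
        colorbuffer[ i ] = color;   // and overwrite the color
    }                               // otherwise the fragment is hidden: do nothing
}

// The near (red) fragment wins no matter which order they arrive in.
drawFragment( 1, 1, 0.8, 0x0000ff );   // far, blue
drawFragment( 1, 1, 0.3, 0xff0000 );   // near, red - passes the test, overwrites blue
console.log( colorbuffer[ 1 * width + 1 ].toString( 16 ) );   // "ff0000"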

WebGL and Three.js

WebGL is an application programming interface (API) for drawing 3D graphics in the browser;
three.js is a JavaScript library built on top of it that takes care of much of the setup work.
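
As a preview, here is roughly the smallest useful three.js program - a sketch rather than course code (note that older three.js releases name some classes differently, e.g. CubeGeometry instead of BoxGeometry):

// Minimal three.js sketch: a scene, a camera, a renderer, and one spinning cube.
var scene  = new THREE.Scene();
var camera = new THREE.PerspectiveCamera( 45, window.innerWidth / window.innerHeight, 0.1, 100 );
camera.position.z = 5;

var renderer = new THREE.WebGLRenderer();
renderer.setSize( window.innerWidth, window.innerHeight );
document.body.appendChild( renderer.domElement );          // the canvas WebGL draws into

var cube = new THREE.Mesh(
    new THREE.BoxGeometry( 1, 1, 1 ),
    new THREE.MeshBasicMaterial( { color: 0x00aaff } )
);
scene.add( cube );

function animate() {
    cube.rotation.y += 0.01;              // a little motion each frame
    renderer.render( scene, camera );
    requestAnimationFrame( animate );
}
animate();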


A trip through the Graphics Pipeline 2011, part 1
July 1, 2011

This post is part of the series “A trip through the Graphics Pipeline 2011”.

It’s been awhile since I posted something here, and I figured I might
use this spot to explain some general points about graphics
hardware and software as of 2011; you can find functional
descriptions of what the graphics stack in your PC does, but usually
not the “how” or “why”; I’ll try to fill in the blanks without getting too
specific about any particular piece of hardware. I’m going to be
mostly talking about DX11-class hardware running D3D9/10/11 on
Windows, because that happens to be the (PC) stack I’m most
familiar with – not that the API details etc. will matter much past this
first part; once we’re actually on the GPU it’s all native commands.

The application
This is your code. These are also your bugs. Really. Yes, the API
runtime and the driver have bugs, but this is not one of them. Now
go fix it already.

The API runtime


You make your resource creation / state setting / draw calls to the
API. The API runtime keeps track of the current state your app has
set, validates parameters and does other error and consistency
checking, manages user-visible resources, may or may not validate
shader code and shader linkage (or at least D3D does, in OpenGL
this is handled at the driver level) maybe batches work some more,
and then hands it all over to the graphics driver – more precisely,
the user-mode driver.

The user-mode graphics driver (or UMD)


This is where most of the “magic” on the CPU side happens. If your
app crashes because of some API call you did, it will usually be in
here :). It’s called “nvd3dum.dll” (NVidia) or “atiumd*.dll” (AMD). As
the name suggests, this is user-mode code; it’s running in the same
context and address space as your app (and the API runtime) and
has no elevated privileges whatsoever. It implements a lower-level
API (the DDI) that is called by D3D; this API is fairly similar to the
one you’re seeing on the surface, but a bit more explicit about
things like memory management and such.

This module is where things like shader compilation happen. D3D
passes a pre-validated shader token stream to the UMD – i.e. it’s
already checked that the code is valid in the sense of being
syntactically correct and obeying D3D constraints (using the right
types, not using more textures/samplers than available, not
exceeding the number of available constant buffers, stuff like that).
This is compiled from HLSL code and usually has quite a number of
high-level optimizations (various loop optimizations, dead-code
elimination, constant propagation, predicating ifs etc.) applied to it –
this is good news since it means the driver benefits from all these
relatively costly optimizations that have been performed at compile
time. However, it also has a bunch of lower-level optimizations
(such as register allocation and loop unrolling) applied that drivers
would rather do themselves; long story short, this usually just gets
immediately turned into an intermediate representation (IR) and then
compiled some more; shader hardware is close enough to D3D
bytecode that compilation doesn’t need to work wonders to give
good results (and the HLSL compiler having done some of the high-
yield and high-cost optimizations already definitely helps), but
there’s still lots of low-level details (such as HW resource limits and
scheduling constraints) that D3D neither knows nor cares about, so
this is not a trivial process.

And of course, if your app is a well-known game, programmers at
NV/AMD have probably looked at your shaders and written hand-
optimized replacements for their hardware – though they better
produce the same results lest there be a scandal :). These shaders
get detected and substituted by the UMD too. You’re welcome.

More fun: Some of the API state may actually end up being
compiled into the shader – to give an example, relatively exotic (or
at least infrequently used) features such as texture borders are
probably not implemented in the texture sampler, but emulated with
extra code in the shader (or just not supported at all). This means
that there’s sometimes multiple versions of the same shader floating
around, for different combinations of API states.

Incidentally, this is also the reason why you’ll often see a delay the
first time you use a new shader or resource; a lot of the
creation/compilation work is deferred by the driver and only
executed when it’s actually necessary (you wouldn’t believe how
much unused crap some apps create!). Graphics programmers
know the other side of the story – if you want to make sure
something is actually created (as opposed to just having memory
reserved), you need to issue a dummy draw call that uses it to
“warm it up”. Ugly and annoying, but this has been the case since I
first started using 3D hardware in 1999 – meaning, it’s pretty much
a fact of life by this point, so get used to it. :)

Anyway, moving on. The UMD also gets to deal with fun stuff like all
the D3D9 “legacy” shader versions and the fixed function pipeline –
yes, all of that will get faithfully passed through by D3D. The 3.0
shader profile ain’t that bad (it’s quite reasonable in fact), but 2.0 is
crufty and the various 1.x shader versions are seriously whack –
remember 1.3 pixel shaders? Or, for that matter, the fixed-function
vertex pipeline with vertex lighting and such? Yeah, support for all
that’s still there in D3D and the guts of every modern graphics
driver, though of course they just translate it to newer shader
versions by now (and have been doing so for quite some time).

Then there’s things like memory management. The UMD will get
things like texture creation commands and need to provide space
for them. Actually, the UMD just suballocates some larger memory
blocks it gets from the KMD (kernel-mode driver); actually mapping
and unmapping pages (and managing which part of video memory
the UMD can see, and conversely which parts of system memory
the GPU may access) is a kernel-mode privilege and can’t be done
by the UMD.

But the UMD can do things like swizzling textures (unless the GPU
can do this in hardware, usually using 2D blitting units not the real
3D pipeline) and schedule transfers between system memory and
(mapped) video memory and the like. Most importantly, it can also
write command buffers (or “DMA buffers” – I’ll be using these two
names interchangeably) once the KMD has allocated them and
handed them over. A command buffer contains, well, commands :).
All your state-changing and drawing operations will be converted by
the UMD into commands that the hardware understands. As will a
lot of things you don’t trigger manually – such as uploading textures
and shaders to video memory.
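
To picture what "converting calls into commands the hardware understands" might look like, here is a deliberately invented sketch - real command formats are vendor-specific and nothing like this simple - in which the driver just appends small records to a buffer that the GPU later reads front to back:

// Invented command encoding, purely for illustration.
var CMD_SET_SHADER  = 1;
var CMD_SET_TEXTURE = 2;
var CMD_DRAW        = 3;

var commandBuffer = [];   // stands in for a chunk of GPU-addressable memory

function setShader( shaderId )            { commandBuffer.push( CMD_SET_SHADER, shaderId ); }
function setTexture( slot, textureId )    { commandBuffer.push( CMD_SET_TEXTURE, slot, textureId ); }
function draw( firstVertex, vertexCount ) { commandBuffer.push( CMD_DRAW, firstVertex, vertexCount ); }

// A few API-level calls boil down to a flat stream of numbers:
setShader( 7 );
setTexture( 0, 42 );
draw( 0, 36 );            // e.g. a 12-triangle box

console.log( commandBuffer );   // [ 1, 7, 2, 0, 42, 3, 0, 36 ]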

In general, drivers will try to put as much of the actual processing
into the UMD as possible; the UMD is user-mode code, so anything
that runs in it doesn’t need any costly kernel-mode transitions, it can
freely allocate memory, farm work out to multiple threads, and so on
– it’s just a regular DLL (even though it’s loaded by the API, not
directly by your app). This has advantages for driver development
too – if the UMD crashes, the app crashes with it, but not the whole
system; it can just be replaced while the system is running (it’s just
a DLL!); it can be debugged with a regular debugger; and so on. So
it’s not only efficient, it’s also convenient.
But there’s a big elephant in the room that I haven’t mentioned yet.

Did I say “user-mode driver”? I meant “user-mode drivers”.

As said, the UMD is just a DLL. Okay, one that happens to have the
blessing of D3D and a direct pipe to the KMD, but it’s still a regular
DLL, and it runs in the address space of its calling process.

But we’re using multi-tasking OSes nowadays. In fact, we have
been for some time.

This “GPU” thing I keep talking about? That’s a shared resource.
There’s only one that drives your main display (even if you use
SLI/Crossfire). Yet we have multiple apps that try to access it (and
pretend they’re the only ones doing it). This doesn’t just work
automatically; back in The Olden Days, the solution was to only give
3D to one app at a time, and while that app was active, all others
wouldn’t have access. But that doesn’t really cut it if you’re trying to
have your windowing system use the GPU for rendering. Which is
why you need some component that arbitrates access to the GPU
and allocates time-slices and such.

Enter the scheduler.


This is a system component – note the “the” is somewhat
misleading; I’m talking about the graphics scheduler here, not the
CPU or IO schedulers. This does exactly what you think it does – it
arbitrates access to the 3D pipeline by time-slicing it between
different apps that want to use it. A context switch incurs, at the very
least, some state switching on the GPU (which generates extra
commands for the command buffer) and possibly also swapping
some resources in and out of video memory. And of course only
one process gets to actually submit commands to the 3D pipe at
any given time.

You’ll often find console programmers complaining about the fairly
high-level, hands-off nature of PC 3D APIs, and the performance
cost this incurs. But the thing is that 3D APIs/drivers on PC really
have a more complex problem to solve than console games – they
really do need to keep track of the full current state for example,
since someone may pull the metaphorical rug from under them at
any moment! They also work around broken apps and try to fix
performance problems behind their backs; this is a rather annoying
practice that no-one’s happy with, certainly including the driver
authors themselves, but the fact is that the business perspective
wins here; people expect stuff that runs to continue running (and
doing so smoothly). You just won’t win any friends by yelling “BUT
IT’S WRONG!” at the app and then sulking and going through an
ultra-slow path.

Anyway, on with the pipeline. Next stop: Kernel mode!

The kernel-mode driver (KMD)


This is the part that actually deals with the hardware. There may be
multiple UMD instances running at any one time, but there’s only
ever one KMD, and if that crashes, then boom you’re dead – used
to be “blue screen” dead, but by now Windows actually knows how
to kill a crashed driver and reload it (progress!). As long as it
happens to be just a crash and not some kernel memory corruption
at least – if that happens, all bets are off.

The KMD deals with all the things that are just there once. There’s
only one GPU memory, even though there’s multiple apps fighting
over it. Someone needs to call the shots and actually allocate (and
map) physical memory. Similarly, someone must initialize the GPU
at startup, set display modes (and get mode information from
displays), manage the hardware mouse cursor (yes, there’s HW
handling for this, and yes, you really only get one! :), program the
HW watchdog timer so the GPU gets reset if it stays unresponsive
for a certain time, respond to interrupts, and so on. This is what the
KMD does.

There’s also this whole content protection/DRM bit about setting up
a protected/DRM’ed path between a video player and the GPU so
that the actual precious decoded video pixels aren’t visible to any
dirty user-mode code that might do awful forbidden things like dump
them to disk (…whatever). The KMD has some involvement in that
too.

Most importantly for us, the KMD manages the actual command
buffer. You know, the one that the hardware actually consumes.
The command buffers that the UMD produces aren’t the real deal –
as a matter of fact, they’re just random slices of GPU-addressable
memory. What actually happens with them is that the UMD finishes
them, submits them to the scheduler, which then waits until that
process is up and then passes the UMD command buffer on to the
KMD. The KMD then writes a call to that command buffer into the main
command buffer, and depending on whether the GPU command
processor can read from main memory or not, it may also need to
DMA it to video memory first. The main command buffer is usually a
(quite small) ring buffer – the only thing that ever gets written there
is system/initialization commands and calls to the “real”, meaty 3D
command buffers.

But this is still just a buffer in memory right now. Its position is
known to the graphics card – there’s usually a read pointer, which is
where the GPU is in the main command buffer, and a write pointer,
which is how far the KMD has written the buffer yet (or more
precisely, how far it has told the GPU it has written yet). These are
hardware registers, and they are memory-mapped – the KMD
updates them periodically (usually whenever it submits a new chunk
of work)…

The bus
…but of course that write doesn’t go directly to the graphics card (at
least unless it’s integrated on the CPU die!), since it needs to go
through the bus first – usually PCI Express these days. DMA
transfers etc. take the same route. This doesn’t take very long, but
it’s yet another stage in our journey. Until finally…

The command processor!


This is the frontend of the GPU – the part that actually reads the
commands the KMD writes. I’ll continue from here in the next
installment, since this post is long enough already :)

Small aside: OpenGL


OpenGL is fairly similar to what I just described, except there’s not
as sharp a distinction between the API and UMD layer. And unlike
D3D, the (GLSL) shader compilation is not handled by the API at all,
it’s all done by the driver. An unfortunate side effect is that there are
as many GLSL frontends as there are 3D hardware vendors, all of
them basically implementing the same spec, but with their own bugs
and idiosyncrasies. Not fun. And it also means that the drivers have
to do all the optimizations themselves whenever they get to see the
shaders – including expensive optimizations. The D3D bytecode
format is really a cleaner solution for this problem – there’s only one
compiler (so no slightly incompatible dialects between different
vendors!) and it allows for some costlier data-flow analysis than you
would normally do.

Omissions and simplifications


This is just an overview; there’s tons of subtleties that I glossed
over. For example, there’s not just one scheduler, there’s multiple
implementations (the driver can choose); there’s the whole issue of
how synchronization between CPU and GPU is handled that I didn’t
explain at all so far. And so on. And I might have forgotten
something important – if so, please tell me and I’ll fix it! But now,
bye and hopefully see you next time.

A trip through the Graphics Pipeline 2011, part 2
July 2, 2011

This post is part of the series “A trip through the Graphics Pipeline 2011”.

Not so fast.
In the previous part I explained the various stages that your 3D
rendering commands go through on a PC before they actually get
handed off to the GPU; short version: it’s more than you think. I then
finished by name-dropping the command processor and how it
actually finally does something with the command buffer we
meticulously prepared. Well, how can I say this – I lied to you. We’ll
indeed be meeting the command processor for the first time in this
installment, but remember, all this command buffer stuff goes
through memory – either system memory accessed via PCI
Express, or local video memory. We’re going through the pipeline in
order, so before we get to the command processor, let’s talk
memory for a second.

The memory subsystem


GPUs don’t have your regular memory subsystem – it’s different
from what you see in general-purpose CPUs or other hardware,
because it’s designed for very different usage patterns. There’s two
fundamental ways in which a GPU’s memory subsystem differs from
what you see in a regular machine:

The first is that GPU memory subsystems are fast. Seriously fast. A
Core i7 2600K will hit maybe 19 GB/s memory bandwidth – on a
good day. With tail wind. Downhill. A GeForce GTX 480, on the
other hand, has a total memory bandwidth of close to 180 GB/s –
nearly an order of magnitude difference! Whoa.

The second is that GPU memory subsystems are slow. Seriously
slow. A cache miss to main memory on a Nehalem (first-generation
Core i7) takes about 140 cycles if you multiply the memory latency
as given by AnandTech by the clock rate. The GeForce GTX 480 I
mentioned previously has a memory access latency of 400-800
clocks. So let’s just say that, measured in cycles, the GeForce GTX
480 has a bit more than 4x the average memory latency of a Core
i7. Except that Core i7 I just mentioned is clocked at 2.93GHz,
whereas GTX 480 shader clock is 1.4 GHz – that’s it, another 2x
right there. Woops – again, nearly an order of magnitude difference!
Wait, something funny is going on here. My common sense is
tingling. This must be one of those trade-offs I keep hearing about in
the news!

Yep – GPUs get a massive increase in bandwidth, but they pay for it
with a massive increase in latency (and, it turns out, a sizable hit in
power draw too, but that’s beyond the scope of this article). This is
part of a general pattern – GPUs are all about throughput over
latency; don’t wait for results that aren’t there yet, do something else
instead!
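
Restating that comparison in absolute time, as a back-of-the-envelope sketch using the figures above (and picking roughly 600 cycles out of the 400-800 range):

// Rough conversion of the numbers above into nanoseconds.
var cpuLatencyNs = 140 / 2.93;    // ~48 ns: 140 cycles at 2.93 GHz
var gpuLatencyNs = 600 / 1.4;     // ~429 ns: ~600 cycles at the 1.4 GHz shader clock
console.log( ( gpuLatencyNs / cpuLatencyNs ).toFixed( 1 ) + 'x the latency' );   // roughly 9x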

That’s almost all you need to know about GPU memory, except for
one general DRAM tidbit that will be important later on: DRAM chips
are organized as a 2D grid – both logically and physically. There’s
(horizontal) row lines and (vertical) column lines. At each
intersection between such lines is a transistor and a capacitor; if at
this point you want to know how to actually build memory from these
ingredients, Wikipedia is your friend. Anyway, the salient point
here is that the address of a location in DRAM is split into a row
address and a column address, and DRAM reads/writes internally
always end up accessing all columns in the given row at the same
time. What this means is that it’s much cheaper to access a swath
of memory that maps to exactly one DRAM row than it is to access
the same amount of memory spread across multiple rows. Right
now this may seem like just a random bit of DRAM trivia, but this will
become important later on; in other words, pay attention: this will be
on the exam. But to tie this up with the figures in the previous
paragraphs, just let me note that you can’t reach those peak
memory bandwidth figures above by just reading a few bytes all
over memory; if you want to saturate memory bandwidth, you better
do it one full DRAM row at a time.
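
A tiny sketch of the row/column split (the row size here is invented; real DRAM geometries vary):

// Pretend each DRAM row holds 2 KB; an address then splits into a row and a column.
var ROW_BYTES = 2048;
function dramRow( address )    { return Math.floor( address / ROW_BYTES ); }
function dramColumn( address ) { return address % ROW_BYTES; }

// 2 KB read from one row: one row activation, cheap.
console.log( dramRow( 0 ), dramRow( 2047 ) );                    // 0 0
// The same amount of data scattered across rows pays the row cost over and over.
console.log( dramRow( 0 ), dramRow( 4096 ), dramRow( 8192 ) );   // 0 2 4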

The PCIe host interface


From a graphics programmer standpoint, this piece of hardware
isn’t super-interesting. Actually, the same probably goes for a GPU
hardware architect too. The thing is, you still start caring about it
once it’s so slow that it’s a bottleneck. So what you do is get good
people on it to do it properly, to make sure that doesn’t happen.
Other than that, well, this gives the CPU read/write access to video
memory and a bunch of GPU registers, the GPU read/write access
to (a portion of) main memory, and everyone a headache because
the latency for all these transactions is even worse than memory
latency because the signals have to go out of the chip, into the slot,
travel a bit across the mainboard then get to someplace in the CPU
about a week later (or that’s how it feels compared to the CPU/GPU
speeds anyway). The bandwidth is decent though – up to about
8GB/s (theoretical) peak aggregate bandwidth across the 16-lane
PCIe 2.0 connections that most GPUs use right now, so between
half and a third of the aggregate CPU memory bandwidth; that’s a
usable ratio. And unlike earlier standards like AGP, this is a
symmetrical point-to-point link – that bandwidth goes both
directions; AGP had a fast channel from the CPU to the GPU, but
not the other way round.

Some final memory bits and pieces


Honestly, we’re very very close to actually seeing 3D commands
now! So close you can almost taste them. But there’s one more
thing we need to get out of the way first. Because now we have two
kinds of memory – (local) video memory and mapped system
memory. One is about a day’s worth of travel to the north, the other
is a week’s journey to the south along the PCI Express highway.
Which road do we pick?

The easiest solution: Just add an extra address line that tells you
which way to go. This is simple, works just fine and has been done
plenty of times. Or maybe you’re on a unified memory architecture,
like some game consoles (but not PCs). In that case, there’s no
choice; there’s just the memory, which is where you go, period. If
you want something fancier, you add a MMU (memory management
unit), which gives you a fully virtualized address space and allows
you to pull nice tricks like having frequently accessed parts of a
texture in video memory (where they’re fast), some other parts in
system memory, and most of it not mapped at all – to be conjured
up from thin air, or, more usually, by a magic disk read that will
only take about 50 years or so – and by the way, this is not
hyperbole; if you stay with the “memory access = 1 day” metaphor,
that’s really how long a single HD read takes. A quite fast one at
that. Disks suck. But I digress.

So, MMU. It also allows you to defragment your video memory
address space without having to actually copy stuff around when
you start running out of video memory. Nice thing, that. And it
makes it much easier to have multiple processes share the same
GPU. It’s definitely allowed to have one, but I’m not actually sure if
it’s a requirement or not, even though it’s certainly really nice to
have (anyone care to help me out here? I’ll update the article if I get
clarification on this, but tbh right now I just can’t be arsed to look it
up). Anyway, a MMU/virtual memory is not really something you can
just add on the side (not in an architecture with caches and memory
consistency concerns anyway), but it really isn’t specific to any
particular stage – I have to mention it somewhere, so I just put it
here.

There’s also a DMA engine that can copy memory around without
having to involve any of our precious 3D hardware/shader cores.
Usually, this can at least copy between system memory and video
memory (in both directions). It often can also copy from video
memory to video memory (and if you have to do any VRAM
defragmenting, this is a useful thing to have). It usually can’t do
system memory to system memory copies, because this is a GPU,
not a memory copying unit – do your system memory copies on the
CPU where they don’t have to pass through PCIe in both directions!

Update: I’ve drawn a picture (link since this layout is too narrow to
put big diagrams in the text). This also shows some more details –
by now your GPU has multiple memory controllers, each of which
controls multiple memory banks, with a fat hub in the front.
Whatever it takes to get that bandwidth. :)
Okay, checklist. We have a command buffer prepared on the CPU.
We have the PCIe host interface, so the CPU can actually tell us
about this, and write its address to some register. We have the logic
to turn that address into a load that will actually return data – if it’s
from system memory it goes through PCIe, if we decide we’d rather
have the command buffer in video memory, the KMD can set up a
DMA transfer so neither the CPU nor the shader cores on the GPU
need to actively worry about it. And then we can get the data from
our copy in video memory through the memory subsystem. All paths
accounted for, we’re set and finally ready to look at some
commands!

At long last, the command processor!


Our discussion of the command processor starts, as so many things
do these days, with a single word:

“Buffering…”

As mentioned above, both of our memory paths leading up to here
are high-bandwidth but also high-latency. For most later bits in the
GPU pipeline, the method of choice to work around this is to run lots
of independent threads. But in this case, we only have a single
command processor that needs to chew through our command
buffer in order (since this command buffer contains things such as
state changes and rendering commands that need to be executed in
the right sequence). So we do the next best thing: Add a large
enough buffer and prefetch far enough ahead to avoid hiccups.

From that buffer, it goes to the actual command processing front
end, which is basically a state machine that knows how to parse
commands (with a hardware-specific format). Some commands deal
with 2D rendering operations – unless there’s a separate command
processor for 2D stuff and the 3D frontend never even sees it.
Either way, there’s still dedicated 2D hardware hidden on modern
GPUs, just as there’s a VGA chip somewhere on that die that still
supports text mode, 4-bit/pixel bit-plane modes, smooth scrolling
and all that stuff. Good luck finding any of that on the die without a
microscope. Anyway, that stuff exists, but henceforth I shall not
mention it again. :) Then there’s commands that actually hand some
primitives to the 3D/shader pipe, woo-hoo! I’ll talk about them in
upcoming parts. There’s also commands that go to the 3D/shader
pipe but never render anything, for various reasons (and in various
pipeline configurations); these are up even later.

Then there’s commands that change state. As a programmer, you
think of them as just changing a variable, and that’s basically what
happens. But a GPU is a massively parallel computer, and you can’t
just change a global variable in a parallel system and hope that
everything works out OK – if you can’t guarantee that everything will
work by virtue of some invariant you’re enforcing, there’s a bug and
you will hit it eventually. There’s several popular methods, and
basically all chips use different methods for different types of state.

• Whenever you change a state, you require that all
pending work that might refer to that state be
finished (i.e. basically a partial pipeline flush).
Historically, this is how graphics chips handled most
state changes – it’s simple and not that costly if you
have a low number of batches, few triangles and a
short pipeline. Alas, batch and triangle counts have
gone up and pipelines have gotten long, so the cost
for this type of approach has shot up. It’s still alive
and kicking for stuff that’s either changed
infrequently (a dozen partial pipeline flushes aren’t
that big a deal over the course of a whole frame) or
just too expensive/difficult to implement with more
specific schemes though.
• You can make hardware units completely stateless.
Just pass the state change command through up to
the stage that cares about it; then have that stage
append the current state to everything it sends
downstream, every cycle. It’s not stored anywhere –
but it’s always around, so if some pipeline stage
wants to look at a few bits in the state it can,
because they’re passed in (and then passed on to
the next stage). If your state happens to be just a
few bits, this is fairly cheap and practical. If it
happens to be the full set of active textures along
with texture sampling state, not so much.
• Sometimes storing just one copy of the state and
having to flush every time that stage changes
serializes things too much, but things would really be
fine if you had two copies (or maybe four?) so your
state-setting frontend could get a bit ahead. Say you
have enough registers (“slots”) to store two versions
of every state, and some active job references slot
0. You can safely modify slot 1 without stopping that
job, or otherwise interfering with it at all. Now you
don’t need to send the whole state around through
the pipeline – only a single bit per command that
selects whether to use slot 0 or 1. Of course, if both
slot 0 and 1 are busy by the time a state change
command is encountered, you still have to wait, but
you can get one step ahead. The same technique
works with more than two slots.
• For some things like sampler or texture Shader
Resource View state, you could be setting very large
numbers of them at the same time, but chances are
you aren’t. You don’t want to reserve state space for
2*128 active textures just because you’re keeping
track of 2 in-flight state sets so you might need it.
For such cases, you can use a kind of register
renaming scheme – have a pool of 128 physical
texture descriptors. If someone actually needs 128
textures in one shader, then state changes are
gonna be slow. (Tough break). But in the more likely
case of an app using less than 20 textures, you have
quite some headroom to keep multiple versions
around.
This is not meant to be a comprehensive list – but the main point is
that something that looks as simple as changing a variable in your
app (and even in the UMD/KMD and the command buffer for that
matter!) might actually need a nontrivial amount of supporting
hardware behind it just to prevent it from slowing things down.
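
As a toy illustration of the "two slots" idea from the list above (names and structure are mine, not any real hardware's): the frontend writes into whichever slot in-flight work is not reading, and each draw only needs one bit to say which slot it uses.

// Two copies ("slots") of one piece of state, e.g. the current blend settings.
var blendSlots = [ null, null ];

// gpuBusyWithSlot: which slot the currently running job references (0 or 1).
function setBlendState( newState, gpuBusyWithSlot ) {
    var writeSlot = ( gpuBusyWithSlot === 0 ) ? 1 : 0;   // touch only the idle slot
    blendSlots[ writeSlot ] = newState;
    return writeSlot;    // the single bit that subsequent draw commands carry
}

var slot = setBlendState( { src: 'alpha', dst: 'one-minus-alpha' }, 0 );
console.log( 'next draw reads blend state from slot ' + slot );   // slot 1, no flush needed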

Synchronization
Finally, the last family of commands deals with CPU/GPU and
GPU/GPU synchronization.

Generally, all of these have the form “if event X happens, do Y”. I’ll
deal with the “do Y” part first – there’s two sensible options for what
Y can be here: it can be a push-model notification where the GPU
yells at the CPU to do something right now (“Oi! CPU! I’m entering
the vertical blanking interval on display 0 right now, so if you want to
flip buffers without tearing, this would be the time to do it!”), or it can
be a pull-model thing where the GPU just memorizes that
something happened and the CPU can later ask about it (“Say,
GPU, what was the most recent command buffer fragment you
started processing?” – “Let me check… sequence id 303.”). The
former is typically implemented using interrupts and only used for
infrequent and high-priority events because interrupts are fairly
expensive. All you need for the latter is some CPU-visible GPU
registers and a way to write values into them from the command
buffer once a certain event happens.

Say you have 16 such registers. Then you could
assign currentCommandBufferSeqId to register 0. You assign a
sequence number to every command buffer you submit to the GPU
(this is in the KMD), and then at the start of each command buffer,
you add a “If you get to this point in the command buffer, write to
register 0”. And voila, now we know which command buffer the GPU
is currently chewing on! And we know that the command processor
finishes commands strictly in sequence, so if the first command in
command buffer 303 was executed, that means all command
buffers up to and including sequence id 302 are finished and can
now be reclaimed by the KMD, freed, modified, or turned into a
cheesy amusement park.
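
A toy version of that sequence-ID scheme, with plain JavaScript standing in for the KMD and a single CPU-visible GPU register (everything here is invented for illustration):

// Register 0: the CPU-visible register that command buffers write their sequence id into.
var register0 = 0;

// What the KMD conceptually puts at the start of every command buffer it submits:
function fenceCommand( seqId ) {
    return function () { register0 = seqId; };   // "if you get here, write seqId to register 0"
}

// CPU side: everything with a sequence id below what the GPU reports is finished.
function reclaimable( submittedBuffers ) {
    return submittedBuffers.filter( function ( buf ) { return buf.seqId < register0; } );
}

// Simulate the GPU reaching the start of command buffer 303:
fenceCommand( 303 )();
console.log( reclaimable( [ { seqId: 301 }, { seqId: 302 }, { seqId: 303 } ] ) );
// -> buffers 301 and 302 can be freed; 303 is still being chewed on.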

We also now have an example of what X could be: “if you get here”
– perhaps the simplest example, but already useful. Other examples
are “if all shaders have finished all texture reads coming from
batches before this point in the command buffer” (this marks safe
points to reclaim texture/render target memory), “if rendering to all
active render targets/UAVs has completed” (this marks points at
which you can actually safely use them as textures), “if all
operations up to this point are fully completed”, and so on.

Such operations are usually called “fences”, by the way. There’s
different methods of picking the values you write into the status
registers, but as far as I am concerned, the only sane way to do it is
to use a sequential counter for this (probably stealing some of the
bits for other information). Yeah, I’m really just dropping that one
piece of random information without any rationale whatsoever here,
because I think you should know. I might elaborate on it in a later
blog post (though not in this series) :).

So, we got one half of it – we can now report status back from the
GPU to the CPU, which allows us to do sane memory management
in our drivers (notably, we can now find out when it’s safe to actually
reclaim memory used for vertex buffers, command buffers, textures
and other resources). But that’s not all of it – there’s a puzzle piece
missing. What if we need to synchronize purely on the GPU side, for
example? Let’s go back to the render target example. We can’t use
that as a texture until the rendering is actually finished (and some
other steps have taken place – more details on that once I get to the
texturing units). The solution is a “wait”-style instruction: “Wait until
register M contains value N”. This can either be a compare for
equality, or less-than (note you need to deal with wraparounds
here!), or more fancy stuff – I’m just going with equals for simplicity.
This allows us to do the render target sync before we submit a
batch. It also allows us to build a full GPU flush operation: “Set
register 0 to ++seqId if all pending jobs finished” / “Wait until register
0 contains seqId”. Done and done. GPU/GPU synchronization:
solved – and until the introduction of DX11 with Compute Shaders
that have another type of more fine-grained synchronization, this
was usually the only synchronization mechanism you had on the
GPU side. For regular rendering, you simply don’t need more.

By the way, if you can write these registers from the CPU side, you
can use this the other way too – submit a partial command buffer
including a wait for a particular value, and then change the register
from the CPU instead of the GPU. This kind of thing can be used to
implement D3D11-style multithreaded rendering where you can
submit a batch that references vertex/index buffers that are still
locked on the CPU side (probably being written to by another
thread). You simply stuff the wait just in front of the actual render
call, and then the CPU can change the contents of the register once
the vertex/index buffers are actually unlocked. If the GPU never got
that far in the command buffer, the wait is now a no-op; if it did, it
spent some (command processor) time spinning until the data was
actually there. Pretty nifty, no? Actually, you can implement this kind
of thing even without CPU-writeable status registers if you can
modify the command buffer after you submit it, as long as there’s a
command buffer “jump” instruction. The details are left to the
interested reader :)

Of course, you don’t necessarily need the set register/wait register
model; for GPU/GPU synchronization, you can just as simply have a
“rendertarget barrier” instruction that makes sure a rendertarget is
safe to use, and a “flush everything” command. But I like the set
register-style model more because it kills two birds (back-reporting
of in-use resources to the CPU, and GPU self-synchronization) with
one well-designed stone.

Update: Here, I’ve drawn a diagram for you. It got a bit convoluted
so I’m going to lower the amount of detail in the future. The basic
idea is this: The command processor has a FIFO in front, then the
command decode logic, execution is handled by various blocks that
communicate with the 2D unit, 3D front-end (regular 3D rendering)
or shader units directly (compute shaders), then there’s a block that
deals with sync/wait commands (which has the publicly visible
registers I talked about), and one unit that handles command buffer
jumps/calls (which changes the current fetch address that goes to
the FIFO). And all of the units we dispatch work to need to send us
back completion events so we know when e.g. textures aren’t being
used anymore and their memory can be reclaimed.

Closing remarks
Next step down is the first one doing any actual rendering work.
Finally, only 3 parts into my series on GPUs, we actually start
looking at some vertex data! (No, no triangles being rasterized yet.
That will take some more time).

Actually, at this stage, there’s already a fork in the pipeline; if we’re
running compute shaders, the next step would already be …
running compute shaders. But we aren’t, because compute shaders
are a topic for later parts! Regular rendering pipeline first.

Small disclaimer: Again, I’m giving you the broad strokes here,
going into details where it’s necessary (or interesting), but trust me,
there’s a lot of stuff that I dropped for convenience (and ease of
understanding). That said, I don’t think I left out anything really
important. And of course I might’ve gotten some things wrong. If you
find any bugs, tell me!

Until the next part…
