Introduction To Interactive 3D CG Note
Prerequisites:
No experience required
This course will teach you the principles of 3D computer graphics: meshes, transforms,
lighting, animation, and making interactive 3D applications run in a browser.
Introduction
First demo shown - three.js WebGL materials demo with texture from Humus.
Second demo - three.js WebGL materials bumpmap skin demo with the Lee Perry-Smith head.
If you have problems seeing these demos in your browser, please check out the WebGL Troubleshooting page. Still stuck? We'll work more on getting you set up a few lessons from now.
You may want to visit Eric Haines' blog - Real-Time Rendering - or Twitter feed - @pointinpolygon.
Interactive 3D Rendering
Photographs throughout this course are from Wikimedia Commons, unless otherwise noted: building, XYZ, interactive.
You can try the brain program yourself. Chrome is the best browser for it. If it doesn't work for you, don't worry - the next lesson will help guide you through setting up WebGL on your machine.
WebGL Setup
To see if your machine is set up right, try this site. If your machine doesn't show a spinning cube, read our help page or go to this page and look under "Implementations". For Safari on the Mac, follow these instructions.
An excellent summary of what supports WebGL is here. Some graphics cards are blacklisted because their drivers are old, will never be updated, and won't work with WebGL. See this page for more information. It's possible to override the blacklisting in Firefox; see this article - this might be an acceptable "if all else fails" solution, since this course will be using fairly vanilla WebGL features. Google Chrome has blacklisted Windows XP, so there's a similar workaround. If all else fails, try different browsers, as they have different limitations.
WARNING! The demo on the next page has my voice blaring out at a loud level. Be ready to turn down
your volume.
The Wikipedia page on motion blur gives a start on the topic.
Some applications will aim to avoid a rate between 30 and 60 FPS, since then the frame rate doesn't align with the refresh rate. This video explains in detail how this mismatch can cause a sense that the interaction with the application is not smooth. That said, many games simply strive for as fast a rate as possible.
Math Refresher
This question is simplifying the situation. The 50 Hz rate is actually the interlaced field update rate for European TV; the frame rate is then 25 Hz. Here we're interested in the time between field refreshes at 50 Hz. If you want to learn more about this aspect of television broadcasting, see this page. In the field of computer graphics we typically don't have interlacing, so this distinction does not exist.
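To make the arithmetic concrete, here is a small sketch in plain JavaScript (not from the course materials) that computes the time between refreshes for the rates mentioned above:

// Time between refreshes: 1000 ms divided by the refresh (or field) rate.
const rates = [25, 30, 50, 60]; // Hz: EU frame rate, US frame rate, EU field rate, US field rate
for (const hz of rates) {
  console.log(`${hz} Hz -> ${(1000 / hz).toFixed(1)} ms between refreshes`);
}
// 25 Hz -> 40.0 ms, 30 Hz -> 33.3 ms, 50 Hz -> 20.0 ms, 60 Hz -> 16.7 ms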
The Eye
See the Wikipedia article on the eye for some truly amazing facts about different types of eyes.
Incognito is a great book on the brain. The first part is all about how the brain interprets what the eye sees.
Seeing Is Believing
You can find the original images for the illusion at http://persci.mit.edu/gallery/checkershadow, along with a full explanation of how it works.
For this question, compare the human visual system (eye and brain together) to the lens mechanism of a
camera.
Eyes are fascinating organs, especially since there is a wide range of designs. See the Wikipedia article on the eye, http://en.wikipedia.org/wiki/Eye, for some truly amazing knowledge.
Real-Time Rendering
We all have dumb little blind spots. As a kid, I thought “Achilles” was pronounced “a-chi-elz” and,
heaven knows how, “etiquette” was somehow “eh-teak”. When you say goofy things to other people,
someone eventually corrects you. However, if most of the people around you are making the same
mistake (I’m sorry, “nuclear” is not pronounced “new-cue-lar”, it just ain’t so), the error never gets
corrected. I’ve already mentioned the faux pas of pronouncing SIGGRAPH as “see-graph”, which seems
to be popular among non-researchers (well, admittedly there’s no “correct” pronunciation on that one,
it’s just that when the conference was small and mostly researchers that “sih-graph” was the way to say
it. If the majority now say “see-graph”, so be it – you then identify yourself as a general attendee or a
sales person and I can feel superior to you for no valid reason, thanks).
Certain spelling errors persist in computer graphics, perhaps because it’s more work to give feedback on
writing mistakes. We also see others make the same mistakes and assume they’re correct. So, here are
the two I believe are the most popular goofs in computer graphics (and I can attest that I used to make
them myself, once upon a time):
Tesselation – that’s incorrect, it’s “tessellation”. By all rules of English, this word truly should have just
one “l”: relation, violation, adulation, ululation, emulation, and on and on, they have just one “l”. The
only exceptions I could find with two “l”s were “collation”, “illation” (what the heck is that?), and a word
starting with “fe” (I don’t want this post to get filtered).
The word “tessellation” is derived from “tessella” (plural “tessellae”), which is a small piece of stone or
glass used in a mosaic. It’s the diminutive of “tessera”, which can also mean a small tablet or block used
as a ticket or token (but “tessella” is never a small ticket). Whatever. In Ionic Greek “tesseres” means
“four”, so “tessella” makes sense as being a small four-sided thing. For me, knowing that “tessella” is
from the ancient Greek word for a piece in a mosaic somehow helps me to catch my spelling of it –
maybe it will work for you. I know that in typing “tessella” in this post I still first put a single “l”
numerous times, that’s what English tells me to do.
Google test: searching on “tessellation” on Google gives 2,580,000 pages. Searching on “tesselation -
tessellation”, which gives only pages with the misspelled version, gives 1,800,000 pages. It’s nice to see
that the correct spelling still outnumbers the incorrect, but the race is on. That said, this sort of test is
accurate to within, say, plus or minus 350%. If you search on “tessellation -tesselation”, which should
give a smaller number of pages (subtracting out those that I assume say “‘tesselation’ is a misspelling of
‘tessellation'” or that reference a paper with “tesselation” in the title), you get 8,450,000! How you can
get more than 3 times as many pages as just searching on “tessellation” is a mystery. Finally, searching
on “tessellation tesselation”, both words on the same page, gives 3,150,000 results. Makes me want to
go count those pages by hand. No it doesn’t.
One other place to search is the ACM Digital Library. There are 2,973 entries with “tessellation” in them,
375 with “tesselation”. To search just computer graphics publications, GRAPHBIB is a bit clunky but will
do: 89 hits for “tessellation”, 18 hits for the wrong one. Not terrible, but that’s still a solid 20% incorrect.
Frustrum – that’s incorrect, it’s “frustum” (plural “frusta”, which even looks wrong to me – I want to say
“frustra”). The word means a (finite) cone or pyramid with the tip chopped off, and we use it (always) to
mean the pyramidal volume in graphics. I don’t know why the extra “r” got into this word for some
people (myself included). Maybe it’s because the word then sort-of rhymes with itself, the “ru” from the
first part mirrored in the second. But “frustra” looks even more correct to me, no idea why. Maybe it’s
that it rolls off the tongue better.
Morgan McGuire pointed this one out to me as the most common misspelling he sees. As a professor,
he no doubt spends more time teaching about frusta than tessellations. Using the wildly-inaccurate
Google test, there are 673,000 frustum pages and 363,000 “frustrum -frustum” pages. And, confusingly,
again, 2,100,000 “frustum -frustrum” pages, more than three times as many pages as just “frustum”.
Please explain, someone. For the digital library, 1,114 vs. 53. For GRAPHBIB I was happy to see 42 hits vs.
just 1 hit (“General Clipping on an Oblique Viewing Frustrum”).
So the frustum misspell looks like one that is less likely at the start and is almost gone by the time
practitioners are publishing articles, vs. the tessellation misspell, which appears to have more staying
power.
Addenda: Aaron Hertzmann notes that the US and Britain double their letters differently (“calliper”?
That’s just unnatural, Brits). He also notes the Oxford English Dictionary says about tessellate: “(US also
tesselate)”. Which actually is fine with me, except for the fact that Microsoft Word, Google’s
spellchecker, and even this blog’s software flags “tesselate” as a misspelling. If only we had the
equivalent of the Académie française to decide how we all should spell (on second thought, no).
Spike Hughes notes: “I think the answer for ‘frustrum’ is that it starts out like ‘frustrate’ (and indeed,
seems logically related: the pyramid WANTS to go all the way to the eye point, but is frustrated by the
near-plane).” This makes a lot of sense to me, and would explain why “frustra” feels even more correct.
Maybe that’s the mnemonic aid, like how with “it’s” vs. “its” there’s “It’s a wise dog that knows its own
fleas”. You don’t have to remember the spelling of each “its”, just remember that they differ; then
knowing “it’s” is “it is” means you can derive that the possessive “its” doesn’t have an apostrophe. Or
something. So maybe, “Don’t get frustrated when drawing a frustum”, remembering that they differ.
Andrew Glassner offers: “There’s no rum in a frustum,” because the poor thing has the top chopped off,
so all the rum we poured inside has evaporated.
3D Scene
You can look at the demo shown in the video by going to this link, letting it load and then clicking “Start”.
For an in-depth overview of how three.js labels elements in a scene, see this page.
In case you're not watching the video and just doing the quiz, the answer should really be in terms of
pixels per second. Also, please don't use commas or periods in your answer, just numerals.
Factual Questions
Let's say it's a standard office-type fluorescent bulb about 4 ft long. I don't know how many watts… how about whatever is standard, 40? 60? How many photons would be emitted by such a white light source per second?
Estimates are fine, I just want a general ballpark number that has some credible thought behind it.
For a rough ballpark estimate: if your bulb uses 40 watts, that's 40 Joules in a second. A photon of visible light has a wavelength of around 500 nanometres; since the energy of a photon is given by E = hc/λ, where h is Planck's constant, c is the speed of light, and λ is the wavelength, each photon has… ::: calculates ::: about 4 × 10^-19 Joules. So to emit 40 Joules worth of light in a second, the bulb would have to emit about 10^20 photons in a second.
This is a very rough calculation, of course. To refine it, you’d have to take the efficiency
of the bulb into account (not all 40 watts go into light energy), and look at the spectrum
of the bulb to figure out exactly what the average photon energy is. I suspect that these
factors might cause the final answer to differ by a factor of 10 or so, but not much more
than that.
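The same back-of-the-envelope estimate can be written out in code. This is only a sketch of the calculation above; the 500 nm wavelength and the assumption that all 40 W emerge as light are the same simplifications the poster flags:

// Rough photon-count estimate for a 40 W bulb, assuming every watt comes out
// as ~500 nm light (the same simplification as the post above).
const h = 6.626e-34;   // Planck's constant, J·s
const c = 3.0e8;       // speed of light, m/s
const lambda = 500e-9; // wavelength, m
const watts = 40;      // power, J/s
const energyPerPhoton = (h * c) / lambda;          // ≈ 4.0e-19 J per photon
const photonsPerSecond = watts / energyPerPhoton;  // ≈ 1.0e+20 photons per second
console.log(energyPerPhoton.toExponential(2), photonsPerSecond.toExponential(2));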
Actually, incandescent bulbs are easier. You can ignore the variations in emissivity of the tungsten filament and treat it like a pure blackbody radiator. According to the RCA Electro-Optics Handbook, the spectral radiance in photons per second is
n(λ) = 2c / (λ^4 (exp(hc/(λkT)) − 1))   photons / (sec · m² · steradian · m)
(p. 36) That gives you the number of photons per wavelength increment. You gotta integrate over wavelengths to get the total number of photons.
It's easier to get the radiant emittance integrated over all wavelengths and angles in terms of power rather than photons – it follows the simple Stefan-Boltzmann equation
M (watts per sq. meter) = σT^4
This site has a table of the photon emission rates of various lamps expressed in microeinsteins per second. An Einstein represents 1 mole (6.02 × 10^23) of photons, so a microeinstein is 6.02 × 10^17 photons. Being a plant site, the values in the table only count photosynthetically active radiation, so there'll be some photons missed at the red and blue ends of the spectrum:
The standard measure that quantifies the energy available for photosynthesis is “Photosynthetic
Active Radiation” (aka “Photosynthetic Available Radiation”) or PAR. Contrary to the lumen
measure that takes into account the human eye response, PAR is an unweighted measure. It
accounts with equal weight for all the output a light source emits in the wavelength range
between 400 and 700 nm. PAR also differs from the lumen in the fact that it is not a direct
measure of energy. It is expressed in “number of photons per second”, whose relationship with
“energy per second” (power) is intermediated by the spectral curve of the light source. One
cannot be directly converted into the other without the spectral curve.
A 40 watt cool white fluorescent puts out about 42.4 microeinsteins per second. That's 2.55 × 10^19 photons per second.
So far everyone has based their answer on the electrical power input to the bulb. As soon as I
can get around to it I’ll try to figure out how many photons are in the light output in lumens.
You’re referring to efficacy, which is usually an empirically derived number for real
world purposes, but can be theoretically approximated in many cases. Here’s a page
with some good random lighting links and info.
I suspect that these factors [inefficiency and spectral distribution] might cause the final answer to
differ by a factor of 10 or so, but not much more than that.
Interestingly, when you take inefficiency into account, you’ll get more photons, not less.
Almost all of the energy which goes into a bulb comes out as light. It’s just that only a
small amount of it is visible light. Most of the energy is in infrared light, which has less
energy per photon. So you’ll need more photons total to carry the same amount of
energy.
You’re referring to efficacy, which is usually an empirically derived number for real
world purposes, but can be theoretically approximated in many cases. Here’s a page
with some good random lighting links and info.
Most packaging gives the light output in lumens for the bulb. The OP asked how many
photons of light were put out by a light bulb, not how many photons would be contained
in the power input.
So far I've managed to dig up that 1 candela is 1/683 W per unit solid angle, and 1 candela is 4π lumens. Now as soon as I can figure out just what unit solid angle they are talking about (1 steradian?) the rest is a downhill pull.
But all of the power output of an incandescent is in photons. The OP wasn’t clear about
‘visible light’ or not.
The candela takes into account the sensitivity of the eye. From here.
Hyperphysics:
The candela is the luminous intensity, in a given direction, of a source that emits monochromatic radiation of frequency 540 × 10^12 hertz and that has a radiant intensity in that direction of 1/683 watt per steradian.
The page on the lumen (see ‘light’, then ‘light intensity’) also explains the ‘solid angle’
thing with a diagram.
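For completeness, here is a hedged sketch of the lumens-to-photons route being discussed. It uses the 683 lm/W definition and pretends all the output is monochromatic 555 nm light, which is exactly the oversimplification the thread warns about; the 3000-lumen figure is an assumed rating, not a value from the thread:

// Convert a bulb's lumen rating to a photon rate, assuming (unrealistically)
// that all output is monochromatic 555 nm light, where 683 lumens = 1 watt by definition.
const lumens = 3000;               // assumed rating for a 40 W fluorescent tube
const radiantWatts = lumens / 683; // ≈ 4.4 W under the 555 nm assumption
const h = 6.626e-34, c = 3.0e8, lambda = 555e-9;
const visiblePhotonsPerSecond = radiantWatts / ((h * c) / lambda);
console.log(visiblePhotonsPerSecond.toExponential(2)); // ≈ 1.2e+19 visible photons per second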
Would it be asking too much for both photons of visible light and all photons?
Your question is complicated by the fact that photons of different color (wavelength) have different energies. So you can't simply convert output in Watts to numbers of photons without knowing how many photons of which color are present. In the case of blackbody radiation the relative components of each color are well known, but in other cases, like a fluorescent lamp with visible phosphors, the problem becomes more complex. You gotta know how much light of each color is present.
Almost all of the energy which goes into a bulb comes out as light. It’s just that only a small
amount of it is visible light. Most of the energy is in infrared light, which has less energy per
photon.
It’ll depend more on the housing, shade, and the like than on the bulb technology itself.
Any energy which goes into the bulb will either turn directly into light or into heat (for an
incandescent, it’s all into heat). The energy which went into heat will then leave the bulb
through one of three mechanisms: Conduction, convection, or radiation. For most light
fixtures, radiation would be the dominant form of heat transfer, which would mean that
most of the heat energy is leaving via photons (this is in fact the only mechanism by
which incandescent bulbs produce photons). What frequencies these photons are
produced at will depend on the temperature; for ordinary incandescent bulb
temperatures, most of them are infrared.
I’ve looked around some but I haven’t found any data on how much of the heat is
convected away. But you are pretty close to right that the majority is radiated in one
form or another. Even some of that which is conducted away is emitted as radiation. I’ll
withdraw my objection to using power input as a measure of how many photons come
from a light bulb.
Continuing what David Simmons & Chronos were just discussing, the filament gives off almost 100% of its energy input as photons. Very little heat gets conducted back into the base of the bulb down the filament supports. Now when those photons hit the frosted glass globe, a bunch of them get absorbed and converted to heat that convects or conducts.
So the answer depends a bunch on whether we're talking about filament output or bulb output.
History of the Teapot
I confirmed with Jim Blinn on August 11, 2015, that the teapot was squished because it looked nicer. Picture here of Blinn with a 3D printed teapot, at SIGGRAPH 2015.
There are good articles about the history of the teapot by Frank Crow, S.J. Baker, and on Wikipedia. There are a number of iconic models and images in computer graphics. Some famous models can be found here and here; my own teapot code is available. Teapots still rule over all, with their own fan club and teapot sightings page. I photographed the whole collection. Oh, and Pixar made a short.
The demos shown can be run in your browser: teapot, teaspoon, and teacup.
The teapot sketch is courtesy of Martin Newell, who is working to put it onto Wikimedia Commons. The teapotahedron image is courtesy of Erin Shaw. The teapot photos are from here and here on Wikimedia Commons.
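If you want to drop a teapot into your own three.js scene, recent three.js builds ship a TeapotGeometry helper among their addons, built from the same Newell patch data. Below is a minimal sketch; it assumes a current three.js release, and the addon import path may differ with your version or bundler:

// Minimal teapot scene (assumes a recent three.js with the TeapotGeometry addon).
import * as THREE from 'three';
import { TeapotGeometry } from 'three/addons/geometries/TeapotGeometry.js';

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 100);
camera.position.set(0, 10, 30);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// First argument is the size, second how finely the Bézier patches are tessellated.
const teapot = new THREE.Mesh(
  new TeapotGeometry(5, 10),
  new THREE.MeshPhongMaterial({ color: 0x5588aa })
);
scene.add(teapot);

const light = new THREE.DirectionalLight(0xffffff, 1);
light.position.set(10, 20, 30);
scene.add(light);

renderer.setAnimationLoop(() => {
  teapot.rotation.y += 0.01;
  renderer.render(scene, camera);
});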
Utah teapot
The Utah teapot, or the Newell teapot, is one of the standard reference test models in 3D modeling and
an in-joke[1] within the computer graphics community. It is a mathematical model of an ordinary
Melitta-brand teapot that appears solid with a nearly rotationally symmetrical body. Using a teapot
model is considered the 3D equivalent of a "Hello, World!" program, a way to create an easy 3D scene
with a somewhat complex model acting as the basic geometry for a scene with a light setup. Some
programming libraries, such as the OpenGL Utility Toolkit,[2] even have functions dedicated to drawing
teapots.
The teapot model was created in 1975 by early computer graphics researcher Martin Newell, a member
of the pioneering graphics program at the University of Utah.[3] It was one of the first to be modeled
using Bézier curves rather than precisely measured.
History
The actual Melitta teapot that Martin Newell modelled, displayed at the Computer History Museum in Mountain View, California (1990–present).
External image: a scan of the original diagram Martin Newell drew up to plan the Utah Teapot before inputting it digitally.
The teapot shape contained a number of elements that made it ideal for the graphics experiments of the
time: it was round, contained saddle points, had a genus greater than zero because of the hole in the
handle, could project a shadow on itself, and could be displayed accurately without a surface texture.
Newell made the mathematical data that described the teapot's geometry (a set of three-dimensional
coordinates) publicly available, and soon other researchers began to use the same data for their
computer graphics experiments. These researchers needed something with roughly the same
characteristics that Newell had, and using the teapot data meant they did not have to laboriously enter
geometric data for some other object. Although technical progress has meant that the act of rendering
the teapot is no longer the challenge it was in 1975, the teapot continued to be used as a reference
object for increasingly advanced graphics techniques.
Over the following decades, editions of computer graphics journals (such as the ACM SIGGRAPH's
quarterly) regularly featured versions of the teapot: faceted or smooth-shaded, wireframe, bumpy,
translucent, refractive, even leopard-skin and furry teapots were created.
Having no surface to represent its base, the original teapot model was not intended to be seen from
below. Later versions of the data set fixed this.
The real teapot is 33% taller (ratio 4:3)[5] than the computer model. Jim Blinn stated that he scaled the
model on the vertical axis during a demo in the lab to demonstrate that they could manipulate it. They
preferred the appearance of this new version and decided to save the file out of that preference.[6]
Versions of the teapot model — or sample scenes containing it — are distributed with or freely available
for nearly every current rendering and modelling program and even many graphic APIs, including
AutoCAD, Houdini, Lightwave 3D, MODO, POV-Ray, 3ds Max, and the OpenGL and Direct3D helper
libraries. Some RenderMan-compliant renderers support the teapot as a built-in geometry by calling
RiGeometry("teapot", RI_NULL). Along with the expected cubes and spheres, the GLUT library even
provides the function glutSolidTeapot() as a graphics primitive, as does its Direct3D counterpart D3DX
(D3DXCreateTeapot()). While D3DX for Direct3D 11 does not provide this functionality anymore, it is
supported in the DirectX Tool Kit.[7] Mac OS X Tiger and Leopard also include the teapot as part of
Quartz Composer; Leopard's teapot supports bump mapping. BeOS and Haiku include a small demo of a
rotating 3D teapot, intended to show off the platform's multimedia facilities.
Teapot scenes are commonly used for renderer self-tests and benchmarks.[8][9]
The original, physical teapot was purchased from ZCMI (a department store in Salt Lake City) in 1974. It
was donated to the Boston Computer Museum in 1984, where it was on display until 1990. It now
resides in the ephemera collection at the Computer History Museum in Mountain View, California where
it is catalogued as "Teapot used for Computer Graphics rendering" and bears the catalogue number
X00398.1984.[10] The original teapot the Utah teapot was based on used to be available from Friesland
Porzellan, once part of the German Melitta group.[11][12] Originally it was given the rather plain name
Haushaltsteekanne ('household teapot');[13] the company only found out about their product's
reputation in 2017, whereupon they officially renamed it "Utah Teapot". It was available in three
different sizes and various colors; the one Martin Newell had used is the white "1,4L Utah Teapot".[14]
Appearances
"The Six Platonic Solids", an image that humorously adds the Utah teapot to the five standard Platonic
solids
One famous ray-traced image, by James Arvo and David Kirk in 1987,[15] shows six stone columns, five
of which are surmounted by the Platonic solids (tetrahedron, cube, octahedron, dodecahedron,
icosahedron). The sixth column supports a teapot.[16] The image is titled "The Six Platonic Solids", with
Arvo and Kirk calling the teapot "the newly discovered Teapotahedron".[15] This image appeared on the
covers of several books and computer graphic journals.
The Utah teapot sometimes appears in the "Pipes" screensaver shipped with Microsoft Windows,[17]
but only in versions prior to Windows XP, and has been included in the "polyhedra" XScreenSaver hack
since 2008.[18]
Jim Blinn (in one of his "Project MATHEMATICS!" videos) proves an amusing (but trivial) version of the
Pythagorean theorem: construct a (2D) teapot on each side of a right triangle and the area of the teapot
on the hypotenuse is equal to the sum of the areas of the teapots on the other two sides.[19]
Vulkan and OpenGL graphics APIs feature the Utah teapot along with the Stanford dragon and the
Stanford bunny on their badges.[20]
With the advent of the first computer-generated short films, and later full-length feature films, it has
become an in-joke to hide the Utah teapot in films' scenes.[21] For example, in the movie Toy Story, the
Utah teapot appears in a short tea-party scene. The teapot also appears in The Simpsons episode
"Treehouse of Horror VI" in which Homer discovers the "third dimension."[22] In The Sims 2, a picture of
the Utah teapot is one of the paintings available to buy in-game, titled "Handle and Spout".
An origami version of the teapot, folded by Tomohiro Tachi, was shown at the Tikotin Museum of
Japanese Art in Israel in a 2007–2008 exhibit.[23]
In Oct 2021 "Smithfield Utah" by Alan Butler which was inspired by the Utah teapot was unveiled in
Dublin, Ireland.[24][25]
In The Amazing Digital Circus episode "Candy Carrier Chaos!", the floating blue Utah teapots can be seen
after Pomni and Gummigoo clipped under the map out of bounds.
OBJ conversion
Although the original tea set by Newell can be downloaded directly, this tea set is specified using a set of
Bézier patches in a custom format, which can be difficult to import directly into many popular 3D
modeling applications. As such, a tessellated conversion of the dataset in the popular OBJ file format can
be useful. One such conversion of the complete Newell teaset is available on the University of Utah
website.
3D printing
Through 3D printing, the Utah Teapot has come full circle from being a computer model based on an
actual teapot to being an actual teapot based on the computer model. It is widely available in many
renderings in different materials from small plastic knick-knacks to a fully functional ceramic teapot. It is
sometimes intentionally rendered as a low poly object to celebrate its origin as a computer model.
[citation needed]
In 2009, a Belgian design studio, Unfold, 3D printed the Utah Teapot in ceramic with the objective of
returning the iconographic teapot to its roots as a piece of functional dishware while showing its status
as an icon of the digital world.[26]
In 2015, the California-based company Emerging Objects followed suit, but this time printed the teapot,
along with teacups and teaspoons, out of actual tea.[27]
Gallery
See also
3DBenchy
Cornell box
Stanford bunny
Stanford dragon
Lenna
References
Dunietz, Jesse (February 29, 2016). "The Most Important Object In Computer Graphics History Is This
Teapot". Nautilus. Retrieved March 3, 2019.
Mark Kilgard (February 23, 1996). "11.9 glutSolidTeapot, glutWireTeapot". www.opengl.org. Retrieved
October 7, 2011.
Torrence, Ann (2006). "Martin Newell's original teapot: Copyright restrictions prevent ACM from
providing the full text for this work". ACM SIGGRAPH 2006 Teapot on - SIGGRAPH '06. p. 29.
doi:10.1145/1180098.1180128. ISBN 978-1-59593-364-5. S2CID 23272447. Article No. 29.
"The Utah Teapot - CHM Revolution". Computer History Museum. Retrieved March 20, 2016.
Seymour, Mike (July 25, 2012). "Founders Series: Industry Legend Jim Blinn". fxguide.com. Archived
from the original on July 29, 2012. Retrieved April 15, 2015.
Wald, Ingo; Benthin, Carsten; Slusallek, Philipp (2002). "A Simple and Practical Method for Interactive
Ray Tracing of Dynamic Scenes" (PDF). Technical Report, Computer Graphics Group. Saarland University.
Archived from the original (PDF) on March 23, 2012.
Klimaszewski, K.; Sederberg, T.W. (1997). "Faster ray tracing using adaptive grids". IEEE Computer
Graphics and Applications. 17 (1): 42–51. doi:10.1109/38.576857. S2CID 29664150.
Original Utah Teapot at the Computer History Museum. September 28, 2001.
Sander, Antje; Siems, Maren; Wördemann, Wilfried; Meyer, Stefan; Janssen, Nina (2015). Siems, Maren
(ed.). Melitta und Friesland Porzellan - 60 Jahre Keramikherstellung in Varel [Melitta and Friesland
Porzellan - 60 years manufacturing of ceramics in Varel]. Schloss Museum Jever (in German). Vol.
Jever Heft 33 (1 ed.). Oldenburg, Germany: Isensee Verlag. ISBN 978-3-7308-1177-1. Begleitkatalog
zur Ausstellung: Jeverland - in Ton gebrannt. (48 pages)
Friesland Porzellan [@FrieslandPorzel] (March 24, 2017). "The original Utah Teapot was always
produced by Friesland. We were part of the Melitta Group once, thats right. Got yours already?" (Tweet)
– via Twitter.
"Eine Teekanne als Filmstar" (in German). Radio Bremen. Archived from the original on April 1, 2019.
Retrieved March 1, 2019.
"Teekanne 1,4l Weiß Utah Teapot" (in German). Friesland Versand GmbH. Archived from the original on
March 29, 2023. Retrieved November 15, 2023.
Arvo, James; Kirk, David (1987). "Fast ray tracing by ray classification". ACM SIGGRAPH Computer
Graphics. 21 (4): 55–64. doi:10.1145/37402.37409.
Carlson, Wayne (2007). "A Critical History of Computer Graphics and Animation". OSU.edu. Archived
from the original on February 12, 2012. Retrieved April 15, 2015.
"Windows NT Easter Egg – Pipes Screensaver". The Easter Egg Archive. Retrieved May 5, 2018.
"changelog (Added the missing Utah Teapotahedron to polyhedra)". Xscreensaver. August 10, 2008.
Project Mathematica: Theorem Of Pythagoras. NASA. 1988. Event occurs at 14:00. Retrieved July 28,
2015 – via archive.org.
Rob Williams (March 8, 2018). "Khronos Group Announces Vulkan 1.1". Techgage Networks. Retrieved
January 18, 2020.
"Tempest in a Teapot". Continuum. Winter 2006–2007. Archived from the original on July 12, 2014.
"Pacific Data Images – Homer3". Archived from the original on July 24, 2008.
"Tomohiro Tachi". Treasures of Origami Art. Tikotin Museum of Japanese Art. August 17, 2007.
Retrieved June 18, 2021.
"Dublin City Council commission of public sculpture for Smithfield Square" (PDF). Retrieved April 23,
2023.
"Central Area: Smithfield Square Lower – Sculpture Dublin". Retrieved April 23, 2023.
"Utanalog, Ceramic Utah Teapot". Unfold Design Studio. October 28, 2009. Retrieved May 12, 2015.
Virginia San Fratello & Ronald Rael (2015). "The Utah Tea Set". Emerging Objects. Retrieved May 12,
2015.
External links
S.J. Baker's History of the teapot Archived November 20, 2014, at the Wayback Machine, including patch
data
Teapot history and images, from A Critical History of Computer Graphics and Animation (Wayback
Machine copy)
History of the Teapot video from Udacity's online Interactive 3D Graphics course
The World's Most Famous Teapot - Tom Scott explains the story of Martin Newell's digital creation
(YouTube)
Eric Haines, "A Proposal for Standard Graphics Environments," IEEE Computer Graphics and
Applications, 7(11), Nov. 1987, p. 3-5.
You can download the latest version of the SPD (currently 3.14), and also view the original IEEE CG&A
article from Nov. 1987. The code is on Github.
This software package is not copyrighted and can be used freely. All source is in K&R vanilla C (though
ANSI headers can be enabled) and has been used on many systems.
For a newer set of more realistic environments for benchmarking ray tracers (or renderers in general),
see BART: A Benchmark for Animated Ray Tracing. The focus is software that generates an animated set
of frames for a ray tracer to render. These scenes use an NFF-like language (AFF), and the authors
provide a number of tools for parsing and visualization.
This software is meant to act as a set of basic test images for ray tracing algorithms. The programs
generate databases of objects which are fairly familiar and "standard" to the graphics community, such
as the teapot, a fractal mountain, a tree, a recursively built tetrahedral structure, etc. I originally created
them for my own testing of ray tracing efficiency schemes. Since their first release other researchers
have used them to test new algorithms. In this way, research on algorithmic improvements can be
compared in a more standardized fashion. If one researcher ray-traces a car, another a tree, the
question arises, "How many cars to the tree?" With these databases we may be comparing oranges and
apples, but it's better than comparing oranges and orangutans. Using these statistics along with the
same scenes allows us to compare results in a more meaningful way.
Another interesting use for the SPD has been noted: debugging. By comparing the images and the
statistics with the output of your own ray tracer, you can detect program errors. For example, "mount"
is useful for checking if refraction rays are generated correctly, and "balls" (a.k.a. "sphereflake") can
check for the correctness of eye and reflection rays.
The images for these databases and other information about them can be found in A Proposal for
Standard Graphics Environments, IEEE Computer Graphics and Applications, vol. 7, no. 11, November
1987, pp. 3-5. See IEEE CG&A, vol. 8, no. 1, January 1988, p. 18 for the correct image of the tree
database (the only difference is that the sky is blue, not orange). The teapot database was added later.
The Neutral File Format (NFF) is the default output format from SPD programs. This format is trivial to
parse (if you can use sscanf, you can parse it), and each type of object is defined in human terms (e.g. a
cone is defined by two endpoints and radii). The basic shapes supported are polygon and polygon patch
(normal per vertex), cylinder, cone, and sphere. Note that there are primitives supported within the SPD
which are not part of NFF, e.g. heightfield, NURBS, and torus, so more elaborate programs can be
written. If a format does not support a given primitive, the primitive is tessellated and output as
polygons.
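As an illustration of how simple NFF is to parse, here is a hedged sketch that extracts just the sphere records (an NFF sphere line has the form "s x y z radius"); a complete reader would also handle the viewpoint, background, light, fill, cone/cylinder, and polygon records. The file name below is only an example:

// Pull the spheres out of an NFF file; all other record types are skipped here.
import { readFileSync } from 'node:fs';

function readNffSpheres(path) {
  const spheres = [];
  for (const line of readFileSync(path, 'utf8').split('\n')) {
    const tokens = line.trim().split(/\s+/);
    if (tokens[0] === 's' && tokens.length >= 5) {
      const [x, y, z, radius] = tokens.slice(1, 5).map(Number);
      spheres.push({ x, y, z, radius });
    }
  }
  return spheres;
}

// Example: count the spheres in a generated sphereflake scene (file name assumed).
console.log(readNffSpheres('balls.nff').length);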
Ares Lagae has written libnff, a modern C++ library for parsing NFF that also supports conversion to
Wavefront OBJ.
I converted the sphereflake demo to a more modern form, for RTX hardware. The code is in the DXR-
Sphereflake directory in this code base (which now won't run, because Falcor has changed). My blog
post is here, gallery here, and longer NVIDIA post here.
POV-Ray 1.0
POV-Ray 3.1
Polyray 1.4 to 1.6
Vivid 2.0
QRT 1.5
Rayshade 4.0.6
RTrace 8.0.0
RenderMan RIB
Apple 3DMF
VRML 1.0
VRML 2.0
Alexander Enzmann receives most of the credit for creating the various file format output routines,
along with many others who contributed.
There are also reader programs for the various formats. Currently the following formats can be read and
converted:
NFF
OBJ
This makes the NFF format a nice, simple language for quickly creating models (whether by hand or by
program), as any NFF file can be converted to many different formats. Warnings:
The conversions tend to be verbose in many cases (e.g. there is currently no code in place to group
polygons of the same material into polygon mesh primitives used in some formats).
No real tessellation of polygons is done when needed for conversion; all that happens is that polygon fans are created.
You might find the images you obtain are mirror reversed with some formats (e.g. VRML 2.0 files).
The Graphics Gems V code distribution has a simple z-buffer renderer by Raghu Karinthi, using NFF as
the input language.
On hashing: a sore point in mount.c, the fractal mountain generator, has been its hashing function. Mark
VandeWettering has provided a great hashing function by Bob Jenkins. To show what a difference it
makes, check out images of models made with the original hash function with a large size factor,
replacement hash function I wrote (still no cigar), and Jenkins' hash function.
For more information on the SPD, see the README.txt file included in the distribution.
Compatibility Notes
Linux
On some (all?) versions of gcc on Linux, the following correction to the code is necessary:
change to
Timing comparisons for the various scenes using a wide variety of free software ray tracers are
summarized in The Ray Tracing News, 3(1) (many), 6(2), 6(3), 8(3), and 10(3). Here are some research
works which have used the SPD to benchmark their ray tracers (please let me know of others; you can
always search Google Scholar for more):
Kay, Timothy L. and James T. Kajiya, "Ray Tracing Complex Scenes," Computer Graphics (SIGGRAPH '86
Proceedings), 20(4), Aug. 1986, p. 269-78.
Arvo, James and David Kirk, "Fast Ray Tracing by Ray Classification," Computer Graphics (SIGGRAPH '87
Proceedings) 21(4), July 1987, p. 55-64. Also in Tutorial: Computer Graphics: Image Synthesis, Computer
Society Press, Washington, 1988, pp. 196-205. Predates SPD, uses recursive tetrahedron.
Subramanian, K.R., "Fast Ray Tracing Using K-D Trees," Master's Thesis, Dept. of Computer Sciences,
Univ. of Texas at Austin, Dec. 1987. Uses balls, tetra, tree.
Fussell, Donald and K.R. Subramanian "Fast Ray Tracing Using K-D Trees," Technical Report TR-88-07,
Dept. of Computer Sciences, Univ. of Texas at Austin March 1988. Uses balls, tetra, tree.
Salmon, John and Jeffrey Goldsmith "A Hypercube Ray-Tracer," Proceedings of the Third Conference on
Hypercube Computers and Applications , 1988. Uses balls and mountain.
Bouatouch, Kadi and Thierry Priol, "Parallel Space Tracing: An Experience on an iPSC Hypercube," ed. N.
Magnenat-Thalmann and D. Thalmann, New Trends in Computer Graphics (Proceedings of CG
International '88), Springer-Verlag, New York, 1988, p. 170-87. Uses balls.
Priol, Thierry and Kadi Bouatouch, "Experimenting with a Parallel Ray-Tracing Algorithm on a Hypercube
Machine," Eurographics '88, Elsevier Science Publishers, Amsterdam, North-Holland, Sept. 1988, p. 243-
59. Uses balls.
Devillers, Olivier, "The Macro-Regions: an Efficient Space Subdivision Structure for Ray Tracing,"
Eurographics '89, Elsevier Science Publishers, Amsterdam, North-Holland, Sept. 1989, p. 27-38, 541.
(revised version of Technical Report 88-13, Laboratoire d'Informatique de l'Ecole Normale Superieure,
Paris, France, Nov. 1988). Uses balls, tetra.
Priol, Thierry and Kadi Bouatouch, "Static Load Balancing for a Parallel Ray Tracing on a MIMD
Hypercube," The Visual Computer, 5(1/2), March 1989, p. 109-19. Uses balls.
Green, Stuart A. and D.J. Paddon, "Exploiting Coherence for Multiprocessor Ray Tracing," IEEE Computer
Graphics and Applications, 9(6), Nov. 1989, p. 12-26. Uses balls, mount, rings, tetra.
Green, Stuart A. and D.J. Paddon, "A Highly Flexible Multiprocessor Solution for Ray Tracing," The Visual
Computer, 6(2), March 1990, p. 62-73. Uses balls, mount, rings, tetra.
Dauenhauer, David Elliot and Sudhanshu Kumar Semwal, "Approximate Ray Tracing," Proceedings of
Graphics Interface '90, Canadian Information Processing Society, Toronto, Ontario, May 1990, p. 75-82.
Uses balls, gears, tetra.
Badouel, Didier, Kadi Bouatouch, Thierry Priol, "Ray Tracing on Distributed Memory Parallel Computers:
Strategies for Distributing Computations and Data," SIGGRAPH '90 Parallel Algorithms and Architecture
for 3D Image Generation course notes, 1990. Uses mountain, rings, teapot, tetra.
Spackman, John, "Scene Decompositions for Accelerated Ray Tracing". Ph.D. Thesis, The University of
Bath, UK, 1990. Available as Bath Computer Science Technical Report 90/33.
Green, Stuart A., Parallel Processing for Computer Graphics, MIT Press/Pitman Publishing, Cambridge,
Mass./London, 1991. Uses balls, mount, rings, tetra.
Subramanian, K.R. and Donald S. Fussell, "Automatic Termination Criteria for Ray Tracing Hierarchies,"
Proceedings of Graphics Interface '91, Canadian Information Processing Society, Toronto, Ontario, June
1991, p. 93-100. Uses balls, tetra.
Spackman, John N., "The SMART Navigation of a Ray Through an Oct-tree," Computers and Graphics,
vol. 15, no. 2, June 1991, p. 185-194. Code for the ray tracer is available.
Fournier, Alain and Pierre Poulin, "A Ray Tracing Accelerator Based on a Hierarchy of 1D Sorted Lists,"
Proceedings of Graphics Interface '93, Canadian Information Processing Society, Toronto, Ontario, May
1993, p. 53-61. Uses balls, gears, tetra, tree.
Simiakakis, George, and A. Day, "Five-dimensional Adaptive Subdivision for Ray Tracing," Computer
Graphics Forum, 13(2), June 1994, p. 133-140. Uses balls, gears, mount, teapot, tetra, tree.
Matthew Quail, "Space-Time Ray-Tracing using Ray Classification," Thesis project for B.S. with Honours,
Dept. of Computing, School of Maths, Physics, Computing and Electronics, Macquarie University. Uses
mount.
Klimaszewski, Krzysztof and Thomas W. Sederberg, "Faster Ray Tracing Using Adaptive Grids," IEEE
Computer Graphics and Applications 17(1), Jan/Feb 1997, p. 42-51. Uses balls.
Havran, Vlastimil, Tomas Kopal, Jiri Bittner, and Jiri Zara, "Fast robust BSP tree traversal algorithm for ray
tracing," Journal of Graphics Tools, 2(4):15-24, 1997. Uses balls, gears, mount, and tetra.
Nakamaru, Koji and Yoshio Ohno, "Breadth-First Ray Tracing Utilizing Uniform Spatial Subdivision," IEEE
Transactions on Visualization and Computer Graphics, 3(4), Oct-Dec 1997, p. 316-328.
Havran, Vlastimil, Jiri Bittner, and J. Zara, "Ray Tracing with Rope Trees," Proceedings of SCCG'98
Conference, pp. 130-139, April 1998. Uses 5 normal SPD.
Sanna, A., P. Montuschi and M. Rossi, "A Flexible Algorithm for Multiprocessor Ray Tracing,", The
Computer Journal, 41(7), pp. 503-516, 1998. Uses spheres.
Müller, Gordon and Dieter W. Fellner, "Hybrid Scene Structuring with Application to Ray Tracing,"
Proceedings of International Conference on Visual Computing (ICVC'99), Goa, India, Feb. 1999, pp. 19-
26. Uses balls, lattice, tree.
Havran, Vlastimil, and Jiri Bittner, "Rectilinear BSP Trees for Preferred Ray Sets," Proceedings of SCCG'99
conference, pp. 171-179, April/May 1999. Uses lattice, rings, tree.
Havran, Vlastimil and Filip Sixta "Comparison of Hierarchical Grids," Ray Tracing News, 12(1), June 25,
1999. Uses all SPD. Additional statistics are available at this site
Havran, Vlastimil, "A Summary of Octree Ray Traversal Algorithms," Ray Tracing News, 12(2), December
21, 1999. Uses all SPD. Additional statistics are available at this site
Havran, Vlastimil, Jan Prikryl, and Werner Purgathofer, "Statistical Comparison of Ray-Shooting Efficiency
Schemes," Technical Report/TR-186-2-00-14, Technische Universität Wien, Institut für Computergraphik
und Algorithmen, 4 July 2000. Uses all SPD.
Havran, Vlastimil, "Heuristic Ray Shooting Algorithms", Ph.D. Thesis, Czech Technical University,
November 2000. Uses all SPD.
Koji Nakamaru and Yoshio Ohno. "Enhanced breadth-first ray tracing," Journal of Graphics Tools,
6(4):13-28, 2001. Uses all SPD. Renderings include up to a billion primitives.
Simiakakis, George, Th. Theoharis and A. M. Day, "Parallel Ray Tracing with 5D Adaptive Subdivision,"
WSCG 2001 Conference Proceedings, 2001. Uses 5 normal SPD plus teapot.
Havran, Vlastimil and Jiri Bittner: "On Improving KD-Trees for Ray Shooting", Proceedings of WSCG'2002
conference, pp. 209-217, February 2002. Also see Libor Dachs' ray tracing visualization
Simple Materials
Photos from Wikimedia Commons: shiny ball, glass ball and light bulb.
http://mrdoob.github.com/three.js/examples/webgl_materials.html. Notice how some of the spheres respond to the light moving through the world.
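Below is a minimal sketch of the same idea (not the linked demo itself): a few spheres with different material types, and a point light orbiting them so you can see which materials respond to it. All names and parameter values here are mine, not taken from the demo:

// Spheres with different materials reacting to a moving point light.
import * as THREE from 'three';

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 100);
camera.position.z = 12;
const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

const geometry = new THREE.SphereGeometry(1.5, 32, 16);
const materials = [
  new THREE.MeshBasicMaterial({ color: 0x8844aa }),                // ignores lights entirely
  new THREE.MeshLambertMaterial({ color: 0x8844aa }),              // diffuse response only
  new THREE.MeshPhongMaterial({ color: 0x8844aa, shininess: 80 }), // diffuse + specular highlight
];
materials.forEach((material, i) => {
  const sphere = new THREE.Mesh(geometry, material);
  sphere.position.x = (i - 1) * 4;
  scene.add(sphere);
});

const light = new THREE.PointLight(0xffffff, 2, 50);
scene.add(light);

renderer.setAnimationLoop((time) => {
  light.position.set(8 * Math.cos(time / 1000), 3, 8 * Math.sin(time / 1000));
  renderer.render(scene, camera);
});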
This question needs a rewording: instead of "At what rate...", please change that
to "Once the pipeline is full, how often do boxes come off this pipeline?" My
apologies for the confusion. Also, note that it's a true pipeline. One cutter does
his cuts and then passes it on to the next cutter, once every five seconds. That
next cutter cuts and passes it to the folder in the next five seconds, while at the
same time the first cutter is now doing cuts on a new box.
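In other words, once the pipeline is full, the output rate is set by the slowest stage alone, not by the sum of the stages. A tiny sketch of the box-folding example (the three five-second stages are the ones described above):

// Box-folding pipeline from the quiz: three stages, each taking 5 seconds.
// Latency: time for one box to go all the way through (sum of the stages).
// Throughput: once the pipeline is full, a finished box comes off at the rate
// of the slowest stage.
const stageSeconds = [5, 5, 5]; // first cutter, second cutter, folder
const latency = stageSeconds.reduce((a, b) => a + b, 0); // 15 s until the first box
const interval = Math.max(...stageSeconds);              // then one box every 5 s
console.log(`first box after ${latency} s, then one every ${interval} s`);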
I should note that what I'm describing is a typical desktop or laptop computer's GPU. Portable devices such as smart phones and tablets will usually use tile-based rendering instead. There's a brief explanation of this algorithm here. The good news is that even this type of architecture can still be controlled by WebGL.
https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
Stalling and Starving
The bottleneck is the slowest stage in the GPU's pipeline. Since every result must pass through it, it determines how often finished data comes out of the end of the pipeline, and therefore the overall throughput.
Stalling: a stage has finished its work but cannot hand the result on, because the next stage is still busy. The earlier (faster) stage is forced to sit and wait.
Starving: a stage sits idle because the preceding, slower stage has not yet delivered the data it needs.
It’s been awhile since I posted something here, and I figured I might
use this spot to explain some general points about graphics
hardware and software as of 2011; you can find functional
descriptions of what the graphics stack in your PC does, but usually
not the “how” or “why”; I’ll try to fill in the blanks without getting too
specific about any particular piece of hardware. I’m going to be
mostly talking about DX11-class hardware running D3D9/10/11 on
Windows, because that happens to be the (PC) stack I’m most
familiar with – not that the API details etc. will matter much past this
first part; once we’re actually on the GPU it’s all native commands.
The application
This is your code. These are also your bugs. Really. Yes, the API
runtime and the driver have bugs, but this is not one of them. Now
go fix it already.
More fun: Some of the API state may actually end up being
compiled into the shader – to give an example, relatively exotic (or
at least infrequently used) features such as texture borders are
probably not implemented in the texture sampler, but emulated with
extra code in the shader (or just not supported at all). This means
that there’s sometimes multiple versions of the same shader floating
around, for different combinations of API states.
Incidentally, this is also the reason why you’ll often see a delay the
first time you use a new shader or resource; a lot of the
creation/compilation work is deferred by the driver and only
executed when it’s actually necessary (you wouldn’t believe how
much unused crap some apps create!). Graphics programmers
know the other side of the story – if you want to make sure
something is actually created (as opposed to just having memory
reserved), you need to issue a dummy draw call that uses it to
“warm it up”. Ugly and annoying, but this has been the case since I
first started using 3D hardware in 1999 – meaning, it’s pretty much
a fact of life by this point, so get used to it. :)
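In WebGL/three.js terms, the equivalent trick is to force shader compilation and resource creation before the first real frame. A hedged sketch follows; renderer.compile exists in recent three.js releases, while older code simply issued a throwaway render as the dummy draw call described above:

// Warm-up: make the driver compile programs and create resources up front,
// instead of paying that cost on the first frame that actually uses them.
function warmUp(renderer, scene, camera) {
  if (typeof renderer.compile === 'function') {
    renderer.compile(scene, camera); // compiles the materials/shaders used by the scene
  } else {
    renderer.render(scene, camera);  // fallback: a dummy draw to trigger creation
  }
}
// Usage: call once after building the scene, before starting the animation loop.
// warmUp(renderer, scene, camera);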
Anyway, moving on. The UMD also gets to deal with fun stuff like all
the D3D9 “legacy” shader versions and the fixed function pipeline –
yes, all of that will get faithfully passed through by D3D. The 3.0
shader profile ain’t that bad (it’s quite reasonable in fact), but 2.0 is
crufty and the various 1.x shader versions are seriously whack –
remember 1.3 pixel shaders? Or, for that matter, the fixed-function
vertex pipeline with vertex lighting and such? Yeah, support for all
that’s still there in D3D and the guts of every modern graphics
driver, though of course they just translate it to newer shader
versions by now (and have been doing so for quite some time).
Then there’s things like memory management. The UMD will get
things like texture creation commands and need to provide space
for them. Actually, the UMD just suballocates some larger memory
blocks it gets from the KMD (kernel-mode driver); actually mapping
and unmapping pages (and managing which part of video memory
the UMD can see, and conversely which parts of system memory
the GPU may access) is a kernel-mode privilege and can’t be done
by the UMD.
But the UMD can do things like swizzling textures (unless the GPU
can do this in hardware, usually using 2D blitting units not the real
3D pipeline) and schedule transfers between system memory and
(mapped) video memory and the like. Most importantly, it can also
write command buffers (or “DMA buffers” – I’ll be using these two
names interchangeably) once the KMD has allocated them and
handed them over. A command buffer contains, well, commands :).
All your state-changing and drawing operations will be converted by
the UMD into commands that the hardware understands. As will a
lot of things you don’t trigger manually – such as uploading textures
and shaders to video memory.
The KMD deals with all the things that are just there once. There’s
only one GPU memory, even though there’s multiple apps fighting
over it. Someone needs to call the shots and actually allocate (and
map) physical memory. Similarly, someone must initialize the GPU
at startup, set display modes (and get mode information from
displays), manage the hardware mouse cursor (yes, there’s HW
handling for this, and yes, you really only get one! :), program the
HW watchdog timer so the GPU gets reset if it stays unresponsive
for a certain time, respond to interrupts, and so on. This is what the
KMD does.
Most importantly for us, the KMD manages the actual command
buffer. You know, the one that the hardware actually consumes.
The command buffers that the UMD produces aren’t the real deal –
as a matter of fact, they’re just random slices of GPU-addressable
memory. What actually happens with them is that the UMD finishes
them, submits them to the scheduler, which then waits until that
process is up and then passes the UMD command buffer on to the
KMD. The KMD then writes a call to command buffer into the main
command buffer, and depending on whether the GPU command
processor can read from main memory or not, it may also need to
DMA it to video memory first. The main command buffer is usually a
(quite small) ring buffer – the only thing that ever gets written there
is system/initialization commands and calls to the “real”, meaty 3D
command buffers.
But this is still just a buffer in memory right now. Its position is
known to the graphics card – there’s usually a read pointer, which is
where the GPU is in the main command buffer, and a write pointer,
which is how far the KMD has written the buffer yet (or more
precisely, how far it has told the GPU it has written yet). These are
hardware registers, and they are memory-mapped – the KMD
updates them periodically (usually whenever it submits a new chunk
of work)…
The bus
…but of course that write doesn’t go directly to the graphics card (at
least unless it’s integrated on the CPU die!), since it needs to go
through the bus first – usually PCI Express these days. DMA
transfers etc. take the same route. This doesn’t take very long, but
it’s yet another stage in our journey. Until finally…
Not so fast.
In the previous part I explained the various stages that your 3D
rendering commands go through on a PC before they actually get
handed off to the GPU; short version: it’s more than you think. I then
finished by name-dropping the command processor and how it
actually finally does something with the command buffer we
meticulously prepared. Well, how can I say this – I lied to you. We’ll
indeed be meeting the command processor for the first time in this
installment, but remember, all this command buffer stuff goes
through memory – either system memory accessed via PCI
Express, or local video memory. We’re going through the pipeline in
order, so before we get to the command processor, let’s talk
memory for a second.
The first thing to know is that GPU memory subsystems are fast.
Seriously fast. A
Core i7 2600K will hit maybe 19 GB/s memory bandwidth – on a
good day. With tail wind. Downhill. A GeForce GTX 480, on the
other hand, has a total memory bandwidth of close to 180 GB/s –
nearly an order of magnitude difference! Whoa.
Yep – GPUs get a massive increase in bandwidth, but they pay for it
with a massive increase in latency (and, it turns out, a sizable hit in
power draw too, but that’s beyond the scope of this article). This is
part of a general pattern – GPUs are all about throughput over
latency; don’t wait for results that aren’t there yet, do something else
instead!
That’s almost all you need to know about GPU memory, except for
one general DRAM tidbit that will be important later on: DRAM chips
are organized as a 2D grid – both logically and physically. There’s
(horizontal) row lines and (vertical) column lines. At each
intersection between such lines is a transistor and a capacitor; if at
this point you want to know how to actually build memory from these
ingredients, Wikipedia is your friend. Anyway, the salient point
here is that the address of a location in DRAM is split into a row
address and a column address, and DRAM reads/writes internally
always end up accessing all columns in the given row at the same
time. What this means is that it’s much cheaper to access a swath
of memory that maps to exactly one DRAM row than it is to access
the same amount of memory spread across multiple rows. Right
now this may seem like just a random bit of DRAM trivia, but this will
become important later on; in other words, pay attention: this will be
on the exam. But to tie this up with the figures in the previous
paragraphs, just let me note that you can’t reach those peak
memory bandwidth figures above by just reading a few bytes all
over memory; if you want to saturate memory bandwidth, you better
do it one full DRAM row at a time.
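If you want to see the row/column effect in numbers, here’s a little toy model – the bit counts are made up and real DRAM chips differ – that counts how many times a new row has to be opened for two access patterns. Opening (“activating”) a row is the expensive part; streaming within an already-open row is cheap.

    #include <cstdint>
    #include <cstdio>

    // Toy DRAM address split: low bits select the column within a row, the
    // rest select the row. 10 column bits = 1024 one-byte columns per row.
    constexpr uint32_t kColumnBits = 10;
    constexpr uint32_t kRowSize    = 1u << kColumnBits;

    uint32_t RowOf(uint32_t addr) { return addr >> kColumnBits; }

    // Count how many times a new row has to be opened for a given access
    // pattern; row activations are the expensive part.
    template <typename NextAddr>
    uint32_t CountRowActivations(uint32_t count, NextAddr next) {
        uint32_t activations = 0, openRow = ~0u;
        for (uint32_t i = 0; i < count; i++) {
            uint32_t row = RowOf(next(i));
            if (row != openRow) { activations++; openRow = row; }
        }
        return activations;
    }

    int main() {
        // 4096 sequential accesses touch only 4 rows...
        printf("sequential: %u row activations\n",
               CountRowActivations(4096, [](uint32_t i) { return i; }));
        // ...while the same number of accesses strided by one row apiece
        // opens a new row every single time.
        printf("strided:    %u row activations\n",
               CountRowActivations(4096, [](uint32_t i) { return i * kRowSize; }));
        return 0;
    }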
One wrinkle I glossed over: the GPU can address both its own video
memory and (mapped) system memory over PCIe, so every memory access
needs to know which of the two it’s headed for. The easiest solution:
just add an extra address line that tells you which way to go. This is
simple, works just fine and has been done plenty of times. Or maybe
you’re on a unified memory architecture,
like some game consoles (but not PCs). In that case, there’s no
choice; there’s just the memory, which is where you go, period. If
you want something fancier, you add an MMU (memory management
unit), which gives you a fully virtualized address space and allows
you to pull nice tricks like having frequently accessed parts of a
texture in video memory (where they’re fast), some other parts in
system memory, and most of it not mapped at all – to be conjured
up from thin air, or, more usually, by a magic disk read that will
only take about 50 years or so – and by the way, this is not
hyperbole; if you stay with the “memory access = 1 day” metaphor,
that’s really how long a single HD read takes. A quite fast one at
that. Disks suck. But I digress.
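As a sketch of what such an MMU-style translation boils down to (page size, field names and the whole layout are invented here): every GPU-visible page is either backed by video memory, backed by mapped system memory across the bus, or not mapped at all – and “not mapped” is what triggers that very slow disk-backed fixup.

    #include <cstdint>
    #include <optional>
    #include <utility>
    #include <vector>

    constexpr uint64_t kPageSize = 4096;  // made-up page size

    enum class Backing { VideoMemory, SystemMemory, NotMapped };

    struct PageEntry {
        Backing  backing      = Backing::NotMapped;
        uint64_t physicalPage = 0;  // page number within the chosen pool
    };

    struct GpuMmu {
        std::vector<PageEntry> pageTable;

        // Translate a GPU virtual address. "Not mapped" would raise a fault
        // that the driver services (e.g. by streaming the missing data in).
        std::optional<std::pair<Backing, uint64_t>> Translate(uint64_t va) const {
            uint64_t page = va / kPageSize;
            if (page >= pageTable.size()) return std::nullopt;
            const PageEntry& e = pageTable[page];
            if (e.backing == Backing::NotMapped) return std::nullopt;
            return std::make_pair(e.backing,
                                  e.physicalPage * kPageSize + va % kPageSize);
        }
    };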
There’s also a DMA engine that can copy memory around without
having to involve any of our precious 3D hardware/shader cores.
Usually, this can at least copy between system memory and video
memory (in both directions). It often can also copy from video
memory to video memory (and if you have to do any VRAM
defragmenting, this is a useful thing to have). It usually can’t do
system memory to system memory copies, because this is a GPU,
not a memory copying unit – do your system memory copies on the
CPU where they don’t have to pass through PCIe in both directions!
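What the DMA engine consumes is, conceptually, just a small copy descriptor. This one is entirely invented – real hardware has its own packet formats – but it shows the shape of the thing, including the “no system-to-system copies” rule as a driver-side validity check.

    #include <cstdint>

    enum class Pool : uint8_t { VideoMemory, SystemMemory };

    // An invented DMA copy descriptor: where to read, where to write, how much.
    struct DmaCopy {
        uint64_t srcAddress;
        uint64_t dstAddress;
        uint64_t byteCount;
        Pool     srcPool;
        Pool     dstPool;
    };

    // The "no system-to-system copies" rule from the text, as a check the
    // driver might perform before queuing the descriptor on the DMA engine.
    bool IsValidDmaCopy(const DmaCopy& c) {
        return !(c.srcPool == Pool::SystemMemory && c.dstPool == Pool::SystemMemory);
    }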
Update: I’ve drawn a picture (link since this layout is too narrow to
put big diagrams in the text). This also shows some more details –
by now your GPU has multiple memory controllers, each of which
controls multiple memory banks, with a fat hub in the front.
Whatever it takes to get that bandwidth. :)
Okay, checklist. We have a command buffer prepared on the CPU.
We have the PCIe host interface, so the CPU can actually tell us
about this, and write its address to some register. We have the logic
to turn that address into a load that will actually return data – if it’s
from system memory it goes through PCIe, if we decide we’d rather
have the command buffer in video memory, the KMD can set up a
DMA transfer so neither the CPU nor the shader cores on the GPU
need to actively worry about it. And then we can get the data from
our copy in video memory through the memory subsystem. All paths
accounted for, we’re set and finally ready to look at some
commands!
“Buffering…”
Synchronization
Finally, the last family of commands deals with CPU/GPU and
GPU/GPU synchronization.
Generally, all of these have the form “if event X happens, do Y”. I’ll
deal with the “do Y” part first – there’s two sensible options for what
Y can be here: it can be a push-model notification where the GPU
yells at the CPU to do something right now (“Oi! CPU! I’m entering
the vertical blanking interval on display 0 right now, so if you want to
flip buffers without tearing, this would be the time to do it!”), or it can
be a pull-model thing where the GPU just memorizes that
something happened and the CPU can later ask about it (“Say,
GPU, what was the most recent command buffer fragment you
started processing?” – “Let me check… sequence id 303.”). The
former is typically implemented using interrupts and only used for
infrequent and high-priority events because interrupts are fairly
expensive. All you need for the latter is some CPU-visible GPU
registers and a way to write values into them from the command
buffer once a certain event happens.
We also now have an example of what X could be: “if you get here”
– perhaps the simplest example, but already useful. Other examples
are “if all shaders have finished all texture reads coming from
batches before this point in the command buffer” (this marks safe
points to reclaim texture/render target memory), “if rendering to all
active render targets/UAVs has completed” (this marks points at
which you can actually safely use them as textures), “if all
operations up to this point are fully completed”, and so on.
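Here’s the pull-model reporting as a tiny sketch, with invented names: the driver appends a “write value N into register R when you get here” packet after each command buffer fragment, and the CPU later polls that register. (I’m ignoring sequence-id wraparound in the comparison to keep it short.)

    #include <atomic>
    #include <cstdint>

    struct FenceState {
        // Stands in for the CPU-visible GPU register the text talks about.
        std::atomic<uint32_t> completedSeqId{0};
    };

    // GPU side (conceptually): the command processor executes the "write
    // value into register" packet once it reaches that point in the buffer.
    void ExecuteWriteRegister(FenceState& fence, uint32_t seqId) {
        fence.completedSeqId.store(seqId, std::memory_order_release);
    }

    // CPU side: "say, GPU, which fragment did you last get to?"
    // (Real code needs to handle sequence-id wraparound; skipped here.)
    bool HasGpuReached(const FenceState& fence, uint32_t seqId) {
        return fence.completedSeqId.load(std::memory_order_acquire) >= seqId;
    }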
So, we got one half of it – we can now report status back from the
GPU to the CPU, which allows us to do sane memory management
in our drivers (notably, we can now find out when it’s safe to actually
reclaim memory used for vertex buffers, command buffers, textures
and other resources). But that’s not all of it – there’s a puzzle piece
missing. What if we need to synchronize purely on the GPU side, for
example? Let’s go back to the render target example. We can’t use
that as a texture until the rendering is actually finished (and some
other steps have taken place – more details on that once I get to the
texturing units). The solution is a “wait”-style instruction: “Wait until
register M contains value N”. This can either be a compare for
equality, or less-than (note you need to deal with wraparounds
here!), or more fancy stuff – I’m just going with equals for simplicity.
This allows us to do the render target sync before we submit a
batch. It also allows us to build a full GPU flush operation: “Set
register 0 to ++seqId if all pending jobs finished” / “Wait until register
0 contains seqId”. Done and done. GPU/GPU synchronization:
solved – and until the introduction of DX11 with Compute Shaders
that have another type of more fine-grained synchronization, this
was usually the only synchronization mechanism you had on the
GPU side. For regular rendering, you simply don’t need more.
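As a sketch with made-up opcodes, the two primitives described above – “write a value into a register once X has happened” and “wait until a register contains a value” – are all you need to build that full flush:

    #include <cstdint>
    #include <vector>

    // Invented opcodes for the two sync primitives.
    enum : uint32_t { CMD_WRITE_REG_WHEN_IDLE = 10, CMD_WAIT_REG_EQ = 11 };

    struct CmdStream {
        std::vector<uint32_t> words;
        void Emit(uint32_t op, uint32_t reg, uint32_t value) {
            words.insert(words.end(), { op, reg, value });
        }
    };

    // "Set register 0 to seqId once all pending jobs have finished", then
    // "wait until register 0 contains seqId": a full GPU flush.
    void EmitFullFlush(CmdStream& cs, uint32_t seqId) {
        cs.Emit(CMD_WRITE_REG_WHEN_IDLE, /*reg=*/0, seqId);
        cs.Emit(CMD_WAIT_REG_EQ,         /*reg=*/0, seqId);
    }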
By the way, if you can write these registers from the CPU side, you
can use this the other way too – submit a partial command buffer
including a wait for a particular value, and then change the register
from the CPU instead of the GPU. This kind of thing can be used to
implement D3D11-style multithreaded rendering where you can
submit a batch that references vertex/index buffers that are still
locked on the CPU side (probably being written to by another
thread). You simply stuff the wait just in front of the actual render
call, and then the CPU can change the contents of the register once
the vertex/index buffers are actually unlocked. If the GPU never got
that far in the command buffer, the wait is now a no-op; if it did, it
spent some (command processor) time spinning until the data was
actually there. Pretty nifty, no? Actually, you can implement this kind
of thing even without CPU-writeable status registers if you can
modify the command buffer after you submit it, as long as there’s a
command buffer “jump” instruction. The details are left to the
interested reader :)
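A quick sketch of that trick, again with invented opcodes: the driver puts the wait right in front of the draw, and the CPU thread that owns the lock releases the GPU later by writing the register.

    #include <cstdint>
    #include <vector>

    enum : uint32_t { CMD_WAIT_REG_EQ = 11, CMD_DRAW = 3 };  // invented opcodes

    // Driver side: stuff the wait right in front of the draw that references
    // the still-locked vertex/index buffers.
    void EmitDrawAgainstLockedBuffer(std::vector<uint32_t>& cmds,
                                     uint32_t unlockReg, uint32_t unlockValue,
                                     uint32_t vertexCount) {
        cmds.insert(cmds.end(), { CMD_WAIT_REG_EQ, unlockReg, unlockValue });
        cmds.insert(cmds.end(), { CMD_DRAW, vertexCount, 0 });
    }

    // CPU side, on the thread that owns the lock: unblock the GPU (or make
    // the wait a no-op if the GPU never got that far yet).
    void OnBuffersUnlocked(volatile uint32_t* cpuVisibleRegister, uint32_t unlockValue) {
        *cpuVisibleRegister = unlockValue;
    }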
Update: Here, I’ve drawn a diagram for you. It got a bit convoluted
so I’m going to lower the amount of detail in the future. The basic
idea is this: The command processor has a FIFO in front, then the
command decode logic, execution is handled by various blocks that
communicate with the 2D unit, 3D front-end (regular 3D rendering)
or shader units directly (compute shaders), then there’s a block that
deals with sync/wait commands (which has the publicly visible
registers I talked about), and one unit that handles command buffer
jumps/calls (which changes the current fetch address that goes to
the FIFO). And all of the units we dispatch work to need to send us
back completion events so we know when e.g. textures aren’t being
used anymore and their memory can be reclaimed.
Closing remarks
Next step down is the first one doing any actual rendering work.
Finally, only 3 parts into my series on GPUs, we actually start
looking at some vertex data! (No, no triangles being rasterized yet.
That will take some more time).
Small disclaimer: Again, I’m giving you the broad strokes here,
going into details where it’s necessary (or interesting), but trust me,
there’s a lot of stuff that I dropped for convenience (and ease of
understanding). That said, I don’t think I left out anything really
important. And of course I might’ve gotten some things wrong. If you
find any bugs, tell me!