
Uploaded by

Maxminlevel


Fast Inverse Square Root

September 21, 2022

Abstract
In complexity analysis we generally treat an arithmetic operation as taking
O(1) time. Underneath, however, each operation expands into many assembly
instructions that work on bits. In the late 1990s and early 2000s, hardware
was limited enough that computing the value 1/√x was a real bottleneck. In
this report we present an algorithm that computes it with a constant number
of bit-level operations, a combination of bit manipulation and mathematics.
Index Terms: algorithm, fast inverse square root, IEEE 754

1 Problem
This problem arises whenever a vector must be normalized to length 1, for
example in ray tracing or reflection calculations. Suppose light strikes a plane
along the direction (x, y, z) and the plane has normal vector (a, b, c). To
determine the direction of the reflected light, we first have to normalize the
normal vector of the plane (scale its length to 1). The result is:

$$\vec{n} = \left( \frac{a}{\sqrt{a^2+b^2+c^2}},\ \frac{b}{\sqrt{a^2+b^2+c^2}},\ \frac{c}{\sqrt{a^2+b^2+c^2}} \right)$$

The core calculation in this problem is the inverse square root. Imagine a
3D game in which many objects need reflection calculations while CPU time is
limited: this is the setting in which FISR was born. The problem is to compute
the value 1/√x on a computer that works with integer and floating-point
representations.
The problem breaks down as follows:
• An algorithm to find the square root
• How to divide 1 by a value as fast as possible
– Datatype
– Instruction set

2 Approaches
2.1 Naive approach
The naive approach, the first that anyone would come up with, is to use
floating-point division together with the square root function.
    #include <math.h>

    float naive_sqrt(float number) {
        return 1.0f / sqrtf(number);
    }

2.2 Quake III Arena approach

Before about 2000, most CPUs in use did not support a reciprocal (multiplicative
inverse) square root instruction, so the operation 1/sqrt(x) carried a large
overhead in time. This is a major issue for graphically intensive software such
as video games, where these calculations are performed millions of times per
second: even a minor improvement in the performance of such a calculation can
significantly increase the graphics computation speed and, as a result, the
game's frame rate.
The method is presented as it existed in the Quake engine, with the same code
and comments. The overall concept of the approximation is a Newton-Raphson
iteration. For this problem the iteration works as follows: given an
approximation c of 1/√x, a better approximation y is found using

$$y = \frac{c}{2}\left(3 - x c^2\right)$$
    float Q_sqrt(float number) {
        long i;   // assumed to be 32 bits wide, as on the original target
        float x2, y;
        const float threehalfs = 1.5F;

        x2 = number * 0.5F;
        y = number;
        // evil floating point bit level hacking
        i = *(long *) &y;
        // what the #### ?
        i = 0x5f3759df - (i >> 1);
        y = *(float *) &i;
        // 1st iteration
        y = y * (threehalfs - (x2 * y * y));
        // 2nd iteration
        // y = y * (threehalfs - (x2 * y * y));
        // this line can be removed

        return y;
    }
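
The pointer casts in the listing above depend on `long` being 32 bits wide and on the compiler tolerating type punning through pointers. The following is a sketch of the same steps written with `memcpy` and a fixed-width integer; the name `fisr_portable` is an assumption made for this example, and the code assumes a 32-bit IEEE 754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Same steps as the Quake listing, but the bit reinterpretation goes
   through memcpy and a fixed-width integer. Assumes float is a 32-bit
   IEEE 754 single and that number is positive and normalised. */
float fisr_portable(float number) {
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);          /* float bits -> integer */
    i = 0x5f3759dfu - (i >> 1);        /* initial guess for 1/sqrt(number) */
    memcpy(&y, &i, sizeof y);          /* integer bits -> float */
    y = y * (threehalfs - x2 * y * y); /* one Newton-Raphson step */
    return y;
}
```

For example, `fisr_portable(4.0f)` returns a value within about 0.2% of 0.5.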

3 Background
This section gives the minimum background needed to understand why the code is
written the way it is and how it works. It covers two main topics: the
floating-point format and approximation by Newton's method.

Figure 1: IEEE 754

3.1 IEEE 754
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) was first published
in 1985 and has since been revised twice, in 2008 and 2019. It defines several
kinds of floating-point values:
• Normalised number

• Denormalised number
• Not a Number
• Infinities
• 0 and -0

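In this report we only consider normalised numbers; the other kinds are not
valid inputs to the FISR algorithm. For single precision, with sign bit s,
8-bit exponent field E (bias 127) and 23-bit fraction field M, the value of a
normalised IEEE 754 number is computed as:

$$(-1)^{s} \times 2^{E-127} \times \left(1 + \frac{M}{2^{23}}\right) \qquad (1)$$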
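
To make the layout concrete, the sketch below splits a single-precision float into its sign, exponent and fraction fields and rebuilds the value from them as in Equation (1). The `float_fields` type and both function names are assumptions introduced for this example; the code assumes a 32-bit IEEE 754 `float` and a normalised input.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a single-precision float. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits */
} float_fields;

float_fields decode_float(float x) {
    uint32_t bits;
    float_fields f;
    memcpy(&bits, &x, sizeof bits);
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Rebuild the value of a normalised number from its fields:
   (-1)^s * 2^(E-127) * (1 + M / 2^23). */
double value_from_fields(float_fields f) {
    double magnitude = ldexp(1.0 + (double)f.fraction / 8388608.0,
                             (int)f.exponent - 127);
    return f.sign ? -magnitude : magnitude;
}
```

For -6.5f, which is -1.625 × 2^2, the fields are sign 1, exponent 129 and fraction 0x500000.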

3.2 Newton-Raphson Method

Statement:
Let $f: \mathbb{R} \to \mathbb{R}$ be a differentiable function. We seek a
solution of $f(x) = 0$, starting from an initial estimate $x = x_1$.
At the n-th step, given $x_n$, compute the next approximation $x_{n+1}$ by

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$

and repeat.
The idea is geometric. First we take a point $(x_1, y_1)$ on the curve and draw
the tangent line (the linear approximation) there; it intersects the x-axis at
$x_2$. Repeating the process with the point $(x_2, y_2)$, and so on, we
typically approach the solution rapidly.
In mathematical terms: the line has gradient $f'(x_1)$ and passes through
$(x_1, y_1)$, so it has equation

$$\frac{y - y_1}{x - x_1} = f'(x_1), \quad \text{or equivalently,} \quad y = f'(x_1)(x - x_1) + y_1 \qquad (2)$$

Setting $y = 0$, we find the x-intercept as

$$x = x_1 - \frac{y_1}{f'(x_1)} = x_1 - \frac{f(x_1)}{f'(x_1)} \qquad (3)$$
Figure 2: Newton Approximation Method

Figure 3: Approximate log2 (1 + x)

The same calculation applies at each stage, so from the n-th approximation
$x_n$ the next approximation is given by

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \qquad (4)$$
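
The update rule translates almost directly into code. The following is a minimal sketch of a generic Newton-Raphson loop; the function names, the stopping tolerance and the example problem f(x) = x^2 - 2 are all assumptions chosen for illustration.

```c
#include <math.h>

/* Function pointer type for f and its derivative (names are illustrative). */
typedef double (*real_fn)(double);

/* Run at most max_iter Newton-Raphson steps x <- x - f(x)/f'(x),
   stopping early when the step becomes smaller than tol. */
double newton_raphson(real_fn f, real_fn df, double x, int max_iter, double tol) {
    for (int n = 0; n < max_iter; ++n) {
        double slope = df(x);
        if (slope == 0.0) {
            break;  /* tangent is horizontal, no x-intercept */
        }
        double next = x - f(x) / slope;
        if (fabs(next - x) < tol) {
            return next;
        }
        x = next;
    }
    return x;
}

/* Example problem: f(x) = x^2 - 2, whose positive root is sqrt(2). */
double f_example(double x)  { return x * x - 2.0; }
double df_example(double x) { return 2.0 * x; }
```

Starting from x = 1, a call such as `newton_raphson(f_example, df_example, 1.0, 20, 1e-12)` settles on the square root of 2 after a handful of steps, which illustrates how quickly the method converges from a reasonable starting point.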

3.3 Approximate log2 (1 + x)

Here x is a real number between 0 and 1.
We want to approximate $\log_2(1 + x)$. On this interval, $\log_2(1 + x)$ is
approximately equal to x, as Figure 3 shows.
The blue line is $\log_2(1 + x)$ and the green line is $y = x$. The
approximation is exact at x = 0 and x = 1, where the two lines intersect. We can
improve it by shifting the line $y = x$ upward by a constant µ, which can be
chosen freely. With µ = 0 we recover the original line, while µ ≈ 0.043 gives
the smallest maximum error between 0 and 1. We apply this to
$\log_2(1 + m/2^{23})$: because m lies in the interval from 0 to $2^{23}$,
$m/2^{23}$ lies in the interval from 0 to 1, so we can approximate

$$\log_2\left(1 + \frac{m}{2^{23}}\right) \approx \frac{m}{2^{23}} + \mu$$

4 Method
Interpreting this function can be difficult because of the bit-level tricks and
the lack of proper documentation. However, it can be broken down into the
following parts. The casting between (long*) and (float*) is one detail.
The formula y = y * (threehalfs - (x2 * y * y)) is another. But both are
simple compared to the line that should jump out: i = 0x5f3759df - (i >> 1).

4.1 Bit hacking

The function contains two explicit casts between (long*) and (float*).
Take, for example, the line i = *(long*)&y. This line takes the address of y,
which is of type float, casts it to a pointer to long integer, and then stores
the value at that address in i. In effect it interprets the bits of the float as
an integer, without performing any numeric conversion. The other cast works the
same way, but from integer back to float.
This relies on how floats are stored in memory, as explained in Section 3.1
(IEEE 754). The integer interpretation of a float's bits, I, can therefore be
expressed as

$$I = 2^{31} s + 2^{23} E + M$$

where s is the sign bit (1 means negative), E is the exponent field (biased by
127) and M is the fractional part of the normalized mantissa.
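
Combining this integer view with the log2 approximation of Section 3.3 suggests where a constant like 0x5f3759df can come from. The sketch below is an assumption-laden illustration rather than the derivation used by the original authors: it treats I / 2^23 - 127 + µ as an estimate of log2 of a positive normalised float and applies it to log2(y) = -(1/2) log2(x). The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer. */
uint32_t float_bits(float x) {
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    return i;
}

/* Estimate log2(x) for a positive normalised x directly from its bits:
   log2(x) ~ I / 2^23 - 127 + mu. */
double log2_from_bits(float x, double mu) {
    return (double)float_bits(x) / 8388608.0 - 127.0 + mu;
}

/* Applying the estimate to log2(y) = -1/2 * log2(x) and solving for the
   integer view of y gives I_y ~ 3/2 * 2^23 * (127 - mu) - I_x / 2.
   This returns the constant term of that expression. */
uint32_t magic_from_mu(double mu) {
    return (uint32_t)(1.5 * 8388608.0 * (127.0 - mu));
}
```

With µ = 0.043 the constant comes out as 0x5f37be76, which shares its upper 16 bits with the 0x5f3759df used in the Quake code; the remaining difference corresponds to a slightly different choice of µ.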

4.2 Approximation
The function contains the formula y = y * (threehalfs - (x2 * y * y)). This
comes from the Newton-Raphson method applied to $f(x) = \frac{1}{x^2} - h$,
which, from Equation (4), yields

$$x_{n+1} = \frac{x_n}{2}\left(3 - h x_n^2\right)$$

This can be rearranged to more closely resemble the code as

$$x_{n+1} = x_n\left(\frac{3}{2} - \frac{h}{2} x_n^2\right)$$

where $x_n$ is y and h/2 is x2. This step refines the answer and can be
repeated. Notice that in the Quake III Arena source code the second iteration
is commented out.
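
The intermediate algebra is short. A worked version, using the same symbols as above and the general Newton-Raphson update from Section 3.2, might read:

```latex
\begin{align*}
f(x) &= \frac{1}{x^{2}} - h, \qquad f'(x) = -\frac{2}{x^{3}} \\
x_{n+1} &= x_n - \frac{f(x_n)}{f'(x_n)}
         = x_n + \frac{x_n^{3}}{2}\left(\frac{1}{x_n^{2}} - h\right) \\
        &= x_n + \frac{x_n}{2} - \frac{h}{2}\,x_n^{3}
         = x_n\left(\frac{3}{2} - \frac{h}{2}\,x_n^{2}\right)
\end{align*}
```

The root of f is x = 1/√h, so each step moves the estimate toward the inverse square root of h using only multiplications and one subtraction.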

5 Experiment
5.1 Setup
We test FISR by evaluating its accuracy and performance against the naive
approach.
To compare accuracy, we use relative error as the metric, calculated from the
results of FISR and of the straightforward 1/sqrt(x) approach. For performance,
we evaluate FISR by its execution time and the number of instructions it
executes. To measure these, we use perf, the performance counter tool for
Linux. We implement a program that compares the different approaches and run it
on our machine (Intel(R) Core(TM) i7-8550U, 1.80 GHz).
All the diagrams presented in this section are generated by programs written
in Python.

5.2 Accuracy
Relative error (RE) is the ratio of the absolute error between the prediction
and the ground truth to the ground truth. It is expressed as a percentage.

$$RE(y_{pred}, y) = \frac{|y_{pred} - y|}{y} \qquad (5)$$

We conduct the experiment by calculating the relative error between
fastInvSqrt(x) and 1/sqrt(x) for x from 1 to 10000. The results are shown in
Figures 5 and 6.
As Figure 5 shows, the relative error fluctuates periodically but does not
exceed 0.3%. The relative error is even lower when we add the second iteration
to the fast inverse square root: with the second iteration of Newton's method,
the maximum relative error is $1.75 \times 10^{-5}$. This approximation is good
enough for practical use, and it is also very fast to compute.

5.3 Performance
To evaluate performance, we use the different approaches to calculate the
inverse square root of every integer from 1 to $2^{31} - 1$. As mentioned in
Section 5.1, we use perf to collect execution statistics for each approach.

5.3.1 Speed

Table 1: Average and variance of execution time (seconds) of each approach. The
best result is marked with *.

Approach                    Execution time (s)
Fast Inverse Square Root    3.943 ± 0.0052 *
1.0/sqrt(x)                 7.793 ± 0.0057
pow(x, -0.5f)               17.618 ± 0.01

The table shows that FISR is roughly 2 times faster than the 1.0/sqrt(x)
approach on our machine. On CPUs of the past, it could be 3 to 4 times faster.

5.3.2 Instructions
As shown in Table 2, the fast approach executes the fewest instructions, even
though modern CPUs have a dedicated instruction for computing the square
root. This partly explains why FISR is so fast.

Figure 4: The approximate result closely matches the ground-truth result

Figure 5: The relative error when approximating inverse square root with the 1st
iteration of Newton’s method

Figure 6: The relative error when approximating inverse square root with the 2nd
iteration of Newton's method

Table 2: Average number of instructions of each approach. The best result is
marked with *.

Approach                    Instructions
Fast Inverse Square Root    45,105,476,499 *
1.0/sqrt(x)                 53,699,853,657
pow(x, -0.5f)               173,970,220,040

6 Application
6.1 Context
The Quake III algorithm was the optimised approach back when CPUs were not as
powerful as they are today. Later generations of CPUs, however, gradually added
support for a reciprocal square root instruction, e.g. the rsqrtss instruction
found in the x86 SSE instruction set.
As a result, the fast inverse square root is no longer needed in practice on
such hardware. Still, this algorithm remains one of the beauties of mathematics
in the world of computers.
For reference, the pow-based variant measured in Section 5.3 is:
    #include <math.h>

    float pow_sqrt(float number) {
        return powf(number, -0.5f);
    }
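
On compilers that target SSE, the rsqrtss instruction can be reached through the `_mm_rsqrt_ss` intrinsic. The wrapper below is a sketch under that assumption: the name `hw_rsqrt` is hypothetical, and the code falls back to the plain library computation when SSE is not available.

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Approximate 1/sqrt(x) with the hardware rsqrtss instruction when the
   compiler targets SSE, and fall back to 1.0f / sqrtf(x) otherwise. */
float hw_rsqrt(float x) {
#if defined(__SSE__)
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    return 1.0f / sqrtf(x);
#endif
}
```

Like FISR, the hardware instruction returns an approximation rather than an exactly rounded result, so a Newton-Raphson step can still be applied afterwards when more precision is needed.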

6.2 Data
No data is required.

7 References
• Fast Inverse Square Root — A Quake III Algorithm
https://www.youtube.com/watch?v=p8u_k2LIZyo
• FISR, https://blog.timhutt.co.uk/fast-inverse-square-root/
• SSE Instruction, https://docs.oracle.com/cd/E37838_01/html/E61064/eojde.html
• IEEE754, https://standards.ieee.org/ieee/754/993/
• Newton-Raphson Method
https://amsi.org.au/ESA_Senior_Years/SeniorTopic3/3j/3j_2content_2.html
• https://ece.uwaterloo.ca/~dwharder/aads/Algorithms/Inverse_square_root/
• http://i.stanford.edu/pub/cstr/reports/csl/tr/94/647/CSL-TR-94-647.pdf
• https://www.linkedin.com/pulse/fast-inverse-square-root-still-armin-kassemi-langroodi/
• Chris Lomont, http://www.lomont.org/papers/2003/InvSqrt.pdf
