Fast Inverse Square Root
Abstract
When analysing the complexity of an algorithm, we generally assign a time
complexity of O(1) to each arithmetic operator. Underneath, however, each
operator expands into many lines of assembly code that operate on bits. In the
late 1990s and early 2000s, hardware was very limited, so computing the value
1/√x was a significant problem. In this report, we present an algorithm that
computes this value in O(1) bit-level operations, a piece of art combining bit
manipulation and mathematics.
Index Terms—algorithm, fast inverse square root, ieee754
1 Problem
This problem appears when you have to normalize a vector to length 1, for
example in ray tracing or reflection. Suppose light hits a plane along the
vector (x, y, z), and the normal vector of the plane is (a, b, c). To determine
the vector of the outgoing light, you have to normalize the plane's normal
vector (scale its length to 1). The result:
$$\vec{n} = \left( \frac{a}{\sqrt{a^2 + b^2 + c^2}},\ \frac{b}{\sqrt{a^2 + b^2 + c^2}},\ \frac{c}{\sqrt{a^2 + b^2 + c^2}} \right)$$
The core calculation for this problem is the fast inverse square root (FISR).
Imagine a 3D game in which many objects need reflections while CPU resources
are limited: this is the setting in which FISR was born. The problem is
computing the value 1/√x, in the context of a computer that represents numbers
as integers and floating-point values.
Breaking down the problem:
• An algorithm to find the square root
• How to divide 1 by a value as fast as possible
– Data type
– Instruction set
2 Approaches
2.1 Naive approach
The naive approach that anyone could come up with is to use floating-point
division and the square root function.
#include <math.h>

/* Naive approach: one square root followed by one division. */
float naivesqrt(float number) {
    return 1 / sqrt(number);
}
3 Background
This part gives you the minimum background needed to understand why the code is
written the way it is and how it works. It covers two main topics: floating
point and approximation.
Figure 1: IEEE 754
3.1 IEEE754
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) was first published
in 1985, and its core has remained in place through two revisions, in 2008 and
2019. It defines several kinds of floating-point values:
• Normalised number
• Denormalised number
• Not a Number
• Infinities
• 0 and -0
In this report we consider only normalised numbers; the other kinds are not
valid inputs to the FISR algorithm. The value of an IEEE 754 number is
computed as:
$$\text{sign} \times 2^{\text{exponent}} \times \text{mantissa}$$ (1)
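To make the three fields concrete, here is a minimal C sketch that splits a single-precision value into its stored parts. It assumes `float` is an IEEE 754 binary32 value (1 sign bit, 8 exponent bits stored with a bias of 127, 23 fraction bits); the names `float_fields` and `decode_float` are illustrative and do not come from the original code. For a normalised number, the exponent in the formula is the stored exponent minus 127, and the mantissa is 1 + fraction / 2^23.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three stored fields of a binary32 float. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 for positive, 1 for negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits: the mantissa without its implicit leading 1 */
} float_fields;

/* Split a float into its raw fields (assumes 32-bit IEEE 754 floats). */
float_fields decode_float(float value) {
    uint32_t bits;
    float_fields f;

    memcpy(&bits, &value, sizeof bits);   /* copy the bit pattern unchanged */
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For example, `decode_float(1.0f)` gives sign 0, stored exponent 127 and fraction 0, which corresponds to +1 × 2^0 × 1.0.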
$$x_2 = x_1 - \frac{y_1}{f'(x_1)} = x_1 - \frac{f(x_1)}{f'(x_1)}$$ (3)
Figure 2: Newton Approximation Method
The same calculation applies at each stage: from the n-th approximation
x_n, the next approximation is given by
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$ (4)
4 Method
Interpreting this function may be a little difficult because of the bit hacking
and the lack of proper documentation. However, it can be broken down into the
following parts and interpreted piece by piece. The casting between (long*) and
(float*) is one detail. The formula y = y * (threehalfs - (x2 * y * y)) is
another. But these are simple compared to the line that should jump out:
i = 0x5f3759df - (i >> 1).
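For orientation, the routine under discussion can be sketched in C as follows. This is a portable reconstruction, not the verbatim Quake III source: as an assumption made here, the (long*)/(float*) pointer casts are replaced by `memcpy` on a `uint32_t`, which copies the same bits while avoiding undefined behaviour and 64-bit `long` types. The name `fast_inv_sqrt` is illustrative, and 32-bit IEEE 754 floats are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the fast inverse square root for a positive, normalised input. */
float fast_inv_sqrt(float number) {
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);              /* view the float's bits as an integer */
    i = 0x5f3759df - (i >> 1);             /* magic constant: rough first guess */
    memcpy(&y, &i, sizeof y);              /* view the integer's bits as a float */
    y = y * (threehalfs - (x2 * y * y));   /* one Newton-Raphson refinement */
    return y;
}
```

Calling `fast_inv_sqrt(4.0f)` returns a value close to 0.5; the refinement line can be repeated for more accuracy at the cost of a few extra multiplications.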
4.2 Approximation
The function contains the formula y = y * (threehalfs - (x2 * y * y)). This
comes from the Newton-Raphson method applied to $f(x) = \frac{1}{x^2} - h$,
which, using the Newton iteration from Section 3, yields
$$x_{n+1} = \frac{x_n}{2}\left(3 - h x_n^2\right)$$
This can be rearranged to more closely resemble the code as
$$x_{n+1} = x_n\left(\frac{3}{2} - \frac{h}{2} x_n^2\right)$$
where x_n is y and h/2 is x2. This step refines the answer and can be
repeated. Notice that in the Quake III: Arena source code the second iteration
is commented out.
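The effect of repeating the refinement can be sketched in isolation. The helpers below are hypothetical (the names `refine_inv_sqrt` and `refine_n` are not from the original code); they apply the same update formula to an arbitrary starting guess y for 1/√h.

```c
/* One Newton-Raphson refinement of y toward 1/sqrt(h). */
float refine_inv_sqrt(float y, float h) {
    const float threehalfs = 1.5f;
    float x2 = h * 0.5f;                   /* h/2, called x2 in the code */
    return y * (threehalfs - (x2 * y * y));
}

/* Apply the refinement a given number of times. */
float refine_n(float y, float h, int steps) {
    for (int k = 0; k < steps; ++k) {
        y = refine_inv_sqrt(y, h);
    }
    return y;
}
```

Starting from the rough guess 0.4 for h = 4 (true value 0.5), one step gives about 0.472 and a handful of further steps settle on 0.5. The error shrinks roughly quadratically once the guess is close, which is why a single step after a good initial guess is often considered enough.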
5 Experiment
5.1 Setup
We test FISR by evaluating its accuracy and performance against the naive
approach.
When comparing accuracy, we use relative error as the metric, calculated from
the results of FISR and the straightforward 1/sqrt(x) approach. In terms of
performance, we evaluate FISR based on its execution time and the number of
instructions it executes. To measure these, we use perf, the performance
counters tool for Linux. We implement a program that compares the different
approaches and execute it on our machine (Intel(R) Core(TM) i7-8550U 1.80GHz).
All the diagrams presented in this section are generated from programs written
in Python.
5.2 Accuracy
Relative error (RE) is the ratio of the absolute error between the prediction
and the ground truth to the ground truth. It is expressed as a percentage.
$$RE(y_{pred}, y) = \frac{|y_{pred} - y|}{y}$$ (5)
We conduct the experiment by calculating the relative error between
fastInvSqrt(x) and 1/sqrt(x) for x from 1 to 10000. The results are visualized
in Figures 5 and 6.
As shown in Figure 5, the relative error fluctuates periodically but does not
exceed 0.3%. The relative error is even lower when we add a second iteration to
the fast inverse square root: with the second iteration of Newton's method, the
maximum relative error is 1.75 × 10^-5. This approximation is good enough to
use, and it is also very fast to compute.
5.3 Performance
To evaluate performance, we use the different approaches to calculate the
inverse square root of every integer from 1 to 2^31 − 1. As mentioned in
Section 5.1, we use perf to collect execution statistics for each approach.
5.3.1 Speed
Table 1: Average & variance of execution time (seconds) of each approach. The best
number is in bold.
The table shows that FISR is roughly 2 times faster than the normal approach.
On older CPUs, it could be 3 to 4 times faster.
5.3.2 Instructions
As shown in Table 2, the fast approach executes the fewest instructions, even
though modern CPUs have a dedicated instruction for computing the square root.
This also partly explains why FISR is so fast.
Figure 4: The approximate result fits the ground-truth result almost perfectly
Figure 5: The relative error when approximating inverse square root with the 1st
iteration of Newton’s method
Figure 6: The relative error when approximating inverse square root with the 2nd
iteration of Newton’s method
Table 2: Average instructions of each approach. The best number is in bold.
Approach                    Instructions
Fast Inverse Square Root    45,105,476,499
1.0/sqrt(x)                 53,699,853,657
pow(x, -0.5f)               173,970,220,040
6 Application
6.1 Context
The Quake III algorithm was the optimised approach back when CPUs were not as
powerful as they are today. However, later generations of CPUs gradually added
support for a reciprocal square root instruction, e.g. the rsqrtss instruction
found in the [?] x86 instruction set.
Therefore, the fast inverse square root is no longer needed in practice on such
hardware. Still, this algorithm remains one of the beautiful pieces of
mathematics in the world of computers.
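As an illustration of the hardware route, the rsqrtss instruction can be reached from C through the SSE intrinsic `_mm_rsqrt_ss` declared in `<xmmintrin.h>`. The sketch below is not part of the original experiments; the function name `hw_inv_sqrt` and the `HAVE_RSQRTSS` macro are hypothetical, and on targets without SSE it falls back to the plain `1.0f / sqrtf(x)` computation. The instruction returns an approximation (Intel documents a relative error of at most 1.5 × 2^-12), so it can still be followed by a Newton step when more precision is needed.

```c
#include <math.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_RSQRTSS 1
#else
#define HAVE_RSQRTSS 0
#endif

/* Approximate 1/sqrt(number) with the hardware instruction when available. */
float hw_inv_sqrt(float number) {
#if HAVE_RSQRTSS
    /* Load the scalar, run rsqrtss on it, and extract the low lane. */
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(number)));
#else
    /* Fallback for targets without SSE: exact division and square root. */
    return 1.0f / sqrtf(number);
#endif
}
```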
#include <math.h>

/* Named powsqrt because sqrt would clash with <math.h>. */
float powsqrt(float number) {
    return pow(number, -0.5f);
}
6.2 Data
No data is needed.
7 References
• Fast Inverse Square Root — A Quake III Algorithm, https://www.youtube.com/watch?v=p8u_k2LIZyo
• FISR, https://blog.timhutt.co.uk/fast-inverse-square-root/
• SSE Instruction, https://docs.oracle.com/cd/E37838_01/html/E61064/eojde.html
• IEEE754, https://standards.ieee.org/ieee/754/993/
• Newton-Raphson Method, https://amsi.org.au/ESA_Senior_Years/SeniorTopic3/3j/3j_2content_2.html
• https://ece.uwaterloo.ca/~dwharder/aads/Algorithms/Inverse_square_root/
• http://i.stanford.edu/pub/cstr/reports/csl/tr/94/647/CSL-TR-94-647.pdf
• https://www.linkedin.com/pulse/fast-inverse-square-root-still-armin-kassemi-langroodi/
• Chris Lomont, http://www.lomont.org/papers/2003/InvSqrt.pdf