Automatic Reproduction of a Genius Algorithm: Strassen's Algorithm Revisited by Genetic Search
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 14, NO. 2, APRIL 2010
Abstract: In 1968, Volker Strassen, a young German mathematician, announced a clever algorithm to reduce the asymptotic complexity of $n \times n$ matrix multiplication from the order of $n^3$ to $n^{2.81}$. It soon became one of the most famous scientific discoveries in the 20th century and provoked numerous studies by other mathematicians to improve upon it. Although a number of improvements have been made, Strassen's algorithm is still optimal in his original framework, the bilinear systems of $2 \times 2$ matrix multiplication, and people are still curious how Strassen developed his algorithm. We examined it to see if we could automatically reproduce Strassen's discovery using a search algorithm and find other algorithms of the same quality. In total, we found 608 algorithms that have the same quality as Strassen's, including Strassen's original algorithm. We partitioned the algorithms into nine different groups based on the way they are constructed. This paper was made possible by the combination of genetic search and linear-algebraic techniques. To the best of our knowledge, this is the first work that automatically reproduced Strassen's algorithm, and furthermore, discovered new algorithms with equivalent asymptotic complexity using a search algorithm.

Index Terms: Bilinear, Gaussian elimination, genetic algorithm, matrix multiplication, Sammon mapping, Strassen's algorithm.

I. Introduction

Although improvements have been made since then, Strassen's work is still optimal under his original framework of dividing matrices into four $n/2 \times n/2$ matrices and finding bilinear combinations. Strassen's algorithm is one of the most famous scientific discoveries in the 20th century.

Two primary questions motivated this paper. First, how many algorithms other than Strassen's exist under the same framework? Second, can a search algorithm find the same or equivalent solutions to Strassen's without the enormous effort of a human genius? This paper gives a partial answer to these questions by using genetic search.

Suppose we wish to compute the product of two matrices, which takes the form $C = A \cdot B$, where each of $A$, $B$, and $C$ is an $n \times n$ matrix. Assuming $n$ is a power of 2 ($n = 2^k$ for some integer $k$), we divide each matrix into four $n/2 \times n/2$ matrices as follows:

\[
\begin{pmatrix} C_1 & C_2 \\ C_3 & C_4 \end{pmatrix}
=
\begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}
\begin{pmatrix} B_1 & B_3 \\ B_2 & B_4 \end{pmatrix}.
\]
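The block partition above can be checked numerically. The following is a minimal sketch, not code from the paper: the helper names `matmul`, `add`, `quadrants`, and `block_product` are hypothetical, and plain Python lists stand in for matrices. It uses the partition's labeling, in which the blocks of B are numbered column by column (B1 and B2 form the left column), so that for example C1 = A1·B1 + A2·B2.

```python
import random

def matmul(X, Y):
    """Plain triple-loop product of two square matrices (lists of lists)."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def quadrants(M):
    """Return the (top-left, top-right, bottom-left, bottom-right) blocks."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def block_product(A, B):
    """Compute C1..C4 from the blocks of A and B with eight block products."""
    A1, A2, A3, A4 = quadrants(A)
    B1, B3, B2, B4 = quadrants(B)  # B is labeled column-wise: [B1 B3; B2 B4]
    C1 = add(matmul(A1, B1), matmul(A2, B2))
    C2 = add(matmul(A1, B3), matmul(A2, B4))
    C3 = add(matmul(A3, B1), matmul(A4, B2))
    C4 = add(matmul(A3, B3), matmul(A4, B4))
    return C1, C2, C3, C4

random.seed(0)
n = 4  # any even size works for a single level of partitioning
A = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
B = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]

# The blocks assembled from the partition agree with the direct product.
blocks_ok = block_product(A, B) == quadrants(matmul(A, B))
```

This direct evaluation needs eight block multiplications per level; a Strassen-style algorithm replaces them with seven.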
Authorized licensed use limited to: ST. JOSEPH ENGINEERING COLLEGE MANGALORE. Downloaded on March 09,2025 at 09:55:06 UTC from IEEE Xplore. Restrictions apply.
OH AND MOON: AUTOMATIC REPRODUCTION OF A GENIUS ALGORITHM: STRASSEN’S ALGORITHM REVISITED BY GENETIC SEARCH 247
Each of the final matrices $C_1$ through $C_4$ can then be computed as a combination of the $P_i$'s with no further multiplications:

\[
\begin{aligned}
C_1 &= -P_2 + P_4 + P_5 + P_6 \\
C_2 &= P_1 + P_2 \\
C_3 &= P_3 + P_4 \\
C_4 &= P_1 - P_3 + P_5 + P_7.
\end{aligned}
\]

Therefore, the whole process consists of only seven multiplications of $n/2 \times n/2$ matrices. The recurrence for the total number of scalar multiplications now becomes $T_n = 7T_{n/2}$, which results in $T_n = n^{\log_2 7} \approx n^{2.81}$. Strassen thus dropped the complexity of matrix multiplication from the order of $n^3$ to $n^{2.81}$. In total, Strassen's algorithm uses seven multiplications and 18 additions. Some years later, Winograd found an algorithm that uses seven multiplications and only 15 additions [7], [2]; it saves three additions by exploiting common subexpressions. However, the time required for the additions is negligible compared to the multiplications, and does not affect the asymptotic complexity of the algorithms when $n$ is large enough.

After Strassen, a large number of mathematicians tried to find better algorithms. Currently, the best is $n^{2.376}$, which was obtained by using a basic trilinear form and the Salem-Spencer theorem, which concerns arithmetic progressions [4]. However, it has been proven that when using bilinear combinations of $n/2 \times n/2$ matrices, it is not possible to obtain the product with only six multiplications of $n/2 \times n/2$ matrices [6], [7]; therefore, Strassen's algorithm using seven is optimal. If we divide an $n \times n$ matrix into nine $n/3 \times n/3$ matrices and try the same approach (bilinear combinations) as Strassen's, the best result so far is 23 multiplications of $n/3 \times n/3$ matrices, which requires $n^{\log_3 23}$ scalar multiplications in total [8]. Since $n^{\log_3 23} > n^{2.81}$, it is asymptotically not as good as Strassen's original $n/2 \times n/2$ approach. An excellent survey of matrix multiplication was provided by Pan [9].

In principle, Strassen's method can be found by exhaustively evaluating all possible cases. Unfortunately, if we attempt to find a comparable solution using an exhaustive search algorithm, it would take around 67 million years on a Pentium IV 2.4 GHz machine. Instead, we used a genetic algorithm, and succeeded in cutting the run-time down to a few hours. In an extreme case, it took just 10 s to find a solution.

To the best of our knowledge, this is the first work that automatically found algorithms comparable to Strassen's. We hereafter describe how we achieved the goal.

II. A Genetic Search

A. Problem Formulation

It is not clear exactly how Strassen discovered the set of $P_i$ matrices, which are the crucial part of his algorithm. For this paper, we assumed each matrix product $P_i$ ($i = 1, 2, \ldots, 7$) is written in the following form [10]:

\[
P_i = (\alpha_{i1} A_1 + \alpha_{i2} A_2 + \alpha_{i3} A_3 + \alpha_{i4} A_4) \cdot (\beta_{i1} B_1 + \beta_{i2} B_2 + \beta_{i3} B_3 + \beta_{i4} B_4), \qquad \alpha_{ij}, \beta_{ij} \in \{-1, 0, 1\}.
\]

These types of combinations for $P_i$ are called bilinear combinations. If each of $C_1$, $C_2$, $C_3$, and $C_4$ is a linear combination of $P_1, P_2, \ldots, P_7$, then $n^{2.81}$ scalar multiplications are enough for an $n \times n$ matrix multiplication. Therefore, for $i = 1, \ldots, 7$ and $j = 1, \ldots, 4$, our goal is to find seven $P_i$'s and associated $\delta_{ji}$'s satisfying

\[
C_j = \sum_{i=1}^{7} \delta_{ji} P_i. \tag{1}
\]

In this framework, the number of all possible $P_i$'s is $3^4 \times 3^4$. Eliminating the zero matrix and symmetric pairs with reversed signs, the number of unique $P_i$'s is $((3^4 - 1)/2)\,((3^4 - 1)/2) = 1600$. Therefore, the total number of candidate solutions for seven $P_i$'s is $\binom{1600}{7} = 5.2566 \times 10^{18}$. Verifying whether a solution satisfies (1) can be done by performing Gaussian elimination on four $16 \times 8$ matrices, as will be seen in Section II-D. On a Pentium IV 2.4 GHz CPU, it takes 0.0001 s to compute the Gaussian elimination of a $16 \times 8$ matrix. Therefore, it would take $(5.2566 \times 10^{18} \times 0.0001 \times 4)/(60 \times 60 \times 24 \times 365) = 6.67 \times 10^{7}$ years to check all candidate solutions.

B. Genetic Algorithm

A genetic algorithm (GA) is a search method that mimics the process of natural selection. Fig. 1 shows the flow of the GA we used. It is a typical steady-state hybrid GA. In the GA, we first create a fixed number of initial solutions at random, in which nearly all solutions have poor fitness; this set of solutions is called the population. Then we iterate the genetic main loop. Each iteration runs as follows. We choose two parent solutions in the population based on their relative fitness. The chosen solutions are then combined by partly mixing their characteristics to produce a new solution, called an offspring. The offspring is improved using a local optimization algorithm, and its fitness is evaluated. If the offspring satisfies some condition, it replaces one of the solutions in the population. The loop is repeated until the stopping condition is satisfied. For a good introduction to GAs, see [5].

Although a GA can search a wide space, on its own it is usually not very powerful for problems larger than toy size. In particular, it is weak at fine-tuning around local optima. For practical competence, we often need to incorporate local optimization algorithms in the framework of the GA. These types of GAs are called hybrid GAs or memetic GAs. Devising a synergetic local optimization algorithm is thus a crucial part of the design of hybrid GAs [3]. In the following sections, we describe each part of the GA in more detail.

C. Encoding

Two key parts of a Strassen-style algorithm are the set of $P_i$'s and the combination of $P_i$'s that produces the $C_j$'s. Once an arbitrary set of $P_i$'s is given, it is not difficult to determine whether they correctly produce the $C_j$'s using matrix rank and Gaussian elimination. Thus the GA here focuses on discovering a set of $P_i$'s. In GAs, a solution is represented by a chromosome. Here, a chromosome is an $8 \times 7$ matrix; each of the seven columns holds the eight coefficients $\alpha_{i1}, \ldots, \alpha_{i4}, \beta_{i1}, \ldots, \beta_{i4}$. That is, the $i$th column corresponds to $P_i$. As mentioned, the number of all possible
248 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 14, NO. 2, APRIL 2010
solutions is $5.2566 \times 10^{18}$. Due to the huge solution space, a search algorithm can examine only a relatively tiny fraction of the space.

Fig. 1. Steady-state genetic algorithm.

D. Fitness

Our goal is to find a solution that satisfies (1). This means that each of $C_1$ through $C_4$ can be represented by a linear combination of $\{P_1, P_2, \ldots, P_7\}$. We define fitness as the number of $C_j$'s that can be represented by a linear combination of the $P_i$'s; thus, fitness is an integer value from 0 to 4. For example, if $C_2$ is a linear combination of the $P_i$'s and the others are not, then the fitness of this solution is 1. Therefore, in terms of fitness, our goal is to find a solution with fitness 4.

To evaluate a solution, we expand each $P_i$ into 16 terms as

\[
P_i = \sum_{j=1}^{16} \alpha_{i,\lceil j/4 \rceil}\, \beta_{i,(j-1) \bmod 4 + 1}\, A_{\lceil j/4 \rceil}\, B_{(j-1) \bmod 4 + 1}. \tag{2}
\]

Let $\alpha_{ij} \beta_{ik} = \gamma_{i,(j-1) \times 4 + k}$, $\Gamma = [\gamma_{ji}]$, and $\Delta_i = [\delta_{ij}]^T$. Then $P_i$ can be described as $P_i = \sum_{j=1}^{16} \gamma_{i,j}\, A_{\lceil j/4 \rceil}\, B_{(j-1) \bmod 4 + 1}$. For instance, the vector representation of $C_1 = A_1 B_1 + A_2 B_2$ becomes $C_1 = [1\ 0\ 0\ 0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^T$. The $i$th element of the vector represents the term $A_{\lceil i/4 \rceil} B_{(i-1) \bmod 4 + 1}$. From the equation $C_1 = \sum_{i=1}^{7} \delta_{1i} P_i$, we obtain $\Gamma \cdot \Delta_1 = C_1$. The equation $\Gamma \cdot \Delta_1 = C_1$ is represented by a matrix expression as follows:

\[
\begin{pmatrix}
\gamma_{1,1} & \gamma_{2,1} & \cdots & \gamma_{7,1} \\
\gamma_{1,2} & \gamma_{2,2} & \cdots & \gamma_{7,2} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_{1,16} & \gamma_{2,16} & \cdots & \gamma_{7,16}
\end{pmatrix}
\begin{pmatrix} \delta_{11} \\ \delta_{12} \\ \vdots \\ \delta_{17} \end{pmatrix}
=
(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)^T. \tag{3}
\]

We apply Gaussian elimination to the matrix $[\Gamma \;\; C_i]$. If the Gaussian elimination of the matrix $[\Gamma \;\; C_1]$ results in a matrix of the form

\[
\begin{pmatrix} I_{7 \times 7} & \delta_1 \\ O_{9 \times 7} & O_{9 \times 1} \end{pmatrix},
\]

then $C_1$ is a linear combination of the $P_i$'s. Similarly, we check $C_2$, $C_3$, and $C_4$ to see if they are linear combinations of the $P_i$'s.

E. Crossover

Crossover is a genetic operator for recombining two parent solutions into a new one. Each parent solution consists of seven vectors, $P_1$ through $P_7$; thus, there are 14 vectors in total. The offspring takes seven vectors out of the 14.

For $C_j = \sum_{i=1}^{7} \delta_{ji} P_i$, we define $G_j$ to be $\{P_i \mid \delta_{ji} \neq 0\}$. $G_j$ can be an empty set. Namely, $G_j$ is the set of $P_i$'s that contribute to the formation of $C_j$. For example, if $C_1 = P_1 + P_3 + P_4$, then $G_1 = \{P_1, P_3, P_4\}$, which implies that $C_1$ can be expressed with $P_1$, $P_3$, and $P_4$.

We choose one $G_j$ at random among the nonempty $G_j$'s of parent1 and parent2, and we add a $P_i$ that belongs to the chosen $G_j$ to the offspring, unless that $P_i$ already exists in the offspring. Then, we choose another $G_j$ and repeat the same process until the offspring has seven $P_i$'s. If no such $P_i$ remains before we have seven $P_i$'s, we randomly generate new $P_i$'s and add them to the offspring until the offspring has seven $P_i$'s.

F. Local Optimization

After a new offspring is generated by crossover, the GA improves the solution using a local optimization algorithm. Each solution has a position in the problem space. If we plot the fitness of each solution against its position, we can construct a fitness landscape for the problem. Some of the solutions are of high quality and play the role of attractors. When we have a moderate or low-quality solution, we can climb to one of the nearby attractors. Although this is not always beneficial, it saves considerable run-time in the GA. Considering the huge problem space and limited time budget, we cannot help but ignore most of the seemingly unpromising areas of the space. The local optimization consists of the following two parts.

1) Pursuing Linear Independence: If the $P_i$'s are linearly dependent, the solution contains at least one unnecessary $P_i$. Removing unnecessary $P_i$'s is the goal of the first local optimization. If the $P_i$'s are linearly independent, the rank of the matrix $\Gamma = [\gamma_{ji}]$ is 7. If the rank is lower than 7, we find the nonessential $P_i$ by means of Gaussian elimination, and change an $\alpha_{ij}$ or $\beta_{ij}$ of that $P_i$. We then compute the rank again and repeat this process until the rank of the matrix becomes 7.

2) Checking All the Cases: To improve the fitness of a solution, we check the fitness after replacing one of the $P_i$'s with each of the 1600 possible cases. First, we select a nonessential $P_i$ which does not belong to $G_1$, $G_2$, $G_3$, or $G_4$. If such a $P_i$ does not exist, an arbitrary $P_i$ is selected. We replace it with another among the 1600 cases and compute the fitness. If a better case is found, $P_i$ is replaced. Otherwise, $P_i$ remains unchanged.
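The fitness test of Section II-D and the exhaustive replacement step of Section II-F can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all names (`gamma`, `rank`, `fitness`, `half`, `CANDIDATES`, `improve_slot`, `STRASSEN`) are hypothetical; span membership is tested by comparing ranks rather than by inspecting the reduced form of the augmented matrix; the slot to replace is passed in by the caller rather than chosen through the sets $G_j$; and `STRASSEN` lists the textbook form of Strassen's seven products relabeled to the block numbering used here, which is not necessarily the paper's numbering of $P_1, \ldots, P_7$.

```python
from itertools import product

# Term A_j * B_k sits at 0-based index 4*j + k of a 16-vector.
# Targets follow [C1 C2; C3 C4] = [A1 A2; A3 A4][B1 B3; B2 B4].
TARGET_TERMS = [[(0, 0), (1, 1)],   # C1 = A1*B1 + A2*B2
                [(0, 2), (1, 3)],   # C2 = A1*B3 + A2*B4
                [(2, 0), (3, 1)],   # C3 = A3*B1 + A4*B2
                [(2, 2), (3, 3)]]   # C4 = A3*B3 + A4*B4
TARGETS = [[1 if (j, k) in terms else 0 for j in range(4) for k in range(4)]
           for terms in TARGET_TERMS]

def gamma(p):
    """Expand one product (alpha, beta) into its 16 coefficients."""
    alpha, beta = p
    return [a * b for a in alpha for b in beta]

def rank(rows):
    """Rank via fraction-free Gaussian elimination on integer rows."""
    m = [list(r) for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f, g = m[r][c], m[i][c]
                m[i] = [f * x - g * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def fitness(solution):
    """Number of targets C1..C4 lying in the span of the seven products."""
    rows = [gamma(p) for p in solution]
    base = rank(rows)
    return sum(rank(rows + [t]) == base for t in TARGETS)

def half(n=4):
    """Nonzero {-1,0,1} vectors up to sign: first nonzero entry is +1."""
    return [v for v in product((-1, 0, 1), repeat=n)
            if any(v) and next(x for x in v if x) == 1]

CANDIDATES = [(a, b) for a in half() for b in half()]  # 40 * 40 products

def improve_slot(solution, slot):
    """Try every candidate in one slot; keep the best fitness found."""
    best, best_fit = list(solution), fitness(solution)
    for cand in CANDIDATES:
        trial = list(solution)
        trial[slot] = cand
        f = fitness(trial)
        if f > best_fit:
            best, best_fit = trial, f
            if f == 4:
                break
    return best, best_fit

# Textbook Strassen products; coefficients over (A1..A4) and (B1..B4).
STRASSEN = [((1, 0, 0, 1), (1, 0, 0, 1)), ((0, 0, 1, 1), (1, 0, 0, 0)),
            ((1, 0, 0, 0), (0, 0, 1, -1)), ((0, 0, 0, 1), (-1, 1, 0, 0)),
            ((1, 1, 0, 0), (0, 0, 0, 1)), ((-1, 0, 1, 0), (1, 0, 1, 0)),
            ((0, 1, 0, -1), (0, 1, 0, 1))]
strassen_fitness = fitness(STRASSEN)
```

Overwriting one product of `STRASSEN` with a different candidate and then calling `improve_slot` on that position shows the replacement step restoring a damaged solution. The rank comparison also reports span membership when the seven products are linearly dependent, a case the paper handles separately in its first local optimization.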
TABLE I
Number of Solutions

Group Index   Solutions Found   Distinct Solutions   Lower Bound of the Number of Solutions
Group 1       648               32                   32
Group 2       891               128                  128
Group 3       522               32                   32
Group 4       61                35                   64
Group 5       30                19                   64
Group 6       3                 3                    32
Group 7       33                30                   128
Group 8       149               57                   64
Group 9       46                36                   64
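The column totals of Table I can be checked with a few lines of arithmetic. The sketch below simply transcribes the table (the name `TABLE_I` is hypothetical) and sums each column: the totals are 2383 solutions found, 372 distinct solutions, and a lower bound of 608, the last matching the 608 algorithms reported in the abstract.

```python
# Table I, transcribed as group -> (found, distinct, lower bound).
TABLE_I = {
    1: (648, 32, 32),
    2: (891, 128, 128),
    3: (522, 32, 32),
    4: (61, 35, 64),
    5: (30, 19, 64),
    6: (3, 3, 32),
    7: (33, 30, 128),
    8: (149, 57, 64),
    9: (46, 36, 64),
}

# Sum each column across the nine groups.
found, distinct, lower_bound = (sum(col) for col in zip(*TABLE_I.values()))

# In every group the distinct solutions found stay within the lower bound.
within_bound = all(d <= lb for _, d, lb in TABLE_I.values())
```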
TABLE II
Representative Solutions in Each Group
III. Experimental Results

In our GA, we set the population size to 15. The GA stops when a valid solution appears or 1000 iterations have passed. Many more solutions were found than we expected. So far, we have discovered 2383 solutions that have the same computational complexity as Strassen's. After removing duplicates, 372 distinct solutions remained. Among them, 32 turned out to be similar to Strassen's in the way they are constructed. We divided the solutions into nine groups based on the numbers of nonzero entries $\alpha_{ij}$ and $\beta_{ij}$ in the $P_i$'s. For example, group 1 has [3 3 3 3 4 4 4] nonzero elements. It is notable that groups 3, 4, 5, 6, and 7 include 0.5 among the coefficients of the $P_i$'s, which we did not expect to encounter. (Although these coefficients introduce some scalar multiplications in the $C_i$'s, they do not affect the asymptotic complexity of the algorithm.)

Since there is no special order among the $C_i$'s, we can make different solutions with the same structure by using the symmetry of the $C_i$'s and changing the signs of the $a_i$'s and $b_i$'s in the $P_i$'s [10]. Using