Solving Sudoku Using Particle Swarm Optimization On CUDA
Abstract—Sudoku is a popular puzzle utilizing 81 squares in a 9x9 grid consisting of nine 3x3 boxes. The digits 1-9 can each appear only once in a given row, column, or box. This paper describes the implementation of Particle Swarm Optimization (PSO) to solve sudoku puzzles using GPU processing. This PSO uses our open-source PSO framework that takes advantage of CUDA-enabled GPUs. Although each row contains nine digits, a permutation of nine digits can be represented as eight "picks". To find a solution, each of the nine rows was treated as a permutation. This reduced the problem dimensionality from 81 to 72. With suitable parameters the algorithm was able to solve multiple sudoku puzzles. This paper describes the implementation of the algorithm, the fitness function used, and the effects of varying the PSO parameters. The original PSO framework and the Sudoku code described in this paper are available online.

I. INTRODUCTION

An open source framework for implementing Multi-swarm Particle Swarm Optimization on GPUs using CUDA was developed and discussed in [1]. The framework was previously demonstrated on a problem to optimize the parameters of a PID controller. This was a relatively well-behaved problem in a three-dimensional space. It was shown that by utilizing GPU processing, the problem could be solved many times faster than was possible using the CPU.

In the previous paper it was claimed that the framework could easily be extended to other problems and higher-dimensional spaces. In this paper we utilize the framework to find solutions to Sudoku puzzles. These represent a class of problems in a very high-dimensional space. Furthermore, the space has a large number of local minima that occur at large distances from the global minimum.

We believe that the solution of Sudoku puzzles represents a very significant optimization problem. In this paper we show 1) that the open source Multi-Swarm Particle Swarm Optimization Framework that we have previously developed is indeed general enough to extend to this problem, 2) that the dimensionality of the problem can be reduced using permutations, 3) that the use of GPUs is crucial for reducing run times from multiple days to hours, 4) that PSO can indeed solve Sudoku puzzles, and 5) that the choice of PSO parameters has a huge impact on the time to find a solution.

II. SUDOKU

Sudoku is a logic puzzle that has been extremely popular in the US since about 2005 and is based around using rules and logic to determine where numbers belong. This section describes the basics of the sudoku puzzle as well as its origin.

A. Origin

Sudoku is often thought to be of Japanese origin, but this is not actually true [2]. Modern Sudoku first appeared in American papers in the 1980s, but did not become popular in the US until 2005. Although the puzzle we know today only dates back around 30 years, puzzles with very similar solutions originated in late 19th-century France [3].

B. Rules

The rules of sudoku are to fill a nine-by-nine grid, made up of nine three-by-three sub-boxes, such that only one instance of each of the digits one through nine is contained in each column, row, or box. This means that in any given column or row there should be no repeated numbers and no numbers other than one through nine.

C. Related Work

This was not the first time that Sudoku had collided with biologically inspired algorithms. Sudoku puzzles have been solved by both GA and GPSO in the past. This, however, is the first time using a more generic PSO and adjusting the fitness function accordingly.

Sudoku puzzles were solved by Mantere and Koljonen in 2007 [2]. In the same year, Sudoku puzzles were also solved by Moraglio et al. [4], who introduced combinatorial-based PSO algorithms (GPSO); the spaces they searched were similar to the pick space used in this paper. Sudoku puzzles were confirmed to be solvable using GPSO by Jilg and Carter in 2009 [5].
III. PARTICLE SWARM OPTIMIZATION

PSO was originally developed by Eberhart and Kennedy [6] in 1995. It is heavily biologically inspired and mimics behaviors that can often be seen in various types of flocking or swarming animals. It is not a complex algorithm and performs very well in continuous spaces that do not have analytical solutions. PSO is an iterative algorithm in that it moves each particle, calculates the fitness of its location, and then repeats the process.

A. Inspiration

Particle Swarm Optimization was inspired by animals in nature. Many different types of animals travel in groups, and often they benefit from the knowledge of each other. In a school of fish, each individual can benefit from the others' knowledge regarding food and predators. Flocks of birds can cover larger areas by spreading out and informing the flock of any food found. PSO mimics this in that each particle has knowledge about the fitness (or happiness) of itself and of other particles in the swarm, and tends to move toward regions of better fitness.

B. Algorithm

As mentioned previously, particle swarm optimization is based on having many particles, or candidate solutions, moving around an error/solution space. This implementation considers the best particle to be the one with the lowest fitness value; however, it would work just as well looking for particles with larger fitness values.

Each of the particles tries to move to a solution that is better than its current one. To do this, a particle moves in the direction of the best location found by the swarm and the best location it has found so far. The particle applies a random weighting to each, so it will be heading towards both to some degree. The following equation describes the movement towards the particle's best location as well as the swarm's best particle's location.

    mov = rand() * p_weight * (p_best - p) + rand() * s_weight * (s_best - p)    (1)

Where mov is a vector in parameter space representing a movement, p_weight and s_weight are scalars chosen by the programmer, p_best is a position in parameter space representing the location where the particle has been the happiest, s_best is a position in parameter space representing the location of the happiest particle in the swarm, and p is the current particle location. rand() is a randomly selected floating point number between 0 and 1.

Since mov is a vector in parameter space, it generalizes to any number of dimensions depending on the parameter space. In the case of no momentum this vector is calculated and then added to the location each iteration.

The algorithm often performs better when each particle's velocity has some momentum associated with it [7]. In the framework used, the velocity was implemented with momentum, such that the new movement is composed of 90% of the old movement plus the new movement calculated above. This helps prevent particles from getting stuck in any single location, such as local minima. A velocity for the particle is calculated and saved each iteration using the following equation.

    vel_new = 0.9 * vel_previous + mov    (2)

Where vel_new is a vector representing the velocity, or the amount that the particle will move this iteration, vel_previous is the velocity from the previous iteration, and mov is the vector calculated above in equation 1. Once the velocity is calculated it is added to the particle's location every iteration.
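As an illustration, the per-particle update described by equations 1 and 2 can be sketched in C as follows. This is a minimal sketch, not the framework's actual kernel: the rand01() helper stands in for whatever per-thread random number generator the framework uses on the device, and drawing rand() once per dimension (rather than once per move) is an assumption made here.

    #include <stdlib.h>

    /* Uniform random float in [0, 1]; the CUDA framework would use a
       per-thread generator on the device instead. */
    static float rand01(void)
    {
        return (float)rand() / (float)RAND_MAX;
    }

    /* One update step for a single particle, following equations 1 and 2. */
    static void update_particle(float *p, float *vel,
                                const float *p_best, const float *s_best,
                                int dim, float p_weight, float s_weight)
    {
        for (int d = 0; d < dim; d++) {
            /* Equation 1: random weighted pull toward the personal best
               and the swarm best (drawn per dimension in this sketch). */
            float mov = rand01() * p_weight * (p_best[d] - p[d])
                      + rand01() * s_weight * (s_best[d] - p[d]);
            /* Equation 2: the new velocity keeps 90% of the old velocity. */
            vel[d] = 0.9f * vel[d] + mov;
            /* The velocity is added to the particle's location every iteration. */
            p[d] += vel[d];
        }
    }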
C. Related Work

Our PSO framework was not the first one to reach the GPU scene; however, it was the first with the goal of making the PSO framework more accessible. Zhou and Tan 2010 [8] implemented a PSO algorithm using GPUs. Rather than using multiple swarms, their implementation used a triggered mutation system on top of a standard particle swarm optimization. They were able to achieve a speedup of 25x with this system.

An asynchronous implementation of PSO was created by Mussi et al. 2011 [9]. This allowed each particle to run iterations at its own rate (which was very fast). However, this was limited by the fact that only one particle was allowed per block, and the maximum number of blocks that can run in parallel limits the swarm size.

Vanneschi et al. 2010 [10] tested a multi-swarm system where the best particles from one swarm were passed to the next swarm to replace the worst particles; this was done in a ring setup of several swarms. They exchanged this information every 10 steps. This was implemented again by Solomon et al. 2011 [11].

IV. PARALLEL PARTICLE SWARM OPTIMIZATION FRAMEWORK

The Parallel Particle Swarm Optimization Framework was an implementation of the Particle Swarm Optimization algorithm that would use CUDA and be flexible enough to handle a number of different problems. The framework was designed so that a programmer with knowledge of C, but minimal knowledge of CUDA, could modify the program to solve a large range of problems. In this case the framework was modified to solve Sudoku puzzles.

A. Multi-Swarm

Since each swarm is at some point in time entirely held in CUDA shared memory, the size of each swarm is limited. The following equation defines how many particles can be in a single swarm based on the dimensionality of the problem [1].

    Max#ofPart = 16384 / (8 + 12 * DIM)    (3)
Since in our problem each particle will be moving within a 72-dimensional space, 72 can be substituted, giving the actual maximum number of particles.

    Max#ofPart = 18    (4)

Because of this limitation the framework allows for multiple swarms to be run in parallel. To allow the swarms to cooperate in some way, particles are swapped every 1000 steps. Each time a swap occurs there is a 1% chance that any 2 particles will be swapped.
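A host-side sketch of one plausible reading of this exchange rule is shown below; the flat layout of the particle state and the ring pairing of swarms are assumptions made for illustration and are not taken from the framework.

    #include <stdlib.h>

    /* Periodic exchange of particles between swarms. The particle state is
       assumed to be a flat float array, floats_per_particle values per
       particle, laid out swarm by swarm. */
    static void exchange_particles(float *state, int num_swarms, int per_swarm,
                                   int floats_per_particle, int step)
    {
        if (step % 1000 != 0)
            return;                              /* exchanges occur every 1000 steps */
        for (int s = 0; s < num_swarms; s++) {
            int next = (s + 1) % num_swarms;     /* neighboring swarm in the ring */
            for (int i = 0; i < per_swarm; i++) {
                if ((double)rand() / RAND_MAX >= 0.01)
                    continue;                    /* 1% chance for each particle slot */
                float *a = state + (size_t)(s    * per_swarm + i) * floats_per_particle;
                float *b = state + (size_t)(next * per_swarm + i) * floats_per_particle;
                for (int k = 0; k < floats_per_particle; k++) {
                    float tmp = a[k];            /* swap the two particles' state */
                    a[k] = b[k];
                    b[k] = tmp;
                }
            }
        }
    }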
B. Modifications to Code

The only programming required by the framework is generally modifying PSO parameters, such as weights, as well as rewriting the fitness function. The first problem tested was tuning of PID controller parameters. While this problem was good for its computational intensity, it had a very small memory footprint. When using this framework to implement a sudoku solver, a bit more memory was required.

The added dimensionality of this problem is still held within each particle's individual location; however, the information specifying the constraints of the puzzle is common to all particles. To make this quickly accessible to all threads it was stored in constant memory on the GPU. This memory is cached for all threads to make it easily accessible. Both the "pick" space and the solution space constraints were stored in this constant memory, which will be described in section IV-G.
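A minimal CUDA sketch of this arrangement might look as follows; the symbol and function names are illustrative rather than the framework's actual declarations.

    #include <cuda_runtime.h>

    /* The puzzle givens are identical for every particle, so they are held
       in constant memory, which is cached and broadcast efficiently to all
       threads. */
    __constant__ int puzzle_givens[81];   /* 0 = empty cell, 1-9 = fixed digit */

    void upload_puzzle(const int host_givens[81])
    {
        /* Copied once from the host before the swarms start iterating;
           device code then reads puzzle_givens directly. */
        cudaMemcpyToSymbol(puzzle_givens, host_givens, 81 * sizeof(int));
    }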
C. Sudoku Fitness Development

For the PSO framework being used, a fitness function had to be developed. This fitness function needed to be a function that could determine whether one solution was better than another and by how much. It specifies how good a solution is by filling in a floating point value inside the particle structure when the fitness calculation function is called. A better solution is interpreted to be one that has a lower fitness value than another.
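A minimal sketch of what such a particle structure might look like is given below; the field names and layout are illustrative only and are not the framework's actual definitions.

    /* Conceptually, each particle carries its position and velocity in the
       72-dimensional pick space, its personal best, and the fitness value
       filled in by the fitness calculation function. */
    #define DIM 72

    struct particle {
        float position[DIM];        /* current location in pick space */
        float velocity[DIM];        /* current velocity */
        float best_position[DIM];   /* best location this particle has found */
        float best_fitness;         /* fitness at that personal best */
        float fitness;              /* set by the fitness function; lower is better */
    };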
D. Pick Space

It was mentioned previously that a 72-dimensional space was being used to solve the Sudokus. This Sudoku solver uses a row-based solution system, where each row is viewed as a permutation. Each row can contain the digits one through nine in any order, but each digit can only occur once.

This can also be interpreted as a sequence of selections from an ordered set of the numbers one through nine. It would start with an ordered set and an empty row, as shown in figure 1.

Available: [1 2 3 4 5 6 7 8 9]
Row: [ ]

Fig. 1. No Pick Complete

Next there is a set of "picks" that select the order in which the numbers will appear in the row. If the first "pick" is assumed to be 5, the 5 would be moved from the available set to the row, as shown in figure 2.

Available: [1 2 3 4 6 7 8 9]
Row: [5]

Fig. 2. One Pick Complete

If the next pick is also 5, then the 6 is now moved from the available set to the row set, as shown in figure 3.

Available: [1 2 3 4 7 8 9]
Row: [5 6]

Fig. 3. Two Picks Complete

This process continues until the row set is filled and the available set is empty. This will only take 8 picks, as once the last pick is reached there is only one number available.

It can be observed that in this "pick" space the size of each dimension decreases with each pick, starting with 9 and decreasing down to the known last pick, which has a size of 1. In this implementation, if a pick was outside the space then the nearest number would be chosen: one if it is too low, and the maximum of the dimension if it is too high. When these picks are extended to be 8 dimensions for each of the 9 rows, the problem has a 72-dimensional search space.

This "pick" space turned out to be a good intuitive space for searching Sudokus because a movement in any dimension by one unit will create a swap of two numbers. The goal of the fitness function is to make it so puzzles similar to the solution will be only a series of swaps away from the actual solution.
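Under the assumptions above (picks are clamped to the shrinking range and the final digit is forced), decoding one row from its eight picks can be sketched as follows; the function is illustrative rather than the solver's actual code, and it assumes the particle's coordinates have already been rounded to integers.

    /* Decodes one row of the candidate solution from its eight picks. */
    static void decode_row(const int picks[8], int row[9])
    {
        int available[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        int remaining = 9;

        for (int i = 0; i < 8; i++) {
            int pick = picks[i];
            if (pick < 1)
                pick = 1;                 /* too low: clamp to the first entry */
            if (pick > remaining)
                pick = remaining;         /* too high: clamp to the last entry */
            row[i] = available[pick - 1]; /* move the chosen digit into the row */
            for (int j = pick - 1; j < remaining - 1; j++)
                available[j] = available[j + 1];   /* close the gap in the available set */
            remaining--;
        }
        row[8] = available[0];            /* only one digit is left for the last cell */
    }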
E. Fitness Function Basis

The first implementation was the most intuitive implementation of the fitness function that could be found. The first system was very generic in that location in the puzzle did not affect fitness at all. It would first generate the candidate solution based on the location of the particle, then it would overwrite any constrained cell in the puzzle. Once this was done it would have an array representing an attempted solution to the puzzle, as shown below in figure IV-E. The red numbers show constraints that have taken the place of whatever value was chosen based on particle location.

To calculate the fitness, the duplicates were counted. Since any number should occur only once in each row, column, or box, any duplicate can be perceived as a problem in the puzzle, and therefore the more duplicates the less fit a solution is. So every column, row, and box is scanned for duplicates and the fitness is set to the sum of the duplicates found.

In addition, picks could fall outside the range of the pick space. Because of this the fitness was increased for each dimension that was outside the bounds for that given pick.
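A compact sketch of this duplicate-counting fitness is shown below. It assumes a candidate grid decoded as in section IV-D and an array of puzzle givens such as the one kept in constant memory (section IV-B); reading "the sum" as one point per appearance beyond the first is an interpretation, and the framework's actual kernel may be organized differently.

    /* Overwrites the constrained cells, then counts duplicates in every row,
       column, and 3x3 box; the total is the fitness, so lower is better and
       a fitness of 0 corresponds to a solved puzzle. */
    static int sudoku_fitness(int grid[9][9], const int givens[81])
    {
        /* Constrained cells always keep their given value. */
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                if (givens[r * 9 + c] != 0)
                    grid[r][c] = givens[r * 9 + c];

        int fitness = 0;
        for (int i = 0; i < 9; i++) {
            int row_count[10] = {0}, col_count[10] = {0}, box_count[10] = {0};
            for (int j = 0; j < 9; j++) {
                row_count[grid[i][j]]++;                                     /* row i    */
                col_count[grid[j][i]]++;                                     /* column i */
                box_count[grid[(i / 3) * 3 + j / 3][(i % 3) * 3 + j % 3]]++; /* box i    */
            }
            for (int v = 1; v <= 9; v++) {
                /* every appearance beyond the first counts as one duplicate */
                if (row_count[v] > 1) fitness += row_count[v] - 1;
                if (col_count[v] > 1) fitness += col_count[v] - 1;
                if (box_count[v] > 1) fitness += box_count[v] - 1;
            }
        }
        return fitness;
    }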
Running on the computer's CPU rather than the GPU took slightly over 24 hours, making the speedup approximately 48. It is noteworthy that the runs described in this paper utilized the GPU processors of a GTX 260 for approximately a week of total run time. These same runs would have taken approximately a year to run on a conventional computer.

[Figure: Ending Fitness vs. Momentum Weight (0.8 to 1.0).]
A. Solved puzzles

The first puzzle solved was the puzzle shown in IV-G. It also solved the following puzzle in figure V-A.

The local best weight and the swarm best weight were varied from 0.85 to 1.15 and from 0.05 to 0.15, respectively. Through a series of previous runs it was determined that these were the areas in which the algorithm could be successful. These parameters proved to be much less sensitive than the momentum. The low fitness range of the results suggests that, given enough time, it is possible that any of these sets of parameters could solve the puzzle. Figure V-B below shows the final fitness versus the local best weight and the swarm best weight.
[Figure V-B: Final Fitness vs. Local Best Weight (0.85 to 1.1) and Swarm Best Weight (0.05 to 0.15).]