EasyChair Preprint 3149
ABSTRACT
INTRODUCTION
Is it possible for a system to improve itself, for example for a program to rewrite its
own source code to learn faster, or to store more knowledge in a fixed space, without
being given any information except its own source code? This is a different problem from
learning, in which a program gets better at achieving goals as it receives input. One
example of a self-improving program would be a program that gets better at playing
chess by playing games against itself. Another example would be a program with the
goal of finding large prime numbers within t steps, given t. Such a program might improve
itself by varying its source code and testing whether the changes find larger primes for
various t.
Is it possible for a computer program to write its own programs? While this kind of idea
seems far-fetched, it may actually be closer than we think. Researchers conducted an
experiment to produce an AI program capable of developing its own programs, using a
genetic algorithm implementation with self-modifying and self-improving code. In other
words, an artificial intelligence is programmed in an attempt to write a functioning
program that can, itself, write programs.
METHODOLOGY
Biological evolution, in effect, modifies its own source code: it manipulates DNA from
generation to generation. A genetic algorithm is a type of artificial intelligence, modeled
after biological evolution, that begins with no knowledge of the subject aside from the
available tools and valid instructions. The AI picks a series of instructions at random (to
serve as a piece of DNA) and checks the fitness of the result. It does this with a large
population size of, say, 100 programs. Inevitably, some of the programs are better than
others. Those that have the best fitness are mated together to produce offspring. Each
generation gets a bit of extra diversity from evolutionary techniques such as roulette
selection, crossover, and mutation. The process is repeated with each child generation,
hopefully producing better and better results, until a target solution is found. Genetic
algorithms are programmatic implementations of survival of the fittest. They can also be
classified as artificially intelligent search algorithms, in that they search an
immense problem space for a specific solution.
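The loop described above can be sketched in Python. The task here (matching a target bit string) and all parameter values are illustrative assumptions standing in for "a series of instructions" whose fitness is checked each generation; they are not taken from the experiment itself.

```python
import random

TARGET = [1] * 20        # stand-in goal: evolve a genome matching this bit string
POP_SIZE = 100
MUTATION_RATE = 0.01

def fitness(genome):
    # Number of positions that match the target ("survival of the fittest" score).
    return sum(g == t for g, t in zip(genome, TARGET))

def roulette_select(population, fits):
    # Roulette-wheel selection: pick a parent with probability proportional to fitness.
    pick = random.uniform(0, sum(fits))
    acc = 0.0
    for genome, f in zip(population, fits):
        acc += f
        if acc >= pick:
            return genome
    return population[-1]

def crossover(a, b):
    # Single-point crossover of two parent genomes.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome):
    # Flip each bit with a small probability for extra diversity.
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

random.seed(0)
population = [[random.randint(0, 1) for _ in TARGET] for _ in range(POP_SIZE)]
for generation in range(200):
    fits = [fitness(g) for g in population]
    if max(fits) == len(TARGET):
        break  # target solution found
    population = [mutate(crossover(roulette_select(population, fits),
                                   roulette_select(population, fits)))
                  for _ in range(POP_SIZE)]
best = max(population, key=fitness)
```

Each generation replaces the population with offspring of roulette-selected parents, so fitness tends to rise until the target is reached.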
ARCHITECTURE
OPTIMAL PROGRAM FOLLOWING RSI
Definition: P1 is a recursively self-improving (RSI) program with respect to G if and only
if Pi() = Pi+1 (that is, running Pi outputs Pi+1) for all i > 0, and the sequence Pi,
i = 1, 2, 3, ... is an improving sequence with respect to G.
Definition (RSI system). Given a finite set of programs P and a score function S over P,
initialize p ∈ P to be the system's current program. Repeat until a stopping criterion is
satisfied: generate p′ ∈ P using p; if p′ is better than p according to S, replace p by p′.
Given this definition, one needs to decide how p ∈ P generates a program. In general,
we could allow the RSI system to generate programs based on the history of the entire
process. Here, however, the way a program generates a new program is independent of
that history: each program defines a fixed probability distribution over P. This procedure
therefore defines a homogeneous Markov chain. We will see that even with this
restriction, with a suitable score function, the model is able to achieve desirable
performance.
Fig. 1: The Markov chain corresponding to the RSI procedure defined by given scores
and program generation probabilities.
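The RSI procedure just defined can be sketched as follows. The programs here are simply integers, and the score function and generation distribution are illustrative stand-ins; lower score is better, with 0 marking the optimal program (matching the later construction, where the score equals the expected number of steps to reach the optimum).

```python
import random

N = 8  # |P|: programs are labelled 0..N-1, with 0 the optimal program (assumed)

def score(p):
    # Stand-in score function: lower is better, 0 is optimal.
    return p

def generate(p, rng):
    # Each program defines a fixed distribution over P (here: uniform),
    # independent of history, so the process is a homogeneous Markov chain.
    return rng.randrange(N)

def rsi(p, rng, max_steps=10_000):
    # Repeat until the optimal program is reached (the stopping criterion):
    # generate p' using p; replace p only if p' is better according to the score.
    steps = 0
    while score(p) > 0 and steps < max_steps:
        q = generate(p, rng)
        if score(q) < score(p):
            p = q
        steps += 1
    return p, steps

rng = random.Random(42)
p_final, steps = rsi(N - 1, rng)
```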
A reasonable utility measure is the expected number of steps to find the optimal
program, starting from a given program and following our RSI definition. Furthermore,
the score function needs to be consistent with the expected number of steps from each
program to the optimal program under the process it defines. We say a score function
S is consistent if, for all p, p′ ∈ P, S(p) > S(p′) implies that the expected number of
steps to reach the optimal program from p is greater than from p′. More generally, if one
takes some measure of a program's ability to generate future programs, the score
function needs to be consistent with that measure.
Two nice properties hold for this construction. First, the programs are added in
non-decreasing order of scores. Second, the score function equals the expected number
of steps to reach the optimal program under the process it defines. We will prove the
first property; the second property and the consistency of the score function follow
directly from the first. We now describe an example of how such a score function is
computed, given each program's distribution for generating programs and the optimal
program. Consider the same abstraction of programs as in the example above, where
P = {p1, p2, p3, p4} with corresponding probabilistic weights
w1 = [0.97, 0.01, 0.01, 0.01], w2 = [0.75, 0, 0.25, 0], w3 = [0.25, 0.25, 0.25, 0.25],
w4 = [0, 0.58, 0, 0.42]. Fix p1 to be the optimal program. Initially set S(p1) = 0 and
S(pi) = ∞ for i = 2, 3, 4. Since a proposed program is accepted only if its current score
is strictly smaller, the transition matrix of the initial Markov chain (rows: current
program p1..p4; columns: next program) is

p1:  1.00  0     0     0
p2:  0.75  0.25  0     0
p3:  0.25  0     0.75  0
p4:  0     0     0     1.00
At the first step, the expected numbers of steps from p2, p3, p4 following the current
Markov chain are 4/3, 4, and ∞, respectively. Hence we update S(p2) = 4/3. Because of
this change of score, the transition matrix of the Markov chain becomes

p1:  1.00  0     0     0
p2:  0.75  0.25  0     0
p3:  0.25  0.25  0.50  0
p4:  0     0.58  0     0.42
Then we compute the expected number of steps from p3 and p4 following the updated
Markov chain. By some arithmetic, the expectations are 8/3 for p3 and approximately
3.057 for p4. Since 8/3 < 3.057, we update S(p3) = 8/3. By a similar procedure, one can
compute the score S(p4).
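The score construction above can be reproduced numerically. The sketch below fixes one score per round: it computes the expected number of steps to reach p1 for every still-unscored program under the current chain (a proposal is accepted only if its current score is strictly smaller), then assigns the smallest expectation as that program's score.

```python
import math

# Generation weights w1..w4 from the example; p1 (index 0) is optimal.
W = [
    [0.97, 0.01, 0.01, 0.01],  # w1
    [0.75, 0.00, 0.25, 0.00],  # w2
    [0.25, 0.25, 0.25, 0.25],  # w3
    [0.00, 0.58, 0.00, 0.42],  # w4
]
S = [0.0, math.inf, math.inf, math.inf]  # S(p1) = 0, others start at infinity

def expected_steps(p):
    # Moves are accepted only toward strictly smaller scores, so:
    # E[p] = (1 + sum over accepted q of W[p][q] * S[q]) / (acceptance probability)
    accept = [q for q in range(4) if S[q] < S[p]]
    a = sum(W[p][q] for q in accept)
    if a == 0:
        return math.inf  # p1 is unreachable from p under the current chain
    return (1 + sum(W[p][q] * S[q] for q in accept)) / a

for _ in range(3):  # three rounds fix S(p2), S(p3), S(p4) in turn
    unscored = [p for p in range(1, 4) if math.isinf(S[p])]
    est = {p: expected_steps(p) for p in unscored}
    best = min(est, key=est.get)
    S[best] = est[best]
```

Running this yields S(p2) = 4/3, S(p3) = 8/3, and S(p4) ≈ 3.057, matching the arithmetic in the text.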
Genetic algorithms maintain collections of solutions that are combined with one another
to make new solutions, eventually returning the best solution found. Since optimization
and intelligence are deeply linked, a genetic algorithm can be used as a numerical
optimization technique to improve the performance of machine learning or AI
algorithms.
In this paper, we use the genetic algorithm (GA) to optimize the ANN network weights,
addressing the problem of very low accuracy that arises because no backward pass is
used to update the network weights.
Looking at the above figure, the parameters of the network are kept in matrix form
because this makes the ANN calculations much easier. For each layer, there is an
associated weights matrix; multiplying the inputs matrix by a given layer's parameters
matrix returns that layer's outputs. Chromosomes in a GA, however, are 1D vectors, so
we have to convert the weights matrices into 1D vectors.
Because matrix multiplication is a good way to work with an ANN, we still represent the
ANN parameters in matrix form when using the ANN. Thus, matrix form is used when
working with the ANN and vector form when working with the GA, which means we
need to convert from matrix to vector and vice versa. The next figure summarizes the
steps of using the GA with the ANN.
Weights Matrices to 1D Vector
Each solution in the population has two representations: a 1D vector for working with
the GA, and matrices for working with the ANN. Because there are 3 weights matrices
for the 3 layers (2 hidden + 1 output), there are 3 vectors, one for each matrix. Since a
GA solution is represented as a single 1D vector, these 3 individual 1D vectors are
concatenated into a single 1D vector, so each solution is represented as a vector of
length 24,540.
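The conversion described above could look like the sketch below. The layer shapes here are illustrative assumptions, not the exact architecture from the paper (which yields a 24,540-element vector).

```python
import numpy as np

rng = np.random.default_rng(0)
shapes = [(100, 60), (60, 40), (40, 4)]   # assumed 2 hidden + 1 output layer
weights = [rng.standard_normal(s) for s in shapes]

def mats_to_vector(mats):
    # Flatten each per-layer weights matrix and concatenate into one chromosome.
    return np.concatenate([m.ravel() for m in mats])

def vector_to_mats(vec, shapes):
    # Split the chromosome back into per-layer matrices for the ANN.
    mats, start = [], 0
    for rows, cols in shapes:
        size = rows * cols
        mats.append(vec[start:start + size].reshape(rows, cols))
        start += size
    return mats

chromosome = mats_to_vector(weights)      # 1D vector form, used by the GA
restored = vector_to_mats(chromosome, shapes)  # matrix form, used by the ANN
```

Round-tripping through both functions recovers the original matrices exactly, so the two representations stay interchangeable throughout the GA run.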
Implementing GA Steps
After converting all solutions from matrices to vectors and concatenating them, we are
ready to go through the GA steps, which are presented in the figure above and
summarized in the next figure.
Remember that the GA uses a fitness function to return a fitness value for each solution;
the higher the fitness value, the better the solution. The best solutions are returned as
parents in the parent-selection step.
One of the common fitness functions for a classifier such as an ANN is accuracy: the
ratio between the number of correctly classified samples and the total number of
samples, i.e. accuracy = (correctly classified samples) / (total samples). The
classification accuracy of each solution is calculated according to the steps in the above
figure.
The single 1D vector of each solution is converted back into 3 matrices, one matrix for
each layer (2 hidden and 1 output).
The matrices returned for each solution are used to predict the class label of each
sample in the dataset in order to calculate the accuracy. This is done using two
functions. The first function accepts the weights of a single solution, the inputs and
outputs of the training data, and an optional parameter specifying which activation
function to use; it returns the accuracy of just that one solution, not of all solutions in the
population. In order to return the fitness value (i.e. accuracy) of all solutions in the
population, the second function loops through the solutions, passes each one to the first
function, stores the accuracies in an array, and finally returns that array.
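The two functions described above can be sketched as follows. The layer shapes, activation, and placeholder data are assumptions for illustration; the forward pass simply multiplies the inputs by each layer's weights matrix, as described earlier.

```python
import numpy as np

rng = np.random.default_rng(1)
shapes = [(10, 8), (8, 6), (6, 3)]   # assumed 2 hidden + 1 output layer
X = rng.standard_normal((50, 10))    # placeholder training inputs
y = rng.integers(0, 3, size=50)      # placeholder class labels

def vector_to_mats(vec, shapes):
    # Convert a solution's single 1D vector back into per-layer matrices.
    mats, start = [], 0
    for r, c in shapes:
        mats.append(vec[start:start + r * c].reshape(r, c))
        start += r * c
    return mats

def solution_fitness(chromosome, X, y, activation="sigmoid"):
    # First function: accuracy of one solution on the given data.
    a = X
    for W in vector_to_mats(chromosome, shapes):
        z = a @ W  # multiply inputs by the layer's weights matrix
        a = 1.0 / (1.0 + np.exp(-z)) if activation == "sigmoid" else np.maximum(z, 0)
    preds = a.argmax(axis=1)
    return (preds == y).mean()  # accuracy = correct / total

def population_fitness(population, X, y):
    # Second function: loop over all solutions and return their accuracies.
    return np.array([solution_fitness(c, X, y) for c in population])

n_genes = sum(r * c for r, c in shapes)
population = rng.standard_normal((8, n_genes))
fitness = population_fitness(population, X, y)
```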
After calculating the fitness value (i.e. accuracy) of every solution, the remaining GA
steps shown in the above figure are applied. The best parents are selected into the
mating pool based on their accuracy. Crossover and mutation are then applied in order
to produce the offspring, and the population of the new generation is created from both
offspring and parents. These steps are repeated for a number of generations. We can
also try different values for the GA parameters, such as the number of solutions per
population, the number of selected parents, the mutation percentage, and the number of
generations.
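These remaining steps can be sketched for real-valued weight chromosomes as follows. The crossover point, mutation percentage, and population sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def select_parents(population, fitness, num_parents):
    # Keep the highest-accuracy solutions as the mating pool.
    best = np.argsort(fitness)[::-1][:num_parents]
    return population[best]

def crossover(parents, num_offspring):
    # Single-point crossover: each child takes the first half of one parent's
    # genes and the second half of the next parent's.
    n_genes = parents.shape[1]
    point = n_genes // 2
    offspring = np.empty((num_offspring, n_genes))
    for k in range(num_offspring):
        p1 = parents[k % len(parents)]
        p2 = parents[(k + 1) % len(parents)]
        offspring[k] = np.concatenate([p1[:point], p2[point:]])
    return offspring

def mutate(offspring, percent=10, scale=1.0):
    # Perturb a random `percent` of genes in each child.
    n_mut = max(1, offspring.shape[1] * percent // 100)
    for child in offspring:
        idx = rng.choice(offspring.shape[1], size=n_mut, replace=False)
        child[idx] += rng.uniform(-scale, scale, size=n_mut)
    return offspring

population = rng.standard_normal((8, 20))
fitness = rng.random(8)                       # stand-in fitness (accuracy) values
parents = select_parents(population, fitness, 4)
children = mutate(crossover(parents, 4))
new_population = np.vstack([parents, children])  # parents + offspring
```

Repeating this loop for many generations, with fitness recomputed each time, is what drives the accuracy upward.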
RESULTS
We tested the proposed RSI procedure (Wenyi Wang) in simulation with randomly
generated abstractions of programs, where a fixed number of programs n = 2^l is
chosen for l = 1, 2, ..., 20. The first program generates programs uniformly over all
programs; the other programs generate programs following a weighted distribution over
a subset of programs. With 10 repeats for each l = 1, 2, ..., 20, the expected number of
steps for the first program to reach the optimal program was calculated, and the results
suggest a linear relation between l (the logarithm of the number of programs) and the
expected number of steps.
GA-ANN
We ran the GA for 50 generations, using a visualization library to show how the
accuracy changes across generations. After 50 iterations, we are able to reach an
accuracy of more than 50% on the MNIST dataset. This compares to 25% with no
backward pass for updating the network weights and without using an optimization
technique. This is evidence that poor results may come not from something wrong in the
model or the data, but from the absence of an optimization technique. Using different
values for the parameters, such as 100 generations, might increase the accuracy
further.
CONCLUSION
REFERENCES
1. Wenyi Wang, "A Formulation of Recursive Self-Improvement and Its Possible
Efficiency". https://arxiv.org/pdf/1805.06610.pdf
2. Kory Becker, Justin Gottschlich, "AI Programmer: Autonomously Creating
Software Programs Using Genetic Algorithms". https://arxiv.org/abs/1709.05703
3. Ahmed Gad, "Artificial Neural Networks Optimization using Genetic Algorithm
with Python", Towards Data Science.