
Thinking, Fast and Slow

Emulating Transfer of Knowledge


through Image Classification
Sem Gebregziabher Petros

Thesis submitted for the degree of


Master in Robotics and Intelligent Systems
60 credits

Department of Informatics
Faculty of Mathematics and Natural Sciences

UNIVERSITY OF OSLO

Spring 2022
Thinking, Fast and Slow

Emulating Transfer of Knowledge


through Image Classification

Sem Gebregziabher Petros


© 2022 Sem Gebregziabher Petros

Thinking, Fast and Slow

http://www.duo.uio.no/

Printed: Reprosentralen, University of Oslo


Abstract

Humans can solve previously unknown tasks by transferring already ac-


quired knowledge to new domains. The combination of slow and fast
thinking within the human brain allows for logical reasoning and judg-
ments based on examination while solving problems fast and effortlessly.
Machine learning has yet to achieve this dual ability.
This thesis presents a possible way to attain such a dual ability
through transfer learning. A series of experiments are presented using
transfer learning in image classification to determine artificial intelligence’s
capabilities to transfer knowledge.
The results show high accuracy when applying transfer learning to
image classification while also defining some limits to how similar data
used for training needs to be.

Contents

1 Introduction 1
1.1 Kahneman’s "Thinking, Fast and Slow" . . . . . . . . . . . . 1
1.2 Thinking, Fast and Slow in machine learning . . . . . . . . . 2
1.3 Transfer of knowledge . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Background 5
2.1 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Artificial neural networks . . . . . . . . . . . . . . . . 6
2.1.2 Residual Neural Networks . . . . . . . . . . . . . . . 7
2.2 Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Neuroscience-Inspired Artificial Intelligence . . . . . . . . . 8
2.3.1 Lifelong learning machines . . . . . . . . . . . . . . . 9
2.3.2 Throwing a ball into a basket . . . . . . . . . . . . . . 9
2.4 Identifying tasks . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Project Description and Implementation 12


3.1 Motivation for using transfer learning . . . . . . . . . . . . . 12
3.2 Emulating Transfer of Knowledge through Image Classifica-
tion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3 Datasets and modifications . . . . . . . . . . . . . . . . . . . 14
3.3.1 Transfer learning using large scale datasets . . . . . . 14
3.4 Transfer Library: Storing previous knowledge . . . . . . . . 15

4 Experiments and Results 18


4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2.1 Conditions and configuration . . . . . . . . . . . . . . 19
4.3 Training ResNet-18 on 2D data . . . . . . . . . . . . . . . . . 19
4.3.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.4 Experiment 1 - Classifying matching 3D objects . . . . . . . . 20
4.4.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.5 Experiment 2 - Classifying 3D objects without training . . . 22
4.5.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.6 Experiment 3 - Introducing "confusing" objects . . . . . . . . 23
4.6.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.7 Experiment 4 - Transfer learning with non-similar data . . . 24

4.7.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.8 Experiment 5 - Transfer learning with ImageNet . . . . . . . 26
4.8.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5 Conclusion 30
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.2.1 Implementing "Thinking, fast and slow" . . . . . . . . 30
5.2.2 Cross-domain knowledge transfer . . . . . . . . . . . 31

List of Figures

2.1 A figure illustrating forward propagation through a single


neuron within an artificial neural network. . . . . . . . . . . . 6
2.2 Diagram of a single residual building block presented by He
et al. [9]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3.1 A figure explaining how transfer learning is performed. (1)


A residual neural network is trained on initial data; (2)
Transfer learning is applied by reusing the already-trained
network; (3) The network is then reused using transfer
learning. The fully connected layer is then trained using new
data. The model predicts based on the new data. . . . . . . . 13
3.2 Six samples from the 2D dataset . . . . . . . . . . . . . . . . . 14
3.3 Eight samples from the 3D dataset . . . . . . . . . . . . . . . 15
3.4 A diagram illustrating the framework described in section 3.4. 15

4.1 Accuracy and loss measurements from training the ResNet-


18 model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Experiment 1: Measuring accuracy of classification when
classifying 3D objects with training using similar objects.
The figure shows ten distinct runs of the model. . . . . . . . 21
4.3 Experiment 2: Measuring accuracy of classification when
classifying 3D objects with no initial training. The figure
shows ten distinct runs of the model. . . . . . . . . . . . . . . 22
4.4 Experiment 3: Measuring accuracy of classification when
adding "confusing" objects. The figure shows ten distinct
runs of the model. . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.5 Experiment 4: Measuring accuracy of classification when
classifying images of animals with initial training from 2D
data. The figure shows ten distinct runs of the model. . . . . 25
4.6 Experiment 5: Measuring accuracy of classification when
initially trained on large-scale datasets. The
figure shows ten distinct runs of the model. The model is
trained with batch size = 12. . . . . . . . . . . . . . . . . . . . 26
4.7 Experiment 5: Measuring accuracy of classification when
initially trained on large-scale datasets. The
figure shows ten distinct runs of the model. The model is
trained with batch size = 128. . . . . . . . . . . . . . . . . . . 27

List of Tables

4.1 Overview of the datasets used in the experiments. . . . . . . 19


4.2 Table containing results of training the ResNet-18 model to
classify 2D geometric shapes dataset. . . . . . . . . . . . . . . 20
4.3 Results of experiment 1 . . . . . . . . . . . . . . . . . . . . . . 21
4.4 Results of experiment 2. . . . . . . . . . . . . . . . . . . . . . 22
4.5 Results of experiment 3. . . . . . . . . . . . . . . . . . . . . . 24
4.6 Results of experiment 4. . . . . . . . . . . . . . . . . . . . . . 25
4.7 Results of experiment 5. The measurements of accuracy and
loss on the top display the results of training with batch
size = 12, and the bottom measurements of accuracy and loss
display the results of training with batch size = 128. . . . . . 27

Acknowledgements

I would like to give a special thank you to my supervisor, Jim Tørresen, for
good guidance and support throughout this thesis.

Chapter 1

Introduction

Non-human intelligence has captured people’s imagination and fascination


for more than a century. The abilities of machines have kept expanding over the last decades, steadily surpassing new limits.
Even though artificial intelligence (AI) has progressed over the last eighty or so years, the goal of artificial general intelligence (AGI) is still a long way off. Machines can solve particular
tasks and manage to solve them with very high precision, but putting those
same machines in a different domain will most likely lead them to perform
poorly. The progression within the field of transfer learning might be the
solution to how machines can solve tasks in multiple domains.

1.1 Kahneman’s "Thinking, Fast and Slow"

In Thinking, Fast and Slow [11], Kahneman divides the human way of
thinking into two distinct categories. He distinguishes between their traits
and calls them "System 1" and "System 2". System 1 is categorized as fast
and automatic thinking and uses pre-learned knowledge to solve problems.
It concludes effortlessly, but it also tends to generalize when solving tasks
quickly.
System 2, on the other hand, is categorized as slow and effortful
thinking. It is rational and systematic and takes over when System 1
lacks the knowledge to solve a given task. System 2 attains knowledge
and creates models for System 1 to use so that solving the same task later
becomes easier.
An example of this is teaching a child to identify a group of animals. Say
that we teach a child how to identify the differences between a crocodile, a
turtle, and a donkey. The child will use System 2 to learn how to recognize
and distinguish each animal from the other. System 2 engages until the
process of distinguishing the animals becomes automated. Once System
2 has created reliable enough models for identifying and classifying the
animals, System 1 starts taking over the process.

System 1 operates independently and relies on the model created by
System 2. However, imagine introducing pictures of an alligator, a tortoise,
or a mule to this child. Since System 1 tends to generalize, the child would
likely classify one of the new animals as one of the previously learned
ones. System 1 has only been given a model for distinguishing the previous
animals from each other, and since System 1’s objective is to solve tasks
quickly, it would most likely fail. For the child to be able to distinguish
the new animals from the previous ones, System 2 has to revise the current
System 1 model. System 2 will then take over, identify the differences and
create systematic models aware of the similarities and differences. Since
the child already knows how to distinguish the previous animals from each
other, System 2 can transfer some of the previously learned knowledge to
not learn from scratch.

1.2 Thinking, Fast and Slow in machine learning


Human-like thinking behavior has long been an inspiration for machine
learning and artificial intelligence. The modeling of neural networks in
simple organisms, done by McCulloch and Pitts [14], and the further
development of the perceptron [23], marked the start of AI taking
inspiration from human-like thinking. Similarly, popular fields within
machine learning, such as transfer learning [28] and reinforcement learning
[25] are inspired by how humans and animals learn.
Booch et al. suggest translating Kahneman’s theories of human-like
thinking behavior into the AI environment and conjecture that this will
possibly lead to some of the same capabilities in machines [4]. In their
research, they try to identify similar traits between the systems Kahneman
writes about and the two main lines of work in AI: machine learning (data-
driven) and symbolic logic reasoning (knowledge-driven).
They write that machine learning does have many similar traits to
System 1 due to its ability to build models from sensory data. Perception
activities, such as seeing or reading, are handled by both System 1 and
machine learning, and both of them tend to be biased or imprecise due to
fast conclusions. However, while Kahneman’s description of System 1 can
grasp basic notions of causality to build models, machine learning has not
figured this out yet.
System 2, on the other hand, has the ability to solve complex problems,
and some of these traits might resemble AI techniques based on logic,
search, and optimization.

1.3 Transfer of knowledge
Humans excel at generalizing, converting, and reusing knowledge to solve
tasks in previously unknown domains. A person who usually drives a
coupe, for example, can also manage to operate a large van even though
driving a larger vehicle is perceptually different and can seem more
complicated. By knowing how to drive a car, a driver will not have to
learn to drive from scratch because that person understands how to operate
vehicles. In other words, that person can take knowledge from one car, i.e.,
one domain, and apply it to multiple different vehicles.

Accurate machine learning models rely on relevant, labeled data to


get good performance, and AI is often limited due to a lack of data that
corresponds with the task at hand. Another limitation of AI is training time
and computer resources. Implementing the way humans generalize and reuse knowledge and skills in AI can significantly reduce the need for labeled, task-specific data. Machines will be able to learn faster and solve more tasks with less data, and training time and computer resources are also reduced because the model reuses weights learned from related data.

1.4 Thesis structure


This thesis contains the following chapters: (1) Introduction, (2) Back-
ground, (3) Project Description and Implementation, (4) Experiments and
Results, and (5) Conclusion.
The Background chapter (2) summarizes relevant research and studies
concerning this thesis. It includes information about machine learning,
transfer learning, neuroscience-inspired artificial intelligence, and other
research that is relevant to this thesis.
The Project Description and Implementation chapter (3) discusses what
has been done in this thesis. The chapter includes a description of
implementing the transfer learning model used in the experiments and
descriptions of datasets used in the Experiments and Results chapter. It
also includes the thoughts and ideas behind this thesis and how these ideas
might be relevant to current research in AI.
The Experiments and Results chapter (4) includes the experiments
completed in this thesis and the following results of the experiments. Each
experiment includes an analysis of the results.
Finally, the Conclusion chapter (5) summarizes the work done during
this thesis and discussions about possible future work concerning this
theme.

Chapter 2

Background

This chapter includes essential background information and related work


concerning this thesis.

2.1 Machine Learning


Humans are exceptional in creating models of the world based on small
amounts of data. A child, for example, can look at a few images of cats
and dogs and know how to differentiate between them. However, when
looking through large datasets with thousands or even millions of images,
a machine will always be superior to humans.
Machine learning is used to predict the future by looking at past data.
For example, classifying images of 100 different animals or determining
if a person might like an article of clothing based on that person’s
purchase history. A machine learning model uses algorithms to discover
patterns within data that lead to correct future predictions. The goal is
to maximize accuracy by predicting correct outcomes based on previously
seen data, and there are three main approaches to this: supervised learning,
unsupervised learning, and reinforcement learning.
Supervised learning uses labeled data to learn how to distinguish
between classes in the data. A machine is fed with labeled data and adjusts
its weights until the model reflects the data accordingly. On the other hand,
unsupervised learning uses unlabeled data and tries to find similarities
within the dataset. The machine tries to find meaningful clusters and uses
the clusters to distinguish the classes.
Lastly, we have reinforcement learning, an approach where an agent
interacts with an environment and gets a reward based on the action. The
agent’s goal is to maximize the reward, so the agent uses trial and error to
find an optimal way to interact with the system.
For a machine learning model to be able to solve a task, it has to
recognize a pattern within the given data. For example, when learning to
classify images, the model has to recognize features of the given data and
learn which features correspond with which class. The machine learns how
to classify through deep learning. Deep learning models are inspired by the
human brain and use artificial neural networks (ANNs) to simulate how

neurons within the human brain send information signals to each other.
There are several types of neural networks, and some types are better for
specific types of AI tasks.

2.1.1 Artificial neural networks


The idea behind ANNs derives from Rosenblatt’s development of the
perceptron [23]. ANNs are inspired by how the neurons within the human
brain receive information through our senses to make decisions. The
network consists of a large number of processors called neurons, and the
neurons operate within layers. The first layer consists of inputs x_1, ..., x_N from the training data and is called the input layer, while the last layer, which makes the prediction y, is called the output layer. There are hidden layers between the input and output layers that perform most of the computations to get correct predictions. The layers communicate through channels called weights (w_1, ..., w_N) that are assigned numerical values. Each input x_i is multiplied by its weight and sent to a neuron in the next layer. Each neuron is associated with a numerical value called a bias b, which is added to the input sum. The sum Σ_i x_i w_i + b is then sent through an activation function which determines whether the neuron will get activated or not. The activated neurons then send data to the next hidden layer. This process is called forward propagation and is illustrated in figure 2.1.

Figure 2.1: A figure illustrating forward propagation through a single


neuron within an artificial neural network.

In the output layer, each neuron is assigned a probability corresponding


to the value of the neuron after forward propagation. Each neuron
is associated with a class, and the class with the highest probability
determines the output y. The predicted output ŷ is compared to the actual
output y, which determines the error, and the error is sent through a loss

function L. The result of L(y − ŷ) is then fed back to the network, where an optimizer adjusts the weights and biases in such a way that the error is minimized. This is called back propagation.
Forward and back propagation is performed with all the training data
until the network learns to predict correctly.
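As a minimal illustration of forward and back propagation for a single neuron, the following Python sketch computes the weighted sum, applies a sigmoid activation, and performs one gradient step on a squared-error loss; all numerical values and the choice of activation and loss are illustrative assumptions, not taken from the thesis implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values; none of these come from the thesis itself.
x = np.array([0.5, -1.2, 3.0])   # inputs x_i from the previous layer
w = np.array([0.8, 0.1, -0.4])   # weights w_i on the incoming channels
b = 0.2                          # bias of the neuron
y = 1.0                          # target output

# Forward propagation: weighted sum plus bias, then activation.
z = np.dot(x, w) + b
y_hat = sigmoid(z)

# Loss and back propagation: squared error, chain rule back to w and b.
loss = 0.5 * (y - y_hat) ** 2
dloss_dyhat = -(y - y_hat)
dyhat_dz = y_hat * (1.0 - y_hat)
grad_w = dloss_dyhat * dyhat_dz * x
grad_b = dloss_dyhat * dyhat_dz

# One optimizer step (plain gradient descent with a small learning rate).
lr = 0.1
w -= lr * grad_w
b -= lr * grad_b
```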

2.1.2 Residual Neural Networks


Deep neural networks provide more accurate image classification and have
led to multiple breakthroughs in image classification. A deep neural
network uses its layers to learn the different features of an image. For
example, the first layer detects edges, the second discovers shapes, the third learns to recognize the background, etc. Consequently, one would
assume that stacking layers on top of each other would lead to accurate
classification. However, this is not always the case for traditional neural
networks. While increasing the number of layers might increase the
number of potential features that the model learns, increasing them too
much might lead to poor results due to vanishing/exploding gradients [7]
[3].
He et al.’s [9] results show that adding more layers increases training
and test errors due to the vanishing gradients in regular convolutional
neural networks. Thus, they propose using deep residual neural networks
(ResNet) to combat accuracy degradation. As shown in figure 2.2, they use
shortcut connections that perform identity mappings. The outputs from the identity mappings are then added to the output of the layers they skip. He et al.'s experiments show that using residual blocks increases accuracy and allows for training much deeper networks.

Figure 2.2: Diagram of a single residual building block presented by He et


al. [9].
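A residual building block of the kind shown in figure 2.2 can be sketched in PyTorch roughly as follows; this is a simplified version without downsampling, and the layer configuration is an assumption rather than the exact block used by He et al.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity shortcut, as in figure 2.2."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                      # shortcut connection (identity mapping)
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity              # add the shortcut to the block output
        return F.relu(out)
```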

2.2 Transfer Learning
There are several other approaches within the three branches of machine
learning. Transfer learning is one of these. Transfer learning takes
inspiration from how humans transfer knowledge from one domain to
another. Consider an example of two people learning to play the piano.
One of them has an extensive background as a guitarist, while the second
person has no musical background. The guitarist will likely have fewer
problems playing the piano than a beginner because the guitarist has
experience from a similar domain [19]. Transfer learning uses a similar
approach to learning. Firstly, a machine is trained using one type of data
and learns to solve tasks related to that data. The machine’s knowledge
is then transferred and used to solve different tasks. In some instances,
additional training might be required if the previously obtained knowledge
does not correspond well with the new task. This training often consists of
adjusting the final layers of the artificial neural network instead of training
the entire network. Consequently, transfer learning tends to be an efficient
way of performing machine learning since both time and computation cost are reduced [29].
Another motivation for using transfer learning is the problem of
acquiring vast amounts of labeled data. Since sufficient and relevant data
can be hard to acquire, using previous data can teach machines to solve
new tasks without requiring large amounts of new data. Oquab et al.
[18] show that by using large-scale datasets, such as ImageNet [6], image
representations are transferable to new image classification tasks without
the need for vast amounts of training data. Transfer learning from models trained on comprehensive datasets has also proven effective when classifying data that is unrelated to the training data. Morid et al. [16] achieve
adequate performance by training a CNN using ImageNet in order to
classify medical data.
The use of transfer learning has also proven successful in other
fields such as natural language processing (NLP) [21], recommendation
systems (RS) [15], autonomous driving [24], and research in cross-domain
knowledge transfer through sparsely connected neural networks has also
shown promising results [26]. Weiss et al. conjecture that with the recent increase in data collection through smartphones, computers, online tracking, etc., the potential for transfer learning to limit the use of computer resources increases [29].

2.3 Neuroscience-Inspired Artificial Intelligence


The idea behind neural networks is highly inspired by how the human
brain works. Several researchers believe that a better understanding of how the
human brain functions will result in more intelligent machines [4] [8].
Hassabis et al. write that how humans and other animals accomplish
high-level transfer learning is still unknown. They add that exploring this
question may provide deeper insight into how machines may learn more

intelligently.
Booch et al. suggest translating Kahneman’s theories of human-like
thinking behavior into the AI environment and conjecture that this will
possibly lead to some of the same capabilities in machines [4]. Their paper,
Thinking Fast and Slow in AI, tries to identify similar traits between the
systems Kahneman writes about and the two main lines of work in AI:
machine learning (data-driven) and symbolic logic reasoning (knowledge-
driven).

2.3.1 Lifelong learning machines


Intelligent machines have to be able to acquire and process continuous
streams of information in order to interact with the real world [20]. This
ability is something humans and other mammals inherently have [5].
The human brain can learn from a continuous stream of information
and obtain new knowledge without necessarily having to forget old
knowledge. This ability is called lifelong learning or continuous learning and
is an ongoing challenge within machine learning.
Lifelong learning has proven to be problematic in machine learning
due to catastrophic interference or catastrophic forgetting [12]. Kemker et al.
show that training neural networks to perform multiple tasks sequentially
occasionally leads to new knowledge interfering with previous knowledge.
Richardson and Thomas [22] suggest three ways to avoid catastrophic
interference: (1) increasing computational resources in order to learn
new knowledge; (2) using localist coding, meaning avoid the usage of
overlapping representations; and (3) refreshing the old knowledge as new
knowledge is obtained in order to combine the knowledge.

2.3.2 Throwing a ball into a basket


While Booch et al.’s paper only gives an overview of the "Thinking, Fast
and Slow" approach in AI, there are examples of researchers who have
tried out possible approaches to AI based on humans’ way of thinking. An
example of this is Li et al.’s Robots That Think Fast and Slow: An Example
of Throwing the Ball Into the Basket [thro]. Their study proposes a possible
way of achieving human-like thinking behavior using two humanoid
robots trained to throw a ball into a basket. Li et al. train the robots using
two of Kahneman’s examples of human-like thinking behavior: The peak-
end rule and anchoring.
In their experiments, they use the peak-end rule as a method for the
robots to learn the best way to throw a ball. The robot uses the best shot
(peak result) and the last shot to decide the optimal shooting motion. They
write that the concept of the peak-end rule, which is seemingly defective in
human beings, helps achieve better results. They also use anchoring in their
study to adjust the shooting motion of the robots, given that the robots have
defective parts. The anchors are used to adjust the robot’s shooting motion.
If the robot experiences damage due to an accident, the anchors consider
what parts may have been affected and try to compensate for them. They

also write that the added anchors can be removed immediately if all parts
are restored. They compare this to how humans compensate when certain
parts of the human body are defective. For example, if a human has a
defective knee, the human uses anchors to learn how to limp and keep moving forward. If the knee is in good physical condition again, they can remove
the adjustments without relearning how to walk.

2.4 Identifying tasks


In order to translate Kahneman’s dual-system approach within AI, ma-
chines have to learn to use both systems accordingly. Knowing which sys-
tem to use depends on whether the machine is familiar enough with a task or not.
However, for the machine to determine its knowledge of a task, a suitable
way to identify tasks is a prerequisite.
Task Identification During Encounters (TIDE) [17] is a system capable of
identifying new tasks and choosing a best-suited neural network to solve
the tasks. TIDE trains a new network based on the new task if a neural
network is unavailable. By using pretrained networks when knowledge
is available and allowing TIDE to train a new network when it is not,
Norstein’s approach allows for a dual system framework. TIDE shows high
accuracy in identifying simple tasks when task descriptors are structured
in a certain way. However, memorizing each task descriptor is the best
alternative when the descriptors have no structure.
Tønnesen suggests a solution to the supervised approach in TIDE [27].
Planning and Identifying Neural Network (PINE) uses observations within
an environment and human-readable descriptions of the task to generate
task descriptions used by the system to identify a task. PINE then labels the
identified task as known or unknown. If the task is labeled as unknown, the
system chooses the most similar task and applies meta-learning to achieve
faster learning.

Chapter 3

Project Description and


Implementation

If transfer learning is to be used to achieve machines that can solve tasks


using a fast and slow system, knowing the limitations of transfer learning
is essential. Machines that use previously stored knowledge to learn new
tasks need to be able to determine what kind of knowledge is transferable
in order to solve new tasks accurately.
In this chapter, the motivation for using transfer learning to achieve
a dual-system framework will be discussed in section 3.1. The practice
used to emulate transfer of knowledge will be presented in section 3.2.
Section 3.3 introduces the datasets used for training and testing during
transfer learning. In section 3.4 a framework for a dual-system approach
is presented.

3.1 Motivation for using transfer learning


Transfer learning's main application today is training machines when there is not enough data, as explained in section 2.2, but some believe that transfer learning is the key to general intelligence or even to achieving lifelong learning in agents [8]. Transfer learning, which takes inspiration from human thinking, has proven to be successful in multiple applications [29]. Since humans have the ability to generalize knowledge and reuse it in other, and sometimes even more complex, domains, they can often solve multiple tasks from previously learned knowledge. By assuming that machines can have the same ability as humans to learn complex tasks based on relatable and straightforward knowledge, a machine can be taught to solve a task in a simple domain and transfer that ability to a higher domain, thus learning to solve complicated tasks without the need to procure data for those specific tasks.
The use of transfer learning in this thesis is done to give another
perspective on how machines can learn independently in order to attain
the abilities of the human System 2. Transfer of knowledge is emulated by
the use of image classification, but an extended use of transfer learning in

Figure 3.1: A figure explaining how transfer learning is performed. (1) A residual neural network is trained on initial data; (2) transfer learning is applied by reusing the already-trained network; (3) the fully connected layer of the reused network is trained using new data, and the model then predicts based on the new data.

other areas of machine learning might also show good results of knowledge
transfer.

3.2 Emulating Transfer of Knowledge through Image


Classification
Humans have the ability to learn simple knowledge and use that know-
ledge to solve tasks in other domains. This ability is due to the human
brain being outstanding in generalizing knowledge and experiences. For
machines to learn like humans do, machines need to be able to recognize
patterns and similarities within relevant data and reuse a generalized trans-
lation to understand the higher-level data.
The experiments in chapter 4 have the objective of demonstrating what kind of knowledge is transferable and what the limits of transfer
learning are. Transfer learning is applied using different combinations of
datasets in order to determine how well it manages to transfer knowledge
from the initial training.

A diagram explaining how transfer learning is applied is shown in
figure 3.1.
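A minimal PyTorch sketch of the three steps in figure 3.1 could look as follows; the checkpoint file name and the use of torchvision's ResNet-18 builder are assumptions for illustration, not the thesis's actual code.

```python
import torch
import torch.nn as nn
from torchvision import models

# (1) The residual neural network trained on the initial (2D) data; the
#     checkpoint name is a placeholder, not a file from the thesis.
model = models.resnet18()
model.fc = nn.Linear(model.fc.in_features, 3)        # three 2D shape classes
model.load_state_dict(torch.load("resnet18_2d.pt"))  # hypothetical checkpoint

# (2) Reuse the already-trained network: freeze all of its weights.
for param in model.parameters():
    param.requires_grad = False

# (3) Replace the fully connected layer with a new, untrained linear layer;
#     only this layer is then trained on the new (3D) data.
model.fc = nn.Linear(model.fc.in_features, 3)        # three 3D shape classes
```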

3.3 Datasets and modifications


The experiments are performed using three different datasets: the 2D geometric shapes dataset [1], a 3D geometric shapes dataset, and the Animal Image dataset [2].
The 2D dataset consists of nine classes of geometric shapes with ten thousand 200x200 RGB images in each class. The perimeter, position, rotation angle, background color, and filling color of each shape are selected randomly. In order to reduce training time, only three of the nine classes are used when training the residual neural network. The residual neural network is trained to classify circles, squares, and triangles, which can be seen in figure 3.2.

Figure 3.2: Six samples from the 2D dataset

The 3D geometric shapes dataset contains four classes with ten


thousand 640x480 RGB images in each class. The dataset consists
of spheres, pyramids, cubes, and cylinders with random filling color,
background color, and rotation angle. The images in the dataset are
produced using the mplot3d toolkit, which allows for simple 3D plotting in
Matplotlib [10]. Matplotlib is a comprehensive plotting library used within
the programming language Python. Samples from the 3D dataset can be
seen in figure 3.3.
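The generation code is not part of this thesis, but images of this kind could be produced with the mplot3d toolkit roughly as sketched below; the rendering details (colour and angle sampling, resolution handling) are assumptions rather than the exact procedure behind the 3D dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

def render_sphere(path: str) -> None:
    """Render one sphere with random colours and viewing angle to a 640x480 image."""
    u, v = np.meshgrid(np.linspace(0, 2 * np.pi, 60), np.linspace(0, np.pi, 30))
    x, y, z = np.cos(u) * np.sin(v), np.sin(u) * np.sin(v), np.cos(v)

    fig = plt.figure(figsize=(6.4, 4.8))                 # 640x480 pixels at 100 dpi
    fig.patch.set_facecolor(np.random.rand(3))           # random background colour
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(x, y, z, color=np.random.rand(3))    # random filling colour
    ax.set_axis_off()
    ax.view_init(elev=np.random.uniform(-90, 90), azim=np.random.uniform(0, 360))
    fig.savefig(path, dpi=100, facecolor=fig.get_facecolor())
    plt.close(fig)

render_sphere("sphere_0001.png")
```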
The Animal Image dataset consists of three classes: cats, dogs, and pandas. Each class contains one thousand images of the corresponding animal.

3.3.1 Transfer learning using large scale datasets


Large scale datasets containing labeled samples are difficult to obtain and
expensive due to the time they take to label. If such datasets existed
in all domains within machine learning, every task would be solvable
using supervised learning. Although training neural networks using such
datasets would be computationally expensive and time-consuming, these

Figure 3.3: Eight samples from the 3D dataset

networks would likely be able to solve numerous tasks accurately. However,


this is not the reality in machine learning.
For image classification, such a dataset exists, and Morid et al.'s study [16] shows that ImageNet [6] can be used as a non-medical dataset when training a model using transfer learning to classify medical data. Since the idea of training on non-related data seems to apply in the medical domain, it seems reasonable to assume that it might also be applicable to other domains.

3.4 Transfer Library: Storing previous knowledge


The system presented in this section is not implemented in chapter 4 and is
only a conceptual design of a framework using a fast and a slow system.

Figure 3.4: A diagram illustrating the framework described in section 3.4.

By implementing a system that identifies tasks, such as TIDE [17]
or PINE [27], and actively using transfer learning to learn from similar
knowledge when a task is unsolvable, achieving lifelong learning machines
might be possible. Machines that can learn by themselves will thus
see a problem, identify their lack of knowledge, and search for similar
knowledge in order to solve the task, as illustrated in figure 3.4.
The illustrated system uses a library containing all previous experi-
ences. When a new task is observed, it starts by identifying and labeling
the observation in order to determine if the task is solvable based on know-
ledge stored within the library. The task is completed if a matching la-
bel corresponding to an experience is found within the library. If a corresponding label is not found within the library, the system proceeds to generalize the observation to look for similar knowledge. When similar
knowledge is found, the system performs transfer learning using the sim-
ilar knowledge and the observed task. The new knowledge is then labeled
and stored for future use, and the task is completed.
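A conceptual sketch of this loop is given below; since the framework is not implemented in this thesis, every function handed in is hypothetical and only indicates where a task identifier (such as TIDE or PINE), a generalization step, and transfer learning would plug in.

```python
from typing import Any, Callable, Dict

def handle_observation(
    observation: Any,
    library: Dict[str, Any],
    identify_task: Callable[[Any], str],      # e.g. a TIDE- or PINE-like identifier
    generalize: Callable[[Any], Any],
    find_most_similar: Callable[[Any, Dict[str, Any]], str],
    transfer_learn: Callable[[Any, Any], Any],
) -> Any:
    """Conceptual loop of figure 3.4: solve with stored knowledge, or transfer and store."""
    label = identify_task(observation)

    if label in library:                       # known task: reuse stored knowledge
        return library[label].solve(observation)

    # Unknown task: generalize the observation and look for the most similar knowledge.
    similar_label = find_most_similar(generalize(observation), library)
    new_model = transfer_learn(library[similar_label], observation)

    library[label] = new_model                 # store the new knowledge for future use
    return new_model.solve(observation)
```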

Chapter 4

Experiments and Results

This chapter describes the experiments performed using the transfer


learning system described in section 3.2 and discusses the results of these
experiments.
In section 4.1, the overall motivation for the experiments is discussed. Section 4.2 gives a description of the experiments performed in this thesis. In section 4.2.1 the configurations used in the experiments are presented. Section 4.3 describes the process of training the initial model used in the experiments, while sections 4.4 - 4.8 present the experiments performed and discuss their results. Finally, section 4.9 discusses the overall results of the thesis.

4.1 Motivation
As described in section 2.2, transfer learning allows for less training data
and has proven effective in terms of time and computer resources. The
motivation of these experiments is to demonstrate a machine’s learning
capabilities when using transfer learning. The experiments are performed
using different compositions of the datasets described in section 3.3 to give
an overview of how transfer learning works in different environments.
The experiments shown demonstrate to what degree previous know-
ledge is transferable with an overall motivation to apply this to achieve
self-learning machines that resemble Kahneman’s description of Systems 1
and 2. The hope is that after training the machine’s equivalent of System 2
to solve a task and then introducing the machine to a similar but unknown
task, the machine will be able to generalize and transfer the prior know-
ledge to solve the new task.

4.2 Description
Emulating the transfer of knowledge will be demonstrated through five
different experiments. The accuracy and loss during 12 epochs of training
and test will measure the performance of each experiment.

Experiment Dataset for initial training Dataset for transfer learning
1 2D dataset 3D dataset
2 Untrained 3D dataset
3 2D dataset 3D dataset (incl. cylinders)
4 2D dataset Animal Image dataset
5 ImageNet Animal Image dataset

Table 4.1: Overview of the datasets used in the experiments.

Experiments 1 and 2 use the 3D geometric shapes dataset to perform


transfer learning. In these experiments, three of the classes are in use. The
classes consist of images of spheres, pyramids, and cubes. Experiment
3 also uses the 3D geometric shapes dataset, but images of cylinders are
added when performing transfer learning in this experiment. Experiments
4 and 5 use the Animal Image dataset consisting of images of cats, dogs,
and pandas when performing transfer learning.

Experiments 1, 3, and 4 use the model initially trained using the 2D


dataset. Experiment 5 uses a PyTorch model pre-trained on ImageNet. The model used in experiment 2 is also a PyTorch model,
but this model has no initial training.
An overview of datasets used for initial training and transfer learning
is shown in table 4.1.

4.2.1 Conditions and configuration


All experiments are trained and tested using a ResNet-18 model with the same parameters: Adaptive Moment Estimation (ADAM) [13] is used as the optimizer with a learning rate of 0.003, and Cross Entropy loss [30] is used as the loss function. All experiments are trained using a batch size of 12, but increasing the batch size is demonstrated during experiment 5. Each run consists of 12 epochs of training, and each experiment is run ten times to see how each run performs compared to the others and to calculate the standard deviation.
The experiments using a model initially trained on the 2D dataset (Experiments 1, 3, and 4) start from the same pretrained model so that they have the same starting point.
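A sketch of one such run is given below; the dataset handling and accuracy bookkeeping are illustrative assumptions, while the optimizer, learning rate, loss function, batch size, and number of epochs follow the settings stated above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def run_experiment(model, train_set, epochs: int = 12, batch_size: int = 12):
    """One run with the configuration of section 4.2.1; dataset handling is a placeholder."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)

    # Only parameters with requires_grad=True (e.g. the new fc layer) are updated.
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=0.003)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        correct, total, running_loss = 0, 0, 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item() * labels.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
        print(f"epoch {epoch + 1}: accuracy={correct / total:.3f} "
              f"loss={running_loss / total:.3f}")
```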

4.3 Training ResNet-18 on 2D data


To use transfer learning in experiments 1, 3, and 4, a ResNet-18 model
is trained using the 2D geometric shapes dataset. To perform transfer
learning subsequently, accurate classification of the initial dataset is
imperative due to the model being transferred. Since the motive of the
experiments is to demonstrate a machine’s learning capabilities, using an
inaccurate model would result in unreliable results.

Figure 4.1: Accuracy and loss measurements from training the ResNet-18
model.
Top 3 runs Accuracy Top 3 runs Loss
1. 0.999 1. 0.003
2. 0.999 2. 0.006
3. 0.998 3. 0.006
Mean 0.996 Mean 0.014
Standard deviation 0.003 Standard deviation 0.01

Table 4.2: Table containing results of training the ResNet-18 model to


classify 2D geometric shapes dataset.

4.3.1 Results
The results of training the ResNet-18 model on the 2D dataset are shown in table 4.2 and figure 4.1.

Twelve epochs of training the residual neural network using the 2D


dataset show good results. The top three runs and the mean of ten runs
show an accuracy above 98% with a standard deviation of 0.001. Figure 4.1
shows that the model has a steady increase in accuracy during the first four
epochs, followed by eight epochs where the accuracy is upheld above 90%.

4.4 Experiment 1 - Classifying matching 3D objects


Experiment 1 is performed by using the model trained on the 2D dataset.
All model weights are frozen, and the final, fully connected layer is
replaced with a new, untrained, linear layer. The new layer’s weights
are trained and adjusted according to the data from the 3D dataset. This
experiment involves classifying 3D objects that correspond with the shapes
in the 2D dataset. Therefore, only three classes(spheres, pyramids, and
cubes) are used during transfer learning.
This experiment determines how well transfer learning performs when the new data is similar to the training data.

Figure 4.2: Experiment 1: Measuring accuracy of classification when
classifying 3D objects with training using similar objects. The figure shows
ten distinct runs of the model.
Top 3 runs Accuracy Top 3 runs Loss
1. 0.984 1. 0.057
2. 0.983 2. 0.058
3. 0.983 3. 0.059
Mean 0.982 Mean 0.059
Standard deviation 0.001 Standard deviation 0.002

Table 4.3: Results of experiment 1

4.4.1 Results
The results of classifying matching 3D objects can be seen in figure
4.2 and table 4.3.

Figure 4.2 shows ten different iterations of measuring accuracy during


training of the transfer learning model. The graph shows a notable increase
in accuracy over time in all ten iterations. The model’s accuracy is above
0.89 after the first measurement, indicating that the model can reuse
the knowledge from the 2D dataset to classify the 3D data. The final
measurement shows a mean of 98.2% accuracy and a standard deviation
of 0.001.
Training the fully connected layer with 3D data also shows a gradual
accuracy increase. The increase indicates that using the 2D model and
training the model with 3D data, i.e., similar data, gives positive results.

Figure 4.3: Experiment 2: Measuring accuracy of classification when
classifying 3D objects with no initial training. The figure shows ten distinct
runs of the model.
Top 3 runs Accuracy Top 3 runs Loss
1. 0.731 1. 0.669
2. 0.705 2. 0.709
3. 0.692 3. 0.742
Mean 0.673 Mean 0.777
Standard deviation 0.031 Standard deviation 0.060

Table 4.4: Results of experiment 2.

4.5 Experiment 2 - Classifying 3D objects without


training
For knowledge to be transferred from one domain to another, the data used
to train a neural network, and the data given to a transfer learning model
must be similar. More specifically, the new data’s generalization needs to
correspond with weights within the previously trained neural network.
In experiment 2, the ResNet-18 model has not been trained in advance,
meaning that the neural network has not seen any data and can therefore
not transfer any knowledge. This is done to see how a transfer learning
model performs with no initial training and to demonstrate how essential
similar data is to classify accurately.

4.5.1 Results
The results of classifying 3D data without initial training can be seen in
figure 4.3 and table 4.4

In figure 4.3 the displayed graph shows erratic behavior due to the

model classifying incorrectly. The figure also shows that using transfer
learning with no initial training leads to inconsistent classifications. The
results in table 4.4 show a relatively low accuracy and high loss for the
three best runs. The standard deviation is also high for both measurements,
confirming that the model is inconsistent during the ten iterations. The
inconsistent behavior is due to the model not determining a sure way to
classify the 3D data. Since the model’s weights have no initial training, the
model has no previous indication of how to classify the data. The model is
therefore guessing aimlessly.

For humans to learn new tasks based on previous tasks, the previous
tasks need to be somewhat similar. This experiment shows that the model
has similar features to how humans learn. Due to the model not being
trained to classify initially, the model seems to struggle to learn the new
task.

4.6 Experiment 3 - Introducing "confusing" objects


Experiment 1 shows that using the transfer learning model on similar 3D
shapes results in accurate classification. In this experiment, images of
cylinders are added when performing transfer learning.
Since the 3D dataset consists of figures viewed from all angles, there
will be some angles that might confuse the model. For example, a cylinder
viewed from above might look like a circle. This experiment is performed
for two reasons: (1) To observe how the model responds to an object that is
somewhat dissimilar from the 2D training data; and (2) to observe how the
model reacts to an object that can seem confusing.

4.6.1 Results
The results of classifying the entire 3D dataset are shown in figure 4.4 and table 4.5.

The model starts by classifying with approximately 80% accuracy and


gradually increases over time. The mean accuracy at the last epoch is
95.3% with a standard deviation of 0.002, meaning that the model performs
consistently well.
Table 4.5 displays a high value of loss compared to experiment 1. This
means that the model classifies more incorrectly during its 12 epochs.
The measurement of loss combined with results displayed in figure 4.4
indicates that the added class manages to confuse the model at the
beginning of its training. However, the steady increase in accuracy for all
ten iterations indicates that the model manages to learn how to classify
accurately even though confusing objects are added.

Figure 4.4: Experiment 3: Measuring accuracy of classification when
adding "confusing" objects. The figure shows ten distinct runs of the model.

Top 3 runs Accuracy Top 3 runs Loss


1. 0.957 1. 0.142
2. 0.955 2. 0.143
3. 0.955 3. 0.143
Mean 0.953 Mean 0.145
Standard deviation 0.002 Standard deviation 0.002

Table 4.5: Results of experiment 3.

4.7 Experiment 4 - Transfer learning with non-similar


data
Experiments 1 and 3 show that introducing the transfer learning model
to similar-looking objects produces satisfactory results. In experiment 2,
however, the results show that the model depends on previously seen data
to perform reliably. Given the results from the previous experiments, it
is possible to determine that the model depends on prior knowledge to
perform new tasks.
In this experiment, the model is trained using the 2D dataset but is
then tasked to classify non-similar data. The non-similar data used in this
experiment is the Animal Image dataset introduced in section 3.3.

4.7.1 Results
The results of classifying images of animals after being trained using the 2D dataset can be seen in figure 4.5 and table 4.6.

Figure 4.5: Experiment 4: Measuring accuracy of classification when
classifying images of animals with initial training from 2D data. The figure
shows ten distinct runs of the model.
Top 3 runs Accuracy Top 3 runs Loss
1. 0.445 1. 1.048
2. 0.443 2. 1.049
3. 0.442 3. 1.052
Mean 0.432 Mean 1.054
Standard deviation 0.009 Standard deviation 0.004

Table 4.6: Results of experiment 4.

Figure 4.5 shows erratic, non-deterministic behavior due to the data


used to train the model being too dissimilar. The graph shows accuracy
measurement between 34% and 37% during the first epoch for all ten
iterations. Since the model only classifies correctly about 1/3 of the time, it is
reasonable to argue that the model is shooting in the dark in the beginning.
Firstly, this indicates that the data from the 2D dataset is not suitable as
previous knowledge when learning to classify the animal dataset, but also
that the ResNet-18 model is dependent on similar enough data to perform
transfer learning successfully.
The results from table 4.6 show a high measurement of loss which
reflects the displayed graph. However, the table also shows a noticeable
reduction in standard deviation compared to experiment 2. This indicates
that the model is consistent in the way it classifies. Even though the model
displays poor results, the consistent behavior implies that any initial training leads to more consistent results than no training at all.

Figure 4.6: Experiment 5: Measuring accuracy of classification when
initially trained on large-scale datasets. The figure shows
ten distinct runs of the model. The model is trained with batch size = 12.

4.8 Experiment 5 - Transfer learning with ImageNet


Finding suitable training data to perform transfer learning can be challen-
ging. Performing transfer learning can be problematic if the new data is too
different from the initial data, as seen in experiments 2 and 4.
To achieve lifelong learning within AI, a machine has to perceive the
world and interact with it based on a continuous stream of information.
If a machine is to be taught to interact with and solve multiple tasks, it
has to have a generalized understanding of the world. As the results in
experiment 4 show, training a machine on narrow data non-similar to the
new data will lead the machine to perform poorly. Achieving lifelong
learning in AI is therefore dependent on the data being comprehensive
enough for the machine to get a holistic view of the world.
In experiment 5, a pre-trained ResNet-18 model, trained on ImageNet,
is used to classify the animal dataset.
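Loading such a pre-trained model in PyTorch and adapting it to the three animal classes can be sketched as follows; freezing the pretrained weights is assumed to mirror the earlier experiments, and `pretrained=True` is the torchvision argument of that period (newer releases use a `weights=` argument instead).

```python
import torch.nn as nn
from torchvision import models

# ResNet-18 with ImageNet weights from torchvision.
model = models.resnet18(pretrained=True)

# Freeze the pretrained weights and replace the head for cats, dogs, and pandas.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 3)
```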

4.8.1 Results
The results of experiment 5 can be seen in figures 4.6 and 4.7 as well as table
4.7.
Figure 4.6 indicates that the model manages to achieve high accuracy
and low loss. However, all ten iterations of running the model seem to
result in fluctuating behavior during all 12 epochs. This is due to the batch
size being twelve, which is low. The results of increasing the batch size can
be seen in figure 4.7.
After running the model for 12 epochs, all iterations achieve accuracy
between 98% and 99% and loss between 0.05 and 0.06.

Top 3 runs Accuracy Top 3 runs Loss
1. 0.988 1. 0.033
2. 0.987 2. 0.038
3. 0.987 3. 0.039
Mean 0.984 Mean 0.044
Standard deviation 0.004 Standard deviation 0.008

Top 3 runs Accuracy Top 3 runs Loss


1. 0.990 1. 0.048
2. 0.988 2. 0.048
3. 0.987 3. 0.051
Mean 0.985 Mean 0.052
Standard deviation 0.002 Standard deviation 0.003

Table 4.7: Results of experiment 5. The measurements of accuracy and loss


on the top display the results of training with batch size = 12, and the bottom measurements of accuracy and loss display the results of training
with batch size = 128.

Figure 4.7: Experiment 5: Measuring accuracy of classification when


initially trained on large-scale datasets. The figure shows
ten distinct runs of the model. The model is trained with batch size = 128.

This shows that transfer learning using large-scale datasets, such as ImageNet, is successful when transferring knowledge.

4.9 Summary
The results show high accuracy when applying transfer learning to image
classification while also defining some limits to how similar data used for
training needs to be. Transferring similar knowledge proves successful,
and the model learns to distinguish between classes even though the
classes within the data can be confusing. The results also show that a model
performs significantly better when trained on any initial data, and no initial
training leads to inaccurate and inconsistent results. Training on large-scale
datasets proves to be successful and results in high accuracy and low loss.

Chapter 5

Conclusion

5.1 Conclusion
In this thesis, the performance of a machine’s ability to transfer knowledge
has been tested using image classification. Other research approaching
Kahneman’s dual-system approach and research in lifelong learning ma-
chines and task identification have been discussed to give a broader view
of the subject and its benefits in achieving artificial general intelligence. The
use of transfer learning is proposed to achieve System 2 abilities within ma-
chine learning, and attempts have been made to argue that learning from similar knowledge is a viable path toward lifelong learning machines. Using comprehensive large-scale datasets has also been pro-
posed as a possible way for machines to learn to solve multiple tasks from
different domains.
Evaluating the transfer learning system shows that the system performs
well in learning from similar knowledge, and the system’s dependency on
previous, related knowledge has also been demonstrated. The system has
also proven to classify accurately when presented with "confusing" objects.

5.2 Future work


5.2.1 Implementing "Thinking, fast and slow"
This thesis focuses only on the ability of machines to transfer knowledge
but to see how transfer learning performs as an alternative to System 2, a
complete implementation of a system needs to be in place. This system has
to include a suitable way of identifying tasks to determine which system to
use, such as TIDE or PINE, and an algorithm that can look through a library
of previously-stored knowledge as presented in figure 3.4. Implementing a
system that stores previous knowledge will be computationally expensive
due to the incremental increase in experiences. Consequently, two ways
of solving this issue can be suggested: (1) implementing a fitting way to
generalize and store comparable segments of the knowledge to reduce
computational resources used in comparing knowledge, or (2) generalizing
and storing knowledge categorically in a hierarchical data structure.

5.2.2 Cross-domain knowledge transfer
The experiments in this thesis show that transfer learning proves successful
in image classification. However, to achieve artificial general intelligence,
a machine needs to be able to use transfer knowledge in other domains
and between domains. As mentioned in section 2.2, transfer learning in
other domains has proven to be successful, but research in cross-domain
knowledge transfer is still limited.
Using a generalized way to represent all types of data across domains
or using sparsely connected neural networks, such as Swarup et al. [26] do,
could be a possible way to achieve a transfer of knowledge across domains.

Bibliography

[1] Anas El Korchi. 2D geometric shapes dataset. https://data.mendeley.com/datasets/. Accessed: 2021-11-03. 2020.
[2] Animals Image Dataset. https://www.kaggle.com/code/bygbrains/dog-cat-pandas-image-classifier/data. Accessed: 2022-04-14. 2020.
[3] Y. Bengio, P. Simard and P. Frasconi. ‘Learning long-term dependen-
cies with gradient descent is difficult’. In: IEEE Transactions on Neural
Networks 5.2 (1994), pp. 157–166. DOI: 10.1109/72.279181.
[4] Grady Booch et al. ‘Thinking Fast and Slow in AI’. In: CoRR
abs/2010.06002 (2020). arXiv: 2010.06002. URL: https://arxiv.org/abs/2010.06002.
[5] Joseph Cichon and Wen-Biao Gan. ‘Branch-specific dendritic Ca2+
spikes cause persistent synaptic plasticity’. In: Nature 520.7546 (2015),
pp. 180–185.
[6] Jia Deng et al. ‘Imagenet: A large-scale hierarchical image database’.
In: 2009 IEEE conference on computer vision and pattern recognition. Ieee.
2009, pp. 248–255.
[7] Xavier Glorot and Yoshua Bengio. ‘Understanding the difficulty of
training deep feedforward neural networks’. In: Proceedings of the
thirteenth international conference on artificial intelligence and statistics.
JMLR Workshop and Conference Proceedings. 2010, pp. 249–256.
[8] Demis Hassabis et al. ‘Neuroscience-Inspired Artificial Intelligence’.
In: Neuron 95.2 (2017), pp. 245–258. ISSN: 0896-6273. DOI: https://doi.org/10.1016/j.neuron.2017.06.011. URL: https://www.sciencedirect.com/science/article/pii/S0896627317305093.
[9] Kaiming He et al. Deep Residual Learning for Image Recognition. 2015.
DOI: 10.48550/ARXIV.1512.03385. URL: https://arxiv.org/abs/1512.03385.
[10] J. D. Hunter. ‘Matplotlib: A 2D graphics environment’. In: Computing
in Science & Engineering 9.3 (2007), pp. 90–95. DOI: 10.1109/MCSE.2007.55.
[11] Daniel Kahneman. Thinking, fast and slow. New York: Farrar, Straus
and Giroux, 2011. URL: https://www.amazon.de/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374275637/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&colid=151193SNGKJT9&coliid=I3OCESLZCVDFL7.

[12] Ronald Kemker et al. ‘Measuring catastrophic forgetting in neural
networks’. In: Proceedings of the AAAI Conference on Artificial Intelli-
gence. Vol. 32. 1. 2018.
[13] Diederik P Kingma and Jimmy Ba. ‘Adam: A method for stochastic
optimization’. In: arXiv preprint arXiv:1412.6980 (2014).
[14] Warren S McCulloch and Walter Pitts. ‘A logical calculus of the
ideas immanent in nervous activity’. In: The bulletin of mathematical
biophysics 5.4 (1943), pp. 115–133.
[15] Orly Moreno et al. ‘Talmud: transfer learning for multiple domains’.
In: Proceedings of the 21st ACM international conference on Information
and knowledge management. 2012, pp. 425–434.
[16] Mohammad Amin Morid, Alireza Borjali and Guilherme Del Fiol.
‘A scoping review of transfer learning research on medical image
analysis using ImageNet’. In: Computers in biology and medicine 128
(2021), p. 104115.
[17] Peter Norstein. ‘Thinking fast and slow in intelligent systems’. In:
(2020). URL: http://hdl.handle.net/10852/79608.
[18] Maxime Oquab et al. ‘Learning and transferring mid-level image
representations using convolutional neural networks’. In: Proceedings
of the IEEE conference on computer vision and pattern recognition. 2014,
pp. 1717–1724.
[19] Sinno Jialin Pan and Qiang Yang. ‘A survey on transfer learning’.
In: IEEE Transactions on knowledge and data engineering 22.10 (2009),
pp. 1345–1359.
[20] German I Parisi et al. ‘Continual lifelong learning with neural
networks: A review’. In: Neural Networks 113 (2019), pp. 54–71.
[21] Colin Raffel et al. ‘Exploring the limits of transfer learning with a
unified text-to-text transformer’. In: arXiv preprint arXiv:1910.10683
(2019).
[22] Fiona M Richardson and Michael SC Thomas. ‘Critical periods
and catastrophic interference effects in the development of self-
organizing feature maps’. In: Developmental science 11.3 (2008),
pp. 371–389.
[23] Frank Rosenblatt. ‘The perceptron: a probabilistic model for inform-
ation storage and organization in the brain.’ In: Psychological review
65.6 (1958), p. 386.
[24] Suvash Sharma et al. ‘Semantic segmentation with transfer learning
for off-road autonomous driving’. In: Sensors 19.11 (2019), p. 2577.
[25] Richard S Sutton and Andrew G Barto. Reinforcement learning: An
introduction. MIT press, 2018.
[26] Samarth Swarup and Sylvian R Ray. ‘Cross-domain knowledge
transfer using structured representations’. In: Aaai. Vol. 6. 2006,
pp. 506–511.

[27] Håkon Tønnessen. ‘Thinking Fast and Slow: PINE: Planning and
Identifying Neural Network’. In: (2021). URL: http://hdl.handle.net/10852/90312.
[28] Lisa Torrey and Jude Shavlik. ‘Transfer learning’. In: Handbook of
research on machine learning applications and trends: algorithms, methods,
and techniques. IGI global, 2010, pp. 242–264.
[29] Karl Weiss, Taghi M Khoshgoftaar and DingDing Wang. ‘A survey of
transfer learning’. In: Journal of Big data 3.1 (2016), pp. 1–40.
[30] Zhilu Zhang and Mert Sabuncu. ‘Generalized cross entropy loss for
training deep neural networks with noisy labels’. In: Advances in
neural information processing systems 31 (2018).
