Thinking, Fast and Slow
Department of Informatics
Faculty of Mathematics and Natural Sciences
UNIVERSITY OF OSLO
Spring 2022
http://www.duo.uio.no/
Contents

1 Introduction
  1.1 Kahneman’s "Thinking, Fast and Slow"
  1.2 Thinking, Fast and Slow in machine learning
  1.3 Transfer of knowledge
  1.4 Thesis structure
2 Background
  2.1 Machine Learning
    2.1.1 Artificial neural networks
    2.1.2 Residual Neural Networks
  2.2 Transfer Learning
  2.3 Neuroscience-Inspired Artificial Intelligence
    2.3.1 Lifelong learning machines
    2.3.2 Throwing a ball into a basket
  2.4 Identifying tasks
  4.7.1 Results
4.8 Experiment 5 - Transfer learning with ImageNet
  4.8.1 Results
4.9 Summary
5 Conclusion
  5.1 Conclusion
  5.2 Future work
    5.2.1 Implementing "Thinking, fast and slow"
    5.2.2 Cross-domain knowledge transfer
Acknowledgements
I would like to give a special thank you to my supervisor, Jim Tørresen, for
good guidance and support throughout this thesis.
Chapter 1
Introduction
In Thinking, Fast and Slow [11], Kahneman divides the human way of
thinking into two distinct categories. He distinguishes between their traits
and calls them "System 1" and "System 2". System 1 is categorized as fast
and automatic thinking and uses pre-learned knowledge to solve problems.
It concludes effortlessly, but it also tends to generalize when solving tasks
quickly.
System 2, on the other hand, is categorized as slow and effortful
thinking. It is rational and systematic and takes over when System 1
lacks the knowledge to solve a given task. System 2 attains knowledge
and creates models for System 1 to use so that solving the same task later
becomes easier.
An example of this is teaching a child to identify a group of animals. Say
that we teach a child how to identify the differences between a crocodile, a
turtle, and a donkey. The child will use System 2 to learn how to recognize
and distinguish each animal from the other. System 2 engages until the
process of distinguishing the animals becomes automated. Once System
2 has created reliable enough models for identifying and classifying the
animals, System 1 starts taking over the process.
System 1 operates independently and relies on the model created by
System 2. However, imagine introducing pictures of an alligator, a tortoise,
or a mule to this child. Since System 1 tends to generalize, the child would
likely classify one of the new animals as one of the previously learned
ones. System 1 has only been given a model for distinguishing the previous
animals from each other, and since System 1’s objective is to solve tasks
quickly, it would most likely fail. For the child to be able to distinguish
the new animals from the previous ones, System 2 has to revise the current
System 1 model. System 2 will then take over, identify the differences and
create systematic models aware of the similarities and differences. Since
the child already knows how to distinguish the previous animals from each
other, System 2 can transfer some of the previously learned knowledge instead
of learning from scratch.
1.3 Transfer of knowledge
Humans excel at generalizing, converting, and reusing knowledge to solve
tasks in previously unknown domains. A person who usually drives a
coupe, for example, can also manage to operate a large van even though
driving a larger vehicle is perceptually different and can seem more
complicated. By knowing how to drive a car, a driver will not have to
learn to drive from scratch because that person understands how to operate
vehicles. In other words, that person can take knowledge from one car, i.e.,
one domain, and apply it to multiple different vehicles.
Chapter 2
Background
neurons within the human brain send information signals to each other.
There are several types of neural networks, and some are better suited to
specific kinds of AI tasks.
function L. The result of L(y − ŷ) is then fed back through the network, where
an optimizer adjusts the weights and biases so that the error is minimized.
This is called backpropagation.
Forward propagation and backpropagation are performed with all the training
data until the network learns to predict correctly.
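This training loop can be sketched in a few lines of code. PyTorch is assumed here purely for illustration; the network, loss function, and optimizer below are placeholders and not taken from the thesis:

```python
import torch
import torch.nn as nn

# Hypothetical network, loss function and optimizer; any differentiable
# model, criterion, and torch.optim optimizer could be substituted.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()            # plays the role of L(y, y_hat)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_one_epoch(data_loader):
    for x, y in data_loader:                 # iterate over the training data
        y_hat = model(x)                     # forward propagation
        loss = criterion(y_hat, y)           # compute the error
        optimizer.zero_grad()
        loss.backward()                      # backpropagation of the error
        optimizer.step()                     # adjust weights and biases
```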
2.2 Transfer Learning
There are several other approaches within the three branches of machine
learning. Transfer learning is one of these. Transfer learning takes
inspiration from how humans transfer knowledge from one domain to
another. Consider an example of two people learning to play the piano.
One of them has an extensive background as a guitarist, while the second
person has no musical background. The guitarist will likely have fewer
problems playing the piano than a beginner because the guitarist has
experience from a similar domain [19]. Transfer learning uses a similar
approach to learning. Firstly, a machine is trained using one type of data
and learns to solve tasks related to that data. The machine’s knowledge
is then transferred and used to solve different tasks. In some instances,
additional training might be required if the previously obtained knowledge
does not correspond well with the new task. This training often consists of
adjusting the final layers of the artificial neural network instead of training
the entire network. Consequently, transfer learning tends to be an efficient
way of performing machine learning since both time and computation cost
are reduced [29].
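A minimal sketch of this procedure, assuming PyTorch and its torchvision ResNet-18 (the exact framework and calls are illustrative, not prescribed by the text), might look as follows:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network that has already been trained on a source domain
# (here: the ImageNet weights shipped with torchvision).
model = models.resnet18(pretrained=True)

# Freeze the pretrained layers so they are not adjusted further.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer to match the new task,
# e.g. a hypothetical three-class problem on the target data.
model.fc = nn.Linear(model.fc.in_features, 3)

# Only the new final layer is trained on the new data, which is what
# keeps both time and computation cost low.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

Freezing all but the final layer corresponds to the strategy of adjusting only the last layers described above; unfreezing more layers trades additional computation for flexibility.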
Another motivation for using transfer learning is the problem of
acquiring vast amounts of labeled data. Since sufficient and relevant data
can be hard to acquire, using previous data can teach machines to solve
new tasks without requiring large amounts of new data. Oquab et al.
[18] show that by using large-scale datasets, such as ImageNet [6], image
representations are transferable to new image classification tasks without
the need for vast amounts of training data. Transfer learning from models
trained on comprehensive datasets has also proven effective when classifying
data that is unrelated to the original training data. Morid et al. [16] achieve
adequate performance by using a CNN pretrained on ImageNet to classify
medical data.
Transfer learning has also proven successful in other fields, such as
natural language processing (NLP) [21], recommendation systems (RS) [15],
and autonomous driving [24], and research on cross-domain knowledge transfer
through sparsely connected neural networks has shown promising results [26].
Weiss et al. conjecture that as data collection through smartphones,
computers, online tracking, and other sources continues to grow, so does the
potential for transfer learning to limit the use of computing resources [29].
intelligently.
Booch et al. suggest translating Kahneman’s theories of human-like
thinking behavior into the AI environment and conjecture that this will
possibly lead to some of the same capabilities in machines [4]. Their paper,
Thinking Fast and Slow in AI, tries to identify similar traits between the
systems Kahneman writes about and the two main lines of work in AI:
machine learning (data-driven) and symbolic logic reasoning (knowledge-
driven).
also write that the added anchors can be removed immediately if all parts
are restored. They compare this to how humans compensate when certain
parts of the human body are defective. For example, if a human has a
defective knee, the human uses anchors to learn how to limp and keep
moving. Once the knee is in good physical condition again, the adjustments
can be removed without relearning how to walk.
Chapter 3
Figure 3.1: How transfer learning is performed. (1) A residual neural network
is trained on the initial data; (2) transfer learning is applied by reusing the
already-trained network; (3) the fully connected layer is then trained using
new data, and the model predicts based on the new data.
other areas of machine learning might also show good results of knowledge
transfer.
A diagram explaining how transfer learning is applied is shown in
figure 3.1.
Figure 3.3: Eight samples from the 3D dataset
By implementing a system that identifies tasks, such as TIDE [17]
or PINE [27], and actively using transfer learning to learn from similar
knowledge when a task is unsolvable, achieving lifelong learning machines
might be possible. Machines that can learn by themselves will thus
see a problem, identify their lack of knowledge, and search for similar
knowledge in order to solve the task, as illustrated in figure 3.4.
The illustrated system uses a library containing all previous experiences.
When a new task is observed, it starts by identifying and labeling
the observation in order to determine whether the task is solvable based on
knowledge stored within the library. The task is completed if a matching
label corresponding to an experience is found within the library. If a
corresponding label is not found within the library, the system proceeds to
generalize the observation and look for similar knowledge. When similar
knowledge is found, the system performs transfer learning using the similar
knowledge and the observed task. The new knowledge is then labeled
and stored for future use, and the task is completed.
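The control flow of the illustrated system can be summarised in a short, hypothetical sketch; the helper functions and data structures below are placeholders standing in for real components, not an existing implementation:

```python
def identify(observation):
    # Placeholder: a real system could use a task-identification module
    # such as TIDE [17] or PINE [27] to label the observation.
    return observation["label"]

def generalize(observation):
    # Placeholder: reduce the observation to a coarser description.
    return observation["category"]

def transfer_learn(source_knowledge, observation):
    # Placeholder standing in for an actual transfer learning step.
    return f"knowledge adapted from '{source_knowledge}' to '{observation['label']}'"

def handle_observation(observation, library):
    # The library maps a task label to a (category, knowledge) pair.
    label = identify(observation)
    if label in library:                              # matching experience found
        return library[label][1]                      # solve the task directly

    category = generalize(observation)                # look for similar knowledge
    similar = next(know for cat, know in library.values() if cat == category)

    new_knowledge = transfer_learn(similar, observation)
    library[label] = (category, new_knowledge)        # store for future use
    return new_knowledge

# Example: a library that already distinguishes crocodiles is reused for alligators.
library = {"crocodile": ("reptile", "crocodile classifier")}
print(handle_observation({"label": "alligator", "category": "reptile"}, library))
```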
Chapter 4
4.1 Motivation
As described in section 2.2, transfer learning allows for less training data
and has proven effective in terms of time and computing resources. The
motivation for these experiments is to demonstrate a machine’s learning
capabilities when using transfer learning. The experiments are performed
using different compositions of the datasets described in section 3.3 to give
an overview of how transfer learning works in different environments.
The experiments shown demonstrate to what degree previous knowledge
is transferable, with an overall motivation to apply this to achieve
self-learning machines that resemble Kahneman’s description of Systems 1
and 2. The hope is that after training the machine’s equivalent of System 2
to solve a task and then introducing the machine to a similar but unknown
task, the machine will be able to generalize and transfer the prior knowledge
to solve the new task.
4.2 Description
The emulation of knowledge transfer will be demonstrated through five
different experiments. The performance of each experiment is measured by
the accuracy and loss during 12 epochs of training and testing.
Experiment | Dataset for initial training | Dataset for transfer learning
1          | 2D dataset                   | 3D dataset
2          | Untrained                    | 3D dataset
3          | 2D dataset                   | 3D dataset (incl. cylinders)
4          | 2D dataset                   | Animal dataset
5          | ImageNet                     | Animal dataset
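Each experiment follows the same measurement procedure. A possible sketch of such a harness is shown below, assuming PyTorch; the 12 epochs and the ten independent runs come from the surrounding text, while the function and parameter names, the model factory, and the data loaders are illustrative placeholders:

```python
import torch
import torch.nn as nn

def run_experiment(make_model, train_loader, test_loader,
                   epochs=12, runs=10, device="cpu"):
    """Train `runs` independent models; record test accuracy and loss per epoch."""
    criterion = nn.CrossEntropyLoss()
    history = []                                    # one (accuracy, loss) curve per run

    for _ in range(runs):
        model = make_model().to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        curve = []

        for _ in range(epochs):
            model.train()
            for x, y in train_loader:               # one pass over the training data
                x, y = x.to(device), y.to(device)
                loss = criterion(model(x), y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            # Evaluate accuracy and mean loss on the test data after each epoch.
            model.eval()
            correct, total, loss_sum = 0, 0, 0.0
            with torch.no_grad():
                for x, y in test_loader:
                    x, y = x.to(device), y.to(device)
                    out = model(x)
                    loss_sum += criterion(out, y).item() * y.size(0)
                    correct += (out.argmax(dim=1) == y).sum().item()
                    total += y.size(0)
            curve.append((correct / total, loss_sum / total))

        history.append(curve)
    return history
```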
Figure 4.1: Accuracy and loss measurements from training the ResNet-18
model.
Top 3 runs         | Accuracy | Top 3 runs         | Loss
1.                 | 0.999    | 1.                 | 0.003
2.                 | 0.999    | 2.                 | 0.006
3.                 | 0.998    | 3.                 | 0.006
Mean               | 0.996    | Mean               | 0.014
Standard deviation | 0.003    | Standard deviation | 0.01
4.3.1 Results
The results of training the ResNet-18 model on the 2D dataset are shown in
table 4.3 and figure 4.1.
Figure 4.2: Experiment 1: Classification accuracy for 3D objects after training
on similar objects. The figure shows ten distinct runs of the model.
Top 3 runs         | Accuracy | Top 3 runs         | Loss
1.                 | 0.984    | 1.                 | 0.057
2.                 | 0.983    | 2.                 | 0.058
3.                 | 0.983    | 3.                 | 0.059
Mean               | 0.982    | Mean               | 0.059
Standard deviation | 0.001    | Standard deviation | 0.002
4.4.1 Results
The results of classifying matching 3D objects can be seen in figure 4.2 and
table 4.3.
Figure 4.3: Experiment 2: Classification accuracy for 3D objects with no initial
training. The figure shows ten distinct runs of the model.
Top 3 runs         | Accuracy | Top 3 runs         | Loss
1.                 | 0.731    | 1.                 | 0.669
2.                 | 0.705    | 2.                 | 0.709
3.                 | 0.692    | 3.                 | 0.742
Mean               | 0.673    | Mean               | 0.777
Standard deviation | 0.031    | Standard deviation | 0.060
4.5.1 Results
The results of classifying 3D data without initial training can be seen in
figure 4.3 and table 4.4.
In figure 4.3 the displayed graph shows erratic behavior due to the
model classifying incorrectly. The figure also shows that using transfer
learning with no initial training leads to inconsistent classifications. The
results in table 4.4 show relatively low accuracy and high loss even for the
three best runs. The standard deviation is also high for both measurements,
confirming that the model is inconsistent across the ten iterations. The
inconsistent behavior stems from the model never settling on a reliable way
to classify the 3D data. Since the model’s weights have no initial training,
the model has no prior indication of how to classify the data and is therefore
guessing aimlessly.
For humans to learn new tasks based on previous tasks, the previous
tasks need to be somewhat similar. This experiment shows that the model
learns in a similar way: because the model has not been trained to classify
anything initially, it struggles to learn the new task.
4.6.1 Results
The results of classifying the entire 3D dataset are shown in figure 4.4 and
table 4.5.
Figure 4.4: Experiment 3: Classification accuracy when "confusing" objects
are added. The figure shows ten distinct runs of the model.
4.7.1 Results
The model results of classifying images of animals after being trained using
the 2D dataset can be seen in figure 4.5 and 4.6.
Figure 4.5: Experiment 4: Classification accuracy for images of animals with
initial training on 2D data. The figure shows ten distinct runs of the model.
Top 3 runs         | Accuracy | Top 3 runs         | Loss
1.                 | 0.445    | 1.                 | 1.048
2.                 | 0.443    | 2.                 | 1.049
3.                 | 0.442    | 3.                 | 1.052
Mean               | 0.432    | Mean               | 1.054
Standard deviation | 0.009    | Standard deviation | 0.004
Figure 4.6: Experiment 5: Classification accuracy when the model is initially
trained on a large-scale dataset. The figure shows ten distinct runs of the
model. The model is trained with batch size = 12.
4.8.1 Results
The results of experiment 5 can be seen in figures 4.6 and 4.7 as well as table
4.7.
Figure 4.6 indicates that the model manages to achieve high accuracy
and low loss. However, all ten iterations of running the model seem to
result in fluctuating behavior during all 12 epochs. This is due to the batch
size being twelve, which is low. The results of increasing the batch size can
be seen in figure 4.7.
After running the model for 12 epochs, all iterations achieve accuracy
between 98% and 99% and loss between 0.05 and 0.06.
Top 3 runs         | Accuracy | Top 3 runs         | Loss
1.                 | 0.988    | 1.                 | 0.033
2.                 | 0.987    | 2.                 | 0.038
3.                 | 0.987    | 3.                 | 0.039
Mean               | 0.984    | Mean               | 0.044
Standard deviation | 0.004    | Standard deviation | 0.008
4.9 Summary
The results show high accuracy when applying transfer learning to image
classification, while also indicating limits on how similar the data used for
training needs to be. Transferring similar knowledge proves successful,
and the model learns to distinguish between classes even when the classes
within the data can be confusing. The results also show that a model
performs significantly better when trained on any initial data, whereas no
initial training leads to inaccurate and inconsistent results. Training on
large-scale datasets proves to be successful and results in high accuracy and
low loss.
Chapter 5
Conclusion
5.1 Conclusion
In this thesis, a machine’s ability to transfer knowledge has been tested
using image classification. Other research addressing Kahneman’s dual-system
theory, as well as research on lifelong learning machines and task
identification, has been discussed to give a broader view of the subject and
its benefits in achieving artificial general intelligence. The use of transfer
learning is proposed as a way to achieve System 2 abilities within machine
learning, and it has been argued that learning from similar knowledge is a
viable path toward lifelong learning machines. Using comprehensive
large-scale datasets has also been proposed as a possible way for machines
to learn to solve multiple tasks from different domains.
Evaluating the transfer learning system shows that the system performs
well in learning from similar knowledge, and the system’s dependency on
previous, related knowledge has also been demonstrated. The system has
also been shown to classify accurately when presented with "confusing" objects.
5.2.2 Cross-domain knowledge transfer
The experiments in this thesis show that transfer learning proves successful
in image classification. However, to achieve artificial general intelligence,
a machine needs to be able to transfer knowledge within other domains
and between domains. As mentioned in section 2.2, transfer learning in
other domains has proven to be successful, but research in cross-domain
knowledge transfer is still limited.
Using a generalized way to represent all types of data across domains,
or using sparsely connected neural networks as Swarup et al. [26] do,
could be a way to achieve transfer of knowledge across domains.
Bibliography
[12] Ronald Kemker et al. ‘Measuring catastrophic forgetting in neural
networks’. In: Proceedings of the AAAI Conference on Artificial Intelligence.
Vol. 32. 1. 2018.
[13] Diederik P Kingma and Jimmy Ba. ‘Adam: A method for stochastic
optimization’. In: arXiv preprint arXiv:1412.6980 (2014).
[14] Warren S McCulloch and Walter Pitts. ‘A logical calculus of the
ideas immanent in nervous activity’. In: The bulletin of mathematical
biophysics 5.4 (1943), pp. 115–133.
[15] Orly Moreno et al. ‘Talmud: transfer learning for multiple domains’.
In: Proceedings of the 21st ACM international conference on Information
and knowledge management. 2012, pp. 425–434.
[16] Mohammad Amin Morid, Alireza Borjali and Guilherme Del Fiol.
‘A scoping review of transfer learning research on medical image
analysis using ImageNet’. In: Computers in biology and medicine 128
(2021), p. 104115.
[17] Peter Norstein. ‘Thinking fast and slow in intelligent systems’. In:
(2020). URL: http://hdl.handle.net/10852/79608.
[18] Maxime Oquab et al. ‘Learning and transferring mid-level image
representations using convolutional neural networks’. In: Proceedings
of the IEEE conference on computer vision and pattern recognition. 2014,
pp. 1717–1724.
[19] Sinno Jialin Pan and Qiang Yang. ‘A survey on transfer learning’.
In: IEEE Transactions on knowledge and data engineering 22.10 (2009),
pp. 1345–1359.
[20] German I Parisi et al. ‘Continual lifelong learning with neural
networks: A review’. In: Neural Networks 113 (2019), pp. 54–71.
[21] Colin Raffel et al. ‘Exploring the limits of transfer learning with a
unified text-to-text transformer’. In: arXiv preprint arXiv:1910.10683
(2019).
[22] Fiona M Richardson and Michael SC Thomas. ‘Critical periods
and catastrophic interference effects in the development of self-
organizing feature maps’. In: Developmental science 11.3 (2008),
pp. 371–389.
[23] Frank Rosenblatt. ‘The perceptron: a probabilistic model for information
storage and organization in the brain.’ In: Psychological review
65.6 (1958), p. 386.
[24] Suvash Sharma et al. ‘Semantic segmentation with transfer learning
for off-road autonomous driving’. In: Sensors 19.11 (2019), p. 2577.
[25] Richard S Sutton and Andrew G Barto. Reinforcement learning: An
introduction. MIT press, 2018.
[26] Samarth Swarup and Sylvian R Ray. ‘Cross-domain knowledge
transfer using structured representations’. In: AAAI. Vol. 6. 2006,
pp. 506–511.
[27] Håkon Tønnessen. ‘Thinking Fast and Slow: PINE: Planning and
Identifying Neural Network’. In: (2021). URL: http://hdl.handle.net/10852/90312.
[28] Lisa Torrey and Jude Shavlik. ‘Transfer learning’. In: Handbook of
research on machine learning applications and trends: algorithms, methods,
and techniques. IGI global, 2010, pp. 242–264.
[29] Karl Weiss, Taghi M Khoshgoftaar and DingDing Wang. ‘A survey of
transfer learning’. In: Journal of Big data 3.1 (2016), pp. 1–40.
[30] Zhilu Zhang and Mert Sabuncu. ‘Generalized cross entropy loss for
training deep neural networks with noisy labels’. In: Advances in
neural information processing systems 31 (2018).