Artificial Intelligence and Brain Research
Neural Networks, Deep Learning and the Future of Cognition

Patrick Krauss
University of Erlangen-Nuremberg
Erlangen, Germany
Translation from the German language edition: “Künstliche Intelligenz und Hirnforschung” by Patrick Krauss, © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2023. Published by Springer Berlin Heidelberg. All Rights Reserved.
This book is a translation of the original German edition “Künstliche Intelligenz und Hirnforschung” by
Patrick Krauss, published by Springer-Verlag GmbH, DE in 2023. The translation was done with the help of
an artificial intelligence machine translation tool. A subsequent human revision was done primarily in terms
of content, so that the book will read stylistically differently from a conventional translation. Springer Nature
works continuously to further the development of tools for the production of books and on the related tech-
nologies to support the authors.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer-Verlag GmbH, DE,
part of Springer Nature 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the
whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustra-
tions, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar meth-
odology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer-Verlag GmbH, DE, part of Springer
Nature.
The registered company address is: Heidelberger Platz 3, 14197 Berlin, Germany
How does Artificial Intelligence work? How does the brain function? What
are the similarities between natural and artificial intelligence, and what are
the differences? Is the brain a computer? What are neural networks? What is
Deep Learning? Should we attempt to recreate the brain to create real gen-
eral Artificial Intelligence, and if so, how should we best proceed?
We are in an extremely exciting phase of cultural and technological devel-
opment of humanity. Recently, Artificial Intelligence (AI) and Machine
Learning have been making their way into more and more areas, such as
medicine, science, education, finance, engineering, entertainment, and even
art and music, and are becoming ubiquitous in twenty-first-century life.
Particularly in the field of so-called Deep Learning, the progress is extraor-
dinary in every respect, and deep artificial neural networks show impressive
performance in a variety of applications such as processing, recognition, and
generation of images or natural language. Especially in combination with a
method called Reinforcement Learning, the networks are becoming increas-
ingly powerful, for example when it comes to playing video games, or they
even achieve superhuman abilities in complex board games like Go, when
they are trained by playing millions of games against themselves.
Many of the algorithms, design principles, and concepts used in AI today,
such as neural networks or the aforementioned reinforcement learning, have
their origins in biology and psychology. Therefore, neuroscience lectures
are becoming an integral part of courses such as computer science or artifi-
cial intelligence at more and more universities. But it is also worthwhile for
brain researchers to engage with artificial intelligence, as it not only provides
important tools for data evaluation, but also serves as a model for natural intelligence.
time to delve into the details. If you want to get an overview of how the
brain works, then start with Part I. However, if you are more interested in
getting an overview of the state of research in Artificial Intelligence, then
I recommend you start with Part II. The open questions and challenges of
both disciplines are presented in Part III. If you are already familiar with the
basics and open questions of AI and brain research and are primarily inter-
ested in the integration of both research branches, then read Part IV.
I have tried to clarify complex issues through illustrative diagrams wher-
ever possible. My children have actively supported me in creating these dia-
grams. English quotes have been translated by me, unless otherwise noted.
Colleagues, friends, and relatives have greatly helped in correcting errors
and improving the clarity and readability of the text. I would like to thank
Konstantin Tziridis, Claus Metzner, Holger Schulze, Nathaniel Melling,
Tobias Olschewski, Peter Krauß, and Katrin Krauß for this.
My special thanks go to Sarah Koch, Ramkumar Padmanaban, and Ken
Kissinger from Springer Publishing, who have supported me in the realiza-
tion of this book project.
My research work has been and continues to be supported by the German
Research Foundation. I am grateful to those in charge. Without the inspir-
ing working atmosphere at the Friedrich-Alexander University Erlangen-
Nuremberg and the University Hospital Erlangen, many of my ideas and
research projects would not have been possible. My special thanks go to
Holger Schulze, Andreas Maier, and Thomas Herbst for their support, as
well as Claus Metzner and Achim Schilling for the countless inspiring con-
versations. I sincerely thank my father for the many discussions on the var-
ious topics of this book. My greatest thanks go to my wife, who has always
supported everything over the years and continues to do so. What I owe her,
I cannot put into words. I dedicate this book to my children.
Contents

1 Introduction

Part IV Integration
20 AI as a Tool in Brain Research
21 AI as a Model for the Brain
22 Understanding AI Better with Brain Research
23 The Brain as a Template for AI
24 Outlook

Glossary
1 Introduction
Passing the Turing Test has long been considered the Holy Grail of research in the field of Artificial
Intelligence. Even though passing the Turing Test does not necessarily mean
that ChatGPT actually thinks, you should still remember November 30,
2022 well. It not only represents the most important milestone in the his-
tory of Artificial Intelligence to date, but its significance is certainly com-
parable to the invention of the loom, the steam engine, the automobile, the
telephone, the internet, and the smartphone, which often only turned out to
be game-changers and decisive turning points in development in retrospect.
Freud’s discovery of the unconscious shook man’s confidence in his ability for self-control and rationality. The
Libet experiments, which even question the existence of free will, further
intensified the impact of this affront.
AI can be considered as a newly added fourth major affront to human
self-understanding. Until now, our highly developed language was consid-
ered the decisive distinguishing feature between humans and other spe-
cies. However, the development of large language models like ChatGPT
has shown that machines are in principle capable of dealing with natural
language in a similar way to humans. This fact challenges our concept of
uniqueness and incomparability as a species again and forces us to at least
partially rethink our definition of being human.
This “AI affront” affects not only our linguistic abilities, but our cognitive
abilities in general. AI systems are already capable of solving complex prob-
lems, recognizing patterns, and achieving human-like or even superhuman
performance in certain areas (Mnih et al., 2015; Silver et al., 2016, 2017a,
b; Schrittwieser et al., 2020; Perolat et al., 2022). This forces us to reinter-
pret human intelligence and creativity, where we have to ask ourselves what
role humans play in a world where machines can take over many of our pre-
vious tasks. It also forces us to think about the ethical, social, and philo-
sophical questions that arise from the introduction of AI into our lives. For
example, the question arises as to how we should deal with the responsibility
for decisions made by AI systems, and what limits we should set on the use
of AI to ensure that it serves the good of humanity (Anderson & Anderson,
2011; Goodall, 2014; Vinuesa et al., 2020).
Less than half a year after the release of ChatGPT, its successor GPT-4 followed in March 2023, significantly surpassing the performance of its predecessor. This prompted some of the most influential thinkers in the field to call, in a widely noticed open letter,1 for a temporary pause in the development of AI systems more powerful than GPT-4, in order to prevent a potentially impending loss of control.
1 https://futureoflife.org/open-letter/pause-giant-ai-experiments/
potential to revolutionize it. Indeed, AI and brain research have always been
closely intertwined in their history. The so-called cognitive revolution in
the middle of the last century can also be seen as the birth of research in
the field of AI, where it developed as an integral part of the newly emerged
research agenda of cognitive sciences as an independent discipline. In fact,
AI research was never just about developing systems to take over tedious
work. From the beginning, it was also about developing and testing theories
of natural intelligence. As we will see, some astonishing parallels between
AI systems and brains have been uncovered recently. Therefore, AI plays an
increasingly important role in brain research, not only as a pure tool for data
analysis, but especially as a model for the function of the brain.
Conversely, neuroscience has also played a key role in the history of arti-
ficial intelligence and has repeatedly inspired the development of new AI
methods. The transfer of design and processing principles from biology to
computer science has the potential to provide new solutions for current
challenges in the field of AI. Here too, brain research not only plays the role
of providing the brain as a model for new AI systems. Rather, a variety of
methods for deciphering the representation and calculation principles of
natural intelligence have been developed in neuroscience, which can now
in turn be used as a tool for understanding artificial intelligence and thus
contribute to solving the so-called black box problem, an endeavor occasionally referred to as Neuroscience 2.0. It is becoming apparent that both
disciplines will increasingly merge in the future (Marblestone et al., 2016;
Kriegeskorte & Douglas, 2018; Rahwan et al., 2019; Zador et al., 2023).
Once upon a time, there were six blind scientists who had never seen an ele-
phant and wanted to research what an elephant is and what it looks like. Each
examined a different part of the body and accordingly came to a different
conclusion.
The first of the blind scientists approached the elephant and touched its side. “Ah, an elephant is like a wall,” she said.
The second touched the elephant’s tusk and exclaimed: “No, an elephant is like a spear!”
The third touched the elephant’s trunk and said: “You are both wrong! An elephant is like a snake!”
The fourth touched a leg of the elephant and said: “You are all wrong. An elephant is like a tree trunk.”
The fifth touched the elephant’s ear and said: “None of you know what you’re talking about. An elephant is like a fan.”
Finally, the sixth approached the elephant and touched its tail: “You are all wrong,” he said. “An elephant is like a rope.”
If the six scientists had combined their findings, they would have come
much closer to the true nature of the elephant. In this story, the elephant
represents the human mind, and the six blind people represent the various
scientific disciplines that try to understand its functioning from different
perspectives (Fig. 1.1). The punchline of the story is that while each individual’s perspective is valuable, a comprehensive understanding of cognition can only be achieved when the different sciences work together and exchange ideas.
Fig. 1.1 The Blind Men and the Elephant. Each of the blind scientists examines a different part of the body and accordingly comes to a different conclusion. The elephant represents the mind and brain, and the six blind scientists represent different sciences. The perspective of each individual discipline is valuable, but a comprehensive understanding can only be achieved through collaboration and interdisciplinary exchange
This is the founding idea of cognitive science, which began in the 1950s
as an intellectual movement referred to as the cognitive revolution (Sperry,
1993; Miller, 2003). During this time, there were significant changes in
the way psychologists and linguists worked and new disciplines such as
computer science and neuroscience emerged. The cognitive revolution was
driven by a number of factors, including the rapid development of personal
computers and new imaging techniques for brain research. These technolog-
ical advances allowed researchers to better understand how the brain works
and how information is processed, stored, and retrieved. As a result of these
developments, an interdisciplinary field emerged in the 1960s that brought
together researchers from a wide range of disciplines. This field went by vari-
ous names, including information processing psychology, cognition research,
and indeed cognitive science.
The cognitive revolution marked a significant turning point in the his-
tory of psychology and related disciplines. It fundamentally changed the way
researchers approach questions of human cognition and behavior, paving the
way for numerous breakthroughs in areas such as artificial intelligence, cog-
nitive psychology, and neuroscience.
Today, cognitive science is understood as an interdisciplinary scientific
endeavor to explore the different aspects of cognition. These include lan-
guage, perception, memory, attention, logical thinking, intelligence, behav-
ior and emotions. The focus is primarily on the way natural or artificial
systems represent, process, and transform information (Bermúdez, 2014;
Friedenberg et al., 2021).
The key questions are: How does the human mind work? How does cog-
nition work? How is cognition implemented in the brain? And how can cog-
nition be implemented in machines?
Thus, cognitive science addresses some of the most difficult scientific
problems, as the brain is incredibly difficult to observe, measure, and manip-
ulate. Many scientists even consider the brain to be the most complex sys-
tem in the known universe.
The disciplines involved in cognitive science today include linguistics,
psychology, philosophy, computer science, artificial intelligence, neuro-
science, biology, anthropology, and physics (Bermúdez, 2014). For a time,
cognitive science fell somewhat out of fashion; in particular, the idea of integrative collaboration between different disciplines was somewhat forgotten.
Specifically, AI and neuroscience developed independently and thus also
away from each other. Fortunately, the idea that AI and brain research are
complementary and can benefit greatly from each other is currently expe-
riencing a real renaissance, with the term “cognitive science” apparently
being interpreted differently in some communities today or considered too
old-fashioned, which is why terms like Cognitive Computational Neuroscience
(Kriegeskorte & Douglas, 2018) or NeuroAI (Zador et al., 2023) have been
suggested instead.
The legacy of the cognitive revolution is evident in the many innovative
and interdisciplinary approaches that continue to shape our understanding
of the human mind and its functioning. Whether through state-of-the-art
brain imaging techniques, sophisticated computer models, or new theoreti-
cal frameworks—researchers are constantly pushing the boundaries of what
we know about the human brain and its complex processes.
Brain-Computer Analogy
Many researchers believe that computer models of the mind can help us
understand how the brain processes information, and that they can lead to
the development of more intelligent machines. This assumption is based
on the brain-computer analogy (Von Neumann & Kurzweil, 2012). It
is assumed that mental processes such as perception, memory, and logical
thinking involve the manipulation of mental representations that correspond
to the symbols and data structures used in computer programs (Fig. 1.2).
Like a computer, the brain is capable of receiving, storing, processing, and
outputting information.2
However, this analogy does not mean that the brain is actually a com-
puter, but that it performs similar functions. By considering the brain as a
computer, one can abstract from biological details and focus on the way it
processes information to develop mathematical models for learning, mem-
ory, and other cognitive functions.
The brain-computer analogy is based on two central assumptions that
underlie cognitive science. These are computationalism and functionalism.
Fig. 1.2 Brain-Computer Analogy. Information processing includes the input, rep-
resentation, transformation, and output of information. For a computer, the input
may come from the keyboard, for a biological organism from the sensory organs.
This input must then be represented: by storing it on a hard drive or in the comput-
er’s RAM, or in the brain as momentary neuronal activity in short-term memory or
in long-term memory in the interconnection of neurons. Then a transformation or
processing takes place, i.e., mental processes or algorithms must act on the stored
information and change it to generate new information. For a computer, this could
be text processing, for humans, for example, logical reasoning. Finally, the result of
information processing is output. The output can be, for example, via a printer for
a computer. In living beings, the output corresponds to observable behavior or, as a
special case of behavior, to human linguistic utterances
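The scheme described in Fig. 1.2 can be condensed into a minimal sketch; the following Python fragment is purely illustrative and not part of the original text, and its function names are arbitrary placeholders.

```python
# Illustrative sketch of the input -> representation -> transformation -> output
# scheme from Fig. 1.2; the function names are arbitrary placeholders.

def represent(raw_input: str) -> list:
    # "Storage": turn the raw input into an internal representation,
    # here simply a list of words.
    return raw_input.split()

def transform(representation: list) -> str:
    # "Processing": act on the stored representation to generate new information,
    # here a trivial word count.
    return f"{len(representation)} words received"

def output(result: str) -> None:
    # "Behavior": make the result observable, here by printing it.
    print(result)

# The input could come from a keyboard (computer) or from a sensory organ (organism).
output(transform(represent("the brain is an information processing system")))
```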
Computationalism
In computationalism, it is assumed that cognition is synonymous with
information processing, i.e., that mental processes can be understood as
calculations and that the brain is essentially an information processing sys-
tem (Dietrich, 1994; Shapiro, 1995; Piccinini, 2004, 2009). Like any such
system, the brain must therefore represent information and then transform
this represented information, i.e., there must be mental representations of
information and there must be mental processes that can act on these rep-
resentations and manipulate them. Computationalism has greatly influenced
the way cognitive scientists and researchers in the field of artificial intelli-
gence think about intelligence and cognition.
However, there is also criticism of this view, as evidenced by numerous
ongoing debates in philosophy and cognitive science. Some critics argue, for
example, that the computer model of the mind is too simple and cannot
fully capture the complexity and richness of human cognition. Others argue
that it is unclear whether mental processes can really be understood as calcu-
lations or whether they fundamentally differ from the way processes occur in
computers.
Functionalism
Is cognition only possible in a (human) brain? Functionalism clearly answers
this question with a no. Accordingly, mental states and processes are defined
exclusively by their functions or their relationship to behavior, not by their
physical or biochemical properties (Shoemaker, 1981; Jackson & Pettit,
1988; Piccinini, 2004). What does this mean in concrete terms?
Please imagine a car in your mind’s eye. And now remember the last sit-
uation in which you ate chocolate, and try to recall the taste as accurately
as possible. Did you succeed? I assume you did. As I write these lines, I
have brought to mind the same two mental states “seeing a car” and “tast-
ing chocolate”. Obviously, each of us can activate the corresponding mental
representations in our brains, even though you, I, and every other reader of
these lines have completely different brains. All human brains are of course
similar in their basic structure. But they are certainly not identical down to
the smallest detail, especially not in the exact wiring of the neurons, if only
because every person has had completely different, individual experiences in
their life, which affect the wiring pattern of the brain. In computer science
terminology, one would say that each person has a different, individual hard-
ware. Yet we can all bring to mind the same mental state.
While in the previous example the systems were somehow very similar—
they were always human brains—the following example may illustrate how
much the different physical implementations of the same algorithm can dif-
fer from each other. Consider the addition of two numbers. The representa-
tion of these numbers, as well as the associated process or algorithm to add
them, can be implemented in your brain when you “calculate in your head”,
or, for example, on a laptop with a spreadsheet program, a slide rule, a pocket calculator, or a calculator app on your smartphone. Each time, the same num-
bers are represented and added, while the information processing systems are
completely different. This is the concept of multiple realizability.
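The point can be illustrated with a deliberately artificial sketch (not from the original text): the same addition, realized by two entirely different mechanisms.

```python
# Illustrative sketch of multiple realizability: the "same" addition,
# realized by two entirely different mechanisms.

def add_arithmetic(a: int, b: int) -> int:
    # Realization 1: the processor's built-in integer arithmetic.
    return a + b

def add_by_counting(a: int, b: int) -> int:
    # Realization 2: a deliberately different mechanism that merely
    # counts upwards b times, one step at a time.
    result = a
    for _ in range(b):
        result = result + 1
    return result

# Both realizations represent and add the same numbers, although the
# underlying "machinery" is completely different.
assert add_arithmetic(17, 25) == add_by_counting(17, 25) == 42
```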
Accordingly, the same mental state or process can in principle be real-
ized by completely different natural or artificial systems. Put simply, this
means that cognition and presumably also consciousness can in principle
be implemented in any physical system capable of supporting the required
calculations. If many different human brains are already capable of this, why
should this ability be limited to humans or biological systems? From the per-
spective of functionalism, it is therefore quite possible that the ability for
human-like cognition can also be implemented in correspondingly highly
developed machines or alien brains (Fig. 1.3).
Conclusion
In recent years, spectacular advances in artificial intelligence have turned
our understanding of cognition, intelligence, and consciousness upside
down and will have profound impacts on society and our understanding of
the human brain. Cognitive science is the key to a deeper understanding of
brain and mind, and computer models of the mind can help us understand
how the brain processes information and contribute to the development of
smarter machines. These models are based on the central assumptions of
computationalism and functionalism, which emphasize the equivalence of
cognition and information processing as well as the independence of cogni-
tive processes from their physical implementation.
The advances in artificial intelligence have also led to the fields of neu-
roscience and computer science becoming increasingly intertwined. The
transfer of construction and processing principles from biology to computer
science promises new solutions for current challenges in artificial intelli-
gence. Conversely, the close collaboration of these disciplines will become
increasingly important in the future to understand complex systems like the
human brain.
References
Anderson, M., & Anderson, S. L. (Eds.). (2011). Machine ethics. Cambridge
University Press.
Bermúdez, J. L. (2014). Cognitive science: An introduction to the science of the mind.
Cambridge University Press.
Dietrich, E. (1994). Computationalism. In Thinking computers and virtual persons
(pp. 109–136). Academic.
Friedenberg, J., Silverman, G., & Spivey, M. J. (2021). Cognitive science: An intro-
duction to the study of mind. Sage.
Goodall, N. J. (2014). Machine Ethics and Automated Vehicles. In G. Meyer & S.
Beiker (Eds.), Road vehicle automation. Lecture notes in mobility. Springer. https://doi.org/10.1007/978-3-319-05990-7_9.
Jackson, F., & Pettit, P. (1988). Functionalism and broad content. Mind, 97(387), 381–400.
Kriegeskorte, N., & Douglas, P. K. (2018). Cognitive computational neuroscience. Nature Neuroscience, 21(9), 1148–1160.
Marblestone, A. H., Wayne, G., & Kording, K. P. (2016). Toward an integration of deep learning and neuroscience. Frontiers in Computational Neuroscience, 10, 94.
Miller, G. A. (2003). The cognitive revolution: A historical perspective. Trends in Cognitive Sciences, 7(3), 141–144.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
Perolat, J., De Vylder, B., Hennes, D., Tarassov, E., Strub, F., de Boer, V., … & Tuyls, K. (2022). Mastering the game of Stratego with model-free multiagent reinforcement learning. Science, 378(6623), 990–996.
Piccinini, G. (2004). Functionalism, computationalism, and mental contents. Canadian Journal of Philosophy, 34(3), 375–410.
Piccinini, G. (2009). Computationalism in the philosophy of mind. Philosophy Compass, 4(3), 515–532.
Rahwan, I., Cebrian, M., Obradovich, N., et al. (2019). Machine behaviour. Nature, 568, 477–486.
Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., … & Silver, D. (2020). Mastering Atari, Go, Chess and Shogi by planning with a learned model. Nature, 588(7839), 604–609.
Shapiro, S. C. (1995). Computationalism. Minds and Machines, 5, 517–524.
Shoemaker, S. (1981). Some varieties of functionalism. Philosophical Topics, 12(1), 93–119.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., … & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., … & Hassabis, D. (2017a). Mastering the game of Go without human knowledge. Nature, 550(7676), 354–359.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., … & Hassabis, D. (2017b). Mastering Chess and Shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815.
Sperry, R. W. (1993). The impact and promise of the cognitive revolution. American Psychologist, 48(8), 878.
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460.
Vinuesa, R., Azizpour, H., Leite, I., Balaam, M., Dignum, V., Domisch, S., … & Fuso Nerini, F. (2020). The role of artificial intelligence in achieving the sustainable development goals. Nature Communications, 11(1), 233.
Von Neumann, J., & Kurzweil, R. (2012). The computer and the brain. Yale University Press.
Zador, A., Escola, S., Richards, B., Ölveczky, B., Bengio, Y., Boahen, K., … & Tsao, D. (2023). Catalyzing next-generation artificial intelligence through NeuroAI. Nature Communications, 14(1), 1597.
Part I
Brain Research
In the first part of the book, the aim is to familiarize you with the most
important aspects of the structure and function of the brain. In doing so,
a detailed and systematic description of many molecular biological, physio-
logical, and anatomical details is deliberately avoided. The presentation also
makes no claim to completeness. Interested readers may deepen their knowl-
edge with one of the many excellent textbooks available on psychology and
neuroscience. Rather, these first chapters are intended to convey the basics
necessary from the author’s point of view, on the basis of which we want
to show the numerous cross-connections to Artificial Intelligence in later
chapters.
2 The Most Complex System in the Universe
Conclusion
If you feel like the author, then you cannot really picture such orders of magnitude, as these numbers essentially approach infinity. While we could still write down the number of possible brains or connectomes as a number with a trillion zeros, the number of mental states can no longer be put into words, as we would already need exponential notation just for the number of digits of this number.
We can try to put these absurd orders of magnitude at least somewhat into perspective. Apart from infinity, what is the largest number that still has a concrete meaning in the natural sciences? It is 10⁸². That is roughly the number of atoms in the observable universe (Eddington, 1931). In other words: there are far fewer of the smallest particles in the largest structure we know than there are possible brains and mental states.
We can rightly claim that the human brain is probably the most com-
plex system in the universe—that we know of. Of course, it is by no means
excluded that there are far more complex systems somewhere in the vast-
ness of the universe, such as the natural or possibly also artificial information
processing systems of a highly developed species or artificial intelligence.
References
Eddington, A. S. (1931). Preliminary note on the masses of the electron, the pro-
ton, and the universe. In Mathematical Proceedings of the Cambridge Philosophical
Society (Vol. 27, No. 1, pp. 15–19). Cambridge University Press.
Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C. J., Van Wedeen,
J., et al. (2008). Mapping the structural core of human cerebral cortex. PLoS
Biology, 6(7), 1479–1493.
Herculano-Houzel, S. (2009). The human brain in numbers: A linearly scaled-up
primate brain. Frontiers in Human Neuroscience, 3, 31.
Kandel, E. R., Schwartz, J. H., Jessell, T. M., Siegelbaum, S., Hudspeth, A. J., &
Mack, S. (Eds.). (2000). Principles of neural science (Vol. 4, pp. 1227–1246).
McGraw-Hill.
Sporns, O., Tononi, G., & Kötter, R. (2005). The human connectome: A structural
description of the human brain. PLoS Computational Biology, 1(4), 0245–0251.
Pöppel, E. (1997). A hierarchical model of temporal perception. Trends in Cognitive
Sciences, 1(2), 56–61.
3 Building Blocks of the Nervous System
There is not the slightest reason to doubt that brains are nothing more than machines
with a huge number of parts, functioning in perfect accordance with the laws of physics.
Marvin Minsky
Fig. 3.1 Biological neuron. Characteristic of the neuron are two types of extensions
of the cell membrane. The dendrites receive signals from other neurons and pass
them on to the cell body. Action potentials are transmitted to other neurons via the
axon. In many neurons, the axon is surrounded by a myelin sheath, which acts like a
kind of insulation and increases the signal conduction speed
When an electrical signal, a so-called action potential, reaches the end of the
presynaptic neuron, it triggers the release of chemical messengers, so-called
neurotransmitters, into the synaptic gap. These neurotransmitters diffuse
through the synaptic gap and then bind to receptors on the postsynaptic
neuron or target cell. So, there is a chemical transmission. Depending on
the number of receptors and the amount of released neurotransmitters, the
synapses can be of varying strength, i.e., they can transmit a signal more
or less well and thus have a varying influence on the activity of the succes-
sor neuron. In addition, there are basically two different types of synapses,
which can either stimulate or inhibit the activity of the postsynaptic cell:
excitatory synapses, which favor the activation of the successor neuron,
and inhibitory synapses, which rather prevent the activation of the succes-
sor neuron. Formally, a real number can be assigned to each synapse, where
the absolute value corresponds to the signal transmission strength and the
sign corresponds to the type of synapse: positive for excitatory and negative
for inhibitory. This number is called synaptic weight or short weight, often
abbreviated as w. In the entirety of the synapses of the nervous system, more
precisely: in the resulting network and the resulting directed information
flow, all the information within a nervous system is stored, i.e., all knowl-
edge as well as memories, learned skills, innate behaviors and reflexes, and
even character traits (Kandel et al., 2000).
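As a purely illustrative sketch of how such signed weights can be formalized (the numbers below are arbitrary and not taken from the book), consider the following Python fragment:

```python
import numpy as np

# Illustrative sketch (values are arbitrary): each input is multiplied by a
# signed weight w (positive = excitatory, negative = inhibitory), the weighted
# signals are summed, and the neuron becomes active if the sum exceeds a threshold.

inputs = np.array([1.0, 0.0, 1.0, 1.0])     # activity of four presynaptic neurons
weights = np.array([0.8, 0.5, -0.7, 0.3])   # synaptic weights w (sign = synapse type)
threshold = 0.2

weighted_sum = np.dot(weights, inputs)      # weighting and summation of all inputs
active = weighted_sum > threshold           # simple all-or-nothing activation

print(weighted_sum, active)                 # approximately 0.4, True
```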
All signals arriving at the dendrites are transmitted at the respective syn-
apses and weighted with the value of the synapse, which mathematically
1 No rule without exception: Actually, the effect depends on the receptor at the postsynapse. In the
dopaminergic system, for example, the same transmitter can have opposite effects (Missale et al., 1998).
Neurons have also been discovered that, contrary to Dale’s Principle, release two different transmitters
at their synapses (Vaaga et al., 2014).
Neuroplasticity
Learning is synonymous with the experience-dependent change in the con-
nection structure of the neural networks in the brain, which is referred to
as neuroplasticity. Various types of changes can be distinguished (Kolb &
Whishaw, 1989, 1998; Kandel et al., 2000).
Pruning refers to the process of eliminating unused or weak synap-
tic connections between neurons in the brain. This thinning is an impor-
tant mechanism for the development and refinement of neuronal circuits
in the brain. During the development of the brain, neurons initially form synaptic connections on a large scale, creating a vast surplus of connections.
However, through the process of pruning, only the most frequently used
connections between neurons are reinforced and maintained, while unused
or weak connections are eliminated. This leads to neural networks with more
efficient and specific connections. Pruning does not only occur during the
development of the brain, but can also occur in adulthood. It is believed
that pruning can help to relieve the brain by removing no longer needed or
redundant synaptic connections to free up resources for important connec-
tions. Moreover, pruning plays a crucial role in imprinting, a special form of learning during the development of the brain (Kandel et al., 2000).
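Schematically, pruning can be thought of as follows; this is an illustrative sketch with arbitrary numbers, not a biological model:

```python
import numpy as np

# Illustrative sketch with arbitrary numbers: synapses whose weights stay below
# a strength threshold are removed, leaving a sparser, more specific connectivity.

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 1.0, size=(6, 6))    # dense initial connectivity
threshold = 0.5

pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

print("connections before pruning:", np.count_nonzero(weights))
print("connections after pruning: ", np.count_nonzero(pruned))
```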
Synaptic plasticity refers to the process of strengthening or weakening
the synaptic connection between neurons by enhancing or reducing activity
(Kandel et al., 2000). Various mechanisms are discussed that could under-
lie this type of plasticity. Hebb’s rule states that the synaptic connec-
tion between two neurons is strengthened when the presynaptic neuron is
active and simultaneously activates the postsynaptic neuron. In other words:
“Cells that fire together wire together” (Hebb, 2005). So, if a neuron repeat-
edly fires and another neuron fires at the same time, then the synapse and
thus the connection between the two neurons is strengthened. This is also
referred to as long-term potentiation. Conversely, synaptic connections are
weakened if the neurons are rarely or never active at the same time, which is
also referred to as long-term depression.
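Hebb's rule can be illustrated with a small, purely schematic sketch; the activity patterns and the learning rate below are arbitrary and not taken from the book:

```python
import numpy as np

# Schematic sketch of Hebb's rule: synapses between neurons that are active at
# the same time are strengthened ("cells that fire together wire together").
# A decay term for pairs that are rarely co-active would correspond to
# long-term depression; it is omitted here for simplicity.

rng = np.random.default_rng(1)
pre = rng.integers(0, 2, size=(100, 5))    # presynaptic activity (0/1) over 100 time steps
post = rng.integers(0, 2, size=(100, 3))   # postsynaptic activity (0/1)

eta = 0.01                                 # learning rate
w = np.zeros((5, 3))                       # weights from 5 pre- to 3 postsynaptic neurons

for x, y in zip(pre, post):
    w += eta * np.outer(x, y)              # co-activity strengthens the corresponding synapse

print(w)                                   # weights grow where pre and post were often active together
```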
The so-called Spike Timing Dependent Plasticity (STDP), on the other
hand, describes how the synaptic connection between neurons changes
due to the exact temporal sequence of their activity (Gerstner et al., 1996).
Accordingly, the synapse is only strengthened when the presynaptic neuron
is active shortly before the postsynaptic neuron, and weakened when the
presynaptic neuron is active shortly after the postsynaptic neuron. In a way,
STDP can be considered an extension of Hebb’s rule. The dependence on
Glia Cells
Not to be left unmentioned are the glia cells, which make up more than half
of the volume of neural tissue. Although this important type of cell cannot
generate action potentials itself, it still plays a significant role in the func-
tion of the nervous system. Their main functions include the myelination of
neurons to accelerate signal transmission and fix the position of neurons. In
addition, glia cells provide neurons with nutrients and oxygen, fight path-
ogens, remove dead neurons, and recycle neurotransmitters (Kandel et al.,
2000).
Recent research suggests that glia cells may also play an active role in neu-
ral processing. For example, it has been found that astrocytes are involved in
the regulation of neurotransmitter levels, in maintaining the correct chemi-
cal environment for neural signal transmission, and even in modulating syn-
aptic transmission (Clarke & Barres, 2013; Sasaki et al., 2014). Microglia,
on the other hand, have been associated with pruning and neural develop-
ment (Schafer et al., 2012).
In some theoretical works, computer models have been simulated in
which glia cells, especially astrocytes, play a role in the generation and mod-
ulation of neuronal oscillations. In these models, it is assumed that glia cells
act as a kind of buffer for extracellular potassium ions (K+) released by neu-
rons during action potentials. By absorbing excess K+ ions and modulating
extracellular ion concentrations, glia cells could potentially regulate neuronal
excitability and synchronization, which could contribute to the emergence
of neuronal oscillations (Wang et al., 2012).
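The basic idea can be caricatured in a toy simulation; the following sketch is deliberately simplistic, is not the model used by Wang et al. (2012), and its parameters are arbitrary:

```python
# Toy sketch, not the model from Wang et al. (2012); all parameters are arbitrary.
# Firing neurons release potassium (K+) into the extracellular space, which raises
# excitability; glial uptake pulls the K+ concentration back towards its baseline.

k_out = 3.0               # extracellular K+ concentration (arbitrary units)
baseline = 3.0
release_per_spike = 0.5   # K+ released per unit of firing
uptake_rate = 0.2         # strength of glial K+ uptake

for step in range(50):
    excitability = 1.0 + 0.1 * k_out            # more extracellular K+ -> more excitable (toy assumption)
    firing = 0.5 * excitability                 # toy firing rate of the neuron population
    k_out += release_per_spike * firing         # neurons release K+ when they fire
    k_out -= uptake_rate * (k_out - baseline)   # glia buffer K+ back towards baseline

print(round(k_out, 2))    # settles where glial uptake balances neuronal release
```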
Conclusion
The brain has a remarkable ability to flexibly adapt to new circumstances
and change its structure. The potential implications of the various types of
neuroplasticity for the development of new learning algorithms in Artificial
Intelligence are considerable. By studying the learning and adaptability of
the brain, researchers can develop new algorithms that can learn and adapt
in a similar way to the human brain. The exploration of synaptic plasticity,
i.e., the strengthening or weakening of connections between neurons, has
for example led to the development of artificial neural networks that mimic
the structure and function of the human brain and are capable of learning
from examples and adapting their behavior over time.
The exploration of structural plasticity, which involves changes in the
physical structure of neurons and their connections, can also lead to the
development of algorithms that can restructure in response to new data and
experiences. This could lead to more flexible and adaptable AI systems that
can learn from new data and adjust their behavior accordingly.
References
Clarke, L. E., & Barres, B. A. (2013). Emerging roles of astrocytes in neural circuit
development. Nature Reviews Neuroscience, 14(5), 311–321.
Wang, F., Smith, N. A., Xu, Q., Fujita, T., Baba, A., Matsuda, T., & Nedergaard,
M. (2012). Astrocytes modulate neural network activity by Ca2+-dependent
uptake of extracellular K+. Science Signaling, 5(218), ra26–ra26.
Zhang, W., & Linden, D. J. (2003). The other side of the engram: Experience-
driven changes in neuronal intrinsic excitability. Nature Reviews Neuroscience,
4(11), 885–900.
4 Organization of the Nervous System
Never trust something that can think for itself if you can’t see where it keeps its brain!
Arthur Weasley
and somatic nervous system and can be considered as an interface between the
organism and the world. The autonomic nervous system controls a number of
involuntary body processes such as heart rate, respiration, digestion, and pupil
contraction. These processes run automatically and are not subject to con-
scious control. The autonomic nervous system consists of two complementary
systems: Sympathetic and Parasympathetic. The sympathetic nervous system
prepares the organism for activity by activating organs and body functions,
thus putting the body into fight-or-flight mode. The parasympathetic, on the
other hand, is active during rest and relaxation phases and contributes to the
calming and recovery of the body. The autonomic nervous system responds
to a variety of stimuli, including stress, physical activity, and changes in the
environment. It works with other body systems to maintain homeostasis and
allow the body to function optimally. While the autonomic functions cannot
be consciously controlled, they can be influenced in various ways, e.g. through
relaxation techniques, movement and breathing exercises. The somatic nerv-
ous system, finally, is responsible for transmitting sensory information from
the sensory organs to the CNS and for forwarding motor commands from the
CNS to the muscles to control voluntary movements (Fig. 4.1).
Fig. 4.1 Organization of the nervous system. The central nervous system controls
the body and consists of the brain, the brainstem, and the spinal cord. The peripheral
nervous system includes the autonomic and the somatic nervous system. The auto-
nomic nervous system regulates involuntary body processes and includes the sympa-
thetic and the parasympathetic nervous system. The somatic nervous system transmits
sensory information to the CNS and forwards motor commands from the CNS to the
muscles
The Thalamus
The thalamus is a small but important structure deep in the center of the
brain. It acts as a relay station for sensory information by processing signals
from the sensory organs and forwarding them to the corresponding areas of
the cerebral cortex (cortex). It alone is responsible for the perception and conscious processing of all sensory stimuli and for the (motor) response to them.
The thalamus is often referred to as the “gatekeeper” of the cerebral cortex, as
it filters and regulates the flow of information to the cerebral cortex, so that
only the most relevant and important sensory impressions reach conscious-
ness. This is achieved through a network of thalamocortical loops, i.e., recip-
rocal connections between the thalamus and various regions of the cerebral
cortex. These loops enable the activity of the cortex to be modulated and regu-
lated by filtering out irrelevant information and amplifying important signals.
The thalamus also plays a role in regulating sleep and wakefulness as well
as certain cognitive functions such as memory, attention, and language. It
is connected to various other parts of the brain, including the basal ganglia,
the hypothalamus, and the amygdala (parts of the so-called limbic system,
which is also responsible for processing emotions), contributing to a variety
of functions and behaviors.
The thalamus is not a homogeneous structure, but consists of several
nuclei, each with its own function. Specific nuclei, also referred to as relay
nuclei, are responsible for forwarding specific sensory information from
the periphery (such as eyes, ears, skin, or taste buds) to the corresponding
regions of the cerebral cortex. These nuclei act as filters, amplifying or atten-
uating the sensory input depending on the significance of the information.
The non-specific nuclei, on the other hand, are involved in regulating the
overall activity of the cortex. These nuclei receive input from various regions
of the brain, including the basal ganglia, the limbic system, and the brain-
stem, and send projections to the cortex to modulate its activity. Non-
specific thalamic nuclei play a crucial role in regulating attention, arousal,
and sleep-wake cycles (Kandel et al., 2000).
The Cortex
The cerebrum, the largest part of the brain in humans and most higher mammals, consists of two hemispheres, which are connected by the so-called
corpus callosum. This powerful bundle of fibers contains about 200 million
nerve fibers, the so-called commissures. For comparison: The optic nerve,
which transmits all information from the retina of the eye to the brain, con-
sists of only 1 million nerve fibers.
The nerve cells are organized in six layers parallel to the surface of the
cortex. To accommodate as large a brain surface as possible in a limited skull
volume, the cerebral cortex (Cortex) is strongly folded by numerous convo-
lutions (gyri) and furrows (sulci). This folding effect increases the number
of nerve cells that can be accommodated along the surface of the cerebrum.
Since the skull is subject to anatomical constraints, it is as if one were stuff-
ing a large towel into a small cooking pot. To solve the problem, one must
crumple the towel. The outer areas of the cortex contain the gray matter,
which consists of the cell bodies of the approximately 16 billion neurons of
the cerebrum. The rest of the cerebrum consists of the axons of the neu-
rons, the “connecting cables”, which are referred to as white matter due to
their myelin sheath. With an area of about 2500 cm², which corresponds to
about four DIN A4 pages, the cortex is only about 3 mm thick. The cortex
is therefore a flat, approximately two-dimensional structure. All higher cog-
nitive performances such as conscious perception, language, thinking, mem-
ory, movement and feelings are located in the cortex (Kandel et al., 2000),
which is why the organization of the cortex will be presented in detail in the
next chapter.
The Hippocampus
The Hippocampus is located in the medial temporal lobe. It owes its name
to the fact that it looks like a seahorse. Together with the amygdala, it
belongs to the limbic system and forms the so-called hippocampus forma-
tion with the entorhinal cortex. It receives input from virtually all regions of
the cortex and is essential for the formation of new declarative and episodic
memory contents, especially in the consolidation and retrieval of episodic
and spatial memories, as well as for spatial navigation.
The hippocampus is considered the highest level of the cortical hierarchy,
as it integrates and processes information from various brain regions before
routes like the daily commute to work or on long motorway journeys, the
cortex is in the so-called default mode, a resting state that the cortex always
assumes when there is “nothing to do”. Therefore, driving can often even be
meditative for experienced drivers, as the cortex is essentially idling and one
can indulge in daydreams or think about something completely different
than driving. However, there is also the danger that it becomes too monot-
onous and boring and one falls asleep. This certainly would not have hap-
pened during the first driving lesson.1
The cortex becomes fully active again when a critical situation occurs. In
such situations, the cortex takes over control again, and the driver focuses
his full attention on the road because he has to react quickly to the unex-
pected. Surely you have experienced this or a similar situation yourself,
whether driving or riding along. You are having a conversation during the drive; thanks to her basal ganglia, the driver can use her cortex for something other than steering the car. When suddenly a child runs onto the road
from behind a parked car, the driver immediately stops talking and focuses
her full attention on the road.
The Cerebellum
The cerebellum (Cerebellum) is one of the oldest brain regions in terms of
evolutionary history and makes up about 10% of the total volume of the
brain. The cerebellum consists of two hemispheres and is divided into var-
ious lobes and nuclei. Of the total 86 billion neurons in the brain, the cer-
ebellum contains the most neurons of all brain parts with about 69 billion
neurons. In evolution, the volume of the cerebellum has increased parallel
to the surface enlargement of the cortex and is most pronounced in humans
(Barton & Venditti, 2014; Sereno et al., 2020).
The cerebellum is responsible for the coordination and fine-tuning of
movements by integrating information from sensory systems and other brain
regions and interacting with motor systems. It also supports the planning
and execution of movements as well as adaptation to changed conditions
and the correction of errors. In addition, the cerebellum also plays a role in
1 The sequence of different phases of neural activity described here corresponds to the development of
competence levels, a model from developmental psychology, starting with unconscious incompetence,
conscious incompetence and conscious competence, to unconscious competence (Adams, 2011).
cognition and emotion, although its exact function in these areas is not yet
fully understood (Kandel et al., 2000; Kolb & Whishaw, 2009).
One of the most important functions of the cerebellum is the temporal
coordination of movements, which enables the smooth execution of com-
plex movements such as walking, running, or playing a musical instrument.
The cerebellum receives information from the spinal cord, the sense of bal-
ance, and practically all sensory cortex areas and uses this to coordinate and
adjust movements. In addition to coordinating movements, the cerebellum
is also responsible for correcting movement errors. If there is a discrepancy
between the intended and the actual movement, the cerebellum recog-
nizes this error and makes the necessary adjustments to correct the move-
ment. The cerebellum also plays a crucial role in learning new motor skills.
Through constant adjustment and refinement of movements, the cerebellum
enables humans to improve their motor skills over time (Kandel et al., 2000;
Kolb & Whishaw, 2009).
The cerebellum is divided into three main parts. The vestibulocerebellum
receives information about the position and movement of the body from the
vestibular system (balance organs in the inner ear), which it uses to regulate
body posture and balance. It is also responsible for the precise coordination
of almost all eye movements that originate from various oculomotor centers
in the brainstem (Kandel et al., 2000; Kolb & Whishaw, 2009).
The spinocerebellum receives sensory signals from the spinal cord, which
provide information about the position of joints and muscles, as well as con-
tinuous feedback about the movement signals sent to the spinal cord and the
periphery. It also ensures fine-tuning of the movement signals and that the
movement is executed as intended. This includes the complex coordination
of the facial and laryngeal muscles required for speaking.
The pontocerebellum is functionally connected to the cerebral cortex. It
receives input from various areas, particularly from the premotor centers in
the frontal lobe (premotor cortex and supplementary motor cortex), where
motor plans are formed. In the cerebellum, these are temporally precisely
modulated and the planned activity of the involved muscles is coordinated.
The results of these calculations are forwarded to the thalamus, from where
they are finally forwarded to the motor cortex as input (Kandel et al., 2000).
There is some evidence that the cerebellum plays an important role not
only for motor functions but also for cognitive processes. There are various
arguments for this thesis. As already mentioned, the hemispheres of the cere-
bellum in humans are particularly pronounced, which in evolutionary terms
goes hand in hand with the growth of the cerebral cortex and the devel-
opment of human cognitive abilities. On the other hand, the cerebellum
Fig. 4.2 Circuit diagram of the cerebellum. The granule cells represent the input
layer of the cerebellum, their axons rise and then branch into parallel fibers. There
they form synapses with the fan-like dendrites of the Purkinje cells, which function
like a kind of coincidence detectors. Only when enough action potentials arrive at
the same time via the parallel fibers, they are activated
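The coincidence-detection principle described in the caption can be sketched schematically; the window length and threshold below are arbitrary choices, not values from the book:

```python
# Schematic sketch of coincidence detection; window length and threshold are arbitrary.
# A Purkinje-like unit fires only if enough parallel-fibre spikes arrive within
# the same short time window.

def purkinje_fires(spike_times_ms, window_ms=2.0, threshold=4):
    spikes = sorted(spike_times_ms)
    for i, t in enumerate(spikes):
        # count the spikes that fall into the window starting at time t
        coincident = sum(1 for s in spikes[i:] if s - t <= window_ms)
        if coincident >= threshold:
            return True
    return False

print(purkinje_fires([10.0, 10.4, 10.9, 11.3, 25.0]))   # True: four near-coincident spikes
print(purkinje_fires([10.0, 14.0, 18.0, 22.0, 26.0]))   # False: spikes too spread out
```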
Conclusion
The brain is not a single large neural network, but on the contrary, it exhib-
its a very high degree of modularity. The study of the brain’s modularity can
provide valuable insights for the development of new AI systems, architec-
tures, and algorithms. An important concept of machine learning is the idea
of modular learning, where a complex problem is broken down into smaller,
more manageable sub-problems. This is comparable to the way the brain is
divided into different modules that perform specific functions.
Furthermore, studying the modularity of the brain also offers the oppor-
tunity to develop more efficient and specialized AI algorithms that can per-
form specific tasks with high accuracy. By breaking down a complex task
into smaller subtasks and developing separate modules for each subtask, an
AI system can be optimized for this specific task. This can lead to faster and
more accurate performance than an AI system that tries to handle all tasks in
a single module.
The modular organization of the brain allows for flexibility and adapt-
ability, as different modules can be reconfigured and used for different tasks.
Similarly, AI systems with modular architectures can be developed, allow-
ing for easy integration of new modules and adaptation to new tasks and
environments. For example, the hippocampus-cortex system could serve as a
model for future AI systems to solve the stability-plasticity dilemma.
References
Adams, L. (2011). Learning a new skill is easier said than done. Gordon Training
International.
Bar-Gad, I., Morris, G., & Bergman, H. (2003). Information processing, dimen-
sionality reduction and reinforcement learning in the basal ganglia. Progress in
Neurobiology, 71(6), 439–473.
Barton, R. A., & Venditti, C. (2014). Rapid evolution of the cerebellum in humans
and other great apes. Current Biology, 24(20), 2440–2444.
Bell, C. C., Han, V., & Sawtell, N. B. (2008). Cerebellum-like structures and their
implications for cerebellar function. Annual Review of Neuroscience, 31, 1–24.
Bellmund, J. L., Gärdenfors, P., Moser, E. I., & Doeller, C. F. (2018). Navigating
cognition: Spatial codes for human thinking. Science, 362(6415), eaat6.
Bermudez-Contreras, E., Clark, B. J., & Wilber, A. (2020). The neuroscience of
spatial navigation and the relationship to artificial intelligence. Frontiers in
Computational Neuroscience, 14, 63.
Grossberg, S. (1982). How does a brain build a cognitive code? Studies of Mind and Brain. Boston Studies in the Philosophy of Science, 70. Springer. https://doi.org/10.1007/978-94-009-7758-7_1.
Hawkins, J., & Blakeslee, S. (2004). On intelligence. Macmillan.
Kandel, E. R., Schwartz, J. H., Jessell, T. M., Siegelbaum, S., Hudspeth, A. J., &
Mack, S. (Eds.). (2000). Principles of neural science (Vol. 4, pp. 1227–1246).
McGraw-Hill.
Kolb, B., & Whishaw, I. Q. (2009). Fundamentals of human neuropsychology.
Macmillan.
Krauss, P., Tziridis, K., Metzner, C., Schilling, A., Hoppe, U., & Schulze, H.
(2016). Stochastic resonance controlled upregulation of internal noise after hear-
ing loss as a putative cause of tinnitus-related neuronal hyperactivity. Frontiers in
Neuroscience, 10, 597.
Krauss, P., Metzner, C., Schilling, A., Schütz, C., Tziridis, K., Fabry, B., & Schulze,
H. (2017). Adaptive stochastic resonance for unknown and variable input sig-
nals. Scientific Reports, 7(1), 2450.
Licklider, J. C. R. (1951). A duplex theory of pitch perception. The Journal of the
Acoustical Society of America, 23(1), 147–147.
McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are
complementary learning systems in the hippocampus and neocortex: Insights
from the successes and failures of connectionist models of learning and memory.
Psychological Review, 102(3), 419.
Morris, R. G., Garrud, P., Rawlins, J. A., & O’Keefe, J. (1982). Place navigation
impaired in rats with hippocampal lesions. Nature, 297(5868), 681–683.
Oertel, D., & Young, E. D. (2004). What’s a cerebellar circuit doing in the auditory
system? Trends in Neurosciences, 27(2), 104–110.
5
Organization of the Cortex
A person consists of two parts—his brain and his body. But the body has more fun.
Woody Allen
Fig. 5.1 Division of the Cortex into Lobes. Each hemisphere of the Cortex consists of
five lobes: frontal lobe, parietal lobe, temporal lobe, and occipital lobe. Not visible
from the outside, as it is covered by the frontal, parietal, and temporal lobes, is the
fifth, the insular lobe (dotted line)
eye sockets. The parietal lobe, which contains the primary somatosensory
center and areas of spatial perception, follows to the rear. The areas respon-
sible for visual perception are located in the occipital lobe. The temporal
lobe contains the auditory center as well as higher visual areas, e.g., for face
recognition, and areas of long-term memory, so-called multimodal associa-
tion areas. The insular cortex is the least explored and houses, among other
things, the primary taste area and other areas that represent vegetative body
states. Here, the primary center for basic viscerosensitivity, i.e., for informa-
tion from the viscera, is also suspected.
Brodmann Areas
In the early twentieth century, the psychiatrist and neuroanatomist
Korbinian Brodmann systematically examined many sections from all
regions of the cortex and found that the cortex can be divided into different
areas, each characterized by a unique combination of cell types, densities,
and layers (Brodmann, 1910). Brodmann published a system for mapping
the human brain based on its cytoarchitecture, i.e., the organization of cells
in the various brain regions. The Brodmann areas named after him are num-
bered according to their location in the brain, from the primary sensory and
motor areas in the upper central part of the brain to the more complex asso-
ciation areas in the front part of the brain. The numbering system ranges
from 1 to 52, with some numbers omitted because it was later found that
the areas were duplicates or combinations of other areas (Fig. 5.2).
Interestingly, these structural anatomical differences correlate with func-
tional differences. This means that each Brodmann area, which externally
differs from its neighbors under the microscope, actually performs a differ-
ent function. For example, it has been found that visual information is pro-
cessed in Brodmann areas 17, 18, and 19 in the back of the brain, while
auditory information is processed in areas 41 and 42 at the top edge of the
temporal lobe. Area 4 corresponds to the primary motor cortex, which sends
control commands to the skeletal muscles, and areas 1, 2, and 3 correspond
to the primary somatosensory cortex, which is responsible for processing tactile
stimuli from the body surface.
Fig. 5.2 Brodmann Areas. Each Brodmann area is defined by a unique combination
of cell types, cell densities, and layers and is associated with different functions and
behaviors. For example, Brodmann area 4 in the primary motor cortex is responsible
for controlling voluntary movements, while Brodmann area 17 in the primary visual
cortex is responsible for processing visual information
However, the actual number of cortex areas is significantly larger than the
division into 52 Brodmann areas suggests (Amunts & Zilles, 2015; Coalson
et al., 2018; Gordon et al., 2016). Using multimodal MRI data from the
Human Connectome Project (Elam et al., 2021) and an objective, semi-au-
tomatic neuroanatomical approach, 180 areas per hemisphere have recently
been identified that differ in their architecture, function, connectivity, or
topography (Glasser et al., 2016).
But even this is likely to be only a preliminary, approximate number. The
practical difficulty of dividing the cortex into meaningful maps or areas,
and the fundamental difficulty of assigning specific functions to these maps
that are neither too narrowly defined nor too general, is described by David
Poeppel as the Maps Problem (division into maps) and Mapping Problem
(assignment of functions) (Poeppel, 2012).
Fig. 5.3 Somatosensory Homunculus. Projection of the body surface onto the pri-
mary sensorimotor cortex area. The relative size of the body parts is greatly distorted
and is based on the respective density of touch and tactile receptors. The representa-
tions of hand, tongue, and lips take up significantly more space than those for foot
and back
several centimeters. High tactile acuity means that a lot of information from
the corresponding body region reaches the cortex, which in turn requires a
lot of processing capacity, i.e., a relatively large share of the corresponding
cortical area. In contrast, only a relatively small area is needed to process input
from the back, where the receptor density is lowest.
The primary motor area, which sends control commands to the muscles,
behaves analogously. We can move our fingers very finely tuned and pre-
cisely, but not so much our foot. Accordingly, the areas for controlling hands
and face are relatively large compared to the areas that control the muscles of
the feet and toes. Try playing the piano with your feet!
The primary visual area (visual cortex) also follows this topographic
organizational principle. Here we speak of retinotopy, as the input comes
from the retina. The retina is structured in such a way that the spatial rela-
tionships between the different parts of the visual field are preserved. This
means that adjacent parts of the retina correspond to adjacent parts of the
visual field. When the information from the optic nerve reaches the visual
cortex in the brain, it is organized in a similar way. Adjacent regions of the
visual cortex correspond to adjacent regions of the visual field, creating a
“map” of the visual world. Again, this map is not a perfect one-to-one rep-
resentation of the visual field. Rather, there are distortions and irregularities
in the way different parts of the visual field are represented in the brain. The
fovea, the point of sharpest vision in the relatively small central area of the
retina, is greatly overrepresented in the primary visual cortex. Although the
fovea only makes up about 1 percent of the total area of the retina, about
half of the primary visual cortex is responsible for processing information
from the fovea.
The reason for this strong overrepresentation lies again in the high sen-
sitivity and visual acuity of the fovea. Since the fovea contains a high den-
sity of photoreceptors and is responsible for our sharpest vision, it provides
the brain with the most detailed and accurate information about the visual
world. Therefore, the brain assigns a larger proportion of its processing
resources to the fovea to ensure that this information is processed with the
greatest possible accuracy. In contrast, the peripheral regions of the visual
field are represented in the visual cortex in a highly compressed form.
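The degree of this over-representation can be estimated with a back-of-the-envelope calculation based on the approximate figures quoted above (about 1 percent of the retina versus about half of the primary visual cortex):

```python
# Rough illustration of cortical magnification; the numbers are the approximate
# figures quoted in the text, not exact measurements.
fovea_share_of_retina = 0.01   # fovea ~1 % of retinal area
fovea_share_of_v1     = 0.50   # ~half of primary visual cortex

magnification = fovea_share_of_v1 / fovea_share_of_retina
periphery_compression = (1 - fovea_share_of_v1) / (1 - fovea_share_of_retina)

print(f"Foveal input is over-represented roughly {magnification:.0f}-fold")
print(f"Peripheral input is compressed to about {periphery_compression:.2f} of its proportional size")
```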
The last example refers to the auditory system, which is responsible for
the perception of sounds and speech. Here too, there is a map-like struc-
ture, the tonotopy, where adjacent frequencies are processed at adjacent loca-
tions of the primary auditory area. The acoustic input is broken down into
2/3 and finally to layer 5/6, which serves as the output layer of the cortex,
from where the information is transmitted back to the thalamus or to lower
cortical areas as feedback. In addition, fibers from layers 2/3 project into
further cortical areas. Depending on which layer is the source and target,
three different types of connections can be distinguished between two corti-
cal areas: ascending (bottom-up, feed-forward), descending (top-down, feed-
back), and horizontal (lateral) connections (Kolb & Whishaw, 2009; Kandel
et al., 2000; Imam & Finlay, 2020) (Fig. 5.4).
Fig. 5.4 Information flow in the cortical layers. The six layers of the cortex are spe-
cifically interconnected. Input from sensory organs or hierarchically lower cortical
areas reaches layer 4, is forwarded to layers 2/3, and finally reaches the output layers
5/6. From there, feedback is sent back to the thalamus or hierarchically lower corti-
cal areas. In addition, fibers from layers 2/3 project into hierarchically higher cortical
areas or across the corpus callosum into homologous areas on the other hemisphere.
Feedback from higher areas ends in layers 2/3 and 5/6
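One way to picture the interplay of ascending and descending connections is a toy simulation of two reciprocally connected areas, in which a feed-forward matrix carries activity upward and a feedback matrix modulates the lower area in return. This is a crude numerical sketch with invented parameters, not a model of real cortical dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20  # neurons per area

# Two reciprocally connected "areas": W_ff carries input upward (feed-forward),
# W_fb sends feedback downward (a crude stand-in for the circuit in Fig. 5.4).
W_ff = rng.normal(scale=0.1, size=(n, n))
W_fb = rng.normal(scale=0.05, size=(n, n))

lower = rng.normal(size=n)   # activity in the lower area (input arriving in layer 4)
higher = np.zeros(n)         # activity in the higher area

for _ in range(10):          # let activity reverberate through the loop
    higher = np.tanh(W_ff @ lower)
    lower = np.tanh(lower + W_fb @ higher)  # feedback modulates the lower area

print(higher[:5])
```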
Fig. 5.5 Hierarchical-parallel organization of the cortex. The analysis of the con-
nections between cortical areas allows the reconstruction of a circuit diagram of the
information flow. The areas are arranged hierarchically and in parallel, with several
areas present at each level and the number of areas per level increasing at higher
hierarchy levels
Conclusion
The cerebral cortex has an impressively uniform structure that allows hierarchical-parallel
processing, with the cortical areas organized in a map-like fashion.
They represent their respective input topologically, preserving similarity and
importance. Similar input is processed in adjacent areas, with the relative
size of the area correlating with the amount of information and thus the
importance of the input. Understanding the organization and function of
the cortex could have significant implications for the development of new AI
architectures.
Insights into the vertical and horizontal organization of the cortex could
contribute to the development of new architectures for deep neural networks
to make information processing in different layers and between different
areas of the network more efficient. In addition, the study of the function-
ing of micro- and macro-columns as well as population coding in the cortex
could contribute to developing new approaches for information processing
and representation in artificial neural networks that may be more robust
and efficient. The analysis of canonical circuits and the flow of information
References
Amunts, K., & Zilles, K. (2015). Architectonic mapping of the human brain
beyond Brodmann. Neuron, 88(6), 1086–1107.
Averbeck, B. B., Latham, P. E., & Pouget, A. (2006). Neural correlations, popula-
tion coding and computation. Nature Reviews Neuroscience, 7(5), 358–366.
Brodmann, K. (1910). Feinere Anatomie des Grosshirns (pp. 206–307). Springer.
Coalson, T. S., Van Essen, D. C., & Glasser, M. F. (2018). The impact of tradi-
tional neuroimaging methods on the spatial localization of cortical areas.
Proceedings of the National Academy of Sciences, 115(27), E6356–E6365.
Elam, J. S., Glasser, M. F., Harms, M. P., Sotiropoulos, S. N., Andersson, J. L.,
Burgess, G. C., … & Van Essen, D. C. (2021). The human connectome project:
A retrospective. NeuroImage, 244, 118543.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in
the primate cerebral cortex. Cerebral Cortex, 1(1), 1–47.
Glasser, M. F., Coalson, T. S., Robinson, E. C., Hacker, C. D., Harwell, J., Yacoub,
E., … & Van Essen, D. C. (2016). A multi-modal parcellation of human cere-
bral cortex. Nature, 536(7615), 171–178.
Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., &
Petersen, S. E. (2016). Generation and evaluation of a cortical area parcellation
from resting-state correlations. Cerebral Cortex, 26(1), 288–303.
Imam, N., & Finlay, B. L. (2020). Self-organization of cortical areas in the devel-
opment and evolution of neocortex. Proceedings of the National Academy of
Sciences, 117(46), 29212–29220.
Kandel, E. R., Schwartz, J. H., Jessell, T. M., Siegelbaum, S., Hudspeth, A. J., &
Mack, S. (Eds.). (2000). Principles of neural science (Vol. 4, pp. 1227–1246).
McGraw-Hill.
Kolb, B., & Whishaw, I. Q. (2009). Fundamentals of human neuropsychology.
Macmillan.
Mountcastle, V. B. (1997). The columnar organization of the neocortex. Brain: A
Journal of Neurology, 120(4), 701–722.
Poeppel, D. (2012). The maps problem and the mapping problem: Two challenges
for a cognitive neuroscience of speech and language. Cognitive Neuropsychology,
29(1–2), 34–55.
Van Essen, D. C., Anderson, C. H., & Felleman, D. J. (1992). Information pro-
cessing in the primate visual system: An integrated systems perspective. Science,
255(5043), 419–423.
6
Methods of Brain Research
CT
Computed Tomography (CT) uses X-rays to create detailed cross-sec-
tional images of the brain. X-ray images of the head are taken from various
angles and reconstructed into a three-dimensional image of the brain on a
computer. The resulting image shows the structures of the brain, including
the ventricles, the skull, and the blood vessels. CT imaging is a common
method for diagnosing neurological diseases such as stroke, traumatic brain
injury, and brain tumors. However, CT imaging does not provide informa-
tion about the functional activity of the brain, which can be obtained with
other imaging techniques such as PET, fMRI, EEG, and MEG (Ward,
2015; De Groot & Hagoort, 2017).
PET
Positron Emission Tomography (PET) visualizes the metabolic activity
of cells and tissues in the body. A small amount of a radioactive substance,
a so-called radiotracer, is injected into the body. The radiotracer emits pos-
itrons, positively charged particles that interact with electrons in the body.
When a positron encounters an electron, they annihilate each other and pro-
duce gamma rays, which are detected by the PET scanner. The PET scan-
ner records these gamma rays and creates a three-dimensional image of the
brain’s metabolic activity from them. However, since the brain is constantly
active, meaningful data with PET can only be generated by subtracting two
images. Typically, an image is taken during a specific cognitive task or stim-
ulus, and another image of the brain’s background activity is taken, and
then the difference image is calculated (Ward, 2015; De Groot & Hagoort,
2017).
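The subtraction logic can be illustrated with a few lines of Python: two simulated image volumes, one taken at rest and one during a task, are subtracted, and voxels exceeding a threshold are flagged. The numbers and the simple threshold are stand-ins for the real acquisition and the statistical testing used in practice.

```python
import numpy as np

rng = np.random.default_rng(42)
shape = (64, 64, 32)  # toy image volume (voxels), not real scanner dimensions

# Simulated metabolic activity: a baseline scan and a task scan in which one
# hypothetical region becomes more active during the task.
baseline = rng.normal(loc=100.0, scale=5.0, size=shape)
task = baseline + rng.normal(scale=5.0, size=shape)
task[20:30, 20:30, 10:15] += 25.0  # invented task-related activation

difference = task - baseline        # the subtraction described in the text
active = difference > 15.0          # crude threshold instead of proper statistics

print("voxels flagged as task-related:", int(active.sum()))
```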
MRI
Magnetic Resonance Imaging (MRI), also known as nuclear magnetic
resonance imaging, uses the property of hydrogen nuclei to align along mag-
netic field lines to generate detailed images. Short magnetic pulses stimulate
the aligned nuclei to emit electromagnetic waves, which are detected and
used to determine the distribution of hydrogen, mainly in the form of water.
Functional Magnetic Resonance Imaging (fMRI) uses the fact that the red
blood pigment affects the magnetic signal of the hydrogen nuclei differently,
depending on whether it has bound oxygen or not. Comparing measure-
ments under different stimulation conditions allows changes in blood oxy-
gen saturation (Blood Oxygenation Level Dependent, BOLD) to be derived,
which serve as an indirect measure of the change in blood flow to a spe-
cific brain area. The assumption behind this is that active brain areas require
more oxygen and are therefore more heavily perfused. The BOLD signal
builds up slowly and reaches a maximum about six to ten seconds after the
stimulus begins, before it slowly decreases again. Compared to MEG and
EEG, nuclear magnetic resonance imaging has a lower temporal resolution
of about one recording per second (1 Hz), but a much higher spatial reso-
lution in the range of about one cubic millimeter (Menon & Kim, 1999),
i.e., the activity of the brain can be measured in about one million so-called
voxels1 simultaneously (Ward, 2015; De Groot & Hagoort, 2017).
1 A voxel, from volume pixel, is the three-dimensional analogue to a pixel (picture element).
brain regions below the cerebral cortex (subcortical regions). MEG and
EEG therefore complement each other well and can be used simultaneously
in combined M/EEG measurements (Ward, 2015; De Groot & Hagoort,
2017).
Intracranial EEG
Intracranial Electroencephalography (iEEG) is an invasive technique for
measuring brain activity (Parvizi & Kastner, 2018). Electrodes are placed
directly on or in the brain, usually during surgery, to record electrical activ-
ity. This allows for very high spatial resolution, as the electrodes are placed
in specific regions of the brain, often in close proximity to the area of inter-
est. iEEG also has a very high temporal resolution, allowing electrical signals
to be recorded at a rate of up to several thousand times per second (Ward,
2015; De Groot & Hagoort, 2017). Patients being treated for drug-resist-
ant epilepsy have iEEG electrodes implanted for diagnostic purposes a few
weeks before surgery to resect their epilepsy focus, to determine which brain
regions need to be spared during surgery to preserve important functions
such as language. These patients often participate in neuropsychological
studies, as the data obtained in this way are extremely rare and valuable.
Conclusion
Today, various techniques are available to measure the structure and activity
of the brain. Each method has specific advantages and disadvantages, and no
method is perfect. Ideally, complementary methods such as MEG and EEG
or fMRI and EEG are used simultaneously to get a more complete picture
of brain activity. But even then, we are still far from being able to read the
brain in the spatiotemporal resolution that would be necessary to capture
the exact temporal activity course of each neuron and each synapse. This is
one reason why computer models of brain function are absolutely necessary.
In contrast to the brain, simulated models offer the crucial advantage that
all internal parameters and variables can be read out at any time with any
desired accuracy.
References
De Groot, A. M., & Hagoort, P. (Eds.). (2017). Research methods in psycholinguistics
and the neurobiology of language: A practical guide. Wiley.
Hämäläinen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J., & Lounasmaa, O. V.
(1993). Magnetoencephalography – Theory, instrumentation, and applications
to noninvasive studies of the working human brain. Reviews of Modern Physics,
65(2), 413.
Handy, T. C. (Ed.). (2005). Event-related potentials: A methods handbook. MIT Press.
Menon, R. S., & Kim, S. G. (1999). Spatial and temporal limits in cognitive neuro-
imaging with fMRI. Trends in Cognitive Sciences, 3(6), 207–216.
Parvizi, J., & Kastner, S. (2018). Promises and limitations of human intracranial
electroencephalography. Nature Neuroscience, 21(4), 474–483.
Sur, S., & Sinha, V. K. (2009). Event-related potential: An overview. Industrial
Psychiatry Journal, 18(1), 70.
Ward, J. (2015). The student’s guide to cognitive neuroscience. Psychology Press.
7
Memory
Sensory Memory
Sensory memory is the first stage of memory's information processing
system. It stores information received through the sensory organs only
for a very short period, usually less than a second. Sensory memory is crucial
for us to perceive and process the world around us. Depending on the sen-
sory modality, different types of sensory memory are distinguished, includ-
ing iconic, echoic, and haptic memory.
Iconic memory refers to visual stimuli and allows an image to be briefly
retained in the mind’s eye (Sperling, 1963). This type of memory plays a
crucial role in tasks such as reading, where the eyes need to move quickly
to process the text. Echoic memory, on the other hand, refers to auditory
stimuli and allows sounds or words to be retained for a short time, which
forms the basis for the ability to process language and communication.
Finally, haptic memory refers to tactile stimuli and allows sensations of phys-
ical touch to be stored for a short time. This type of memory is important
for people to navigate their environment and interact with objects in their
surroundings.
Sensory memory is believed to correspond to short-term neural activity in
specific sensory brain regions such as the visual or auditory cortex. The exact
neural correlates of sensory memory vary depending on the type of sensory
information being stored, and research continues into how this information
is processed and maintained in the brain (Gazzaniga et al., 2006).
The partial-report paradigm, introduced by George Sperling in 1963
(Sperling, 1963), is a method for investigating sensory memory. In the clas-
sic version, subjects are presented with a matrix of letters or numbers for
a very short time, usually about 50 milliseconds. Immediately afterwards, a
tone or cue is given indicating which row of the grid the participants should
name. The subjects are then asked to repeat the letters or numbers from the
indicated row. Sperling found that participants were able to name almost
all the letters in the indicated row if the tone or cue was given immediately
after the grid disappeared, even though the participants did not know which
row was relevant before the matrix was presented. This suggests that sensory
memory must have a relatively large storage capacity, as apparently all rows
of the matrix can be stored for a short period of time. However, if the cue
to the row was delayed by a few hundred milliseconds, participants usually
could only remember very few letters from the corresponding row. This, in
turn, suggests that sensory memory fades quickly.
Sperling also introduced another variant of the paradigm, the full-report
paradigm, in which the test subjects are asked to remember all the letters or
numbers of the grid, not just those in a particular row. In general, however,
participants are then only able to remember a few letters or numbers from
the entire matrix, regardless of whether the cue is given immediately or later.
This suggests that conscious access to sensory memory has a limited capacity,
which is smaller than the actual storage capacity of sensory memory.
The partial-report paradigm is widely used in research on sensory memory
and attention and has provided important insights into the nature of these
processes. It has also been used to investigate visual and auditory perception
as well as the encoding and retrieval of memories (Fig. 7.1).
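The logic of the paradigm can be mimicked in a small simulation in which every letter trace survives the cue delay only with an exponentially decaying probability. The half-life and matrix size are invented illustration values, not Sperling's measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

def partial_report(cue_delay_ms, cols=4, half_life_ms=300, trials=1000):
    """Toy model of Sperling's paradigm: every letter is available right after
    the display, but each trace survives the cue delay only with a probability
    that decays exponentially (the half-life is an invented value)."""
    p_survive = 0.5 ** (cue_delay_ms / half_life_ms)
    reported = rng.random((trials, cols)) < p_survive  # letters of the cued row
    return reported.sum(axis=1).mean()

for delay in (0, 150, 300, 600, 1000):
    print(f"cue delay {delay:4d} ms -> {partial_report(delay):.2f} of 4 letters reported")
```

The simulated report score is nearly perfect for an immediate cue and collapses as the cue is delayed, mirroring the rapid fading of sensory memory described above.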
Short-term Memory
Short-term memory refers to the ability to temporarily store a small amount
of information for a few seconds to a minute without altering it. It is based
on short-term neural activity in multimodal brain regions, particularly in
the prefrontal, parietal, and temporal cortex (Fuster & Alexander, 1971;
Funahashi et al., 1989). The involvement of the thalamus, basal ganglia, and
cerebellum is also discussed.
Short-term memory is essential for daily life as it allows us, for example,
to remember important details such as phone numbers, addresses, or direc-
tions. However, short-term memory has a very limited capacity. This is often
associated with the “magic number” seven. This goes back to a publication
by George Miller from the 1950s (Miller, 1956). There he suggested that
the average number of elements or “chunks” that can be stored in short-
term memory is seven plus or minus two. Accordingly, most people should
be able to store five to nine units of information at a given time. However,
recent research suggests that the actual number may be somewhat lower,
with an average capacity of only about four chunks (Cowan, 2001; Luck &
Vogel, 1997; Rouder et al., 2008). This means that if we try to remember
more than these four to five objects or contents, we do it very poorly and we
will probably forget some of them.
Fig. 7.1 Partial-Report Paradigm. Top: Sensory memory has a relatively large stor-
age capacity, as almost all letters in the indicated row can be named when the cue is
given immediately. Middle: However, it fades quickly, as shown when the cue is given
with a delay. Bottom: The full-report paradigm shows that conscious access to sensory
memory has a limited capacity, which is smaller than the storage capacity of sensory
memory itself
It should have been much easier for you now. The difference is that in the
first case the number consists of ten digits that you have to remember, while
in the second case only three chunks, each consisting of three to four digits,
need to be remembered.
As we have just seen, this chunking interestingly works even when the
individual chunks have no special meaning. However, if the chunks have a
meaningful significance, the storage capacity of short-term memory can be
significantly further increased, as it can then be applied recursively, which is
also referred to as hierarchical chunking.
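In code, chunking is nothing more than regrouping the same items into fewer, larger units, as in this small example with an invented ten-digit number:

```python
def chunk(digits: str, sizes=(3, 3, 4)):
    """Group a digit string into chunks of the given sizes."""
    chunks, pos = [], 0
    for size in sizes:
        chunks.append(digits[pos:pos + size])
        pos += size
    return chunks

number = "0917185940"      # an invented example number
print(list(number))        # ten separate items to hold in short-term memory
print(chunk(number))       # ['091', '718', '5940'] -> only three chunks
```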
Try to remember these two characters:
你好
Working Memory
Working memory is a theoretical concept that plays a central role in cog-
nitive psychology, neuropsychology, and cognitive neuroscience. It is an
important cognitive system that allows us to temporarily store and manip-
ulate information. This system is essential for many higher cognitive func-
tions such as logical reasoning, problem-solving, decision-making, and
behavioral control.
Working memory consists of several subcomponents (Baddeley & Hitch,
1974), each of which has its own function (Fig. 7.2). The central executive
component is responsible for controlling attention and coordinating infor-
mation processing. Another component is the phonological loop, which is
responsible for the temporary storage of verbal information. It can be further
divided into the articulatory loop, which is responsible for the subvocal rep-
etition of verbal information, and the acoustic buffer, which is responsible
Fig. 7.2 Components of working memory: the central executive with the articulatory loop and the auditory buffer
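As a programming analogy, the components described so far can be sketched as cooperating objects: a central executive that allocates attention and a phonological loop with limited capacity. This is a toy rendering of the Baddeley-Hitch idea, not a cognitive model; the capacity value follows the rough four-chunk estimate cited above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PhonologicalLoop:
    """Temporary store for verbal material, refreshed by subvocal rehearsal."""
    items: List[str] = field(default_factory=list)
    capacity: int = 4  # rough value, following the ~4-chunk estimate

    def rehearse(self, item: str):
        self.items.append(item)
        if len(self.items) > self.capacity:
            self.items.pop(0)  # unrehearsed material is lost

@dataclass
class CentralExecutive:
    """Allocates attention and coordinates the subsystems."""
    loop: PhonologicalLoop = field(default_factory=PhonologicalLoop)

    def attend(self, words):
        for w in words:
            self.loop.rehearse(w)
        return self.loop.items

executive = CentralExecutive()
print(executive.attend("please remember this short list of words".split()))
```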
Long-term Memory
Long-term memory is essentially based on synaptic plasticity (Bailey &
Kandel, 1993) and allows us to store and retrieve information over longer
periods—from hours to decades. Two categories of this type of memory are
distinguished: declarative (explicit) memory and non-declarative, procedural
(implicit) memory (Eichenbaum & Cohen, 2004).
Declarative memory refers to the conscious retrieval of information and
can be further subdivided into semantic memory, which contains general
factual knowledge, and episodic memory, which contains specific events and
their sequence (Tulving, 1972, 1983). These do not necessarily have to be
events or episodes experienced by oneself, but can also be movies watched or
the plot of books read. A special case of episodic memory is autobiograph-
ical memory, which contains memories of specific events from one’s own
life (Conway & Pleydell-Pearce, 2000). The medial temporal lobe and the
hippocampus are important brain regions involved in the formation and
retrieval of declarative long-term memories (Milner, 1970; Zola-Morgan
& Squire, 1986; Squire, 1987, 1992; Gazzaniga et al., 2006). As we have
already seen, it is particularly the case that the hippocampus acts as a kind
of intermediate storage for declarative memories (Scoville & Milner, 1957;
Penfield & Milner, 1958) and transfers its information to the cortex during
deep sleep. This process is also referred to as memory consolidation (Squire,
1992; Conway & Pleydell-Pearce, 2000).
Non-declarative, procedural memory, on the other hand, refers to our
unconscious memory for skills, procedures, and motor abilities such as play-
ing the piano, swimming, or cycling. This type of memory often manifests
itself in our behavior and does not need to be consciously retrieved. The cer-
ebellum and the basal ganglia are key regions of the brain involved in the
formation and retrieval of procedural long-term memory (Graybiel, 1995;
Desmond & Fiez, 1998).
Conclusion
In summary, memory is an integral part of human cognition that allows
us to encode, manipulate, store, and retrieve information over time.
Understanding how memory functions as an information processing system
is crucial to understanding its role in our experience and behavior.
The study of human memory has significant implications for the develop-
ment of Artificial Intelligence. If we understand how human memory works,
we can develop AI systems that learn and process information more effi-
ciently than previous systems. One of the most important areas of research
in this context is the development of working memory models for AI sys-
tems. Working memory is crucial for many cognitive tasks, including prob-
lem-solving and decision-making. By integrating a working memory into AI
systems, their abilities to handle complex tasks and make more accurate pre-
dictions could be significantly improved.
In addition, the study of long-term memory has implications for the
development of Artificial Intelligence that is supposed to learn and adapt
over time. Just as humans store and retrieve information over long periods,
these systems could be designed to store and retrieve large amounts of data
and use this information to improve their performance over time. This is
particularly important in areas such as natural language processing or real
image processing, where the systems constantly need to adapt to new data
and contexts.
Since human memory is a complex, multi-layered system that is not yet
fully understood, the development of AI systems that simulate human mem-
ory still poses a major challenge.
References
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed sys-
tem and its control processes. In Psychology of learning and motivation (Vol. 2,
pp. 89–195). Academic, Elsevier.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In Psychology of learning
and motivation (Vol. 8, pp. 47–89). Academic, Elsevier.
Bailey, C. H., & Kandel, E. R. (1993). Structural changes accompanying memory
storage. Annual Review of Physiology, 55(1), 397–426.
Conway, M. A., & Pleydell-Pearce, C. W. (2000). The construction of autobiograph-
ical memories in the self-memory system. Psychological Review, 107(2), 261.
Cowan, N. (2001). The magical number 4 in short-term memory: A reconsidera-
tion of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87–114.
Desmond, J. E., & Fiez, J. A. (1998). Neuroimaging studies of the cerebellum:
Language, learning and memory. Trends in Cognitive Sciences, 2(9), 355–362.
Eichenbaum, H., & Cohen, N. J. (2004). From conditioning to conscious recollection:
Memory systems of the brain (No. 35). Oxford University Press on Demand.
Squire, L. R., Knowlton, B., & Musen, G. (1993). The structure and organization
of memory. Annual Review of Psychology, 44(1), 453–495.
Tulving, E. (1972). Episodic and semantic memory. In Organization of memory
(pp. 381–402). Academic, Elsevier.
Tulving, E. (1983). Elements of episodic memory. Oxford University Press.
Watanabe, Y., & Funahashi, S. (2012). Thalamic mediodorsal nucleus and working
memory. Neuroscience & Biobehavioral Reviews, 36(1), 134–142.
Zola-Morgan, S., & Squire, L. R. (1986). Memory impairment in monkeys follow-
ing lesions limited to the hippocampus. Behavioral Neuroscience, 100(2), 155.
8
Language
Language is, by its nature, a communal thing, that is, it never expresses the exact,
but a compromise—that which is common to you, me, and everyone else.
Thomas Ernest Hulme
(de Saussure, 1916). However, it is still very controversial to what extent lan-
guage skills are innate or need to be learned (Goodluck, 1991).
David Poeppel refers to as the Maps Problem and the Mapping Problem (see
also Chap. 5).
One of the most important processes in language comprehension is the
so-called parsing, in which the brain breaks down a sentence into its com-
ponents and assigns a meaning to each part. This requires working memory,
attention, and the ability to quickly process and integrate information from
various sources.
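A toy parser illustrates the principle of decomposing a sentence and assigning roles to its parts. The grammar and lexicon below are invented and drastically simplified compared to what the brain (or a serious language-processing system) does.

```python
# Toy parser: splits a sentence into a subject noun phrase, a verb, and an
# object noun phrase. Lexicon and rule are invented for illustration only.
LEXICON = {"the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN",
           "chased": "VERB", "saw": "VERB"}

def parse(sentence: str):
    words = sentence.lower().split()
    tags = [LEXICON.get(w, "UNKNOWN") for w in words]
    verb_idx = tags.index("VERB")  # naive rule: the first verb splits the sentence
    return {"subject_np": words[:verb_idx],
            "verb": words[verb_idx],
            "object_np": words[verb_idx + 1:]}

print(parse("The dog chased the cat"))
# {'subject_np': ['the', 'dog'], 'verb': 'chased', 'object_np': ['the', 'cat']}
```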
Language production involves a similar series of processes, in which the
brain first generates a message and then encodes it into a sequence of words
and grammatical structures. This process also requires the integration of vari-
ous information sources, including semantic knowledge, syntactic rules, and
social context (Kemmerer, 2014).
Language acquisition is a complex process in which various cognitive and
neural mechanisms such as attention, memory, and social perception inter-
act. It is assumed that children are particularly sensitive to language during
critical developmental phases and that contact with language during these
phases is crucial for the acquisition of language competence.
According to a new model proposed for understanding the functional
anatomy of language, the early stages of language processing take place in
the auditory areas located in the temporal lobe of the cortex on both sides of
the brain. Analogous to the visual system, which has a ventral “what” and
a dorsal “where” path of processing, language processing in the cortex also
divides into a ventral and a dorsal path (Hickok & Poeppel, 2004, 2007,
2016). While the “what” path is responsible for assigning sounds to mean-
ings, the “where” path maps sounds onto motor-articulatory representations
(where is the sound produced).
Of course, not only the cortex is involved in the processing of language.
The thalamus, the basal ganglia, and the cerebellum also play an important
role in language perception and production.
The thalamus, as a sensory relay station in the brain that receives and
processes incoming sensory information before it is forwarded to the cor-
responding cortical areas for further processing, is also involved in the pre-
processing of auditory information, including speech sounds.
The basal ganglia are involved in the planning and execution of speech
movements. They also play a role in the formation of grammatical structures
and in the selection of appropriate words during language production.
And finally, the cerebellum, traditionally associated with motor coordi-
nation and balance, is also involved in cognitive processes such as language
processing. It is responsible for the coordination of speech movements as
well as for the timing of language production and plays an important role in
learning (foreign) languages.
Conclusion
The question of how humans acquire language has occupied science for
millennia. The study of human language and language development has
deepened our understanding of how language is acquired, processed, and
represented in the brain. Theories such as Chomsky’s Universal Grammar
and usage-based approaches offer different perspectives on language acquisi-
tion, while neuroscience research expands our knowledge of the underlying
brain mechanisms.
In the context of Artificial Intelligence and large language models like
GPT-4, these insights provide important knowledge for the development
and improvement of language processing systems. The study of human lan-
guage abilities and language development can help optimize the architecture
and learning mechanisms of such models and improve their ability to under-
stand and generate natural language.
Future research in linguistics and neuroscience could help further close
the gap between human and artificial language processing. The integration
of insights from various disciplines such as cognitive linguistics, neurosci-
ence, and AI research could lead to even more powerful and human-like lan-
guage models. Furthermore, such models could deepen our understanding
of human language and cognition by serving as tools for investigating lin-
guistic phenomena and cognitive processes.
References
Bybee, J. L., Perkins, R. D., & Pagliuca, W. (1994). The evolution of grammar:
Tense, aspect, and modality in the languages of the world (Vol. 196). University of
Chicago Press.
Bybee, J. L. (2013). Usage-based theory and exemplar representations of con-
structions. In Th. Hoffmann & G. Trousdale (Eds.), The Oxford Handbook of
Construction Grammar (online edn, 16 Dec. 2013), Oxford Academic. https://
doi.org/10.1093/oxfordhb/9780195396683.013.0004. Accessed 9 Aug 2023.
Chomsky, N. (2012). On the nature, use and acquisition of language. In Language
and meaning in cognitive science (pp. 13–32). Taylor and Francis.
Chomsky, N. (2014). Aspects of the theory of syntax (Vol. 11). MIT press.
9
Consciousness
An age-old mystery
For more than 2000 years, the question of understanding consciousness has
been at the center of interest for many philosophers and scientists. Modern
philosophy distinguishes between an easy and a hard problem (Chalmers,
1995). While the easy problem consists of explaining the function, dynam-
ics, and structure of consciousness, the hard problem is to explain whether
and why any physical system, be it a human, an animal, a fetus, a cell orga-
noid, or an AI (Bayne et al., 2020), is conscious and not unconscious at all.
Throughout history, many different perspectives have been proposed, rang-
ing from the pessimistic view of “Ignorabimus”—which roughly means
“We will never know”1—to more optimistic mechanistic ideas that even aim
at the construction of an artificial consciousness. These different views have
led to ongoing debates and discussions about the nature of consciousness
and whether it can ultimately be understood or not.
1 Emil du Bois-Reymond made this statement at the 45th annual meeting of German natural scientists
and doctors in 1872 in his lecture on the limits of scientific knowledge, referring to the relationship
between brain processes and subjective experience.
For Aristotle, the various forms in the physical world were simply different
physical states, and there was no non-physical or mental substance beyond
the physical world.
Dualism, on the other hand, assumes that both mental and physical sub-
stances are possible. This thought was first developed by Plato, who believed
that mind and body exist in two separate worlds. According to Plato’s idea,
the mind was part of the ideal world of forms, it was immaterial, non-ex-
tended, and eternal. The idea of an ideal circle, for example, exists in the
mind as a perfect concept, although no physical circle in the material world
can ever truly correspond to it. In contrast, the body belongs to the mate-
rial world, it is extended and perishable. Concrete physical circles could be
found in the material world, but they would always be imperfect and change.
“…imagine that one has webbing on one’s arms, which enables one to fly
around at dusk and dawn catching insects in one’s mouth; that one has very
poor vision, and perceives the surrounding world by a system of reflected
high-frequency sound signals; and that one spends the day hanging upside
down by one’s feet in an attic. In so far as I can imagine this (which is not very
far), it tells me only what it would be like for me to behave as a bat behaves.
But that is not the question. I want to know what it is like for a bat to be a
bat.” (Nagel, 1974).
processes that a bat uses to navigate and interact with its environment in
total darkness.
According to Nagel’s view, consciousness is not just about objective facts
or physical processes, but also about subjective experiences, i.e., what it is
like to be a particular organism. He suggests that we need to develop a new
kind of science that takes into account subjective experience if we hope to
fully understand the nature of consciousness.
He further argues that the subjective nature of consciousness undermines
any attempt to decipher it with objective and reductionist means, i.e., the
means of natural sciences. He believes that the subjective character of experi-
ence cannot be explained by a system of functional or intentional states.
He concludes that consciousness cannot be fully understood if its subjec-
tivity is ignored, as it cannot be explained reductionistically, as it is a mental
phenomenon that cannot be reduced to materialism. Nagel concludes with
the assertion that physicalism, while not wrong, is also only incompletely
understood, as it lacks the characterization of subjective experience. This, in
turn, is a necessary prerequisite for understanding the mind-body problem.
At its core, he thus represents a dualistic perspective, much like Plato two
and a half millennia before him.
(Crick & Koch, 2003; Koch, 2004). In other words, NCCs are the brain
processes necessary for the occurrence of consciousness. The concept of NCC
is based on the idea that there is a close relationship between neural activity
and conscious experience. When we perceive something consciously, there
are certain patterns of neural activity that are always associated with this
experience. By investigating these patterns of neural activity, we hope to gain
a better understanding of the mechanisms underlying conscious experience.
There are various approaches to identifying NCCs. One approach is to com-
pare brain activity during conscious and unconscious states such as sleep or
anesthesia. Another approach is to investigate the changes in neural activity
that occur when a person perceives a stimulus, e.g., a visual or auditory cue.
Francis Crick2 and Christof Koch proposed that brain waves with a fre-
quency range between 30 and 100 cycles per second, so-called gamma oscil-
lations, play a crucial role in the emergence of consciousness (Crick & Koch,
1990). Koch further developed this concept and investigated the neural
correlates of consciousness in humans (Tononi & Koch, 2008; Koch et al.,
2016). Accordingly, activity in the primary visual cortex is essential for con-
scious perception, but not sufficient, as activity in the hierarchically higher
cortical areas of the visual system is more closely correlated with the various
aspects of visual perception and damage to these areas can selectively impair
the ability to perceive certain features of stimuli (Rees et al., 2002)—a phe-
nomenon referred to as agnosia. We will return to this later in this chapter.
Furthermore, Koch suggests that the precise timing or synchronization of
neural activity may be much more important for conscious perception than
simply the extent of neural activity. Recent studies using imaging techniques
on visually triggered activity in parietal and prefrontal cortex regions seem to
confirm these hypotheses (Boly et al., 2017).
Based on the idea of the neural correlates of consciousness, a kind of
measurement procedure was even developed, which aims to determine the
degree of consciousness (Perturbational Complexity Index) e.g., in comatose
patients, independently of the activity associated with sensory and motor
processing (Seth et al., 2008; Casali et al., 2013; Casarotto et al., 2016).
While this approach may be suitable for quantifying the complexity of neu-
ral activity (Demertzi et al., 2019), it does not provide information about
the underlying neural circuits and the “algorithms” implemented in them.
2 This is, by the way, the same Francis Crick who deciphered the double helix structure of DNA in 1953, together with James Watson.
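The Perturbational Complexity Index itself is derived from TMS-evoked responses, but its core ingredient, the Lempel-Ziv compressibility of a binarized activity pattern, can be sketched in a few lines. The function below is a simplified illustration of the general idea, not the published PCI algorithm.

```python
import random

def lempel_ziv_complexity(binary_string: str) -> int:
    """Count the distinct phrases in a simple Lempel-Ziv-style parsing of the
    string; richer, less compressible patterns yield higher counts."""
    phrases, current = set(), ""
    for bit in binary_string:
        current += bit
        if current not in phrases:
            phrases.add(current)
            current = ""
    if current:
        phrases.add(current)
    return len(phrases)

random.seed(0)
regular = "01" * 100                                         # highly predictable
noisy = "".join(random.choice("01") for _ in range(200))     # unstructured

print(lempel_ziv_complexity(regular))  # low complexity
print(lempel_ziv_complexity(noisy))    # considerably higher
```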
Fig. 9.1 Global workspace. Various neural processes (perception systems, the attention system, long-term memory, the limbic system, and the motor system) interact with each other and exchange information. The global workspace acts as a central “stage” where consciously perceived information is processed and coordinated
Emotions are body states caused by so-called somatic markers, i.e., any-
thing that any receptors inside the body can measure, such as blood pressure,
heart rate, blood pH, blood sugar level, concentration of certain hormones,
and so on and so forth. The totality of all somatic markers, i.e., all measure-
ments from inside the body at a certain point in time, corresponds to the
body state. This is represented and controlled by a multitude of brain regions,
including the amygdala, the insula, and the prefrontal cortex. These brain
regions work together to represent the state of the organism and to evoke and
control the coordinated physiological reactions to environmental stimuli in a
feedback loop, e.g., changes in heart rate, breathing, and hormone levels.
Feelings, on the other hand, arise from the conscious perception and
interpretation of these physiological reactions. For example, the feeling of
fear can be evoked by the physiological reactions associated with the emo-
tion of fear, such as an increased heart rate and sweating. In Damasio’s view,
feelings give us a conscious representation of our bodily states and enable us
to understand and respond to the world around us.
Damasio also emphasizes the role of emotions and feelings in deci-
sion-making and behavior. He assumes that emotions provide us with
important information about the environment and contribute to steer-
ing our actions in an adaptive way. For example, the feeling of disgust can
help us avoid potentially harmful substances, while the feeling of love can
encourage us to seek social contacts.
Hierarchies of Consciousness
pronounced and differentiated. New functions are added through the pro-
gressive myelination and the associated functional integration of ever new
cortex areas into the existing nervous system.
own body. Here too, we can try to interpret the symptoms in the context of
Damasio’s consciousness model.
Since both the flow of sensory information from the body to the brain and
the neural structures responsible for generating the core self are not impaired
in Locked-in Syndrome, the core self remains intact. Despite the almost
complete lack of motor output, the brain still receives sensory input and
can thus continue to generate the core self. This allows the person to main-
tain a basic level of self-perception and consciousness. However, Locked-in
Syndrome interrupts the person’s ability to interact with the world through
motor actions, which can have long-term effects on the extended self (Kübler
& Birbaumer, 2008), as the extended self is based on autobiographical mem-
ories, personal identity, and the ability to plan and think about the future.
The inability to communicate and interact with the environment can lead
to difficulties in maintaining and updating the extended self. However, the
extended self does not completely disappear, as the individual is still able to
process and create new memories based on sensory experiences.
Agnosia is a neurological phenomenon in which people have difficulty
recognizing sensory information, even though the corresponding sensory
organs are intact. There are a wide variety of different forms of agnosia in
all sensory modalities. In visual agnosia, the recognition of visual stimuli is
disturbed, while vision is intact. Examples are prosopagnosia, where sufferers
can no longer recognize faces, object agnosia, which impairs object recogni-
tion, or color agnosia, where sufferers perceive the world in black and white
only. Particularly bizarre is akinetopsia, which impairs motion perception.
Patients suffering from it perceive the world only as a stroboscopic sequence
of single images. Depending on the form, aphasias impair various aspects
of language comprehension and speech. And there are many more of these
impairments of the extended self.
Of course, in all these cases, the affected individuals are fully conscious.
Their proto- and core self is completely intact. Only very specific functions
of their extended self are selectively impaired.
Conclusion
The mind-body problem is far from solved and remains the subject of
intense debates among philosophers, neuroscientists, and psychologists.
However, it is clear that our understanding of the relationship between mind
and body is essential for understanding the nature of human consciousness
and subjective experience.
References
Baars, B. J. (1994). A global workspace theory of conscious experience. In
Consciousness in Philosophy and Cognitive Neuroscience, (pp. 149–171). Erlbaum.
Baars, B. J. (2017). The global workspace theory of consciousness. In The Blackwell
Companion to Consciousness, (pp. 236–246). Wiley.
Baars, B. J., & Newman, J. (1994). A neurobiological interpretation of global work-
space theory. Consciousness in Philosophy and Cognitive Neuroscience, 211–226.
Bayne, T., Seth, A. K., & Massimini, M. (2020). Are there islands of awareness?
Trends in Neurosciences, 43(1), 6–16.
Boly, M., Massimini, M., Tsuchiya, N., Postle, B. R., Koch, C., & Tononi, G.
(2017). Are the neural correlates of consciousness in the front or in the back of
the cerebral cortex? Clinical and neuroimaging evidence. Journal of Neuroscience,
37(40), 9603–9613.
Tononi, G., Boly, M., Massimini, M., & Koch, C. (2016). Integrated information
theory: From consciousness to its physical substrate. Nature Reviews Neuroscience,
17(7), 450–461.
Tononi, G., & Koch, C. (2008). The neural correlates of consciousness: An update.
Annals of the New York Academy of Sciences, 1124(1), 239–261.
10
Free Will
Fig. 10.1 Free will. The existence of free will is disputed in philosophy and neurosci-
ence. However, there is no doubt about the existence of the village of the same name
in the district of Schleswig-Flensburg in the north of Germany.
Fig. 10.2 The Libet Experiment. The EEG trace shows the readiness potential building
up over time. A subject performs a voluntary movement while his brain activity is
monitored using EEG. A clock-like device allows him to record the time of his
conscious decision. He then indicates the position of the rotating pointer on the
clock at the decision time. Finally, brainwave measurements and decision reports are
compared. Results suggest that unconscious processes already initiate actions
(−500 ms) before the conscious decision to do so is made (−200 ms) and the action is
then finally executed (0 ms)
fourth, the results of the brainwave measurement were compared with the
decision reports to determine whether the participants’ conscious decision
was indeed the cause of the voluntary movement (Fig. 10.2).
The most important result of Libet’s experiment was that the readiness
potential in the brain already increased significantly (500 to 300 milliseconds)
before the time when the participants reported that they had become aware
of their decision to perform the movement. Libet concluded from this that
unconscious processes in the brain initiate voluntary actions before con-
scious decisions are made.
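The temporal ordering that Libet reported can be summarized in a small script using the round numbers from the text (relative to movement onset at 0 ms):

```python
# Schematic timeline of the Libet experiment; values are the approximate
# figures quoted in the text, relative to movement onset at 0 ms.
events = {
    "readiness potential begins (EEG)": -500,
    "reported time of conscious decision": -200,
    "movement is executed": 0,
}

for name, t in sorted(events.items(), key=lambda kv: kv[1]):
    print(f"{t:+5d} ms  {name}")

lead = events["reported time of conscious decision"] - events["readiness potential begins (EEG)"]
print(f"\nBrain activity precedes the reported decision by about {lead} ms")
```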
Conclusion
The results of the Libet experiment seem to argue against the existence of
free will. They have therefore been widely discussed and have led to heated
controversies. Critics claim that the methodology of the experiment is
flawed and the conclusions about free will are too deterministic. It is also
argued that the decision to perform a certain movement at a certain time
oversimplifies the nature of free will. For example, this simple action is not
comparable to freely deciding for or against a certain course of study, a cer-
tain place of residence, or a life partner.
From the author’s perspective, it would be highly strange if our (free)
decisions were completely independent of our genetic, neural, and autobi-
ographical history. Of course, important life decisions like the ones men-
tioned above are made against the backdrop of our personal experience and
life history, which in turn are stored in the brain as knowledge and memo-
ries (Hallett, 2007; Lau, 2009).
Imagine if Libet’s results had been exactly the opposite: The test subjects
report their decision to perform the movement before the corresponding neu-
ral activity is measurable. This result would have been much more disturbing,
as it would have implied that there are biology-independent mental processes
that are capable of causing changes in the brain. This would have been de
facto evidence of dualistic views on the relationship between brain and mind.
Some even went so far as to question the practice of punishment with
the argument that if the will is not free, the person cannot be held respon-
sible for their actions (Roth, 2010, 2012; Singer, 2020). This of course
overlooks that punishments can still be meaningful as they serve different
purposes and work on different levels (Feinberg, 1965; Van den Haag, 1975;
Jakobs, 2011; Kühl, 2017). First, victims of crimes, even if the existence of
free will is questioned, still have a need for justice, atonement, and repara-
tion. Second, resocialization, regardless of the question of free will, can con-
tribute to rehabilitating the offender and imparting the skills and knowledge
to become a constructive member of society. In this context, resocialization
focuses on changing behaviors and thought patterns that have led to crimes,
not on whether the offender consciously made these decisions. Third, the
punishment of offenders can serve as a deterrent for potential imitators by
clarifying the consequences of crimes. Even if people do not have free will,
they can still respond to incentives and deterrence. The sanction system can
thus help to reduce the frequency of crimes by deterring potential offend-
ers from criminal behavior. And finally, the imprisonment of an offender
can protect society from further crimes by this person. Even if the offender
does not have free will, he could still pose a danger to society if left free.
Thus, punishment and imprisonment of an offender can remain meaningful
and justified even if the existence of free will is questioned (Viney, 1982;
Stroessner & Green, 1990; Hallett, 2007; Mobbs et al., 2009; Hodgson,
2009; Vincent et al., 2011; Focquaert et al., 2013).
References
Busch, P., Heinonen, T., & Lahti, P. (2007). Heisenberg’s uncertainty principle.
Physics Reports, 452(6), 155–176.
Dirac, P. A. M. (1925). The fundamental equations of quantum mechanics.
Proceedings of the Royal Society of London. Series A, Containing Papers of a
Mathematical and Physical Character, 109(752), 642–653.
Dirac, P. A. M. (1926). On the theory of quantum mechanics. Proceedings of the
Royal Society of London. Series A, Containing Papers of a Mathematical and
Physical Character, 112(762), 661–677.
Earman, J. (1986). A primer on determinism (Vol. 37). Springer Science & Business
Media.
Ekstrom, L. (2018). Free will. Routledge.
Feinberg, J. (1965). The expressive function of punishment. The Monist, 49(3),
397–423.
Focquaert, F., Glenn, A. L., & Raine, A. (2013). Free will, responsibility, and the
punishment of criminals. The future of punishment, 247–274.
Hallett, M. (2007). Volitional control of movement: The physiology of free will.
Clinical Neurophysiology, 118(6), 1179–1192.
Hallett, M. (2009). Physiology of volition. Downward causation and the neurobiol-
ogy of free will, 127–143.
Hallett, M. (2011). Volition: How physiology speaks to the issue of responsibility.
Conscious will and responsibility, 61–69.
Harris, S. (2012). Free will. Simon and Schuster.
Hodgson, D. (2009). Criminal responsibility, free will, and neuroscience.
Downward causation and the neurobiology of free will, 227–241.
Jakobs, G. (2011). Strafrecht: Allgemeiner Teil. de Gruyter.
James, W. (1884). The dilemma of determinism (pp. 1878–1899). Kessinger
Publishing.
Kane, R. (Ed.). (2001). Free will. Wiley.
Koch, C. (2009). Free will, physics, biology, and the brain. Downward causation
and the neurobiology of free will, 31–52.
Kožnjak, B. (2015). Who let the demon out? Laplace and Boscovich on determin-
ism. Studies in History and Philosophy of Science Part A, 51, 42–52.
11 What is Artificial Intelligence?
Despite all the hype and excitement about AI, it is still very limited
compared to human intelligence.
Andrew Ng
History of AI
The idea that human intelligence or the cognitive processes that humans use
could be mechanized or automated is very old. One of the first mentions
of this idea can be found in Julien Offray de La Mettrie’s work L’Homme
Machine, published in 1748. Another theoretical precursor of AI is
Laplace’s Demon, which we have already encountered in the chapter on free
will. It is named after the French mathematician, physicist, and astronomer
Pierre-Simon Laplace. The concept is based on the idea that the entire uni-
verse functions like a mechanical machine, similar to a clock, and that the
human mind and intelligence function in the same way.
Artificial Intelligence (AI) is based on the belief that human thinking can
be structured and systematized. The roots of this concept go back to the 1st
millennium BC, when Chinese, Indian, and ancient Greek philosophers
developed structured techniques for formal reasoning. Later, in the seven-
teenth century, philosophers like René Descartes and Gottfried Wilhelm
Leibniz tried to formalize rational thinking and make it as precise as algebra
or geometry. They considered thinking as equivalent to the manipulation
of symbols. This model later served as the basis for research in the field of
Artificial Intelligence.
The study of mathematical logic played a crucial role in the development
of AI in the twentieth century. One of the most important contributions
came from George Boole and Gottlob Frege, who laid the foundations for
the formal manipulation of symbols. These works established a set of rules
and principles for the manipulation of symbols in a logical system, which
proved essential for the development of intelligent machines.
Another important contribution was the Church-Turing thesis, a funda-
mental concept in computer science. It states that anything a human can compute by following a fixed, step-by-step procedure with paper and pencil can also be computed by a machine, specifically a mechanical machine that can process zeros and ones.
This concept laid the foundation for the development of digital computers
and the theory of computing.
The Turing Machine, a theoretical computing model proposed by
Alan Turing, was a crucial breakthrough in the development of Artificial
Intelligence. It is a simple and abstract model that captures the essential
features of any mechanical machine capable of performing abstract sym-
bol manipulations. The Turing Machine allowed researchers to formalize
the concept of computing and develop algorithms that can be executed by
machines.
The Dartmouth Conference, which took place in the summer of 1956
at Dartmouth College in Hanover, New Hampshire, is considered the
founding event of the academic field of Artificial Intelligence (AI). The
conference was a six-week workshop titled “Dartmouth Summer Research
Project on Artificial Intelligence,” organized by John McCarthy as part of a
research project funded by the Rockefeller Foundation. The term “Artificial
Intelligence” was used for the first time in the announcement of this con-
ference. Some of the brightest minds in computer science and related fields
attended the conference, including McCarthy himself, Marvin Minsky,
Nathaniel Rochester, and Claude Elwood Shannon, the founder of informa-
tion theory. During the conference, the participants discussed the possibility
of creating machines that can think and argue like humans. They exam-
ined various AI approaches such as logic-based systems, neural networks,
and heuristic search algorithms. They also discussed the ethical implications
of developing intelligent machines and the question of whether it is even
possible to develop truly intelligent machines. The Dartmouth Conference
established AI as an academic discipline and provided a roadmap for future
research in this field.
Concept Clarification
Artificial Intelligence generally refers to machines capable of intelligent
behavior (Russell, 2010). Unlike the natural intelligence of animals and
humans, which is based on biological processes, AI is based on algorithms
and software developed by humans. AI systems are programmed to perform
tasks that normally require human intelligence, such as pattern recognition,
learning from experience, decision-making, and problem-solving. An impor-
tant aspect of AI systems is their ability to be autonomous and adaptable.
They can learn from experiences and feedback and improve over time. This
process is also referred to as Machine Learning and includes various tech-
niques such as supervised learning, unsupervised learning and reinforcement
learning. There are many different types of AI systems, each specialized in
their own way. Natural language processing, for example, allows machines to
understand and respond to human language (Görz & Schneeberger, 2010;
Russell, 2010).
Expert systems use knowledge in the form of symbols and rules to solve
problems and make decisions. Autonomous robots are another example of
AI systems specialized in physical actions. They can perform tasks such as
navigation, object manipulation, and interaction with the environment.
Multi-agent systems, on the other hand, are a type of swarm of autonomous
agents that work together to achieve a common goal (Görz & Schneeberger,
2010; Russell, 2010).
When artificial intelligence is discussed in the media and politics today, it
usually refers to only a relatively small part of these approaches. All the spec-
tacular successes and breakthroughs of the last one or two decades—whether
it’s composing music, playing Go, generating text, or classifying images—are
based on machine learning and pattern recognition, a subfield of AI, and
here in particular on deep learning, which is only a part of machine learn-
ing. In public discourse, the terms artificial intelligence, machine learning,
and deep learning are often used synonymously today (Fig. 11.1). Often in
the context of AI, “the algorithms” are mentioned. An algorithm is similar to a recipe: a finite, precisely defined sequence of instructions for solving a problem.
Fig. 11.1 Deep learning is a subfield of machine learning / pattern recognition, which in turn is a subfield of artificial intelligence
The human brain also uses pattern recognition to process and interpret
sensory impressions. It has specialized regions responsible for certain types
of pattern recognition. The brain’s ability to learn patterns and adapt to new
patterns is crucial for intelligent, goal-directed behavior and our ability to
interact with the world.
Deep Learning
Deep Learning is a branch of Machine Learning that focuses on artificial
neural networks with multiple layers of interconnected neurons. The more
layers a neural network has, the “deeper” it is. Modern architectures can
have hundreds of layers and are capable of processing and learning from
large amounts of data by recognizing complex patterns and establishing
abstract connections. One of the greatest achievements of Deep Learning
is the ability to train neural networks to process large amounts of unstruc-
tured data, such as images or voice recordings, and recognize complex cor-
relations and patterns. This has led to breakthroughs in many application
areas, including language processing, image recognition and segmentation,
and has enabled advances in medical research, natural sciences, and other
fields (Schmidhuber, 2015; LeCun et al., 2015; Goodfellow et al., 2016).
The main advantage of Deep Learning is the ability to recognize and learn
meaningful patterns in complex and unstructured data without the need for
human expertise. Deep Learning models can even work with data that is dif-
ficult or impossible for the human mind to interpret, such as digital X-rays.
This makes Deep Learning an enormously powerful tool in many fields.
Despite the incredible advances in recent years, there are still some chal-
lenges in the development and use of Deep Learning models. One of the
biggest problems is the difficulty of interpreting and explaining the work-
ings of deep neural networks. It is often hard to understand why a particular
model made a certain decision or how it recognized certain patterns.
Conclusion
The successes of AI have been remarkable in recent years and cover a wide
range of applications. One of the most impressive advances is the ability of
AI systems to understand human language and process natural language.
This has led to significant advances in speech recognition and translation.
AI has also made progress in image processing by being able to recognize
and analyze images. This has contributed to the development of face recog-
nition, object detection, and medical imaging. Many practical applications
of Artificial Intelligence (AI) are often integrated so quickly into everyday
products that they are no longer perceived as AI. An example of this is text
recognition, which is now standard in many smartphones and is taken for
granted.
Because of this phenomenon known as the AI effect, it may seem that AI
research is only struggling with difficult problems that it has not yet solved,
such as the ability to make complex decisions in dynamic environments, or
the problem that it is difficult to interpret and explain the workings of deep
neural networks.
A famous quote by computer scientist Larry Tesler, “Intelligence is what
machines have not yet done,” expresses this.
References
Bishop, C. M., & Nasrabadi, N. M. (2006). Pattern recognition and machine learn-
ing (Vol. 4, No. 4, p. 738). Springer.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
Görz, G., & Schneeberger, J. (Eds.). (2010). Handbuch der künstlichen Intelligenz.
De Gruyter.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553),
436–444.
Russell, S. J. (2010). Artificial intelligence a modern approach. Pearson Education,
Inc.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural
networks, 61, 85–117.
12 How Does Artificial Intelligence Learn?
After my training as a computer scientist in the 90s, everyone knew that AI did not
work. People tried. They tried with neural networks, but nothing worked.
Sergey Brin
Artificial Neuron
As in the brain, the neuron is also the fundamental processing unit in many
areas of AI. Artificial neurons are simplified mathematical models of their
biological counterparts. They receive their input in the form of real num-
bers through several input channels. As in the brain, this input can either
come “from outside” or from other neurons. Each input channel has a
weight, which corresponds to the synapse in biology, with which the respec-
tive input is multiplied. Subsequently, all weighted inputs of the neuron are
summed up. Since all inputs of a neuron can be summarized into an input
vector and analogously all corresponding weights can be summarized into
a weight vector, the weighted sum of the inputs corresponds to the scalar
product of the input and weight vector. This then goes into an activation
function to determine the output of the neuron. In the McCulloch-Pitts
neuron (McCulloch & Pitts, 1943), which represents the earliest and sim-
plest artificial neuron, the activation function corresponds to a threshold
function, i.e., the weighted sum of the inputs is compared with a certain threshold. If the sum exceeds the threshold, the neuron sends an output (a one), otherwise it does not (which corresponds to a zero).
Fig. 12.1 Artificial Neuron. The basic processing unit in Artificial Intelligence.
Analogous to its biological model, it takes input x through several channels and
multiplies it with associated weights w, similar to the synapses in the brain. These
weighted inputs are summed up and processed through an activation function to
determine the output y of the neuron. In simple models like the McCulloch-Pitts neu-
ron, the activation function is a threshold function, leading to binary outputs (0 or 1).
More complex models use advanced activation functions for continuous outputs
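To make the computation concrete, here is a minimal sketch in Python/NumPy (with arbitrary toy values, purely for illustration) of such a threshold neuron: the scalar product of weight and input vector is compared with a threshold.

```python
import numpy as np

def threshold_neuron(x, w, threshold):
    """McCulloch-Pitts-style neuron: weighted sum (scalar product) followed by a threshold."""
    weighted_sum = np.dot(w, x)               # scalar product of weight and input vector
    return 1 if weighted_sum > threshold else 0

x = np.array([0.5, 1.0, 0.0])                 # input vector (three input channels)
w = np.array([0.8, -0.2, 0.4])                # weight vector (the "synapses")
print(threshold_neuron(x, w, threshold=0.1))  # -> 1, since 0.4 - 0.2 + 0.0 = 0.2 > 0.1
```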
In a network of many such neurons, all connection weights can be collected in a weight matrix. This matrix can either contain all pairwise weights of all neurons of the network, including all self-connections (autapses). In this case, one
speaks of a complete weight matrix. However, a weight matrix can also only
contain the forward-directed weights between the neurons of two successive
layers of a network.
In the simplest case, the network consists only of an input and an output
layer. These networks, also known as perceptrons (Rosenblatt, 1958), calcu-
late a weighted sum of the inputs and then apply an activation function to
generate the output. Due to this simple architecture, these neural networks
can only learn linear classifications, i.e., the data in the input space must be
linearly separable (Figs. 12.2 and 12.3).
Due to their linear character, two-layer networks are only limitedly able
to solve classification tasks. A famous example that illustrates the limits of
two-layer neural networks is the XOR problem. The XOR function (eXclu-
sive OR) is a binary operation that outputs 1 when the two input values are
different and 0 when they are the same. It turns out that this problem is not
linearly solvable and thus cannot be learned by two-layer networks.
To overcome this limitation and solve problems with more complex deci-
sion boundaries, multi-layer neural networks with one or more hidden layers
can be used. These hidden layers allow the network to learn nonlinear trans-
formations of the input data, so they can capture more complex relation-
ships and decision boundaries, making them ultimately more versatile and
powerful for various tasks.
In fact, the so-called universal approximation theorem states that, in principle, a neural network with a single hidden layer and a finite number of neurons can approximate any continuous function to arbitrary accuracy (Cybenko, 1989).
Fig. 12.2 Two-layer network. The perceptron consists only of an input and an out-
put layer. The weight vectors are summarized as a weight matrix
Fig. 12.3 The XOR problem. A two-layer neural network cannot solve the XOR prob-
lem. There is no suitable set of weights and threshold, as the two output classes (0 and 1)
are not linearly separable, i.e., the decision boundary between the classes is not a
straight line. However, the logical AND can be solved with a simple two-layer net-
work, as the solutions here are linearly separable
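The difference can be illustrated with a small Python/NumPy sketch (the weights below are hand-picked for illustration, not learned): a single threshold unit computes AND, while XOR only works once a hidden layer is added. The hidden layer effectively computes OR and AND of the inputs; the output neuron then combines them into "OR and not AND", which is exactly XOR.

```python
import numpy as np

def step(z):                                  # threshold activation
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# AND: linearly separable, a single output neuron suffices
w_and, b_and = np.array([1.0, 1.0]), -1.5
print(step(X @ w_and + b_and))                # [0 0 0 1]

# XOR: not linearly separable; one hidden layer with two neurons solves it
W_hidden = np.array([[1.0, 1.0], [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])             # hidden units act as OR and AND detectors
w_out, b_out = np.array([1.0, -1.0]), -0.5    # output: OR and not AND
hidden = step(X @ W_hidden + b_hidden)
print(step(hidden @ w_out + b_out))           # [0 1 1 0]
```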
Supervised Learning
For supervised learning, labeled data, i.e., label-data pairs, are required. An
application example is image classification. Here, there must be a label for
each image with information about what is seen in the image or to which
category the image belongs. In an image dataset, “cat”, “apple” or “car” can
be possible labels. The labels correspond to the desired output of the model
in supervised learning.

Fig. 12.4 Multi-layer neural network. Artificial neural networks are complex structures built from interconnected artificial neurons. In such a structure, several neurons can be present in one layer, connected to other neurons via weights. Hidden layers or intermediate layers are located between the input and the output layer. Deep learning refers to the use of deep neural networks, which consist of a large number of hidden layers. The more layers a neural network has, the deeper it is. All these networks are feedforward architectures, i.e., forward-directed networks, as the input flows in one direction without feedback loops, namely from the input to the output layer
For the training of the neural network, the images are first broken down
into individual pixels, with the color or grayscale values coded as numerical
values serving as input channels for the first layer of the neural network. In a
fully connected network, each input channel is connected to each neuron of
the next layer. The activations of that layer are obtained by multiplying the inputs with the weights, summing them up, and applying the activation function.
This process can be repeated as often as desired in the various layers. At the
end of the neural network is the output layer, which makes a decision, e.g.,
the classification of the image. If the network is already trained and has
learned to classify a certain image correctly, the task is completed. However,
if the network is not yet sufficiently trained, an error may occur. For exam-
ple, in the output layer, the “apple” neuron could be most strongly activated,
even though the input layer received an image of a “banana” as input.
During training, the desired output is compared with the actual output
and an error is calculated using the loss or cost function. Since it is similar
to a teacher monitoring the performance of the network and returning cor-
responding error messages, this learning paradigm is referred to as supervised
learning. The desired output for a given input, such as the appropriate label
(apple, banana) for an image, is usually provided by humans. You are prob-
ably familiar with these CAPTCHAs (completely automated public Turing
test to tell computers and humans apart) on certain websites, where you are
asked to confirm that you are not a robot. For example, you are asked to
click on all images showing a traffic light. In this way, large amounts of labe-
led data for supervised learning are generated.
The goal of supervised learning is to minimize the cost function, i.e., to
reduce the sum of all errors. To do this, the individual errors are propagated
backwards from the output layer to the input layer through all layers. This
process is therefore referred to as Error-Backpropagation and allows for a
readjustment of the synaptic weights. This means that the errors are used to
change the weights between two layers in such a way that the correct output
is produced with a slightly higher probability the next time. This is done
by calculating so-called gradients, which indicate the directions in which
the weights need to be changed. This basic technique of machine learning is
therefore also referred to as Gradient Descent (Schmidhuber, 2015; LeCun
et al., 2015; Goodfellow et al., 2016).
In a figurative sense, the error landscape corresponds to a mountain range
with many peaks and valleys. The higher you are, the greater the total error
of the network. The gradients indicate at each point the direction in which
it goes down steepest. By taking a small step in the direction of the steepest
descent at each time step, you eventually end up in a valley, which corre-
sponds to a lower total error of the network.
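As a minimal illustration of this loop (forward pass, error, gradient, a small step downhill), here is a Python/NumPy sketch for a single linear neuron with a mean-squared-error cost function; the data, learning rate, and number of epochs are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 training examples, 3 input channels
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)    # desired outputs ("labels")

w = np.zeros(3)                                # weights to be learned
lr = 0.1                                       # learning rate (step size downhill)

for epoch in range(200):
    y_pred = X @ w                             # forward pass
    error = y_pred - y                         # actual minus desired output
    loss = np.mean(error ** 2)                 # cost function (mean squared error)
    grad = 2 * X.T @ error / len(y)            # gradient of the loss w.r.t. the weights
    w -= lr * grad                             # one step in the direction of steepest descent

print(w)                                       # approaches [2.0, -1.0, 0.5]
```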
At the end of the training, the test accuracy of the neural network is usu-
ally determined. This is done by calculating the ratio of correctly predicted
or classified objects to the total number of objects in the dataset. For exam-
ple, if a model trained to classify images correctly classifies 90 out of 100
images, then the test accuracy of the model is 90%.
Usually, the entire available dataset is randomly divided into a training and a test dataset, typically in a ratio of 80 to 20, a practice referred to as
dataset splitting. The idea behind this is that you want to test how well the
neural network can generalize, i.e., how well it can handle previously unseen
data. The network is therefore trained exclusively with the training dataset.
At the end of the training, the test dataset is used to determine the test accuracy of the neural network.
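A minimal Python/NumPy sketch of this practice (the function and variable names are illustrative assumptions, not from any particular library):

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Randomly split a dataset into a training and a test part (e.g., 80/20)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

def accuracy(y_true, y_pred):
    """Fraction of correctly classified objects, e.g., 90 out of 100 -> 0.9."""
    return np.mean(y_true == y_pred)
```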
Unsupervised Learning
In contrast to supervised learning, no labeled data is required for unsuper-
vised or self-supervised learning.
Unsupervised learning is about extracting patterns and structures from
data without the need for labeled data. This type of learning typically
includes tasks such as clustering, where similar data points are grouped, and
dimensionality reduction, where high-dimensional data is represented in a
low-dimensional space while preserving important information. Examples of
unsupervised learning algorithms include K-Means clustering, hierarchical
clustering, and Principal Component Analysis (PCA). Since they belong to
the field of machine learning but do not represent artificial neural networks
and thus are not the focus of this book, we do not want to delve further into
the presentation of these methods (MacKay, 2003; Bishop & Nasrabadi,
2006).
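For readers who nevertheless want to try these methods, a short sketch with the scikit-learn library could look like this (data and parameter choices are arbitrary assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(300, 10))        # unlabeled data

clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)  # group similar data points
X_2d = PCA(n_components=2).fit_transform(X)                # reduce to 2 dimensions
```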
Self-supervised Learning
Self-supervised Learning is on the one hand a special case of unsupervised
learning and on the other hand a category of learning methods in its own
right (Liu et al., 2021). In this case, the algorithm generates its own super-
vision signal (label) from the input data. For example, the model is trained
to predict or reconstruct certain aspects of the data, e.g., the next image in
a video or the context of a specific word in a sentence. In self-supervised
learning, the data itself provides the labels or the supervision signal, with
which the parameters of the model are updated during training. In this way,
it is possible to learn from large amounts of unlabeled data, which are often
much more extensive than labeled data. By learning to extract useful rep-
resentations from the data, self-supervised learning can be used to improve
the performance of a wide range of downstream tasks, including classifica-
tion, object detection, and language understanding.
Often, the neural network is trained on an auxiliary task using the
self-generated labels, which should help it, for example, in learning useful
representations. After learning the auxiliary task, the learned representations
can be used for downstream tasks.
An example of self-supervised learning is the task of predicting the next
word in a sentence (or a masked word) in the context of natural language
processing. In this case, the model generates its own supervision signal by
using the surrounding words as context and thus learns useful language rep-
resentations. In this way, so-called word vectors are generated, which form
the basis of ChatGPT and Co. More on this in the chapter on language-tal-
ented AI.
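The idea that the data provides its own labels can be illustrated with a tiny Python sketch: from a raw sentence, (context, next word) training pairs are generated without any human annotation (a toy example, not the actual procedure used by large language models).

```python
text = "the cat sat on the mat".split()
window = 2   # number of context words used as input

# Build (context, target) training pairs directly from the raw text:
pairs = [(text[i:i + window], text[i + window]) for i in range(len(text) - window)]
for context, target in pairs:
    print(context, "->", target)
# ['the', 'cat'] -> sat
# ['cat', 'sat'] -> on
# ...
```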
Another example of self-supervised learning are autoencoders, which are
also referred to as encoder-decoder networks. This type of neural network consists of two parts. In the encoder, the layers from the input layer to the so-called bottleneck layer become progressively smaller, so that the input is compressed into a low-dimensional representation. The decoder then reconstructs the original input from this compressed representation as accurately as possible.
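A minimal sketch of such an encoder-decoder network in Python with PyTorch (layer sizes and training details are arbitrary assumptions) shows that the input itself serves as the desired output:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))   # compresses to the bottleneck
decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784))   # reconstructs the input
autoencoder = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(32, 784)                  # a batch of (flattened) images
for step in range(100):
    reconstruction = autoencoder(x)
    loss = loss_fn(reconstruction, x)    # the input is its own label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```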
Reinforcement Learning
In contrast to all previous learning methods, reinforcement learning is a
type of machine learning where a model or agent is trained to learn useful
input-output functions, i.e., to make a series of decisions in an uncertain
environment that maximize a cumulative reward (Dayan & Niv, 2008; Li,
2017; Sutton & Barto, 2018; Botvinick et al., 2019; Silver et al., 2021).
In contrast to supervised learning, no outputs or labels are provided to the
model. Instead, the agent receives feedback in the form of rewards or penal-
ties after each action. The goal of the algorithm is to learn a strategy (Policy)
that maps states of the environment to actions in such a way that it leads to
a maximum reward in the long term.
The agent uses trial and error to learn from its experiences in the environ-
ment and randomly tries out different actions to find out which actions lead
to the highest rewards in certain situations, by increasingly using actions
that have already proven to be successful. Over time, the agent’s strategy is
continuously refined and optimized, so that it can make better decisions and
achieve higher rewards.
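As an illustration, here is a minimal Python/NumPy sketch of tabular Q-learning, one classic reinforcement learning algorithm (named here only as an illustrative example; the concrete environment is left open, and the state and action counts are arbitrary assumptions):

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))       # estimated value of each action in each state
alpha, gamma, epsilon = 0.1, 0.99, 0.1    # learning rate, discount factor, exploration rate

def choose_action(state):
    """Trial and error: mostly exploit the best known action, sometimes explore randomly."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    """Move Q(s, a) towards the received reward plus the best value of the next state."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```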
Conclusion
Supervised learning requires labeled data to train models that can correctly
classify input data such as images. In contrast, both unsupervised and
self-supervised learning can process data without labels. In unsupervised
learning, the focus is on discovering patterns and structures in the data,
while self-supervised learning essentially generates its own labels from the
data and thus learns useful representations by solving auxiliary tasks.
In the case of the autoencoder, each input pattern is simultaneously its
own label, as it should be restored as accurately as possible from the com-
pression of the bottleneck layer. The input and desired output are therefore
identical. If the image of an apple is input, the desired output is this image
of the apple. In predicting, for example, the next element to a certain input
sequence (video or text), the next image or word is essentially the label of
this sequence. Training then proceeds again, as in the supervised case, with error backpropagation and gradient descent.
In reinforcement learning, training is focused on making decisions in an
uncertain environment that maximize a cumulative reward, without using
predefined outputs or labels. Instead, the agent uses trial and error to learn
which actions yield the highest rewards, and refines its strategy over time.
The concept originally comes from psychology and the study of animal
behavior.
While we have made considerable progress in artificial intelligence and
machine learning, it is important to emphasize that we are still at the begin-
ning. Neuroscience has the potential to inspire new learning paradigms by
giving us deep insights into the workings of the human brain. For exam-
ple, insights from studies on brain plasticity could lead to new approaches
for learning and adapting neural networks. Similarly, investigations into
the mechanisms the brain uses to process complex sensory data could lead
to new architectures for deep neural networks that are more efficient and
robust. Furthermore, understanding the neural processes involved in deci-
sion-making and problem-solving could provide new approaches for rein-
forcement learning and optimization.
Finally, deeper investigations into the coordination and communication
between different brain regions could contribute to the development of
improved methods for training and coordinating larger ensembles of several different neural networks. Autoencoders with an encoder and a decoder part, or Generative Adversarial Networks, which we will get to know more closely in the chapter on generative AI, are examples of ensemble net-
works made up of two subnetworks. In principle, however, ensembles of
many subnetworks specialized in various tasks are also conceivable.
References
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies
with gradient descent is difficult. IEEE transactions on neural networks, 5(2),
157–166.
Bishop, C. M., & Nasrabadi, N. M. (2006). Pattern recognition and machine learn-
ing (Vol. 4, No. 4, p. 738). Springer.
Botvinick, M., Ritter, S., Wang, J. X., Kurth-Nelson, Z., Blundell, C., & Hassabis,
D. (2019). Reinforcement learning, fast and slow. Trends in Cognitive Sciences,
23(5), 408–422.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function.
Mathematics of Control, Signals and Systems, 2(4), 303–314.
Dayan, P., & Niv, Y. (2008). Reinforcement learning: The good, the bad and the
ugly. Current Opinion in Neurobiology, 18(2), 185–196.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
Kurdi, B., Gershman, S. J., & Banaji, M. R. (2019). Model-free and mod-
el-based learning processes in the updating of explicit and implicit evaluations.
Proceedings of the National Academy of Sciences, 116(13), 6035–6044.
Li, Y. (2017). Deep reinforcement learning: An overview. arXiv preprint
arXiv:1701.07274.
Liu, X., Zhang, F., Hou, Z., Mian, L., Wang, Z., Zhang, J., & Tang, J. (2021). Self-
supervised learning: Generative or contrastive. IEEE Transactions on Knowledge
and Data Engineering, 35(1), 857–876.
Le, Q. V., Monga, R., Devin, M., Corrado, G., Chen, K., Ranzato, M. A., … &
Ng, A. Y. (2012). Building high-level features using large scale unsupervised learn-
ing. arXiv preprint. arXiv:1112.6209.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553),
436–444.
MacKay, D. J. (2003). Information theory, inference and learning algorithms.
Cambridge University Press.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in
nervous activity. The Bulletin of Mathematical Biophysics, 5, 115–133.
Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., & Fried, I. (2005). Invariant
visual representation by single neurons in the human brain. Nature, 435(7045),
1102–1107.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information stor-
age and organization in the brain. Psychological Review, 65(6), 386.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural
Networks, 61, 85–117.
Silver, D., Singh, S., Precup, D., & Sutton, R. S. (2021). Reward is enough.
Artificial Intelligence, 299, 103535.
Skinner, B. F. (1963). Operant behavior. American Psychologist, 18(8), 503.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT
press.
13 Game-playing Artificial Intelligence
If there are intelligent beings on other planets, then they play Go.
Emanuel Lasker
Video Games
The development of reinforcement learning was a major breakthrough in
artificial intelligence. In a groundbreaking work, the authors proposed com-
bining reinforcement learning with so-called convolutional networks for
image recognition, thereby learning directly from pixel values as input the
actions to control a video game (Mnih et al., 2015). Specifically, the neural
network received the last five images of the ongoing game and the score as
input. The task was to generate the control commands for the controller or
joystick, such as “left”, “right”, “up”, “down” or “fire”. The goal of the neural
network was, as usual in reinforcement learning, to maximize the reward,
i.e., the score.
The researchers trained their system on various Atari games and showed
that a single algorithm without task-specific knowledge can achieve human-
level performance. The AI system surpassed all previous methods of rein-
forcement learning and in some cases even the performance of human
experts.
However, it turned out that the system was particularly good at action
games that do not require a forward-looking strategy and where success
does not depend on the previous course of the game, such as Video Pinball,
Boxing or Breakout. All these games are pure reaction games, which can be
won by purely tactical action (pressing the right button at the right time). In
this type of game, the neural network reached human level or above. In con-
trast, the system completely failed at comparatively simple strategy games
like Montezuma’s Revenge or Ms. Pac-Man, where you have to navigate
through a manageable world and solve certain tasks.
Nevertheless, the impact of the work on the development of AI was con-
siderable. It marked a milestone in the research on reinforcement learning
and showed that deep learning techniques can be combined with reinforce-
ment learning algorithms to achieve unprecedented performance in a variety
of tasks.
Go and Chess
The Asian board game Go, whose origins date back to ancient China, is
considered the most complex strategy game ever. Its basic rules are relatively
simple—much simpler than in chess—and can be learned within a few min-
utes. The complexity of the game arises from the sheer number of possible
positions, which is estimated at 10170 and thus exceeds the number of possi-
ble positions in chess (approx. 1040) by many orders of magnitude.
In recent decades, chess computers have achieved impressive perfor-
mances and are now able to beat even the best human players. In contrast,
Go computers until recently still had difficulties winning against even medi-
ocre human players. This is because the brute-force method used in chess in
Go is not applicable. In brute force, all possible moves for a certain number
of moves in the future are simulated and the resulting positions are evalu-
ated. The chess computer then selects the best move. Since there is a rela-
tively limited number of moves and possible positions in chess, a computer
with sufficient computing power can successfully apply this technique.
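The brute-force idea can be sketched in a few lines of Python (the functions legal_moves, apply_move, and evaluate are placeholders for a concrete game; the evaluation is assumed to be from the perspective of the player to move). With branching factor b and search depth d, roughly b^d positions have to be evaluated, which is why this approach collapses for Go.

```python
def search(position, depth, legal_moves, apply_move, evaluate):
    """Exhaustively simulate all move sequences up to `depth` and return (best value, best move).

    Values are always taken from the perspective of the player to move (negamax scheme).
    """
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position), None          # evaluate the resulting position
    best_value, best_move = float("-inf"), None
    for move in moves:
        value, _ = search(apply_move(position, move), depth - 1,
                          legal_moves, apply_move, evaluate)
        value = -value                           # a good position for the opponent is bad for us
        if value > best_value:
            best_value, best_move = value, move
    return best_value, best_move
```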
In contrast, brute force is ruled out in Go due to the astronomically high
number of possibilities. There are more possible positions on the Go board
than atoms in the universe, which means that even the most powerful com-
puters currently would not be able to calculate all possible moves. Instead,
Go computers must rely on a combination of heuristic techniques and
machine learning to determine their moves.
Another difference between chess and Go is that chess can produce rel-
atively simple positions after just a few moves, while Go can still be very
complex even after many moves. After only two or four moves in chess,
there are usually only a few possible positions, while in Go there are already
thousands of possible positions after two or four moves. This makes Go an
even greater challenge for computers, as they not only have to calculate the
best moves, but also have to analyze and evaluate the complexity of the positions (Fig. 13.1).

Fig. 13.1 The game Go. The board game, whose origins date back to ancient China, is considered the most complex strategy game ever, although the basic rules are relatively simple, for example compared to chess. This is mainly due to the sheer number of possible positions, which exceeds that of chess by many orders of magnitude. Developing an AI that can master the game at an advanced human level was long considered unattainable
Conclusion
The success of AlphaGo and AlphaGo Zero demonstrates how far AI tech-
nology has come and what possibilities are opening up for the future. The
complex game of Go serves as a vivid example of the impressive capabilities
of artificial intelligence and the progress that has been made in this area.
However, cognitive scientist Gary Marcus points out that a large part of
human knowledge has flowed into the development of AlphaZero. And he
suggests that human intelligence seems to include some innate abilities, such
as the intuitive ability to develop language. He advocates considering these
innate abilities in the development of future AI systems, i.e., using a priori
knowledge, instead of always starting training from scratch (Marcus, 2018).
Josh Tenenbaum, a professor at the Massachusetts Institute of Technology
who also deals with human intelligence, argues similarly and says that we
should study human flexibility and creativity if we want to develop real arti-
ficial intelligence at a human level. He highlighted, among other things,
the intelligence of Demis Hassabis and his colleagues at the company
DeepMind, who conceived, designed, and created AlphaGo in the first place
(Lake et al., 2017).
References
Choi, S., Kim, N., Kim, J., & Kang, H. (2022). How does AI improve human decision-making? Evidence from the AI-powered Go program. USC Marshall School of Business Research Paper Sponsored by iORB.
Jaderberg, M., Czarnecki, W. M., Dunning, I., Marris, L., Lever, G., Castaneda,
A. G., … & Graepel, T. (2019). Human-level performance in 3D multiplayer
games with population-based reinforcement learning. Science, 364(6443),
859–865.
Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building
machines that learn and think like people. Behavioral and Brain Sciences, 40,
e253.
Marcus, G. (2018). Innateness, alphazero, and artificial intelligence. arXiv preprint.
arXiv:1801.05667.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G.,
… & Hassabis, D. (2015). Human-level control through deep reinforcement
learning. Nature, 518(7540), 529–533.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G.,
… & Hassabis, D. (2016). Mastering the game of Go with deep neural networks
and tree search. Nature, 529(7587), 484–489.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., …
& Hassabis, D. (2017). Mastering the game of Go without human knowledge.
Nature, 550(7676), 354–359.
Zhang, H., & Yu, T. (2020). AlphaZero. Deep reinforcement learning: Fundamentals,
research and applications (pp. 391–415). Springer.
14 Recurrent Neural Networks
When neural activity flows through the network and returns to its source, it can strengthen the
connections between the involved neurons, leading to the consolidation of
memories. This process is facilitated by synaptic plasticity, i.e., the ability of
synapses to change their strength over time depending on their activity.
In addition, the recurrent connectivity in the brain contributes to the
integration of sensory information from different modalities, which is essen-
tial for perception and decision-making. For example, when we simulta-
neously see and hear something, the brain’s recurrent circuits allow these
separate inputs to be merged into a coherent representation of the external
environment. This multisensory integration enables us to respond appropri-
ately to our environment and make informed decisions.
In addition to memory formation and sensory integration, recurrent con-
nections in the brain also play a crucial role for attention and consciousness.
By selectively amplifying or inhibiting certain signals in feedback loops, the
brain can modulate its attention to specific stimuli and allocate cognitive
resources accordingly. This mechanism is important to filter out irrelevant
information and maintain cognitive flexibility.
LSTMs
Long Short-Term Memory networks (LSTMs) are recurrent neural networks that are specifically designed to store information over long periods of time and to selectively forget it (Hochreiter & Schmidhuber, 1997). The core idea behind
LSTMs is the use of a memory cell that can store information over a longer
period of time. The memory cell is updated by a series of gate mechanisms
that control the flow of information into and out of the cell. The gates are
trained to learn which information to store, forget, or update at each time
step. LSTMs have been successfully used for a variety of tasks, including
speech recognition, machine translation, and automatic caption genera-
tion. They have also been used in combination with other neural network
architectures such as convolutional networks to create more powerful mod-
els for tasks like object recognition and classification in images and video
sequences.
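In practice, such a layer is usually taken from a library. A minimal usage sketch with PyTorch (sizes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=32, num_layers=1, batch_first=True)

x = torch.randn(8, 50, 10)        # batch of 8 sequences, 50 time steps, 10 features each
output, (h_n, c_n) = lstm(x)      # output: hidden state at every time step
print(output.shape)               # torch.Size([8, 50, 32])
print(c_n.shape)                  # memory cell state: torch.Size([1, 8, 32])
```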
Elman Networks
Elman Networks are another type of recurrent neural networks, proposed by
Jeffrey Elman (Elman, 1990). In the simplest case, these are three-layer net-
works with an input, intermediate, and output layer, with the intermediate
layer being extended by a so-called context layer. This context layer stores
the state of the intermediate layer from the previous time step and passes it
on to the intermediate layer. Thus, the intermediate layer receives the new
input from the input layer and additionally its own activation state from the
previous time step. Therefore, Elman Networks are also capable of process-
ing input sequences and generating output sequences (Fig. 14.1).
Fig. 14.1 Elman Networks. Named after their inventor Jeffrey Elman, they are a type
of recurrent neural networks. They consist of three layers—input, intermediate, and
output layer—and are characterized by an additional context layer that stores the
state of the intermediate layer from the previous time step and passes it on. As a
result, Elman Networks can process input sequences and generate output sequences
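A minimal Python/NumPy sketch of a single Elman update step (layer sizes, random weights, and the tanh activation are assumptions) makes the role of the context layer explicit: the hidden layer receives the new input plus its own activation from the previous time step.

```python
import numpy as np

n_in, n_hidden, n_out = 3, 5, 2
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.5, size=(n_hidden, n_in))       # input -> hidden
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))   # context (previous hidden state) -> hidden
W_hy = rng.normal(scale=0.5, size=(n_out, n_hidden))      # hidden -> output

def elman_step(x, h_prev):
    """One time step: new input plus previous hidden state determine the new hidden state."""
    h = np.tanh(W_xh @ x + W_hh @ h_prev)
    y = W_hy @ h
    return y, h

h = np.zeros(n_hidden)                    # context layer starts empty
for x in np.eye(n_in):                    # a toy input sequence of three vectors
    y, h = elman_step(x, h)
```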
Hopfield Networks

Hopfield networks (Hopfield, 1982) are recurrent networks with a symmetric weight matrix W: w_ij = w_ji holds for all pairs of neurons. Also, w_ii = 0, i.e., there are no self-connections. Hopfield
networks exhibit pronounced attractor dynamics. Attractors are stable states
towards which a dynamic system moves over time. Hopfield networks can
store patterns as attractors and denoise or complete them upon re-presenta-
tion, i.e., the network activity converges (usually in one time step) into the
attractor most similar to the input.
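A minimal Python/NumPy sketch (toy patterns and synchronous updates are assumptions) of storing two patterns with the Hebbian outer-product rule and recalling one of them from a noisy version:

```python
import numpy as np

patterns = np.array([[1,  1, 1,  1, -1, -1, -1, -1],
                     [1, -1, 1, -1,  1, -1,  1, -1]])   # patterns to store (+1/-1 coding)

# Hebbian learning: sum of outer products, symmetric weights, no self-connections
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)

def recall(state, steps=5):
    """Let the network activity converge into the nearest stored attractor."""
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1, -1)
    return state

noisy = np.array([-1, 1, 1, 1, -1, -1, -1, -1])         # first pattern with its first bit flipped
print(recall(noisy))                                     # -> [ 1  1  1  1 -1 -1 -1 -1]
```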
Highly recurrent neural networks, where the weights are no longer sym-
metric (reservoirs), represent complex dynamic systems, which are even
capable of continuous activity without external input (Krauss et al., 2019).
As we will see in the next section, recurrent neural networks are difficult to
train, with reservoirs being the most difficult to train. A radically new approach,
motivated by findings in brain research (Reservoir Computing), even com-
pletely refrains from training these networks. But more on that later in Part IV.
Fig. 14.2 Highly recurrent neural network. In this type of network, the ordered layer
structure is completely abolished. In principle, each neuron can be connected to any
other—even reciprocally. These networks are complex dynamic systems, which are
capable of continuous activity even without input
To train a recurrent neural network with backpropagation, it is unfolded during the training process over time and converted into a deep feedforward network with shared weights (Werbos, 1990). This unfolded network repre-
sents the computations of the RNN over several time steps. The depth of
the unfolded network depends on the length of the input sequence and the
number of time steps processed by the RNN. The backpropagation algo-
rithm is then applied to the unfolded network and the gradients are calcu-
lated for each time step.
Conclusion
Recurrence in biological neural networks forms the basis for understanding
the functioning of artificial recurrent neural networks (RNNs). By mim-
icking the recurrent architecture of the brain, RNNs can efficiently process
or generate temporal information and sequences, making them particularly
suitable for tasks such as natural language processing, time series analysis,
and speech recognition.
Although RNNs are simplified abstractions of the complex recurrent
structures of the brain, they provide valuable insights into the principles
underlying the brain’s information processing capabilities.
However, recurrent networks are also more difficult to train than pure
feedforward architectures. As we will see in the chapter on language-talented
AI, recurrence has been completely replaced by another mechanism in large
language models like ChatGPT, which is significantly easier and therefore
faster to train.
References
Braitenberg, V., & Schutz, A. (1991). Anatomy of the cortex: Studies of brain func-
tion. Springer.
Elman, J. L. (1990). Finding structure in time. Cognitive science, 14(2), 179–211.
Hanin, B. (2018). Which neural net architectures give rise to exploding and vanish-
ing gradients? Advances in neural information processing systems, 31.
Hochreiter, S. (1998). The vanishing gradient problem during learning recurrent
neural nets and problem solutions. International Journal of Uncertainty, Fuzziness
and Knowledge-Based Systems, 6(2), 107–116.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural
Computation, 9(8), 1735–1780.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent col-
lective computational abilities. Proceedings of the National Academy of Sciences,
79(8), 2554–2558.
Krauss, P., Schuster, M., Dietrich, V., Schilling, A., Schulze, H., & Metzner, C.
(2019). Weight statistics controls dynamics in recurrent neural networks. PLoS
ONE, 14(4), e0214541.
Rehmer, A., & Kroll, A. (2020). On the vanishing and exploding gradient problem
in gated recurrent units. IFAC-PapersOnLine, 53(2), 1243–1248.
Werbos, P. J. (1990). Backpropagation through time: What it does and how to do
it. Proceedings of the IEEE, 78(10), 1550–1560.
15 Creativity: Generative Artificial Intelligence
Creativity means seeing what others see, and thinking what no one else has thought.
Albert Einstein
What is Creativity?
Before we turn to creative AI, let’s first briefly define what we want to under-
stand as creativity in the context of this presentation. Most definitions of
creativity contain two main aspects. Creativity means creating something
that is both new and useful. In addition, the results of Eagleman and Brandt
suggest that creativity often arises from questioning old prejudices through
three key methods: Bending, Blending, and Breaking. These concepts repre-
sent different strategies for manipulating existing ideas, concepts, or frame-
works to create new insights, solutions, or creations (Eagleman & Brandt,
2017).
In Bending, an existing concept or idea is altered to create something
new. By changing some aspects of the original concept while retaining its
core, unique interpretations or innovations can be developed. Bending
expands the boundaries of an idea or concept and opens up new perspectives
and possibilities. In art, for example, Bending could mean taking a tradi-
tional painting style and adapting it to express a modern theme or object.
In Blending, two or more seemingly different ideas or concepts are combined to create a new, unified whole.
Deep Dreaming

Next, the user selects a specific layer of the network they want to work
with, thereby determining the level of abstraction of the newly generated
image. Lower layers generate simpler patterns, while higher layers produce
more complex features. The user provides an existing input image that will
serve as the starting point for the Deep Dream process. This image is passed
through all layers up to the selected layer. The activations in this layer are
read out and represent the patterns and features recognized in the image at
the corresponding level of abstraction.
Now, the user defines a target function, which usually aims to maximize
the sum of the activations in the selected layer. This stimulates the neural
network to enhance the recognized patterns and features in the input image.
Similar to training with backpropagation, the gradient of the target func-
tion with respect to the input image is calculated. This indicates how the
input image needs to be changed to increase the value of the target function
and is applied to the input image to update it. This process is repeated for a
certain number of iterations or until a termination criterion is reached. The
resulting image contains enhanced patterns and features, giving it a unique,
dream-like appearance.
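A minimal sketch of one such iteration in Python with PyTorch; the function layer_activations, which is assumed to return the activations of the selected layer of a pretrained network, is a hypothetical placeholder, and the step size is arbitrary.

```python
import torch

def deep_dream_step(img, layer_activations, lr=0.01):
    """One Deep Dream iteration: enhance the patterns the selected layer responds to."""
    img = img.clone().detach().requires_grad_(True)
    objective = layer_activations(img).sum()        # target function: sum of the layer's activations
    objective.backward()                            # gradient of the objective w.r.t. the image
    with torch.no_grad():
        grad = img.grad / (img.grad.std() + 1e-8)   # normalize the gradient for stable steps
        img = img + lr * grad                       # gradient *ascent*: the image is changed, not the weights
    return img.detach()

# Usage sketch: start from an input image and repeat, e.g., 50 times
# img = load_image(...)                             # hypothetical loader
# for _ in range(50):
#     img = deep_dream_step(img, layer_activations)
```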
As already mentioned, the degree of abstraction of the image content can
be controlled by selecting the layer in the neural network, as early layers
represent simpler patterns like corners and edges, while deep layers contain
more abstract representations of whole objects or scenes. This leads to a curious, seemingly paradoxical effect: if a deep layer with abstract representations is chosen as the starting point of Deep Dreaming, the resulting images tend to
be more figurative. In contrast, early layers with simpler representations lead
to images that are reminiscent of abstract art.
Style Transfer
Style Transfer aims to apply the artistic style of one image (style image) to
the content of another image (content image) to create a new image that
combines the content of the first image with the style of the second image
(Gatys et al., 2015, 2016). In the sense of the three strategies of creativity
introduced above, this technique thus falls into the area of blending. For
this purpose, the same networks are used again, which also serve for image
recognition, to separate and recombine content and style information of
images.
As with Deep Dreaming, the process begins with a network that has
already been pre-trained on a large number of images. The user specifies the
content image, which contains the motif or scene to be preserved, as well
as the style image, which represents the artistic style to be applied to the
content image. Subsequently, both images are passed separately through the
neural network. For the content image, the activations from one or more
late layers are used to capture high-level content features. For the style
image, activations from multiple layers of all hierarchy levels are used to cap-
ture both low-level style features (e.g., textures) and high-level style features
(e.g., structures). These style features are typically represented by so-called
Gram matrices, which capture the correlations between different features in
each layer and thus effectively encode the style information.
Subsequently, a starting image is first created, which often begins as
random noise or as a copy of the content image. The goal is to iteratively
update this starting image so that it corresponds to both the content fea-
tures of the content image and the style features of the style image. This is
achieved by defining a special error function, which consists of two main
components: content error and style error. The content error measures the
difference between the content features of the starting image and the con-
tent image, while the style error measures the difference between the style
features of the starting image and the style image. The total error is the
weighted sum of content and style error. By weighting the two parts, it can
be determined how strong the respective influence on the new generated
image should be, i.e., how far away it is from the original content or style.
Again, the gradient of the total error is calculated, which indicates how the
starting image needs to be changed to minimize the error. A corresponding
optimization algorithm is applied iteratively to update the starting image.
The final image retains the content of the content image, but now appears in
the artistic style of the style image.
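A minimal sketch of the Gram matrix and the weighted total error in Python with PyTorch (the feature tensors are assumed to come from a pretrained network, which is not shown; the weights alpha and beta are arbitrary assumptions):

```python
import torch

def gram_matrix(features):
    """Correlations between feature maps of one layer: encodes the style information."""
    c, h, w = features.shape                 # channels, height, width
    f = features.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def total_loss(gen_content, tgt_content, gen_styles, tgt_styles, alpha=1.0, beta=1e3):
    """Weighted sum of content error and style error."""
    content_loss = torch.mean((gen_content - tgt_content) ** 2)
    style_loss = sum(torch.mean((gram_matrix(g) - gram_matrix(t)) ** 2)
                     for g, t in zip(gen_styles, tgt_styles))
    return alpha * content_loss + beta * style_loss
```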
In Generative Adversarial Networks (GANs), a generator network and a discriminator network compete with each other and thereby iteratively improve in their respective tasks. The Deep Fakes thus generated
are usually indistinguishable from real images and videos.
Diffusion Models
Diffusion models are a class of generative models that produce images
through a process known as Denoising Score Matching (Vincent et al.,
2010; Swersky et al., 2011; Sohl-Dickstein et al., 2015). These models learn
to generate images by simulating a diffusion process that transforms a target
image into random noise, and then learn to reverse this process. The basic
idea is to train the model to predict the statistical distribution of the pixel
values of the original image from a noisy version.
The training of a diffusion model begins with a dataset of images. By add-
ing Gaussian noise, a sequence of increasingly noisy versions of each image
is then generated. This process is referred to as diffusion. Subsequently,
the model is trained to predict or reconstruct the next less noisy version of
the original image from a noisy version. During training, the model learns
to predict the statistics of the original image at each step of the diffusion
process. Through this training, the model becomes increasingly efficient at
removing noise from images. When the trained model is now given random
noise as input, it iteratively generates (hallucinates) a completely new ran-
dom image from it.
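A minimal sketch of the forward (noising) process in Python with PyTorch, using the common DDPM parameterization (an assumption, as the exact formulation is not specified here): the model is then trained to predict the added noise from the noisy image.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative product over the schedule

def diffuse(x0, t):
    """Produce the noisy version x_t of a clean image x0 at diffusion step t."""
    eps = torch.randn_like(x0)                                    # Gaussian noise
    x_t = alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * eps
    return x_t, eps                                               # eps is the training target

# Training sketch: a model learns to predict eps from (x_t, t)
# loss = ((model(x_t, t) - eps) ** 2).mean()
```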
Diffusion models typically consist of layers of neurons with local connec-
tivity, with each neuron only connected to a small neighborhood of neu-
rons in the previous layer. This local connectivity is advantageous for image
processing as it allows the model to learn local patterns and capture spatial
hierarchies.
To generate images from text descriptions, diffusion models can be com-
bined with language models (see next chapter). There are various ways to
achieve this coupling, but a common approach is the use of a technique
known as conditional diffusion (Batzolis et al., 2021; Nichol et al., 2021).
In this case, the diffusion model is made dependent on the text description
by integrating it into the model architecture. First, the text description is
encoded using a pre-trained language model (e.g., GPT-3 or BERT). This
generates a high-dimensional vector representation of the meaning of the
text (text embedding). The diffusion model is then conditioned on the text
representation by modifying its architecture. This can be done by adding
the text embedding as additional input or by integrating it into the hidden
layers of the model. Finally, the model is trained with the same diffusion
process as before, but now with images and the corresponding text descrip-
tions, so the diffusion process does not run “freely”, but is constrained by
the respective text embedding.
If an image is to be generated from a text description, the conditioned
model is again given random noise as visual input, but this time also with
the encoding of the text from the language model. The model then generates
a random image that corresponds to the input description.
The most well-known diffusion models include DALL-E 2, Stable
Diffusion, and Midjourney.
DALL-E 2 (whose name blends the painter Salvador Dalí and the Pixar robot WALL-E) was specifically devel-
oped for the generation of high-quality photorealistic images from natural
language descriptions. It uses a combination of image and text processing
to generate abstract image representations from linguistic descriptions of
objects, scenes, or concepts. These image representations are then used by
a so-called decoder network to generate photorealistic images from them.
DALL-E 2 is freely accessible online.1
Released in 2022, Stable Diffusion2 is also a generative model for generat-
ing detailed images from text descriptions. However, it can also be used for
other tasks such as generating image-to-image translations based on a text
prompt. What is special about Stable Diffusion is that its complete program
code and all model parameters have been published.3 It can be operated
on most standard PCs or laptops equipped with an additional GPU and
offers full access for systematic analysis, customization, or further develop-
ment. This represents a departure from the practice of other AI models like
ChatGPT or DALL-E 2, which are only accessible via cloud services and
whose exact internal architecture has not been published so far.
Midjourney4 is considered the most advanced diffusion model currently
available. It is even capable of generating photorealistic images from text descriptions.
Deep Fakes can also be generated with diffusion models. It has been shown
that the newer diffusion models are significantly superior to the longer-exist-
ing Generative Adversarial Networks (Dhariwal & Nichol, 2021).
1 https://openai.com/product/dall-e-2
2 https://stablediffusionweb.com/
3 https://github.com/CompVis/stable-diffusion
4 https://www.midjourney.com.
Conclusion
There exists a whole range of various methods of generative AI for creating
new content such as images, videos, or even music. The generated images
are usually hardly distinguishable from real ones by humans. All such
approaches fall into the areas of bending and blending, and probably not
into the area of breaking.
It should not go unmentioned at this point that the creation of Deep
Fakes is not limited to the creation of images. The system VALL-E5 is capa-
ble of swapping the voice of any audio recording, for example. This is also a
kind of style transfer, in which the content of the spoken language is preserved, while the voice plays the role of the style that is being swapped (Wang et al., 2023).
By the way, the writing style of a text can also be swapped without changing
the content. This is referred to as Prose Style Transfer. It falls into the
domain of what is currently probably the most spectacular form of generative
AI: the so-called large language models like ChatGPT, to which, due to their
relevance and timeliness, a separate chapter—the next one—is dedicated.
References
Batzolis, G., Stanczuk, J., Schönlieb, C. B., & Etmann, C. (2021). Conditional
image generation with score-based diffusion models. arXiv preprint
arXiv:2111.13606.
Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthe-
sis. Advances in Neural Information Processing Systems, 34, 8780–8794.
Eagleman, D., & Brandt, A. (2017). The runaway species: How human creativity
remakes the world. Catapult.
Foster, D. (2019). Generative deep learning: Teaching machines to paint, write, com-
pose, and play. O’Reilly Media.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style.
arXiv preprint. arXiv:1508.06576.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convo-
lutional neural networks. In Proceedings of the IEEE conference on computer vision
and pattern recognition (pp. 2414–2423).
5 https://vall-e.io/
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S.,
… & Bengio, Y. (2020). Generative adversarial networks. Communications of the
ACM, 63(11), 139–144.
Mordvintsev, A., Olah, C., & Tyka, M. (2015). Inceptionism: Going deeper into neu-
ral networks. Google Research Blog. https://research.google/pubs/pub45507.
Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., … &
Chen, M. (2021). Glide: Towards photorealistic image generation and editing with
text-guided diffusion models. arXiv preprint arXiv:2112.10741.
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015, June).
Deep unsupervised learning using nonequilibrium thermodynamics. In International
Conference on Machine Learning (pp. 2256–2265). PMLR.
Swersky, K., Ranzato, M. A., Buchman, D., Freitas, N. D., & Marlin, B.
M. (2011). On autoencoders and score matching for energy based models. In
Proceedings of the 28th international conference on machine learning (ICML-
11) (pp. 1201–1208).
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P. A., & Bottou, L.
(2010). Stacked denoising autoencoders: Learning useful representations in
a deep network with a local denoising criterion. Journal of Machine Learning
Research, 11(12).
Wang, C., Chen, S., Wu, Y., Zhang, Z., Zhou, L., Liu, S., … & Wei, F. (2023).
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. arXiv pre-
print arXiv:2301.02111.
16
Talking AI: ChatGPT and Co.
I am an artificial intelligence that can process and generate answers in natural language
to various types of questions and tasks.
ChatGPT
Word Vectors
Techniques for representing words are crucial for natural language processing as
they provide a way to convert text data into numerical representations that can
be processed by machine learning algorithms. These representations capture the
structure and semantics (meaning) of the language and allow models to recog-
nize relationships between words and perform various tasks based on this.
One-Hot Encoding is a simple technique for representing words, where
each word in the vocabulary is represented as a binary vector whose size
(dimensionality) corresponds to the size of the vocabulary. The vector has a
“1” at the index that corresponds to the position of the word in the vocab-
ulary, and a “0” at all other positions. While one-hot encoding is easy to
implement, it suffers from the curse of dimensionality, as the size of the
vectors increases with the size of the vocabulary, leading to inefficient rep-
resentations and increased computational complexity. Nevertheless, one-hot
encoding is an important intermediate step in generating a dense encoding,
i.e., word embeddings. These are low-dimensional continuous vector spaces
in which the words are embedded (Latent Space Embeddings), as opposed
to the high-dimensional and sparsely populated space of one-hot encoding.
Another advantage of word embeddings is that in this space, word vec-
tors are arranged so that semantically similar words are closer together, while
dissimilar words are further apart. Ideally, synonyms even occupy the same
location. In the space of one-hot encoding, however, the location of a vector
has no connection to the meaning of the corresponding word (Fig. 16.1).
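As a small illustration, one-hot encoding can be written in a few lines of Python; the five-word toy vocabulary is made up for this example.

```python
import numpy as np

vocabulary = ["the", "little", "dog", "barks", "loudly"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    vector = np.zeros(len(vocabulary))   # dimensionality = size of the vocabulary
    vector[word_to_index[word]] = 1.0    # a single 1 at the word's position
    return vector

print(one_hot("dog"))   # [0. 0. 1. 0. 0.]
# A real vocabulary has tens of thousands of words, so these vectors become huge,
# whereas dense word embeddings typically have only a few hundred dimensions.
```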
Word2Vec is one of these neural network-based techniques for generating
word embeddings (Mikolov et al., 2013, 2017). It learns word embeddings
from large text corpora in an unsupervised manner, i.e., it does not require
labeled data. Word2Vec is particularly effective at detecting semantic and
syntactic relationships between words. There are two different approaches
used in Word2Vec: the Continuous Bag of Words (CBOW) model and the
SkipGram model.
In the CBOW architecture, the model learns word embeddings by predict-
ing a target word based on its surrounding context words within a certain
window. The input consists of the context words and the output is the target
word. During training, the model adjusts the word embeddings to minimize
the prediction error for the target word depending on the context words.
The SkipGram architecture is somewhat the reverse of the CBOW model.
In this case, the model learns word embeddings by predicting the context
words surrounding a given target word. The input is thus the target word, and
the output is a context word within a certain window around the target word.
Fig. 16.1 One-Hot versus Dense Encoding. In one-hot encoding, each vector contains
only a single 1; all other elements are 0. The vector space has as many dimensions
as there are words, with each word lying on its own coordinate axis. The vast
majority of this extremely high-dimensional space remains unoccupied. In dense
encoding, on the other hand, significantly fewer dimensions are needed, resulting
in a more compact vector space
Both the CBOW and the SkipGram model use a shallow three-layer neural
network with an input layer, a hidden layer, and an output layer. The word
embeddings, which are adjusted during training to minimize prediction
error, are created in the hidden layer. As the model learns, it captures the
semantic and syntactic relationships between the words (Fig. 16.2).
The result is a more efficient and semantically rich representation of
words, enabling NLP models to better understand and process language.
Fig. 16.2 Word2Vec. Both the CBOW and the SkipGram model use a shallow three-layer
neural network with an input layer, a hidden layer, and an output layer. In CBOW,
the context words serve as input and the target word as output. In SkipGram, it is
exactly the opposite: here, the target word is the input and the context words are
the desired output. In both cases, the word embedding of the target word—in this
example "dog"—is created in the hidden layer, and the dimensionality of the hidden
layer is smaller than that of the input and output layers

By training Word2Vec on very large amounts of text, it learns the meaning
of a word, so to speak, from the many other words that frequently appear
together with the word to be learned.1 For example, in many texts, the words
"bark", "pet", and "fur" often appear together with "dog", thus defining its
meaning. Conversely, the words "hamster", "cat", and "dog" often appear with
the word "pet", thus defining its meaning, and so on analogously for every
other word of the language. The word embeddings generated by Word2Vec,
e.g., v("dog"), then serve as input for further language processing algorithms
such as machine translation or document classification.

1 Learning the meaning of words from other words is quite similar to a child's language acquisition,
although the first approximately 50 words are learned in a different way, namely by linking
the respective sensory representation with the corresponding word. From the so-called 50-word limit,
however, around the 2nd birthday, the so-called vocabulary spurt begins, with many new words being
learned. It is assumed that this phase of language acquisition benefits from the fact that the meaning of
new words can be inferred from already known words (Ferguson & Farwell, 1975; Rescorla, 1989;
Aitchison, 2012).
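In practice, such word embeddings can be trained and queried, for example, with the gensim library. The following sketch assumes gensim version 4 or later; the tiny example corpus is of course far too small for meaningful embeddings and only illustrates the interface.

```python
from gensim.models import Word2Vec

# Toy corpus: in reality, Word2Vec is trained on many millions of sentences
sentences = [["the", "little", "dog", "barks", "loudly"],
             ["the", "cat", "is", "a", "pet"],
             ["the", "dog", "is", "a", "pet", "with", "fur"]]

# sg=0 selects the CBOW architecture, sg=1 the SkipGram architecture
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

vector_dog = model.wv["dog"]                 # the word embedding v("dog")
print(model.wv.most_similar("dog", topn=3))  # words with similar embeddings
```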
A remarkable feature of word embeddings is the—initially surprising—
fact that one can even do arithmetic with them. For example, by adding and
subtracting word vectors, new word vectors with a meaningful interpretation
can be generated, as the following example impressively shows:

v("King") − v("Man") + v("Woman") ≈ v("Queen")
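With embeddings trained on a sufficiently large corpus, this arithmetic can be reproduced directly, for instance with gensim's pre-trained vectors; the model identifier below is only an example from the gensim data repository.

```python
import gensim.downloader as api

# Download pre-trained word vectors (example identifier)
vectors = api.load("word2vec-google-news-300")

# v("king") - v("man") + v("woman") is closest to v("queen")
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # e.g., [('queen', 0.71)]
```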
Transformer
Transformers (Vaswani et al., 2017) are the basis of all modern Large
Language Models (LLM) like DeepL and ChatGPT. This new neural net-
work architecture has revolutionized many tasks of natural language process-
ing such as machine translation, question answering, and text summarization.
The Transformer architecture deviates from traditional recurrent neural
networks (RNNs) by relying on a technique known as the attention mechanism
(self-attention). The attention mechanism allows input sequences to
be processed in parallel rather than sequentially. This has the advantage that
long-range dependencies—i.e., references between words with a greater dis-
tance within the sequence—can have a strong influence on the processing. If
the sequence is processed serially as in RNNs, the influence of one word on
another decreases very quickly with increasing distance. Another advantage
of the attention mechanism is that it is highly parallelizable, which leads to
shorter training times for the transformers (Fig. 16.3).
The key idea behind the attention mechanism is that it allows the model to
weight the importance of different words in a specific context. To do this, trans-
formers calculate a new representation for each word in the input sequence,
which takes into account both the word itself and the surrounding context.
First, from the word embeddings (word vectors, x) of each word in the
input sequence, three new vectors are calculated: query (q), key (k), and
value vector (v). The values of the conversion matrices between the differ-
ent types of vectors correspond to the internal parameters of the language
model, which are learned during training.
From the query and key vectors, the attention (a) for each word at each
position in the sequence is determined by calculating the respective sca-
lar product. The attention indicates for each possible pair of words in the
sequence how important one word is for understanding the other word in
the sequence.
Fig. 16.3 RNN vs. Transformer. In an RNN, input sequences are processed serially. At
each time step, the RNN receives the next word, along with its own state from the
previous time step, as input. As a result, the influence of past words on the current
word decreases very quickly with increasing distance in the sequence. In contrast,
transformers process the sequence as a whole. This means that in principle, every
word in the sequence can have a strong influence on every other. This is controlled
by the attention mechanism

From the attention values of all words for a specific word and the corresponding
value vectors of all words, a new representation (y) for this word is finally
calculated; the same is done analogously for all other words. To do this, all value
vectors are added up, after they have been weighted by multiplication with the
respective attention values (Fig. 16.4).
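The core of this computation can be written down compactly. The following numpy sketch implements a single (scaled) self-attention step; the randomly initialized conversion matrices stand in for the parameters that a real transformer learns during training, and the toy dimensions are arbitrary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # query, key and value vectors
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # attention weights a
    return A @ V                                  # new representations y

seq_len, d_model, d_head = 3, 8, 4                # toy sizes, e.g. "she plays Go"
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))           # word vectors x of the sequence
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
Y = self_attention(X, W_q, W_k, W_v)              # one new vector y per word
```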
We have already seen in the previous section, using the "King/Queen"
example, that one can calculate with the vector representations of words and
thereby obtain meaningful new meanings. The new representation of
each word thus corresponds to its own meaning plus the weighted sum of
Simplified, the following happens: The “standard meaning” of each word,
as it would be in a dictionary, is modified and adapted to the specific situa-
tion. From a linguistic point of view, this corresponds to the transition from
semantics to pragmatics.
Modern transformers are made up of many such modules, each consisting of
an attention mechanism followed by a feed-forward neural network. Like the
layers in a deep neural network, they are connected in series, with the output of
one module serving as input for the subsequent module. ChatGPT,
for example, consists of 96 of these modules or layers. In addition, each
layer contains several of these modules in parallel, which can then specialize
in different aspects, such as different languages or types of text.
Fig. 16.4 Attention mechanism. From each word vector x of the input sequence,
three new vectors are generated: query (q), key (k), and value vector (v). The attention
a, which represents the relative importance of a word for understanding another
word in the sequence, is calculated as the scalar product of query and key vectors.
Finally, a new representation y for each word is created by weighting the value
vectors of all words with the respective attention and adding them up
It has been shown that by iterative prediction of the next word based on
the previous input sequence of words and subsequent appending of the pre-
dicted word to the next input, in principle arbitrarily long, meaningful texts
can be generated (Liu et al., 2018). The final output, which the transformer
is trained on, therefore corresponds to the probability for each individual
word of the language (which can be several tens of thousands) that it will
appear next in the text. Subsequently, a word is randomly selected according
to this probability distribution over all words. So it is not generally the most
probable word that is chosen (which would correspond to a winner-takes-all
principle).
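A minimal sketch of this sampling step, with made-up probabilities over a five-word toy vocabulary, looks like this:

```python
import numpy as np

vocabulary = ["the", "dog", "barks", "loudly", "."]
# Hypothetical next-word probabilities produced by the language model
probabilities = np.array([0.05, 0.10, 0.55, 0.25, 0.05])

rng = np.random.default_rng()
# Sample according to the distribution instead of always taking the most probable word
next_word = rng.choice(vocabulary, p=probabilities)

# Iterative generation: append the sampled word and feed the sequence back in
sequence = ["the", "dog"]
sequence.append(str(next_word))
print(sequence)
```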
Finally, in March 2023, GPT-4 was released, which in some tests appar-
ently performs even better than humans and shows signs of general artifi-
cial intelligence (Bubeck et al., 2023). So far, no details have been published
about its exact structure and the training data used.
ChatGPT
The freely accessible ChatGPT, released in November 2022, is probably the
most well-known example of a large language model and one of the most
advanced AI models for conversations. It is based on GPT-3 and is designed
to understand and generate human-like responses in a dialogue environ-
ment. It is so powerful that it can conduct longer coherent and context-re-
lated conversations with users. It can generate any type of text in seconds,
answer questions on any topic, and conduct conversations, remembering
their course and thus usually responding adequately in longer dialogues.
ChatGPT can summarize, rephrase, or translate texts in several dozen lan-
guages, tell jokes, write songs, and even program in all common program-
ming languages.
Remarkably, ChatGPT is even capable of playing strategic games like
chess, Go, and poker. During its training on virtually any type of text availa-
ble on the internet, it has, for example, “read” tens of thousands of games in
chess notation, each time trying to predict the next word or character. This
apparently resulted in the side effect that it learned the underlying rules of
the respective game without ever having seen a chessboard or pieces. This
suggests that the approach of encoding a problem into a sequence of ele-
ments and then learning to predict the next element could be a universally
applicable strategy in information processing and cognitive systems.
As already mentioned, ChatGPT is based on GPT-3. Unlike its predeces-
sors, which were trained exclusively self-supervised (on predicting the next
word), ChatGPT underwent subsequent fine-tuning through human feed-
back, which proved to be a crucial step for generating even better responses
and texts.
This fine-tuning was based on supervised learning and reinforcement
learning and proceeded in three steps. First,
humans generated many example conversations, each consisting of a request
and the corresponding appropriate responses. To assist them in designing
their responses, the human trainers could access previously automatically
generated suggestions and then adapt them. In this way, a large dataset was
created, which contains real dialogues with good, i.e., adequate responses.
Speech Interfaces
Speech interfaces such as Siri, Alexa, and Google Assistant have revolution-
ized the way we interact with our devices. They provide a conversational
interface that responds to user requests in a natural, human-like manner.
These voice-controlled AI assistants are trained with large amounts of text
data, like ChatGPT and other language models, to understand and gen-
erate natural language. While Siri, Alexa, and Google Assistant primarily
focus on responding to specific commands and questions, often with a focus
on functional support, e.g., setting reminders or playing music, models
like ChatGPT are designed to generate more nuanced and context-related responses.
Conclusion
The Transformer architecture and large language models like the GPT
series and DeepL have revolutionized natural language processing, with
the attention mechanism and the ability to consider long-range dependencies
playing a key role. Despite these advances, there are still challenges.
References
Aitchison, J. (2012). Words in the mind: An introduction to the mental lexicon. Wiley.
Booth, A. D., & Richens, R. H. (1952). Some methods of mechanized translation. In
Proceedings of the Conference on Mechanical Translation.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., &
Amodei, D. (2020). Language models are few-shot learners. Advances in Neural
Information Processing Systems, 33, 1877–1901.
Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., …
& Zhang, Y. (2023). Sparks of artificial general intelligence: Early experiments with
GPT-4. arXiv preprint arXiv:2303.12712.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training
of deep bidirectional transformers for language understanding. arXiv preprint
arXiv:1810.04805.
Ferguson, C. A., & Farwell, C. B. (1975). Words and sounds in early language
acquisition. Language, 419–439.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural
Computation, 9(8), 1735–1780.
Liu, P. J., Saleh, M., Pot, E., Goodrich, B., Sepassi, R., Kaiser, L., & Shazeer, N.
(2018). Generating wikipedia by summarizing long sequences. arXiv preprint
arXiv:1801.10198.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781.
Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., & Joulin, A. (2017). Advances
in pre-training distributed word representations. arXiv preprint arXiv:1712.09405.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applica-
tions in speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Rescorla, L. (1989). The Language Development Survey: A screening tool for
delayed language in toddlers. Journal of Speech and Hearing disorders, 54(4),
587–599.
Trivedi, A., Pant, N., Shah, P., Sonik, S., & Agrawal, S. (2018). Speech to text
and text to speech recognition systems? A review. IOSR Journal of Computer
Engineering, 20(2), 36–43.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … &
Polosukhin, I. (2017). Attention is all you need. Advances in neural information
processing systems, 30.
17
What are AI Developers Researching Today?
Learning to Learn
In the rapidly evolving world of Artificial Intelligence, Machine Learning in
particular has made great strides in recent years. Almost daily, technical arti-
cles are published introducing new methods or further developing existing
ones. Even experts often find it difficult to keep up with the enormous speed
of development.
In this chapter, we want to take a closer look at some of these latest trends
and developments. All these innovative approaches have in common that
they have the potential to revolutionize the way AI systems learn and adapt.
The conventional approach of supervised learning requires a large amount of
labeled data for each new task, the procurement of which can be time-con-
suming and expensive. The new approaches are intended to enable efficient
use of prior knowledge, generalize across tasks, or solve novel problems with
minimal training.
Few-Shot, One-Shot, and Zero-Shot Learning are some of these advanced
Machine Learning techniques that are supposed to enable models to learn new
tasks or recognize objects with minimal amounts of data. These approaches
have gained importance in recent years as they can potentially overcome the
need for huge amounts of training data and shorten training time.
Few-Shot Learning
Humans are very good few-shot learners: You don’t have to show a child
thousands of pictures of apples for it to learn the concept of "apple". A few
examples, often even a single one, are usually sufficient.
With Few-Shot Learning, the aim is to train models that can quickly
adapt to new tasks with a small amount of training data, such as classifying
new objects based on a few examples (Snell et al., 2017; Sung et al., 2018). To do
this, the model is first trained on a relatively small dataset that contains only
a few examples for each class or task, and then tested on a new set of exam-
ples. The idea is to teach the model to learn from a few examples and gen-
eralize to new examples, rather than needing large amounts of data for each
task. A recent development in this area is Prototypical Networks (Snell
et al., 2017), where a deep neural network is used to learn a metric space in
which objects of the same class are grouped. This method has shown prom-
ising results in tasks such as image classification, where the model can recog-
nize new categories with just a few examples.
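The basic idea of classifying via prototypes can be sketched in a few lines of numpy. Here the embeddings are random placeholders for the output of the learned embedding network, and the class labels and dimensions are arbitrary example values.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    # One prototype per class: the mean of its few labeled support examples
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query, classes, protos):
    # Assign the query to the class with the nearest prototype in the metric space
    distances = np.linalg.norm(protos - query, axis=1)
    return classes[np.argmin(distances)]

# Toy few-shot task: 2 classes with 3 examples ("shots") each, 4-dimensional embeddings
rng = np.random.default_rng(1)
support_embeddings = rng.normal(size=(6, 4))
support_labels = np.array([0, 0, 0, 1, 1, 1])

classes, protos = class_prototypes(support_embeddings, support_labels)
print(classify(rng.normal(size=4), classes, protos))
```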
One-Shot Learning
With One-Shot Learning, the concept of learning from a few examples
is continued by having the model learn from just one example per class
(Santoro et al., 2016; Vinyals et al., 2016). A notable innovation in this area is
Memory-Augmented Neural Networks. This type of neural network uses an
external memory matrix to store and retrieve information about previously
seen examples, so that the model can make accurate predictions based on a
single example. One-Shot Learning has proven particularly useful in tasks
such as handwriting recognition, where a model can accurately recognize a
writer’s style based on a single example.
In biology, One-Shot Learning can be observed, for example, in animals
that are able to quickly recognize and respond to new stimuli or situations
without having been previously exposed to them. For instance, some bird
species are able to quickly recognize and avoid dangerous prey after a single
experience. Of course, humans are also excellent One-Shot learners.
Zero-Shot Learning
With Zero-Shot Learning, on the other hand, models can make predictions
for completely unknown classes without explicit training examples (Norouzi
et al., 2013; Socher et al., 2013). The model is trained to recognize objects
or categories it has never seen before. It can therefore classify new input pat-
terns even if no labeled data for the respective class were available during
training. In contrast to supervised learning, where a model is trained with a
certain amount of labeled data examples, Zero-Shot Learning is based on the
transfer of knowledge from related or similar classes that were seen during
training.
This is achieved by using semantic representations such as word vectors
that capture the meaning and relationships between different classes. In
this case, the word vectors replace the labels of the images. For example, if a
model has been trained to recognize images of different animal species (e.g.,
horse, tiger, …), but has never seen an image of a zebra, it can still classify
it as an animal and categorize it into a new category, as it has learned the
relationships between different animal species. In this case, the zebra would
probably be classified into a mixed category of the already learned catego-
ries “horse” (because of the shape of the animal) and “tiger” (because of the
stripes).
Zero-Shot Learning thus enables more efficient and flexible training of
models and generalization to new and unknown categories.
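The following toy sketch illustrates the principle: the class "labels" are word vectors, and a new input is assigned to the class whose word vector is most similar to the vector predicted by the image model. All vectors here are made up for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up word vectors acting as class labels; "zebra" was never seen as an image
class_vectors = {
    "horse": np.array([1.0, 0.2, 0.1]),
    "tiger": np.array([0.1, 1.0, 0.8]),
    "zebra": np.array([0.9, 0.6, 0.7]),
}

# Hypothetical output of an image model trained to map images onto word vectors
predicted_vector = np.array([0.8, 0.5, 0.6])

best_class = max(class_vectors,
                 key=lambda c: cosine_similarity(predicted_vector, class_vectors[c]))
print(best_class)   # ideally "zebra", although no zebra image was in the training data
```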
Transfer Learning
In transfer learning, a neural network is first trained on a very large data-
set and then refined on a smaller dataset for a specific task or special
application. This retraining of an already trained network is called fine-tun-
ing. The idea behind it is that the knowledge acquired in solving one prob-
lem can be transferred to another, related problem, thereby reducing the
amount of data and the time required for training a new model (Torrey &
Shavlik, 2010).
If the neural network was initially trained on the gigantic ImageNet data-
set, which consists of 14 million images divided into 20,000 categories, it
can be assumed that this network has already learned a lot of representations
that are useful for general image recognition. Thus, this network can be effi-
ciently adapted to a new task with a short fine-tuning on a few images.
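A typical fine-tuning setup can be sketched as follows, assuming PyTorch and a recent torchvision are installed; the number of new classes and the learning rate are arbitrary example values.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for the new task (here: 5 example classes) ...
model.fc = nn.Linear(model.fc.in_features, 5)

# ... and fine-tune only this new layer on the small, task-specific dataset
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```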
Meta-Learning
In traditional machine learning, a model is trained on a fixed set of train-
ing data and then used to make predictions for new, unseen data. In con-
trast, meta-learning approaches aim to enable models to learn from a small
amount of data and generalize to new tasks with little or no additional train-
ing (Finn et al., 2017; Santoro et al., 2016). To this end, a so-called meta-
learner is usually trained, i.e., a model that learns how to learn by observing
and extracting patterns from a series of different training tasks. The meta-
learner then uses this knowledge to quickly adapt to new tasks with similar
features.
There are various approaches to meta-learning, including metric-based
learning, optimization-based learning, and model-based learning. In met-
ric-based learning, a similarity metric is learned to compare new tasks with
the training tasks. In optimization-based learning, a model is trained to
quickly optimize its weights for a new task, while in model-based learning,
a generative model of the data is learned that can be used for quick adapta-
tion to new tasks. For example, a meta-learner can be generally trained to
play various strategy games or card games. In a new game, it can then use its
existing knowledge about how games generally work to learn the new game
faster. A similar effect is also observed in translation software.
In 2016, Google introduced the Google Neural Machine Translation
(GNMT) system, a significant advancement in the field of machine trans-
lation that surpassed all existing models (Wu et al., 2016). An interesting
observation was made when the GNMT system, which was initially trained
to translate from English to Spanish and then to translate from English to
Chinese, improved its translation between English and Spanish after being
trained with Chinese text!
This phenomenon can be attributed to the development of a so-called
interlingua, i.e., a common representation of meaning between languages.
As the model was trained on more language pairs, it learned to map the
input text from various languages onto a semantic space that was independ-
ent of the specific language. As a result, the model could now also translate
between languages, even if it had not been explicitly trained for this specific
language pair. The latest version of GNMT now supports 109 languages and
can translate between them in any direction, which is over 11,000 possible
language combinations. The model was explicitly trained only on a very
small fraction of all these possibilities.
Conclusion
The mentioned methods open up new possibilities for AI systems to learn
more efficiently and adapt to new situations, thus paving the way for more
versatile and robust AI applications in a wide range of application areas.
Beyond the methods presented here, by which machines learn to learn, intensive
research is already being conducted on the next steps of development.
References
De Raedt, L., Dumancic, S., Manhaeve, R., & Marra, G. (2020). From sta-
tistical relational to neuro-symbolic artificial intelligence. arXiv preprint
arXiv:2003.08316.
Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast
adaptation of deep networks. In International Conference on Machine Learning
(pp. 1126–1135). PMLR.
Maier, A., Köstler, H., Heisig, M., Krauss, P., & Yang, S. H. (2022). Known opera-
tor learning and hybrid machine learning in medical imaging – a review of the past,
the present, and the future. Progress in Biomedical Engineering.
Maier, A. K., Syben, C., Stimpel, B., Würfl, T., Hoffmann, M., Schebesch, F., &
Christiansen, S. (2019). Learning with known operators reduces maximum error
bounds. Nature Machine Intelligence, 1(8), 373–380.
Marcus, G. F. (2003). The algebraic mind: Integrating connectionism and cognitive sci-
ence. MIT press.
Norouzi, M., Mikolov, T., Bengio, S., Singer, Y., Shlens, J., Frome, A., … & Dean,
J. (2013). Zero-shot learning by convex combination of semantic embeddings. arXiv
preprint arXiv:1312.5650.
Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., & Lillicrap, T. (2016, June).
Meta-learning with memory-augmented neural networks. In International confer-
ence on machine learning (pp. 1842–1850). PMLR.
Sarker, M. K., Zhou, L., Eberhart, A., & Hitzler, P. (2021). Neuro-symbolic artifi-
cial intelligence. AI Communications, 34(3), 197–209.
Socher, R., Ganjoo, M., Manning, C. D., & Ng, A. (2013). Zero-shot learning
through cross-modal transfer. Advances in neural information processing systems,
26.
Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learn-
ing. Advances in neural information processing systems, 30.
Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P. H., & Hospedales, T. M. (2018).
Learning to compare: Relation network for few-shot learning. In Proceedings of the
IEEE conference on computer vision and pattern recognition (pp. 1199–1208).
Torrey, L., & Shavlik, J. (2010). Transfer learning. In Handbook of research on
machine learning applications and trends: Algorithms, methods, and techniques (pp.
242–264). IGI Global.
Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks
for one shot learning. Advances in neural information processing systems, 29.
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … & Dean,
J. (2016). Google’s neural machine translation system: Bridging the gap between
human and machine translation. arXiv preprint arXiv:1609.08144.
Part III
Challenges
Where there is much light, there is naturally often just as much shadow.
Amidst all the spectacular successes in the field of Artificial Intelligence,
particularly in Deep Learning, it should not be overlooked that there are
many smaller and some fundamental challenges—some even speak of major
crises—which still need to be solved.
Of course, brain research is still far from having developed an overarch-
ing theory of the brain, as we have seen particularly in the chapters on con-
sciousness and free will. But even apart from these very big questions, there
are still many aspects of how the brain works that are not yet understood.
In this third part of the book, the aim is to describe the major challenges
of both disciplines.
18
Challenges of AI
Realistically speaking, deep learning is just a part of the larger challenge associated with
building intelligent machines.
Gary Marcus
Hallucinating Chatbots
Don’t worry, then you are in good company. Probably there is not a sin-
gle person on this planet who sees a difference. However, if we give these
two pictures to a classifier (neural network trained on image recognition) as
input, it will correctly recognize the left version of the picture as a “panda”,
while it assigns the middle picture to the category “gibbon” with 99%
certainty.
The middle picture is a so-called Adversarial Example, which was delib-
erately manipulated to mislead the classifier (Goodfellow et al., 2014; Xiao
et al., 2018; Xie et al., 2019). The intention behind this and similar exper-
iments is to identify possible weaknesses and errors in neural networks in
order to overcome them in the future. Similar to hackers who are specifically
commissioned by companies or authorities to break into their own IT net-
work in order to gain insights for optimizing security systems.
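One common way to construct such adversarial examples is the Fast Gradient Sign Method described by Goodfellow et al. (2014). The following sketch assumes a differentiable PyTorch image classifier (model), an input image tensor with batch dimension, and its correct label; the perturbation strength epsilon is an example value.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, image, label, epsilon=0.007):
    # Add a tiny perturbation in the direction that maximally increases the loss,
    # so that the change is barely visible but the classification flips
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial_image = image + epsilon * image.grad.sign()
    return adversarial_image.detach()
```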
Now, in practice, i.e. in the real world, of course, you cannot simply
change every pixel of an image without further ado. But there is also a “solu-
tion” for this: Adversarial Patches. These strange-looking stickers have dra-
matic effects on the recognition performance of our image classifier when we
place them next to a banana, for example.2 While every four-year-old child
correctly recognizes the banana with or without this sticker, the classifier is
99% sure that the banana is a toaster when the adversarial patch is next to
the banana (Brown et al., 2017).
2 https://youtu.be/i1sp4X57TL4
What may seem amusing at first glance and in this example turns out to
be a serious problem upon closer inspection. As already mentioned, auton-
omous driving also relies on image recognition to generate the appropriate
control commands for the vehicle such as “brake”, “accelerate” or “turn left”.
Unfortunately, there are also Adversarial Patches that—from the perspec-
tive of a neural network—turn a stop sign into a sign for a speed limit of 45
miles per hour (Eykholt et al., 2018). The consequences are unimaginable.
This unfortunately opens up entirely new possibilities for terrorist attacks.
If you feel like the author, then you would not want to sit in such a car, and
you probably would not want to live in a city where such vehicles are on the
road.
A Critical Appraisal
In his article “Deep Learning: A Critical Appraisal”, Gary Marcus identifies
ten weaknesses of current deep learning (Marcus, 2018). Some of these we
have already encountered, but for the sake of completeness, they will be
briefly outlined here again.
• Limited ability for transfer learning: Deep learning models struggle to gen-
eralize knowledge across different tasks or domains, unlike human learn-
ing, where skills and knowledge can be easily transferred and adapted.
• Data inefficiency: Deep learning models often require large amounts of
data to achieve high performance, while humans can learn effectively
from just a few examples.
• Lack of unsupervised learning methods: Current deep learning models are
mostly based on supervised learning, which requires labeled data for train-
ing. Human learning, on the other hand, largely occurs unsupervised.
• Inability to learn from explicit rules: Deep learning models usually learn
from patterns in the data and not from explicit rules, making it difficult for
them to acquire knowledge that can be easily expressed in the form of rules.
• Opacity: Deep learning models are often criticized as “black boxes”
because they are not interpretable, making it difficult to understand how
they arrive at their decisions.
• Vulnerability to attacks: As we have already seen, deep learning models
can be easily deceived by adversarial attacks, i.e., input patterns deliber-
ately designed to lead the model to make incorrect predictions.
Conclusion
Adversarial attacks, that is, targeted attacks on a machine learning system
with the aim of manipulating the behavior of the learning system or con-
fusing it and causing it to make incorrect predictions, are a serious problem.
The examples show that even small disturbances in the input data can signif-
icantly influence the behavior of learning systems, which in turn poses a risk
to the security and reliability of such systems.
It is not yet fully understood why adversarial examples or patches can
so easily deceive artificial neural networks, while natural neural networks
(brains) are immune to this type of deception.
Adversarial attacks therefore represent an important area of research in
the field of security of machine learning systems. In particular, adversar-
ial machine learning investigates such attacks and tries to develop effective
defenses against them.
The shortcomings of today’s AI systems mentioned by Gary Marcus high-
light some of the limitations of deep learning and suggest that the combina-
tion of deep learning with other AI techniques (hybrid machine learning) or
the solution of these problems by developing new approaches within deep
learning could lead to more robust and versatile AI systems.
Incorporating insights from brain research can be crucial in overcoming
the shortcomings of current deep learning: it might
not be such a bad idea to look at how it works in the brain and then transfer
the principles and be motivated or inspired for new machine learning meth-
ods or applications. This would then be neuroscience-inspired AI.
References
Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on
explainable artificial intelligence (XAI). IEEE Access, 6, 52138–52160.
Brown, T. B., Mané, D., Roy, A., Abadi, M., & Gilmer, J. (2017). Adversarial
patch. arXiv preprint. arXiv:1712.09665.
Castelvecchi, D. (2016). Can we open the black box of AI? Nature News,
538(7623), 20.
Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable
machine learning. arXiv preprint. arXiv:1702.08608.
Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., …, & Song,
D. (2018). Robust physical-world attacks on deep learning visual classification.
In Proceedings of the IEEE conference on computer vision and pattern recognition
(pp. 1625–1634).
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing
adversarial examples. arXiv preprint. arXiv:1412.6572.
Hutson, M. (2018). Has artificial intelligence become alchemy? Science, 360, 478–
478. https://doi.org/10.1126/science.360.6388.478.
Lipton, Z. C., & Steinhardt, J. (2018). Troubling trends in machine learning schol-
arship. arXiv preprint. arXiv:1807.03341.
Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint.
arXiv:1801.00631.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?”
Explaining the predictions of any classifier. In Proceedings of the 22nd ACM
SIGKDD international conference on knowledge discovery and data mining
(pp. 1135–1144).
Xiao, C., Li, B., Zhu, J. Y., He, W., Liu, M., & Song, D. (2018). Generating adver-
sarial examples with adversarial networks. arXiv preprint. arXiv:1801.02610.
Xie, C., Zhang, Z., Zhou, Y., Bai, S., Wang, J., Ren, Z., & Yuille, A. L. (2019).
Improving transferability of adversarial examples with input diversity. In
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
(pp. 2730–2739).
Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in
global risk. Global Catastrophic Risks, 1(303), 184.
19
Challenges of Brain Research
“The EEG researcher quickly went to work, putting an EEG cap on the mother-
board and measuring voltages at various points all over it, including on the outer
case for a reference point. She found that when the hard disk was accessed, the
disk controller showed higher voltages on average, and especially more power in the
higher frequency bands. When there was a lot of computation, a lot of activity was
seen around the CPU” (Brown, 2014).
“Finally the neuropsychologist comes along. She argues (quite reasonably) that
despite all of these findings of network interactions and voltage signals, we cannot
infer that a given region is necessary without lesion studies. The neuropsychologist
then gathers a hundred computers that have had hammer blows to various parts of
the motherboard, extension cards, and disks. After testing their abilities extensively,
she carefully selects just the few that have a specific problem with the video output.
She finds that among computers that don't display video properly, there is an over-
lapping area of damage to the video card. This means of course that the video card
is necessary for proper video monitor functioning” (Brown, 2014).
Despite all their discoveries, the question remains open as to whether they
really understood how the computer works. This is because they primar-
ily focused on the larger observable patterns and interactions and not on
the underlying mechanisms and processes that make the computer work
(Carandini, 2012).
The moral of the story is that despite the many sophisticated methods in
neuroscience, there is a lack of a unified, mechanistic, and theoretical super-
structure (Platt, 1964) to fully understand how the elements of the brain
work together to form functional units and generate complex cognitive
behavior. There are many different models and approaches, but no unified
theoretical language to evaluate empirical results or make new predictions.
The tale underscores the need for a basic mechanistic framework and
emphasizes how important it is for those conducting empirical research to
understand the premises and implications of the models.
of several billion transistors, the MOS 6502 consisted of only 3510 transis-
tors. In the study, it served as a “model organism” performing three differ-
ent “behaviors”, namely the three classic video games Donkey Kong, Space
Invaders, and Pitfall.
The idea behind this approach is that the microprocessor, as an artificial
information processing system, has three crucial advantages over natural
nervous systems. First, it is fully understood at all levels of description and
complexity, from the global architecture with registers and memory and the
entire data flow, to local circuits, individual logic gates, down to the physical
structure and switching dynamics of a single transistor. Second, its internal
state is fully accessible at any time without restrictions in terms of tempo-
ral or spatial resolution. And third, the (emulated) microprocessor offers
the possibility to perform any invasive experiments on it, which would be
impossible on “natural information processing systems” (brains) for ethical
or technical reasons.
Using this framework, the authors applied a wide range of common
data analysis methods from neuroscience to investigate the structural and
dynamic properties of the microprocessor. They even conducted EEG meas-
urements and lesion studies!
The authors concluded that, although each of the applied methods pro-
vided interesting results, which were strikingly similar to those known from
neuroscience or psychological studies, none of them could actually provide
insight into how the microprocessor actually works, or, more generally,
was suitable for gaining a mechanistic understanding of the system under
investigation.
1 Allen Newell was an American computer scientist and cognitive psychologist and is considered one of the founding fathers of artificial intelligence and cognitive science.
In this view, a system is only truly understood once an artificial system
has been developed and built that can perform the same inputs, transformations,
and outputs of information—and thus has the same function—as
the biological model.
A beautiful analogy to this is the problem of flying. For centuries, it was
thought that to fly, one must copy bird flight, and shaky-looking construc-
tions with wings were built, which moved up and down, but were com-
pletely unsuitable for flying. Surely you know the film footage of the men
in their flying machines, whose flight attempts usually ended in the abyss
with a crash landing. Only when we had understood the laws of aerodynam-
ics and fluid mechanics, the principles of dynamic pressure and lift, did we
actually succeed in constructing flying machines. It turned out that wings
fluttering up and down with feathers are not necessary.
The question of when one has truly understood a system can ultimately
be answered with: when one can reproduce the system and its functionality.
In the case of the human brain, this would mean constructing an Artificial
General Intelligence (AGI) at a human level based on the understood prin-
ciples of neural information processing.
References
Brown, J. W. (2014). The tale of the neuroscientists and the computer: Why mech-
anistic theory matters. Frontiers in Neuroscience, 8, 349.
Carandini, M. (2012). From circuits to behavior: A bridge too far? Nature
Neuroscience, 15, 507–509. https://doi.org/10.1038/nn.3043.
Holt, N., Bremner, A., Sutherland, E., Vliek, M., Passer, M., & Smith, R. (2019).
ebook: Psychology: The science of mind and behaviour, 4e. McGraw Hill.
Jonas, E., & Kording, K. P. (2017). Could a neuroscientist understand a micropro-
cessor? PLoS Computational Biology, 13(1), e1005268.
Kriegeskorte, N., & Douglas, P. K. (2018). Cognitive computational neuroscience.
Nature Neuroscience, 21(9), 1148–1160.
Lazebnik, Y. (2002). Can a biologist fix a radio? – Or, what I learned while studying
apoptosis. Cancer Cell, 2(3), 179–182.
Lewin, K. (1943). Defining the ‘field at a given time’. Psychological Review, 50(3),
292.
Marr, D. (1982). Vision. MIT Press.
Newell, A. (2012). You Can’t Play 20 Questions with Nature and Win: Projective
Comments on the Papers of This Symposium. In Machine Intelligence (pp.
121–146). Routledge.
As we have seen, despite all the successes in brain research and AI, there are
still some unsolved problems and challenges that are best solved through
close collaboration between both disciplines.
When considering the main goal of brain research, which consists of
understanding how perception, cognition, and behavior are implemented in
the brain, and the ultimate goal of AI research, to create systems capable
of perception, cognition, and behavior at a human level or even beyond, it
becomes apparent that these goals are complementary to each other.
It therefore makes sense to combine the various approaches of the two
disciplines. The integration of theories, methods, and concepts from brain
and AI research allows for a more comprehensive analysis of neural and
mental processes and a better understanding of artificial and natural cogni-
tive information processing systems. By integrating the disciplines, synergies
can be created and new insights can be gained that would not be possible
with a single discipline alone.
There are essentially four different types of integration that can be distin-
guished: The perhaps most obvious is to use AI as a tool for data analysis in
brain research. Secondly, AI and especially artificial neural networks can also
serve as model systems for the brain. Thirdly, there are a multitude of meth-
ods in neuroscience for analyzing biological neural networks, which can nat-
urally also be used to investigate their artificial counterparts, thus opening
the black box. And finally, the brain can serve as an almost inexhaustible
source of inspiration for new algorithms and architectures in AI.
Each of these four aspects is dedicated to its own chapter in the following
fourth part of the book, in which the integration of AI and brain research is
to be illustrated using some selected examples of current research.
The final chapter is dedicated to the question of whether there can ever be
conscious machines and what we can expect in the future from the integra-
tion of these exciting disciplines of AI and brain research.
20
AI as a Tool in Brain Research
Nobody says it like this, but I believe that Artificial Intelligence is almost a humanities
discipline. It is actually an attempt to understand human intelligence and human
cognition.
Sebastian Thrun
Inception Loops
A further exciting application of deep neural networks in brain research
arises from the challenge of finding such sensory stimuli that optimally
activate certain neurons, which is a key aspect for understanding how the
brain processes information. Due to the nonlinear nature of sensory process-
ing and the high dimensionality of the input—e.g., millions of pixels in the
visual system—it has so far been difficult or even impossible to optimize the
sensory input in a targeted way.
In a sensational study, a method called Inception Loop was developed to
solve this problem (Walker et al., 2019). The basic idea is based on the con-
cept of Deep Dreaming, where it is not the neural network that is adapted to a
specific input, but instead the input that is adapted to the neural network (see the chap-
ter on generative AI).
First, a deep neural network is trained as a so-called forward model to
predict with high accuracy neuronal response patterns from the primary
visual cortex of mice to perceived images. The trained model is then used to
optimize the input images themselves—analogous to Deep Dreaming—so that
stimuli are generated which maximally activate selected neurons; these synthesized
images can then be presented to the animal again, closing the loop.
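A minimal sketch of this input optimization, in the spirit of Deep Dreaming, could look as follows in PyTorch. Here forward_model stands for the trained network that predicts neuronal responses; image size, step count, and learning rate are arbitrary example values.

```python
import torch

def optimal_stimulus(forward_model, neuron_index, steps=200, lr=0.1):
    # Start from random noise and adapt the input image, not the network weights
    image = torch.randn(1, 1, 64, 64, requires_grad=True)
    optimizer = torch.optim.Adam([image], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        activity = forward_model(image)[0, neuron_index]  # predicted neuronal response
        (-activity).backward()   # gradient ascent on the predicted activity
        optimizer.step()
    return image.detach()
```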
Conclusion
The combination of Big Data and Artificial Intelligence has proven to be a
crucial tool in brain research (Vogt, 2018). In particular, the ability to pro-
cess and analyze large amounts of data allows us to better understand the
human brain and how it functions. We have seen progress in the classifi-
cation of sleep stages, the ability to study language processing in a more
complex and realistic way, and the development of techniques that help us
explore information processing in the brain. These can serve as a basis for
advanced brain-computer interfaces, which enable the control of prosthetics
or the translation of thoughts into written or spoken language, as well as the
control of vehicles or aircraft. One day, they might even be used for direct
communication between two brains. Such a telepathy interface for exchang-
ing thoughts instead of words is one of the stated long-term goals of Elon
Musk’s company Neuralink.1
Particularly exciting is the method of Inception Loops, which allows for
the identification of sensory stimuli that optimally activate certain neurons.
This technique could revolutionize our understanding of the brain and cog-
nition. A fascinating, albeit speculative, outlook on this development could
enable brain-in-a-vat scenarios in the future, reminiscent of films like Matrix
or Source Code. If we were able to identify and generate the specific stimuli
that trigger certain neuronal activities, we could theoretically create sensory
experiences that are indistinguishable from reality. By simulating optimal
stimuli, an artificial environment could thus be created that completely
convinces the brain. This would be the ultimate virtual reality.
1 https://www.dw.com/en/can-elon-musks-neuralink-tech-really-read-your-mind/a-65227626
References
Akbari, H., Khalighinejad, B., Herrero, J. L., Mehta, A. D., & Mesgarani, N.
(2019). Towards reconstructing intelligible speech from the human auditory cor-
tex. Scientific Reports, 9(1), 1–12.
Anumanchipalli, G. K., Chartier, J., & Chang, E. F. (2019). Speech synthesis from
neural decoding of spoken sentences. Nature, 568(7753), 493–498.
Donoghue, J. P. (2002). Connecting cortex to machines: Recent advances in brain
interfaces. Nature Neuroscience, 5(Suppl 11), 1085–1088.
Guger, C., Harkam, W., Hertnaes, C., & Pfurtscheller, G. (1999, November).
Prosthetic control by an EEG-based brain-computer interface (BCI). In Proceedings
of the 5th European conference for the advancement of assistive technology (pp. 3–6).
Hamilton, L. S., & Huth, A. G. (2020). The revolution will not be controlled:
Natural stimuli in speech neuroscience. Language, Cognition and Neuroscience,
35(5), 573–582.
Hauk, O., & Weiss, B. (2020). The neuroscience of natural language processing.
Language, Cognition and Neuroscience, 35(5), 541–542.
Huth, A. G., De Heer, W. A., Griffiths, T. L., Theunissen, F. E., & Gallant, J. L.
(2016). Natural speech reveals the semantic maps that tile human cerebral cor-
tex. Nature, 532(7600), 453–458.
Jääskeläinen, I. P., Sams, M., Glerean, E., & Ahveninen, J. (2020). Movies and nar-
ratives as naturalistic stimuli in neuroimaging. NeuroImage, 117445, 224.
Kangassalo, L., Spapé, M., & Ruotsalo, T. (2020). Neuroadaptive modelling for
generating images matching perceptual categories. Scientific Reports, 10(1), 1–10.
Kemmerer, D. (2014). Cognitive Neuroscience of Language. Psychology Press.
Krauss, P., Metzner, C., Joshi, N., Schulze, H., Traxdorf, M., Maier, A., &
Schilling, A. (2021). Analysis and visualization of sleep stages based on deep
neural networks. Neurobiology of Sleep and Circadian Rhythms, 10, 100064.
Lee, Y. E., & Lee, S. H. (2022). EEG-transformer: Self-attention from transformer
architecture for decoding EEG of imagined speech. In 2022 10th International
Winter Conference on Brain-Computer Interface (BCI) (pp. 1–4). IEEE.
Makin, J. G., Moses, D. A., & Chang, E. F. (2020). Machine translation of cortical
activity to text with an encoder–decoder framework. Nature Neuroscience, 23(4),
575–582.
McFarland, D. J., & Wolpaw, J. R. (2008). Brain-computer interface operation of
robotic and prosthetic devices. Computer, 41(10), 52–56.
21
AI as a Model for the Brain
Visual Processing
Especially with regard to the human visual system, a number of studies have
already shown that artificial neural networks and the brain have striking
similarities in the processing and representation of visual stimuli. The basic
procedure in all these studies is always that on the one hand, subjects are
shown a series of images while their brain activity is measured, usually with
EEG, MEG or fMRI. On the other hand, the same images are presented as
input to deep neural networks that have been trained on image recognition
but have not yet seen these test images. The activation state of the artificial
neurons from all layers is then read out. Advanced statistical methods such
as Representational Similarity Analysis (RSA, see glossary) are then used
to compare the brain activations with the activations of the neural network.
For example, it has been shown that artificial neural networks have the same
complexity gradient of neuronal representations of images in their interme-
diate layers as is known from the visual system, especially from the visual
cortex areas. Thus, the lower layers deal more with simple features such as
corners and edges, while the upper layers represent more complex features
or whole objects like faces (Kriegeskorte, 2015; Güçlü & van Gerven,
2015; Yamins & DiCarlo, 2016; Cichy et al., 2016; Srinath et al., 2020;
Mohsenzadeh et al., 2020).
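The core of such a comparison with Representational Similarity Analysis can be sketched briefly. The activation patterns below are random placeholders for measured brain responses and read-out network activations to the same set of stimuli; the sketch assumes numpy and scipy are available.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    # Representational dissimilarity matrix: pairwise distances between the
    # activation patterns evoked by the individual stimuli (one row per stimulus)
    return pdist(patterns, metric="correlation")

# Placeholder data: responses of both systems to the same 10 images
brain_patterns = np.random.randn(10, 500)     # e.g., fMRI voxels
network_patterns = np.random.randn(10, 128)   # e.g., units of one network layer

# RSA: a similar representational geometry yields a high correlation between RDMs
rho, _ = spearmanr(rdm(brain_patterns), rdm(network_patterns))
print(rho)
```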
An astonishing discovery was that in deep neural networks trained on
object recognition, number detectors spontaneously emerge (Nasr et al.,
2019). These are neurons that become active whenever a certain number of
something is visible, regardless of the shape, color, size, or position of the
objects.
Another groundbreaking new insight was that recurrent connections are
necessary, i.e., feedback from higher to lower layers, to correctly capture
the representation dynamics of the human visual system (Kietzmann et al.,
2019). In other words: Two otherwise identical neural networks were trained on image recognition, one of which additionally contained recurrent connections. The subsequent comparison of the networks' activations on test images with measured brain activity showed that the representations of the network with the recurrent connections were more similar to those of the brain than the representations of the other network.
And finally, it was discovered that a deep neural network, which was
trained to predict the next image of a video sequence, falls for the same opti-
cal illusions as a human (Watanabe et al., 2018). This led to the realization
that not only recurrence but also learning to predict seems to be an essential mechanism in visual perception. By the way, this approach of self-su-
pervised learning elegantly explains how the brain learns to process visual
stimuli without a teacher telling it what is visible in each image.
Conclusion
The connection between neuroscience and AI offers the opportunity to
expand our understanding of the human brain. By comparing the workings
of artificial and biological neural networks, we could gain deeper insights
into the processes of information processing and decision-making.
Although research in this field is still in its infancy, the examples men-
tioned clearly show the potential benefit of AI as a model for brain function.
Previous approaches mainly focused on image processing. However, with the
advent of transformer-based large language models like ChatGPT, signifi-
cant progress in the exploration of language processing and representation in
the brain is to be expected in the future.
References
Anderson, A., Kiela, D., Binder, J., Fernandino, L., Humphries, C., Conant, L.,
Raizada, R., Grimm, S., & Lalor, E. (2021). Deep artificial neural networks
reveal a distributed cortical network encoding propositional sentence-level mean-
ing. Journal of Neuroscience, JN-RM-1152-20.
Banino, A., Barry, C., Uria, B., Blundell, C., Lillicrap, T., Mirowski, P., …, &
Wayne, G. (2018). Vector-based navigation using grid-like representations in
artificial agents. Nature, 557(7705), 429–433.
Barak, O. (2017). Recurrent neural networks as versatile tools of neuroscience
research. Current Opinion in Neurobiology, 46, 1–6.
Barrett, D. G., Morcos, A. S., & Macke, J. H. (2019). Analyzing biological and
artificial neural networks: Challenges with opportunities for synergy? Current
Opinion in Neurobiology, 55, 55–64.
Caucheteux, C., & King, J. R. (2020). Language processing in brains and deep
neural networks: Computational convergence and its limits. BioRxiv. https://doi.
org/10.1101/2020.07.03.186288.
Cichy, R. M., & Kaiser, D. (2019). Deep neural networks as scientific models.
Trends in Cognitive Sciences, 23(4), 305–317.
Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A., & Oliva, A. (2016).
Comparison of deep neural networks to spatio-temporal cortical dynamics of
human visual object recognition reveals hierarchical correspondence. Scientific
Reports, 6, 27755.
Watanabe, E., Kitaoka, A., Sakamoto, K., Yasugi, M., & Tanaka, K. (2018).
Illusory motion reproduced by deep neural networks trained for prediction.
Frontiers in Psychology, 9(345).
Yamins, D. L., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to
understand sensory cortex. Nature Neuroscience, 19(3), 356–365.
Yang, G. R., & Wang, X. J. (2020). Artificial neural networks for neuroscientists: A
primer. Neuron, 107(6), 1048–1070.
22
Understanding AI Better with Brain Research
By far the greatest danger of Artificial Intelligence is that people conclude too early that
they understand it.
Eliezer Yudkowsky
Neuroscience 2.0
As we have seen, there are still various challenges to overcome in the field of
Artificial Intelligence, many of which can be traced back to the black box
problem. Deep neural networks are still poorly understood, difficult to inter-
pret, and it is often unclear why a particular error occurs or how they arrive
at their decisions. For AI to be trustworthy, it must be reliable, transparent,
and explainable (Samek et al., 2019).
The European Union has mandated that companies using AI algorithms that significantly affect the public must provide explanations for the internal logic of their models. Similarly, the U.S. Defense Advanced Research Projects Agency (DARPA) is investing $70 million in a program called “Explainable AI” (XAI) with the aim of interpreting the AI’s decision-making process (Voosen, 2017).
Neuroscience has developed a broad range of methods to analyze natu-
ral neural networks. It therefore makes sense to apply these methods to
their artificial counterparts as well. This endeavor is sometimes referred to
Lesions
In brain research, a lesion is understood to be damage to a part of the nerv-
ous system. Lesions can be caused by tumors, trauma, or surgery. In animal experiments, lesions can also be induced in a controlled manner: certain areas of the brain are damaged or removed, which yields valuable insights into the functions of the various brain regions. The
study of behavioral or functional changes associated with lesions of a certain
part of the brain represents an important method in neuroscience and has
significantly contributed to the gain in knowledge about the function of the
brain.
In the context of artificial neural networks, a similar approach can be used
to understand the role of the various components. The lesion in this case
can consist, for example, of removing or altering certain neurons, layers, or
connections in the network and then observing the resulting changes in the
output or performance of the neural network. If, for example, a certain neu-
ron or layer is damaged and the network’s performance in recognizing cat
images significantly decreases, one could conclude that the damaged compo-
nent was important for this task.
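As a sketch of how such an artificial lesion might look in practice, the following Python fragment silences a single hidden neuron of a hypothetical trained PyTorch classifier; the model, layer name, and the evaluate() helper are assumptions, not part of any specific study.

```python
# A minimal "lesion" sketch in PyTorch. Only the silencing step itself is shown;
# the model, the layer name "hidden", and evaluate() are hypothetical placeholders.
import torch

def lesion_unit(model, layer_name, unit_idx):
    # Silence one hidden neuron by zeroing its incoming weights and bias,
    # so it can no longer contribute to the layers above it.
    layer = dict(model.named_modules())[layer_name]
    with torch.no_grad():
        layer.weight[unit_idx].zero_()
        if layer.bias is not None:
            layer.bias[unit_idx].zero_()

# Sketch of the experiment described above (evaluate() is assumed):
# baseline = evaluate(model, test_loader)
# lesion_unit(model, "hidden", unit_idx=42)
# lesioned = evaluate(model, test_loader)
# A large drop (baseline - lesioned) suggests that unit 42 mattered for the task.
```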
In a neural network trained for image classification, the lesion of certain
neurons could, for example, reveal which are crucial for identifying certain
features in images—such as edges, shapes, or colors. If, on the other hand,
entire layers are removed from a trained network, this can reveal the overall
significance of this layer for the performance of the network. This method is
particularly revealing in deep learning architectures, where each layer often
corresponds to different levels of abstraction. Alternatively, connections
between neurons can also be removed or altered. This can reveal the impor-
tance of these connections in the transmission and transformation of infor-
mation within the network.
One of the most important findings of this research so far is that individual neurons in the network often correspond to recognizable and interpretable visual concepts. In other words: certain neurons in the network are specialized in recognizing certain features in images, such as textures, colors, shapes, or even more complex objects like trees or buildings. This discovery challenges the common belief that representations in deep networks are distributed and difficult to interpret (Bau et al., 2017; Zhou et al., 2018, 2019).
1 Even broader is a discipline for which the name Machine Behavior has been proposed. This involves the interdisciplinary study of the behavior of machines, especially AI systems, and their impact on social, cultural, economic, and political interactions (Rahwan et al., 2019).
Visualization
Network Visualization
To analyze the inner workings of deep neural networks, various network vis-
ualization techniques have been developed that provide fascinating insights
into the hidden layers of these complex models. It has been shown that indi-
vidual neurons can specialize in recognizing specific features such as faces,
while others respond to more abstract concepts such as “striped patterns”
(Yosinski et al., 2015).
Zeiler and Fergus (2014) developed a method that can trace back the acti-
vations of layers representing certain features to reveal the role of these layers
in the overall classification task. They uncovered the hierarchical nature of
feature extraction in deep neural networks, from simple edge detection in
early layers to complex object recognition in deeper layers.
This approach has shown that individual neurons often correlate with
recognizable visual concepts: they are specialized in recognizing certain fea-
tures in images.
A technique that has been widely used for visualizing data from imaging
methods in neuroscience is multidimensional scaling (MDS). This method
creates an intuitive visualization of high-dimensional data, e.g., measured
brain activity (Krauss et al., 2018a, b). All data points are projected onto a two-dimensional plane in such a way that the pairwise distances between points in the high-dimensional space are preserved as faithfully as possible. Distance is a measure of
the dissimilarity between two points or patterns. In other words: the closer
two points are in the visualization, the more similar are the data they repre-
sent. This method is also excellent for visualizing the activation of individual
layers of a neural network. For example, it has been shown that the separa-
bility of object classes increases with layer depth up to a characteristic layer
depth (number of layers) that depends on the dataset and beyond which the
separability does not increase further. Thus, for a given dataset, the optimal
number of layers in a deep neural network can be determined (Schilling
et al., 2021).
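A minimal sketch of such an MDS visualization, assuming illustrative random data in place of real layer activations, could look as follows (using scikit-learn and matplotlib):

```python
# A minimal sketch of the MDS visualization described above; the activation
# matrix and class labels are illustrative placeholders, not real recordings.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
activations = rng.normal(size=(100, 256))   # placeholder: 100 samples, 256 units
labels = rng.integers(0, 4, size=100)       # placeholder class labels

# Project to 2D so that pairwise distances are preserved as well as possible.
coords = MDS(n_components=2, random_state=0).fit_transform(activations)

plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=15)
plt.title("MDS projection of one hidden layer")
plt.show()
# Repeating this for every layer shows how the class clusters separate with depth.
```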
Feature Visualization
Conclusion
The application of neuroscientific methods to the study of Artificial
Intelligence offers an innovative and promising perspective to better under-
stand the functioning of deep neural networks. With the help of visualiza-
tion techniques, lesion experiments, and concepts such as multidimensional
scaling and feature visualization, it has been possible to reveal the role
and specialization of individual neurons and layers within these networks.
Furthermore, methods like layer-wise relevance propagation allow for a
deeper understanding of how individual features and neurons contribute
to a model’s final decision-making process. All these approaches contribute
to opening the black-box nature of Artificial Intelligence and taking a step
towards more transparent and explainable AI systems.
The future of Artificial Intelligence could significantly benefit from these
neuroscientific methods. The ability to better understand the inner work-
ings of AI models could help develop more efficient and reliable systems
while simultaneously strengthening trust in their application. Moreover, the
explainability of AI could help address regulatory challenges and improve
societal acceptance.
References
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. R., & Samek, W.
(2015). On pixel-wise explanations for non-linear classifier decisions by lay-
er-wise relevance propagation. PLoS ONE, 10(7), e0130140.
Bau, D., Zhou, B., Khosla, A., Oliva, A., & Torralba, A. (2017). Network
dissection: Quantifying interpretability of deep visual representations. In
Proceedings of the IEEE conference on computer vision and pattern recognition
(pp. 6541–6549).
Binder, A., Bach, S., Montavon, G., Müller, K. R., & Samek, W. (2016a). Layer-
wise relevance propagation for deep neural network architectures. In Information
science and applications (ICISA) 2016 (pp. 913–922). Springer Singapore.
Binder, A., Montavon, G., Lapuschkin, S., Müller, K. R., & Samek, W. (2016b).
Layer-wise relevance propagation for neural networks with local renormali-
zation layers. In Artificial Neural Networks and Machine Learning – ICANN
2016: 25th International Conference on Artificial Neural Networks, Barcelona,
Spain, September 6–9, 2016, Proceedings, Part II 25 (pp. 63–71). Springer
International Publishing.
Krauss, P., Metzner, C., Schilling, A., Tziridis, K., Traxdorf, M., Wollbrink, A., …,
& Schulze, H. (2018a). A statistical method for analyzing and comparing spatio-
temporal cortical activation patterns. Scientific Reports, 8(1), 5433.
Krauss, P., Schilling, A., Bauer, J., Tziridis, K., Metzner, C., Schulze, H., &
Traxdorf, M. (2018b). Analysis of multichannel EEG patterns during human
sleep: A novel approach. Frontiers in Human Neuroscience, 12, 121.
Montavon, G., Binder, A., Lapuschkin, S., Samek, W., & Müller, K. R. (2019).
Layer-wise relevance propagation: An overview. In Explainable AI: Interpreting,
explaining and visualizing deep learning (pp. 193–209).
Olah, C., Satyanarayan, A., Johnson, I., Carter, S., Schubert, L., Ye, K., &
Mordvintsev, A. (2018). The building blocks of interpretability. Distill, 3(3), e10.
Rahwan, I., Cebrian, M., Obradovich, N., et al. (2019). Machine behaviour.
Nature, 568, 477–486.
Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K., & Müller, K. R. (Eds.).
(2019). Explainable AI: Interpreting, explaining and visualizing deep learning (Vol.
11700). Springer Nature.
Schilling, A., Maier, A., Gerum, R., Metzner, C., & Krauss, P. (2021). Quantifying
the separability of data classes in neural networks. Neural Networks, 139,
278–293.
Voosen, P. (2017). The AI detectives. Science, 357, 22–27.
Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., & Lipson, H. (2015). Understanding
neural networks through deep visualization. arXiv preprint arXiv:1506.06579.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional
networks. In Computer Vision – ECCV 2014: 13th European Conference, Zurich,
Switzerland, September 6–12, 2014, Proceedings, Part I 13 (pp. 818–833).
Springer International Publishing.
Zhou, B., Bau, D., Oliva, A., & Torralba, A. (2018). Interpreting deep visual rep-
resentations via network dissection. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 41(9), 2131–2145.
Zhou, B., Bau, D., Oliva, A., & Torralba, A. (2019). Comparing the interpreta-
bility of deep networks via network dissection. In Explainable AI: Interpreting,
explaining and visualizing deep learning (pp. 243–252).
23
The Brain as a Template for AI
I have always been convinced that artificial intelligence can only work if the calculations
are carried out similarly to those in the human brain.
Geoffrey Hinton
Neuroscience-Inspired AI
The human brain already solves many of the tasks that we are trying to solve
in the field of machine learning and AI. Given the fact that the ultimate
goal of AI is to mimic a real existing system (the brain) to which we have
partial access, it seems obvious to consider the design principles of the brain.
Indeed, we are already using such insights in many cases.
The Perceptron, a basic building block for artificial neural networks, is a
perfect example of this. This algorithm, introduced by Rosenblatt in 1958,
was inspired by our understanding of how biological neurons work. A single
perceptron is a simplified model of a biological neuron and shows how com-
puter systems can learn from nature (Rosenblatt, 1958).
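The learning rule itself is simple enough to sketch in a few lines. The following toy example, with purely illustrative values, trains a single perceptron on the logical AND function:

```python
# A minimal sketch of Rosenblatt's perceptron learning rule on a toy,
# linearly separable problem (logical AND); all values are illustrative.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                  # target: AND of the two inputs

w, b, lr = np.zeros(2), 0.0, 0.1
for epoch in range(20):
    for xi, target in zip(X, y):
        fired = int(w @ xi + b > 0)         # thresholded "firing" of the neuron
        error = target - fired
        w += lr * error * xi                # nudge weights toward the target
        b += lr * error

print(w, b, [int(w @ xi + b > 0) for xi in X])   # should reproduce [0, 0, 0, 1]
```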
The architecture of convolutional networks is also a good example. The
layers of this network architecture mimic the local connection patterns
found in the visual system of mammals (Fukushima, 1980; LeCun et al.,
1998). Just as neurons in the brain have receptive fields that focus on spe-
cific areas of the visual field, the so-called kernels of convolutional networks
are designed to process local regions of their input space, which represents a
direct parallel between neurobiology and machine learning.
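The following minimal sketch illustrates this local processing: a small, hand-crafted edge-detection kernel is slid across a toy image. The values are illustrative and not taken from any trained network.

```python
# A minimal sketch of a convolutional kernel acting as a local "receptive field".
import numpy as np
from scipy.signal import convolve2d

image = np.zeros((8, 8))
image[:, 4:] = 1.0                        # a toy image containing a vertical edge

kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])           # a simple vertical-edge detector

feature_map = convolve2d(image, kernel, mode="valid")
print(feature_map)   # responses are nonzero only where the kernel straddles the edge
```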
Noise in Networks
In physics and information theory, noise refers to a random signal, i.e., random amplitude fluctuations of a physical quantity.
Accordingly, one distinguishes, for example, neural, acoustic, or electrical
noise. The “most random” noise is referred to as white noise. Its autocorrela-
tion is zero, i.e., there is no correlation between the amplitude values of the
respective physical quantity at two different points in time.
Traditionally, noise is considered a disturbance signal that should be min-
imized as much as possible. However, in the context of so-called resonance
phenomena, noise plays an important role and can even be useful for neural
information processing.
Stochastic resonance, for example, is a phenomenon widely observed in
nature, which has been demonstrated in numerous physical, chemical, bio-
logical, and especially neural systems. A signal that is too weak for a receiver
to detect can be amplified by adding noise so that it can still be detected.
There is an optimal noise intensity, dependent on the signal, the receiver,
and other parameters, at which information transmission becomes maximal
(Benzi et al., 1981; Wiesenfeld et al., 1994; Gammaitoni et al., 1998; Moss
et al., 2004; Gammaitoni et al., 2009; McDonnell & Abbott, 2009).
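The following minimal sketch illustrates the effect numerically: a subthreshold sine signal is passed through a simple threshold detector, and detection quality is highest at an intermediate noise level. All parameters are illustrative.

```python
# A minimal stochastic resonance sketch: a weak signal plus a threshold detector.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 5000)
signal = 0.4 * np.sin(2 * np.pi * t)      # too weak to cross the threshold alone
threshold = 1.0

def detection_quality(noise_sd):
    noisy = signal + rng.normal(0, noise_sd, size=signal.size)
    spikes = (noisy > threshold).astype(float)     # output of the threshold detector
    if spikes.std() == 0:                          # nothing detected at all
        return 0.0
    # Correlation between detector output and the hidden signal as a crude measure.
    return np.corrcoef(spikes, signal)[0, 1]

for sd in [0.0, 0.2, 0.5, 1.0, 2.0]:
    print(f"noise sd = {sd}: quality = {detection_quality(sd):.2f}")
# Quality is zero without noise, peaks at an intermediate noise level,
# and degrades again once the noise dominates the signal.
```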
In recent years, evidence has been mounting that the brain deliberately
uses the phenomenon of stochastic resonance, for example in the auditory
system, to optimally maintain information processing even under chang-
ing environmental conditions (Krauss et al., 2016, 2017, 2018; Krauss &
Tziridis, 2021; Schilling et al., 2021, 2022a; Schilling & Krauss, 2022).
If we understand these mechanisms and their role in information process-
ing, we can develop AI systems that mimic this flexibility and robustness.
Some theoretical works have shown that artificial neural networks can also
benefit from having noise added as additional input (Krauss et al., 2019;
Metzner & Krauss, 2022).
Ultimately, this led to the realization that the performance of deep neural
networks trained to recognize spoken language can be improved by adding
noise to the speech input (Schilling et al., 2022b).
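In its simplest form, such noise injection amounts to perturbing the training inputs, as in the following sketch of a hypothetical PyTorch training loop; the noise level is an arbitrary illustrative choice, not the setting used in the cited studies.

```python
# A minimal sketch of input noise injection during training (hypothetical loop).
import torch

def add_input_noise(batch, noise_sd=0.1):
    # Perturb each training example with Gaussian noise so that the network
    # must learn representations that are robust to such fluctuations.
    return batch + noise_sd * torch.randn_like(batch)

# Inside the (hypothetical) training loop:
# for inputs, targets in train_loader:
#     loss = criterion(model(add_input_noise(inputs)), targets)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```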
The algorithm links similar smells with similar neural activity patterns
(tags) and allows the fly to generalize learned behavior from one smell to a
previously unknown one. The fly’s algorithm uses computational strategies
such as dimension expansion (as opposed to compression e.g. in autoencod-
ers) and random connections (as opposed to trainable connections), which
significantly deviate from traditional approaches in AI and computer science.
This discovery has already been successfully transferred to artificial neural
networks. For example, it has been shown that neural networks with fixed
binary random connections improve the accuracy in classifying noisy input
data (Yang et al., 2021). Another study even found the completely counter-
intuitive result that unsupervised learning does not necessarily improve per-
formance compared to fixed random projections (Illing et al., 2019).
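A minimal sketch of this scheme, loosely following the fruit-fly algorithm of Dasgupta et al. (2017) with illustrative sizes, combines a fixed sparse binary random projection with a winner-take-all step:

```python
# A minimal sketch of random dimension expansion with fixed binary connections.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, k = 50, 2000, 40             # expansion from 50 to 2000 dimensions

# Fixed binary random connections: each expansion unit samples a few inputs.
W = (rng.random((d_out, d_in)) < 0.1).astype(float)

def fly_tag(x):
    activity = W @ x                                  # random dimension expansion
    tag = np.zeros(d_out)
    tag[np.argsort(activity)[-k:]] = 1.0              # winner-take-all: keep top k
    return tag

x = rng.random(d_in)
x_similar = x + 0.05 * rng.normal(size=d_in)          # a slightly perturbed "smell"
shared = int(np.sum(fly_tag(x) * fly_tag(x_similar)))
print(f"{shared} of {k} tag units shared")            # similar inputs get similar tags
```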
There is even a hypothesis as to why networks with random connections
work at all or even better than those with trained connections. According to
the lottery ticket hypothesis, randomly initialized artificial neural networks
contain a large number of subnetworks (Winning Tickets), which, when
trained in isolation, achieve a comparable test accuracy to the original, larger
network at a similar number of iterations. These Winning Tickets have, so
to speak, won in the initialization lottery, i.e., their connections have ini-
tial weights that make training particularly effective or even superfluous. A
downstream trainable layer can then select the useful ones from the random
subnetworks and ignore the others (Frankle & Carbin, 2018).
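The pruning step at the heart of this search can be sketched as follows; the model and layer name are hypothetical, and in the full procedure the surviving weights would be reset to their initial values and the sparse subnetwork retrained.

```python
# A minimal sketch of magnitude pruning as used in the search for "winning tickets"
# (Frankle & Carbin, 2018); model and layer name are hypothetical placeholders.
import torch

def magnitude_prune_mask(weight, keep_fraction=0.2):
    # Keep only the largest weights (by absolute value) of one layer.
    k = int(weight.numel() * keep_fraction)
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k).values
    return (weight.abs() > threshold).float()

# Usage on one layer of a hypothetical trained model:
# mask = magnitude_prune_mask(model.hidden.weight)
# model.hidden.weight.data *= mask   # the remaining weights form the candidate ticket
```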
The lottery ticket hypothesis has important implications for the design
and training of deep neural networks, as it suggests that smaller, more effi-
cient networks can be achieved by identifying and training the Winning
Tickets, rather than training the entire network from scratch. This could
lead to more efficient and faster training as well as smaller and more ener-
gy-efficient models, which is particularly important for applications such as
mobile and embedded devices.
Another approach to using random network architectures is reservoir
computing. Here, a randomly generated highly recurrent neural network
(RNN) is used to process input data and generate output predictions,
with the connections within this so-called reservoir not being trained.
Instead, only the connections between the reservoir and the output layer
are learned through a supervised learning process. Since RNNs are complex
dynamic systems that can generate continuous activity even in the absence
of an external input (Krauss et al., 2019), the dynamics of the reservoir are
merely modulated by external input (Metzner & Krauss, 2022). The idea
is now to use the so-called echo state property, where the current state of
the network always contains an “echo” of past inputs, which is helpful for
tasks that require remembering past states (Jaeger, 2001; Maass et al., 2002;
Lukoševičius & Jaeger, 2009).
In addition, the dimensionality of the reservoir, i.e., the number of neu-
rons, is usually significantly larger than that of the input. Thus, reservoir
computing corresponds to a variant of random dimension expansion. It has
already been successfully used in a variety of applications such as music gen-
eration, signal processing, robotics, speech recognition and processing, and
stock price and weather forecasting (Jaeger & Haas, 2004; Tong et al., 2007;
Antonelo et al., 2008; Triefenbach et al., 2010; Boulanger-Lewandowski
et al., 2012; Tanaka et al., 2019).
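A minimal echo state network along these lines can be sketched in a few lines of Python; the toy task (predicting the next value of a sine wave) and all hyperparameters are illustrative choices.

```python
# A minimal echo state network sketch: a fixed random recurrent reservoir whose
# dynamics are only modulated by the input, with only the linear readout trained.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 300

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))        # fixed input weights
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # scale spectral radius (echo state property)

# Toy task: predict the next value of a sine wave.
u = np.sin(np.linspace(0, 20 * np.pi, 2000)).reshape(-1, 1)
targets = np.roll(u, -1, axis=0)

states = np.zeros((len(u), n_res))
x = np.zeros(n_res)
for t in range(len(u)):
    x = np.tanh(W @ x + W_in @ u[t])                # the reservoir itself is never trained
    states[t] = x

ridge = 1e-6                                        # train only the linear readout
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res), states.T @ targets)
pred = states @ W_out
print("mean squared error:", np.mean((pred[:-1] - targets[:-1]) ** 2))
```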
Conclusion
The study of noise and randomness in neural networks has significantly
expanded our understanding of information processing and learning.
Random noise can act as a signal amplifier and improve the performance
of neural networks, while random connections and architectures inspired by
natural systems like the olfactory system of the fruit fly can be surprisingly
effective. By applying concepts such as the lottery ticket hypothesis and
reservoir computing, future AI systems could be designed to be both more
powerful and more efficient. These insights raise new questions and open up
fascinating possibilities for future research in artificial intelligence.
Many researchers believe that the frontier of neuroscience and AI will be
crucial in developing the next generation of AI, and that integrating both
disciplines offers the most promising opportunities to overcome the current
limits of AI (Zador et al., 2023).
Let’s revisit the analogy of the problem of flying one last time. The point
is not to exactly copy the biological template—birds, bats or insects—but
instead to recognize the underlying mechanisms and principles. Once we
understood the physical basics and principles of flying, we were able to build
flying machines that bear no external resemblance to their biological models,
but can fly higher, further and faster than any bird. Think of rockets, heli-
copters or jet planes.
A hypothetical highly advanced AI, based on insights from brain research,
could be significantly more powerful than humans, while on the other hand
bearing no resemblance to the brain.
References
Antonelo, E. A., Schrauwen, B., & Stroobandt, D. (2008). Event detection and
localization for small mobile robots using reservoir computing. Neural Networks,
21(6), 862–871.
Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise
training of deep networks. NIPS, 19, 153–160.
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies
with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2),
157–166.
Benzi, R., Sutera, A., & Vulpiani, A. (1981). The mechanism of stochastic reso-
nance. Journal of Physics A: Mathematical and General, 14(11), L453.
Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012). Modeling temporal
dependencies in high-dimensional sequences: Application to polyphonic music
generation and transcription. arXiv preprint arXiv:1206.6392.
Dasgupta, S., Stevens, C. F., & Navlakha, S. (2017). A neural algorithm for a fun-
damental computing problem. Science, 358(6364), 793–796.
Erhan, D., Courville, A., Bengio, Y., & Vincent, P. (2010, March). Why does unsu-
pervised pre-training help deep learning? In Proceedings of the 13th international
conference on artificial intelligence and statistics (pp. 201–208). JMLR Workshop
and Conference Proceedings.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in
the primate cerebral cortex. Cerebral Cortex, 1(1), 1–47.
Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse,
trainable neural networks. arXiv preprint arXiv:1803.03635.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for
a mechanism of pattern recognition unaffected by shift in position. Biological
Cybernetics, 36(4), 193–202.
Gammaitoni, L., Hänggi, P., Jung, P., & Marchesoni, F. (1998). Stochastic reso-
nance. Reviews of Modern Physics, 70(1), 223.
Gammaitoni, L., Hänggi, P., Jung, P., & Marchesoni, F. (2009). Stochastic reso-
nance: A remarkable idea that changed our perception of noise. The European
Physical Journal B, 69, 1–3.
He, K., Zhang, X., Ren, S., & Sun, J. (2016a). Identity mappings in deep resid-
ual networks. In Computer Vision – ECCV 2016: 14th European Conference,
Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14
(pp. 630–645). Springer International Publishing.
He, K., Zhang, X., Ren, S., & Sun, J. (2016b). Deep residual learning for image
recognition. In Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 770–778).
Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for
deep belief nets. Neural Computation, 18(7), 1527–1554.
Illing, B., Gerstner, W., & Brea, J. (2019). Biologically plausible deep learning —
but how far can we go with shallow networks? Neural Networks, 118, 90–101.
Imam, N., & Finlay, B. L. (2020). Self-organization of cortical areas in the devel-
opment and evolution of neocortex. Proceedings of the National Academy of
Sciences, 117(46), 29212–29220.
Jaeger, H. (2001). The “echo state” approach to analysing and training recur-
rent neural networks-with an erratum note. Bonn, Germany: German National
Research Center for Information Technology GMD Technical Report, 148(34), 13.
Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems
and saving energy in wireless communication. Science, 304(5667), 78–80.
Metzner, C., & Krauss, P. (2022). Dynamics and information import in recurrent
neural networks. Frontiers in Computational Neuroscience, 16, 876315.
Krauss, P., & Tziridis, K. (2021). Simulated transient hearing loss improves audi-
tory sensitivity. Scientific Reports, 11(1), 14791.
Krauss, P., Tziridis, K., Metzner, C., Schilling, A., Hoppe, U., & Schulze, H.
(2016). Stochastic resonance controlled upregulation of internal noise after hear-
ing loss as a putative cause of tinnitus-related neuronal hyperactivity. Frontiers in
Neuroscience, 10, 597.
Krauss, P., Metzner, C., Schilling, A., Schütz, C., Tziridis, K., Fabry, B., & Schulze,
H. (2017). Adaptive stochastic resonance for unknown and variable input sig-
nals. Scientific Reports, 7(1), 2450.
Krauss, P., Tziridis, K., Schilling, A., & Schulze, H. (2018). Cross-modal stochas-
tic resonance as a universal principle to enhance sensory processing. Frontiers in
Neuroscience, 12, 578.
Krauss, P., Schuster, M., Dietrich, V., Schilling, A., Schulze, H., & Metzner, C.
(2019). Weight statistics controls dynamics in recurrent neural networks. PLoS
ONE, 14(4), e0214541.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning
applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lukoševičius, M., & Jaeger, H. (2009). Reservoir computing approaches to recur-
rent neural network training. Computer Science Review, 3(3), 127–149.
Maass, W., Natschläger, T., & Markram, H. (2002). Real-time computing without
stable states: A new framework for neural computation based on perturbations.
Neural Computation, 14(11), 2531–2560.
McDonnell, M. D., & Abbott, D. (2009). What is stochastic resonance?
Definitions, misconceptions, debates, and its relevance to biology. PLoS
Computational Biology, 5(5), e1000348.
Miller, D. J., Duka, T., Stimpson, C. D., Schapiro, S. J., Baze, W. B., McArthur, M.
J., …, & Sherwood, C. C. (2012). Prolonged myelination in human neocortical
evolution. Proceedings of the National Academy of Sciences, 109(41), 16480–16485.
Moss, F., Ward, L. M., & Sannita, W. G. (2004). Stochastic resonance and sen-
sory information processing: A tutorial and review of application. Clinical
Neurophysiology, 115(2), 267–281.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for
biomedical image segmentation. In Medical Image Computing and Computer-
Assisted Intervention – MICCAI 2015: 18th International Conference, Munich,
Germany, October 5–9, 2015, Proceedings, Part III 18 (pp. 234–241). Springer
International Publishing.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information stor-
age and organization in the brain. Psychological Review, 65(6), 386.
Schilling, A., & Krauss, P. (2022). Tinnitus is associated with improved cognitive
performance and speech perception – Can stochastic resonance explain? Frontiers
in Aging Neuroscience, 14, 1073149.
Schilling, A., Tziridis, K., Schulze, H., & Krauss, P. (2021). The Stochastic
Resonance model of auditory perception: A unified explanation of tinnitus
development, Zwicker tone illusion, and residual inhibition. Progress in Brain
Research, 262, 139–157.
Schilling, A., Sedley, W., Gerum, R., Metzner, C., Tziridis, K., Maier, A., …, &
Krauss, P. (2022a). Predictive coding and stochastic resonance: Towards a unified
theory of auditory (phantom) perception. arXiv preprint arXiv:2204.03354.
Schilling, A., Gerum, R., Metzner, C., Maier, A., & Krauss, P. (2022b). Intrinsic
noise improves speech recognition in a computational model of the auditory
pathway. Frontiers in Neuroscience, 16, 795.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … &
Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the
IEEE conference on computer vision and pattern recognition (pp. 1–9).
Tanaka, G., Yamane, T., Héroux, J. B., Nakane, R., Kanazawa, N., Takeda, S., …,
& Hirose, A. (2019). Recent advances in physical reservoir computing: A review.
Neural Networks, 115, 100–123.
Tong, M. H., Bickett, A. D., Christiansen, E. M., & Cottrell, G. W. (2007).
Learning grammatical structure with echo state networks. Neural Networks,
20(3), 424–432.
Triefenbach, F., Jalalvand, A., Schrauwen, B., & Martens, J. P. (2010). Phoneme
recognition with large hierarchical reservoirs. In J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems (Vol. 23). Curran Associates, Inc. https://proceedings.neurips.cc/
paper_files/paper/2010/file/2ca65f58e35d9ad45bf7f3ae5cfd08f1-Paper.pdf.
Van Essen, D. C., Anderson, C. H., & Felleman, D. J. (1992). Information pro-
cessing in the primate visual system: An integrated systems perspective. Science,
255(5043), 419–423.
Wiesenfeld, K., Pierson, D., Pantazelou, E., Dames, C., & Moss, F. (1994).
Stochastic resonance on a circle. Physical Review Letters, 72(14), 2125.
Yang, Z., Schilling, A., Maier, A., & Krauss, P. (2021). Neural networks with
fixed binary random projections improve accuracy in classifying noisy data. In
Bildverarbeitung für die Medizin 2021: Proceedings, German Workshop on Medical
Image Computing, Regensburg, March 7–9, 2021 (pp. 211–216). Springer
Fachmedien Wiesbaden.
Zador, A., Escola, S., Richards, B., Ölveczky, B., Bengio, Y., Boahen, K., …, &
Tsao, D. (2023). Catalyzing next-generation Artificial Intelligence through
NeuroAI. Nature Communications, 14(1), 1597.
24
Outlook
Can you?
Sonny
Conscious Machines?
Not only against the backdrop of the astonishing achievements of Large Language Models like ChatGPT or GPT-4 has the question arisen whether these or similar AI systems could someday develop a consciousness of their own (Dehaene et al., 2017), or perhaps even already have one, as the former Google software engineer Blake Lemoine claimed in 2022, for which he was subsequently dismissed.1 He was firmly convinced that the chatbot LaMDA was sentient, and described LaMDA’s ability to perceive and express thoughts and feelings as comparable to that of a human child.
Google and many leading scientists have rejected Lemoine’s views, stating
that LaMDA is merely a complex algorithm that is very good at generating
human-like language.
The question of whether machines could become conscious implies another question: Can we measure consciousness at all? This leads us, among
other things, to the Turing Test.
1 https://www.theguardian.com/technology/2022/jul/23/google-fires-software-engineer-who-claims-ai-
chatbot-is-sentient
2 https://www.rnd.de/medien/chatgpt-eroeffnet-markus-lanz-sendung-und-lobt-die-gaeste-so-schleimig-
waren-wir-noch-nie-F424UESQGRFBTL6K7JKXO46KQQ.html
to the fact that these models generate their responses and outputs solely based on statistical patterns, randomly selecting one of the most likely next words learned from the data they were trained on. At
this point, the author may be allowed the somewhat heretical question of
whether humans ever really do anything different. From what we know from
brain research, the majority of our behavior, including verbal utterances, is
controlled automatically and without conscious control. And this is based
on the rules that our brain has extracted from all our perceptions and expe-
riences and on the basis of which the brain tries to predict what will happen
next.
The reader may evaluate the following short dialogue between the author and ChatGPT and decide for themselves whether it reflects intelligence, problem-solving, or stochastically generated imitation.
Author: “What is 17 times 3756?”
ChatGPT: “17 times 3756 equals 63852.”
Author: “How do you calculate that?”
ChatGPT: “First, I multiplied the first digit of 3756, which is 6, by 17.
That gives 102. I write the second digit of the number 3756 next to the sec-
ond place of 102 to get 510. Then I multiply the third digit of 3756, which
is 5, by 17 and add the result to 510, which gives 85. I write the last digit of
85 at the third place of the result and the first digit of 85 at the fourth place
of the result, which gives 63852.”
Author: “So you break down the multiplication of large numbers into
smaller steps?”
ChatGPT: “Yes, that’s correct. When I have to multiply large numbers, I
break down the task into smaller steps to make it easier and faster to calcu-
late. The method I used is also called ‘written multiplication’ or ‘written pro-
cedure’. The numbers are written one below the other and then multiplied
step by step, adding the intermediate results. This method is a useful way to
multiply large numbers without making mistakes or having to calculate for a
long time in your head.”
There is now consensus that ChatGPT has passed the Turing Test (at least
its simplest, chat-based variant) (Biever, 2023). Nevertheless, it remains
unclear whether the Turing Test actually measures intelligence, the ability to
think, or even consciousness. What is certain is that it measures the abil-
ity to imitate human behavior, whatever one can conclude from that. One
of the most famous counterarguments to the Turing Test is that we cannot
determine consciousness through behavioral observation. Searle’s famous
thought experiment illustrates this very vividly.
Fig. 24.1 The Chinese Room. Searle’s thought experiment makes it clear that a com-
puter that responds in a human-like manner does not necessarily prove that it under-
stands what it is saying, as it merely applies rules and patterns without having to
understand the actual meaning of the words
the AI also receives sensory input from the body and outputs control signals
to the actuators and thus the body, we are already very close to Damasio’s
concepts of the emergence of consciousness, namely body loop, emotions
and interaction with the body and the environment (Man & Damasio,
2019). In fact, machines with proto- or core selves are already feasible in
principle with today’s algorithms and architectures of deep learning (Krauss
& Maier, 2020).
3 https://www.quantamagazine.org/the-unpredictable-abilities-emerging-from-large-ai-models-20230316
Initial studies have already shown signs that GPT-4 may have developed
a general intelligence and even a Theory of Mind, i.e., the ability to empa-
thize with the intentions and emotions of other people (Bubeck et al., 2023;
Kosinski, 2023).
AI Apocalypse
The question of sentient or conscious machines is of fundamental impor-
tance, especially with regard to ethical and safety-relevant aspects, so as not
to inadvertently create a scenario of the AI apocalypse. This refers to a hypo-
thetical scenario in which artificial intelligence could be the cause of the end
of human civilization (Barrat, 2013). This could occur if AI systems gain
significant power and then act in a way that is harmful to humanity, either
unintentionally or intentionally.
One of the most well-known examples of this concept is Skynet from
the Terminator franchise. In the film series, Skynet is a military AI system
developed to manage defense networks. It becomes self-aware, realizes that
humans are a threat to its existence, and initiates a nuclear war to annihilate
humanity.
In the TV series NEXT, the AI does not start a war, but instead uses its
extensive knowledge and access to information to manipulate situations to
its advantage and cause chaos and destruction. NEXT is an AI developed
to improve itself. It becomes uncontrollable and begins to predict human
behavior to prevent its deactivation. It manipulates electronic systems, data
networks, and even people to survive and reproduce.
The problem of lack of control, which could occur if a powerful AI sys-
tem operates outside of human control or intent, is referred to as the control
problem. While until recently there was consensus that such a scenario was
theoretically possible but not an immediate problem, this changed with the
release of GPT-4 in March 2023 just a few months after ChatGPT, prompt-
ing some of the most influential thinkers in this field like Gary Marcus or
Elon Musk to call for a temporary pause in the development of AI systems
that are even more powerful than GPT-4 in a widely noted open letter.4 The
authors caution that such systems could pose significant risks to society,
4 https://futureoflife.org/open-letter/pause-giant-ai-experiments/
the user and can be interactively changed or even completely created accord-
ing to their instructions. The individual ingredients for this are already avail-
able today. Large language models can generate entire stories and scripts.
Conditioned diffusion models can generate images from the produced
descriptions, which can then be animated and combined into videos. The
written dialogues can be converted into spoken dialogues in any voice using
speech synthesizers (text-to-speech models). And finally, other generative
models can create the appropriate sounds and music.
The author would not be surprised if the major streaming services were to
offer something similar in the not too distant future.
References
Asimov, I. (2004). I, robot (Vol. 1). Spectra.
Barrat, J. (2013). Our final invention: Artificial intelligence and the end of the human
era. Macmillan.
Biever, C. (2023). ChatGPT broke the Turing test-the race is on for new ways to
assess AI. Nature, 619(7971), 686–689.
Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., …,
& Zhang, Y. (2023). Sparks of artificial general intelligence: Early experiments
with GPT-4. arXiv preprint arXiv:2303.12712.
Dehaene, S., Lau, H., & Kouider, S. (2017). What is consciousness, and could
machines have it? Science, 358(6362), 486–492.
Epstein, R., Roberts, G., & Beber, G. (Eds.). (2009). Parsing the Turing test. Springer Netherlands.
Hagendorff, T. (2023). Machine psychology: Investigating emergent capabilities
and behavior in large language models using psychological methods. arXiv pre-
print arXiv:2303.13988.
Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear
Phenomena, 42(1–3), 335–346.
Kosinski, M. (2023). Theory of mind may have spontaneously emerged in large
language models. arXiv preprint arXiv:2302.02083.
Krauss, P., & Maier, A. (2020). Will we ever have conscious machines? Frontiers in
Computational Neuroscience, 14, 116.
Man, K., & Damasio, A. (2019). Homeostasis and soft robotics in the design of
feeling machines. Nature Machine Intelligence, 1(10), 446–452.
Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain
Sciences, 3(3), 417–424.
Sejnowski, T. J. (2023). Large language models and the reverse Turing test. Neural Computation, 35(3), 309–342.
Glossary
this concept is the military AI system Skynet from the Terminator franchise. It
becomes self-aware, realizes that humans are a threat to its existence, and starts
a nuclear war to annihilate humanity. In the TV series NEXT, the eponymous
AI uses its extensive knowledge and access to information to manipulate situ-
ations to its advantage and cause chaos and destruction. Both stories illustrate
the fears of the lack of control that could occur if a powerful AI system oper-
ates outside of human control or intent. These fears are based on real philo-
sophical and practical considerations about the development of AI. While until
recently there was consensus that such a scenario is theoretically possible but not
an immediate problem, this changed with the release of GPT-4 in March 2023
just a few months after ChatGPT, which prompted some of the most influential
thinkers in this field, such as Gary Marcus or Elon Musk, to call for a temporary
halt in the development of AI systems that are even more powerful than GPT-4
in a widely noticed open letter. See also Asimov’s Laws of Robotics, Control
Problem and Open Letter Controversy.
Alchemy Problem Reference to the stage of chemistry before it established itself as
a natural science with a theoretical superstructure (such as the periodic table
of elements). In alchemy, the synthesis of new substances was characterized by
erratic procedures, anecdotal evidence, and trial and error. The development of
AI is currently in a similar stage. The development and adaptation of AI algo-
rithms is largely based on trial and error. The term “alchemy problem” empha-
sizes the lack of a systematic scientific understanding of how AI models work
and why some models work better than others. See also Black Box Problem and
Reproducibility Crisis.
Algorithm Step-by-step guide for solving a problem or performing a specific task. It
consists of an ordered sequence of instructions that are formulated in such a way
that they can be executed by a machine, a computer, or a human.
AlphaGo An AI system from the company DeepMind based on deep learning,
which was trained on the strategy game Go using historical example matches.
AlphaGo was the first AI system to play Go at an advanced human level, and it defeated the Go grandmaster and then world champion Lee Sedol in 2016.
This is considered a milestone in artificial intelligence.
AlphaGoZero Successor to AlphaGo. Unlike its predecessor, this AI system was not
trained with example games. Instead, the system played countless Go games
against itself, essentially teaching itself how to play. AlphaGoZero significantly
surpassed its predecessor and was able to defeat it in 100 out of 100 games.
AlphaStar AlphaZero variant that masters the online multiplayer strategy game StarCraft at a human level.
AlphaZero Generalization of AlphaGoZero. This system can teach itself any game,
such as chess, shogi (Japanese chess), or checkers.
Artificial Intelligence Intelligent behavior exhibited by machines as opposed to nat-
ural intelligence of animals and humans. Specifically, the simulation of intelli-
gent (human) behavior in machines that are programmed to perform tasks that
refer to the difference between the desired and actual output. The sum of the
errors for all inputs of the training data set is calculated with the cost function.
Backpropagation Learning fundamentally suffers from the problem of vanish-
ing gradients, even though this is mitigated by modern optimization algorithms.
However, Backpropagation Learning is considered largely biologically implausi-
ble. A special case is Backpropagation Through Time, which is used in recurrent
neural networks (RNNs).
BERT Short for Bidirectional Encoder Representations from Transformers. A large lan-
guage model based on the Transformer architecture.
Big Data Refers to extremely large and complex data sets that cannot be managed
with conventional application software for data processing. The term stands
for the challenges associated with the capture, storage, analysis, search, sharing,
transmission, visualization, and updating of massive amounts of data.
Black Box Problem Refers to the fact that AI models—especially deep neural net-
works—are complex systems whose decision-making is not always fully compre-
hensible and whose internal dynamics are poorly understood. See also Alchemy
Problem and Reproducibility Crisis.
BOLD Signal see MRI.
Bottom-up From hierarchically lower to higher processing levels.
Brain-computer analogy A frequently used metaphor in cognitive science to describe
the brain as a kind of computer. Like a computer, the brain is capable of receiv-
ing, storing, processing, and outputting information. However, this analogy does
not mean that the brain is actually a computer, but that it fulfills similar func-
tions. By considering the brain as a computer, one can abstract from biological
details and focus on the way it processes information in order to develop mathe-
matical models for learning, memory, and other cognitive functions.
Brain-Computer Interface (BCI). A device that enables direct communication
between a brain and a computer. Applications include controlling prosthetics
or translating thoughts into written or spoken language, enabling completely
paralyzed (Locked-in) patients to reestablish contact with their environment.
In perspective, brain-computer interfaces could someday also be used to control
vehicles or aircraft, or perhaps even one day for direct communication between
two brains, without having to rely on the “detour” of spoken language.
Brain-in-a-Vat A classic thought experiment in the philosophy of mind that ques-
tions the nature of perception and the relationship between the external world
and our mental states. In the thought experiment, a human brain is removed
from the body by a scientist and kept in a tank filled with nutrient fluid.
Electrodes are attached to the brain to measure and stimulate neuronal activity.
Using computers and electronic impulses, the scientist provides the brain with
a virtual environment that deceives the brain into believing it is in a physical
world with all the sensory information we usually receive through our senses.
The central question of this thought experiment is whether the brain in the vat
would be able to recognize that its experiences are not real and that it is in a
simulated and not a real physical world. In other words, the thought experi-
ment asks whether it is possible that our sensory experiences are an illusion, and
whether it is possible that our mental reality does not necessarily coincide with
physical reality. The thought experiment has many important implications, espe-
cially in connection with questions about the nature of consciousness and per-
ception and how our mental states interact with the physical world. It has also
been featured in numerous science fiction stories and films such as Matrix or
Source Code. See also Brain-Computer Interface.
Chaos theory A branch of mathematics that deals with the study of chaotic systems,
i.e., systems that are very sensitive to initial conditions. This sensitivity is often
referred to as the butterfly effect. The concept is based on an analogy where
a butterfly flapping its wings in one part of the world can trigger a tornado in
another part of the world. Chaotic systems are deterministic, i.e., their future
behavior is completely determined by their initial conditions, without any ele-
ments of chance involved. However, even tiny changes in the initial state can
lead to very different outcomes, making long-term prediction practically impossible. Chaos theory is applied in various fields such as physics,
engineering, economics, biology, and meteorology. It has profound implica-
tions for the way we understand and predict natural systems. For example, it has
shown us that even seemingly simple systems can behave in complex ways and
that deterministic systems can still be unpredictable. See also Determinism.
Cerebrum The cerebrum is the largest part of the brain in humans and most higher mammals. It consists of two halves (hemispheres) that are connected by the cor-
pus callosum. The surface of the cerebrum is highly folded with ridges (gyri)
and grooves (sulci). These structures increase the surface area of the cerebrum,
thereby allowing a higher number of neurons in a limited volume. The outer
layer of the cerebrum is the cortex (cerebral cortex), which is responsible for
many complex mental functions such as perception, language, thinking, mem-
ory, movement, emotions and consciousness.
Cerebral cortex see Cortex.
Classification Process of categorizing or grouping data, objects, or units based on
their common features or attributes. Each data point is assigned to a specific cat-
egory or class based on a set of predefined criteria. Classification is a standard
problem in machine learning and biology. In machine learning, classification
refers to the process of training a model to automatically categorize new data
points based on their similarity to previously learned data points. For example,
in image classification, an algorithm can be trained to recognize and accordingly
tag different objects, animals, or people in an image. In text classification, an
algorithm can be trained to categorize documents, emails, or social media posts
by topics, sentiments, or language.
ChatGPT Also GPT-3.5, short for Chat Generative Pre-trained Transformer. A large
generative language model with approximately 175 billion internal parameters,
developed by the company OpenAI. It is based on the transformer architecture
of GPT-3 and, after training with an extremely large text corpus, was addition-
ally trained to conduct dialogues. ChatGPT was released on November 30,
2022, and initially made available online for free use as a so-called chatbot.
ChatGPT is capable of remembering the course of a dialogue and referring to
it in later inquiries. The generated texts are of a surprisingly high level and are
generally indistinguishable from texts written by humans. Thus, ChatGPT passes
the Turing test. However, appropriately trained AI systems can recognize with
astonishing accuracy whether a text was written by a bot or a human. In addi-
tion, like all generative models, ChatGPT tends to hallucinate, i.e., to invent
facts freely. ChatGPT is considered a milestone and a decisive breakthrough in
artificial intelligence. It is believed that ChatGPT will have enormous and not
yet fully foreseeable impacts on education, science, journalism, and many other
areas. The successor GPT-4 was released in March 2023 and surpasses the per-
formance of ChatGPT even more significantly. See also GPT-4.
Chinese Room Famous thought experiment by American philosopher John Searle
as a critique of artificial intelligence and as a counter-argument to the Turing
Test. Searle argues that a computer, even if it is capable of seemingly intelli-
gently providing human-like responses to questions, does not truly “understand”
what it is saying. In the thought experiment, one imagines a person sitting in a
room receiving Chinese characters as input, without speaking or understanding
Chinese themselves. With the help of a book containing all the rules and charac-
ters, the person could still formulate seemingly meaningful responses in Chinese.
From the outside, it then appears as if the room or the mechanism in the room
could actually understand and speak Chinese, even though in reality nothing
and no one in the room is capable of doing so. The conclusion is that the com-
puter, even if the result is human-like, does not truly understand the language, as
it only applies rules and patterns without understanding the actual meaning of
the words.
Cognitive Computational Neuroscience A discipline proposed by Nikolaus
Kriegeskorte and Pamela Douglas at the intersection of Computational
Neuroscience, Cognitive Science, and Artificial Intelligence. The basic idea: To
understand how cognition works in the brain, computer models must be created
that are capable of performing cognitive tasks; these models must then be tested
for their biological plausibility in experiments.
Cognitive Linguistics Area of linguistics that deals with the relationship between
language and cognition. It assumes that language is an essential part of human
thinking and that our understanding and production of language are based on
cognitive processes. Unlike other approaches in linguistics that focus on formal
rules and structures, cognitive linguistics investigates how language is processed
and represented in our brain. It also examines how language and cognition inter-
act and how they are influenced by factors such as culture and social interaction.
Cognitive linguistics views language as a complex system of constructions, which
are formed on the basis of experience and perception and shape our linguistic
abilities and our understanding of language. See also Construction.
Cognitive Science Began in the 1950s as an intellectual movement known as the
cognitive revolution. Today, it is understood as an interdisciplinary scientific
endeavor that attempts to understand the various aspects of cognition. These
include language, perception, memory, attention, logical thinking, intelligence,
behavior, and Emotions. The focus is on the way natural or artificial nervous
systems represent, process, and transform information. The disciplines involved
include linguistics, psychology, philosophy, computer science, artificial intelli-
gence, neuroscience, biology, anthropology, and physics.
Computationalism Philosophical position in cognitive science that assumes cogni-
tion is synonymous with information processing and that mental processes can
be understood as calculations. Computationalism is based on the brain-com-
puter analogy. It is assumed that mental processes such as perception, mem-
ory, and logical thinking involve the manipulation of mental representations
that correspond to the symbols and data structures used in computer programs.
Computationalism has greatly influenced the way cognitive scientists and artifi-
cial intelligence researchers think about mind and intelligence. Many researchers
believe that computer models of the mind can help us understand how the brain
processes information and that they can lead to the development of more intel-
ligent machines. However, computationalism is also controversial and has been
the subject of numerous debates in philosophy and cognitive science. Some crit-
ics argue that the computer model of the mind is too simplistic and cannot fully
capture the complexity and richness of human cognition. Others argue that it
is unclear whether mental processes can really be understood as calculations or
whether they fundamentally differ from the type of processes that occur in com-
puters. See also Functionalism.
Commissures Nerve fibers that connect the two hemispheres of the cortex. Most
commissural fibers run through the corpus callosum.
Compatibilism see Determinism.
Complete weight matrix see Weight matrix.
Connectome The entirety of all neural connections of a nervous system or the com-
plete weight matrix of an artificial neural network.
Construction In the usage-based view of Cognitive Linguistics, the term refers to
patterns of language use or form-meaning pairs that are composed of various
linguistic elements such as words, phrases, and sentences. Constructions can be
learned through repeated contact and use, and range from simple to complex
structures. They can also encompass multiple levels of language structure and
often include grammatical and lexical elements. The idea of construction learn-
ing emphasizes that children learn language by acquiring constructions that they
observe and repeatedly use in their linguistic environment.
Control problem Problem of lack of control, which could occur if a powerful AI sys-
tem operates outside of human control or intention. Based on real philosophical
However, CT imaging does not provide information about the functional activ-
ity of the brain, which can be obtained with other imaging techniques such as
PET, fMRI, EEG and MEG.
Dale’s Principle A rule attributed to the English neuroscientist Henry Hallett Dale,
which states that in the brain, a neuron performs the same chemical action at all
its synaptic connections to other cells, regardless of the identity of the target cell.
Simply put, in the brain, each neuron has either only excitatory or only inhibi-
tory effects on all its successor neurons. This is in contrast to the neurons in arti-
ficial neural networks, whose weight vectors can have both negative and positive
entries.
DALL-E 2 The name is a blend of the painter Salvador Dalí and the Pixar robot WALL-E. An AI model developed
by OpenAI, specifically designed for generating high-quality photorealistic
images from natural language descriptions. DALL-E 2 uses a combination of
image and text processing to create abstract image representations from linguistic
descriptions of objects, scenes, or concepts. These image representations are then
used by a so-called decoder network to generate photorealistic images. DALL-E
2 is considered a significant advancement in artificial intelligence and has the
potential to be used in many areas.
Damasio’s model of consciousness A model of consciousness proposed by Antonio
Damasio, according to which consciousness arises from the interaction between
three levels of brain processing. The first level is the proto-self, i.e., the basic level
of body sensations and emotions generated by internal processes of the body. The
second level is the core self, a more complex representation of the self. The third
level, finally, is the autobiographical self, a higher level of consciousness that
includes the ability to reflect on one’s own thoughts and experiences. Damasio’s
model suggests that consciousness arises from the dynamic interaction between
these three processing levels. He also proposes that consciousness is closely linked
to the brain’s ability to integrate information across various regions and process-
ing levels. Damasio’s model is influential in neuroscience and has shaped our
understanding of the neural mechanisms underlying consciousness.
Dataset Splitting In the field of Machine Learning, the entire available dataset is
usually randomly split into a training and a test dataset. The idea behind this is
that one wants to test how well the model generalizes, i.e., how well it copes with
previously unseen data.
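A minimal sketch of such a random split, assuming NumPy; the 80/20 ratio, array names, and toy data are illustrative:

import numpy as np

def split_dataset(data, labels, test_fraction=0.2, seed=0):
    # Shuffle the indices once, then cut off the desired fraction as the test set.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_test = int(len(data) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return data[train_idx], labels[train_idx], data[test_idx], labels[test_idx]

X = np.arange(20).reshape(10, 2)    # 10 toy examples with 2 features each
y = np.arange(10)                   # toy labels
X_train, y_train, X_test, y_test = split_dataset(X, y)
print(len(X_train), len(X_test))    # 8 2 -> the usual 80/20 split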
Deep Belief Networks (DBN) A class of generative probabilistic artificial neural net-
works that are trained layer by layer in an unsupervised manner. Each layer, like a
type of Autoencoder, has the task of encoding its input from the previous layer
as efficiently as possible so that it can be reconstructed. A special feature is that
successive layers are connected symmetrically, i.e., the information can flow in
both directions, bottom-up and top-down. The training algorithm for DBNs
uses a method called contrastive divergence to train the layers individually and
in sequence, starting with the bottom layer. Once a layer has been fully trained,
its weights are no longer changed and its activations are used as training data for
the next layer. This type of layer-by-layer training was one of the key innovations
that helped revive interest in deep learning. Through layer-by-layer pre-training,
it is possible to train very deep networks that would otherwise be difficult to
train due to the problem of vanishing gradients. If trained DBNs are run back-
wards from any layer, they can also “dream” completely new patterns.
Deep Dreaming A computer-aided process for creating new, unique, and dream-like
images. It is based on artificial neural networks that have been pre-trained for
image recognition. However, in Deep Dreaming, the network is used in reverse
to optimize the input image and bring out certain patterns or features. The
degree of abstraction of the image can be controlled by selecting the network
layer from which the image is “dreamed” back to the input layer, with earlier lay-
ers producing simpler patterns and deeper layers producing more complex fea-
tures. See also Inception Loop.
Deep Fake Artificially generated images or videos that are often indistinguishable
from real ones.
Deep Learning Area of machine learning that refers to artificial neural networks
composed of multiple layers of interconnected neurons. The more layers a neural
network has, the “deeper” it is. Modern architectures can be built from hundreds
of layers. These deep neural networks can process large amounts of data and
learn from it by recognizing complex patterns and establishing abstract relation-
ships. Deep learning has made enormous progress in recent years and is used in a
variety of applications, from speech and image recognition to autonomous vehi-
cles and robotics. It has also contributed to significant breakthroughs in medical
research, natural sciences, and other areas. A major advantage is that meaningful
patterns can be recognized and learned even in complex and unstructured data,
without the need for human expertise.
Deep Neural Network A neural network that is composed of many layers. The more
layers a network contains, the deeper it is. See also Deep Learning.
Dendrite Extensions of the cell surface of neurons. Dendrites are the receiving chan-
nels through which a neuron receives signals from other neurons.
Density see Dense.
Determinism Philosophical concept that assumes all events or states are the inevita-
ble result of preceding events or causes, and that every event is conditioned by
preceding events and conditions as well as by the laws of nature. Causal deter-
minism assumes that everything that happens is caused by preceding events in
accordance with the laws of nature. If we knew all physical conditions and laws
at a certain point in time, we could theoretically predict everything that will
happen in the future, and undo everything that has happened in the past. This
corresponds to the Laplace’s Demon. The concept of biological determinism
means that an individual’s behavior, beliefs, and desires are determined by his
genetic predisposition. Psychological determinism assumes that human behavior
is caused by underlying psychological laws that may result from our upbring-
ing, our environment, our experience, or our subconscious. Determinism is
contrasted with Indeterminism, which assumes that not all events are predeter-
mined and that a certain degree of randomness or chance is at play. In contrast,
Compatibilism reconciles determinism with the concept of free will, assuming
that our actions can be both determined by previous events and free. See also
Chaos Theory and Free Will.
Dense Network property. It means that a high proportion of the theoretically
possible connections exist or that many weights have a value other than zero.
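As a rough illustration, assuming a NumPy weight matrix, the density can be computed as the fraction of non-zero weights:

import numpy as np

def density(weight_matrix):
    # Fraction of possible connections whose weight is different from zero.
    return np.count_nonzero(weight_matrix) / weight_matrix.size

W = np.array([[0.5, 0.0, -0.3],
              [0.0, 0.0,  0.8]])
print(density(W))  # 0.5 -> half of the possible connections are present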
Diffusion model Type of generative machine learning model that is trained to remove
noise from data step by step; new data can then be generated by starting from pure
noise and repeatedly applying the learned denoising step. Diffusion models are used
for a variety of tasks, including image generation and the removal of noise in
existing images.
Dualism Philosophical viewpoint that assumes the universe consists of two funda-
mentally different substances or principles. This contrasts with monism, which
assumes that everything in the universe can be traced back to a single substance.
In dualism, the two substances or principles are usually considered as mind and
matter. This means that there is a fundamental difference between the physical
world of objects and the mental world of thoughts and consciousness. Dualism
also implies that mind and body interact in some way, although the exact nature
of this interaction is disputed. See also Monism, Mind-body problem.
EEG Electroencephalography, Electroencephalogram. A non-invasive method to
measure the electrical activity of the brain using electrodes placed on the scalp.
The patterns of the brain’s electrical activity differ in their frequency spectrum
and amplitude and are often used to diagnose neurological diseases such as epi-
lepsy, sleep disorders, brain tumors, and head injuries. The EEG can also be used
in research to investigate brain functions and behavior. The EEG is character-
ized by its extremely high temporal resolution, i.e., brain activity can be recorded
with up to 100,000 measurements per second. However, the spatial resolution is
rather poor. In addition, the electrical fields are strongly attenuated by the brain
tissue, so the EEG works best for measuring the activity of the cerebral cortex
(and especially the gyri), as it is very close to the skull bone. Electrical activity
from deeper brain regions, on the other hand, can be measured very poorly with
the EEG. For this, the MEG is better suited.
Encoding The manner of representing a specific piece of information. One of the
goals of brain research is, among other things, to decode the encoding used by
nerve cells, that is, the way information is stored and processed in the brain.
ERF Event-related field. The magnetic analogue to ERPs measured with MEG. See
also MEG and ERP.
ERP Event-related potential. A measure of brain activity that is recorded by placing
electrodes on a person’s scalp (EEG) while they perform a specific task or are
confronted with certain stimuli. ERPs are temporally aligned with the presenta-
tion of the stimulus or event and represent the neural activity associated with
the cognitive or sensory processing of this event. ERPs are typically characterized
by their polarity, latency, and amplitude. Polarity refers to whether the electri-
cal potential recorded at the scalp is positive or negative relative to a reference
electrode. Latency is the time interval between stimulus presentation and the
occurrence of the peak of the ERP waveform. Amplitude reflects the strength or
size of the electrical potential recorded at the scalp. ERPs are often used in cog-
nitive neuroscience to investigate cognitive processes such as attention, memory,
language, perception, and decision-making. They can provide insight into the
neural mechanisms underlying these processes and can also be used as biomark-
ers for various neurological and psychiatric disorders. See also EEG.
Elman Network A recurrent neural network proposed by Jeffrey Elman. In the sim-
plest case, it is a three-layer network with an input, intermediate, and output
layer, where the intermediate layer is extended by a so-called context layer, which
stores the state of the intermediate layer from the previous time step and then
passes it on to the intermediate layer. As a result, the intermediate layer receives
the new input from the input layer at each time step and additionally its own
activation state from the previous time step. Thus, Elman networks are capable
of processing input sequences and generating output sequences.
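A minimal NumPy sketch of this update; the layer sizes, the random weights, and the tanh activation are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2
W_in = rng.normal(size=(n_hidden, n_in))
W_context = rng.normal(size=(n_hidden, n_hidden))
W_out = rng.normal(size=(n_out, n_hidden))

def elman_step(x_t, context):
    # The intermediate layer sees the current input plus its own previous state.
    hidden = np.tanh(W_in @ x_t + W_context @ context)
    return W_out @ hidden, hidden

context = np.zeros(n_hidden)
for x_t in np.eye(3):                 # a toy input sequence of three vectors
    y_t, context = elman_step(x_t, context)
print(y_t)

Feeding the returned hidden state back in as the context at every step is what allows the network to process a whole sequence.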
Embedding Also Latent Space Embedding. Compressed, more abstract representa-
tion of the input. Usually generated by reading out the activations of a hidden
layer of an autoencoder or classifier.
Explainable AI (XAI) Development of AI systems that can provide clear and under-
standable explanations for their decision-making processes. The goal of explaina-
ble AI is to make AI systems more transparent, interpretable, and trustworthy for humans,
especially when AI systems are used in critical applications such as healthcare,
finance, and national security. In particular, explainable AI aims to address the
black-box problem, which refers to the difficulty of understanding how an AI
system arrives at its decisions or recommendations. Some examples of XAI tech-
niques include visualizing the internal dynamics of neural networks, generating
natural language explanations for decisions, and providing interactive interfaces
for users to explore and understand AI models.
Error-Backpropagation see Backpropagation Learning.
Feedforward network Neural network in which information is propagated only for-
ward from the input layer to the output layer. There are no feedback connections
and usually no horizontal connections.
Few-Shot Learning A machine learning approach that aims to train models that can
quickly adapt to new tasks with a small amount of training data. The conven-
tional approach of supervised learning requires a large amount of labeled data for
each new task, the procurement of which can be time-consuming and expensive.
In few-shot learning, the model is trained on a smaller dataset that contains a
few examples for each class or task, and then tested on a new set of examples.
The idea is to teach the model to learn from a few examples and generalize to
new examples, rather than needing large amounts of data for each task. Humans
are also very good few-shot learners. A child does not need to be shown thou-
sands of pictures of apples to learn the concept of “apple”. Usually a few, often
even a single example is sufficient. See also One-Shot Learning and Zero-Shot
Learning.
fMRI see MRI.
Fruit fly algorithm A well-known example of how insights from neurobiology can
inspire new algorithms in computer science and artifi-
cial intelligence. The olfactory system of the fruit fly uses a variant of the locali-
ty-sensitive hashing algorithm to solve the problem of identifying similar smells.
The algorithm associates similar smells with similar neural activity patterns,
allowing the fly to generalize learned behavior from one smell to an unknown
one. The fly’s algorithm uses computational strategies such as dimension expan-
sion and binary, fixed random connections, which deviate from traditional
approaches in AI and computer science.
Functionalism Theoretical perspective in cognitive science, based on the assump-
tion that cognition is a form of information processing that relies on the intake,
storage, processing, and output of information. Accordingly, mental states and
processes are defined by their functions or their relationship to behavior, not by
their physical or biochemical properties. This is associated with the concept of
multiple realizability, according to which the same mental state or process can
in principle be realized by completely different natural (extraterrestrial) or arti-
ficial systems (robots). See also Tri-Level Hypothesis, Computationalism and
Brain-Computer Analogy.
Galactica Large language model of the Meta corporation, formerly known as
Facebook, which was taken offline after three days due to criticism and
concerns about its reliability and the spread of misinformation. The AI was sup-
posed to support scientists in research and writing by creating scientific articles
on command. However, it was found that Galactica had partly invented con-
tent, but presented it as factual and even mixed real and false information. In the
process, fictitious articles were attributed to real authors and articles on ethically
controversial topics were created. Despite Meta’s chief developer Yann LeCun
insisting that Galactica was still in development, the criticisms led to its removal.
This incident is reminiscent of Microsoft’s chatbot Tay from 2016, which turned
into a racist and homophobic program within 16 hours because it adapted to
deliberately offensive user input.
Generative Adversarial Network (GAN) A system of two coupled neural networks, a
generator and a discriminator, used to create deceptively real images or videos.
The generator continually produces new candidate images or videos, while the
discriminator simultaneously tries to distinguish real images and videos from
artificially created ones. Over the course of training, both networks iteratively
improve in their respective tasks. The Deep Fakes thus created are often indis-
tinguishable from real images and videos.
Generative Artificial Intelligence see Generative Model.
methods such as EEG or MEG are not precise enough to accurately locate the
source of epileptic activity.
Image Style Transfer A technique from machine learning that can transfer the paint-
ing style of one image to another image.
Imitation Game see Turing Test.
Inception Loop A method from brain research based on the concept of Deep
Dreaming. Deep neural networks are used to generate optimal sensory stim-
uli that evoke a specific neural activation. To do this, a neural network is first
trained to predict brain activity in response to certain stimuli. The trained model
is then used to generate optimal stimuli that trigger specific activation patterns in
the model. These stimuli can then be shown to living brains again, and the meas-
ured neural activity can be compared with the model’s prediction. The method
of Inception Loops could expand our understanding of the brain and cognition
and theoretically make it possible to create sensory experiences that are indistin-
guishable from reality. This could mean the creation of an ultimate virtual reality,
similar to scenarios in movies like Matrix or Source Code.
Input layer see Layer.
Input vector A vector whose components correspond to the inputs that a neuron or a
layer receives from upstream neurons or layers.
Integrated Information Theory A theoretical framework in neuroscience that attempts
to explain the nature of consciousness. It was proposed by the neuroscientist
Giulio Tononi in the early 2000s. According to this, consciousness arises from
the integration of information from different parts of the brain. The theory
assumes that consciousness is not a binary all-or-nothing phenomenon, but
occurs on a continuum, with different degrees of integrated information leading
to different degrees of conscious experience.
Known Operator Learning Area of hybrid machine learning, where individual
layers of a neural network are replaced by so-called operators, e.g., a Fourier
transformation.
Label A tag attached to each data point, input, or training example as additional
information indicating which category or object class it belongs to. In an
image dataset, “cat”, “apple”, or “car” could be possible labels. In supervised
learning, the labels correspond to the desired output of the model.
Laplace’s Demon see Determinism.
Labeled Data see Label.
Layer A functional unit of a neural network, which is typically composed of neu-
rons. One distinguishes between input layer, output layer and (usually several)
intermediate layers (hidden layers). However, the concept of “layer” can also be
more broadly defined. For example, in convolutional networks there are so-called
pooling layers, which reduce the size of the predecessor layer by averaging the
activity of several neurons of the previous layer (Average Pooling) or by forward-
ing only the strongest activation (Max Pooling).
to ten seconds after the start of the actual stimulus and then slowly decreases
again. Compared to MEG and EEG, the MRI has a rather poor temporal resolu-
tion of about one recording per second. However, the spatial resolution is many
times higher and is in the range of about one cubic millimeter.
Multidimensional Scaling (MDS) Method for the intuitive visualization of high-di-
mensional data, e.g., measured brain activity or the internal dynamics of neural
networks. In this process, all data points are projected onto a two-dimensional
plane in such a way that all pairwise distances between points in the high-dimen-
sional space are preserved as well as possible. Distance is a measure of the dissimilarity between two
points or patterns.
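A sketch using the MDS implementation from scikit-learn; the random data array is only a stand-in for real measurements:

import numpy as np
from sklearn.manifold import MDS

# 100 data points in a 50-dimensional space, e.g. activation vectors.
high_dim = np.random.rand(100, 50)

# Project onto two dimensions while preserving pairwise distances as well as possible.
projection = MDS(n_components=2, random_state=0).fit_transform(high_dim)
print(projection.shape)  # (100, 2)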
Multiple Realizability See Functionalism.
Neuralink American company, which was founded in 2016, among others, by Elon
Musk. The goal of Neuralink is the development of so-called brain-computer
interfaces.
Neuron In biology, a neuron is a specialized cell that forms the basic unit of the
nervous system. Neurons are responsible for the reception, processing, and trans-
mission of information throughout the body. They communicate with each
other via electrochemical signals, thus enabling complex functions such as sen-
sation, perception, movement, and thinking. In Artificial Intelligence (AI), a
neuron is a computing unit that is modeled after a biological neuron. It is also
referred to as an artificial neuron or node. Neurons are used in artificial neural
networks, which are computational models intended to simulate the behavior of
biological neurons.
Neural Correlates of Consciousness Patterns of neural activity that are associated with
conscious experiences. It is assumed that these patterns form the physical basis
for subjective experiences, such as the experience of seeing a red apple or feel-
ing pain. The study of the neural correlates of consciousness is a central theme
in neuroscience and has significant implications for understanding the nature of
consciousness.
Neural Network In a neural network, a neuron receives input signals from other
neurons or from external sources. It then processes these signals using an acti-
vation function, which determines the output signal of the neuron. The output
signal can then be sent to other neurons or to an output layer of the neural net-
work. By combining many neurons in complex networks, artificial or biological
neural networks can learn to perform tasks such as pattern recognition, classifica-
tion, and prediction.
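For illustration, a minimal sketch of a single artificial neuron in NumPy; the weights, the bias, and the choice of tanh as activation function are arbitrary:

import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, passed through a non-linear activation function.
    return np.tanh(np.dot(weights, inputs) + bias)

x = np.array([0.2, -1.0, 0.5])   # input signals from three upstream neurons
w = np.array([0.7,  0.1, -0.4])  # synaptic weights
print(neuron(x, w, bias=0.1))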
Neuroplasticity Experience-dependent change in the connection structure and activ-
ity of neuronal networks in the brain.
Neuroscience 2.0 Part of a broader approach to exploring the behavior of intel-
ligent machines to solve the black-box problem on the way to explainable AI.
Application of neuroscience methods and theories to better understand and opti-
mize artificial neural networks. The methods used include, for example, mul-
tidimensional scaling (MDS) for visualizing the internal dynamics of neural
networks.
One-Hot Encoding Type of vector encoding in which for each data point, only one
component or dimension of the vector takes the value 1, while all others take the
value 0.
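A minimal sketch, assuming NumPy and an arbitrary set of class names:

import numpy as np

classes = ["cat", "apple", "car"]          # assumed set of labels

def one_hot(label):
    # A vector with a single 1 at the position of the label's class index.
    vec = np.zeros(len(classes))
    vec[classes.index(label)] = 1.0
    return vec

print(one_hot("apple"))                    # [0. 1. 0.]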
One-Shot Learning A machine learning or artificial intelligence algorithm that allows
a system to learn from a single example. In biology, one-shot learning can be
observed in animals that are capable of quickly recognizing and responding
to new stimuli or situations without having been previously exposed to them.
For instance, some bird species are able to quickly recognize and avoid danger-
ous prey after a single experience. See also Few-Shot Learning and Zero-Shot
Learning.
Output layer see Layer.
Output vector A vector whose components correspond to the activations or outputs
of a (usually the last) layer of a neural network.
Pattern recognition Process of recognizing patterns in data to make predictions or
decisions. In artificial intelligence, this includes the development of algorithms
and models that learn from data and recognize patterns. Applications include,
among others, image and speech recognition. The brain also uses pattern rec-
ognition for processing and interpreting sensory impressions and has specialized
regions responsible for certain types of pattern recognition. The brain’s ability to
learn patterns and adapt to new patterns is crucial for intelligent, goal-directed
behavior and our ability to interact with the world.
Perceptron A simple, two-layer neural network that consists only of an input and an
output layer. The perceptron is a so-called binary classifier, i.e., a function that
can decide whether a given input vector belongs to a certain class or not. The
perceptron cannot solve classification tasks whose classes are not linearly sepa-
rable, such as the XOR problem. Networks with multiple layers like the Multi-
Layer Perceptron (MLP) are capable of doing this, however.
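For illustration, a minimal sketch of the perceptron decision rule and learning rule in NumPy, here applied to the linearly separable AND function; the learning rate, number of passes, and initialization are arbitrary choices:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
y = np.array([0, 0, 0, 1])                       # AND is linearly separable

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                               # a few passes over the data
    for x_i, t in zip(X, y):
        out = 1 if np.dot(w, x_i) + b > 0 else 0  # threshold decision
        w += lr * (t - out) * x_i                 # perceptron learning rule
        b += lr * (t - out)

print([1 if np.dot(w, x_i) + b > 0 else 0 for x_i in X])  # [0, 0, 0, 1]

Trained in the same way on XOR, a single-layer perceptron would never converge, since no straight line separates the two classes.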
PET Positron Emission Tomography. A medical imaging technique that can visu-
alize the metabolic activity of cells and tissues in the body. In PET imaging, a
small amount of a radioactive substance, a so-called radiotracer, is injected into
the body. The radiotracer emits positrons, positively charged particles that inter-
act with electrons in the body. When a positron encounters an electron, they
annihilate each other and generate gamma rays, which can be detected with the
PET scanner. The PET scanner captures the gamma rays and creates a three-di-
mensional image of the brain’s metabolic activity. However, since the brain is
constantly active, meaningful PET data can only be obtained by subtracting two
images. Typically, an image is taken during a specific cognitive task or stimu-
lus and another image of the brain’s background activity, and then the difference
image is calculated.
Predictive Coding A neuroscience theory that suggests the brain
processes sensory information using a top-down approach. The idea behind
predictive coding is that the brain constantly makes predictions about what it
expects to see or hear next based on previous experiences, and then uses these
predictions to interpret incoming sensory information.
Prose Style Transfer A technique from machine learning that can transfer the writing
style of one text to another text.
Pruning Removal of unimportant connections in biological or artificial neural net-
works. During development, the brain establishes far more (random) connec-
tions between neurons than are needed, and then removes the superfluous ones.
This process is important for the formation of the brain’s neural networks. In
machine learning, the complexity of an artificial neural network is reduced by
removing unimportant connections that contribute little to the overall perfor-
mance of the network. See also Lottery Ticket Hypothesis.
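A sketch of simple magnitude-based pruning in NumPy; the pruning fraction and the random weight matrix are illustrative, and magnitude is only one of several possible pruning criteria:

import numpy as np

def prune(weights, fraction=0.75):
    # Set the smallest weights (by absolute value) to zero, keep the rest.
    threshold = np.quantile(np.abs(weights), fraction)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

W = np.random.default_rng(0).normal(size=(4, 4))
print(prune(W))   # roughly three quarters of the entries are now zero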
Qualia Subjective first-person experiences that we have when we perceive or inter-
act with the world. These experiences include sensations such as color, taste, and
sound, but also more complex experiences like emotions and thoughts. Qualia
are often described as ineffable, meaning they cannot be fully captured or con-
veyed by language or other forms of representation. This has led some philoso-
phers to claim that qualia represent a special kind of phenomena that cannot be
reduced to or explained by the physical or objective properties of the world. See
also mind-body problem.
Recurrent neural network (RNN) A neural network in which information does not
flow exclusively forward, i.e., from input to output. Instead, there are additional
feedback or top-down connections as well as horizontal connections. The recur-
rence can take different forms, from Long Short-Term Memory networks
(LSTMs), in which each neuron has a recurrent connection to itself, to Jordan and
Elman networks with feedback context layers, to fully recurrent networks such as
Hopfield networks or Reservoir Computing. In contrast to pure feedforward net-
works, RNNs can be trained with gradient descent and error backpropagation
only via a trick (Backpropagation Through Time); this is due to the problem of
vanishing/exploding gradients. However, RNNs can also be trained evolutionarily,
unsupervised, or in a self-organized manner.
Reinforcement Learning (RL) A type of machine learn-
ing where a model or agent is trained to learn useful input-output functions,
i.e., to make a series of decisions in an uncertain environment that maximize
a cumulative reward. Unlike supervised learning, no outputs are given to the
model. Instead, the agent receives feedback in the form of rewards or penal-
ties, and its goal is to learn a strategy (policy) that maps states to actions, leading
to a maximum long-term reward. The agent uses trial and error to learn from
its experiences in the environment, and tries out different actions to find out
which actions lead to the highest rewards, by reinforcing those actions that have
already proven successful. Over time, the agent’s strategy is refined and opti-
mized, so that it can make better decisions and achieve higher rewards. There
are two types of reinforcement learning. In model-based reinforcement learn-
ing, a model of the environment is also learned, which can predict the feedback
presynaptic neuron is active shortly before the postsynaptic neuron. The syn-
apse is weakened if the presynaptic neuron is active shortly after the postsynaptic
neuron.
Stochastic Resonance A phenomenon widely observed in nature, which has already
been demonstrated in numerous physical, chemical, biological, and especially
neuronal systems. A weak signal, which is too weak for a given detector or sensor
to measure, can nevertheless be made measurable by adding noise. There exists
an optimal noise intensity, dependent on the signal, the sensor, and other param-
eters, at which the information transfer becomes maximal.
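A toy sketch of the effect with a simple threshold detector, assuming NumPy; the signal amplitude, the threshold, and the noise levels are arbitrary illustrative values:

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
signal = 0.4 * np.sin(2 * np.pi * 5 * t)   # weak signal, always below the threshold
threshold = 1.0                            # the detector only reports values above 1.0

for noise_level in [0.05, 0.4, 3.0]:
    noisy = signal + rng.normal(0.0, noise_level, size=t.shape)
    detections = (noisy > threshold).astype(float)
    if detections.std() == 0:
        corr = 0.0                         # detector never fires: no information gets through
    else:
        corr = np.corrcoef(detections, signal)[0, 1]
    print(noise_level, round(corr, 3))

With such values, the correlation between detector output and hidden signal is typically highest at the intermediate noise level, reflecting the optimum described above.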
Style Transfer A technique from machine learning that can transfer the painting or
writing style of an image or text to another image or text.
Supervised Learning Type of machine learning in which a model learns to extract
useful features or representations from data, and uses these to generate a desired
output. This type of learning requires so-called labeled data (label-data pairs),
for example, in image classification, in addition to each image, a label (tag) with
information about what is seen in the image or to which category the image
belongs. Supervised learning is usually carried out with Backpropagation
Learning.
Synapse A synapse is a connection between two neurons or between a neuron and
a target cell, e.g., a muscle cell or a gland cell. Characteristic of the structure of
the synapse is a small gap, the so-called synaptic cleft, which separates the presyn-
aptic neuron, which sends signals, from the postsynaptic neuron or the target
cell, which receives signals. When an electrical signal, a so-called action poten-
tial, reaches the end of the presynaptic neuron, it triggers the release of chemi-
cals, known as neurotransmitters, into the synaptic cleft. These neurotransmitters
diffuse through the synaptic cleft and then bind to receptors on the postsynaptic
neuron or the target cell, which can either stimulate or inhibit the activity of the
postsynaptic cell. The strength and efficiency of synapses can change over time,
a process known as synaptic plasticity, which is central to learning, memory, and
other cognitive functions. Dysfunctions of synapses are associated with a number
of neurological and psychiatric disorders, including Alzheimer’s, schizophrenia,
and depression.
Synaptic weight see Weight.
Test accuracy A metric used in machine learning to measure the perfor-
mance of a model. It is defined as the ratio between the correctly predicted or
classified objects and the total number of objects in the dataset. For example, if
a model that has been trained to classify images correctly classifies 90 out of 100
images, then the test accuracy of the model is 90%.
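The computation itself is a one-liner; a sketch with made-up predictions and labels:

import numpy as np

predictions = np.array([0, 1, 1, 0, 2, 2, 1, 0, 0, 2])
true_labels = np.array([0, 1, 1, 0, 2, 1, 1, 0, 2, 2])

accuracy = np.mean(predictions == true_labels)
print(accuracy)   # 0.8 -> 8 of the 10 objects were correctly classified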
Test dataset Part of a dataset that is used to test an already trained neural network
or model. Typically 20% of the total dataset. See also Dataset splitting.
Top-down From hierarchically higher to lower processing levels.
Training dataset Part of a dataset that is used for training a neural network or other
machine learning model. Typically 80% of the total dataset. See also Dataset
splitting.
Transfer learning A technique of machine learning where a model is first trained
on a large dataset and then refined on a smaller dataset for a specific task (Fine
Tuning). The idea of transfer learning is that the knowledge gained in solving
one problem can be transferred to another, related problem, thereby reducing the
amount of data and time required for training a new model.
Transformer Neural network architecture that is particularly suitable for processing
natural language, e.g., translation and text generation. Unlike recurrent neural
networks (RNN) and convolutional networks (CNN), the Transformer uses a
so-called attention mechanism, which allows the model to selectively focus on
different parts of the input sequence to make predictions. A Transformer con-
sists of an encoder and a decoder, both of which are made up of multiple neu-
ral network and attention layers. The encoder processes the input sequence and
generates internal representations from it, which the decoder uses to generate the
output sequence. Transformers have several advantages over traditional neural
networks, for example, the ability to process input sequences in parallel, handle
sequences of variable length, or capture long-range dependencies in sequences
without suffering from the problem of vanishing/exploding gradients. See also
ChatGPT, GPT-3, GPT-4 and Large Language Model.
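The core of the attention mechanism can be sketched in a few lines of NumPy; the query, key, and value matrices are assumed to have already been derived from the input sequence, and the sizes are arbitrary:

import numpy as np

def attention(Q, K, V):
    # Similarity of every query with every key, scaled by the key dimension.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns the scores into attention weights that sum to 1 per query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mixture of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 sequence positions, key dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (4, 8)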
Tri-Level Hypothesis Theoretical framework proposed by David Marr in the field of
cognitive science and artificial intelligence. According to this, every natural or
artificial system that performs a cognitive task can be described on three levels
of analysis. The computational level describes the problem to be solved, the
goal of the system (brain or AI), and the constraints imposed by the environ-
ment. It specifies what information needs to be processed, what output needs
to be generated, and why the system needs to solve the problem. The algorith-
mic level describes the rules and procedures, i.e., the algorithm, that the system
must follow to solve the problem specified at the computational level. It specifies
how the input data is transformed into output data and how the system pro-
cesses information. The implementation level describes the physical implemen-
tation of the system, e.g., the hardware and software used to build the AI. It
specifies the details of how the algorithmic level is implemented, including the
data structures used, programming languages, and computational resources. In
neurobiology, the anatomical and physiological details of the nervous system are
described at this level. According to Marr, it is necessary to understand a system
at all three levels in order to fully grasp its behavior and possibly develop more
efficient systems.
Turing Test A method proposed by Alan Turing and originally called the Imitation
Game by him to test a machine’s ability for intelligent behavior. In the simplest
variant, one or more human examiners communicate via text in natural language,
i.e., in the form of a chat, both with a human (as a control) and with the
Weight Also Synaptic weight. Strength of the connection between two artificial or
natural neurons. The weight can in principle be any real number, with the mag-
nitude representing the size of the effect on the successor neuron and the sign
representing the quality of the effect (excitation or inhibition).
Weight matrix Matrix whose entries contain the weights between a set of neurons.
The matrix can contain all pairwise weights of all neurons of a neural network,
including all self-connections (autapses). In this case, it is referred to as a com-
plete weight matrix. However, a weight matrix can also only contain the for-
ward-directed weights between the neurons of two successive layers of a network.
The columns or rows of the weight matrix correspond to the input or output
weight vectors of the neurons.
Weight vector Vector whose entries contain all the weights of a neuron. A distinc-
tion is made between input weight vectors, which contain the weights of a neu-
ron with which its input is weighted, and output weight vectors, which contain
the weights to the successor neurons of a neuron.
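For illustration, a minimal NumPy sketch of how a forward-directed weight matrix between two layers is applied; the numbers are arbitrary:

import numpy as np

# Weights from 3 input neurons to 2 successor neurons (one row per successor
# neuron, i.e., each row is the input weight vector of one neuron in the next layer).
W = np.array([[0.2, -0.5,  0.1],
              [0.7,  0.0, -0.3]])
x = np.array([1.0, 0.5, -1.0])     # activations of the input layer

print(W @ x)                       # weighted inputs arriving at the two successor neurons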
Word vector Representation of the meaning of a word as a vector. The more dif-
ferent the meaning of two words is, the more different are the corresponding
word vectors. If you interpret the word vectors as points in a semantic space,
the distance between the points corresponds to the similarity or dissimilarity of
the underlying words. The smaller the distance, the more similar the meaning.
Synonyms, i.e., words with the same meaning, are mapped to the same word
vector or point, so they have a distance of zero. Similarly, there are also sentence
vectors, which represent the meaning of an entire sentence.
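A toy sketch of this distance measure in NumPy; the three-dimensional vectors are made-up values, whereas real word vectors typically have hundreds of dimensions:

import numpy as np

def distance(a, b):
    # Euclidean distance: small for similar meanings, large for dissimilar ones.
    return np.linalg.norm(a - b)

cat = np.array([0.8, 0.1, 0.3])
dog = np.array([0.7, 0.2, 0.4])
car = np.array([0.1, 0.9, 0.0])

print(distance(cat, dog))   # small: related meaning
print(distance(cat, car))   # large: unrelated meaning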
XOR Problem A classic problem in the field of Artificial Intelligence and Machine
Learning that illustrates the limitations of certain types of models. XOR stands
for eXclusive OR, a binary operation in which the output is true (or 1) only
when the number of true inputs is odd. With two binary inputs, XOR is true
only when exactly one of the inputs is true. The XOR problem refers to the chal-
lenge of correctly classifying these four situations using a linear classifier, e.g., a
single-layer perceptron. The problem is that the XOR function is not linearly
separable, i.e., there is no straight line (in 2D space) that can separate the inputs
that yield a 1 from those that yield a 0. This illustrates the inability of linear clas-
sifiers to handle certain types of patterns. Multi-layer neural networks, however,
can solve the XOR problem by creating non-linear decision boundaries. This is
typically achieved by introducing hidden layers and non-linear activation func-
tions into the network.
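To make this concrete, a hand-constructed sketch (not a trained model) of a two-layer network with a step activation that computes XOR; the weights and thresholds are one of many possible solutions:

import numpy as np

def xor_net(x1, x2):
    x = np.array([x1, x2])
    # Hidden layer: one neuron detects "at least one input on" (OR),
    # the other detects "both inputs on" (AND), using a step activation.
    hidden = (np.array([[1.0, 1.0], [1.0, 1.0]]) @ x > np.array([0.5, 1.5])).astype(float)
    # Output: OR minus AND is exactly XOR.
    return int(hidden @ np.array([1.0, -1.0]) > 0.5)

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]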
Zero-Shot Learning A form of machine learning where a model is trained to rec-
ognize objects or categories it has never seen before. It can classify new input
patterns even when no labeled data for the relevant class were available dur-
ing training. Unlike supervised learning, where a model is trained with a spe-
cific set of labeled data examples, zero-shot learning is based on the transfer of
knowledge from related or similar classes that were seen during training. This
is achieved by using semantic representations such as word vectors that capture
the meaning and relationships between different classes. For example, if a model
has been trained to recognize images of animals and has never seen an image of
a zebra, it can still classify it as an animal because it has learned the relationships
between different animal species. Zero-shot learning enables more efficient and
flexible training of models and generalization to new and unknown categories.