or have sub-symbolic aspects as in Sigma (Rosenbloom, Demski, & Ustun, 2016). In Sigma's case predicates can designate learnable functions. In addition to predicates, there are operators in both Sigma and Soar. Operators represent actions that the agent can take. Central to the processes that act on this knowledge, organized across the various memories, is the cognitive cycle, named after its human counterpart. The cognitive cycle represents about 50ms of human mental activity and involves four phases: integrating new perception, elaborating the current state, selecting the next action to take – by selection of an operator – followed finally by effecting any changes in the working memory (learning) as well as those requiring output via the motor system. In Sigma, operator selection is aided by numerical metadata in the form of utilities; in Soar, it is aided by numeric preferences that can be learnt (as a case of reinforcement learning). The cognitive cycle implements a parallel-to-serial bottleneck where parallel processing – of multiple rule firings (Soar) or message passing (Sigma) – is followed by deliberate selection of a single operator, analogous to the human cognitive cycle.
Knowledge is organized according to the Problem Space Computation Model (PSCM) in Soar and Sigma. The PSCM is defined as a goal, an associated state in the working memory, and available operators – all relevant to a particular problem domain. Processing in this setting can be divided into:
• reactive: the 'mindless' aspects of cognition that represent the activity within a single cognitive cycle,
• deliberative: the 'mindful' aspects of cognition that represent a sequence of decision cycles, and
• reflective: the meta-aspects of cognition where the agent examines its own state and makes modifications to its internal state.
This tri-level processing model is supported by the decision cycle via the impasse mechanism: upon failure to select an operator in the operator selection phase, an impasse results, bringing more knowledge to bear by creating a sub-goal of the current goal and a sub-state of the current state.
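To make the cycle and impasse mechanism concrete, here is a minimal runnable Python sketch of such a decision loop. The structures are illustrative stand-ins, not the Soar or Sigma APIs: operators are dicts with precondition sets and utilities, and an impasse arises when no operator can be proposed for the current goal.

# A minimal sketch of an impasse-driven decision loop (toy structures,
# not the Soar/Sigma APIs).

def propose(state, goal, knowledge):
    """Return candidate operators whose preconditions hold in the state."""
    return [op for op in knowledge.get(goal, []) if op["pre"] <= state]

def decide(state, goal, knowledge, depth=0):
    candidates = propose(state, goal, knowledge)
    if not candidates:
        # Impasse: create a sub-goal and a sub-state, then recurse into them.
        # (A real architecture would also learn a chunk upon resolution.)
        sub_goal = ("resolve", goal)
        sub_state = set(state) | {("impasse", goal)}
        print("  " * depth + f"impasse on {goal!r}; sub-goaling")
        return decide(sub_state, sub_goal, knowledge, depth + 1)
    best = max(candidates, key=lambda op: op["utility"])   # serial selection
    print("  " * depth + f"selected {best['name']!r}")
    return best

knowledge = {
    ("resolve", "fetch-mug"): [
        {"name": "ask-instructor", "pre": {("impasse", "fetch-mug")}, "utility": 0.9},
    ],
}
decide(state=set(), goal="fetch-mug", knowledge=knowledge)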
Learning occurs at multiple levels. Procedural learning entails learning rules that prevent future impasses by creating a knowledge fragment with the preconditions that led to the impasse and the predicate change – as the action part of the rule – that resolved the impasse. Soar supports this chunking, but Sigma does not yet. Sigma's cognitive cycle is based in graphical models – modified factor graphs – and the elaboration phase involves a form of message passing. Learning in this context involves updating the factors with the posterior after message passing. Sigma has demonstrated learning of acoustic models, language models, and various forms of deep learning such as feedforward multilayer perceptrons and recurrent neural networks (Rosenbloom, Demski, & Ustun, 2017). When the factors in a factor graph (Kschischang, Frey, & Loeliger, 2001) – such as the kind of network Sigma's processing and knowledge are grounded in – are fully differentiable, factor graphs reduce to deep networks, and message passing with suitable modifications for regularization can yield learning similar to gradient descent with backpropagation. Sigma has also shown learning of fixed word embeddings using random projections.
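The reduction of differentiable factor graphs to layered networks can be illustrated with a small numpy sketch, assuming a chain-structured graph with tabular factors; the 'learning' step at the end is only an illustration of updating a factor from the posterior, not Sigma's actual update rule.

import numpy as np

# Chain-structured factor graph x0 - f1 - x1 - f2 - x2 over discrete
# variables; each factor is a nonnegative table F[i, j] ~ f(x_prev=i, x_next=j).
rng = np.random.default_rng(0)
factors = [rng.random((4, 4)) for _ in range(2)]

def forward_messages(evidence, factors):
    """One sum-product sweep; each step is a matrix-vector product, i.e.,
    structurally the same as a forward pass through a stack of linear layers."""
    msg = evidence
    for F in factors:
        msg = F.T @ msg              # m(x) = sum_{x'} f(x', x) * m(x')
        msg = msg / msg.sum()        # normalize for numerical stability
    return msg                       # marginal of the final variable

evidence = np.array([1.0, 0.0, 0.0, 0.0])    # x0 observed in its first state
posterior = forward_messages(evidence, factors)

# Illustrative learning step (not Sigma's actual rule): nudge the last factor
# toward a target posterior, analogous to a gradient step on the factor.
target = np.array([0.0, 0.0, 1.0, 0.0])
factors[-1] += 0.1 * np.outer(np.ones(4), target - posterior)
factors[-1] = np.clip(factors[-1], 1e-6, None)   # factors stay nonnegative
print(forward_messages(evidence, factors))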
The rest of this work assumes an architecture that is similar in spirit to Soar and Sigma, with tri-level processing that supports the PSCM and a cognitive cycle that is grounded in a form of graphical models similar to Sigma's and supports chunking like Soar.
Large Language Models

Large Language Models (LLMs) have gained tremendous popularity over the last few years due to their ability to be versatile problem solvers across several natural language (NL) tasks and even beyond the NL domain. LLMs are trained on very large datasets and require a significant investment of resources to train. LLMs are made of layers of stacked transformer models (Vaswani, et al., 2017), which operate on the concept of 'attention', i.e., each word in the input sequence determines how much influence or 'attention' to pay to the other words in the input sequence. Attention involves calculating three intermediate quantities – the query, key, and value vectors – for each word in the input sequence. Each word is 'embedded' in a low dimensional space and then input to the transformer stack. At the top of the transformer stack are classifier 'heads' that generate a distribution over the predicted next word. To begin with, the input vocabulary is transformed into sub-word units called 'tokens' using some method such as byte pair encoding (BPE) (Shibata, et al., 1999). This helps the model handle out-of-vocabulary words so that any word the model may encounter in the future can be tokenized. These tokens are embedded. Training then involves using gradient descent to predict a (set of) target word(s) – such as the next word(s) or a masked word(s) – in the context of the last several tokens in the input sequence. Training learns the model parameters along with embeddings for the input tokens.
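For reference, a minimal numpy sketch of the single-head scaled dot-product attention just described follows; the projection matrices stand in for learned parameters.

import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8

# Embeddings for a 5-token input (stand-ins for learned token embeddings).
X = rng.standard_normal((seq_len, d_model))

# Learned projections producing query, key, and value vectors per token.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Each token's query is scored against every token's key: these scores are
# the 'attention' each input position pays to the others.
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax

out = weights @ V     # attention-weighted mix of values, one row per token
print(out.shape)      # (5, 8)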
In the context of the previously discussed memories in the cognitive architecture, we can think of the LLM transformer stack as a form of declarative memory combined with a procedural 'classifier head.' The lower layers in the transformer stack learn lexical features, the middle layers learn syntactic features, and the top layers near the head learn context-sensitive semantic features of the target token – in the context of the input tokens – while the classifier head can be understood as predicate rules that act to choose the next word ('choose'
here means generating a distribution over the target token). The predicted tokens can be sampled using a sampling technique and the sampled tokens then converted to words using a decoding strategy such as a top-k decoder.
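The sampling step can be made concrete with a generic top-k sampler (a sketch, not tied to any particular model):

import numpy as np

def top_k_sample(logits, k, rng):
    """Sample a token id from the k highest-scoring entries of `logits`."""
    top = np.argsort(logits)[-k:]            # indices of the k best tokens
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                     # renormalize over the top k
    return int(rng.choice(top, p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 1.5, -1.0, 0.0])    # toy next-token scores
print(top_k_sample(logits, k=3, rng=rng))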
Once an LLM is trained on a large dataset, there are several ways to use it in a downstream specialized task:
• finetuning (Peters, et al., 2018): the LLM is optionally "frozen" and a new classifier head is trained, tailored to the new task-specific dataset, and
• prompting (Reynolds & McDonell, 2021): when the LLM is provided with an input prompt, it generates an output sequence of tokens that, when converted to words, appears very coherent and meaningful. There are several forms of prompting methods, including analogical prompting, templatized prompting, etc.
Prompting is very popular because the model is frozen after initial training and not subsequently updated. Prompting templates were initially hand-tuned, but some work has attempted to search for good task-specific prompts (Shin, Razeghi, Logan IV, Wallace, & Singh, 2020). Here a more relevant approach is that of soft prompt training (Lester, Al-Rfou, & Constant, 2021), (Liu, et al.), where task-specific 'soft tokens' – i.e., tokens that were not in the original token space of the model when it was trained – are inserted in the prompt of the model. The soft tokens are encoded using an LSTM and then inserted with the other tokens in the prompt. The soft tokens are then learnt in a supervised fashion on a per-task basis. Once training results in soft token embeddings, the LSTM is no longer needed, and the soft token embeddings can be used in the same fashion by inserting them in the task prompts using the same template. The advantage of this method is two-fold: the LLM is frozen and does not need to be updated, and the task-specific tokens can be saved while new tokens are initialized and trained for each new task. This results in two benefits: firstly, the amount of training data required is lower, and secondly, the number of parameters trained is far lower than what would be required if the LLM were being fine-tuned, with no loss in performance for very large LLM sizes (Lester, Al-Rfou, & Constant, 2021).
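A minimal PyTorch sketch of this soft prompt idea follows, with a small frozen encoder layer standing in for a pretrained LLM and illustrative prompt/target tokens; only the soft token embeddings receive gradient updates. (The sketch skips the LSTM encoding of the soft tokens and optimizes the embeddings directly, closer to Lester et al.'s formulation.)

import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, vocab, n_soft = 16, 100, 4

# Frozen stand-in for a pretrained LLM: embedding table, one transformer
# encoder layer, and a classifier head over the vocabulary.
embed = nn.Embedding(vocab, d_model)
body = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
head = nn.Linear(d_model, vocab)
for module in (embed, body, head):
    module.requires_grad_(False)     # the LLM itself is never updated
body.eval()                          # disable dropout in the frozen stack

# The only trainable parameters: task-specific soft token embeddings.
soft_tokens = nn.Parameter(0.02 * torch.randn(n_soft, d_model))
opt = torch.optim.Adam([soft_tokens], lr=1e-2)

prompt_ids = torch.tensor([[11, 42, 7]])    # illustrative discrete prompt
target_id = torch.tensor([3])               # illustrative target token

for step in range(100):
    # Prepend the soft tokens to the embedded prompt; run the frozen stack.
    inputs = torch.cat([soft_tokens.unsqueeze(0), embed(prompt_ids)], dim=1)
    logits = head(body(inputs))[:, -1, :]   # distribution over the next token
    loss = nn.functional.cross_entropy(logits, target_id)
    opt.zero_grad()
    loss.backward()                         # gradients reach only soft_tokens
    opt.step()

print(float(loss))    # falls as the soft tokens adapt to the task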
LLM Usage in Cognitive Architecture

There are multiple ways in which LLMs can be used in cognitive architectures – as a model of the world, as a reasoning agent that can select actions when prompted with the current state of the agent, etc. Here one potential method of LLM integration is proposed. While the LLM itself may not be trained in a cognitively plausible fashion, the integration of the LLM with the cognitive architecture is attempted in a cognitively plausible manner.
To begin with, it is assumed here that the LLM itself is not updated, as this is prohibitively costly. The core vision here is that an LLM can be used as a prompt-able declarative memory in a Sigma/Soar-like impasse-driven architecture, where the architecture can prompt the LLM with a task-specific prompt to extract knowledge from it, coupled with task-specific operators carrying learnable continuous embeddings that are inserted in the LLM prompt based on the agent's goals, knowledge of the task, the contents of the working memory – which include the current situation – and the current operators that are proposed.
The cognitive cycle supports the ability to learn this continuous representation by using an algorithm similar to Sigma's message passing algorithm. An impasse can be triggered by proposing an operator that impasses, which will then create a substate with the subgoal of bringing knowledge from the LLM to bear on the current situation. The task-specific operator embeddings in each such substate can be initialized from the parent state's corresponding embeddings, in conjunction with the lexical embeddings that will be used to prompt the model in a subsequent step. Prompting the LLM involves inserting these task-specific learnable soft tokens as described in the previous section. Multiple prompts can be generated based on the goal and the various ways in which the soft tokens can be interspersed with the prompt text.
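As a sketch of this step (all structures hypothetical, since no such mechanism exists in Sigma yet), a substate could inherit its parent state's operator embeddings and generate several candidate prompts by interspersing the soft tokens at template-defined positions:

import numpy as np

rng = np.random.default_rng(0)
d = 16

# Hypothetical: each task operator carries learnable soft-token embeddings.
parent_embeddings = {"fetch-object": rng.standard_normal((4, d))}

def make_substate(parent, operator, noise=0.01):
    """Initialize the substate's embeddings from the parent state's copy."""
    return {operator: parent[operator] + noise * rng.standard_normal((4, d))}

# Templates mark where the soft tokens (<SOFT>) are interspersed with text.
templates = [
    "<SOFT> How do I {goal} given that {situation}?",
    "Goal: {goal}. Situation: {situation}. <SOFT> Next step:",
]

def candidate_prompts(goal, situation):
    """One discrete prompt per template; <SOFT> marks soft-token positions."""
    return [t.format(goal=goal, situation=situation) for t in templates]

substate = make_substate(parent_embeddings, "fetch-object")
for prompt in candidate_prompts("fetch the mug", "the mug is on the shelf"):
    print(prompt)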
The prompts themselves can be stored as task-specific or general knowledge, and several prompt templates can be selected to be used in parallel, i.e., in a reactive manner in the elaboration phase of the model. Several relevant templates have been identified in (Wray, Kirk, & Laird, 2021) in the context of Soar's interactive task learning problem formulation. Knowledge obtained from the LLM can be problematic for several reasons – LLMs hallucinate (McKenna, et al., 2023), and their output is not always reliable (Wray, Kirk, & Laird, 2021). Once multiple responses are retrieved in parallel, one response can be selected by combining knowledge from working memory – which elaborates the current situation – and curated knowledge from long-term memories such as episodic memory, semantic memory, etc. In Sigma, this can potentially be implemented as a simple classifier that scores the responses.
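A minimal sketch of such a scoring classifier follows, assuming illustrative per-response features (agreement with working memory, support from long-term memories, LLM confidence) and fixed weights that would in practice be learnt:

import numpy as np

# Illustrative features per retrieved LLM response: agreement with working
# memory, support from episodic/semantic memory, and the LLM's confidence.
responses = {
    "put the mug in the sink":      np.array([0.9, 0.7, 0.6]),
    "teleport the mug to the sink": np.array([0.1, 0.0, 0.8]),  # fluent but infeasible
}

weights = np.array([0.5, 0.3, 0.2])    # fixed here; learnt in practice

def score(features, weights):
    """Logistic score in (0, 1) for a single response."""
    return 1.0 / (1.0 + np.exp(-features @ weights))

best = max(responses, key=lambda r: score(responses[r], weights))
print(best)    # the response most consistent with curated knowledge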
During the selection phase of the cognitive cycle, learning updates the continuous soft token embeddings, and when the impasse resolves, the learnt soft token embeddings represent a description of the knowledge that was required from the LLM to resolve the impasse, as a function of the current state of the agent and its goals, for every task operator. Having a labeled dataset which can propagate a signal back from the LLM to the soft token embeddings will help. However, it is important to note that the soft token embeddings are learnt not just from the signal from the LLM but also from the classifier that scores the responses and potentially accounts for operator utilities derived from task-specific knowledge. If the output of the LLM is not usable because the agent does not have actions available – either because it does not know how to perform the suggested
action, or because its current state indicates the suggested action is impossible – another impasse can be created to resolve this.
When a new action operator is created by the agent because it does not know how to perform an action suggested by the LLM, its associated soft token embedding will be learnt by further querying the LLM to break the complex operator action into a set of simplified actions, until an action or set of actions is found that the agent can perform. The operator-specific embeddings learnt for all action operators can be used to determine the semantic closeness of actions, and the agent can try to substitute such actions. Unlike semantic embeddings for words, which indicate semantic similarity in the embedded space, these operator embeddings will be a function not just of the words but of the current state of the agent as well. When action operators are available to perform the action, the embeddings can aid – in conjunction with other numerical metadata such as utilities – in the planning required to generate a policy over them. When the impasse that led to querying the LLM is resolved, the top state shall have potentially created (action) operators or updates to predicates with associated embeddings, and these can be used to update the predicate/operator embeddings in the top state. These embeddings can subsequently be used as task- and state-specific embeddings and brought to bear to prevent future impasses in potentially different situations.
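The substitution idea can be sketched as a nearest-neighbor lookup over the learnt operator embeddings; the operator names and embeddings below are illustrative, with 'cut-bread' constructed to lie near 'slice-bread':

import numpy as np

rng = np.random.default_rng(0)
d = 16

# Hypothetical learnt operator embeddings; in the proposal these are a
# function of both the words and the agent's current state.
base = rng.standard_normal(d)
operators = {
    "slice-bread": base,
    "cut-bread":   base + 0.1 * rng.standard_normal(d),
    "toast-bread": rng.standard_normal(d),
}
performable = {"cut-bread", "toast-bread"}   # actions the agent can execute

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def substitute(suggested):
    """Nearest performable operator to an LLM-suggested, unknown action."""
    return max(performable,
               key=lambda op: cosine(operators[suggested], operators[op]))

print(substitute("slice-bread"))    # -> 'cut-bread'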
The work closest in approach to the proposed scheme is (Kirk, Lindes, & Wray, 2023), where the authors propose and evaluate a Soar-based framework (STARS) to query and use knowledge from an LLM in the ITL task setting. STARS stands for Search Tree, Analyze, Repair, and Select, corresponding to the phases of the framework, where the LLM is prompted with hierarchical tree-templatized prompts and a beam search is performed to narrow the responses to the most probable set. These are analyzed and evaluated for their usefulness in the current situation, and the best one is selected by querying the LLM again. The key difference here is that the whole scheme works with discrete token prompts derived from discrete words. As discussed previously, and based on results from (Lester, Al-Rfou, & Constant, 2021), more task-specific data is needed when working in the discrete prompt domain without soft tokens. In this context, this would mean the ST, A, and R phases have more work to do. The hierarchical tree-based templates simulate structural properties of the task domain as encoded in the English language. The interspersing of soft tokens with the state prompt using templates in the scheme proposed in this paper corresponds to this ST phase. Analysis will take place similar to what is proposed in STARS but aided by the availability of embeddings. Finally, repair will take place via the impasse mechanism in this proposal. Selection of the next action is left to the cognitive architecture in this proposal. In the STARS evaluation, the 'S' selection phase – where the action is selected by the LLM itself – did not improve the agent's task completion performance in the task that was evaluated. In the scheme proposed here, action selection is done by the cognitive architecture with the aid of embeddings, utilities on operators, and the contents of the working memory, i.e., the LLM is used to elicit knowledge in a reactive manner only, captured in the task-specific operator embeddings, which are subsequently used reactively (generator selection score based on utilities on operators as well as the embedding) as well as deliberatively (operator selection). It is unclear how the STARS phases work in the context of Soar's tri-level control. Furthermore, it is unclear whether the beam search involved in STARS is cognitively plausible and to what extent Soar's cognitive cycle supports handling probabilistic processing.
The idea of using embeddings to aid search is not new, and neither is the idea of using LLMs to aid in planning (learning a policy over actions) or reflection (impasse processing). What is new here is the integration of an LLM in a cognitive setting with soft tokens on task- and state-specific operators that can be used to prompt and extract knowledge from the LLM. In the p-tuning work where soft tokens were introduced, the authors experimented with a few prompt insertion templates. A cognitive architecture can improve upon this manual search for a suitable prompt insertion template by bringing to bear its mechanisms and knowledge from other memories – such as episodic or semantic – to guide this search, potentially reducing the back and forth required with the LLM, i.e., the data required to learn the soft token embeddings.

To evaluate the proposal presented here, the ITL domain seems the most natural. Sigma is the most natural candidate to consider for evaluation. Sigma's decision cycle is both mixed (symbolic+subsymbolic, including capable of neural processing) and hybrid (discrete+continuous), as required by the proposal. Sigma's cognitive cycle does not yet support embeddings on operators, and this is an extension that will have to be added.
Conclusion and Future Work

A method to augment cognitive architectures with a generative LLM memory was proposed. The integration assumes a cognitive cycle that is capable of simultaneously processing symbolic and sub-symbolic information. Various aspects of the integration have been independently demonstrated in Sigma or Soar, but some extensions to Sigma will have to be made to support the proposal.
Acknowledgments

The authors would like to thank Taesang Yoo and June Namgoong of Wireless R&D, Qualcomm Research, for several fruitful discussions on LLMs and cognitive architectures.
References

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., & Houlsby, N. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint, arXiv:2010.11929.

Joshi, H., Rosenbloom, P. S., & Ustun, V. 2014. Isolated word recognition in the Sigma cognitive architecture. Biologically Inspired Cognitive Architectures, 10, 1-9.

Kirk, J. 2019. Learning Hierarchical Compositional Task Definitions through Online Situated Interactive Language Instruction. PhD dissertation, Department of Computer Science, University of Michigan, Ann Arbor, MI.

Kirk, J. R., Lindes, P., & Wray, R. 2023. Improving Knowledge Extraction from LLMs for Robotic Task Learning through Agent Analysis. arXiv preprint, arXiv:2306.06770 [cs.AI].

Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. 2001. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47, 498-519.

Laird, J. E. 2012. The Soar Cognitive Architecture. Cambridge, MA: MIT Press.

Langley, P., Laird, J. E., & Rogers, S. 2009. Cognitive architectures: Research issues and challenges. Cognitive Systems Research, 10(2), 141-160.

Lester, B., Al-Rfou, R., & Constant, N. 2021. The power of scale for parameter-efficient prompt tuning. arXiv preprint, arXiv:2104.08691 [cs.CL].

McKenna, N., Li, T., Cheng, L., Hosseini, M. J., Johnson, M., & Steedman, M. 2023. Sources of Hallucination by Large Language Models on Inference Tasks. arXiv preprint, arXiv:2305.14552 [cs.CL].

Newell, A. 1973. You can't play 20 questions with nature and win: Projective comments on the papers of this symposium. In W. G. Chase (Ed.), Visual Information Processing.

Newell, A. 1978. Harpy, production systems and human cognition. In Perception and Production of Fluent Speech, 299-380.

Newell, A. 1990. Unified Theories of Cognition. Cambridge, MA: Harvard University Press.

Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. 2023. Generative agents: Interactive simulacra of human behavior. arXiv preprint, arXiv:2304.03442 [cs.HC].

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. 2018. Deep contextualized word representations. arXiv preprint, arXiv:1802.05365.

Raschka, S. 2023. Understanding Large Language Models -- A Transformative Reading List. https://sebastianraschka.com/blog/2023/llm-reading-list.html. Accessed: 2023-08-01.

Reynolds, L., & McDonell, K. 2021. Prompt programming for large language models: Beyond the few-shot paradigm. Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1-7.

Rosenbloom, P. S. 2012. Deconstructing reinforcement learning in Sigma. Proceedings of the 5th Conference on Artificial General Intelligence, pp. 262-271.

Rosenbloom, P. S. 2013. The Sigma cognitive architecture and system. AISB Quarterly, 136, 4-13.

Rosenbloom, P. S., Demski, A., & Ustun, V. 2016. The Sigma cognitive architecture and system: Towards functionally elegant grand unification. Journal of Artificial General Intelligence, 7, 1-103.

Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., & Arikawa, S. 1999. Byte Pair encoding: A text compression scheme that accelerates pattern matching. https://www.researchgate.net/profile/Takeshi-Shinohara/publication/2310624_Byte_Pair_Encoding_A_Text_Compression_Scheme_That_Accelerates_Pattern_Matching/links/02e7e522f8ea00c318000000/Byte-Pair-Encoding-A-Text-Compression-Scheme-That-Accelerates-Pattern-Matching.pdf. Accessed: 2023-08-01.

Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. 2020. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. arXiv preprint, arXiv:2010.15980 [cs.CL].

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.

Wray, R. E., Kirk, J. R., & Laird, J. E. 2021. Language Models as a Knowledge Source for Cognitive Agents. arXiv preprint, arXiv:2109.08270 [cs.AI].