or have sub-symbolic aspects as in Sigma (Rosenbloom, Demski, & Ustun, 2016). In Sigma's case predicates can designate learnable functions. In addition to predicates, there are operators in both Sigma and Soar. Operators represent actions that the agent can take. Central to the processes that act on this knowledge, organized across the various memories, is the cognitive cycle, named after its human counterpart. The cognitive cycle represents about 50ms of human mental activity and involves four phases: integrating new perception, elaborating the current state, selecting the next action to take – by selection of an operator – followed finally by effecting any changes in the working memory (learning) as well as those requiring output via the motor system. In Sigma, operator selection is aided by numerical metadata in the form of utilities; in Soar, it is aided by numeric preferences that can be learnt (as a case of reinforcement learning). The cognitive cycle implements a parallel-to-serial bottleneck where parallel processing – of multiple rule firings (Soar) or message passing (Sigma) – is followed by deliberate selection of a single operator, analogous to the human cognitive cycle.
Knowledge is organized according to the Problem Space Computation Model (PSCM) in Soar and Sigma. The PSCM is defined as a goal, an associated state in the working memory, and available operators – all relevant to a particular problem domain. Processing in this setting can be divided into:
• reactive: the 'mindless' aspects of cognition that represent the activity within a single cognitive cycle,
• deliberative: the 'mindful' aspects of cognition that represent a sequence of decision cycles, and
• reflective: the meta-aspects of cognition where the agent examines its own state and makes modifications to its internal state.
This tri-level processing model is supported by the decision cycle via the impasse mechanism: upon failure to select an operator in the operator selection phase, an impasse results, bringing more knowledge to bear by creating a sub-goal of the current goal and a sub-state of the current state.
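To make the cycle and impasse mechanism concrete, here is a minimal runnable Python sketch of such a decision loop. The structures are illustrative stand-ins, not the Soar or Sigma APIs: operators are dicts with precondition sets and utilities, and an impasse arises when no operator can be proposed for the current goal.

# A minimal sketch of an impasse-driven decision loop (toy structures,
# not the Soar/Sigma APIs).

def propose(state, goal, knowledge):
    """Return candidate operators whose preconditions hold in the state."""
    return [op for op in knowledge.get(goal, []) if op["pre"] <= state]

def decide(state, goal, knowledge, depth=0):
    candidates = propose(state, goal, knowledge)
    if not candidates:
        # Impasse: create a sub-goal and a sub-state, then recurse into them.
        # (A real architecture would also learn a chunk upon resolution.)
        sub_goal = ("resolve", goal)
        sub_state = set(state) | {("impasse", goal)}
        print("  " * depth + f"impasse on {goal!r}; sub-goaling")
        return decide(sub_state, sub_goal, knowledge, depth + 1)
    best = max(candidates, key=lambda op: op["utility"])   # serial selection
    print("  " * depth + f"selected {best['name']!r}")
    return best

knowledge = {
    ("resolve", "fetch-mug"): [
        {"name": "ask-instructor", "pre": {("impasse", "fetch-mug")}, "utility": 0.9},
    ],
}
decide(state=set(), goal="fetch-mug", knowledge=knowledge)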
Learning occurs at multiple levels. Procedural learning entails learning rules that prevent future impasses by creating a knowledge fragment with the preconditions that led to the impasse and the predicate change – as the action part of the rule – that resolved the impasse. Soar supports this chunking, but Sigma does not yet. Sigma's cognitive cycle is based in graphical models – modified factor graphs – and the elaboration phase involves a form of message passing. Learning in this context involves updating the factors with the posterior after message passing. Sigma has demonstrated learning of acoustic models, language models, and various forms of deep learning such as feedforward multilayer perceptrons and recurrent neural networks (Rosenbloom, Demski, & Ustun, 2017). When the factors in a factor graph (Kschischang, Frey, & Loeliger, 2001) – such as the kind of network Sigma's processing and knowledge are grounded in – are fully differentiable, factor graphs reduce to deep networks, and message passing with suitable modifications for regularization can yield learning similar to gradient descent with backpropagation. Sigma has also shown learning of fixed word embeddings using random projections.
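The reduction of differentiable factor graphs to layered networks can be illustrated with a small numpy sketch, assuming a chain-structured graph with tabular factors; the 'learning' step at the end is only an illustration of updating a factor from the posterior, not Sigma's actual update rule.

import numpy as np

# Chain-structured factor graph x0 - f1 - x1 - f2 - x2 over discrete
# variables; each factor is a nonnegative table F[i, j] ~ f(x_prev=i, x_next=j).
rng = np.random.default_rng(0)
factors = [rng.random((4, 4)) for _ in range(2)]

def forward_messages(evidence, factors):
    """One sum-product sweep; each step is a matrix-vector product, i.e.,
    structurally the same as a forward pass through a stack of linear layers."""
    msg = evidence
    for F in factors:
        msg = F.T @ msg              # m(x) = sum_{x'} f(x', x) * m(x')
        msg = msg / msg.sum()        # normalize for numerical stability
    return msg                       # marginal of the final variable

evidence = np.array([1.0, 0.0, 0.0, 0.0])    # x0 observed in its first state
posterior = forward_messages(evidence, factors)

# Illustrative learning step (not Sigma's actual rule): nudge the last factor
# toward a target posterior, analogous to a gradient step on the factor.
target = np.array([0.0, 0.0, 1.0, 0.0])
factors[-1] += 0.1 * np.outer(np.ones(4), target - posterior)
factors[-1] = np.clip(factors[-1], 1e-6, None)   # factors stay nonnegative
print(forward_messages(evidence, factors))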
The rest of this work assumes an architecture that is similar in spirit to Soar and Sigma, with tri-level processing that supports the PSCM and a cognitive cycle that is grounded in a form of graphical models similar to Sigma's and supports chunking like Soar.
Large Language Models

Large Language Models (LLMs) have gained tremendous popularity over the last few years due to their ability to be versatile problem solvers across several natural language (NL) tasks and even beyond the NL domain. LLMs are trained on very large datasets and require a significant investment of resources to train. LLMs are made of layers of stacked transformer models (Vaswani, et al., 2017), which operate on the concept of 'attention', i.e., each word in the input sequence determines how much influence or 'attention' to pay to the other words in the input sequence. Attention involves calculating three intermediate quantities – the query, key, and value vectors – for each word in the input sequence. Each word is 'embedded' in a low dimensional space and then input to the transformer stack. At the top of the transformer stack are classifier 'heads' that generate a distribution over the predicted next word. To begin with, the input vocabulary is transformed into sub-word units called 'tokens' using some method such as byte pair encoding (BPE) (Shibata, et al., 1999). This helps the model handle out-of-vocabulary words so that any word the model may encounter in the future can be tokenized. These tokens are embedded. Training then involves using gradient descent to predict a (set of) target word(s) – such as the next word(s) or a masked word(s) – in the context of the last several tokens in the input sequence. Training learns the model parameters along with embeddings for the input tokens.
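For reference, a minimal numpy sketch of the single-head scaled dot-product attention just described follows; the projection matrices stand in for learned parameters.

import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8

# Embeddings for a 5-token input (stand-ins for learned token embeddings).
X = rng.standard_normal((seq_len, d_model))

# Learned projections producing query, key, and value vectors per token.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Each token's query is scored against every token's key: these scores are
# the 'attention' each input position pays to the others.
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax

out = weights @ V     # attention-weighted mix of values, one row per token
print(out.shape)      # (5, 8)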
In the context of the previously discussed memories in the cognitive architecture, we can think of the LLM transformer stack as a form of declarative memory combined with a procedural 'classifier head.' The lower layers in the transformer stack learn lexical features, the middle layers learn syntactic features, and the top layers near the head learn context-sensitive semantic features of the target token – in the context of the input tokens – while the classifier head can be understood as predicate rules that act to choose the next word ('choose'
here means generating a distribution over the target token). The predicted tokens can be sampled using a sampling technique and the sampled tokens then converted to words using a decoding strategy such as a top-k decoder.
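The sampling step can be made concrete with a generic top-k sampler (a sketch, not tied to any particular model):

import numpy as np

def top_k_sample(logits, k, rng):
    """Sample a token id from the k highest-scoring entries of `logits`."""
    top = np.argsort(logits)[-k:]            # indices of the k best tokens
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                     # renormalize over the top k
    return int(rng.choice(top, p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 1.5, -1.0, 0.0])    # toy next-token scores
print(top_k_sample(logits, k=3, rng=rng))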
Once an LLM is trained on a large dataset, there are several ways to use it in a downstream specialized task:
• finetuning (Peters, et al., 2018): the LLM is optionally "frozen" and a new classifier head is trained, tailored to the new task-specific dataset, and
• prompting (Reynolds & McDonell, 2021): when the LLM is provided with an input prompt, it generates an output sequence of tokens that, when converted to words, appears very coherent and meaningful. There are several forms of prompting methods, including analogical prompting, templatized prompting, etc.
Prompting is very popular because the model is frozen after initial training and not subsequently updated. Prompting templates were initially hand-tuned, but some work has attempted to search for good task-specific prompts (Shin, Razeghi, Logan IV, Wallace, & Singh, 2020). Here a more relevant approach is that of soft prompt training (Lester, Al-Rfou, & Constant, 2021), (Liu, et al.), where task-specific 'soft tokens' – i.e., tokens that were not in the original token space of the model when it was trained – are inserted in the prompt of the model. The soft tokens are encoded using an LSTM and then inserted with the other tokens in the prompt. The soft tokens are then learnt in a supervised fashion on a per-task basis. Once training results in soft token embeddings, the LSTM is no longer needed, and the soft token embeddings can be used in the same fashion by inserting them in the task prompts using the same template. The advantage of this method is two-fold: the LLM is frozen and does not need to be updated, and the task-specific tokens can be saved while new tokens are initialized and trained for each new task. This results in two benefits: firstly, the amount of training data required is lower, and secondly, the number of parameters trained is far lower than what would be required if the LLM were being fine-tuned, with no loss in performance for very large LLM sizes (Lester, Al-Rfou, & Constant, 2021).
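A minimal PyTorch sketch of this soft prompt idea follows, with a small frozen encoder layer standing in for a pretrained LLM and illustrative prompt/target tokens; only the soft token embeddings receive gradient updates. (The sketch skips the LSTM encoding of the soft tokens and optimizes the embeddings directly, closer to Lester et al.'s formulation.)

import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, vocab, n_soft = 16, 100, 4

# Frozen stand-in for a pretrained LLM: embedding table, one transformer
# encoder layer, and a classifier head over the vocabulary.
embed = nn.Embedding(vocab, d_model)
body = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
head = nn.Linear(d_model, vocab)
for module in (embed, body, head):
    module.requires_grad_(False)     # the LLM itself is never updated
body.eval()                          # disable dropout in the frozen stack

# The only trainable parameters: task-specific soft token embeddings.
soft_tokens = nn.Parameter(0.02 * torch.randn(n_soft, d_model))
opt = torch.optim.Adam([soft_tokens], lr=1e-2)

prompt_ids = torch.tensor([[11, 42, 7]])    # illustrative discrete prompt
target_id = torch.tensor([3])               # illustrative target token

for step in range(100):
    # Prepend the soft tokens to the embedded prompt; run the frozen stack.
    inputs = torch.cat([soft_tokens.unsqueeze(0), embed(prompt_ids)], dim=1)
    logits = head(body(inputs))[:, -1, :]   # distribution over the next token
    loss = nn.functional.cross_entropy(logits, target_id)
    opt.zero_grad()
    loss.backward()                         # gradients reach only soft_tokens
    opt.step()

print(float(loss))    # falls as the soft tokens adapt to the task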
LLM Usage in Cognitive Architecture

There are multiple ways in which LLMs can be used in cognitive architectures – as a model of the world, as a reasoning agent that can select actions when prompted with the current state of the agent, etc. Here one potential method of LLM integration is proposed. While the LLM itself may not be trained in a cognitively plausible fashion, the integration of the LLM with the cognitive architecture is attempted in a cognitively plausible manner.
To begin with, it is assumed here that the LLM itself is not updated, as this is prohibitively costly. The core vision here is that an LLM can be used as a prompt-able declarative memory in a Sigma/Soar-like impasse-driven architecture, where the architecture can prompt the LLM with a task-specific prompt to extract knowledge from it, coupled with task-specific operators carrying learnable continuous embeddings that are inserted in the LLM prompt based on the agent's goals, knowledge of the task, the contents of the working memory – which include the current situation – and the current operators that are proposed.
The cognitive cycle supports the ability to learn this continuous representation by using an algorithm similar to Sigma's message passing algorithm. An impasse can be triggered by proposing an operator that impasses, which will then create a substate with the subgoal of bringing knowledge from the LLM to bear on the current situation. The task-specific operator embeddings in each such substate can be initialized from the parent state's corresponding embeddings, in conjunction with the lexical embeddings that will be used to prompt the model in a subsequent step. Prompting the LLM involves inserting these task-specific learnable soft tokens as described in the previous section. Multiple prompts can be generated based on the goal and the various ways in which the soft tokens can be interspersed with the prompt text.
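As a sketch of this step (all structures hypothetical, since no such mechanism exists in Sigma yet), a substate could inherit its parent state's operator embeddings and generate several candidate prompts by interspersing the soft tokens at template-defined positions:

import numpy as np

rng = np.random.default_rng(0)
d = 16

# Hypothetical: each task operator carries learnable soft-token embeddings.
parent_embeddings = {"fetch-object": rng.standard_normal((4, d))}

def make_substate(parent, operator, noise=0.01):
    """Initialize the substate's embeddings from the parent state's copy."""
    return {operator: parent[operator] + noise * rng.standard_normal((4, d))}

# Templates mark where the soft tokens (<SOFT>) are interspersed with text.
templates = [
    "<SOFT> How do I {goal} given that {situation}?",
    "Goal: {goal}. Situation: {situation}. <SOFT> Next step:",
]

def candidate_prompts(goal, situation):
    """One discrete prompt per template; <SOFT> marks soft-token positions."""
    return [t.format(goal=goal, situation=situation) for t in templates]

substate = make_substate(parent_embeddings, "fetch-object")
for prompt in candidate_prompts("fetch the mug", "the mug is on the shelf"):
    print(prompt)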
The prompts themselves can be stored as task-specific or general knowledge, and several prompt templates can be selected to be used in parallel, i.e., in a reactive manner in the elaboration phase of the model. Several relevant templates have been identified in (Wray, Kirk, & Laird, 2021) in the context of Soar's interactive task learning problem formulation. Knowledge obtained from the LLM can be problematic for several reasons – LLMs hallucinate (McKenna, et al., 2023), and their output is not always reliable (Wray, Kirk, & Laird, 2021). Once multiple responses are retrieved in parallel, one response can be selected by combining knowledge from working memory – which elaborates the current situation – and curated knowledge from long-term memories such as episodic memory, semantic memory, etc. In Sigma, this can potentially be implemented as a simple classifier that scores the responses.
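A minimal sketch of such a scoring classifier follows, assuming illustrative per-response features (agreement with working memory, support from long-term memories, LLM confidence) and fixed weights that would in practice be learnt:

import numpy as np

# Illustrative features per retrieved LLM response: agreement with working
# memory, support from episodic/semantic memory, and the LLM's confidence.
responses = {
    "put the mug in the sink":      np.array([0.9, 0.7, 0.6]),
    "teleport the mug to the sink": np.array([0.1, 0.0, 0.8]),  # fluent but infeasible
}

weights = np.array([0.5, 0.3, 0.2])    # fixed here; learnt in practice

def score(features, weights):
    """Logistic score in (0, 1) for a single response."""
    return 1.0 / (1.0 + np.exp(-features @ weights))

best = max(responses, key=lambda r: score(responses[r], weights))
print(best)    # the response most consistent with curated knowledge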
During the selection phase of the cognitive cycle, learning updates the continuous soft token embeddings, and when the impasse resolves, the learnt soft token embeddings represent a description of the knowledge that was required from the LLM to resolve the impasse, as a function of the current state of the agent and its goals, for every task operator. Having a labeled dataset which can propagate a signal back from the LLM to the soft token embeddings will help. However, it is important to note that the soft token embeddings are learnt not just from the signal from the LLM but also from the classifier that scores the responses and potentially accounts for operator utilities derived from task-specific knowledge. If the output of the LLM is not usable because the agent does not have actions available – either because it does not know how to perform the suggested
action, or because its current state indicates the suggested action is impossible – another impasse can be created to resolve this.
When a new action operator is created by the agent because it does not know how to perform an action suggested by the LLM, its associated soft token embedding will be learnt by further querying the LLM to break the complex operator action into a set of simplified actions, until an action or set of actions is found that the agent can perform. The operator-specific embeddings learnt for all action operators can be used to determine the semantic closeness of actions, and the agent can try to substitute such actions. Unlike semantic embeddings for words, which indicate semantic similarity in the embedded space, these operator embeddings will be a function not just of the words but of the current state of the agent as well. When action operators are available to perform the action, the embeddings can aid – in conjunction with other numerical metadata such as utilities – in the planning required to generate a policy over them. When the impasse that led to querying the LLM is resolved, the top state shall have potentially created (action) operators or updates to predicates with associated embeddings, and these can be used to update the predicate/operator embeddings in the top state. These embeddings can subsequently be used as task- and state-specific embeddings and brought to bear to prevent future impasses in potentially different situations.
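The substitution idea can be sketched as a nearest-neighbor lookup over the learnt operator embeddings; the operator names and embeddings below are illustrative, with 'cut-bread' constructed to lie near 'slice-bread':

import numpy as np

rng = np.random.default_rng(0)
d = 16

# Hypothetical learnt operator embeddings; in the proposal these are a
# function of both the words and the agent's current state.
base = rng.standard_normal(d)
operators = {
    "slice-bread": base,
    "cut-bread":   base + 0.1 * rng.standard_normal(d),
    "toast-bread": rng.standard_normal(d),
}
performable = {"cut-bread", "toast-bread"}   # actions the agent can execute

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def substitute(suggested):
    """Nearest performable operator to an LLM-suggested, unknown action."""
    return max(performable,
               key=lambda op: cosine(operators[suggested], operators[op]))

print(substitute("slice-bread"))    # -> 'cut-bread'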
The work closest in approach to the proposed scheme is (Kirk, Lindes, & Wray, 2023), where the authors propose and evaluate a Soar-based framework (STARS) to query and use knowledge from an LLM in the ITL task setting. STARS stands for Search Tree, Analyze, Repair, and Select, corresponding to the phases of the framework, where the LLM is prompted with hierarchical tree-templatized prompts and a beam search is performed to narrow the responses to the most probable set. These are analyzed and evaluated for their usefulness in the current situation, and the best one is selected by querying the LLM again. The key difference here is that the whole scheme works with discrete token prompts derived from discrete words. As discussed previously, and based on results from (Lester, Al-Rfou, & Constant, 2021), more task-specific data is needed when working in the discrete prompt domain without soft tokens. In this context, this would mean the ST, A, and R phases have more work to do. The hierarchical tree-based templates simulate structural properties of the task domain as encoded in the English language. The interspersing of soft tokens with the state prompt using templates in the scheme proposed in this paper corresponds to this ST phase. Analysis will take place similar to what is proposed in STARS but aided by the availability of embeddings. Finally, repair will take place via the impasse mechanism in this proposal. Selection of the next action is left to the cognitive architecture in this proposal. In the STARS evaluation, the 'S' selection phase – where the action is selected by the LLM itself – did not improve the agent's task completion performance in the task that was evaluated. In the scheme proposed here, action selection is done by the cognitive architecture with the aid of embeddings, utilities on operators, and the contents of the working memory, i.e., the LLM is used to elicit knowledge in a reactive manner only, captured in the task-specific operator embeddings, which are subsequently used reactively (generator selection score based on utilities on operators as well as the embedding) as well as deliberatively (operator selection). It is unclear how the STARS phases work in the context of Soar's tri-level control. Furthermore, it is unclear whether the beam search involved in STARS is cognitively plausible and to what extent Soar's cognitive cycle supports handling probabilistic processing.
The idea of using embeddings to aid search is not new, and neither is the idea of using LLMs to aid in planning (learning a policy over actions) or reflection (impasse processing). What is new here is the integration of an LLM in a cognitive setting with soft tokens on task- and state-specific operators that can be used to prompt and extract knowledge from the LLM. In the p-tuning work where soft tokens were introduced, the authors experimented with a few prompt insertion templates. A cognitive architecture can improve upon this manual search for a suitable prompt insertion template by bringing to bear its mechanisms and knowledge from other memories – such as episodic or semantic – to guide this search, potentially reducing the back and forth required with the LLM, i.e., the data required to learn the soft token embeddings.

To evaluate the proposal presented here, the ITL domain seems the most natural. Sigma is the most natural candidate to consider for evaluation. Sigma's decision cycle is both mixed (symbolic+subsymbolic, including capable of neural processing) and hybrid (discrete+continuous), as required by the proposal. Sigma's cognitive cycle does not yet support embeddings on operators, and this is an extension that will have to be added.
Conclusion and Future Work

A method to augment cognitive architectures with a generative LLM memory was proposed. The integration assumes a cognitive cycle that is capable of simultaneously processing symbolic and sub-symbolic information. Various aspects of the integration have been independently demonstrated in Sigma or Soar, but some extensions to Sigma will have to be made to support the proposal.
Acknowledgments

The authors would like to thank Taesang Yoo and June Namgoong of Wireless R&D, Qualcomm Research, for several fruitful discussions on LLMs and cognitive architectures.
References

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., & Houlsby, N. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint, arXiv:2010.11929.

Joshi, H., Rosenbloom, P. S., & Ustun, V. 2014. Isolated word recognition in the Sigma cognitive architecture. Biologically Inspired Cognitive Architectures, 10, 1-9.

Kirk, J. 2019. Learning Hierarchical Compositional Task Definitions through Online Situated Interactive Language Instruction. PhD dissertation, Department of Computer Science, University of Michigan, Ann Arbor, MI.

Kirk, J. R., Lindes, P., & Wray, R. 2023. Improving Knowledge Extraction from LLMs for Robotic Task Learning through Agent Analysis. arXiv preprint, arXiv:2306.06770 [cs.AI].

Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. 2001. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47, 498-519.

Laird, J. E. 2012. The Soar Cognitive Architecture. Cambridge, MA: MIT Press.

Langley, P., Laird, J. E., & Rogers, S. 2009. Cognitive architectures: Research issues and challenges. Cognitive Systems Research, 10(2), 141-160.

Lester, B., Al-Rfou, R., & Constant, N. 2021. The power of scale for parameter-efficient prompt tuning. arXiv preprint, arXiv:2104.08691 [cs.CL].

McKenna, N., Li, T., Cheng, L., Hosseini, M. J., Johnson, M., & Steedman, M. 2023. Sources of Hallucination by Large Language Models on Inference Tasks. arXiv preprint, arXiv:2305.14552 [cs.CL].

Newell, A. 1973. You can't play 20 questions with nature and win: Projective comments on the papers of this symposium. In W. G. Chase (Ed.), Visual Information Processing.

Newell, A. 1978. Harpy, production systems and human cognition. In Perception and Production of Fluent Speech, 299-380.

Newell, A. 1990. Unified Theories of Cognition. Cambridge, MA: Harvard University Press.

Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. 2023. Generative agents: Interactive simulacra of human behavior. arXiv preprint, arXiv:2304.03442 [cs.HC].

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. 2018. Deep contextualized word representations. arXiv preprint, arXiv:1802.05365.

Raschka, S. 2023. Understanding Large Language Models -- A Transformative Reading List. https://sebastianraschka.com/blog/2023/llm-reading-list.html. Accessed: 2023-08-01.

Reynolds, L., & McDonell, K. 2021. Prompt programming for large language models: Beyond the few-shot paradigm. Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1-7.

Rosenbloom, P. S. 2012. Deconstructing reinforcement learning in Sigma. Proceedings of the 5th Conference on Artificial General Intelligence, pp. 262-271.

Rosenbloom, P. S. 2013. The Sigma cognitive architecture and system. AISB Quarterly, 136, 4-13.

Rosenbloom, P. S., Demski, A., & Ustun, V. 2016. The Sigma cognitive architecture and system: Towards functionally elegant grand unification. Journal of Artificial General Intelligence, 7, 1-103.

Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., & Arikawa, S. 1999. Byte Pair encoding: A text compression scheme that accelerates pattern matching. https://www.researchgate.net/profile/Takeshi-Shinohara/publication/2310624_Byte_Pair_Encoding_A_Text_Compression_Scheme_That_Accelerates_Pattern_Matching/links/02e7e522f8ea00c318000000/Byte-Pair-Encoding-A-Text-Compression-Scheme-That-Accelerates-Pattern-Matching.pdf. Accessed: 2023-08-01.

Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. 2020. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. arXiv preprint, arXiv:2010.15980 [cs.CL].

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.

Wray, R. E., Kirk, J. R., & Laird, J. E. 2021. Language Models as a Knowledge Source for Cognitive Agents. arXiv preprint, arXiv:2109.08270 [cs.AI].