Lecture 10 - Knowledge and Reasoning - 2025 - LLM (1)
Artificial Intelligence
Methods
Lecture 10b
Large Language Model
Learning Outcomes
IDENTIFY
1. Types of Knowledge
2. Representation Methods
3. Case-Based Reasoning
4. Decision Making, Decision Support
5. LLMs in Knowledge
Representation and Reasoning
ACK: Prof. Ender Özcan, UNUK, Prof. Tomas Maul & Dr Chen ZhiYuan, UNM
IDENTIFY
1. Introduction to LLM
2. Transformer Architecture
3. Training process for LLMs
4. Fine Tuning
5. Model Evaluation
6. Capabilities and Roles of LLM in
Knowledge and Reasoning
COMP 2024 / COMP 2039 Components of Heuristics Search Methods and Hill Climbing 5
Large Language Model (LLM)
The Evolution of Large Language Models
Introduction to LLMs
Probability-based language models
[Figure: from language model to large language model. Labels: probability, sampling, relationships, large language model architecture]
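A probability-based language model assigns a probability to each possible next token and samples from that distribution. The sketch below illustrates the idea with a tiny bigram model, assuming a made-up nine-word corpus; the function names are illustrative and this is far simpler than what an LLM does.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn the counts for one context into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: c / total for tok, c in counts[prev].items()}


def sample_next(counts, prev, rng):
    """Sample the next token in proportion to its probability."""
    probs = next_token_probs(counts, prev)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat the cat ate".split()
counts = train_bigram(corpus)
probs = next_token_probs(counts, "the")   # "cat" follows "the" twice, "mat" once
sampled = sample_next(counts, "the", random.Random(0))
```

An LLM replaces the count table with a neural network conditioned on the whole preceding context, but the final step is the same: a probability distribution over the vocabulary, followed by sampling.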
Transformer Architecture
Language Model and Transformer
For example, a Transformer used as a translation model:
[Figure: Input → Embedding → Encoder → Attention mechanism → Decoder → Output]
Input: 學生都快睡著了, 因為他們聽老師講課很無聊 (English: "The students were almost falling asleep because they found the teacher's lecture boring")
Output, generated one word at a time:
The
The students
The students were
The students were almost
The students were almost falling
The students were almost falling asleep…
Simon Lau Boung Yew
Encoder and Decoder
After the Transformer was proposed in 2017, Google and OpenAI each built on one part of it:
BERT uses the encoder and GPT uses the decoder. Both models led to significant achievements in
the NLP field.
BERT (Encoder): pre-trained model for various NLP tasks.
Bidirectional: looks at both past and future context to understand meaning.
GPT (Decoder): text generation model.
Unidirectional: predicts the next word based only on past words.
Understanding the Transformer Structure
BERT is great for understanding text (NLP tasks like search and classification).
GPT is powerful for generating text (chatbots, creative writing).
Encoder
Embedding
Word Embedding
Input Embedding
Embedding dimension is the length of the vector
used to represent each token
Word / Token Embedding and Word2vec
Embedding
1 apple
2 green
3 blue
4 cat
5 fish
6 sky
7 grass
8 glass
⋮
Embeddings in LLMs are numerical representations of text that
capture semantic meaning in a high-dimensional vector space.
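An embedding layer can be pictured as a lookup table: each token id selects one row, and the row length is the embedding dimension. The sketch below reuses the eight-word vocabulary listed on this slide; the dimension of 4 and the random values are assumptions for illustration, since a real model learns these values during training.

```python
import random

# Vocabulary and ids (1..8) as listed on the slide.
vocab = ["apple", "green", "blue", "cat", "fish", "sky", "grass", "glass"]
token_to_id = {tok: i + 1 for i, tok in enumerate(vocab)}

EMBEDDING_DIM = 4  # illustrative; real models use hundreds or thousands

# One row per id. Row 0 is unused so that ids can index the table directly.
# Random values stand in for learned weights (an assumption).
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(len(vocab) + 1)
]


def embed(tokens):
    """Map each token to its id, then to its embedding vector."""
    return [embedding_table[token_to_id[t]] for t in tokens]


vectors = embed(["green", "grass"])
```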
Similar concept to latitude and longitude, written as [ latitude, longitude ] in degrees (north and east positive):
• Washington DC is at [ 38.9, -77.0 ]
• New York is at [ 40.7, -74.0 ]
• London is at [ 51.5, -0.1 ]
• Paris is at [ 48.9, 2.4 ]
• Taipei is at [ 25.0, 121.6 ]
Coordinates let us compute distance, position, ...
Word2vec - Shallow neural network models used to learn word embeddings
[Figure: example words "I", "NLP", "now" and an example vector 0.1 0.4 0.9 0.7]
• CBOW (continuous bag-of-words): predict the target word from surrounding context words
• Skip-gram: predict surrounding context words from the target word
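The two Word2vec training setups differ only in which side is the input and which is the label. The sketch below builds the training examples for both from one sentence; the sentence, the window size of 1, and the function names are assumptions for illustration, and the neural network that consumes these pairs is not shown.

```python
def context_windows(tokens, window=1):
    """Yield (target, context_words) for every position in the sentence."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right


def cbow_examples(tokens, window=1):
    """Context words are the input, the target word is the label."""
    return [(context, target) for target, context in context_windows(tokens, window)]


def skipgram_examples(tokens, window=1):
    """The target word is the input, each context word is a label."""
    return [
        (target, c)
        for target, context in context_windows(tokens, window)
        for c in context
    ]


# Hypothetical example sentence.
sentence = ["I", "study", "NLP", "now"]
cbow = cbow_examples(sentence)
skipgram = skipgram_examples(sentence)
```

For the word "NLP", the first setup yields one example (["study", "now"], "NLP"), while the second yields two examples, ("NLP", "study") and ("NLP", "now").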
Embedding Property
• Calculate similarity
• The similarity between "man" and "woman" is higher than the similarity
between "man" and "apple."
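Similarity between embeddings is commonly measured with cosine similarity. The sketch below uses three hand-picked 3-dimensional vectors, which are assumptions chosen for illustration and not values from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-picked toy vectors (hypothetical, not learned embeddings).
toy_embeddings = {
    "man": [0.9, 0.1, 0.8],
    "woman": [0.9, 0.9, 0.8],
    "apple": [0.1, 0.5, -0.7],
}

sim_man_woman = cosine_similarity(toy_embeddings["man"], toy_embeddings["woman"])
sim_man_apple = cosine_similarity(toy_embeddings["man"], toy_embeddings["apple"])
```

With these toy values, "man" and "woman" point in similar directions while "man" and "apple" do not, which mirrors the property described on this slide.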
Transformer Architecture
Positional Encoding
- "What is the order of the words?"
Positional Encoding:
Example: Positional Encoding
Each word gets a unique pattern of numbers that helps the Transformer understand its position.
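One way to produce such a unique pattern is the sinusoidal scheme from the original Transformer paper. The sketch below assumes that scheme (the slide's own formula did not survive extraction, so this may differ from what was shown); the function name and the small sizes are illustrative.

```python
import math


def positional_encoding(num_positions, d_model):
    """Sinusoidal positional encoding.

    Even dimensions use sine, odd dimensions use cosine, and each pair of
    dimensions shares one wavelength.
    """
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Four positions, six dimensions (sizes chosen only for illustration).
pe = positional_encoding(num_positions=4, d_model=6)
```

Each row is added to the token embedding at that position, so the same word at two different positions ends up with two different input vectors.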
Self-Attention
- "Who should I pay attention to?"
▪ In a sentence, not all words are equally important to each other. Self-attention
lets the model look at all the words in the sentence at once, and decide
which ones matter most for understanding a specific word.
▪ The "self" part of self-attention refers to the "egocentric" focus of each token in a
corpus. Effectively, on behalf of each token of input, self-attention asks, "How
much does every other token of input matter to me?"
▪ "The cat sat on the mat." → To understand "sat" in context, the model might pay
attention to:
▪ "cat" (to know who sat)
▪ maybe "mat" (to know where it sat)
▪ The self-attention mechanism computes how much attention "sat" should give to
each word (including itself) — and forms a weighted sum of all word vectors.
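The weighted sum described above can be sketched as scaled dot-product attention. This is a simplified version that uses the word vectors directly as queries, keys and values; a real Transformer first applies learned projection matrices, which are omitted here. The 2-dimensional vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """For each token, weight every token (including itself) and sum."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Scaled dot product between this token and every token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # Weighted sum of all token vectors.
        out = [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


tokens = ["The", "cat", "sat", "on", "the", "mat"]
# Hypothetical toy vectors, one per token.
toy_vectors = [[0.1, 0.0], [1.0, 0.2], [0.9, 0.4], [0.0, 0.1], [0.1, 0.0], [0.5, 0.9]]
outputs, weights = self_attention(toy_vectors)
sat_weights = dict(zip(tokens, weights[2]))  # how much "sat" attends to each word
```

With these toy vectors, "sat" gives more weight to "cat" than to "on", but that is a property of the hand-picked numbers, not of a trained model.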
Single Head Attention
Multi-Head Attention
Multi-head attention:
head function:
concatenation of h heads:
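The formulas for the three labels above did not survive extraction. Assuming the slide follows the standard formulation from the original Transformer paper, with h heads and learned projection matrices W_i^Q, W_i^K, W_i^V and W^O (these symbols are an assumption, not taken from the slide), the expressions would read roughly as follows:

```latex
\begin{aligned}
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O \\
\mathrm{head}_i &= \mathrm{Attention}(Q W_i^Q,\; K W_i^K,\; V W_i^V) \\
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V
\end{aligned}
```

Each head runs attention on its own projection of the input, and the h results are concatenated and projected back to the model dimension.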
Feed Forward Layer
Decoder
What is the training process for LLMs?
Deep Learning Training Process
[Figure: Input (image, audio, text, …) → numerical representation → Output (image, audio, text, …)]
[Figure: for text, Input text → Tokenization → Embedding → … → Output text]
Vector Database
Why is a Vector Database Necessary?
[Figure: the question "What is artificial intelligence?" is converted to an embedding, which is difficult to search and compare in a conventional database]
Considerations:
● Storage format
● Query operations
● Performance requirements
What is a Vector Database?
[Figure: the question "What is artificial intelligence?" and the stored documents are converted to embeddings such as [0.1, 0.3, -0.2, …], [0.9, -0.1, -0.4, …], …, [0.8, 0.7, -0.6, …], which are kept in the vector database used by an enterprise-specific LLM]
Common Vector Databases
▪ Pinecone
▪ FAISS (Facebook AI Similarity Search)
▪ Chroma
▪ Milvus
▪ Redis
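The core operation of a vector database is "find the stored vectors closest to a query vector". The sketch below is a brute-force, in-memory stand-in written in plain Python; it does not use the API of any product listed above. The class name, the stored texts and the query vector are hypothetical, and the three stored vectors reuse the example numbers from the earlier figure.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class TinyVectorStore:
    """Brute-force store: compares the query against every stored vector."""

    def __init__(self):
        self._items = []  # list of (text, vector)

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query_vector, top_k=1):
        """Return the top_k (similarity, text) pairs, most similar first."""
        scored = [(cosine_similarity(query_vector, vec), text) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
# Texts are made up; vectors stand in for the output of an embedding model.
store.add("Hill climbing is a local search method.", [0.1, 0.3, -0.2])
store.add("AI is the study of intelligent agents.", [0.9, -0.1, -0.4])
store.add("Paris is the capital of France.", [0.8, 0.7, -0.6])

query_vector = [0.85, 0.0, -0.45]  # pretend embedding of the user's question
best = store.search(query_vector, top_k=1)
```

Production systems avoid comparing against every vector by using approximate nearest-neighbour indexes, which is where the storage format, query operations and performance requirements come in.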
How to train/fine-tune your own LLM
Pretrained LLMs
Pre-trained LLM
Learning of LLMs
• Pre-training model
• Instruction Tuning
• Alignment
• Prompting
Language Model Size
Model size
Dataset
Computing resources
Time
Source: https://epochai.org/blog/tracking-large-scale-ai-models
Pretraining
How to Train a Deep Learning Model
[Figure: training loop: Forward pass → Prediction → Evaluation (loss) → Backward pass]
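The forward, evaluation and backward steps can be sketched on the smallest possible model, a single weight w fitted so that w * x matches y. The data, learning rate and epoch count are assumptions for illustration; a deep learning framework computes the gradient automatically instead of by hand.

```python
def train(xs, ys, lr=0.05, epochs=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    losses = []
    for _ in range(epochs):
        # Forward: compute predictions with the current weight.
        preds = [w * x for x in xs]
        # Evaluation: mean squared error between predictions and targets.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        # Backward: gradient of the loss with respect to w.
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        # Update: move w a small step against the gradient.
        w -= lr * grad
        losses.append(loss)
    return w, losses


# Toy data generated from y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, losses = train(xs, ys)
```

Training an LLM repeats the same loop, with billions of weights instead of one and a next-token prediction loss instead of squared error.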
Fine Tuning
What is Transfer Learning?
• Transfer existing knowledge to a new domain without the need to relearn!
Knowledge transfer
Transfer learning (TL) is a machine learning (ML) technique that reuses a model
pre-trained on one task as the starting point for a related task, improving performance on that task.
The existing model is retrained with new data instead of training
a new model from scratch.
Fine Tuning
• Full Fine-Tuning
• Fine-tuning with limited resources
• In-context learning: instructions and examples given in a hard prompt (strictly speaking,
not fine-tuning)
• Parameter-Efficient Fine-Tuning (PEFT)
• Distilled Training
• Data Efficient Training
• Alternative: Retrieval Augmented Generation (RAG)
Fine-tuning workflow:
1. Dataset preparation (specific-domain dataset)
2. Data preprocessing
3. Model building
4. Model training
5. Model evaluation
6. Iterate until the goal is achieved
[Figure: Pretrained LLM → Fine-tuned LLM; Users send a Query and receive a Response]
Soft vs Hard Prompts
In-context Learning: Hard Prompt
• LLMs have the ability to quickly learn and apply new concepts
or skills based on the provided context, without requiring explicit
fine-tuning or retraining.
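In-context learning with a hard prompt amounts to placing the instruction and a few worked examples directly in the input text. The sketch below only assembles such a prompt; the sentiment task, the example reviews and the function name are hypothetical, and sending the prompt to a model is left out because it depends on the provider.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labelled examples, and the new query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final item is left unlabelled for the model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical task and examples.
prompt = build_few_shot_prompt(
    instruction="Classify each review as positive or negative.",
    examples=[
        ("The lecture was clear and well paced.", "positive"),
        ("The slides were impossible to follow.", "negative"),
    ],
    query="I learned a lot from the examples.",
)
```

No model weights change: the "learning" lives entirely in the prompt, which is why this approach is not fine-tuning.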
[Figure: workflow with steps 1. Dataset preparation (specific-domain dataset), 2. Data preprocessing, 3. Model building, 4. Model training, 5. Model evaluation, 6. Iterate until the goal is achieved; Users send a Query to the Pretrained LLM and receive a Response]
PEFT
PEFT Techniques
Soft Prompting
▪ Prompt Tuning
▪ Prefix Tuning
▪ P-Tuning
LoRA (Low-Rank Adaptation)
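The idea behind LoRA can be sketched with plain lists. The sketch assumes the standard formulation, in which a frozen weight matrix W (d × k) is adapted as W + (alpha / r) · B · A, with trainable B (d × r, initialised to zero) and A (r × k); the sizes and names below are illustrative and no training is performed.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(W, B, A, alpha, r):
    """Frozen weight plus the scaled low-rank update."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))] for i in range(len(W))]


d, k, r, alpha = 8, 8, 2, 4  # illustrative sizes
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]     # frozen
A = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # trainable
B = [[0.0] * r for _ in range(d)]                               # trainable, starts at zero

W_eff = lora_effective_weight(W, B, A, alpha, r)

full_params = d * k        # weights updated by full fine-tuning
lora_params = r * (d + k)  # weights updated by LoRA
```

Because B starts at zero, the adapted model initially behaves exactly like the pretrained one. The saving grows with matrix size: here 32 trainable values instead of 64, while for d = k = 4096 and r = 8 it would be 65,536 instead of about 16.8 million.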
Quantization-Aware Fine-Tuning - QLoRA
What is Quantization?
Distilled Training
Knowledge Distillation Process
Alternative to fine-tuning:
Retrieval Augmented Generation (RAG)
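RAG leaves the model unchanged and instead retrieves relevant text at query time and places it in the prompt. The sketch below shows only the retrieve-then-prompt steps. As a loudly labelled simplification, it ranks documents by word overlap rather than by embedding similarity, the documents are made up, and the final call to an LLM is omitted.

```python
def tokenize(text):
    """Lowercase and split into a set of words, dropping basic punctuation."""
    for ch in "?.,":
        text = text.replace(ch, " ")
    return set(text.lower().split())


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(question, passages):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical knowledge base.
documents = [
    "LoRA adds small low-rank matrices to a frozen pretrained model.",
    "A vector database stores embeddings and supports similarity search.",
    "Hill climbing is a local search heuristic.",
]
question = "What does a vector database store?"
passages = retrieve(question, documents, top_k=1)
prompt = build_rag_prompt(question, passages)
```

Updating the system's knowledge then means editing the document collection, not retraining the model.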
How RAG Differs from Fine-Tuning
When to Use RAG Instead of Fine-Tuning
Model Evaluation
Evaluation: Different Ways to Compute Metric Scores
Source: https://www.confident-ai.com/blog/llm-evaluation-metrics-everything-you-need-for-llm-evaluation
How to evaluate
https://huggingface.co/docs/evaluate/index
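Two simple reference-based metrics, exact match and token-level F1, can be written by hand to show what a metric score is. This sketch is plain Python and does not use the Hugging Face evaluate library linked above; the function names and example strings are assumptions for illustration.

```python
from collections import Counter


def exact_match(prediction, reference):
    """True if the answers are identical after trimming and lowercasing."""
    return prediction.strip().lower() == reference.strip().lower()


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall against the reference."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Number of tokens shared between prediction and reference.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


em = exact_match("Paris", " paris ")
f1_partial = token_f1("the capital is Paris", "Paris")
f1_wrong = token_f1("London", "Paris")
```

Exact match is strict, while token F1 gives partial credit when a longer answer still contains the reference.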
High-level categories of metrics
Key Metrics in the Leaderboard
Capabilities and Roles
of LLM in Knowledge
and Reasoning
Instruction following
Chain-of-Thought
Role of LLMs in Knowledge Representation
Reasoning with Knowledge
Applications
▪ Knowledge Augmentation
• LLMs can integrate external knowledge bases during inference, combining
their implicit knowledge with structured information for better accuracy and
interpretability.
▪ Question Answering (QA) Systems
• Applications like ChatGPT demonstrate how LLMs perform reasoning to
answer queries, synthesize knowledge, or even offer multi-hop reasoning
over connected facts.
▪ Automated Reasoning
• Assisting in formal logic tasks like theorem proving, solving puzzles, or
verifying logical consistency in complex systems.
▪ Semantic Search and Retrieval
• Supporting search engines and recommendation systems through context-aware
retrieval and ranking.
Limitations of LLMs
• Generation is based on next-token prediction, not on solid facts or logical inference.
• An LLM's knowledge extends only up to the cutoff date of its training data.
LLM                  Knowledge cutoff date   Provider
Google Gemini Pro    April 2023              Google
• In-context learning
• Instruction following
• Step-by-step reasoning (chain-of-thought)
1. Types of Knowledge
2. Representation Methods
3. Case-Based Reasoning
4. Decision Making, Decision Support
5. LLMs in Knowledge
Representation and Reasoning
Modelling and
Simulation