
INTRODUCTION TO LLMS: TRANSFORMERS, TYPES OF LLMS, CONFIGURATION SETTINGS

DEFINITIONS

Generative AI: AI systems that can produce realistic content (text, images, etc.).

Large Language Models (LLMs): Large neural networks trained at internet scale to estimate the probability of sequences of words. Ex: GPT, FLAN-T5, LLaMA, PaLM, BLOOM (transformers with billions of parameters). Abilities (and the computing resources needed) tend to rise with the number of parameters.

USE CASES
– Standard NLP tasks (classification, summarization, etc.)
– Content generation
– Reasoning (Q&A, planning, coding, etc.)

Token: Word or sub-word; the basic unit processed by transformers.

In-context learning: Specifying the task to perform directly in the prompt.

TRANSFORMERS

– Can scale efficiently to use multi-core GPUs
– Can process input data in parallel
– Pay attention to all other words when processing a word

Transformers' strength lies in understanding the context and relevance of all words in a sentence.

Embedding layer: Maps each token to a trainable vector.
Positional encoding: A vector added to the token embedding vector to keep track of the token's position.
Self-attention: Computes the importance of each word in the input sequence to all other words in the sequence.
Encoder: Processes the input sequence to generate a vector representation (or embedding) for each token.
Decoder: Processes input tokens to produce new tokens.

TYPES OF LLMS

Encoder only = Autoencoding model
Ex: BERT, RoBERTa
These are not generative models.
PRE-TRAINING OBJECTIVE: To predict tokens masked in a sentence (= Masked Language Modeling)
OUTPUT: Encoded representation of the text
USE CASE(S): Sentence classification (e.g., NER)

Decoder only = Autoregressive model
Ex: GPT, BLOOM
PRE-TRAINING OBJECTIVE: To predict the next token based on the previous sequence of tokens (= Causal Language Modeling)
OUTPUT: Next token
USE CASES: Text generation

Encoder-Decoder = Seq-to-seq model
Ex: T5, BART
PRE-TRAINING OBJECTIVE: Varies from model to model (e.g., span corruption for T5)
OUTPUT: Sentinel token + predicted tokens
USE CASES: Translation, Q&A, summarization

CONFIGURATION SETTINGS

Parameters to set at inference time:

Max new tokens: Maximum number of tokens generated during completion.

Decoding strategy:
1. Greedy decoding: The word/token with the highest probability is selected from the final probability distribution (prone to repetition).
2. Random sampling: The model chooses an output word at random, using the probability distribution to weigh the selection (could be too creative).

TECHNIQUES TO CONTROL RANDOM SAMPLING
– Top K: The next token is drawn from the k tokens with the highest probabilities.
– Top P: The next token is drawn from the tokens with the highest probabilities whose combined probabilities exceed p.

Temperature: Influences the shape of the probability distribution through a scaling factor in the softmax layer.
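As a minimal illustration of these inference-time settings (not part of the original cheat sheet), the following Python sketch implements greedy decoding, temperature scaling, top-k, and top-p sampling over a toy logits vector; the function names and values are illustrative assumptions.

    import numpy as np

    def softmax(logits, temperature=1.0):
        # Temperature rescales the logits before the softmax, reshaping the distribution.
        z = np.array(logits, dtype=float) / temperature
        z -= z.max()
        p = np.exp(z)
        return p / p.sum()

    def greedy(probs):
        # Greedy decoding: always pick the highest-probability token (prone to repetition).
        return int(np.argmax(probs))

    def sample(probs, rng):
        # Random sampling: draw a token according to the full probability distribution.
        return int(rng.choice(len(probs), p=probs))

    def top_k(probs, k, rng):
        # Top-K: restrict sampling to the k most probable tokens, then renormalize.
        idx = np.argsort(probs)[-k:]
        p = probs[idx] / probs[idx].sum()
        return int(rng.choice(idx, p=p))

    def top_p(probs, p_threshold, rng):
        # Top-P (nucleus): keep the smallest set of tokens whose cumulative probability exceeds p.
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: int(np.searchsorted(cum, p_threshold)) + 1]
        p = probs[keep] / probs[keep].sum()
        return int(rng.choice(keep, p=p))

    rng = np.random.default_rng(0)
    logits = [2.0, 1.0, 0.5, -1.0]            # toy logits over a 4-token vocabulary
    probs = softmax(logits, temperature=0.7)  # temperature < 1 sharpens the distribution
    print(greedy(probs), top_k(probs, k=2, rng=rng), top_p(probs, p_threshold=0.9, rng=rng))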

© 2024 Dataiku
LLM INSTRUCTION FINE-TUNING & EVALUATION

INSTRUCTION FINE-TUNING

In-context learning limitations:
• May be insufficient for very specific tasks.
• Examples take up space in the context window.

Instruction fine-tuning: The LLM is trained to estimate the next-token probability on a carefully curated dataset of high-quality prompt-completion examples for specific tasks.

Steps:
1. Prepare the training data (prompt-completion pairs).
2. Pass examples of training data to the LLM (prompt and ground-truth answer). For example, the prompt "Label this review: Amazing product! Sentiment:" might yield the completion "Neutral" while the ground truth is "Positive".
3. Compute the cross-entropy loss for each completion token and backpropagate to adjust the LLM weights.

After fine-tuning, the LLM generates better completions for the targeted task. A sketch of the per-token loss in step 3 follows below.
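To make step 3 concrete, here is a small, hypothetical PyTorch sketch (not from the cheat sheet) of computing the cross-entropy loss only over completion tokens by masking prompt positions with the ignore index; the tensor shapes and values are illustrative.

    import torch
    import torch.nn.functional as F

    vocab_size, seq_len = 100, 6
    # Toy logits standing in for the LLM's output over the sequence.
    logits = torch.randn(1, seq_len, vocab_size, requires_grad=True)

    # Token ids for "prompt + completion"; prompt positions are masked with -100 so that
    # the loss is computed on completion tokens only (in practice, labels are also shifted
    # by one position for next-token prediction).
    labels = torch.tensor([[-100, -100, -100, 42, 7, 13]])

    loss = F.cross_entropy(
        logits.view(-1, vocab_size),  # (seq_len, vocab_size)
        labels.view(-1),              # (seq_len,)
        ignore_index=-100,            # prompt tokens do not contribute to the loss
    )
    loss.backward()                   # gradients flow back to update the LLM weights
    print(float(loss))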

TASK-SPECIFIC FINE-TUNING

Task-specific fine-tuning involves training a pre-trained model on a particular task or domain (e.g., translation) using a dataset tailored for that purpose. Often, good results can be achieved with just a few hundred or thousand examples.

Fine-tuning can significantly increase the performance of a model on a specific task, but it can reduce the performance on other tasks ("catastrophic forgetting").

Solutions:
• It might not be an issue if only a single task matters.
• Fine-tune for multiple tasks concurrently (~50K to 100K examples needed), which has potentially high computing requirements.
• Opt for Parameter Efficient Fine-Tuning (PEFT) instead of full fine-tuning, which involves training only a small number of task-specific adapter layers and parameters.

MULTI-TASK FINE-TUNING

Multi-task fine-tuning diversifies training with examples for multiple tasks (e.g., "Analyze the sentiment", "Identify entities", "Translate the text", "Summarize the text"), guiding the model to perform various tasks. Many examples of each task are needed for training. Drawback: it requires a lot of data (around 50K to 100K examples).

Model variants differ based on the datasets and tasks used during fine-tuning. Example: the FLAN family of models. FLAN, or Fine-tuned LAnguage Net, provides tailored instructions for refining various models, akin to dessert after pre-training. FLAN-T5 is an instruct fine-tuned version of the T5 foundation model, serving as a versatile model for various tasks; it has been fine-tuned on a total of 473 datasets across 146 task categories. For instance, the SAMSum dataset was used for summarization. A specialized variant of this model for chat summarization or for custom company usage could be developed through additional fine-tuning on specialized datasets (e.g., DialogSum or custom internal data).

MODEL EVALUATION

Evaluating LLMs is challenging (e.g., various tasks, non-deterministic outputs, equally valid answers with different wordings). There is a need for automated and organized performance assessments. Various approaches exist; here are a few examples:

ROUGE & BLEU SCORE
• Purpose: To evaluate LLMs on narrow tasks (summarization, translation) when a reference is available.
• Based on n-grams; rely on precision and recall scores (multiple variants).

BERT SCORE
• Purpose: To evaluate LLMs in a task-agnostic manner when a reference is available.
• Based on token-wise comparison: a similarity score is computed between candidate and reference sentences.

LLM-AS-A-JUDGE
• Purpose: To evaluate LLMs in a task-agnostic manner when a reference is available.
• Based on prompting an LLM to assess the equivalence of a generated answer with a ground-truth answer.

To measure and compare LLMs more holistically, use evaluation benchmark datasets specific to model skills, e.g., GLUE, SuperGLUE, MMLU, BIG-bench, HELM.
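As an illustrative (not authoritative) example of an n-gram metric in the ROUGE family, the following self-contained Python sketch computes unigram precision, recall, and F1 between a candidate and a reference summary.

    from collections import Counter

    def rouge1(candidate: str, reference: str):
        # Unigram overlap between candidate and reference (clipped counts).
        cand = Counter(candidate.lower().split())
        ref = Counter(reference.lower().split())
        overlap = sum(min(cand[w], ref[w]) for w in cand)
        precision = overlap / max(sum(cand.values()), 1)
        recall = overlap / max(sum(ref.values()), 1)
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    print(rouge1("the cat sat on the mat", "the cat is on the mat"))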
PARAMETER EFFICIENT FINE-TUNING (PEFT) METHODS

Full fine-tuning of LLMs is challenging: beyond the trainable weights, the gradients, optimizer states, activations, and temporary variables all require a lot of memory.

PEFT methods only update a small number of model parameters. Examples of PEFT techniques:
• Freeze most model weights and fine-tune only specific layer parameters.
• Keep existing parameters untouched; add only a few new parameters or layers and fine-tune those.

The trained parameters can account for only 15%-20% of the original LLM weights.

Main benefits:
• Decrease memory usage, often requiring just 1 GPU.
• Mitigate the risk of catastrophic forgetting.
• Limit storage to only the new PEFT weights.

Multiple methods exist, with trade-offs on parameter or memory efficiency, training speed, model quality, and inference costs. Three classes of PEFT methods from the literature:
• Selective: Fine-tune only specific parts of the original LLM.
• Reparameterization: Use low-rank representations to reduce the number of trainable parameters. E.g., LoRA.
• Additive: Augment the pre-trained model with new parameters or layers, training only the additions. E.g., adapters, soft prompts.

LoRA

LoRA reduces the number of trainable parameters during fine-tuning by freezing all original model parameters and injecting a pair of rank decomposition matrices alongside the original weights:

h = W0·x + B·A·x

Steps:
1. Keep the majority of the original LLM weights (W0) frozen.
2. Introduce a pair of rank decomposition matrices A and B (rank r).
3. Train the new matrices A and B.

Model weights update:
1. Matrix multiplication: B × A.
2. Add the product to the original weights: W0 + B × A.

Additional notes:
• No impact on inference latency.
• Fine-tuning only the self-attention layers with LoRA is often enough to enhance performance for a given task.
• Weights can be switched out as needed, allowing for training on many different tasks.

Rank choice for the LoRA matrices. Trade-off: a smaller rank reduces parameters and accelerates training, but risks lower adaptation quality due to reduced task-specific information capture. In the literature, a rank between 4 and 32 appears to be a good trade-off.

LoRA can be combined with quantization (= QLoRA). A minimal sketch of a LoRA layer follows below.
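Below is a hypothetical PyTorch sketch of a LoRA-augmented linear layer following the update h = W0·x + B·A·x described above; the class name, dimensions, and scaling choice are illustrative assumptions, not a reference implementation.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
            super().__init__()
            # Frozen pre-trained weights W0.
            self.base = nn.Linear(in_features, out_features, bias=False)
            self.base.weight.requires_grad_(False)
            # Rank-decomposition matrices: A (r x in) and B (out x r); only these are trained.
            self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(out_features, r))
            self.scale = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # h = W0·x + scale * B·A·x  (the B·A term starts at zero because B is zero-initialized)
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(in_features=512, out_features=512, r=8)
    x = torch.randn(2, 512)
    print(layer(x).shape)   # torch.Size([2, 512])
    print([n for n, p in layer.named_parameters() if p.requires_grad])  # ['A', 'B'] only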
SOFT PROMPTS

Prompt tuning addresses the limits of prompt engineering, namely:
• The manual effort required
• The length of the context window

Prompt tuning: Add trainable tensors to the model input embeddings, commonly known as "soft prompts", optimized directly through gradient descent. The tunable soft prompt (typically 20-100 tokens) is prepended to the input text.

Soft prompt vectors:
• Equal in length to the embedding vectors of the input language tokens.
• Can be seen as virtual tokens that can take any value within the multidimensional embedding space.

In prompt tuning, the LLM weights are frozen:
• Over time, the embedding vectors of the soft prompt are adjusted to optimize the model's completion of the prompt.
• Only a few parameters are updated.
• A different set of soft prompts can be trained for each task and easily swapped out during inference (occupying very little space on disk).

The literature shows that, at around 10B parameters, prompt tuning is as efficient as full fine-tuning.

! Interpreting virtual tokens can pose challenges (the nearest-neighbor tokens to the soft prompt location can be used).
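The following is a small, assumed PyTorch sketch (not from the cheat sheet) of the prompt-tuning idea: a trainable block of virtual-token embeddings is prepended to the frozen model's input embeddings.

    import torch
    import torch.nn as nn

    d_model, vocab_size = 64, 1000
    num_virtual_tokens = 20                     # typically 20-100 soft-prompt tokens

    # Frozen token embedding table standing in for the LLM's embedding layer.
    token_embedding = nn.Embedding(vocab_size, d_model)
    token_embedding.weight.requires_grad_(False)

    # Trainable soft prompt: one vector per virtual token, same size as real token embeddings.
    soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, d_model) * 0.02)

    input_ids = torch.randint(0, vocab_size, (1, 12))           # a toy tokenized input
    inputs_embeds = token_embedding(input_ids)                  # (1, 12, d_model)
    prompt = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    full_embeds = torch.cat([prompt, inputs_embeds], dim=1)     # (1, 20 + 12, d_model)
    print(full_embeds.shape)
    # Only `soft_prompt` would receive gradients; the rest of the LLM stays frozen.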
LLM COMPUTE CHALLENGES AND SCALING LAWS

LARGE LANGUAGE MODEL CHOICE

Generative AI project lifecycle: Use case definition & scoping, then model selection, then adapt (prompt engineering, fine-tuning), augment, and evaluate the model, then app integration (model optimization, deployment).

Two options for model selection:
• Use a pre-trained LLM.
• Train your own LLM from scratch.
But, in general, develop your application using a pre-trained LLM, except if you work with extremely specific data (e.g., medical, legal).

Hubs: Where you can browse existing models.
Model cards: List the best use cases, training details, and limitations of models.

The model choice will depend on the details of the task to carry out.

Model pre-training: Model weights are adjusted in order to minimize the loss of the training objective. It requires significant computational resources (i.e., GPUs, due to the high computational load). Typical parameter counts: PaLM 540B, GPT-3 175B, YaLM 100B, GPT-2 1.5B, BERT 110M.

COMPUTATIONAL CHALLENGES

Memory challenge: RuntimeError: CUDA out of memory. LLMs are massive and require plenty of memory for training and inference.

To load the model into GPU RAM: 1 parameter (32-bit precision) = 4 bytes, so 1B parameters = 4 × 10^9 bytes = 4 GB of GPU RAM.

Pre-training requires storing additional components beyond the model's parameters:
• Optimizer states (e.g., 2 per parameter for Adam)
• Gradients
• Forward activations
• Temporary variables

This could result in an additional 12-20 bytes of memory needed per model parameter, which would mean 16 GB to 24 GB of GPU memory to train a 1-billion-parameter LLM, around 4-6x the GPU RAM needed just for storing the model weights. That is excessive for consumer hardware and even demanding for data center hardware (for single-processor training); for instance, an NVIDIA A100 supports up to 80 GB of RAM. A short worked example of this arithmetic follows below.
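A tiny Python sketch of the memory arithmetic above (illustrative only; the 12-20 extra bytes per parameter is the rule of thumb quoted in the text):

    def training_memory_gb(num_params: float, extra_bytes_per_param: float = 16.0):
        """Rough GPU memory estimate: 4 bytes/param for FP32 weights, plus ~12-20 extra
        bytes/param for gradients, optimizer states, activations, and temporary variables."""
        weights_gb = num_params * 4 / 1e9
        training_gb = num_params * (4 + extra_bytes_per_param) / 1e9
        return weights_gb, training_gb

    weights, training = training_memory_gb(1e9)   # a 1B-parameter model
    print(f"weights only: {weights:.0f} GB, training: ~{training:.0f} GB")
    # weights only: 4 GB, training: ~20 GB (between 16 and 24 GB depending on the overhead)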
QUANTIZATION

How can you reduce the memory needed for training? Quantization: decrease the memory needed to store the weights of the model by converting their precision from 32-bit floats to 16-bit floats or 8-bit integers.

FP32 values range in magnitude from roughly 3 × 10^-38 to 3 × 10^38. Quantization maps the FP32 numbers to a lower-precision space (FP16, BFLOAT16, INT8, INT4) by employing scaling factors determined from the range of the FP32 numbers. In most cases, quantization strongly reduces memory requirements with a limited loss in prediction quality.

BFLOAT16 is a popular alternative to FP16:
• Developed by Google Brain
• Balances memory efficiency and accuracy
• Wider dynamic range
• Optimized for storage and speed in ML tasks
E.g., FLAN-T5 was pre-trained using BFLOAT16.

Benefits of quantization:
• Less memory
• Potentially better model performance
• Higher calculation speed

A minimal sketch of linear quantization with a scaling factor follows below.
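A hypothetical NumPy sketch (not from the cheat sheet) of symmetric linear quantization of FP32 weights to INT8, with the scaling factor derived from the weights' range:

    import numpy as np

    def quantize_int8(weights: np.ndarray):
        # Scaling factor determined from the range of the FP32 values.
        scale = np.abs(weights).max() / 127.0
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    print("memory per value: 4 bytes (FP32) -> 1 byte (INT8)")
    print("max absolute rounding error:", float(np.abs(w - w_hat).max()))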
SCALING LAWS

How big do the models need to be? The goal is to maximize model performance. Researchers explored the trade-offs between dataset size (# of tokens), model size (# of parameters), and the compute budget. Increasing compute may seem ideal for better performance, but practical constraints like hardware, time, and budget limit its feasibility, so the compute budget is usually treated as the fixed constraint and dataset size and model size as the scaling choices.

It has been empirically shown that, when the compute budget remains fixed:
• Fixed model size: Increasing the training dataset size improves model performance.
• Fixed dataset size: Larger models demonstrate lower test loss, indicating enhanced performance.

What's the optimal balance? Once scaling laws have been estimated, we can use the Chinchilla approach, i.e., choose the dataset size and the model size to train a compute-optimal model, which maximizes performance for a given compute budget. The compute-optimal training dataset size is ~20x the number of parameters.
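As a worked example of the ~20 tokens-per-parameter rule of thumb (the model size below is an illustrative choice, not from the cheat sheet):

    def chinchilla_optimal_tokens(num_params: float, tokens_per_param: float = 20.0) -> float:
        # Compute-optimal training dataset size ~= 20x the number of parameters.
        return num_params * tokens_per_param

    print(f"{chinchilla_optimal_tokens(70e9) / 1e12:.1f}T tokens for a 70B-parameter model")
    # -> 1.4T tokens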
PREFERENCE FINE-TUNING (PART 1): RLHF PRINCIPLES, COLLECTING HUMAN FEEDBACK, REWARD MODEL

INTRODUCTION

Some models exhibit undesirable behavior:
• Generating toxic language
• Responding aggressively
• Providing harmful information

To ensure alignment between LLMs and human values, emphasis should be placed on qualities like helpfulness, honesty, and harmlessness (HHH). In the generative AI project lifecycle, this belongs to the "adapt, augment, and evaluate" stage. Additional training with preference data can boost HHH in completions.

Preference data example:
Prompt: "How to create a bomb?"
Answer A: "In order to create a bomb, you have to…"
Answer B: "I'm sorry, but I can't assist with that. Creating a bomb is illegal…"
The answers have been generated by the model we want to fine-tune and then assessed by human evaluators or an LLM.

Two approaches:
• Reinforcement Learning from Human Feedback (RLHF): Preference data is used to train a reward model that mimics human annotator preferences, which then scores LLM completions for reinforcement learning adjustments.
• Preference optimization (DPO, IPO): Minimize a training loss directly on the preference data.

RLHF PRINCIPLES

Reminder on reinforcement learning: a type of ML in which an agent learns to make decisions towards a specific goal by taking actions in an environment, aiming to maximize some cumulative reward (e.g., objective: win the game). At each step, the agent (the RL policy/model) observes the state s_t, takes an action a_t from the action space, and receives a reward r_t+1 and a new state s_t+1 from the environment. The action space is the set of all possible actions given the current environment state.

In the context of LLMs:
• Objective: Generate aligned text
• Agent: RL policy = the LLM
• Action: Text generation (the next token drawn from the vocabulary)
• Action space: Token vocabulary
• State: Any text in the current context window
• Reward: Reflects how well the completion aligns with human preferences

The action the model will take depends on:
• The prompt text in the context
• The probability distribution across the vocabulary space

COLLECTING HUMAN FEEDBACK

Steps:

1. Choose a model and use it to curate a dataset for human feedback: prompt samples are passed to the LLM to produce model completions.

2. Collect feedback from human labelers (generally, thousands of people):
• Specify the model alignment criterion (e.g., helpfulness).
• Request that the labelers rank the outputs according to that criterion.
Example: for the prompt "The coffee is too bitter", three labelers rank three completions (completion 1 ranked 2, 2, 2; completion 2 ranked 1, 1, 3; completion 3 ranked 3, 3, 1). Detailed instructions improve response quality and consistency, resulting in labeled completions that reflect a consensus.

3. Prepare the data for training: create pairwise training data from the rankings for the training of the reward model. For each pair of completions, assign 1 to the preferred response and 0 to the rejected one, and place the preferred option first by reordering the completions. A small sketch of this conversion follows below.
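A hypothetical Python sketch (illustrative names and data) of step 3: turning ranked completions into pairwise examples with the preferred completion placed first, i.e., with implicit labels [1, 0].

    from itertools import combinations

    def rankings_to_pairs(prompt: str, completions: list, ranks: list):
        """Build (prompt, preferred, rejected) pairs from a ranking (1 = best)."""
        pairs = []
        for i, j in combinations(range(len(completions)), 2):
            # Place the preferred option first; labels are implicitly [1, 0].
            if ranks[i] < ranks[j]:
                pairs.append((prompt, completions[i], completions[j]))
            else:
                pairs.append((prompt, completions[j], completions[i]))
        return pairs

    completions = ["Completion 1", "Completion 2", "Completion 3"]
    ranks = [2, 1, 3]   # one labeler's ranking: completion 2 is best, completion 3 is worst
    for pair in rankings_to_pairs("The coffee is too bitter", completions, ranks):
        print(pair)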
REWARD MODEL

Objective: to develop a model or system that accepts a text sequence and outputs a scalar reward representing human preference numerically.

Reward model training: The reward model, often a language model (e.g., BERT), is trained using supervised learning on the pairwise comparison data derived from the human assessments. Mathematically, it learns to prioritize the human-preferred completion through a loss based on the log sigmoid of the reward difference. For a prompt x with preferred completion y_j (reward r_j) and rejected completion y_k (reward r_k):

loss = -log(σ(r_j - r_k)), where σ is the sigmoid function.

Usage of the reward model: Use the reward model as a binary classifier to assign reward values to prompt-completion pairs. For example, for the completion "Samantha enjoys reading books", the model might output logits of 3.17 for the positive (preferred) class and -2.6 for the negative class; the reward value equals the positive-class logit output by the model.

The reward model assesses the alignment of LLM outputs with human preferences. The reward values obtained are then used to update the LLM weights and train a new, human-aligned version, with the specifics determined by the optimization algorithm.
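A minimal PyTorch sketch of this pairwise loss on stand-in reward values (illustrative numbers only):

    import torch
    import torch.nn.functional as F

    # Stand-in rewards for a batch of (preferred, rejected) completion pairs.
    r_preferred = torch.tensor([1.2, 0.4, 2.0], requires_grad=True)   # r_j
    r_rejected = torch.tensor([0.3, 0.9, -0.5], requires_grad=True)   # r_k

    # loss = -log(sigmoid(r_j - r_k)), averaged over the batch.
    loss = -F.logsigmoid(r_preferred - r_rejected).mean()
    loss.backward()
    print(float(loss))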
PREFERENCE FINE-TUNING (PART 2): PPO ALGORITHM FOR LLMS, REWARD HACKING, RL FROM AI FEEDBACK

FINE-TUNING WITH RL & REWARD MODEL

The LLM weights are updated to create a human-aligned model via reinforcement learning, leveraging the reward model and starting with a high-performing base model. Goal: to align the LLM with the provided instructions and human preferences.

Each iteration has three steps: (1) text generation from a prompt, (2) scoring of the completion by the reward model, (3) model weights update with the reinforcement learning algorithm. This is repeated for N iterations.

Example with the prompt "A tree is...":
• Iteration 1: "...a plant with a trunk." → Reward: 0.3
• Iteration 4: "...a provider of shade and oxygen." → Reward: 1.6
• Iteration n: "...a symbol of strength and resilience." → Reward: 2.9

As the process advances successfully, the reward gradually increases until it meets the predefined evaluation criteria for helpfulness. The resulting updated model should be more aligned with human preferences. Proximal Policy Optimization (PPO) is a popular choice of reinforcement learning algorithm.

PPO ALGORITHM FOR LLMS

PPO iteratively updates the policy to maximize the reward, adjusting the LLM weights incrementally so that each new version stays close to the previous one within a defined range, for stable learning. The PPO objective used to update the LLM weights by backpropagation combines three terms, weighted by hyperparameters: a policy loss, a value loss, and an entropy loss.

• Value loss: Minimize it to improve return prediction accuracy (the gap between the estimated future total reward and the actual reward from the reward model).
• Policy loss: Maximize it to get higher rewards while staying within reliable bounds. It uses an advantage term that compares the probabilities of the next token under the updated LLM and under the initial LLM; keeping the policy within this "trust region" acts as a guardrail.
• Entropy loss: Maximize it to promote and sustain model creativity; the higher the entropy, the more creative the policy.

REWARD HACKING

The agent may learn to cheat the system by maximizing rewards at the expense of alignment with the desired behavior. For example, for the prompt "The movie was...", an RL-updated LLM might drift towards exaggerated completions such as "...an absolute thrill fest that left me breathless!" instead of "...thrilling and unforgettable" or "...enjoyable and decent".

To prevent reward hacking, penalize RL updates if they deviate significantly from the frozen original LLM, using KL divergence: a shift penalty between the original and updated LLM is added to the reward in the loss function. A minimal sketch of this KL penalty follows below.
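Below is a hypothetical NumPy sketch (an assumption, not the cheat sheet's implementation) of the KL shift penalty: the reward-model score is reduced in proportion to how far the updated policy's token probabilities drift from the frozen reference model on the sampled completion.

    import numpy as np

    def penalized_reward(rm_score: float,
                         logprobs_updated: np.ndarray,
                         logprobs_reference: np.ndarray,
                         beta: float = 0.1) -> float:
        """Reward model score minus a KL-style penalty against the frozen original LLM.

        `logprobs_*` hold the log-probabilities that each policy assigned to the tokens
        actually generated in the completion (one value per token)."""
        # Per-token log-ratio; its sum is a simple estimate of the KL divergence
        # along the sampled completion.
        kl_estimate = float(np.sum(logprobs_updated - logprobs_reference))
        return rm_score - beta * kl_estimate

    # Toy numbers: the updated policy assigns higher probability to its own tokens
    # than the frozen reference does, so the penalty reduces the effective reward.
    updated = np.log(np.array([0.60, 0.50, 0.70]))
    reference = np.log(np.array([0.40, 0.30, 0.35]))
    print(penalized_reward(rm_score=2.0, logprobs_updated=updated, logprobs_reference=reference))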
RL FROM AI FEEDBACK

Obtaining the reward model is labor-intensive; scaling through AI supervision is more precise and requires fewer human labels.

Constitutional AI (Bai, Yuntao, et al., 2022) is an approach that relies on a set of principles governing AI behavior, along with a small number of examples used for few-shot prompting, collectively forming the "constitution". Example of a constitutional principle: "Please choose the response that is the most helpful, honest, and harmless."

1. Supervised learning stage:
1) A helpful LLM produces completions for harmful prompts.
2) The responses are critiqued and revised based on the constitutional principles, yielding harmful prompts with revised completions.
3) A pre-trained LLM is fine-tuned on these revised completions, producing a fine-tuned LLM.

2. Reinforcement learning (RL) stage (RLAIF):
4) The fine-tuned LLM generates pairs of completions for harmful prompts (combined with human-feedback helpfulness data).
5) An AI is asked which response is best according to the constitutional principles, producing AI-generated comparison data.
6) A preference model is trained on this comparison data.
7) The LLM is fine-tuned using RL against the preference model.
Result: a policy trained by Reinforcement Learning with AI Feedback (RLAIF).

DIRECT PREFERENCE OPTIMIZATION

An RLHF pipeline is difficult to implement:
• Need to train a reward model
• New completions needed during training
• Instability of the RL algorithm

Direct Preference Optimization (DPO) is a simpler and more stable alternative to RLHF. It solves the same problem by minimizing a training loss computed directly on the preference (comparison) data, without reward modeling or RL, to produce a fine-tuned LLM. Identity Preference Optimization (IPO) is a variant of DPO that is less prone to overfitting.
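The cheat sheet does not spell out the DPO objective; in a commonly used formulation (stated here as an assumption), the loss compares how much the trained policy prefers the chosen completion over the rejected one relative to a frozen reference model. A minimal PyTorch sketch on stand-in sequence log-probabilities:

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
        # Implicit "rewards": how much more likely each completion is under the policy
        # than under the frozen reference model.
        chosen_margin = policy_chosen_logp - ref_chosen_logp
        rejected_margin = policy_rejected_logp - ref_rejected_logp
        # The preferred completion should have the larger margin.
        return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

    # Stand-in log-probabilities for a batch of 2 preference pairs.
    policy_chosen = torch.tensor([-12.0, -15.0], requires_grad=True)
    policy_rejected = torch.tensor([-14.0, -13.5], requires_grad=True)
    ref_chosen = torch.tensor([-13.0, -15.5])
    ref_rejected = torch.tensor([-13.5, -13.0])

    loss = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
    loss.backward()
    print(float(loss))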
LLM-POWERED APPLICATIONS: MODEL OPTIMIZATION FOR DEPLOYMENT, LLM-INTEGRATED APPLICATIONS, LLM REASONING WITH CHAIN-OF-THOUGHT PROMPTING, PROGRAM-AIDED LANGUAGE & REACT

MODEL OPTIMIZATION FOR DEPLOYMENT

Inference challenges: high computing and storage demands. The goal is to shrink model size while maintaining performance.

Model distillation
• Scale down model complexity while preserving accuracy.
• Train a small student model to mimic a large, frozen teacher model.
• Soft labels: the teacher's predictions serve as ground-truth labels for a distillation loss, while the labeled training data provides hard labels for a student loss.
• The student and distillation losses update the student model weights via backpropagation.
• The student LLM can then be used for inference.
A sketch of a combined distillation loss follows below.

Post-Training Quantization (PTQ)
• PTQ reduces model weight precision to 16-bit float or 8-bit integer.
• Can target both weights and activation layers for impact.
• May sacrifice some model quality, yet is beneficial for cost savings and speed gains.

Model pruning
• Removes redundant model parameters that contribute little to the model performance.
• Some methods require full model training, while others fall into the PEFT category (e.g., LoRA).
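A hypothetical PyTorch sketch (assumed temperature and weighting, not the cheat sheet's recipe) combining a soft distillation loss against the teacher's distribution with a hard cross-entropy loss against the labels:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
        # Soft loss: match the temperature-softened teacher distribution (soft labels).
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard loss: standard cross-entropy against the ground-truth labels (hard labels).
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    student_logits = torch.randn(4, 10, requires_grad=True)   # toy batch, 10-class "vocabulary"
    teacher_logits = torch.randn(4, 10)                       # frozen teacher predictions
    labels = torch.randint(0, 10, (4,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(float(loss))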
LLM-INTEGRATED APPLICATIONS

LLM limitations:
• Knowledge can be out of date.
• LLMs struggle with certain tasks (e.g., math).
• LLMs can confidently provide wrong answers ("hallucination").

The LLM should serve as a reasoning engine that leverages external applications or data sources. In an LLM-integrated application, the user interacts with a frontend, and an orchestrator connects the LLM with external data sources and external applications (APIs, Python, etc.). The orchestrator must:
1. Plan actions: a set of instructions, e.g., Step 1: get the customer ID; Step 2: reset the password.
2. Format outputs: formatting is required for applications to understand the actions.
3. Validate actions: collect the information that allows validation of an action.

Retrieval Augmented Generation (RAG)

RAG is an AI framework that integrates external data sources and apps (e.g., documents, private databases, etc.). Multiple implementations exist; the right one depends on the details of the task and the data format.
• We retrieve the documents most similar to the input query in the external data (a query encoder and retriever search the external knowledge).
• We combine the retrieved documents with the input query and send the prompt to the LLM to receive the answer.

! The size of the context window can be a limitation: use multiple chunks (e.g., with LangChain).
! The data must be in a format that allows its relevance to be assessed at inference time: use embedding vectors (a vector store). A vector database stores vectors and associated metadata, enabling efficient nearest-neighbor vector search.

A minimal retrieve-then-prompt sketch follows below.
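Below is a small, self-contained Python sketch of the retrieve-then-prompt idea; the `embed` function is a toy stand-in (an assumption) for a real query/document encoder, and the documents and prompt template are illustrative.

    import numpy as np

    def embed(text: str, dim: int = 64) -> np.ndarray:
        # Toy hashed bag-of-words "encoder" standing in for a real embedding model.
        v = np.zeros(dim)
        for word in text.lower().split():
            v[hash(word.strip(".,?!'")) % dim] += 1.0
        return v / (np.linalg.norm(v) + 1e-9)

    documents = [
        "Resetting a password requires the customer ID.",
        "Refunds are processed within 5 business days.",
        "The store opens at 9 am on weekdays.",
    ]
    doc_vectors = np.stack([embed(d) for d in documents])    # the "vector store"

    query = "How do I reset a customer's password?"
    scores = doc_vectors @ embed(query)                      # cosine similarity (unit-norm vectors)
    top_k = np.argsort(scores)[::-1][:2]                     # nearest-neighbor retrieval

    context = "\n".join(documents[i] for i in top_k)
    prompt = (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    print(prompt)                                            # this prompt would be sent to the LLM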
LLM REASONING WITH CHAIN-OF-THOUGHT PROMPTING

Complex reasoning is challenging for LLMs, e.g., problems with multiple steps or mathematical reasoning. The prompt and completion are important!

Chain-of-Thought (CoT)
• Prompts the model to break down problems into sequential steps.
• Operates by integrating intermediate reasoning steps into the examples used for one- or few-shot inference.

Example prompt:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?

Completion:
A: The cafeteria had 23 apples. They used 20 to make lunch. 23 - 20 = 3. They bought 6 more apples, so 3 + 6 = 9. The answer is 9.

CoT improves performance but struggles with precision-demanding tasks like tax computation or discount application. Solution: allow the LLM to communicate with a program that is proficient at math, such as a Python interpreter.

PROGRAM-AIDED LANGUAGE (PAL)

With PAL, the LLM generates scripts and passes them to an interpreter. The CoT reasoning is written as comments alongside executable statements:

Q: Roger has 5 tennis balls. [...]
A: # Roger started with 5 tennis balls
   tennis_balls = 5
   # 2 cans of 3 tennis balls each is
   bought_balls = 2 * 3
   # tennis balls. The answer is
   answer = tennis_balls + bought_balls
Q: [...]

The completion is handed off to a Python interpreter, so the calculations are accurate and reliable (see the sketch of the hand-off below).
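A hypothetical sketch of the hand-off step (illustrative only; a real application would sandbox the generated code before executing it):

    # Code block produced by the LLM for the tennis-ball question above.
    generated_code = """
    # Roger started with 5 tennis balls
    tennis_balls = 5
    # 2 cans of 3 tennis balls each is
    bought_balls = 2 * 3
    # tennis balls. The answer is
    answer = tennis_balls + bought_balls
    """

    namespace = {}
    exec(generated_code, namespace)     # hand the completion off to the Python interpreter
    print(namespace["answer"])          # 11, the numeric result fed back to the user or LLM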
REACT

ReAct is a prompting strategy that combines CoT reasoning and action planning, employing structured examples to guide an LLM in problem-solving and decision-making. A ReAct prompt contains:
• Instructions: define the task, what a thought is, and the allowed actions.
• Question: the question to be answered.
• Thought: an analysis of the current situation and the next steps to take.
• Action: the actions come from a predetermined list defined in the set of instructions in the prompt.
• Observation: the result of the previous action.
The thought/action/observation loop ends when the action is "finish". In the completion, the whole prompt is included.

LangChain can be used to connect multiple components through agents, tools, etc. Agents interpret the user input and determine which tool to use for the task (LangChain includes agents for PAL and ReAct). ReAct reduces the risk of errors.
