Introduction To LLMs: Transformers, Types of LLMs, Configuration Settings

LARGE LANGUAGE MODELS (LLMs)
Large neural networks trained at internet scale to estimate the probability of sequences of words (transformers with billions of parameters).
Ex: GPT, FLAN-T5, LLaMA, PaLM, BLOOM
Abilities (and the computing resources needed) tend to rise with the number of parameters.

USE CASES
– Standard NLP tasks (classification, summarization, etc.)
– Content generation
– Reasoning (Q&A, planning, coding, etc.)

In-context learning: Specifying the task to perform directly in the prompt.

TRANSFORMER COMPONENTS
Token: Word or sub-word; the basic unit processed by transformers.
Encoder: Processes the input sequence to generate a vector representation (or embedding) for each token.
Decoder: Processes input tokens to produce new tokens.
Embedding layer: Maps each token to a trainable vector.
Positional encoding vector: Added to the token embedding vector to keep track of the token's position.
Self-Attention: Computes the importance of each word in the input sequence to all other words in the sequence.

TYPES OF LLMs
Decoder only = Autoregressive model (Ex: GPT, BLOOM)
PRE-TRAINING OBJECTIVE: Predict the next token based on the previous sequence of tokens (= Causal Language Modeling)
OUTPUT: Next token
USE CASES: Text generation

Encoder-Decoder = Seq-to-seq model (Ex: T5, BART)
PRE-TRAINING OBJECTIVE: Varies from model to model (e.g., span corruption for T5)
OUTPUT: Sentinel token + predicted tokens
USE CASES: Translation, Q&A, summarization

CONFIGURATION SETTINGS
Random sampling: The model chooses an output word at random, using the probability distribution to weigh the selection (could be too creative).
TECHNIQUES TO CONTROL RANDOM SAMPLING
– Top K: The next token is drawn from the k tokens with the highest probabilities.
– Top P: The next token is drawn from the tokens with the highest probabilities whose combined probability exceeds p.
Temperature: Influences the shape of the probability distribution through a scaling factor in the softmax layer.
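A minimal sketch of how these settings combine, assuming raw logits from the model's final layer (numpy only; the function and argument names are illustrative, not taken from any specific library):

import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    # Temperature rescales the logits before the softmax:
    # <1 sharpens the distribution, >1 flattens it (more "creative").
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))
    probs /= probs.sum()

    order = np.argsort(probs)[::-1]           # tokens sorted by decreasing probability
    if top_k is not None:
        order = order[:top_k]                 # keep only the k most likely tokens
    if top_p is not None:
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1
        order = order[:cutoff]                # smallest set whose combined probability exceeds p

    kept = probs[order] / probs[order].sum()  # renormalize over the kept tokens
    return np.random.choice(order, p=kept)    # random sampling weighted by probability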
Fine-Tuning: Task-Specific Fine-Tuning, Multi-Task Fine-Tuning, Model Evaluation

PEFT (PARAMETER-EFFICIENT FINE-TUNING)
Full fine-tuning of LLMs is challenging, so PEFT methods keep the majority of the original LLM weights frozen and train only a small number of additional parameters.
– LoRA: h = W0·x + AB·x, where W0 stays frozen and only the small low-rank matrices A and B are trained (see the sketch below).
– Prompt tuning: Add trainable tensors to the model input embeddings, commonly known as "soft prompts," optimized directly through gradient descent.
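A minimal numpy sketch of the LoRA idea behind h = W0·x + AB·x (the dimensions, rank r, and initialization below are illustrative assumptions):

import numpy as np

d_out, d_in, r = 512, 512, 8           # r << d: the low-rank bottleneck

W0 = np.random.randn(d_out, d_in)      # frozen pre-trained weight matrix (never updated)
A = np.zeros((d_out, r))               # trainable; starts at zero so the model initially matches W0
B = np.random.randn(r, d_in) * 0.01    # trainable low-rank factor

x = np.random.randn(d_in)
h = W0 @ x + A @ (B @ x)               # LoRA forward pass: only A and B receive gradients

# Trainable parameters: r * (d_in + d_out) instead of d_in * d_out
print(r * (d_in + d_out), "trainable vs", d_in * d_out, "frozen")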
LARGE LANGUAGE MODEL CHOICE

Generative AI project lifecycle: Use case definition & scoping → Model selection → Adapt (prompt engineering, fine-tuning), augment, and evaluate the model → App integration (model optimization, deployment).

Two options for model selection:
• Use a pre-trained LLM.
• Train your own LLM from scratch.
But, in general… develop your application using a pre-trained LLM, except if you work with extremely specific data (e.g., medical, legal).

Hubs: Where you can browse existing models.
Model Cards: List the best use cases, training details, and limitations of models.
The model choice will depend on the details of the task to carry out.

Model pre-training: Model weights are adjusted in order to minimize the loss of the training objective. It requires significant computational resources (i.e., GPUs, due to the high computational load).
Typical parameter counts: PaLM 540B, GPT-3 175B, YaLM 100B, GPT-2 1.5B, BERT 110M.
COMPUTATIONAL CHALLENGES

LLMs are massive and require plenty of memory for training and inference.
To load the model into GPU RAM: 1 parameter (32-bit precision) = 4 bytes needed, so 1B parameters = 4 × 10^9 bytes = 4 GB of GPU RAM.
Pre-training requires storing additional components beyond the model's parameters:
• Optimizer states (e.g., 2 for Adam)
• Gradients
• Forward activations
• Temporary variables
This could result in an additional 12-20 bytes of memory needed per model parameter, which means 16 GB to 24 GB of GPU memory to train a 1-billion-parameter LLM, around 4-6x the GPU RAM needed just for storing the model weights.
Hence, the memory needed for LLM training is excessive for consumer hardware and even demanding for data center hardware (for single-processor training). For instance, an NVIDIA A100 supports up to 80 GB of RAM.
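The same arithmetic as a tiny sketch; the 12-20 bytes of training overhead per parameter is the rule of thumb quoted above, not an exact figure:

def gpu_memory_gb(n_params, bytes_per_param=4, training_overhead=(12, 20)):
    # Weights only (e.g., FP32 = 4 bytes per parameter)
    inference = n_params * bytes_per_param / 1e9
    # Training also stores optimizer states, gradients, activations, temporary variables
    low = n_params * (bytes_per_param + training_overhead[0]) / 1e9
    high = n_params * (bytes_per_param + training_overhead[1]) / 1e9
    return inference, (low, high)

print(gpu_memory_gb(1e9))   # ~4 GB to load, ~16-24 GB to train a 1B-parameter model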
QUANTIZATION

Quantization reduces the memory needed to store the weights of the model by converting their precision from 32-bit floats to 16-bit floats or 8-bit integers. The FP32 space spans magnitudes from about 3 × 10^-38 up to about 3 × 10^38; lower-precision formats include FP16, BFLOAT16, INT8, and INT4.
Quantization maps the FP32 numbers to a lower-precision space by employing scaling factors determined from the range of the FP32 numbers. In most cases, quantization strongly reduces memory requirements with a limited loss in prediction quality.

BFLOAT16 is a popular alternative to FP16:
• Developed by Google Brain
• Balances memory efficiency and accuracy
• Wider dynamic range
• Optimized for storage and speed in ML tasks
e.g., FLAN-T5 was pre-trained using BFLOAT16.

Benefits of quantization: less memory, potentially better model performance, higher calculation speed.
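A minimal sketch of symmetric INT8 quantization with a single scaling factor derived from the FP32 range (real schemes work per tensor or per channel and handle outliers more carefully; the names are illustrative):

import numpy as np

def quantize_int8(weights_fp32):
    # Scaling factor determined from the range of the FP32 numbers
    scale = np.max(np.abs(weights_fp32)) / 127.0
    q = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale        # approximate reconstruction

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())  # small error, 4x less memory than FP32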
SCALING LAWS & COMPUTE-OPTIMAL MODELS

Researchers explored trade-offs between the dataset size, the model size, and the compute budget. The compute budget is the constraint; the scaling choices are the dataset size (# of tokens) and the model size (# of parameters), and both drive model performance.
Increasing compute may seem ideal for better performance, but practical constraints like hardware, time, and budget limit its feasibility.

It has been empirically shown that, as the compute budget remains fixed:
• Fixed model size: Increasing the training dataset size improves model performance.
• Fixed dataset size: Larger models demonstrate lower test loss, indicating enhanced performance.

What's the optimal balance? Once scaling laws have been estimated, we can use the Chinchilla approach, i.e., choose the dataset size and the model size to train a compute-optimal model, which maximizes performance for a given compute budget. The compute-optimal training dataset size is ~20x the number of parameters.
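A back-of-the-envelope sketch of the ~20x heuristic quoted above (the exact ratio comes from empirical fits in the Chinchilla paper):

def chinchilla_optimal_tokens(n_params, tokens_per_param=20):
    # Compute-optimal training set size is roughly 20x the parameter count
    return n_params * tokens_per_param

for n in (1e9, 70e9, 175e9):
    print(f"{n/1e9:.0f}B params -> ~{chinchilla_optimal_tokens(n)/1e9:.0f}B training tokens")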
RLHF PRINCIPLES
• Reinforcement Learning from Human Feedback (RLHF): Preference data is used to train a reward model that mimics human annotator preferences and then scores LLM completions for reinforcement learning adjustments.
• Preference Optimization (DPO, IPO): Minimize a training loss directly on the preference data.

COLLECTING HUMAN FEEDBACK
Assign 1 to the preferred response and 0 to the rejected one in each pair, and place the preferred option first by reordering the completions.

REWARD MODEL
The reward model assesses the alignment of LLM outputs with human preferences. The reward values obtained are then used to update the LLM weights and train a new, human-aligned version, with the specifics determined by the optimization algorithm.
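A minimal sketch of the pairwise loss commonly used to train such a reward model on (preferred, rejected) completion pairs; the scores are assumed to come from the reward model, and the function name is illustrative:

import numpy as np

def reward_model_loss(r_preferred, r_rejected):
    # The reward model should score the human-preferred completion higher:
    # minimize -log(sigmoid(r_preferred - r_rejected)) over the preference pairs.
    margin = np.asarray(r_preferred) - np.asarray(r_rejected)
    return float(np.mean(np.log1p(np.exp(-margin))))

# Scores assigned by the reward model to a batch of (preferred, rejected) pairs
print(reward_model_loss([2.1, 0.3], [0.5, -0.2]))  # lower loss when preferred > rejected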
PPO ALGORITHM FOR LLMs
The PPO objective is used to update the LLM weights over N iterations:
1: Text generation. The updated LLM completes a set of prompts (e.g., "The movie was..." → "...an absolute thrill").
2: Scoring. The estimated future total reward and the actual reward from the reward model.
3: Model weights update with reinforcement learning.
Policy loss: Maximize it to get higher rewards while meeting the criteria for helpfulness.
Entropy loss: Maximize it to promote and sustain model creativity; the higher the entropy, the more creative the policy.
Reinforcement learning algorithm: Proximal Policy Optimization (PPO) is a popular choice.
Updated model: The resulting updated model should be more aligned with human preferences.

REWARD HACKING
The policy can learn to exploit the reward model, producing completions that maximize the reward score without genuinely matching human preferences.
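A compact, simplified sketch of the two losses mentioned above (a per-sample view of PPO's clipped surrogate objective; advantage estimation and the full RL loop are omitted):

import numpy as np

def ppo_policy_objective(logp_new, logp_old, advantages, eps=0.2):
    # Ratio between the updated policy and the policy that generated the completions
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Maximized during training: higher reward without straying too far from the old policy
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))

def entropy_bonus(token_probs):
    # Higher entropy -> more varied generations, sustaining model creativity
    return float(-np.mean(np.sum(token_probs * np.log(token_probs + 1e-9), axis=-1)))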
RL FROM AI FEEDBACK
Constitutional AI (Bai, Yuntao, et al., 2022): Harmful prompts → critique and revise the responses based on constitutional principles → fine-tune a pre-trained LLM on the revised completions → reinforcement learning.
Result: A policy trained by Reinforcement Learning from AI Feedback (RLAIF).

DIRECT PREFERENCE OPTIMIZATION
Fine-tune a pre-trained LLM with DPO (or IPO) directly on comparison (preference) data to obtain a fine-tuned LLM, without training a separate reward model.
Identity Preference Optimization (IPO) is a variant of DPO that is less prone to overfitting.
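A minimal sketch of the DPO loss on a single preference pair, assuming sequence-level log-probabilities from the model being tuned and from a frozen reference model (beta is the usual temperature-like hyperparameter):

import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit rewards: how much the tuned model favors each completion vs. the reference
    chosen = beta * (logp_chosen - ref_logp_chosen)
    rejected = beta * (logp_rejected - ref_logp_rejected)
    # Minimize -log(sigmoid(chosen - rejected)): push the preferred completion up
    return float(np.log1p(np.exp(-(chosen - rejected))))

print(dpo_loss(-12.0, -15.0, -13.0, -14.5, beta=0.1))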
LLM-Integrated Applications, Chain-of-Thought Prompting, LLM Reasoning With Program-Aided Language & ReAct

LLM-POWERED APPLICATIONS
• Knowledge can be out of date.
• LLMs struggle with certain tasks (e.g., math).
• LLMs can confidently provide wrong answers ("hallucination").
The LLM should serve as a reasoning engine and leverage external apps or data sources.

LLM-INTEGRATED APPLICATION
An orchestrator sits between the frontend (user) and the LLM, external data sources, and external applications (APIs, Python, etc.). The LLM must:
1. Plan actions: a set of instructions, e.g., Step 1: Get customer ID, Step 2: Reset password.
2. Format outputs: requires formatting for applications to understand the actions.
3. Validate actions: collect information that allows validation of an action.
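A small sketch of points 2 and 3: asking the LLM for a structured action that the orchestrator can parse and validate before executing anything. The JSON schema and the allowed actions below are hypothetical examples, not part of any specific framework:

import json

ALLOWED_ACTIONS = {"get_customer_id", "reset_password"}   # hypothetical action list

def validate_action(llm_output: str) -> dict:
    # 2. Format outputs: the LLM is instructed to answer with JSON the application understands
    action = json.loads(llm_output)
    # 3. Validate actions: check the action and the information it needs before executing it
    if action.get("name") not in ALLOWED_ACTIONS:
        raise ValueError(f"Unknown action: {action.get('name')}")
    if action["name"] == "reset_password" and "customer_id" not in action.get("args", {}):
        raise ValueError("reset_password requires a customer_id")
    return action

print(validate_action('{"name": "reset_password", "args": {"customer_id": "C-42"}}'))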
RETRIEVAL AUGMENTED GENERATION (RAG)
An AI framework that integrates external data sources and apps with the LLM (e.g., documents, private databases, etc.). Multiple implementations exist; the right one will depend on the details of the task and the data format.
A retriever (query encoder + external knowledge source) sits between the user and the LLM:
• We retrieve the documents most similar to the input query in the external data.
• We combine the retrieved documents with the input query and send the prompt to the LLM to receive the answer.
! The size of the context window can be a limitation. Use multiple chunks (e.g., with LangChain).
! The data must be in a format that allows its relevance to be assessed at inference time. Use embedding vectors (a vector store).
Vector database: Stores vectors and associated metadata, enabling efficient nearest-neighbor vector search.
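A minimal sketch of the two RAG steps above, with a toy in-memory vector store; embed() is a stand-in for a real embedding model, and the documents are made up:

import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

documents = ["Refund policy: refunds within 30 days.", "Shipping takes 3-5 business days."]
index = np.stack([embed(d) for d in documents])        # toy vector store

def retrieve(query: str, k: int = 1):
    scores = index @ embed(query)                      # nearest neighbors by cosine similarity
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def rag_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))               # 1. retrieve the most similar documents
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"  # 2. combine with the query

print(rag_prompt("How long do refunds take?"))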
MODEL OPTIMIZATION FOR DEPLOYMENT
Inference challenges: high computing and storage demands. The goal is to shrink the model size while maintaining performance.

Model Distillation
• Scale down model complexity while preserving accuracy.
• Train a small student model to mimic a large, frozen teacher model.
• Soft labels: The teacher's completions serve as ground-truth labels for the knowledge-distillation (soft) loss; the hard labels from the labeled training data feed the student (hard) loss.
• The student and distillation losses update the student model weights via backpropagation (see the sketch after this section).
• The student LLM can then be used for inference.

Post-Training Quantization (PTQ)
• PTQ reduces model weight precision to 16-bit float or 8-bit integer.
• It can target both weights and activation layers for impact.
• It may sacrifice some performance, yet is beneficial for cost savings and performance gains.

Model Pruning
• Removes redundant model parameters that contribute little to the model performance.
• Some methods require full model training, while others fall into the PEFT category (e.g., LoRA).
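A minimal sketch of the distillation and student losses described above, for a single token position (logits and the temperature T are illustrative; a real setup averages over a batch and mixes the two losses with a weight):

import numpy as np

def softmax(z, T=1.0):
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_losses(teacher_logits, student_logits, hard_label, T=2.0):
    # Soft (distillation) loss: match the teacher's softened distribution
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft_loss = -np.sum(p_teacher * np.log(p_student + 1e-9))
    # Hard (student) loss: standard cross-entropy against the labeled training data
    hard_loss = -np.log(softmax(student_logits)[hard_label] + 1e-9)
    return soft_loss, hard_loss   # combined (e.g., weighted sum) to update the student only

t = np.array([3.0, 1.0, 0.2]); s = np.array([2.0, 1.5, 0.1])
print(distillation_losses(t, s, hard_label=0))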
CHAIN-OF-THOUGHT PROMPTING
Complex reasoning is challenging for LLMs, e.g., problems with multiple steps or mathematical reasoning. The prompt and the completion are important!

Chain-of-Thought (CoT)
• Prompts the model to break down problems into sequential steps.
• Operates by integrating intermediate reasoning steps into the examples used for one- or few-shot inference.

Prompt:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?

Completion:
A: The cafeteria had 23 apples. They used 20 to make lunch. 23 - 20 = 3. They bought 6 more apples, so 3 + 6 = 9. The answer is 9.
In the completion, the whole prompt is included.

CoT improves performance but struggles with precision-demanding tasks like tax computation or discount application.
Solution: Allow the LLM to communicate with a program that is proficient at math, such as a Python interpreter.

PROGRAM-AIDED LANGUAGE (PAL)
The LLM generates a script and passes it to an interpreter: the completion is handed off to a Python interpreter, so the calculations are accurate and reliable.

Prompt (the CoT reasoning is written as code):
Q: Roger has 5 tennis balls. [...]
A:
# Roger started with 5 tennis balls
tennis_balls = 5
# 2 cans of 3 tennis balls each is 6 tennis balls
bought_balls = 2 * 3
# The answer is
answer = tennis_balls + bought_balls
Q: [...]

PAL execution: the generated code for the new question is run by the interpreter to obtain the answer.
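A minimal sketch of the PAL hand-off: the code block produced by the LLM is executed by the Python interpreter and the value of answer is read back. (Running exec on untrusted model output is unsafe; a real system would sandbox it.)

generated_code = """
# Roger started with 5 tennis balls
tennis_balls = 5
# 2 cans of 3 tennis balls each is 6 tennis balls
bought_balls = 2 * 3
# The answer is
answer = tennis_balls + bought_balls
"""

namespace = {}
exec(generated_code, namespace)   # hand the completion off to the Python interpreter
print(namespace["answer"])        # 11: the calculation is exact, not "guessed" by the LLM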
REACT
A prompting strategy that combines CoT reasoning and action planning, employing structured examples to guide an LLM in problem-solving and decision-making for solutions.
• Instructions: Define the task, what a thought is, and the available actions.
• Question: The question to be answered.
• Thought: Analysis of the current situation and the next steps to take.
• Action: The actions are taken from a predetermined list defined in the set of instructions in the prompt.
• Observation: The result of the previous action.
The Thought → Action → Observation loop repeats until the action is finish[], and ReAct reduces the risks of errors.

LangChain can be used to connect multiple components through agents, tools, etc.
Agents: Interpret the user input and determine which tool to use for the task (LangChain includes agents for PAL & ReAct).
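A skeletal sketch of the Thought → Action → Observation loop; llm and tools are placeholders for a model call and a dictionary of tool functions, and the regex assumes the action format defined in the instructions:

import re

def react_loop(question, llm, tools, max_steps=5):
    # llm: callable taking the transcript and returning the next "Thought/Action" step
    # tools: dict mapping action names (the predetermined list) to callables
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                        # model emits "Thought: ...\nAction: tool[input]"
        transcript += step + "\n"
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if not match:
            break
        action, arg = match.group(1), match.group(2)
        if action == "finish":                        # the loop ends when the action is finish[answer]
            return arg
        observation = tools[action](arg)              # run the chosen tool
        transcript += f"Observation: {observation}\n"
    return None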