
LLM Bootcamp


2023

LLM Foundations
Sergey Karayev

APRIL 21, 2023


LLMBC 2023

Agenda

00 Foundations of ML: speedrun of key ideas in ML
01 Transformer Architecture: core ideas and notable examples
02 Notable LLMs: T5, GPT, Chinchilla, et al.
03 Training & Inference: running a Transformer

2





00

Foundations of
Machine Learning

LLMBC 2023

Traditional Programming vs Machine Learning


[Diagram] Software 1.0 (traditional programming): a person writes the program that maps inputs to outputs. Software 2.0 (machine learning): training data of example inputs and outputs is used to produce the program, which then maps new inputs to outputs.


4
LLMBC 2023

Types of Machine Learning


• Unsupervised Learning: learn the structure of data in order to generate more data
  - e.g. completing "This product does what it is supposed __"

• Supervised Learning: learn how data maps to labels, to recognize or predict
  - e.g. image -> "cat", audio -> "Hey Siri"

• Reinforcement Learning: learn how to act in an environment to obtain reward

5
LLMBC 2023

Converging on just...

Supervised or Self-supervised Learning

• image -> "cat"
• "This product does what it is supposed __" -> "to."
• game state -> next move
• audio -> "Hey Siri"

6
LLMBC 2023

Inputs and outputs are always just numbers


What we see: "Lincoln"

What the machine "sees": the character codes [76, 105, 110, 99, 111, 108, 110]

7
LLMBC 2023

Why is this hard?


"I loved this movie"
• Infinite variety of inputs can all mean the "As good as The Godfather"
same thing "🔥 no cap"

• Meaningful differences can be tiny


• Structure of the world is complex



LLMBC 2023

How is it done?

• Many methods for Machine Learning

- Logistic Regression

- Support Vector Machines

- Decision Trees (xgboost)

• But one is dominant

- Neural Networks (also called Deep Learning)

LLMBC 2023

Inspiration

[Figure: input: see a cat -> the brain -> output: say "cat"]

• Inspired by what we know to be intelligent: the brain

• The brain is composed of billions of neurons

• Each neuron receives electrical inputs and sends an electrical output

• The brain itself has high-level inputs and outputs

https://www.the-scientist.com/the-nutshell/what-made-human-brains-so-big-36663
https://medicalxpress.com/news/2018-07-neuron-axons-spindly-theyre-optimizing.html

10

LLMBC 2023

Formalization

[Figure: the biological neuron, with its inputs and output, is formalized as an artificial neuron in a multilayer perceptron]

https://www.the-scientist.com/the-nutshell/what-made-human-brains-so-big-36663
https://medicalxpress.com/news/2018-07-neuron-axons-spindly-theyre-optimizing.html
https://www.jessicayung.com/explaining-tensorflow-code-for-a-multilayer-perceptron/

11
LLMBC 2023

A "perceptron" is a vector of numbers

https://www.jessicayung.com/explaining-tensorflow-code-for-a-multilayer-perceptron/

12
LLMBC 2023

A "layer" is a matrix of numbers

https://www.jessicayung.com/explaining-tensorflow-code-for-a-multilayer-perceptron/

13
LLMBC 2023

The neural network is a set of matrices


Called "parameters" or "weights"

NN operations are just matrix multiplications.


GPUs are really fast at matrix multiplications.
14
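To make this concrete, here is a minimal sketch (assuming NumPy and made-up layer sizes) showing that a forward pass through two layers is nothing but matrix multiplications plus a simple nonlinearity:

```python
import numpy as np

# Hypothetical sizes: 4 input features, 8 hidden units, 3 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # first layer's parameters (a matrix + bias)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # second layer's parameters

def forward(x):
    h = np.maximum(0, x @ W1 + b1)  # matrix multiply + ReLU nonlinearity
    return h @ W2 + b2              # another matrix multiply gives the outputs

print(forward(rng.normal(size=(2, 4))).shape)   # (2, 3): two inputs in, three numbers out each
```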

LLMBC 2023

Training

• Data X (e.g. images), labels y

• Take a little batch of data x:
  - Use the current model to make a prediction x -> y'
  - Compute loss(y, y')
  - Back-propagate the loss through all the layers of the model, and update the weights

• Repeat until loss stops decreasing

https://www.guru99.com/backpropogation-neural-network.html
https://developers.google.com/machine-learning/testing-debugging/metrics/interpretic 15
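A minimal PyTorch-style sketch of this training loop, with a toy model and random tensors standing in for real images and labels:

```python
import torch
from torch import nn

# Toy stand-ins for real data: 256 "images" of 64 features, 10 classes.
X, y = torch.randn(256, 64), torch.randint(0, 10, (256,))
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                  # repeat until the loss stops decreasing
    idx = torch.randint(0, len(X), (32,))
    xb, yb = X[idx], y[idx]              # take a little batch of data
    y_pred = model(xb)                   # use the current model to predict x -> y'
    loss = loss_fn(y_pred, yb)           # compute loss(y, y')
    opt.zero_grad()
    loss.backward()                      # back-propagate the loss through all layers
    opt.step()                           # update the weights
```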

LLMBC 2023

Dataset Splitting

• Split (X, y) into training (~80%), validation (~10%), and test (~10%) sets

• Validation set is for
  - ensuring that training is not "overfitting"
  - setting hyper-parameters of the model (e.g. number of parameters)

• Test set is for measuring validity of predictions on new data

THIS APPLIES TO YOUR EXPERIMENTATION WITH PROMPTS!

16
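A minimal sketch of such a split, assuming NumPy arrays and the 80/10/10 ratios above (a hypothetical helper, not from the slides):

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle, then split into ~80% train, ~10% validation, ~10% test."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_train, n_val = int(0.8 * len(X)), int(0.1 * len(X))
    train, val, test = idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```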


LLMBC 2023

Pre-training: slow training of a large model on A LOT of data.

Fine-tuning: fast training of the pre-trained large model on much less data.
LLMBC 2023

Model Hubs

• People share pre-trained models!

• 🤗 (Hugging Face): the most popular Model Hub
  - 180K models
  - 30K datasets

18

LLMBC 2023

Before ~2020: each task had its own NN architecture

http://lucasb.eyer.be/transformer 19
LLMBC 2023

Now: all is Transformers

20
Transformer cartoon (DALL-E)
01

The Transformer
Architecture

10
LLMBC 2023

Attention is all you need (2017)

• Ground-breaking architecture that set SOTA first on translation, and later on all other NLP tasks

• For simplicity, can just look at one half of it

22

LLMBC 2023

Transformer Decoder Overview


• Task is to complete text
- "It's a blue" -> "sundress"
• Inputs: a sequence of N tokens
- [It's, a, blue]
• Output:
- Probability distribution over the next token
• Inference:
  - Sample the next token from the distribution, append it to the inputs, run through the model again, sample, append, etc. (a sketch follows below)
23
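A sketch of this inference loop, assuming a hypothetical `model` that maps a 1D tensor of token IDs to next-token logits:

```python
import torch

def generate(model, tokens, n_new_tokens):
    """Autoregressive decoding: sample, append, run again."""
    for _ in range(n_new_tokens):
        logits = model(tokens)                    # run the whole sequence through the model
        probs = torch.softmax(logits, dim=-1)     # probability distribution over the next token
        next_token = torch.multinomial(probs, 1)  # sample the next token from the distribution
        tokens = torch.cat([tokens, next_token])  # append it to the inputs and repeat
    return tokens
```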

LLMBC 2023

Inputs
• Inputs need to be vectors of numbers
• Start with original text:
- "It's a blue sundress."
• Turn into a sequence of tokens:
- [<SOS>, It, 's, a, blue, sund, ress, ., <EOS>]
• Turn into vocabulary IDs:
- [0, 1026, 338, 257, 4171, 37437, 601, 13, 1]
• Each ID can be represented by a one-hot vector
- e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0, ...]
24
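A small illustration using the open-source tiktoken library's GPT-2 encoding; the exact tokens and IDs depend on the tokenizer, so they will not match the slide's numbers exactly:

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")          # GPT-2's byte-pair-encoding vocabulary
ids = enc.encode("It's a blue sundress.")    # text -> token IDs
print(ids)                                   # a short list of integers
print([enc.decode([i]) for i in ids])        # token IDs -> token strings
```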

LLMBC 2023

Input Embedding

• One-hot vectors are poor representations of words or tokens
  - e.g. the distance between "cat" and "kitten" is the same as between "cat" and "tractor"

• Solution: learn an embedding matrix!
  - (The simplest NN layer type)

[Figure: one-hot token vectors (V-dimensional) multiplied by a V x E embedding matrix give E-dimensional embedded vectors]

25
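A minimal sketch of an embedding lookup in PyTorch, with hypothetical GPT-2-ish sizes:

```python
import torch
from torch import nn

V, E = 50257, 768                       # hypothetical vocabulary and embedding sizes
embedding = nn.Embedding(V, E)          # a learnable V x E matrix
token_ids = torch.tensor([1026, 338, 257, 4171])
vectors = embedding(token_ids)          # look up one row per token: shape (4, 768)
```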

LLMBC 2023

Attention

• (Ignore "Masked Multi-Head" for now)

• Key insight: for a given token in the output sequence, only one or a few tokens in the input sequence are most important

• Introduced in 2015 for translation tasks

https://lilianweng.github.io/posts/2018-06-24-attention/ 26


LLMBC 2023

Basic self-attention

[Figure: 1. Compute attention weights; 2. Combine attention-weighted inputs]

• Input: a sequence of vectors

• Output: a sequence of vectors, each one a weighted sum of the input sequence
  - each weight is just the dot product between input vectors
  - (made to sum to 1)

http://www.peterbloem.nl/blog/transformers 27
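A minimal sketch of this parameter-free version, assuming PyTorch:

```python
import torch

def basic_self_attention(x):
    """x: (sequence_length, dim). No learned parameters at all."""
    weights = torch.softmax(x @ x.T, dim=-1)  # dot product of every vector with every other, made to sum to 1
    return weights @ x                        # each output is a weighted sum of the inputs

y = basic_self_attention(torch.randn(5, 16))  # 5 vectors in, 5 vectors out
```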



LLMBC 2023

Basic self-attention

• Note that every input vector is used in 3 ways:
  - Query
  - Key
  - Value

http://www.peterbloem.nl/blog/transformers 28

LLMBC 2023

Basic self-attention
• Problem: there's no learning involved!

• Solution: project inputs into query, key, and value roles

• Learning these matrices = learning attention

http://lucasb.eyer.be/transformer
http://www.peterbloem.nl/blog/transformers 29
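A minimal single-head sketch with learned W_q, W_k, W_v projections (it also includes the usual 1/sqrt(d) scaling, which the slide doesn't mention):

```python
import torch
from torch import nn

class SelfAttention(nn.Module):
    """Single-head self-attention with learned query/key/value projections."""
    def __init__(self, dim):
        super().__init__()
        self.W_q = nn.Linear(dim, dim, bias=False)
        self.W_k = nn.Linear(dim, dim, bias=False)
        self.W_v = nn.Linear(dim, dim, bias=False)

    def forward(self, x):                       # x: (seq_len, dim)
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        weights = torch.softmax(q @ k.T / x.shape[-1] ** 0.5, dim=-1)
        return weights @ v                      # attention-weighted combination of the values

y = SelfAttention(16)(torch.randn(5, 16))
```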

LLMBC 2023

Multi-head attention

• We can allow different ways of transforming into queries, keys, and values to be learned

• Simply means learning different sets of W_q, W_k, and W_v matrices simultaneously
  - (Actually implemented as a single matrix, anyway.)

[Figure: 3-headed attention]

http://www.peterbloem.nl/blog/transformers 30

LLMBC 2023

Masking attention

In training: note how you shouldn't see future tokens when predicting.

https://jalammar.github.io/illustrated-
http://www.peterbloem.nl/blog/transformers 31
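A minimal sketch of causal masking, assuming PyTorch: future positions are set to -inf before the softmax, so they receive zero weight:

```python
import torch

def masked_attention_weights(q, k):
    """q, k: (seq_len, dim). Each position can only attend to itself and earlier positions."""
    scores = q @ k.T / k.shape[-1] ** 0.5
    mask = torch.tril(torch.ones(len(q), len(q), dtype=torch.bool))  # lower-triangular causal mask
    scores = scores.masked_fill(~mask, float("-inf"))                # hide future tokens
    return torch.softmax(scores, dim=-1)                             # future positions get weight 0
```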
LLMBC 2023

Masked Multi-Head Attention


• Conceptual view:
  - a token comes in
  - it gets "augmented" with previously-seen tokens that seem relevant (masked self-attention)
  - this happens in several different ways simultaneously (multiple heads)

• NOTE: there's no notion of "position" so far!

32

LLMBC 2023

Positional Encoding
• Attention is totally position-invariant!
  - e.g. [this, movie, is, great] is the same as [movie, this, great, is]

• So, let's add position-encoding vectors to the embedding vectors
  - It really is that simple

33
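A minimal sketch of the learned-position-embedding variant (as used in GPT-2; the original Transformer paper used fixed sinusoidal vectors instead, but either way the vectors are simply added):

```python
import torch
from torch import nn

V, E, max_len = 50257, 768, 1024                  # hypothetical sizes
tok_emb = nn.Embedding(V, E)                      # token embeddings
pos_emb = nn.Embedding(max_len, E)                # one learned vector per position

token_ids = torch.tensor([1026, 338, 257, 4171])
positions = torch.arange(len(token_ids))
x = tok_emb(token_ids) + pos_emb(positions)       # just add them: shape (4, 768)
```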

LLMBC 2023

Add
• "Skip connections" aka "residual blocks"

- output = module(input) + input

- Allows gradient to ow from the loss function all


the way to the rst layer

- (Possible because each module's output is the


same shape as its input)

34
fi
fl

LLMBC 2023

Layer Normalization
• Neural net modules perform best when input vectors have uniform mean and std in each dimension.

• As inputs flow through the network, means and std's get blown out.

• Layer Normalization is a hack to reset things to where we want them in between layers.

output = module(layer_norm(input)) + input

35
https://arxiv.org/pdf/1803.08494.pdf
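A minimal sketch of this pre-norm residual pattern, assuming PyTorch and an arbitrary sub-module:

```python
import torch
from torch import nn

class PreNormResidual(nn.Module):
    """output = module(layer_norm(input)) + input"""
    def __init__(self, dim, module):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.module = module

    def forward(self, x):
        return self.module(self.norm(x)) + x   # skip connection around the normalized sub-layer

block = PreNormResidual(16, nn.Linear(16, 16))
y = block(torch.randn(5, 16))                  # same shape in and out, so residuals work
```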

LLMBC 2023

Feed Forward Layer


• Standard Multi-Layer Perceptron with one hidden layer

• Defined as y = W2 · GeLU(W1 · x + b1) + b2

• Conceptual view:
  - a token (augmented with other relevant tokens that it has seen) comes in...
  - ...and "upgrades" its representation

36
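A minimal sketch of this block, assuming PyTorch and the common 4x hidden-dimension expansion (a convention not stated on the slide):

```python
import torch
from torch import nn

class FeedForward(nn.Module):
    """y = W2 · GeLU(W1 · x + b1) + b2, with a 4x hidden expansion."""
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, 4 * dim)   # W1, b1
        self.fc2 = nn.Linear(4 * dim, dim)   # W2, b2

    def forward(self, x):
        return self.fc2(torch.nn.functional.gelu(self.fc1(x)))

y = FeedForward(768)(torch.randn(5, 768))
```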

LLMBC 2023

Transformer Architecture
https://aizi.substack.com/p/how-does-gpt-3-spend-its-175b-parameters
• The main Transformer Layer is stacked many times

• The overall hyperparameters are:

- Number of layers

- Embedding dimension

- Number of attention heads

• The largest models are ~70% feed-forward weights

37

LLMBC 2023

Why does this work so well?

38
LLMBC 2023

Thinking like Transformers

• Restricted Access Sequence Processing (RASP, 2021): a programming language of Transformer-implementable operations

https://srush.github.io/raspy/ 39
LLMBC 2023

We mostly don't understand it, though


• Much great work from Anthropic if this has captured your curiosity!

40
LLMBC 2023

Should you be able to code a Transformer?


• Definitely not necessary!

• BUT: it's not difficult, it is fun, and is probably worth doing

• Andrej Karpathy's GPT-2 implementation is <400 lines of code, including Attention and MLP blocks

https://www.youtube.com/watch?v=kCc8FmEb1nY
41

LLMBC 2023

Resources

• Lucas Beyer's Lecture on Transformers

• Peter Bloem's "Transformers from Scratch"

• Nelson Elhage's "Transformers for Software Engineers" for a different view

• Andrej Karpathy's entire Neural Networks: Zero to Hero video series

• Lilian Weng's "The Transformer Family v2" megapost

42

LLMBC 2023

Questions?

43
03

Notable LLMs

30
LLMBC 2023

Three Easy Pieces

From Lucas Beyer's lecture Intro to Transformers.


LLMBC 2023

BERT (2019)
• Bidirectional Encoder Representations from Transformers

• Encoder-only (no attention masking)

• 110M params

• 15% of all words masked out

• Was great, now dated

From Lucas Beyer's lecture Intro to Transformers. 46


LLMBC 2023

T5: Text-to-Text Transfer Transformer (2020)


• Input and output are both text strings

• Encoder-Decoder architecture

• 11B parameters

• Still could be a good choice for fine-tuning!

From Lucas Beyer's lecture Intro to Transformers. 47

LLMBC 2023

T5 Training Data
• Unsupervised pre-training on the Colossal Clean Crawled Corpus (C4)
  - Start with Common Crawl (over 50TB of compressed data, 10B+ web pages)
  - Filtered down to ~800GB, or ~160B tokens

• Also trained on academic supervised tasks

https://paperswithcode.com/dataset/c4
https://stanford-cs324.github.io/winter2022/lectures/data/
48

LLMBC 2023

GPT / GPT-2 (2019)


• Generative Pre-trained Transformer

• Decoder-only (uses masked self-attention)

• Largest model is 1.5B

49
From Lucas Beyer's lecture Intro to Transformers.

LLMBC 2023

GPT-2 Training Data

• Found that Common Crawl has major data quality issues

• Formed the WebText dataset
  - scraped all outbound links (45M) from Reddit posts which received at least 3 karma

• After de-duplication and some heuristic filtering, left with 8M documents for a total of 40GB of text

https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf 50
LLMBC 2023

Byte Pair Encoding

• How does GPT tokenize?

• Middle ground between
  - old-school NLP tokenization, where out-of-vocab words would be replaced by a special token
  - UTF-8 bytes

51

LLMBC 2023

GPT-3 (2020)
• Just like GPT-2, but 100x larger (175B params)

• Exhibited unprecedented few-shot and zero-shot learning

52

LLMBC 2023

GPT-3 Training Data

• The dataset mix comes to a total of 500B tokens

• But the model was trained on only 300B of them!

https://arxiv.org/pdf/2005.14165.pdf 53

LLMBC 2023

GPT-4 (2023)

54
55
LLMBC 2023

https://ourworldindata.org/grapher/ai-training-computation
LLMBC 2023

The Bitter Lesson

LLMs

56
LLMBC 2023

But what exactly is the relationship between model size and dataset size?
LLMBC 2023

Chinchilla (2022)
• Empirically derived formulas for the optimal model size and training set size given a fixed compute budget

• Found that most LLMs are "undertrained"

• Trained Chinchilla (70B) vs Gopher (280B) at the same compute budget, by using 4x fewer params and 4x more data

• (Note that this is for one epoch)

https://arxiv.org/pdf/2203.15556.pdf 58

LLMBC 2023

LLaMA (2023)

• "Chinchilla-optimal" open-source
LLMs from Meta

• Several sizes from 7B to 65B,


trained on at least 1T tokens

• Benchmarks competitively against


GPT-3 and other LLMs

• Open-source, but non-commercial

59

LLMBC 2023

LLaMA Training Data


• Custom quality-filtering of CommonCrawl + some C4 + GitHub + Wikipedia + Books + arXiv + Stack Exchange

• RedPajama: open-source recreation

https://arxiv.org/pdf/2302.13971.pdf
https://www.together.xyz/blog/redpajama 60
LLMBC 2023

Including code in training data


• T5 and GPT-3 (2020) specifically removed code. But most recent models are trained on ~5% code. Why?

• Code-specific models such as OpenAI Codex (2021) were built by further training GPT-3 on public GitHub code.

• Empirically, this improved performance on non-code tasks!

• Open-source dataset: The Stack (3TB of permissively licensed source code)

Yao Fu et al. How does GPT Obtain its Ability? 61

LLMBC 2023

And there's another important part of the story: Instruction Tuning
LLMBC 2023

Few-shot vs Zero-shot

• At the time of GPT-3 (2020), the mindset was mostly few-shot
  - e.g. text completion

• By the time of ChatGPT (2022), the mindset was all zero-shot
  - e.g. instruction-following

63

LLMBC 2023

Supervised Fine-tuning

• Very little text in the original GPT-3 dataset is of the zero-shot form.

• To improve performance on zero-shot inputs, fine-tuned on a smaller, high-quality dataset of instruction-completion pairs

• (Sourced from thousands of contractors)

https://openai.com/blog/how-should-ai-systems-behave 64

LLMBC 2023

InstructGPT/GPT-3.5

• Had humans rank different GPT-3 outputs, and used RL to further fine-tune the model

• Much better at following instructions

• Released as text-davinci-002 in the OpenAI API

https://openai.com/blog/instruction-following/ 65
LLMBC 2023

ChatGPT

• Further RLHF on conversations

• ChatML format (messages from system, assistant, user roles)

66
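A sketch of what role-tagged messages look like through the 2023-era OpenAI Python client (ChatML itself is the underlying wire format these messages are serialized into); the model name and prompt here are just placeholders:

```python
import openai  # pre-1.0 openai-python client; expects OPENAI_API_KEY in the environment

# The chat models consume a list of role-tagged messages rather than one flat prompt.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain RLHF in one sentence."},
]
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response["choices"][0]["message"]["content"])  # the assistant-role reply
```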

LLMBC 2023

The GPT Lineage

Yao Fu's How does GPT Obtain its Ability? 67


LLMBC 2023

"Alignment Tax"

• Instruction-tuning increases the model's zero-shot ability, but at a cost
  - Confidence becomes less calibrated
  - Few-shot ability suffers

[Figure: calibration plots for Base GPT-4 vs RLHF GPT-4]

GPT-4 Technical Report - https://arxiv.org/pdf/2303.08774.pdf 68
LLMBC 2023

It's possible to "steal" RLHF

• Got 52K instruction-following demonstrations from text-davinci-003, then fine-tuned LLaMA on them.

69
LLMBC 2023

OpenAssistant

• April 2023 dataset release

• 160K messages across 66K conversation trees, 35 languages, 460K quality ratings, 13.5K volunteers

https://huggingface.co/datasets/OpenAssistant/oasst1 70

LLMBC 2023

And one last idea


LLMBC 2023

Retrieval-enhanced Transformer (2021)


• Instead of both learning language and memorizing facts in the model's params, why not just learn language in params, and retrieve facts from a large database?

• BERT-encode sentences, store them in a large DB (>1T tokens)

• Then, fetch matching sentences and attend to them

• Doesn't work as well as large LLMs. Yet.

https://arxiv.org/pdf/2112.04426.pdf (Dec 2021)

72



LLMBC 2023

Resources

• Lilian Weng's "The Transformer Family v2" megapost

• Xavier Amatriain's Transformer Models Catalog

• Yao Fu's How does GPT Obtain its Ability?

73

LLMBC 2023

Questions?

74
04

Training & Inference

50
LLMBC 2023

Problems with training LLMs

• Massive amounts of data

• Massive models don't fit on a single GPU, or even a single multi-GPU machine

• Long training runs are painful

76


LLMBC 2023

Parallelism

• Data parallelism: spread a single batch of data across GPUs

• Model parallelism: spread the model's layers across GPUs

• Tensor parallelism: spread a single matrix op across GPUs

https://openai.com/blog/techniques-for-training-large-neural-networks/ 79

LLMBC 2023

BLOOM (GPT-3 sized LM)

Had to use multiple tricks ("3D parallelism") from two great libraries: DeepSpeed and Megatron-LM

https://huggingface.co/blog/bloom-megatron-deepspeed
https://www.deepspeed.ai/training/ 80

LLMBC 2023

Sharded Data-Parallelism

Literally pass around model params between GPUs as computation is proceeding!

Helpful video:
https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/ 81

LLMBC 2023

A glimpse into training hell

• Dozens of manual restarts, 70+ automatic restarts due to HW failures

• Manual restarting from checkpoints when the loss would diverge

• Switching optimizers and software versions in the middle of training

https://arxiv.org/pdf/2205.01068.pdf 83

LLMBC 2023

"Training run babysitting"

https://openai.com/contributions/gpt-4 84
LLMBC 2023

Considerations for LLM inference

• Understanding auto-regressive sampling

• Improving (or not) runtime complexity

• Dealing with large model size

85

LLMBC 2023

Auto-regressive Sampling
• Remember that we sample tokens one at a time
  - [It's, a, blue, ...

• The softmax outputs a peaky probability distribution over possible next tokens

• A temperature parameter controls how peaky it is
  - t=0 will always sample the most likely next token
  - t=1 will often sample less-likely ones

• Human text is not all high-probability next words!

https://arxiv.org/abs/1904.09751 86
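A minimal sketch of temperature sampling over next-token logits, assuming PyTorch:

```python
import torch

def sample_next_token(logits, temperature=1.0):
    """Sample one token ID from next-token logits.
    temperature = 0 is greedy argmax; temperature = 1 keeps the model's own distribution."""
    if temperature == 0:
        return int(torch.argmax(logits))                      # always the most likely token
    probs = torch.softmax(logits / temperature, dim=-1)       # lower temperature = peakier distribution
    return int(torch.multinomial(probs, 1))
```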

LLMBC 2023

Runtime Complexity
• Self-attention runs in O(N^2) for sequence length N

• Many O(N) approximations have been developed

• But none have provided a strict improvement

• Recently, FlashAttention sped things up via smart GPU programming

Based on "Efficient Transformers: A Survey" by Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
and "Long Range Arena: A Benchmark for Efficient Transformers" by Y Tay, M Dehghani, S Abnar, Y Shen, D Bahri, P Pham, J Rao, L Yang, S Ruder, D Metzler
87
http://lucasb.eyer.be/transformer https://github.com/HazyResearch/flash-attention
LLMBC 2023

Dealing with Large Model Sizes

• Large subject! Lilian Weng from OpenAI has a thorough post

• Quantization is most relevant to us
  - LLM weights are usually in float32 or float16
  - Recent work (LLM.int8) has shown that 8-bit post-quantization is basically fine
  - Even 4-bit seems fine!

[Figure: LLaMA quantization in llama.cpp] https://github.com/ggerganov/llama.cpp

https://arxiv.org/pdf/2208.07339.pdf
https://arxiv.org/pdf/2212.09720.pdf 88
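A minimal sketch of naive absmax int8 post-training quantization of a single weight matrix; this only illustrates the basic idea, not the actual LLM.int8() or llama.cpp schemes:

```python
import torch

def quantize_int8(w):
    """Naive absmax quantization: map floats into the int8 range with one scale per tensor."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.to(torch.float32) * scale

w = torch.randn(768, 768)
q, scale = quantize_int8(w)                     # 4x smaller than float32 storage
print((w - dequantize(q, scale)).abs().max())   # reconstruction error stays small
```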

LLMBC 2023

Resources

• Megatron-LM (GitHub): probably still the best insights into training LLMs at scale

• OpenAI post Techniques for Training Large Neural Networks

• Lilian Weng's "Large Transformer Model Inference Optimization"

89

LLMBC 2023

Questions?

90
LLMBC 2023

Thanks!

@sergeykarayev
@full_stack_dl

[Image prompt: /imagine green tropical parrot eating stack of pancakes, flapjack breakfast, hyper-realistic portrait, DSLR Canon R5, chromatic aberration, accent lighting, super resolution, hyper-detailed, cinematic, OpenGL - Shaders]
