
A Guide to Building AI Applications Using Large Language Models (LLMs) for Leaders

A Guide to Building AI Applications Using LLMs for Leaders
• The discussion around Generative AI exploded last year after the massively viral growth of ChatGPT.
• Along with that, a number of new terms entered regular parlance in the technology world: LLMs, Transformers, Mistral, Hugging Face, RAG, Knowledge Graph, Stable Diffusion, Vectors, LoRA, PEFT, and so forth.

Vocabulary:
• massively = on a mass scale
• discussion = debate, conversation
• explode = to blow up; to take off suddenly
• regular = common, everyday
• term = a word or expression with a specific meaning
• so forth = and so on, and the like
• parlance = way of speaking; common usage
• In fact, if one made a list of the jargon that emerged in the last year alone, it would likely be longer than what we have witnessed over a decade or more in the world of technology.
• It also came with a question (thanks to Blockchain, Crypto, and their associated scams): 'Is this another passing fad, or something real?'
• Now, the reports are in. Over 70% of business executives surveyed by Gartner reported a top-down push for generative AI implementation.
• Another survey, by McKinsey, showed that 75% of participants anticipated that generative AI would significantly impact their industries within three years.
Vocabulary:
• emerge = to appear, come into existence
• push = pressure, drive
• witness = to observe, see happen
• participant = someone taking part
• decade = a period of ten years
• scam = fraud
• passing = temporary, short-lived
• anticipate = to expect
• fad = a short-lived trend
• executive = a senior manager
• survey = to poll, question a group of people
What Are Large Language Models?
• Large Language Models (LLMs) are AI models that excel at
processing and generating text (or code, as we will see
later).
• Technically, they are deep learning models built using an AI
architecture that became popular after the 2017 paper:
‘Attention Is All You Need’. This architecture is known as the
Transformer architecture.
• So what are Transformers, then? Transformers are a specific
kind of neural network adept at handling sequential data,
like text.
Vocabulary:
• excel = to be exceptionally good at something
• handling = dealing with, processing
• attention = focus on particular parts
• kind = type, sort
• architecture = overall structure, design
• Transformer = the neural-network architecture introduced in 2017
• adept = skilled
• This neural network model relies solely on an attention mechanism, a
technique that focuses on important parts of data (like key words in a
sentence).
• This approach is unlike previous models, which used recurrent neural networks or convolutions.
• This focus on attention allows Transformers to process sequential data, such as text, more effectively (a short code sketch follows this slide's vocabulary list).
• Researchers found that not only are Transformers better at machine translation tasks, where sequential data is at play, but they are also faster to train.
• By the way, this paper took the AI world by storm, and it is a must-read for anyone looking to understand the shift in the approach the AI community has taken towards building AI models.
Vocabulary:
• rely = to depend on
• at play = involved, in question
• solely = only, exclusively
• allow = to make possible, enable
• attention = focus on particular parts
• paper = an academic article
• convolution = the sliding-window operation used in convolutional neural networks
• shift = a change
• sequential = ordered, one after another
• looking to do something = wanting or intending to do something
• effectively = in an effective way
• recurrent = repeating, fed back on itself
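• To make the attention idea above concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python with NumPy. The three 4-dimensional 'token' vectors are made-up toy inputs, and multi-head attention, masking and the rest of the Transformer are deliberately left out.

```python
# A minimal sketch of the scaled dot-product attention from "Attention Is All
# You Need", using NumPy. The three 4-dimensional "token" vectors are made-up
# toy inputs; real Transformers add multi-head attention, masking and many layers.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly each token relates to every other token
    weights = softmax(scores, axis=-1)   # attention weights: each row sums to 1
    return weights @ V, weights          # output: a weighted mix of the value vectors

rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))         # e.g. toy vectors for "the", "cat", "sat"
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)  # self-attention: Q = K = V
print(weights.round(2))                  # rows show where each token "pays attention"
```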
Large Language Models Are Built through Training. But Why?
• So, LLMs are built on the Transformer architecture and they
are trained on massive amounts of text data.
• However, simply putting together a neural network won’t
make it an LLM.
• It needs to be trained for it to actually work.
• The training process is a multi-stage process that involves
feeding the model massive amounts of text data and fine-
tuning its abilities.

Vocabulary:
• massive = very large
• feed = to supply (data) to
• put together = to assemble
• fine-tune = to make small adjustments
• actually = really, in fact
• ability = capability
• multi-stage = having several stages
• involve = to include
• It typically involves the use of GPUs and GPU clusters, and may take days, weeks or even months (especially the pre-training step).
• Here's how it happens (both stages are sketched in a short code example further below):
• Pre-training (Self-supervised Learning): This is the initial stage where the LLM
is exposed to a vast amount of unlabeled text data like books, articles, and
code.
• The model isn't given specific tasks but learns by predicting the next word in a
sequence or filling in missing pieces of text.
• This helps the LLM grasp the overall structure and patterns of language.
• Fine-tuning (Supervised Learning): After pre-training, the LLM is focused on
specific tasks through supervised learning.
• Here, labeled data with clear inputs and desired outputs is used.
Vocabulary:
• take = to last (an amount of time)
• grasp = to understand
• expose = to present something to; to subject to
• through = by means of
• unlabeled = without labels
• supervised = guided by labeled examples
• cluster = a connected group (of machines)
• clear = well-defined
• fill = to fill in, complete
• missing = absent, left out
• overall = general
• The model learns by comparing its generated responses to the correct outputs,
refining its ability to perform tasks like question answering or writing different
kinds of creative content.
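• To make the two training stages more concrete, the sketch below shows how the data differs between them: pre-training examples are generated automatically from raw text, while fine-tuning uses hand-labeled input/output pairs. The tiny corpus, the word-level 'tokens' and the fake_model_generate stand-in are assumptions for illustration only.

```python
# A toy sketch of the data used in the two training stages described above.
# The corpus, word-level tokens and the fake model below are illustrative
# assumptions, not how a production LLM is actually implemented.

# Stage 1 - Pre-training (self-supervised): raw, unlabeled text is turned into
# next-word prediction examples automatically, with no human labels needed.
corpus = "the cat sat on the mat"
words = corpus.split()
pretraining_examples = [(words[:i], words[i]) for i in range(1, len(words))]
for context, target in pretraining_examples:
    print(f"context: {' '.join(context):20s} -> predict: {target}")

# Stage 2 - Fine-tuning (supervised): hand-labeled pairs with a clear input and
# a desired output for a specific task (here, toy question answering).
finetuning_examples = [
    {"input": "Q: What does LLM stand for?", "output": "Large Language Model"},
    {"input": "Q: Which paper introduced the Transformer?", "output": "Attention Is All You Need"},
]

def fake_model_generate(prompt: str) -> str:
    return "..."  # placeholder: a real pre-trained LLM would generate text here

for ex in finetuning_examples:
    prediction = fake_model_generate(ex["input"])
    # During fine-tuning, the gap between the prediction and the labeled output
    # (measured with a token-level loss) drives the weight updates.
    print(ex["input"], "->", prediction, "| desired:", ex["output"])
```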
• Why do we need these two stages? After the pre-training stage, an LLM would
have a good grasp of language mechanics but wouldn't necessarily understand
the meaning or real-world applications of language.
• Think of it as a child who's learned the alphabet and can sound out words but
doesn't understand the stories those words create.
• Here's an example: You ask the LLM to complete the sentence "The cat sat on
the..."
• After pre-training, the LLM might be good at predicting upcoming words based on patterns. It could respond with "...mat," a common word following "the cat sat on."
• …
• However, it wouldn't necessarily understand the concept of a cat or a mat, nor the physical possibility of a cat sitting on one.
• It would simply be using its statistical knowledge of what word typically follows
that sequence.
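• That 'statistical knowledge' can be illustrated with a toy sketch: counting, in a tiny made-up corpus, which word most often follows "sat on the". A real LLM learns far richer statistics with a neural network rather than a lookup table, but the principle is the same.

```python
# Toy illustration of "statistical knowledge of what word typically follows":
# count which words follow a given phrase in a tiny made-up corpus and pick the
# most frequent one. No understanding of cats or mats is involved.
from collections import Counter

corpus = ("the cat sat on the mat . "
          "the dog sat on the rug . "
          "the cat sat on the mat again .").split()

def words_following(context, words):
    n = len(context)
    return Counter(words[i + n] for i in range(len(words) - n) if words[i:i + n] == context)

counts = words_following(["sat", "on", "the"], corpus)
print(counts.most_common())   # [('mat', 2), ('rug', 1)] -> the model would answer "mat"
```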
• This is a vital thing to understand. LLMs are not magical. They are simply built on a neural network model that's great at predicting the next word (or rather, the next token) after it has been trained.
• At this point, it's worthwhile understanding the meaning of a ‘token’, as you
would come across this in AI literature quite often.
• Tokens are the smallest units of meaning in a language. In Natural Language
Processing (NLP), tokens are typically created by dividing a sentence (or a
document) into words or other meaningful units like phrases based on the
tokenization algorithm used.
Vocabulary:
• token = the basic unit of text a language model reads and writes
• Therefore, the tokenization process involves converting text into a series of
tokens.
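• As a simple illustration, the sketch below tokenizes a sentence into word-level tokens with a regular expression and maps each token to an integer id. Production tokenizers (BPE, WordPiece, SentencePiece) split text into subword units instead, but the text-to-token-sequence idea is the same.

```python
# A minimal sketch of tokenization: split text into word-level tokens, then map
# each token to an integer id. Real LLM tokenizers work on subword units, but
# the overall idea (text in, a sequence of tokens out) is the same.
import re

text = "The cat sat on the mat."
tokens = re.findall(r"\w+|[^\w\s]", text.lower())
print(tokens)                              # ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']

vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]
print(token_ids)                           # the integer sequence the model actually sees
```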
• Now, think about it. Language in our world works in a similar way. When
constructing a sentence, we start with a character or a word.
• Once we have the first word, we choose the next word (or character, or phrase) from a finite number of possibilities, and so on. This is how a sentence eventually comes together.
• This approach has come to be known as ‘next token prediction’, and is
incredibly powerful.
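• Next-token prediction can also be shown as a loop: start with a word and repeatedly append the most likely next one. The tiny corpus and the bigram-counting predict_next function below are illustrative assumptions; an LLM replaces them with a trained Transformer that scores every token in its vocabulary at each step.

```python
# A toy 'next token prediction' loop: repeatedly pick the word that most often
# follows the previous one in a tiny corpus. Purely illustrative; an LLM uses a
# trained Transformer instead of bigram counts.
from collections import Counter

corpus = "the cat sat on the mat and the dog sat on the rug".split()

def predict_next(word):
    follows = Counter(corpus[i + 1] for i in range(len(corpus) - 1) if corpus[i] == word)
    return follows.most_common(1)[0][0] if follows else None

sentence = ["the"]
for _ in range(4):
    nxt = predict_next(sentence[-1])
    if nxt is None:
        break
    sentence.append(nxt)
print(" ".join(sentence))        # -> "the cat sat on the"
```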
• Researchers are applying this approach to a range of other domains, and it is yielding remarkable results. For instance, researchers at Meta recently used this technique to train an AI model to understand a 3D scene.
• …
• A 3D scene, if you think about it, can also be sequential. You have a wall, next
to which you have a door, above which you have a ceiling, and below which
you have a floor.
• So, if an AI model is able to predict the next word, why can’t it predict the next
object in a scene?
• A similar example is that of code. Code is entirely sequential. When you train LLMs on code, they learn to predict the next 'token' in much the same way they learned to predict the next word.
• Now, if pre-training has already trained the LLM to predict the next word (or token), why do we need to 'fine-tune' it?

• …
