A Guide to Building AI Applications Using Large Language Models (LLMs) for Leaders
The document discusses the rise of Generative AI and Large Language Models (LLMs), highlighting their significance in the technology sector following the popularity of ChatGPT. It explains the Transformer architecture that underpins LLMs, detailing their training process which includes pre-training on vast amounts of text and fine-tuning for specific tasks. The document emphasizes that while LLMs excel at predicting text, they do not inherently understand the meaning behind the words they generate.
A Guide to Building AI Applications Using LLMs for Leaders
• The discussion around Generative AI exploded last year after the massively viral growth of ChatGPT.
• Along with it, a number of new terms entered regular parlance in the technology world: LLMs, Transformers, Mistral, Hugging Face, RAG, Knowledge Graph, Stable Diffusion, Vectors, LoRA, PEFT, and so forth.
• Vocabulary (English–Turkish): massively = kitlesel • discussion = tartışma • explode = patlamak, birden …-meye başlamak • regular = yaygın • term = terim • so forth = ve bunun gibi, gibi şeyler • parlance = deyim
• In fact, if one listed the jargon that has emerged in the last year alone, it would possibly exceed what we have witnessed over a decade or more in the world of technology.
• It also came with a question (thanks to Blockchain, Crypto and their associated scams): is this another passing fad, or something real?
• Now, the reports are in. Over 70% of business executives surveyed by Gartner reported a top-down push for generative AI implementation.
• Another McKinsey survey showed that 75% of participants anticipated generative AI would significantly impact their industries within three years.
• Vocabulary (English–Turkish): emerge = ortaya çıkmak • push = baskı • witness = tanık olmak, şahit olmak • participant = katılımcı • decade = 10 yıl • scam = dolandırıcılık • passing = geçici • anticipate = beklemek, ummak • fad = heves, geçici bir moda • executive = yönetici • survey = anket yapmak, araştırmak

What Are Large Language Models?
• Large Language Models (LLMs) are AI models that excel at processing and generating text (or code, as we will see later).
• Technically, they are deep learning models built using an AI architecture that became popular after the 2017 paper 'Attention Is All You Need'. This architecture is known as the Transformer architecture.
• So what are Transformers, then? Transformers are a specific kind of neural network adept at handling sequential data, like text.
• Vocabulary (English–Turkish): excel = mükemmel • handling = ele almak, işlemek • attention = dikkat • kind = tür • architecture = mimari • transformer = dönüştürücü • adept = becerikli
• This neural network model relies solely on an attention mechanism, a technique that focuses on the important parts of the data (like key words in a sentence).
• This approach is unlike previous models, which used recurrent neural networks or convolutions.
• The focus on attention allows Transformers to process sequential data, such as text, more effectively.
• Researchers found that Transformers are not only better at machine translation tasks, where sequential data is at play, but also faster to train.
• By the way, this paper took the AI world by storm, and it is a must-read for anyone looking to understand the shift in the approach the AI community has taken towards building AI models. (A minimal sketch of the attention computation follows this section.)
• Vocabulary (English–Turkish): rely = dayanmak • at play = söz konusu • solely = yalnızca • allow = mümkün kılmak • paper = makale • convolution = evrişim • shift = değişim • sequential = sıralı • looking to do sth = bir şeyi yapmayı istemek • effectively = etkili bir şekilde • recurrent = tekrarlayan
Large Language Models Are Built through Training. But Why?
• So, LLMs are built on the Transformer architecture, and they are trained on massive amounts of text data.
• However, simply putting together a neural network won't make it an LLM. It needs to be trained for it to actually work.
• The training process is a multi-stage process that involves feeding the model massive amounts of text data and fine-tuning its abilities.
• Vocabulary (English–Turkish): massive = çok büyük • feed = beslemek • put together = bir araya getirmek • fine-tune = ince ayar • actually = gerçekten • ability = yetenek • multi-stage = çok aşamalı • involve = içermek
• Training typically involves the use of GPUs and GPU clusters, and it may take days, weeks or even months (especially the pre-training step). Here's how it happens:
• Pre-training (self-supervised learning): This is the initial stage, where the LLM is exposed to a vast amount of unlabeled text data such as books, articles, and code. The model isn't given specific tasks but learns by predicting the next word in a sequence or filling in missing pieces of text. This helps the LLM grasp the overall structure and patterns of language.
• Fine-tuning (supervised learning): After pre-training, the LLM is focused on specific tasks through supervised learning. Here, labeled data with clear inputs and desired outputs is used.
• Vocabulary (English–Turkish): take = sürmek • grasp = kavramak • expose = maruz kalmak • through = yoluyla • unlabeled = etiketlenmemiş • supervised = denetimli • cluster = küme • clear = net • fill = doldurmak • missing = eksik • overall = genel
• The model learns by comparing its generated responses to the correct outputs, refining its ability to perform tasks like question answering or writing different kinds of creative content.
• Why do we need these two stages? After the pre-training stage, an LLM has a good grasp of language mechanics but wouldn't necessarily understand the meaning or real-world applications of language.
• Think of it as a child who has learned the alphabet and can sound out words but doesn't understand the stories those words create.
• Here's an example: you ask the LLM to complete the sentence "The cat sat on the..."
• After pre-training, the LLM is good at predicting upcoming words based on patterns. It might respond with "...mat", a common word following "the cat sat on the".
• However, it wouldn't necessarily understand the concept of a cat or a mat, nor the physical possibility of a cat sitting on one. It would simply be using its statistical knowledge of what word typically follows that sequence.
• This is a vital thing to understand. LLMs are not magical. They are simply built on a neural network model that is very good at predicting the next word (rather, the next token) once it has been trained.
• At this point, it is worth understanding the meaning of a 'token', as you will come across this term in AI literature quite often.
• Tokens are the smallest units of meaning in a language. In Natural Language Processing (NLP), tokens are typically created by dividing a sentence (or a document) into words or other meaningful units such as phrases, depending on the tokenization algorithm used. The tokenization process therefore involves converting text into a series of tokens.
• Vocabulary (English–Turkish): token = belirteç
• Now, think about it. Language in our world works in a similar way. When constructing a sentence, we start with a character or a word. Once we have the first word, the next word (or character, or phrase) is chosen from a finite number of possibilities, and so on. This is how a sentence eventually comes together.
• This approach has come to be known as 'next token prediction', and it is incredibly powerful. (A small tokenization and next-token-prediction sketch follows this section.)
• Researchers are applying this approach to a range of other domains with remarkable results. For instance, researchers at Meta recently used this technique to train an AI model to understand a 3D scene.
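As an illustration of tokenization and next-token prediction, here is a small sketch using the Hugging Face transformers library with the publicly available GPT-2 model. The choice of model and library is an assumption made for this guide, not something the source prescribes; it tokenizes the "The cat sat on the" example from above and asks the model for its most likely next tokens.

```python
# Tokenization and next-token prediction sketch (assumes: pip install transformers torch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The cat sat on the"
inputs = tokenizer(text, return_tensors="pt")

# Show the tokens the text was split into; with GPT-2's tokenizer these are
# roughly ['The', 'Ġcat', 'Ġsat', 'Ġon', 'Ġthe'], where 'Ġ' marks a leading space.
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, sequence_length, vocab_size)

next_token_logits = logits[0, -1]          # scores for whatever comes after "the"
top5 = torch.topk(next_token_logits, k=5).indices
# Print the five most likely continuations, e.g. words like ' floor' or ' couch'.
print([tokenizer.decode([i]) for i in top5.tolist()])
```

Whatever continuations the model proposes, it is doing exactly what the text above describes: ranking candidate next tokens by statistical likelihood, with no understanding of cats or mats.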
• A 3D scene, if you think about it, can also be treated as sequential: you have a wall, next to which you have a door, above which you have a ceiling, and below which you have a floor. So, if an AI model can predict the next word, why can't it predict the next object in a scene?
• A similar example is code. Code is entirely sequential. When you train LLMs on code, they learn to predict the next 'token' in very much the same way that they learned to predict the next word.
• Now, if pre-training has already taught the LLM to predict the next word (or token), why do we need to 'fine-tune' it? (A minimal fine-tuning sketch follows.)
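To ground the fine-tuning stage described earlier, here is a deliberately simplified sketch of supervised fine-tuning using PyTorch and the Hugging Face transformers library. The tiny question-answer dataset, the GPT-2 model choice and the hyperparameters are illustrative assumptions, not recommendations from the source; real fine-tuning uses large labeled datasets, loss masking of the prompt, evaluation, and often parameter-efficient techniques such as LoRA/PEFT (mentioned in the terminology list at the start).

```python
# Highly simplified supervised fine-tuning sketch (illustrative assumptions only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Labeled examples: a clear input paired with the desired output.
examples = [
    "Question: What does LLM stand for? Answer: Large Language Model.",
    "Question: What architecture do LLMs use? Answer: The Transformer.",
]

model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        # Passing labels makes the library compare the model's next-token
        # predictions against the desired text and return a loss.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()   # how far the predictions are from the target
        optimizer.step()          # nudge the weights to reduce that gap
        optimizer.zero_grad()
```

The loss here is precisely the "comparing its generated responses to the correct outputs" step described above: pre-training gives the model general language mechanics, and this supervised pass steers those mechanics toward a specific task.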
Source: Sinan Ozdemir, Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs, Addison-Wesley Professional, 2023.