Generative AI Notes
Definition:
“Generative AI is a form of artificial intelligence that
generates new and original content, such as text, images, or
sounds, based on the patterns it has learned from
existing data. It goes beyond simply recognizing patterns or
making predictions and can create content that is
entirely new.”
Breakdown:
Generative AI isn't a new concept:
Components of SMT:
Evolution:
Working of GANs:
1. Initialization: Two neural networks are set up: The Generator (G),
which creates new data like images or text, and the Discriminator
(D), which checks if the data is real or fake.
2. Generator’s First Move: The Generator (G) starts by taking
random noise as input and turning it into a new data sample, like a
fake image.
3. Discriminator’s Turn: The Discriminator (D) gets two types of
data: real data from a training set and the fake data created by G. D
tries to tell if each piece of data is real or fake by giving a score
between 0 (fake) and 1 (real).
4. Learning Process: If D correctly identifies real and fake data, it is
rewarded while G is penalized, and both networks update their weights.
The goal is for G to improve enough to trick D into thinking its fake
data is real.
5. Generator’s Improvement: When D gets fooled and thinks G's
fake data is real, G gets better feedback and learns to create even
more realistic data.
6. Discriminator’s Adaptation: When D correctly spots fake data, it
gets better at identifying fakes, making it harder for G to succeed.
7. Ongoing Duel: This back-and-forth battle between G and D
continues until G gets so good at creating realistic data that D can
no longer easily tell the difference. At this point, G is well-trained
and can be used to generate new, realistic data samples.
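The seven steps above can be sketched as a toy training loop. This is a minimal 1-D illustration under assumed settings (an affine generator g(z) = a·z + b, a logistic-regression discriminator, and hand-derived gradients), not a production GAN:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# "Real" data: a toy 1-D distribution centred on 4.0.
def real_batch(n):
    return rng.normal(4.0, 0.5, n)

a, b = 1.0, 0.0   # Generator G: g(z) = a*z + b, starting from random noise z
w, c = 0.1, 0.0   # Discriminator D: D(x) = sigmoid(w*x + c), a score in (0, 1)
lr = 0.05

for step in range(500):
    # Discriminator's turn: score real data (target 1) and fakes (target 0).
    x_real = real_batch(32)
    x_fake = a * rng.normal(size=32) + b
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    # Gradient of -log D(real) - log(1 - D(fake)) w.r.t. w and c.
    w -= lr * (-(1 - d_real) * x_real + d_fake * x_fake).mean()
    c -= lr * (-(1 - d_real) + d_fake).mean()

    # Generator's turn: nudge a and b so that D scores the fakes closer to 1.
    z = rng.normal(size=32)
    d_g = sigmoid(w * (a * z + b) + c)
    # Gradient of -log D(G(z)), chained through D and then G.
    a -= lr * (-(1 - d_g) * w * z).mean()
    b -= lr * (-(1 - d_g) * w).mean()

fake = a * rng.normal(size=1000) + b
# If the duel went well, the fakes' mean has drifted toward the real mean (4.0).
print(round(float(fake.mean()), 2))
```

Each loop iteration is one round of the "ongoing duel": D adapts to spot fakes, then G adapts to fool the updated D.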
Applications of GANs:
Autoregressive Models:
Definition:
“Autoregressive Models are statistical models used for
predicting future elements in a sequence by relying on the
previous elements in the same sequence. The model learns patterns
and dependencies from past data to generate or forecast new data
points.”
Explanation:
Autoregressive models predict each element in a sequence based on
the elements that came before it. They work by learning how the
earlier parts of the data relate to later parts, making them useful for
tasks like time series analysis and generating text or other sequential
data.
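A minimal sketch of the idea, assuming a synthetic AR(1) series with a true coefficient of 0.8: fit the coefficient by regressing each value on the previous one, then forecast the next point.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic AR(1) series: each value depends on the previous one.
# x[t] = 0.8 * x[t-1] + noise
n = 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + rng.normal(0.0, 0.1)

# Estimate the coefficient by least squares: regress x[t] on x[t-1].
prev, curr = x[:-1], x[1:]
phi = float(prev @ curr / (prev @ prev))

# One-step-ahead forecast: predict the next element from the last one.
forecast = phi * x[-1]
print(round(phi, 2))  # should land near the true coefficient 0.8
```

The same "predict the next element from the previous ones" principle, scaled up from one coefficient to billions of parameters, is what autoregressive language models do with tokens.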
Examples:
3. AR (Autoregressive) Model:
Explanation:
2. Processing:
What is a Neuron in a Neural Network?
Think of a neuron in a neural network like a tiny decision-maker. It
takes some information (inputs), processes it, and then decides what
the output should be.
Here's how this process works:
Step 1: Inputs and Weights
Imagine you have three pieces of information coming into the neuron.
Let's call these pieces x1, x2, and x3 (like three different features of
data).
Each piece of information has a weight (let’s call them w1, w2, and
w3) that tells the neuron how important that piece of information is.
For example, if w1 is big, it means x1 is very important in deciding the
output, while if w2 is small or negative, x2 is less important or has an
opposite effect.
Step 2: Calculating the Weighted Sum
The neuron first multiplies each input by its corresponding weight.
Then, it adds all these values together, plus a little extra number called
the bias (b), which helps fine-tune the output.
Here's what that looks like as a simple formula:
z = (w1 × x1) + (w2 × x2) + (w3 × x3) + b
Step 3: Applying the Activation Function
The weighted sum z is then passed through an activation function, such
as the sigmoid function:
σ(z) = 1 / (1 + e^(−z))
This formula uses the mathematical constant e (about 2.718).
Step 4: Example Walkthrough
Let's use a concrete example to see this in action:
1. Inputs: x1=2, x2=3, x3=1
2. Weights: w1=0.5, w2=−1, w3=0.3
3. Bias: b=1
Calculating the Weighted Sum:
z = (0.5 × 2) + (−1 × 3) + (0.3 × 1) + 1 = 1 − 3 + 0.3 + 1 = −0.7
Applying the Activation Function:
σ(−0.7) = 1 / (1 + e^0.7) ≈ 1 / (1 + 2.013) ≈ 1 / 3.013 ≈ 0.332
So, the neuron's output, y, is about 0.332. This number represents
how strongly this neuron "fires."
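The walkthrough above can be reproduced in a few lines of numpy (a minimal sketch of a single neuron's forward pass):

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum followed by the sigmoid activation."""
    z = np.dot(w, x) + b             # z = w1*x1 + w2*x2 + w3*x3 + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid squashes z into (0, 1)

x = np.array([2.0, 3.0, 1.0])    # inputs x1, x2, x3
w = np.array([0.5, -1.0, 0.3])   # weights w1, w2, w3
b = 1.0                          # bias
y = neuron(x, w, b)
print(round(float(y), 3))  # 0.332
```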
Why Use an Activation Function?
An activation function makes the neuron’s output non-linear, allowing
it to model complex patterns.
Example of Why We Need This:
Imagine you have data points arranged in two circular shapes (one
inside the other). A linear function (like a straight line) won't be able
to separate them effectively.
However, using a non-linear activation function (like ReLU or
sigmoid) allows the neuron to learn more complex boundaries and
better separate these kinds of patterns.
3. Output:
The result of the activation function becomes the output of the neuron,
which is then passed as input to the next layer's neurons. In a multi-
layer network, this process is repeated across multiple layers to
transform the data progressively and learn complex patterns.
3. Learning Rate:
Think of the learning rate like a teacher’s patience when helping you
learn something new.
If the learning rate is too high, it’s like the teacher is moving too fast
for you to understand, and you might miss some important details. The
neural network might adjust the weights and biases too quickly and
overshoot the best settings, leading to poor performance.
If the learning rate is too low, it’s like the teacher is going too slowly,
and it takes a long time for you to learn. The model makes only tiny
adjustments, and it takes a long time to reach the best solution.
The learning rate helps the neural network find a balance: adjusting
the weights and biases enough to make progress, but not so much that
it overshoots the best settings.
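The trade-off can be seen in a toy example: minimizing the assumed function f(w) = (w − 3)² by gradient descent, where the best setting is w = 3.

```python
def gradient_descent(lr, steps=50, w0=0.0):
    """Minimize f(w) = (w - 3)^2; the best setting is w = 3."""
    w = w0
    for _ in range(steps):
        grad = 2 * (w - 3)  # derivative of (w - 3)^2
        w -= lr * grad      # adjust w by a step scaled by the learning rate
    return w

print(gradient_descent(0.1))    # balanced rate: lands near 3
print(gradient_descent(0.001))  # too low: barely moves in 50 steps
print(gradient_descent(1.1))    # too high: overshoots and diverges
```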
2. Token Embedding:
Explanation: Each word or piece of text in the output sequence is
converted into a dense vector using an embedding layer. This vector
represents the token in a high-dimensional space, where similar words
are positioned close to each other.
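A minimal sketch of an embedding lookup, using a hypothetical five-word vocabulary and a randomly initialized table as a stand-in for learned embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sits": 2, "on": 3, "mat": 4}
d_model = 8  # toy embedding size
embedding_table = rng.normal(size=(len(vocab), d_model))

tokens = ["the", "cat", "sits"]
ids = [vocab[t] for t in tokens]
vectors = embedding_table[ids]  # one dense vector per token
print(vectors.shape)            # (3, 8)
```

In a trained model, the table's values are learned so that similar tokens end up with similar vectors.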
3. Positional Encoding:
Explanation: To give the model information about the position of
each token in the sequence, positional encoding is added to the token
embeddings. This helps the model understand the order of tokens.
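One common choice is the sinusoidal encoding from the original Transformer paper, sketched below; the zero "embeddings" are a stand-in for real token vectors:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings (sin on even dims, cos on odd dims)."""
    pos = np.arange(seq_len)[:, None]  # positions 0..seq_len-1, as a column
    i = np.arange(d_model)[None, :]    # embedding dimensions, as a row
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

seq_len, d_model = 6, 8
embeddings = np.zeros((seq_len, d_model))  # stand-in for token embeddings
encoded = embeddings + positional_encoding(seq_len, d_model)
print(encoded.shape)  # one position-aware vector per token
```

Because each position gets a distinct pattern of sines and cosines, the model can tell the first token from the third even though attention itself is order-blind.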
4. Multi-Head Attention:
Explanation: Multi-head attention allows the decoder to look at
different parts of the input or previous tokens simultaneously. This
means the model can focus on various aspects of the data, capturing
different types of information.
Example: When translating "The cat sits on the mat" into French,
different attention heads might focus on different parts of the sentence
to understand the meaning better and generate a more accurate
translation.
5. Decoder Self-Attention:
Explanation: During the generation of the output sequence, self-
attention helps the decoder look back at the tokens it has already
generated. This ensures that the new token being generated fits well
with the previous ones.
Example: If the model has already translated "Le chat" (The cat) in
French, it will use self-attention to generate the next word in context
with "Le chat."
6. Masking:
Explanation: Masking is used to prevent the model from seeing future
tokens during training. This ensures that predictions are made based
only on the tokens that have been generated so far.
Example: When predicting the next word, masking makes sure the
model can't see words that come after the current one, avoiding
"cheating" and focusing only on the previous context.
9. Sampling:
Explanation: During inference (when generating text), decoding
strategies like greedy decoding, beam search, or probabilistic sampling
choose the next token based on the probabilities provided by the
output layer.
Beam Search:
Definition:
Beam search is a decoding strategy that keeps the k most probable
partial sequences (the "beam") at each step instead of committing to a
single token. At every step, the model calculates the probabilities for
the next word for each of these sequences, extends them, and keeps
only the k highest-scoring candidates.
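A minimal sketch of the mechanics, using a hypothetical hand-written next-token probability table in place of a real model:

```python
import math

# Toy next-token distribution, keyed by the last token only
# (hypothetical numbers, just to make the mechanics concrete).
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.9, "dog": 0.1},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
}

def beam_search(start, steps, k=2):
    beams = [([start], 0.0)]  # (sequence, log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in NEXT[seq[-1]].items():
                candidates.append((seq + [tok], score + math.log(p)))
        # keep only the k most probable partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

best_seq, best_score = beam_search("<s>", steps=3)[0]
print(" ".join(best_seq[1:]))  # a cat sat
```

Greedy decoding would commit to "the" (0.6) at the first step and can never do better than an overall probability of 0.3, while the beam keeps "a" alive and finds "a cat sat" with probability 0.36.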
Tokenization:
When language models like GPT-3 process text, they need to break it
down into smaller parts called tokens. Tokens can be whole words,
parts of words (subwords), or even single characters. Different ways of
breaking down text (tokenization) help the model understand and learn
from the data better. Here's a quick overview of the different types:
1. Word-based Tokenization:
2. Character-based Tokenization:
Breaks down text into individual characters, like "d", "e", "o", etc.
Captures every detail but creates very long sequences, which
may not be meaningful by themselves.
Makes it hard for the model to learn meaningful patterns,
especially in longer, complex words.
3. Subword-based Tokenization:
Example:
Suppose we have data aaabdaaabac which needs to be encoded
(compressed). The byte pair aa occurs most often, so we will
replace it with Z as Z does not occur in our data. So we now have
ZabdZabac where Z = aa. The next common byte pair is ab so let’s
replace it with Y. We now have ZYdZYac where Z = aa and Y = ab.
The remaining byte pair ac appears only once, so we will not
encode it. We can apply byte pair encoding recursively to encode ZY as
X. Our data has now transformed into XdXac where X = ZY, Y = ab,
and Z = aa. It cannot be further compressed as there are no byte
pairs appearing more than once. We decompress the data by
performing replacements in reverse order.
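The worked example above can be reproduced directly. The merge table below lists the replacements in the order the example chooses them (when two pairs are equally frequent, implementations may break the tie differently):

```python
def bpe_encode(data, merges):
    """Apply byte-pair merges in order, left to right, non-overlapping."""
    for pair, symbol in merges:
        data = data.replace(pair, symbol)
    return data

def bpe_decode(data, merges):
    """Undo the merges in reverse order to recover the original text."""
    for pair, symbol in reversed(merges):
        data = data.replace(symbol, pair)
    return data

merges = [("aa", "Z"), ("ab", "Y"), ("ZY", "X")]  # merges from the example
encoded = bpe_encode("aaabdaaabac", merges)
print(encoded)                      # XdXac
print(bpe_decode(encoded, merges))  # aaabdaaabac
```

In a real tokenizer the merge table is learned from a large corpus, and frequent merges end up corresponding to common words and subwords.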
Example:
Facts:
GPT-3 has a vocabulary of 50,257 tokens, and each token is
represented by a vector of 12,288 elements.
This vector representation is called an embedding, an important
concept that helps capture the semantic meaning of the tokens.
Multi-Head Attention Mechanism:
Introduction:
Multi-head attention is an important technique used in machine
learning, especially in tasks where we work with sequences of data,
like translating sentences from one language to another.
Imagine you want to translate the English sentence "The cat sat on the
mat" into French. To do this, a computer uses an encoder-decoder
model that reads the entire English sentence (encoder) and then
generates the French translation word by word (decoder).
Now, when the decoder starts generating the French sentence, it
needs to know which words in the English sentence are most relevant
for translating each French word. This is where the attention
mechanism helps. It allows the decoder to "pay attention" to different
words in the input sentence at different times.
In Simple Terms:
Think of it like having multiple pairs of eyes looking at a picture, each
focusing on a different detail. Together, they provide a clearer and
more complete understanding of the whole picture. Similarly, multi-
head attention helps the model understand the context of the input
better and produce a more accurate output.
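A minimal numpy sketch of the mechanics, with hypothetical randomly initialized projection matrices and the usual final output projection omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def multi_head_attention(x, num_heads):
    """Each head attends over its own slice of the features
    (one 'pair of eyes' per head)."""
    d_model = x.shape[-1]
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Hypothetical per-head projections (randomly initialized here).
        wq, wk, wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        heads.append(attention(x @ wq, x @ wk, x @ wv))
    return np.concatenate(heads, axis=-1)  # combine all heads' views

x = rng.normal(size=(6, 8))  # 6 tokens, 8 features each
out = multi_head_attention(x, num_heads=2)
print(out.shape)             # (6, 8)
```

Because each head has its own Q, K, and V projections, the heads can learn to focus on different relationships in the same sentence, which is exactly the "multiple pairs of eyes" intuition above.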
3. Prompt-based Learning:
GPT-3 can also learn from "prompts." Prompts are like specific
instructions or examples given to the model to guide its response. For
example, if you want the model to write a poem, you might provide a
starting sentence or a specific format. This helps guide the model's
output to be more relevant to what you want.
Self-Supervised Learning:
What is Self-Supervised Learning?
Self-supervised learning is a way for computers to learn from data
without needing humans to provide labels or answers. Instead, the
model creates its own learning signals from the data itself. The main
goal is for the model to learn useful patterns or features in the data,
which it can later use to perform specific tasks.
Example (Training):
This analogy is a great way to illustrate the concepts of Supervised Fine-
Tuning (SFT), Reward Model Training (RM), and Reinforcement Learning
Fine-Tuning (RL). Here's a quick breakdown of how each step relates to
the process of teaching a pet to fetch a ball:
1. Supervised Fine-Tuning (SFT): This is like the initial training
phase where you demonstrate the desired behavior to your pet. By
showing them how to fetch the ball, you're providing clear examples
of the actions you want them to learn. The pet practices these
actions and improves its performance based on the demonstrations.
2. Reward Model Training (RM): In this phase, you get feedback
from others on your pet's fetching behavior. This feedback helps
you understand which aspects of the behavior are more desirable.
By analyzing the feedback, you create a model of what fetching
behavior is preferred, which guides how you’ll further train your pet.
3. Reinforcement Learning Fine-Tuning (RL): Using the reward
model, you refine your pet’s fetching behavior by providing rewards
(treats or praise) based on how well the pet’s actions align with the
preferred behavior. This reinforcement helps your pet learn to
adjust its actions to maximize rewards, improving its fetching
performance over time.
This step-by-step approach effectively combines demonstrations,
feedback, and rewards to teach and refine complex behaviors.
Prompt Engineering:
Definition:
“Prompt engineering is the process of designing and refining
prompts to effectively communicate with AI models, like GPT, to get
desired outputs. A prompt is the input or instruction you provide to an
AI to generate a specific response.”
Explanation:
Imagine the AI as a talented chef who can cook any dish, but you must
give clear instructions on what you want. If you just say "food," the
chef might make anything. But if you say, "I want a spicy vegetable
curry with a hint of coconut," the chef knows exactly what to make.
Similarly, with prompt engineering, you craft specific instructions for
the AI. For example:
Basic Prompt: "Tell me about Python."
Refined Prompt: "Explain Python programming language for
beginners, including its uses and main features."
By refining prompts, you guide the AI to give more accurate and
relevant answers. It’s about figuring out the best way to ask the AI to
get the results you want.
1. Structured Text:
Structured text prompts are organized in a clear format, often using
bullet points, numbered lists, or sections to specify what you want from
the AI. This approach helps ensure that the response is detailed and
covers all the necessary points.
2. Flipped Approach:
The flipped approach starts with the result or desired output and works
backward to create a prompt that will lead to that specific response.
This method helps clarify what you want by envisioning the final
answer first. For example, if you want a story that ends with a
surprising twist, you might prompt: “Write a short story that starts with
a peaceful family dinner but ends with a surprising twist.”
Example:
Question: "What is a fun way to celebrate a birthday?"
Step-by-Step Process:
1. Ask the Question Repeatedly:
o 1st Attempt: "A fun way to celebrate a birthday is to have
a surprise party with friends and family."
o 2nd Attempt: "Hosting a surprise party with friends and
family is a fun way to celebrate a birthday."
o 3rd Attempt: "Celebrating with a surprise party involving
friends and family can be very enjoyable."
o 4th Attempt: "A birthday can be celebrated in a fun way
by organizing a surprise party with loved ones."
o 5th Attempt: "One of the best ways to celebrate a
birthday is by throwing a surprise party with your friends
and family."
2. Collect Answers:
o Most answers mention having a surprise party with friends
and family.
3. Vote for the Best Answer:
o The common theme across answers is celebrating with a
surprise party involving friends and family.
4. Final Answer:
o "A fun way to celebrate a birthday is to have a surprise
party with friends and family."
Why This Works: By asking the question multiple times and getting
similar answers about a surprise party, you can confidently conclude
that this is a popular and effective way to celebrate a birthday. This
method helps you avoid random or less common suggestions and
focuses on the most consistently recommended idea.
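The voting step can be sketched with a simple majority count; the answer strings below are hypothetical stand-ins for the model's repeated attempts:

```python
from collections import Counter

# Hypothetical answers collected by asking the same question five times.
answers = [
    "surprise party with friends and family",
    "surprise party with friends and family",
    "surprise party with friends and family",
    "picnic in the park",
    "surprise party with friends and family",
]

# Vote: the most common answer wins (the self-consistency idea).
best, votes = Counter(answers).most_common(1)[0]
print(best, votes)
```

In practice the answers rarely match word for word, so real implementations normalize or cluster them before counting votes.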