DL 5
A generative model is a type of machine learning model that can generate new data that looks like the data it was
trained on.
Imagine you give a model lots of pictures of cats, and later it can create new pictures of cats that are completely
original — that's what a generative model does.
It learns the patterns and structure of the training data, and then creates new samples that follow those
patterns.
Discriminative vs. generative models:
Goal of a discriminative model: classify data (e.g., is it a cat or a dog?)
Goal of a generative model: generate data (e.g., create new cat images)
A deep generative model is a generative model built using deep learning techniques (neural networks with many
layers).
They can create things like:
Realistic images
Natural-sounding speech
Human-like text
Even videos
They are called generative because they can generate new examples that look like the training data.
For example:
If trained on handwritten digits → it can create new digits like “2” or “7” that look handwritten.
Many deep generative models work in two steps:
1. Noise Input: Start from a set of random numbers (a latent vector).
2. Generator Network: Transforms that noise into something meaningful (like an image).
The main types of deep generative models covered below are Boltzmann Machines, Deep Belief Networks (DBNs), and Generative Adversarial Networks (GANs).
🔷 Simple Example:
Let’s say you train a deep generative model on images of handwritten digits (like the MNIST dataset).
After training:
The model generates a new image that looks like a handwritten “4” or “8” — even though it’s not copied from
the training data.
Another example is text-to-image generation: creating images from text descriptions (e.g., "a cat riding a bike").
A Boltzmann Machine is a type of neural network used in deep learning. It is designed to learn patterns and
relationships in data without needing labels (unsupervised learning).
Key Points:
It is made up of nodes (also called units or neurons). Each node can be either ON (1) or OFF (0).
Every node is connected to every other node. These connections have weights that determine how strongly nodes influence each other.
There are two types of nodes:
Visible nodes: These are the input data you can observe.
Hidden nodes: These are not directly seen but help the machine learn patterns in the data.
The network is called “stochastic” because the nodes make random (probabilistic) decisions about whether to turn ON or OFF.
The goal is to find patterns or features in the data by adjusting the connection weights.
The Boltzmann Machine tries to find a balance (like thermal equilibrium in physics) where the overall “energy” of the network is minimized. Lower energy means the network has learned good patterns from the data.
It does this by repeatedly turning nodes ON or OFF and updating the weights based on how well the network represents the input data.
The learning process is slow but helps the machine discover interesting features in the data.
Simple Example:
Suppose you have a dataset of black-and-white images (each pixel is either black or white). You feed these images into
the visible nodes. The hidden nodes try to learn patterns, such as “vertical lines” or “corners.” After learning, the
Boltzmann Machine can generate new images that follow the same patterns as your original images.
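To make the “energy” idea above concrete, here is a minimal Python (NumPy) sketch of a tiny Boltzmann Machine. The network size, weights, and biases are made-up values for illustration only; the sketch just shows how the energy of a state is computed and how units flip ON or OFF stochastically so that low-energy states become more likely.

import numpy as np

# Tiny Boltzmann Machine: 4 fully connected binary units (made-up example).
rng = np.random.default_rng(0)
W = rng.normal(0, 0.5, size=(4, 4))
W = (W + W.T) / 2            # connections are symmetric
np.fill_diagonal(W, 0.0)     # no unit connects to itself
b = rng.normal(0, 0.1, size=4)

def energy(s):
    # Lower energy = a state the network "likes" (fits the learned patterns).
    return -0.5 * s @ W @ s - b @ s

def gibbs_step(s):
    # Each unit turns ON with a probability that depends on its neighbours.
    for i in range(len(s)):
        p_on = 1.0 / (1.0 + np.exp(-(W[i] @ s + b[i])))   # sigmoid
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

s = rng.integers(0, 2, size=4).astype(float)   # random starting state
print("start   energy:", round(energy(s), 3))
for _ in range(20):                            # let the network settle
    s = gibbs_step(s)
print("settled energy:", round(energy(s), 3))

Training (not shown here) would additionally adjust W and b so that states matching the training data end up with low energy.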
It keeps adjusting itself (its connections and weights) until it assigns the lowest energy to the patterns it sees most often in your data.
Why is it useful? Because it can:
Learn hidden patterns in the data (like what's common in your examples)
Be used in recommendation systems, feature extraction, and as the base for Deep Belief Networks
Applications:
Example: movie recommendations. Suppose you have data about which users like which movies. The Boltzmann Machine learns patterns in this data (e.g., users who like sci-fi also like action). Then it can predict what movies a new user might like, even from only a few known preferences.
A Deep Belief Network (DBN) is a deep generative model made by stacking multiple Restricted Boltzmann Machines
(RBMs) on top of each other.
It learns to recognize patterns and generate data similar to what it was trained on — like handwritten digits, images, etc.
Each RBM learns features and passes them to the next layer:
First RBM learns simple, low-level features (like edges or strokes)
Second RBM combines those into higher-level features (like shapes or object parts)
Third RBM learns even more abstract features (like object identity)
A DBN can be used for both generative tasks (generating new data) and discriminative tasks (classification).
Suppose you train a DBN on the MNIST dataset (images of digits 0–9):
1. The first RBM learns simple pixel patterns (like strokes).
2. The next RBM learns larger shapes and digit parts from those patterns.
3. The final layer ties these features to the digit classes.
4. You can now use this model to recognize digits or generate new digit images.
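As a rough illustration of this layer-by-layer idea, the sketch below stacks two RBMs using scikit-learn's BernoulliRBM on the small 8x8 digits dataset (used here as a stand-in for MNIST). The layer sizes and hyperparameters are arbitrary choices, and only the feature-learning and classification side is shown; BernoulliRBM does not provide a built-in way to generate new images.

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM

X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1] so they behave like probabilities
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stack two RBMs: each one is trained on the hidden activations of the previous.
rbm1 = BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=20, random_state=0)
rbm2 = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)

h1_train = rbm1.fit_transform(X_train)   # first RBM: pixels -> simple features
h2_train = rbm2.fit_transform(h1_train)  # second RBM: simple -> more abstract features

# Discriminative use: a classifier on top of the learned features.
clf = LogisticRegression(max_iter=1000).fit(h2_train, y_train)
h2_test = rbm2.transform(rbm1.transform(X_test))
print("test accuracy:", clf.score(h2_test, y_test))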
🔷 Applications of DBNs
Image recognition
Speech recognition
Data generation
Dimensionality reduction
A Generative Adversarial Network (GAN) is a special type of deep learning model used to create new, realistic data that
looks similar to the data it was trained on. GANs are famous for generating images, but they can also create music, text,
and more.
Generator: This network tries to create fake data that looks real. For example, it might try to make a fake photo
that looks like a real person.
Discriminator: This network tries to tell the difference between real data (from the training set) and fake data
(from the generator).
The generator tries to fool the discriminator by making better and better fake data.
The discriminator tries to get better at spotting which data is fake and which is real.
This process continues until the generator gets so good that the discriminator can no longer easily tell the difference
between real and fake data.
Simple Example
Suppose you want a GAN to generate photos of cats:
The generator turns random noise into a fake cat image.
The discriminator looks at both real cat photos and the fake image and tries to guess which is which.
Over time, the generator learns to create very realistic cat photos, even though those exact photos never existed before.
In Simple Words:
A GAN is like a forger (generator) and a detective (discriminator) competing. The forger tries to make fake art that fools
the detective, and the detective tries to catch the fakes. Over time, both get better at their jobs, and the forger ends up
making very convincing art.
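A minimal PyTorch sketch of this forger-versus-detective loop is shown below. To keep it self-contained, the "real data" here is just samples from a 1-D Gaussian instead of images, and all layer sizes, learning rates, and step counts are arbitrary illustrative choices.

import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0      # "real" data: mean 3.0, std 0.5
    fake = generator(torch.randn(64, 8))       # generator turns noise into samples

    # Step 1: train the discriminator (the detective): real -> 1, fake -> 0.
    d_loss = (bce(discriminator(real), torch.ones(64, 1))
              + bce(discriminator(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Step 2: train the generator (the forger): try to make the detective say "real".
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

samples = generator(torch.randn(1000, 8)).detach()
print("generated mean:", samples.mean().item(), "std:", samples.std().item())

In a real image GAN the two networks are larger and the data are images, but the alternating two-step update is the same idea.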
Discriminator Network (in GANs) Explained Simply
A discriminator network is one of the two main parts of a Generative Adversarial Network (GAN). Its job is to tell the
difference between real data (from the training set) and fake data (created by the generator network).
It uses deep learning techniques to analyze the data and decide if it is real or fake.
The output is usually a single value: closer to 1 means "real," closer to 0 means "fake."
It acts like a judge in a contest, constantly improving its ability to spot fake data as training continues.
It provides feedback to the generator, helping it get better at making realistic data.
The competition between the generator and discriminator pushes both networks to improve, leading to more convincing fake data over time.
Example:
If you are training a GAN to generate images of handwritten digits:
The discriminator looks at a mix of real digit images and fake ones from the generator.
The better the discriminator gets, the harder the generator has to work to fool it.
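For example, a very simple PyTorch discriminator for 28x28 grayscale digit images could look like the sketch below (the layer sizes are illustrative assumptions, not prescribed by the text):

import torch.nn as nn

# Takes a 28x28 image and outputs one number between 0 and 1:
# close to 1 means "looks real", close to 0 means "looks fake".
discriminator = nn.Sequential(
    nn.Flatten(),            # 28x28 pixels -> 784 numbers
    nn.Linear(784, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)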
Generator Network (in GANs) Explained Simply
A generator network is one of the two main parts of a Generative Adversarial Network (GAN). Its main job is to create
new data that looks as real as possible, based on what it has learned from real data.
The generator starts with random input, often called “noise” (just a bunch of random numbers).
It uses a neural network to transform this noise into a data sample, such as an image, that tries to look like it came from the real dataset.
The generator sends its fake data to the discriminator, which tries to tell if it’s real or fake.
If the discriminator easily spots the fake, the generator gets feedback and learns to improve.
Over many rounds, the generator gets better at making data that can fool the discriminator.
Example:
If you want a GAN to generate pictures of faces:
The generator takes random noise and tries to turn it into a picture that looks like a real face.
The generator keeps learning and gets better at making faces that look real.
Key Points:
The generator’s goal is to make fake data that is so realistic, the discriminator can’t tell it’s fake.
It learns by getting feedback from the discriminator and improving its own network.
Over time, the generator can produce very convincing data, like realistic images, sounds, or text.
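A matching PyTorch generator sketch, which turns 100 random numbers into a 28x28 image, might look like this (again, the sizes are illustrative assumptions):

import torch
import torch.nn as nn

latent_dim = 100                      # length of the random "noise" vector
generator = nn.Sequential(
    nn.Linear(latent_dim, 256),
    nn.ReLU(),
    nn.Linear(256, 784),
    nn.Tanh(),                        # pixel values scaled to [-1, 1]
    nn.Unflatten(1, (1, 28, 28)),     # reshape 784 numbers into a 1x28x28 image
)

noise = torch.randn(16, latent_dim)   # 16 random noise vectors
fake_images = generator(noise)
print(fake_images.shape)              # torch.Size([16, 1, 28, 28])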
There are several types of GANs, each designed for different tasks or improvements over the basic model. Here are some
of the most important and commonly used types:
1. Vanilla GAN
It uses basic neural networks for both the generator and discriminator.
It is mainly used for understanding and experimenting with the basic GAN concept.
2. Conditional GAN (cGAN)
In a conditional GAN, both the generator and discriminator receive extra information, like labels or specific conditions.
This allows the GAN to generate data that matches a certain condition, such as creating images of a specific type (e.g., only cats or only dogs).
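One common way to pass in that extra information is to embed the label and concatenate it with the noise vector, as in the rough PyTorch sketch below (the embedding size and layer widths are illustrative assumptions):

import torch
import torch.nn as nn

latent_dim, n_classes = 100, 10
label_embed = nn.Embedding(n_classes, 16)      # turn a class label into 16 numbers
generator = nn.Sequential(
    nn.Linear(latent_dim + 16, 256),
    nn.ReLU(),
    nn.Linear(256, 784),
    nn.Tanh(),
)

noise = torch.randn(8, latent_dim)
labels = torch.randint(0, n_classes, (8,))     # e.g. "generate these 8 digit classes"
conditioned_input = torch.cat([noise, label_embed(labels)], dim=1)
fake = generator(conditioned_input)
print(fake.shape)                              # 8 generated 784-pixel images, one per label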
3. Deep Convolutional GAN (DCGAN)
DCGAN uses convolutional neural networks (ConvNets), which are especially good for image data.
These networks help the GAN generate more realistic and higher-quality images.
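As a rough sketch of what "convolutional" means here, a DCGAN-style generator can use transposed convolutions to upsample noise into an image, as below (the exact layer configuration is an illustrative assumption, not the original DCGAN architecture):

import torch
import torch.nn as nn

# Upsamples a 100-number noise vector (shaped 100x1x1) into a 1x28x28 image.
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 128, kernel_size=7, stride=1, padding=0),  # 1x1 -> 7x7
    nn.BatchNorm2d(128),
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 7x7 -> 14x14
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),     # 14x14 -> 28x28
    nn.Tanh(),
)

noise = torch.randn(8, 100, 1, 1)
print(generator(noise).shape)    # torch.Size([8, 1, 28, 28])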
4. Super-Resolution GAN (SRGAN)
SRGAN is designed to take low-resolution images and generate high-resolution, detailed versions of them.
It is useful for improving the quality of images, such as making blurry images clearer.
5. CycleGAN
CycleGAN is used to transform images from one style to another without needing paired examples.
For example, it can turn photos of horses into photos of zebras, or summer landscapes into winter ones.
6. StyleGAN
StyleGAN is designed for generating very high-quality, realistic images, such as faces that look real but do not
belong to any real person.
It allows for control over the “style” of the generated images, such as age, hair color, or facial expression.
7. Laplacian Pyramid GAN (LAPGAN)
LAPGAN uses a pyramid structure to generate images in stages, starting from low resolution and adding more details at each stage.
Generative Adversarial Networks (GANs) have a wide range of real-world applications across many fields. Here are some
of the most important and common uses:
1. Image Generation
GANs can create realistic images from scratch, including human faces, animals, objects, and scenes that do not
exist in reality.
2. Image-to-Image Translation
They can convert images from one style or domain to another, such as turning sketches into colored photos,
black-and-white images into color, or day scenes into night scenes.
3. Text-to-Image Translation
GANs can generate images based on text descriptions, such as creating a picture of “a red bird sitting on a
branch” from that sentence.
4. Super Resolution
GANs can take low-resolution images and generate high-resolution versions, making blurry or pixelated photos
clearer and sharper.
5. Video Generation and Prediction
They can predict future frames in a video or even generate entirely new video sequences, which is useful in animation, video editing, and surveillance.
6. 3D Object Generation
GANs can create 3D models of objects from images or sketches, useful in design, gaming, and virtual reality.
7. Data Augmentation
They can generate extra training data for machine learning models, especially when real data is limited, such as
creating synthetic medical images or rare event samples for fraud detection.
8. Face Editing and Manipulation
GANs are used for face aging, changing facial expressions, swapping faces, or editing features in photos.
9. Style Transfer
They can apply the artistic style of one image (like a painting) to another image, such as making a photo look like
it was painted by Van Gogh.
10. Speech and Music Generation
GANs can generate lifelike speech from text (text-to-speech), create new music, or synthesize realistic sound effects.
11. Image Inpainting
Filling in missing or damaged parts of an image, such as restoring old photos or removing unwanted objects from pictures.
12. Deepfakes
GANs can generate highly realistic fake videos or images, such as making someone appear to say or do something they never did. This has both creative and ethical implications.
13. Design and Prototyping
Used in industrial design, interior design, clothing, and product prototyping by generating new design ideas and visualizations.