UNIT IV
APPLICATIONS OF GENERATIVE AI
Syllabus: Image generation and manipulation – Text generation and natural language
processing – Anomaly detection and data augmentation – Style transfer and artistic applications
– Real-world use cases (Art & Design, Medical Imaging, Content creation, Chatbots, Virtual
Assistants, Cybersecurity, etc.) and industry examples. Guest lectures by industry experts and
researchers.
Image Generation and Manipulation
Image generation and manipulation involve using generative models like Generative
Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models to
create or edit visual content. GANs, for instance, are widely used in high-resolution image
synthesis, style transfer, and editing tasks. Applications include creating photorealistic images,
modifying facial features, or generating artistic variations of existing artwork. Techniques like
DeepDream and Neural Style Transfer enable blending artistic styles into photographs. In
practical scenarios, tools such as DALL·E, MidJourney, and RunwayML demonstrate the
capability of AI in generating unique and customized visual content. Manipulation tasks also
extend to object removal, inpainting (filling missing parts of an image), and image super-
resolution, which have applications in fields such as design, forensics, and entertainment.
Image generative AI models are a subset of generative models that aim to create realistic and
coherent images from scratch. These models use complex algorithms and deep learning
techniques to learn patterns and features from a vast amount of training data. They can be
broadly classified into two categories: Variational Autoencoders (VAEs) and Generative
Adversarial Networks (GANs).
Variational Autoencoders (VAEs): VAEs are probabilistic models that encode images into a
latent space, where they are represented as vectors. The decoder then reconstructs the images
from the encoded vectors, enabling the model to generate new images by sampling from the
latent space.
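The sampling step described above is usually implemented with the "reparameterization trick": the encoder outputs a mean and a log-variance, and a latent vector is drawn as mu + sigma * eps. A minimal NumPy sketch (the shapes and the toy encoder outputs are illustrative assumptions, not from a specific library):

```python
import numpy as np

def reparameterize(mu, log_var, rng=None):
    """Sample z = mu + sigma * eps, where eps ~ N(0, I).

    mu, log_var: arrays of shape (batch, latent_dim) produced by the encoder.
    """
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(mu.shape)
    sigma = np.exp(0.5 * log_var)          # log-variance -> standard deviation
    return mu + sigma * eps

# Toy "encoder output" for a batch of 2 images and a 4-dimensional latent space
mu = np.zeros((2, 4))
log_var = np.zeros((2, 4))                 # sigma = 1 everywhere
z = reparameterize(mu, log_var, rng=0)     # latent vectors a decoder would map to images
```

Generating a new image then amounts to sampling z from the prior N(0, I) and passing it through the decoder.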
Generative Adversarial Networks (GANs): GANs consist of two neural networks, a generator,
and a discriminator, engaged in a competitive process. The generator creates synthetic images
to fool the discriminator, which, in turn, aims to distinguish between real and fake images. This
back-and-forth battle results in the generation of highly realistic images.
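The "battle" is formalized as two opposing losses. A small sketch of the standard objective, using made-up discriminator scores in place of real networks (the non-saturating generator loss shown is the common practical variant):

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator loss: push D(x) -> 1 on real images, D(G(z)) -> 0 on fakes."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    """Generator loss (non-saturating form): push D(G(z)) -> 1."""
    return -math.log(d_fake)

# Toy scores: the discriminator is fairly confident the real image is real (0.9)
# and fairly confident the generated image is fake (0.2).
print(d_loss(0.9, 0.2))  # small: the discriminator is currently winning
print(g_loss(0.2))       # large: the generator rarely fools D yet
```

Training alternates gradient steps on these two losses; equilibrium is reached when the discriminator can no longer tell real from generated samples.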
2. Text-to-Image Synthesis
Text-to-image synthesis, a fascinating application of image generative AI models, enables the
conversion of textual descriptions into corresponding visual representations. This process
involves combining natural language processing (NLP) techniques with image generative
models to achieve impressive results.
Conditional GANs for Text-to-Image: Researchers have successfully integrated text
embeddings with GANs, allowing for the generation of images conditioned on specific textual
input. This means that by providing a detailed textual description, the model can generate
highly specific and accurate images.
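Mechanically, conditioning is often as simple as feeding the generator a text embedding alongside the noise vector. A sketch with assumed toy dimensions (100-dim noise, 256-dim sentence embedding):

```python
import numpy as np

def conditional_generator_input(z, text_embedding):
    """Condition a GAN generator by concatenating the noise vector with a
    text embedding; the generator then maps this joint vector to an image."""
    return np.concatenate([z, text_embedding], axis=-1)

z = np.random.default_rng(0).standard_normal((1, 100))   # noise vector
t = np.ones((1, 256))                                     # e.g. a sentence embedding
g_in = conditional_generator_input(z, t)
print(g_in.shape)  # (1, 356)
```

The discriminator is typically conditioned the same way, so it judges not only realism but also whether the image matches the description.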
AD2602-Generative AI UNIT IV
Natural language generation (NLG) is the ability of a machine to produce human-like text or
speech that is clear, concise and engaging. It involves tasks like text summarization,
storytelling, dialogue systems and speech synthesis. NLG helps machines generate meaningful
and coherent responses in a way that is easily understood by humans.
Natural language understanding (NLU) focuses on understanding human language, while NLG
focuses on generating human-like language. Both are crucial for building advanced NLP
applications that can effectively communicate with humans in a natural and meaningful way.
Text generation techniques
Statistical models: These models typically use a large dataset of text to learn the patterns and
structures of human language, and then use this knowledge to generate new text. Statistical
models can be effective at generating text that is similar to the training data, but they can
struggle to generate text that is both creative and diverse. N-gram models and conditional
random fields (CRF) are popular statistical models.
N-gram models: These are statistical models built on the n-gram language model, which
predicts the probability of the next item in a sequence given the preceding n − 1 items.
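For example, a bigram model (n = 2) estimates the probability of the next word from counts of adjacent word pairs. A minimal sketch over a toy corpus:

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Estimate P(next | current) from bigram counts."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(model["the"])   # 'cat' has probability 2/3, 'mat' 1/3
```

Text is then generated by repeatedly sampling the next word from these conditional distributions, which is why such models mimic the training data well but rarely produce genuinely novel phrasing.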
Conditional random fields (CRFs): These are a type of statistical model that uses a
probabilistic graphical model to model the dependencies between words in a sentence. CRFs
can be effective at generating text that is both coherent and contextually appropriate, but this
type of text generation model can be computationally expensive to train and might not perform
well on tasks that require a high degree of creative language generation.
Neural networks: These are machine learning algorithms that use artificial neural networks to
identify data patterns. Through APIs, developers can tap into pretrained models for creative
and diverse text generation, closely mirroring the training data's complexity. The quality of the
generated text heavily relies on the training data. However, these networks demand significant
computational resources and extensive data for optimal performance.
Recurrent neural networks (RNNs): These are a foundational type of neural network
optimized for processing sequential data, such as word sequences in sentences or paragraphs.
They excel in tasks that require understanding sequences, making them useful in the early
stages of developing large language models (LLMs). However, RNNs face challenges with
long-term dependencies across extended texts, a limitation stemming from their sequential
processing nature. As information progresses through the network, early input influence
diminishes, leading to the "vanishing gradient" problem during backpropagation, where
updates shrink and hinder the model's ability to maintain long-sequence connections.
Incorporating techniques from reinforcement learning can offer strategies to mitigate these
issues, providing alternative learning paradigms to strengthen sequence memory and decision-
making processes in these networks.
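The vanishing-gradient effect can be seen in miniature: backpropagation through T time steps multiplies the gradient by a per-step factor, so any factor below 1 decays exponentially. The factor 0.5 below is an illustrative stand-in for the per-step Jacobian norm of a simple RNN:

```python
# Backpropagating through T steps multiplies gradients by a per-step factor;
# if that factor is below 1, the product shrinks exponentially, so early
# inputs barely influence the loss by the time the sequence ends.
factor = 0.5
for T in (5, 20, 50):
    print(T, factor ** T)   # 0.03125, ~9.5e-07, ~8.9e-16
```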
Long short-term memory networks (LSTMs): This is a type of neural network that uses a
memory cell to store and access information over long periods of time. LSTMs can be effective
at handling long-term dependencies, such as the relationships between sentences in a
document, and can generate text that is both coherent and contextually appropriate.
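The memory cell is updated additively through gates, which is what lets gradients survive over long spans. A single LSTM step sketched in NumPy (the weight layout and toy shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h, c, W, b):
    """One LSTM step. c is the memory cell that carries information across
    long spans; gates decide what to forget, write and read.
    W: (input_dim + hidden_dim, 4 * hidden_dim), b: (4 * hidden_dim,)."""
    H = h.shape[-1]
    gates = np.concatenate([x, h], axis=-1) @ W + b
    f = sigmoid(gates[..., :H])            # forget gate
    i = sigmoid(gates[..., H:2 * H])       # input gate
    o = sigmoid(gates[..., 2 * H:3 * H])   # output gate
    g = np.tanh(gates[..., 3 * H:])        # candidate cell update
    c_new = f * c + i * g                  # additive path -> gradients survive longer
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
x, h, c = rng.standard_normal((3, 8)), np.zeros((3, 16)), np.zeros((3, 16))
W, b = rng.standard_normal((24, 64)) * 0.1, np.zeros(64)
h, c = lstm_cell(x, h, c, W, b)
print(h.shape, c.shape)  # (3, 16) (3, 16)
```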
3
AD2602-Generative AI UNIT IV
Transformer-based models: These models are a type of neural network that uses self-attention
mechanisms to process sequential data. Transformer-based models can be effective at
generating text that is both creative and diverse, as they can learn complex patterns and
structures in the training data and generate new text that is similar to the training data. Unlike
historical approaches such as RNNs and LSTMs, transformer-based models have the distinct
advantage of processing data in parallel, rather than sequentially. This allows for more efficient
handling of long-term dependencies across large datasets, making these models especially
powerful for natural language processing applications such as machine translation and text
summarization.
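The self-attention mechanism behind this parallelism can be sketched in a few lines: every position attends to every other position in a single matrix multiply. The projection sizes below are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: all positions are compared at once,
    which is why transformers process a sequence in parallel rather than
    step by step like an RNN."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))             # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)                             # (5, 8)
```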
Generative pretrained transformer (GPT): GPT is a transformer-based model that is trained
on a large dataset of text to generate human-like text. GPT can be effective at generating text
that is both creative and diverse, as it can learn complex patterns and structures in the training
data and generate new text that is similar to the training data.
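GPT generates autoregressively: it repeatedly predicts the next token from everything produced so far and appends it to the context. The loop can be sketched with a toy lookup table standing in for the trained model:

```python
# Toy "next-token predictor"; a real GPT returns a probability distribution
# over the whole vocabulary instead of a single lookup.
next_token = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}

def generate(prompt, steps):
    """Greedy autoregressive decoding: extend the context one token at a time."""
    tokens = prompt.split()
    for _ in range(steps):
        tokens.append(next_token.get(tokens[-1], "<eos>"))
        if tokens[-1] == "<eos>":
            break
    return " ".join(tokens)

print(generate("the", 4))  # the cat sat on the
```

Real decoders replace the lookup with a forward pass of the transformer and sample from the predicted distribution (greedy, top-k, nucleus sampling, etc.).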
Bidirectional encoder representations from transformers (BERT): BERT is a transformer-
based model that is trained on a large dataset of text to produce bidirectional representations
of words. That means it evaluates the context of each word using the words both before and
after it in a sentence. This comprehensive context awareness allows BERT to achieve a nuanced
understanding of language, resulting in highly accurate and coherent representations. This
bidirectional approach is a key distinction that enhances BERT's performance in applications
requiring deep language comprehension, such as question answering and named entity
recognition (NER), by providing a fuller context compared to unidirectional models.
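The difference between unidirectional and bidirectional context comes down to the attention mask. A sketch of the two mask shapes (1 means a token may attend to that position):

```python
import numpy as np

def attention_mask(seq_len, bidirectional):
    """GPT-style models use a causal (lower-triangular) mask so each token
    sees only earlier positions; BERT-style models let every token attend
    in both directions."""
    if bidirectional:
        return np.ones((seq_len, seq_len), dtype=int)
    return np.tril(np.ones((seq_len, seq_len), dtype=int))

print(attention_mask(3, bidirectional=False))
# [[1 0 0]
#  [1 1 0]
#  [1 1 1]]
```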
Thus, text generation techniques, especially those implemented in Python, have revolutionized
the way we approach generative AI in the English language and beyond. Using trained models
from platforms like Hugging Face, developers and data scientists can access a plethora of open
source tools and resources that facilitate the creation of sophisticated text generation
applications. Python, being at the forefront of AI and data science, offers libraries that simplify
interacting with these models, allowing for customization through prefix or template
adjustments, and the manipulation of text data for various applications. Furthermore, the use
of metrics and benchmarks to evaluate model performance, along with advanced decoding
strategies, ensures that the generated text meets high standards of coherence and relevance.
Examples of text generation
Text generation is a versatile tool that has a wide range of applications in various domains.
Here are some examples of text generation applications:
• Blog posts and articles:
Text generation can automatically produce blog posts and articles for websites and blogs,
creating unique and engaging content tailored to the reader's interests and preferences.
• News articles and reports:
It can also draft news articles and reports for newspapers, magazines and other media outlets,
producing timely and accurate content.
• Although VGG-19 is known for its simplicity, it has a large number of parameters,
owing to the fully connected layers.
• There are approximately 143.7 million trainable parameters in total.
Pre-trained Model:
• VGG-19 is a popular pre-trained model for a variety of computer vision tasks.
Researchers pre-trained it on large datasets such as ImageNet, allowing it to capture a
diverse set of features from various categories.
The neural style transfer paper uses feature maps generated by intermediate layers of the
VGG-19 network to generate the output image. This architecture takes style and content images
as input and stores the features extracted by the convolution layers of the VGG network.
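In neural style transfer, the style of an image is summarized by the Gram matrix of a layer's feature maps, i.e. the correlations between channels. A sketch with toy activation shapes standing in for real VGG-19 outputs:

```python
import numpy as np

def gram_matrix(features):
    """Style representation used in neural style transfer: channel-by-channel
    correlations of a layer's feature maps. features: (channels, height, width)."""
    C = features.shape[0]
    F = features.reshape(C, -1)            # flatten each feature map
    return F @ F.T / F.shape[1]            # (C, C), normalized by map size

fmap = np.random.default_rng(0).standard_normal((64, 32, 32))  # toy VGG activations
G = gram_matrix(fmap)
print(G.shape)  # (64, 64)
```

The style loss compares Gram matrices of the generated and style images across several VGG layers, while the content loss compares raw feature maps at a deeper layer.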