
7CS5-13 Generative AI

Unit 3
Generative Models for Natural Language Processing

Lecture 1

Generative AI is a type of artificial intelligence technology that can produce various types of
content, including text, imagery, audio and synthetic data. The recent buzz around generative AI
has been driven by the simplicity of new user interfaces for creating high-quality text, graphics
and videos in a matter of seconds.

The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s
in chatbots. But it was not until 2014, with the introduction of generative adversarial networks,
or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly
authentic images, videos and audio of real people.

On the one hand, this newfound capability has opened up opportunities that include better movie
dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally
forged images or videos -- and harmful cybersecurity attacks on businesses, including nefarious
requests that realistically mimic an employee's boss.

Two additional recent advances that will be discussed in more detail below have played a critical
part in generative AI going mainstream: transformers and the breakthrough language models
they enabled. Transformers are a type of machine learning architecture that made it possible for researchers
to train ever-larger models without having to label all of the data in advance. New models could
thus be trained on billions of pages of text, resulting in answers with more depth. In addition,
transformers unlocked a new notion called attention that enabled models to track the connections
between words across pages, chapters and books rather than just in individual sentences. And not
just words: Transformers could also use their ability to track connections to analyze code,
proteins, chemicals and DNA.

The rapid advances in so-called large language models (LLMs) -- i.e., models with billions or
even trillions of parameters -- have opened a new era in which generative AI models can write
engaging text, paint photorealistic images and even create somewhat entertaining sitcoms on the
fly. Moreover, innovations in multimodal AI enable teams to generate content across multiple
types of media, including text, graphics and video. This is the basis for tools like Dall-E that
automatically create images from a text description or generate text captions from images.

These breakthroughs notwithstanding, we are still in the early days of using generative AI to
create readable text and photorealistic stylized graphics. Early implementations have had issues
with accuracy and bias, as well as being prone to hallucinations and spitting back weird answers.
Still, progress thus far indicates that the inherent capabilities of generative AI could
fundamentally change enterprise technology and how businesses operate. Going forward, this
technology could help write code, design new drugs, develop products, redesign business
processes and transform supply chains.

How does generative AI work?

Generative AI starts with a prompt that could be in the form of a text, an image, a video, a
design, musical notes, or any input that the AI system can process. Various AI algorithms then
return new content in response to the prompt. Content can include essays, solutions to problems,
or realistic fakes created from pictures or audio of a person.

Early versions of generative AI required submitting data via an API or an otherwise complicated
process. Developers had to familiarize themselves with special tools and write applications using
languages such as Python.

Now, pioneers in generative AI are developing better user experiences that let you describe a
request in plain language. After an initial response, you can also customize the results with
feedback about the style, tone and other elements you want the generated content to reflect.

Lecture 2

Generative AI models

Generative AI models combine various AI algorithms to represent and process content. For
example, to generate text, various natural language processing techniques transform raw
characters (e.g., letters, punctuation and words) into sentences, parts of speech, entities and
actions, which are represented as vectors using multiple encoding techniques. Similarly, images
are transformed into various visual elements, also expressed as vectors. One caution is that these
techniques can also encode the biases, racism, deception and puffery contained in the training
data.
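The simplest way to see the text-to-vector step described above is a bag-of-words encoding, in which each sentence becomes a vector of word counts. This is far cruder than the learned embeddings modern models use, but it illustrates the idea; the tiny corpus is invented for the example:

```python
from collections import Counter

def build_vocab(sentences):
    """Map each word seen in the corpus to a fixed vector index."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(sentence, vocab):
    """Encode a sentence as a bag-of-words count vector over the vocabulary."""
    counts = Counter(sentence.lower().split())
    return [counts.get(word, 0) for word in vocab]

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
print(vocab)                      # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3}
print(encode("the cat", vocab))  # [1, 1, 0, 0]
```

Note that this encoding also inherits whatever is in the training corpus, which is the caution raised above: a model can only represent, and reproduce, the patterns present in its data.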

Once developers settle on a way to represent the world, they apply a particular neural network to
generate new content in response to a query or prompt. Techniques such as GANs and
variational autoencoders (VAEs) -- the latter built from an encoder and a decoder -- are suitable
for generating realistic human faces, synthetic data for AI training or even facsimiles of
particular humans.

Recent progress in transformers such as Google's Bidirectional Encoder Representations from
Transformers (BERT), OpenAI's GPT and Google AlphaFold have also resulted in neural
networks that can not only encode language, images and proteins but also generate new content.

How neural networks are transforming generative AI

Researchers have been creating AI and other tools for programmatically generating content since
the early days of AI. The earliest approaches, known as rule-based systems and later as "expert
systems," used explicitly crafted rules for generating responses or data sets.

Neural networks, which form the basis of much of the AI and machine learning applications
today, flipped the problem around. Designed to mimic how the human brain works, neural
networks "learn" the rules from finding patterns in existing data sets. Developed in the 1950s and
1960s, the first neural networks were limited by a lack of computational power and small data
sets. It was not until the advent of big data in the mid-2000s and improvements in computer
hardware that neural networks became practical for generating content.

The field accelerated when researchers found a way to get neural networks to run in parallel
across the graphics processing units (GPUs) that were being used in the computer gaming
industry to render video games. New machine learning techniques developed in the past decade,
including the aforementioned generative adversarial networks and transformers, have set the
stage for the recent remarkable advances in AI-generated content.

Lecture 3

What are Dall-E, ChatGPT and Bard?

ChatGPT, Dall-E and Bard are popular generative AI interfaces.

Dall-E. Trained on a large data set of images and their associated text descriptions, Dall-E is an
example of a multimodal AI application that identifies connections across multiple media, such
as vision, text and audio. In this case, it connects the meaning of words to visual elements. It was
built using OpenAI's GPT implementation in 2021. Dall-E 2, a second, more capable version,
was released in 2022. It enables users to generate imagery in multiple styles driven by user
prompts.

ChatGPT. The AI-powered chatbot that took the world by storm in November 2022 was built on
OpenAI's GPT-3.5 implementation. OpenAI has provided a way to interact and fine-tune text
responses via a chat interface with interactive feedback. Earlier versions of GPT were only
accessible via an API. GPT-4 was released March 14, 2023. ChatGPT incorporates the history of
its conversation with a user into its results, simulating a real conversation. After the incredible
popularity of the new GPT interface, Microsoft announced a significant new investment into
OpenAI and integrated a version of GPT into its Bing search engine.

Here is a snapshot of the differences between ChatGPT and Bard.

Bard. Google was another early leader in pioneering transformer AI techniques for processing
language, proteins and other types of content. It open sourced some of these models for
researchers. However, it never released a public interface for these models. Microsoft's decision
to implement GPT into Bing drove Google to rush to market a public-facing chatbot, Google
Bard, built on a lightweight version of its LaMDA family of large language models. Google
suffered a significant loss in stock price following Bard's rushed debut after the language model
incorrectly said the Webb telescope was the first to discover a planet in a foreign solar system.
Meanwhile, Microsoft and ChatGPT implementations also lost face in their early outings due to
inaccurate results and erratic behavior. Google has since unveiled a new version of Bard built on
its most advanced LLM, PaLM 2, which allows Bard to be more efficient and visual in its
response to user queries.

What are use cases for generative AI?

Generative AI can be applied in various use cases to generate virtually any kind of content. The
technology is becoming more accessible to users of all kinds thanks to cutting-edge
breakthroughs like GPT that can be tuned for different applications. Some of the use cases
for generative AI include the following:

 Implementing chatbots for customer service and technical support.

 Deploying deepfakes for mimicking people or even specific individuals.

 Improving dubbing for movies and educational content in different languages.

 Writing email responses, dating profiles, resumes and term papers.

 Creating photorealistic art in a particular style.

 Improving product demonstration videos.

 Suggesting new drug compounds to test.

 Designing physical products and buildings.

 Optimizing new chip designs.

 Writing music in a specific style or tone.

What are the benefits of generative AI?

Generative AI can be applied extensively across many areas of the business. It can make it easier
to interpret and understand existing content and automatically create new content. Developers
are exploring ways that generative AI can improve existing workflows, with an eye to adapting
workflows entirely to take advantage of the technology. Some of the potential benefits of
implementing generative AI include the following:

 Automating the manual process of writing content.

 Reducing the effort of responding to emails.

 Improving the response to specific technical queries.

 Creating realistic representations of people.

 Summarizing complex information into a coherent narrative.

 Simplifying the process of creating content in a particular style.

What are the limitations of generative AI?

Early implementations of generative AI vividly illustrate its many limitations. Some of
the challenges generative AI presents result from the specific approaches used to implement
particular use cases. For example, a summary of a complex topic is easier to read than an
explanation that includes various sources supporting key points. The readability of the summary,
however, comes at the expense of a user being able to vet where the information comes from.

Here are some of the limitations to consider when implementing or using a generative AI app:

 It does not always identify the source of content.

 It can be challenging to assess the bias of original sources.

 Realistic-sounding content makes it harder to identify inaccurate information.

 It can be difficult to understand how to tune for new circumstances.

 Results can gloss over bias, prejudice and hatred.

Attention is all you need: Transformers bring new capability

In 2017, Google reported on a new type of neural network architecture that brought significant
improvements in efficiency and accuracy to tasks like natural language processing. The
breakthrough approach, called transformers, was based on the concept of attention.

At a high level, attention refers to the mathematical description of how things (e.g., words) relate
to, complement and modify each other. The researchers described the architecture in their
seminal paper, "Attention is all you need," showing how a transformer neural network was able
to translate between English and French with more accuracy and in only a quarter of the training
time required by other neural nets. The breakthrough technique could also discover relationships, or
hidden orders, between other things buried in the data that humans might have been unaware of
because they were too complicated to express or discern.

Transformer architecture has evolved rapidly since it was introduced, giving rise to LLMs such
as GPT-3 and better pre-training techniques, such as Google's BERT.
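At its core, attention is a weighted average: each word's query vector scores every other word's key vector, and the softmaxed scores become weights over value vectors. The following is a minimal pure-Python sketch of scaled dot-product self-attention over toy 2-d vectors; the vectors are invented for illustration, and real models learn them and run this in batched matrix form:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: for each query, score every key,
    softmax the scores, and return the weighted sum of the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three toy 2-d "word" vectors attending over themselves (self-attention).
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(vecs, vecs, vecs)
```

Because the weights are computed between every pair of positions, nothing limits attention to neighboring words -- which is what lets transformers track connections across pages rather than only within a sentence.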

What are the concerns surrounding generative AI?

The rise of generative AI is also fueling various concerns. These relate to the quality of results,
potential for misuse and abuse, and the potential to disrupt existing business models. Here are
some of the specific types of problematic issues posed by the current state of generative AI:

 It can provide inaccurate and misleading information.

 It is more difficult to trust without knowing the source and provenance of information.

 It can promote new kinds of plagiarism that ignore the rights of content creators and
artists of original content.

 It might disrupt existing business models built around search engine optimization and
advertising.

 It makes it easier to generate fake news.

 It makes it easier to claim that real photographic evidence of a wrongdoing was just an
AI-generated fake.

 It could impersonate people for more effective social engineering cyber attacks.

Implementing generative AI is not just about technology. Businesses must also consider its
impact on people and processes.

What are some examples of generative AI tools?

Generative AI tools exist for various modalities, such as text, imagery, music, code and voices.
Some popular AI content generators to explore include the following:

 Text generation tools include GPT, Jasper, AI-Writer and Lex.

 Image generation tools include Dall-E 2, Midjourney and Stable Diffusion.

 Music generation tools include Amper, Dadabots and MuseNet.

 Code generation tools include CodeStarter, Codex, GitHub Copilot and Tabnine.

 Voice synthesis tools include Descript, Listnr and Podcast.ai.

 AI chip design tool companies include Synopsys, Cadence, Google and Nvidia.

Use cases for generative AI, by industry

New generative AI technologies have sometimes been described as general-purpose technologies
akin to steam power, electricity and computing because they can profoundly affect many
industries and use cases. It's essential to keep in mind that, like previous general-purpose
technologies, it often took decades for people to find the best way to organize workflows to take
advantage of the new approach rather than speeding up small portions of existing workflows.
Here are some ways generative AI applications could impact different industries:

 Finance can watch transactions in the context of an individual's history to build better
fraud detection systems.

 Legal firms can use generative AI to design and interpret contracts, analyze evidence and
suggest arguments.

 Manufacturers can use generative AI to combine data from cameras, X-ray and other
metrics to identify defective parts and the root causes more accurately and economically.

 Film and media companies can use generative AI to produce content more economically
and translate it into other languages with the actors' own voices.

 The medical industry can use generative AI to identify promising drug candidates more
efficiently.

 Architectural firms can use generative AI to design and adapt prototypes more quickly.

 Gaming companies can use generative AI to design game content and levels.

GPT joins the pantheon of general-purpose technologies

OpenAI, an AI research and deployment company, took the core ideas behind transformers to
train its version, dubbed Generative Pre-trained Transformer, or GPT. Observers have noted that
GPT is the same acronym used to describe general-purpose technologies such as the steam
engine, electricity and computing. Most would agree that GPT and other transformer
implementations are already living up to their name as researchers discover ways to apply them
to industry, science, commerce, construction and medicine.

Ethics and bias in generative AI

Despite their promise, the new generative AI tools open a can of worms regarding accuracy,
trustworthiness, bias, hallucination and plagiarism -- ethical issues that likely will take years to
sort out. None of the issues are particularly new to AI. Microsoft's first foray into chatbots in
2016, called Tay, for example, had to be turned off after it started spewing inflammatory rhetoric
on Twitter.

What is new is that the latest crop of generative AI apps sounds more coherent on the surface.
But this combination of humanlike language and coherence is not synonymous with human
intelligence, and there currently is great debate about whether generative AI models can be
trained to have reasoning ability. One Google engineer was even fired after publicly declaring
the company's generative AI app, Language Models for Dialog Applications (LaMDA), was
sentient.

The convincing realism of generative AI content introduces a new set of AI risks. It makes it
harder to detect AI-generated content and, more importantly, makes it more difficult to detect
when things are wrong. This can be a big problem when we rely on generative AI results to write
code or provide medical advice. Many results of generative AI are not transparent, so it is hard to
determine if, for example, they infringe on copyrights or if there is a problem with the original
sources from which they draw results. If you don't know how the AI came to a conclusion, you
cannot reason about why it might be wrong.

Generative AI vs. AI

Generative AI focuses on creating new and original content, chat responses, designs, synthetic
data or even deepfakes. It's particularly valuable in creative fields and for novel problem-solving,
as it can autonomously generate many types of new outputs.

Generative AI, as noted above, relies on neural network techniques such as transformers, GANs
and VAEs. Other kinds of AI, in distinction, use techniques including convolutional neural
networks, recurrent neural networks and reinforcement learning.

Generative AI often starts with a prompt that lets a user or data source submit a starting query or
data set to guide content generation. This can be an iterative process to explore content
variations. Traditional AI algorithms, on the other hand, often follow a predefined set of rules to
process data and produce a result.

Both approaches have their strengths and weaknesses depending on the problem to be solved,
with generative AI being well-suited for tasks involving NLP and calling for the creation of new
content, and traditional algorithms more effective for tasks involving rule-based processing and
predetermined outcomes.

Generative AI vs. predictive AI vs. conversational AI

Predictive AI, in distinction to generative AI, uses patterns in historical data to forecast
outcomes, classify events and surface actionable insights. Organizations use predictive AI to sharpen
decision-making and develop data-driven strategies.

Conversational AI helps AI systems like virtual assistants, chatbots and customer service apps
interact and engage with humans in a natural way. It uses techniques from NLP and machine
learning to understand language and provide human-like text or speech responses.

Generative AI history

The Eliza chatbot created by Joseph Weizenbaum in the 1960s was one of the earliest examples
of generative AI. These early implementations used a rules-based approach that broke easily due
to a limited vocabulary, lack of context and overreliance on patterns, among other shortcomings.
Early chatbots were also difficult to customize and extend.

The field saw a resurgence in the wake of advances in neural networks and deep learning in 2010
that enabled the technology to automatically learn to parse existing text, classify image elements
and transcribe audio.

Ian Goodfellow introduced GANs in 2014. This deep learning technique provided a novel
approach for organizing competing neural networks to generate and then rate content variations.
These could generate realistic people, voices, music and text. This inspired interest in -- and fear
of -- how generative AI could be used to create realistic deepfakes that impersonate voices and
people in videos.

Since then, progress in other neural network techniques and architectures has helped expand
generative AI capabilities. Techniques include VAEs, long short-term memory, transformers,
diffusion models and neural radiance fields.

Best practices for using generative AI

The best practices for using generative AI will vary depending on the modalities, workflow and
desired goals. That said, it is important to consider essential factors such as accuracy,
transparency and ease of use in working with generative AI. The following practices help
achieve these factors:

 Clearly label all generative AI content for users and consumers.

 Vet the accuracy of generated content using primary sources where applicable.

 Consider how bias might get woven into generated AI results.

 Double-check the quality of AI-generated code and content using other tools.

 Learn the strengths and limitations of each generative AI tool.

 Familiarize yourself with common failure modes in results and work around these.

The future of generative AI

The incredible depth and ease of use of ChatGPT spurred widespread adoption of generative AI. To be
sure, the speedy adoption of generative AI applications has also demonstrated some of the
difficulties in rolling out this technology safely and responsibly. But these early implementation
issues have inspired research into better tools for detecting AI-generated text, images and video.

Indeed, the popularity of generative AI tools such as ChatGPT, Midjourney, Stable Diffusion and
Bard has also fueled an endless variety of training courses at all levels of expertise. Many are
aimed at helping developers create AI applications. Others focus more on business users looking
to apply the new technology across the enterprise. At some point, industry and society will also
build better tools for tracking the provenance of information to create more trustworthy AI.

Generative AI will continue to evolve, making advancements in translation, drug discovery,
anomaly detection and the generation of new content, from text and video to fashion design and
music. As good as these new one-off tools are, the most significant impact of generative AI in
the future will come from integrating these capabilities directly into the tools we already use.

Grammar checkers, for example, will get better. Design tools will seamlessly embed more useful
recommendations directly into our workflows. Training tools will be able to automatically
identify best practices in one part of an organization to help train other employees more
efficiently. These are just a fraction of the ways generative AI will change what we do in the
near-term.

What the impact of generative AI will be in the future is hard to say. But as we continue to
harness these tools to automate and augment human tasks, we will inevitably find ourselves
having to reevaluate the nature and value of human expertise.

Generative AI will find its way into many business functions.

Generative AI FAQs

Below are some frequently asked questions people have about generative AI.

Who created generative AI?

Joseph Weizenbaum created the first generative AI in the 1960s as part of the Eliza chatbot.

Ian Goodfellow demonstrated generative adversarial networks for generating realistic-looking
and -sounding people in 2014.

Subsequent research into LLMs from OpenAI and Google ignited the recent enthusiasm that has
evolved into tools like ChatGPT, Google Bard and Dall-E.
How could generative AI replace jobs?

Generative AI has the potential to replace a variety of jobs, including the following:

 Writing product descriptions.

 Creating marketing copy.

 Generating basic web content.

 Initiating interactive sales outreach.

 Answering customer questions.

 Making graphics for webpages.

Some companies will look for opportunities to replace humans where possible, while others will
use generative AI to augment and enhance their existing workforce.

How do you build a generative AI model?

A generative AI model starts by efficiently encoding a representation of what you want to
generate. For example, a generative AI model for text might begin by finding a way to represent
the words as vectors that characterize the similarity between words often used in the same
sentence or that mean similar things.

Recent progress in LLM research has helped the industry implement the same process to
represent patterns found in images, sounds, proteins, DNA, drugs and 3D designs. This
generative AI model provides an efficient way of representing the desired type of content and
efficiently iterating on useful variations.
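The "similarity between words" mentioned above is typically measured as cosine similarity between embedding vectors. Here is a sketch with made-up 3-d embeddings; real embeddings are learned from data and have hundreds of dimensions, and these particular numbers are purely illustrative:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings: related words get nearby vectors.
king   = [0.9, 0.8, 0.1]
queen  = [0.85, 0.82, 0.15]
banana = [0.1, 0.05, 0.9]

print(cosine_similarity(king, queen) > cosine_similarity(king, banana))  # True
```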

How do you train a generative AI model?

The generative AI model needs to be trained for a particular use case. The recent progress in
LLMs provides an ideal starting point for customizing applications for different use cases. For
example, the popular GPT model developed by OpenAI has been used to write text, generate
code and create imagery based on written descriptions.

Training involves tuning the model's parameters for different use cases and then fine-tuning
results on a given set of training data. For example, a call center might train a chatbot against the
kinds of questions service agents get from various customer types and the responses that service
agents give in return. An image-generating app, in distinction to text, might start with labels that
describe content and style of images to train the model to generate new images.

How is generative AI changing creative work?

Generative AI promises to help creative workers explore variations of ideas. Artists might start
with a basic design concept and then explore variations. Industrial designers could explore
product variations. Architects could explore different building layouts and visualize them as a
starting point for further refinement.

It could also help democratize some aspects of creative work. For example, business users could
explore product marketing imagery using text descriptions. They could further refine these
results using simple commands or suggestions.

What's next for generative AI?

ChatGPT's ability to generate humanlike text has sparked widespread curiosity about generative
AI's potential. It also shined a light on the many problems and challenges ahead.

In the short term, work will focus on improving the user experience and workflows using
generative AI tools. It will also be essential to build trust in generative AI results.

Many companies will also customize generative AI on their own data to help improve branding
and communication. Programming teams will use generative AI to enforce company-specific
best practices for writing and formatting more readable and consistent code.

Vendors will integrate generative AI capabilities into their tools to streamline content
generation workflows. This will drive innovation in how these new capabilities can increase
productivity.

Generative AI could also play a role in various aspects of data processing, transformation,
labeling and vetting as part of augmented analytics workflows. Semantic web applications could
use generative AI to automatically map internal taxonomies describing job skills to different
taxonomies on skills training and recruitment sites. Similarly, business teams will use these
models to transform and label third-party data for more sophisticated risk assessments and
opportunity analysis capabilities.

In the future, generative AI models will be extended to support 3D modeling, product design,
drug development, digital twins, supply chains and business processes. This will make it easier
to generate new product ideas, experiment with different organizational models and explore
various business ideas.

Latest Generative AI technology defined

AI art (artificial intelligence art)
AI art is any form of digital art created or enhanced with AI tools. Read more

AI prompt
An artificial intelligence (AI) prompt is a mode of interaction between a human and an LLM that
lets the model generate the intended output. This interaction can be in the form of a question,
text, code snippets or examples. Read more

AI prompt engineer
An artificial intelligence (AI) prompt engineer is an expert in creating text-based prompts or cues
that can be interpreted and understood by large language models and generative AI tools. Read
more

Amazon Bedrock
Amazon Bedrock -- also known as AWS Bedrock -- is a machine learning platform used to build
generative artificial intelligence (AI) applications on the Amazon Web Services cloud computing
platform. Read more

Auto-GPT
Auto-GPT is an experimental, open source autonomous AI agent based on the GPT-4 language
model that autonomously chains together tasks to achieve a big-picture goal set by the user. Read
more

Google Search Generative Experience
Google Search Generative Experience (SGE) is a set of search and interface capabilities that
integrates generative AI-powered results into Google search engine query responses. Read more

Google Search Labs (GSE)
GSE is an initiative from Alphabet's Google division to provide new capabilities and
experiments for Google Search in a preview format before they become publicly available. Read
more

Image-to-image translation
Image-to-image translation is a generative artificial intelligence (AI) technique that translates a
source image into a target image while preserving certain visual properties of the original
image. Read more

Inception score
The inception score (IS) is a mathematical algorithm used to measure or determine the quality of
images created by generative AI through a generative adversarial network (GAN). The word
"inception" refers to the spark of creativity or initial beginning of a thought or action traditionally
experienced by humans. Read more
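The inception score is usually written as IS = exp(mean over images of KL(p(y|x) || p(y))), where p(y|x) is a classifier's label distribution for one generated image and p(y) is the average distribution over all images. A minimal sketch of that computation on made-up classifier outputs (a real IS uses the Inception v3 network's predictions over thousands of images):

```python
import math

def inception_score(preds):
    """preds: one class-probability distribution p(y|x) per generated image.
    Confident, diverse predictions -> high score; uniform mush -> score near 1."""
    n, k = len(preds), len(preds[0])
    marginal = [sum(p[j] for p in preds) / n for j in range(k)]  # p(y)
    kl_total = 0.0
    for p in preds:
        kl_total += sum(pj * math.log(pj / mj)
                        for pj, mj in zip(p, marginal) if pj > 0)
    return math.exp(kl_total / n)

# Hypothetical outputs: sharp, varied images vs. indistinct ones.
confident = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]
blurry    = [[0.34, 0.33, 0.33]] * 3

print(inception_score(confident) > inception_score(blurry))  # True
```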

LangChain
LangChain is an open source framework that lets software developers working with artificial
intelligence (AI) and its machine learning subset combine large language models with other
external components to develop LLM-powered applications.

Q-learning
Q-learning is a machine learning approach that enables a model to iteratively learn and improve
over time by taking the correct action.
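As a concrete sketch of that iterative update, here is tabular Q-learning on a hypothetical five-state corridor (a toy environment invented for illustration, not from the source):

```python
import numpy as np

# Tabular Q-learning on a 5-state corridor: the agent starts at state 0
# and earns a reward of 1 for reaching the terminal state 4.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(300):
    s = 0
    while s != 4:                    # state 4 is terminal
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(Q.round(2))   # "go right" should dominate in every non-terminal state
```

After training, the learned Q-values prefer moving right in every non-terminal state, which is exactly the "iteratively learn the correct action" behavior the definition describes.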

Reinforcement learning from human feedback (RLHF)
RLHF is a machine learning approach that combines reinforcement learning techniques, such as
rewards and comparisons, with human guidance to train an AI agent.

Retrieval-augmented generation
Retrieval-augmented generation (RAG) is an artificial intelligence (AI) framework that retrieves
data from external sources of knowledge to improve the quality of responses.
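A toy sketch of the retrieve-then-generate pattern: simple word overlap stands in for a real vector search, and the final language-model call is omitted. The documents and query here are made up for illustration.

```python
# Toy retrieval-augmented generation: retrieve the most relevant document
# by word overlap, then build an augmented prompt for a language model.
documents = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Photosynthesis converts sunlight, water and carbon dioxide into glucose.",
    "The Great Wall of China is over 13,000 miles long.",
]

def retrieve(query, docs):
    """Score each document by word overlap with the query; return the best one."""
    q_words = set(query.lower().split())
    scores = [len(q_words & set(d.lower().split())) for d in docs]
    return docs[scores.index(max(scores))]

query = "When was the Eiffel Tower completed?"
context = retrieve(query, documents)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

A production RAG system would replace the overlap score with embedding similarity and send the assembled prompt to an LLM, but the retrieve-then-augment structure is the same.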

Variational autoencoder (VAE)
A variational autoencoder is a generative AI algorithm that uses deep learning to generate new
content, detect anomalies and remove noise.
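The sampling step at a VAE's core can be sketched with the reparameterization trick, z = mu + sigma * eps with eps ~ N(0, 1), which lets gradients flow through mu and sigma. The encoder and decoder networks are omitted; mu and sigma below are made-up stand-ins for encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([0.5, -1.0])          # mean a hypothetical encoder might predict
sigma = np.array([0.1, 0.3])        # standard deviation from the same encoder

eps = rng.standard_normal((100_000, 2))
z = mu + sigma * eps                # samples from N(mu, sigma^2), differentiably

print(z.mean(axis=0), z.std(axis=0))  # empirically close to mu and sigma
```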

What are some generative models for natural language processing?

Some generative models for natural language processing include the following:

 Carnegie Mellon University's XLNet

 OpenAI's GPT (Generative Pre-trained Transformer)

 Google's ALBERT ("A Lite" BERT)

 Google BERT

 Google LaMDA

Will AI ever gain consciousness?

Some AI proponents believe that generative AI is an essential step toward general-purpose AI
and even consciousness. One early tester of Google's LaMDA chatbot even created a stir when
he publicly declared it was sentient. He was later let go from the company.

In 1993, the American science fiction writer and computer scientist Vernor Vinge posited that in
30 years, we would have the technological ability to create a "superhuman intelligence" -- an AI
that is more intelligent than humans -- after which the human era would end. AI pioneer Ray
Kurzweil predicted such a "singularity" by 2045.

Many other AI experts think it could be much further off. Robot pioneer Rodney
Brooks predicted that AI will not gain the sentience of a 6-year-old in his lifetime but could seem
as intelligent and attentive as a dog by 2048.

AI existential risk: Is AI a threat to humanity?

What should enterprises make of the recent warnings about AI's threat to humanity? AI experts
and ethicists offer opinions and practical advice for managing AI risk.

Latest generative AI news and trends

A venture capitalist's take on generative AI investment
Funding for startups such as Anthropic, Cohere and Hugging Face shows that money is still
flowing into the market. However, the criteria for funding are still strict.

SAP joins generative AI crowd with Joule
SAP's Joule is the company's generative AI assistant for its cloud portfolio, but the technology is
still in the early stages and the ERP giant will need to prove its value.

Generative AI emerges for DevSecOps, with some qualms
New and developing tools use natural language processing to assist DevSecOps workflows, but
concerns linger among developers about security risks as well.

Surveyed board members see GenAI as a cybersecurity risk
The emergence of generative AI tools has board members on high alert, along with geopolitical
tensions and the continued rise of ransomware attacks that threaten cybersecurity.

Lawyers win in race to generative AI without adequate laws
As Congress drags its feet on passing AI regulations, lawyers are filling the vacuum by helping
enterprises navigate court rulings and regulations based on outdated laws.

This was last updated in October 2023

Continue Reading About What is generative AI? Everything you need to know

 Bard vs. ChatGPT: What’s the difference?

 Pros and cons of AI-generated content

 What is prompt engineering?

 What is generative modeling?

 Prepare for deepfake phishing attacks in the enterprise

Lecture4

Generative Adversarial Networks, Transformers

Generative Adversarial Networks, or GANs for short, are an approach to generative modeling
using deep learning methods, such as convolutional neural networks.

Generative modeling is an unsupervised learning task in machine learning that involves
automatically discovering and learning the regularities or patterns in input data in such a way
that the model can be used to generate or output new examples that plausibly could have been
drawn from the original dataset.

GANs are a clever way of training a generative model by framing the problem as a supervised
learning problem with two sub-models: the generator model that we train to generate new
examples, and the discriminator model that tries to classify examples as either real (from the
domain) or fake (generated). The two models are trained together in an adversarial, zero-sum
game until the discriminator model is fooled about half the time, meaning the generator
model is generating plausible examples.
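To make the two-player setup concrete, here is a deliberately tiny sketch of the adversarial training loop on one-dimensional data, assuming a linear generator g(z) = w*z + b and a logistic-regression discriminator with hand-derived gradients. Real GANs use deep networks for both models; this is only an illustration of the alternating updates.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

# Real data: samples from N(4, 0.5). Generator g(z) = w*z + b must learn to
# mimic them; discriminator D(x) = sigmoid(u*x + c) must tell real from fake.
w, b = 1.0, 0.0        # generator parameters
u, c = 0.0, 0.0        # discriminator parameters
lr, batch = 0.03, 128

for step in range(3000):
    real = rng.normal(4.0, 0.5, batch)
    z = rng.standard_normal(batch)
    fake = w * z + b

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real, d_fake = sigmoid(u * real + c), sigmoid(u * fake + c)
    grad_u = np.mean((d_real - 1) * real) + np.mean(d_fake * fake)
    grad_c = np.mean(d_real - 1) + np.mean(d_fake)
    u -= lr * grad_u
    c -= lr * grad_c

    # Generator step (non-saturating loss): push D(fake) toward 1.
    d_fake = sigmoid(u * fake + c)
    grad_w = np.mean((d_fake - 1) * u * z)
    grad_b = np.mean((d_fake - 1) * u)
    w -= lr * grad_w
    b -= lr * grad_b

samples = w * rng.standard_normal(1000) + b
print(round(samples.mean(), 2))   # drifts toward the real data mean of 4
```

At equilibrium the discriminator outputs roughly 0.5 everywhere, and the generated samples sit near the real distribution, exactly the "fooled about half the time" condition described above.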

GANs are an exciting and rapidly changing field, delivering on the promise of generative models
in their ability to generate realistic examples across a range of problem domains, most notably in
image-to-image translation tasks such as translating photos of summer to winter or day to night,
and in generating photorealistic photos of objects, scenes, and people that even humans cannot
tell are fake.

In this post, you will discover a gentle introduction to Generative Adversarial Networks, or
GANs.

After reading this post, you will know:

 Context for GANs, including supervised vs. unsupervised learning and discriminative vs.
generative modeling.

 GANs are an architecture for automatically training a generative model by treating the
unsupervised problem as supervised and using both a generative and a discriminative model.
 GANs provide a path to sophisticated domain-specific data augmentation and a solution
to problems that require a generative solution, such as image-to-image translation.
Let’s get started.

A Gentle Introduction to Generative Adversarial Networks (GANs)

Lecture5

Basic overview of Generative AI models: OpenAI models (ChatGPT, GPT-4, DALL-E 2),
Google Bard, Microsoft models (CoDi, Kosmos-2)

What Are Generative Models?

In this section, we will review the idea of generative models, covering the supervised vs.
unsupervised learning paradigms and discriminative vs. generative modeling.

Supervised vs. Unsupervised Learning

A typical machine learning problem involves using a model to make a prediction, e.g. predictive
modeling.
This requires a training dataset that is used to train a model, comprised of multiple examples,
called samples, each with input variables (X) and output class labels (y). A model is trained by
showing examples of inputs, having it predict outputs, and correcting the model to make the
outputs more like the expected outputs.
In the predictive or supervised learning approach, the goal is to learn a mapping from inputs x
to outputs y, given a labeled set of input-output pairs …

— Page 2, Machine Learning: A Probabilistic Perspective, 2012.


This correction of the model is generally referred to as a supervised form of learning, or
supervised learning.

Example of Supervised Learning

Examples of supervised learning problems include classification and regression, and examples of
supervised learning algorithms include logistic regression and random forest.
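The train-predict-correct loop described above can be sketched with a tiny logistic regression trained by gradient descent. The two-cluster dataset is synthetic, invented purely for illustration.

```python
import numpy as np

# Supervised learning in miniature: logistic regression trained by gradient
# descent on two linearly separable synthetic clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(y=1 | x)
    # The "correction" step: gradient of the log loss w.r.t. the parameters.
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

accuracy = np.mean(((X @ w + b) > 0) == y)
print(accuracy)
```

Each pass shows the model inputs, compares its predictions to the known labels (y), and corrects the parameters, which is exactly the supervised-learning loop defined above.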

There is another paradigm of learning where the model is only given the input variables (X) and
the problem does not have any output variables (y).
A model is constructed by extracting or summarizing the patterns in the input data. There is no
correction of the model, as the model is not predicting anything.

The second main type of machine learning is the descriptive or unsupervised learning approach.
Here we are only given inputs, and the goal is to find “interesting patterns” in the data. […]
This is a much less well-defined problem, since we are not told what kinds of patterns to look for,
and there is no obvious error metric to use (unlike supervised learning, where we can compare
our prediction of y for a given x to the observed value).

— Page 2, Machine Learning: A Probabilistic Perspective, 2012.


This lack of correction is generally referred to as an unsupervised form of learning, or
unsupervised learning.

Example of Unsupervised Learning

Examples of unsupervised learning problems include clustering and generative modeling, and
examples of unsupervised learning algorithms are K-means and Generative Adversarial
Networks.
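A minimal K-means sketch on synthetic, unlabeled data illustrates this pattern-summarizing behavior. No labels are provided; the algorithm discovers the two blobs on its own (for simplicity the centers are initialized from one point in each half of the data).

```python
import numpy as np

# Unsupervised learning in miniature: k-means clustering of two synthetic
# blobs centered near (0, 0) and (5, 5). No labels are given.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

centers = X[[0, 50]].copy()   # initialize with one data point from each half
for _ in range(10):
    # Assign each point to its nearest center, then move centers to the mean.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(centers.round(1))   # one center near (0, 0), the other near (5, 5)
```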


Discriminative vs. Generative Modeling

In supervised learning, we may be interested in developing a model to predict a class label given
an example of input variables.

This predictive modeling task is called classification.

Classification is also traditionally referred to as discriminative modeling.

… we use the training data to find a discriminant function f(x) that maps each x directly onto a
class label, thereby combining the inference and decision stages into a single learning problem.

— Page 44, Pattern Recognition and Machine Learning, 2006.


This is because a model must discriminate examples of input variables across classes; it must
choose or make a decision as to what class a given example belongs.

Example of Discriminative Modeling

Alternately, unsupervised models that summarize the distribution of input variables may be able
to be used to create or generate new examples in the input distribution.

As such, these types of models are referred to as generative models.

Example of Generative Modeling

For example, a single variable may have a known data distribution, such as a Gaussian
distribution, or bell shape. A generative model may be able to sufficiently summarize this data
distribution, and then be used to generate new variables that plausibly fit into the distribution of
the input variable.
Approaches that explicitly or implicitly model the distribution of inputs as well as outputs are
known as generative models, because by sampling from them it is possible to generate synthetic
data points in the input space.

— Page 43, Pattern Recognition and Machine Learning, 2006.
In fact, a really good generative model may be able to generate new examples that are not just
plausible, but indistinguishable from real examples from the problem domain.

Examples of Generative Models

Naive Bayes is an example of a generative model that is more often used as a discriminative
model.
For example, Naive Bayes works by summarizing the probability distribution of each input
variable and the output class. When a prediction is made, the probability for each possible
outcome is calculated for each variable, the independent probabilities are combined, and the
most likely outcome is predicted. Used in reverse, the probability distributions for each variable
can be sampled to generate new plausible (independent) feature values.
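The "used in reverse" idea can be sketched by fitting a Gaussian to each feature and then sampling those fitted distributions. The toy data below covers a single class for brevity; a full Naive Bayes model would store these summaries per class.

```python
import numpy as np

# Running Naive Bayes "in reverse": fit a Gaussian per feature, then sample
# those distributions to generate new plausible feature vectors.
rng = np.random.default_rng(3)
X = rng.normal([2.0, 10.0], [0.5, 1.0], size=(200, 2))  # toy one-class data

mu, sd = X.mean(axis=0), X.std(axis=0)   # per-feature summaries NB would store

# Generation step: sample each feature independently from its fitted Gaussian.
new_examples = rng.normal(mu, sd, size=(5, 2))
print(new_examples.round(2))
```

Because features are sampled independently, the generated examples are plausible only under Naive Bayes' independence assumption, which is exactly the "(independent)" caveat above.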

Other examples of generative models include Latent Dirichlet Allocation, or LDA, and the
Gaussian Mixture Model, or GMM.

Deep learning methods can be used as generative models. Two popular examples include the
Restricted Boltzmann Machine, or RBM, and the Deep Belief Network, or DBN.

Two modern examples of deep learning generative modeling algorithms include the Variational
Autoencoder, or VAE, and the Generative Adversarial Network, or GAN.

What Are Generative Adversarial Networks?

Generative Adversarial Networks, or GANs, are a deep-learning-based generative model.

More generally, GANs are a model architecture for training a generative model, and it is most
common to use deep learning models in this architecture.

The GAN architecture was first described in the 2014 paper by Ian Goodfellow, et al. titled
“Generative Adversarial Networks.”
A standardized approach called Deep Convolutional Generative Adversarial Networks, or
DCGAN, that led to more stable models was later formalized by Alec Radford, et al. in the 2015
paper titled “Unsupervised Representation Learning with Deep Convolutional Generative
Adversarial Networks“.
Most GANs today are at least loosely based on the DCGAN architecture …

— NIPS 2016 Tutorial: Generative Adversarial Networks, 2016.
The GAN model architecture involves two sub-models: a generator model for generating new
examples and a discriminator model for classifying whether generated examples are real, from
the domain, or fake, generated by the generator model.
 Generator. Model that is used to generate new plausible examples from the problem
domain.
 Discriminator. Model that is used to classify examples as real (from the domain) or fake
(generated).
Generative adversarial networks are based on a game theoretic scenario in which the generator
network must compete against an adversary. The generator network directly produces samples.
Its adversary, the discriminator network, attempts to distinguish between samples drawn from
the training data and samples drawn from the generator.

— Page 699, Deep Learning, 2016.


The Generator Model

The generator model takes a fixed-length random vector as input and generates a sample in the
domain.

The vector is drawn randomly from a Gaussian distribution, and the vector is used to seed
the generative process. After training, points in this multidimensional vector space will
correspond to points in the problem domain, forming a compressed representation of the data
distribution.

This vector space is referred to as a latent space, or a vector space comprised of latent variables.
Latent variables, or hidden variables, are those variables that are important for a domain but are
not directly observable.
A latent variable is a random variable that we cannot observe directly.

— Page 67, Deep Learning, 2016.


We often refer to latent variables, or a latent space, as a projection or compression of a data
distribution. That is, a latent space provides a compression or high-level concepts of the observed
raw data such as the input data distribution. In the case of GANs, the generator model applies
meaning to points in a chosen latent space, such that new points drawn from the latent space can
be provided to the generator model as input and used to generate new and different output
examples.
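One common way to see this structure is to interpolate between two latent points: fed to a trained generator, each intermediate vector would yield an output that morphs smoothly between the two endpoints. The sketch below uses a made-up latent dimension and omits the generator call itself.

```python
import numpy as np

# Walking through a latent space: linearly interpolate between two latent
# vectors z1 and z2 drawn from the Gaussian prior.
rng = np.random.default_rng(4)
z1, z2 = rng.standard_normal(100), rng.standard_normal(100)

steps = np.linspace(0.0, 1.0, 8)
path = np.array([(1 - t) * z1 + t * z2 for t in steps])  # 8 latent points

print(path.shape)  # (8, 100): a smooth path from z1 to z2
```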

Machine-learning models can learn the statistical latent space of images, music, and stories, and
they can then sample from this space, creating new artworks with characteristics similar to those
the model has seen in its training data.

— Page 270, Deep Learning with Python, 2017.


After training, the generator model is kept and used to generate new samples.

Example of the GAN Generator Model

The Discriminator Model

The discriminator model takes an example from the domain as input (real or generated) and
predicts a binary class label of real or fake (generated).

The real example comes from the training dataset. The generated examples are output by the
generator model.

The discriminator is a normal (and well understood) classification model.

After the training process, the discriminator model is discarded as we are interested in the
generator.

Sometimes, the generator can be repurposed as it has learned to effectively extract features from
examples in the problem domain. Some or all of the feature extraction layers can be used in
transfer learning applications using the same or similar input data.
We propose that one way to build good image representations is by training Generative
Adversarial Networks (GANs), and later reusing parts of the generator and discriminator
networks as feature extractors for supervised tasks

— Unsupervised Representation Learning with Deep Convolutional Generative Adversarial
Networks, 2015.

Example of the GAN Discriminator Model

Why Generative Adversarial Networks?

One of the many major advancements in the use of deep learning methods in domains such as
computer vision is a technique called data augmentation.
Data augmentation results in better performing models, both increasing model skill and
providing a regularizing effect, reducing generalization error. It works by creating new, artificial
but plausible examples from the input problem domain on which the model is trained.

The techniques are primitive in the case of image data, involving crops, flips, zooms, and other
simple transforms of existing images in the training dataset.
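Those primitive transforms are easy to sketch on a toy array standing in for an image:

```python
import numpy as np

# Classic augmentation primitives on a toy 6x6 "image": a horizontal flip and
# a random crop each yield a new, plausible training example.
rng = np.random.default_rng(5)
image = rng.integers(0, 256, size=(6, 6))

flipped = image[:, ::-1]                    # horizontal flip
top, left = rng.integers(0, 3, size=2)      # random 4x4 crop position
crop = image[top:top + 4, left:left + 4]

print(flipped.shape, crop.shape)
```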

Successful generative modeling provides an alternative and potentially more domain-specific
approach for data augmentation. In fact, data augmentation is a simplified version of generative
modeling, although it is rarely described this way.

… enlarging the sample with latent (unobserved) data. This is called data augmentation. […] In
other problems, the latent data are actual data that should have been observed but are missing.

— Page 276, The Elements of Statistical Learning, 2016.


In complex domains or domains with a limited amount of data, generative modeling provides a
path towards more training for modeling. GANs have seen much success in this use case in
domains such as deep reinforcement learning.

There are many research reasons why GANs are interesting, important, and require further study.
Ian Goodfellow outlines a number of these in his 2016 conference keynote and associated
technical report titled “NIPS 2016 Tutorial: Generative Adversarial Networks.”
Among these reasons, he highlights GANs’ successful ability to model high-dimensional data,
handle missing data, and the capacity of GANs to provide multi-modal outputs or multiple
plausible answers.

Perhaps the most compelling application of GANs is in conditional GANs for tasks that require
the generation of new examples. Here, Goodfellow indicates three main examples:

 Image Super-Resolution. The ability to generate high-resolution versions of input
images.
 Creating Art. The ability to create new and artistic images, sketches, paintings, and more.
 Image-to-Image Translation. The ability to translate photographs across domains, such
as day to night, summer to winter, and more.
Perhaps the most compelling reason that GANs are widely studied, developed, and used is
because of their success. GANs have been able to generate photos so realistic that humans are
unable to tell that they are of objects, scenes, and people that do not exist in real life.


Types of AI Generative Models

Applications and advantages of flow-based models

Flow-based models have applications in image generation, density estimation, and anomaly
detection. They offer advantages such as tractable likelihood evaluation, exact sampling, and
flexible latent space modeling.

E. Transformer-based model

Explanation of transformer-based model and its characteristics

Transformer-based models are a type of deep learning architecture that has gained significant
popularity and success in natural language processing (NLP) tasks.

Applications and advantages of the transformer-based model

One notable application of Transformer models is the Transformer-based language model known
as GPT (Generative Pre-trained Transformer). Models like GPT-3 have demonstrated impressive
capabilities in generating coherent and contextually relevant text given a prompt. They have been
used for various NLP tasks, including text completion, question answering, translation,
summarization, and more.
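The core computation inside every Transformer block is scaled dot-product attention, softmax(QK^T / sqrt(d)) V. A minimal numpy sketch with toy shapes and random values (no learned projection weights):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(6)
Q = rng.standard_normal((4, 8))   # 4 query positions, dimension 8
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))

out, weights = attention(Q, K, V)
print(out.shape)                  # (4, 8); each row of weights sums to 1
```

In a real model, Q, K and V are learned linear projections of token embeddings, and many such attention heads run in parallel.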

Lecture6

Applications of AI-Generative Models

A. Image Generation and Manipulation

 Creating realistic images from scratch
 Generative models can generate high-quality images that resemble real-world objects,
scenes, or even abstract art.
 Image style transfer and image-to-image translation
 Generative models enable the transfer of artistic styles from one image to another,
transforming images to match different visual aesthetics.
 Content generation for art and design
 AI generative models can assist artists and designers in generating novel and inspiring
content, opening new avenues for creativity.
B. Text Generation and Language Modeling

 Natural language generation and storytelling
 Generative models can generate coherent paragraphs, simulate human-like conversation,
and even create engaging narratives.
 Language translation and text summarization
 Generative models can facilitate language translation, allowing for automated translation
between different languages. They can also summarize long texts by extracting the most
important information.
 Dialogue systems and conversational agents
 Generative models can power chatbots and virtual assistants, enabling intelligent
conversation and personalized interactions with users.
C. Music and Sound Synthesis

 Generating new musical compositions
 Generative models can compose new musical pieces, emulate the style of famous
composers, and aid in music production.
 Sound generation and audio synthesis
 AI generative models can synthesize new sounds, enabling applications in sound design,
audio effects, and virtual reality experiences.
 Music style transfer and remixing
 Generative models can transfer musical styles from one piece to another, allowing for
creative remixing and experimentation.
D. Video Synthesis and Deepfakes

 Video generation and frame prediction
 Generative models can generate new videos or predict future frames, aiding in video
synthesis and simulation.
 Deepfake technology and its implications
 Deepfakes, driven by generative models, raise concerns regarding fake videos and their
potential impact on privacy, misinformation, and trust.
 Video editing and content creation
 AI generative models can automate video editing tasks, enhance visual effects, and
facilitate content creation in the film and entertainment industry.
Evaluation and Challenges in AI Generative Models

A. Metrics for evaluating generative models

Evaluating generative models poses unique challenges. Metrics such as likelihood, inception
score, and Fréchet Inception Distance (FID) are commonly used to assess the quality and
diversity of generated samples.
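The FID compares Gaussians fitted to real and generated samples: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2(C1 C2)^(1/2)). The real metric uses Inception-v3 features and full covariance matrices; the sketch below assumes diagonal covariances (so the matrix square root reduces to an elementwise one) and random vectors standing in for network features.

```python
import numpy as np

def fid_diagonal(x, y):
    """Frechet distance between Gaussians fitted to x and y, assuming
    diagonal covariances: ||mu1 - mu2||^2 + sum(v1 + v2 - 2*sqrt(v1*v2))."""
    mu1, mu2 = x.mean(axis=0), y.mean(axis=0)
    v1, v2 = x.var(axis=0), y.var(axis=0)
    return float(((mu1 - mu2) ** 2).sum() + (v1 + v2 - 2 * np.sqrt(v1 * v2)).sum())

rng = np.random.default_rng(7)
real = rng.normal(0.0, 1.0, (5000, 4))    # stand-ins for Inception features
close = rng.normal(0.1, 1.0, (5000, 4))   # "good" samples: small FID
far = rng.normal(3.0, 2.0, (5000, 4))     # "bad" samples: large FID

print(fid_diagonal(real, close) < fid_diagonal(real, far))  # lower is better
```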

B. Challenges in training and optimizing generative models

Training generative models can be challenging due to issues like mode collapse, overfitting, and
finding the right balance between exploration and exploitation. Optimization techniques and
regularization methods help address these challenges.

C. Ethical considerations and concerns in AI generative modeling

Ethical considerations arise with AI generative models, particularly in areas such as deepfakes,
privacy, bias, and the responsible use of AI-generated content. Ensuring transparency, fairness,
and responsible deployment is essential to mitigate these concerns.

Future Trends and Developments

A. Advancements in generative model architectures and techniques

Ongoing research aims to improve the performance, efficiency, and controllability of generative
models. Innovations in architectures, regularization techniques, and training methods are
expected to shape the future of generative modeling.

B. Integration of generative models with other AI approaches

The integration of generative models with other AI approaches, such as reinforcement learning
and transfer learning, holds promise for more sophisticated and adaptable generative systems.

C. Potential impact on various industries and domains

AI generative models have the potential to disrupt industries like entertainment, design,
advertising, and more. They can enhance creative processes, automate content creation, and
enable personalized user experiences.

Lecture7

Models

New models launched at DevDay

We are excited to announce the preview release of GPT-4 Turbo (128k context window) and an
updated GPT-3.5 Turbo (16k context window). Among other things, both models come with
improved instruction following, JSON mode, more reproducible outputs, and parallel function
calling.

Overview

The OpenAI API is powered by a diverse set of models with different capabilities and price
points. You can also make customizations to our models for your specific use case with fine-
tuning.

MODEL                  DESCRIPTION

GPT-4 and GPT-4 Turbo  A set of models that improve on GPT-3.5 and can understand as well
                       as generate natural language or code

GPT-3.5                A set of models that improve on GPT-3 and can understand as well as
                       generate natural language or code

DALL·E                 A model that can generate and edit images given a natural language
                       prompt

TTS                    A set of models that can convert text into natural sounding spoken
                       audio

Whisper                A model that can convert audio into text

Embeddings             A set of models that can convert text into a numerical form

Moderation             A fine-tuned model that can detect whether text may be sensitive or
                       unsafe

GPT base               A set of models without instruction following that can understand as
                       well as generate natural language or code

GPT-3 (Legacy)         A set of models that can understand and generate natural language

Deprecated             A full list of models that have been deprecated along with the
                       suggested replacement

We have also published open source models including Point-E, Whisper, Jukebox, and CLIP.

Visit our model index for researchers to learn more about which models have been featured in
our research papers and the differences between model series like InstructGPT and GPT-3.5.
