Project Report
Project: ChatGPT
CS-250 Data Structures and Algorithms
Group Members
Afifa Attiq
Javeria Hakim
Mehtab Ameem
Introduction
Motivation
Literature Review
Working of ChatGPT
Applications
Comparison with Google
References
Introduction:
John McCarthy, the person who coined the term “artificial intelligence” (AI)
in 1955, describes it as machines that “… use language, form abstractions
and concepts, solve kinds of problems now reserved for humans, and improve themselves.”
Building on the features and success of GPT, OpenAI created a chatbot that could
hold natural conversations with humans. This led to the development of ChatGPT,
which was initially released in November 2022. It has been trained on a massive
dataset of hundreds of billions of words, allowing it to generate high-quality
language and perform a wide range of NLP tasks, from text completion and
translation to question answering and summarization. It has also been fine-tuned
for conversational language and, because of its ability to generate human-like
responses to text input, can be used to produce responses in a conversational
context, such as in a chatbot or virtual assistant. Using advanced machine
learning techniques, ChatGPT analyzes the context and content of a conversation
and generates appropriate responses based on this analysis. This allows it to
hold natural, flowing conversations with humans, rather than simply providing
pre-written responses to specific keywords or phrases.
Working of ChatGPT:
The working of ChatGPT rests on the following key components, each described below:
1. Transformer Architecture
2. Pre-Training
3. Tokenization
4. Self-Attention Mechanism
5. Fine-Tuning

Fine-Tuning:
Fine-tuning is a process in which the pre-trained model is further trained on a smaller,
task- or domain-specific dataset to adapt it to a particular task or domain. Fine-tuning
can enhance the model's performance for specific applications.
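As an illustration of what fine-tuning looks like in code, the sketch below further trains a small pre-trained causal language model on a toy conversational dataset using the Hugging Face transformers library. The model name ("gpt2"), the two example dialogues, and all hyperparameters are assumptions chosen for the sketch, not details of how ChatGPT itself was fine-tuned.

# Minimal causal-LM fine-tuning sketch (assumed setup, not ChatGPT's actual pipeline).
# Requires: pip install transformers datasets accelerate torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import Dataset

# "gpt2" is a stand-in for any pre-trained causal language model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token            # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy domain-specific corpus; in practice this would be many thousands of examples.
texts = ["User: How do I reset my password?\nAssistant: Go to Settings, then Security ...",
         "User: What are your opening hours?\nAssistant: We are open 9am-5pm, Mon-Fri."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # next-token objective
)
trainer.train()   # further trains the pre-trained weights on the new domain

Passing mlm=False to the collator sets up the same next-token-prediction objective used in pre-training, so fine-tuning simply continues training the existing weights on new, task-specific data.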
Transformer Architecture:
The transformer architecture is the foundation of GPT models. It uses self-attention mechanisms to
capture dependencies between different words in a sentence, allowing the model to consider long-
range context efficiently.
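As a concrete, illustrative sketch (not OpenAI's actual implementation), the PyTorch code below stacks standard self-attention blocks with a causal mask, which is roughly how a GPT-style decoder-only transformer is organized; the vocabulary size, model width, and layer count are arbitrary toy values.

# Illustrative GPT-style stack (toy sizes; not OpenAI's real architecture).
import torch
import torch.nn as nn

VOCAB_SIZE, D_MODEL, N_HEADS, N_LAYERS, MAX_LEN = 50_000, 256, 8, 4, 128

class TinyGPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.token_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)       # token -> vector
        self.pos_emb = nn.Embedding(MAX_LEN, D_MODEL)             # position -> vector
        block = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEADS,
                                           dim_feedforward=4 * D_MODEL,
                                           batch_first=True)
        # Stacked self-attention + feed-forward blocks.
        self.blocks = nn.TransformerEncoder(block, num_layers=N_LAYERS)
        self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)              # vector -> next-token scores

    def forward(self, token_ids):                                  # shape: (batch, seq_len)
        seq_len = token_ids.size(1)
        pos = torch.arange(seq_len, device=token_ids.device)
        x = self.token_emb(token_ids) + self.pos_emb(pos)
        # Causal mask: each position may only attend to itself and earlier positions.
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        x = self.blocks(x, mask=causal_mask)
        return self.lm_head(x)                                     # (batch, seq_len, vocab)

logits = TinyGPT()(torch.randint(0, VOCAB_SIZE, (1, 16)))
print(logits.shape)   # torch.Size([1, 16, 50000])

GPT models are decoder-only: every block applies the causal mask, so each position can only look at earlier positions, which is what lets the model generate text left to right.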
Pre-Training:
ChatGPT undergoes pre-training on a large dataset of text. During pre-training, the model learns to predict
the next token in a sequence, given the tokens that precede it.
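To make the objective concrete, the toy snippet below shows the next-token prediction loss that pre-training minimizes; the sequence, vocabulary size, and random logits are placeholders standing in for real data and a real model.

# Next-token prediction objective (toy example with made-up numbers).
import torch
import torch.nn.functional as F

vocab_size = 10
token_ids = torch.tensor([[3, 7, 1, 4, 9]])         # one training sequence of 5 tokens

inputs  = token_ids[:, :-1]                          # model sees:     [3, 7, 1, 4]
targets = token_ids[:, 1:]                           # model predicts: [7, 1, 4, 9]

# A real model would produce these logits; here they are random stand-ins.
logits = torch.randn(1, inputs.size(1), vocab_size)

# Cross-entropy between the predicted next-token distribution and the true next token.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())   # pre-training repeatedly minimizes this loss over huge corpora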
Tokenization:
Tokenization is the process of breaking input text down into smaller units called tokens. GPT
models operate on these tokens, which can be words, subwords, or characters.
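As an example, the snippet below tokenizes a sentence with OpenAI's open-source tiktoken library (chosen here for illustration; the report itself does not name a tokenizer). Short, common words tend to map to a single token, while rarer words split into several pieces.

# Tokenization example using OpenAI's open-source tiktoken library.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")       # one of OpenAI's published encodings

text = "Tokenization breaks text into smaller pieces."
token_ids = enc.encode(text)                     # text -> list of integer token ids

print(len(token_ids), "tokens")
print([enc.decode([t]) for t in token_ids])      # the text chunk behind each token
print(enc.decode(token_ids) == text)             # True: decoding reverses encoding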
Tokens:
GPT-3 was trained on roughly 500 billion tokens, which allows the model to more easily
assign meaning and predict plausible follow-on text by mapping tokens into a vector
space. Many words map to a single token, though longer or more complex words often
break down into multiple tokens. On average, tokens are roughly four characters long.
OpenAI has said little about the inner workings of GPT-4, but it is reasonable to assume
it was trained on much the same kind of dataset, since it is even more powerful. All of
these tokens came from a massive corpus of data written by humans: books, articles, and
other documents across many different topics, styles, and genres, along with an enormous
amount of content scraped from the open internet. In effect, the model was trained on a
vast cross-section of human writing in order to develop the network it uses to generate text.
Based on all that training, GPT-3's neural network has 175 billion parameters (variables)
that allow it to take an input (the user's prompt) and then, based on the values and
weightings it assigns to those parameters (plus a small amount of randomness), output
whatever it judges best matches the request. OpenAI has not said how many parameters
GPT-4 has, but it is a safe guess that it is more than 175 billion and less than the
once-rumored 100 trillion. Regardless of the exact number, more parameters does not
automatically mean better. Some of GPT-4's increased power probably comes from having
more parameters than GPT-3, but much of it is probably down to improvements in how it
was trained.
Self-Attention Mechanism:
At the core of transformers is a process called "self-attention." Older recurrent neural networks (RNNs) read
text from left to right. This works when related words and concepts sit next to each other, but it becomes
difficult when they appear at opposite ends of a sentence. (It is also a slow way to compute, since it has to be
done sequentially.)
Transformers, however, read every word in a sentence at once and compare each word to all the others.
Strictly speaking, transformers do not work with words but with "tokens": chunks of text encoded as vectors,
i.e. lists of numbers that can be thought of as having a magnitude and a direction in a high-dimensional space.
The closer two token vectors are in that space, the more related they are. Similarly, attention is encoded as a
vector, which allows transformer-based neural networks to carry forward important information from earlier
in a paragraph.
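The sketch below is a bare-bones NumPy illustration of the scaled dot-product self-attention described above, run on made-up vectors; the sequence length, dimensions, and random projection matrices are placeholders.

# Scaled dot-product self-attention on toy data (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model = 4, 8                      # 4 tokens, 8-dimensional vectors (toy sizes)
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))      # one embedding vector per token

# Learned projection matrices (random stand-ins here).
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v          # queries, keys, values

# Every token's query is compared against every token's key at once.
scores = Q @ K.T / np.sqrt(d_model)          # (seq_len, seq_len) relatedness scores
weights = softmax(scores, axis=-1)           # each row sums to 1: how much a token attends to the others
output = weights @ V                         # each token's new vector is a weighted mix of all values

print(weights.round(2))                      # the attention pattern
print(output.shape)                          # (4, 8): same shape as the input

In GPT the score matrix is additionally masked so that each token can only attend to itself and earlier tokens, which matches left-to-right text generation.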
The Transformer follows this overall architecture, using stacked self-attention and point-wise, fully connected
layers for both the encoder and the decoder, shown in the left and right halves of the architecture diagram in
the original Transformer paper.
Comparison with Google:
[Comparison table: ChatGPT vs. Google]