Module 3
Reinforcement Learning
2. Human Feedback:
3. Policy Training:
4. Reward Modeling:
o Instead of manually defining a reward function, which can be
complex and task-specific, a reward model is trained to
estimate the reward signal from human feedback.
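One common way to train such a reward model is on pairwise comparisons: a human picks the better of two responses, and the model is penalized when it scores the rejected response above the chosen one. The sketch below shows that loss in plain Python. It assumes the two scores are floats produced by some reward model; the function name and the example scores are illustrative and not part of the original notes.

```python
import math

def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Negative log-probability that the chosen response outranks the rejected one."""
    margin = score_chosen - score_rejected
    # Computes -log(sigmoid(margin)) in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Illustrative scores only: the loss is small when the chosen response
# is scored well above the rejected one, and large when the order is wrong.
print(pairwise_preference_loss(2.0, 0.0))
print(pairwise_preference_loss(0.0, 2.0))
```

Minimizing this loss over many labeled pairs pushes the reward model to rank responses the way the human labelers did.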
1. Initial Pretraining:
5. Continuous Improvement:
Applications of RLHF
4. Task-specific Optimization:
Challenges in RLHF
1. Feedback Quality:
2. Scalability:
4. Ethical Considerations:
Notable Examples
2. Anthropic’s AI Research:
Conclusion
1. Problem Definition
Before building any model, it's crucial to define the problem clearly. This
includes:
2. Data Collection
Models need data to learn patterns. This step involves gathering relevant
data from:
Challenges:
3. Data Preprocessing
Once the data is collected, it must be cleaned and prepared for the model.
This step is critical for ensuring high-quality inputs.
Splitting the Data: Split the dataset into training, validation, and
test sets (commonly in an 80-10-10 ratio or 70-15-15).
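The split described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `split_dataset`, the default fractions, and the fixed seed are assumptions chosen for the example.

```python
import random

def split_dataset(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows and split them into train, validation, and test lists."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Shuffling before splitting matters when the data is ordered (for example by date or by class); for time-series data a chronological split is usually the safer choice.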
4. Model Selection
Supervised Learning:
Unsupervised Learning:
5. Model Training
This step involves feeding the training data into the model to learn from it.
6. Model Evaluation
Once the model is trained, it's evaluated on the validation and test sets to
assess its performance.
Cross-Validation:
For certain applications, it's essential to understand and explain how the
model makes decisions, especially in regulated industries (e.g.,
healthcare, finance).
Techniques:
9. Model Deployment
Once the model is fine-tuned, it is deployed in a real-world environment
where it will make predictions on new data.
Methods of Deployment:
Model Drift: Retrain the model when new data patterns emerge
that weren’t present in the training set.
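A very simple way to notice drift is to compare a feature's recent values against its values in the training set. The sketch below flags drift when the mean of the new values is many standard errors away from the training mean. The function name, the threshold of 3.0, and the sample numbers are assumptions for illustration; production systems usually rely on more robust statistical tests.

```python
import statistics

def has_drifted(train_values, new_values, threshold=3.0):
    """Return True when the new mean is far from the training mean."""
    train_mean = statistics.fmean(train_values)
    train_std = statistics.stdev(train_values)
    new_mean = statistics.fmean(new_values)
    if train_std == 0:
        # No spread in training data: any change in the mean counts as drift.
        return new_mean != train_mean
    # Standard error of the mean for a sample the size of new_values.
    std_error = train_std / len(new_values) ** 0.5
    return abs(new_mean - train_mean) / std_error > threshold

training_feature = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
print(has_drifted(training_feature, [10, 9, 11, 10]))   # False
print(has_drifted(training_feature, [20, 21, 19, 22]))  # True
```

A check like this can run on a schedule, with a True result triggering a review or a retraining job.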
Conclusion
InstructGPT:
ChatGPT:
o Example: Users can ask, "Tell me about AI," and then follow up
with, "How is AI used in healthcare?" ChatGPT answers the
follow-up while keeping the context of the earlier exchange.
2. Training Methods
InstructGPT:
InstructGPT:
ChatGPT:
4. Use Cases
InstructGPT:
o Examples:
ChatGPT:
o Examples:
Conclusion
InstructGPT:
1. Instruction Following:
1. Task Focus:
2. Human Feedback:
1. Text Summarization:
2. Code Generation:
3. Clarification of Instructions:
1. Pretraining:
3. Reinforcement Learning:
o Using the reward model, InstructGPT is fine-tuned with
reinforcement learning techniques like Proximal Policy
Optimization (PPO). This process adjusts the model’s behavior
to prioritize generating outputs that maximize the predicted
reward (i.e., those that best align with human intent).
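The core of PPO is a "clipped" objective that rewards improvements but limits how far a single update can move the policy. The sketch below computes that objective for one sample. Here `ratio` is the probability of the sampled output under the new policy divided by its probability under the old policy, and `advantage` estimates how much better the output was than expected. The function name and the default `epsilon` of 0.2 are illustrative assumptions, and this is a simplification: real RLHF training typically adds further terms, such as a penalty for drifting too far from the starting model.

```python
def clipped_surrogate(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """PPO clipped objective for a single sample (a value to be maximized)."""
    # Keep the ratio inside [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    # Taking the minimum makes the objective a pessimistic estimate,
    # so large policy changes earn no extra credit.
    return min(ratio * advantage, clipped_ratio * advantage)

# A good output (positive advantage) whose probability rose by 50%:
# the credit is capped as if the ratio were 1 + epsilon.
print(clipped_surrogate(1.5, 1.0))
```

Averaging this value over a batch and maximizing it gives the policy update step.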
1. Customer Support:
2. Code Assistance:
3. Content Creation:
1. Scope of Instruction:
3. Reliability:
Conclusion
ChatGPT:
1. Conversational Abilities:
2. Multi-turn Context:
o Unlike single-turn models (e.g., InstructGPT), ChatGPT can
track and reference information from earlier points in a
conversation. This allows it to provide more coherent
responses in back-and-forth dialogues.
2. Conversational Flow:
1. Customer Support:
2. Personal Assistants:
5. Casual Conversation:
1. Pretraining:
ChatGPT’s Strengths
1. Contextual Awareness:
o One of ChatGPT's primary strengths is its ability to remember
context throughout a conversation, making it more reliable in
extended interactions where it can maintain continuity of
topics and recall previous statements.
1. Context Limitations:
2. Handling Ambiguity:
3. Generative Risks:
Conclusion
2. Fine-tuning Capabilities:
o The early versions of the API were mostly suited for single-turn
tasks (i.e., one prompt, one response). However, with the
advent of ChatGPT, there was a demand for multi-turn
conversations where the model retains context.
o The pricing structure and token usage of the OpenAI API have
evolved to better reflect the needs of developers. Newer API
versions provide more granular control over usage, such as
separate pricing tiers for different model versions (e.g.,
GPT-3.5, GPT-4) and refined token accounting, helping developers
optimize their costs.
1. Backward Compatibility:
2. Model-specific Updates:
1. Multi-modal Capabilities:
2. Real-time Capabilities:
3. Cross-platform Integration:
Conclusion
The changing API landscape reflects the ongoing evolution of
technology, user needs, and developer demands. In the case of OpenAI,
the API has evolved to support increasingly complex and diverse
interactions, from simple text generation to multi-turn conversations and
specialized tasks like embeddings. As conversational AI continues to grow,
APIs will continue to adapt, offering more flexibility, control, and
integration possibilities for developers across various industries.
1. Multi-turn Conversation:
3. Roles in Conversation:
4. State Management:
6. Flexible Responses:
4. Iterative Exchange:
o Each API call must include the entire conversation history for
the assistant to generate a context-aware response.
Request Example:
{
  "model": "gpt-4",
  "messages": [
    { "role": "user", "content": "Can you explain how recursion works in this example?" }
  ]
}
Response Example:
{
  "role": "assistant",
  "content": "In this example, the function `factorial` calls itself recursively. The function multiplies the current value of `n` by the result of `factorial(n-1)`. When `n` reaches 0, the recursion stops because the base case (`if n == 0`) returns 1. The recursive calls then resolve in reverse order, calculating the final factorial."
}
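Because the API itself keeps no memory between calls, the application has to store the conversation and resend it each time. The sketch below shows one way to do that in Python without making any network call. The helper names `add_turn` and `build_request` and the placeholder assistant reply are assumptions for illustration.

```python
VALID_ROLES = {"system", "user", "assistant"}

def add_turn(history, role, content):
    """Return a new history list with one more message appended."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role}")
    return history + [{"role": role, "content": content}]

def build_request(history, model="gpt-4"):
    """Build a request payload; the full history goes into every call."""
    return {"model": model, "messages": list(history)}

history = []
history = add_turn(history, "user", "Tell me about AI.")
# In a real application this text would come from the API response.
history = add_turn(history, "assistant", "<assistant reply placeholder>")
history = add_turn(history, "user", "How is AI used in healthcare?")

request = build_request(history)
print(len(request["messages"]))  # 3
```

The second user question only makes sense to the model because the first question and its answer travel with it in `messages`.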
1. model:
2. messages:
3. max_tokens:
5. top_p:
6. stop:
7. n:
1. Customer Service:
2. Virtual Assistants:
3. Educational Tools:
4. Collaborative Writing:
1. Context Retention:
3. Customizable Behavior:
4. Scalability:
2. Token Limits:
o The API has a token limit (which includes both input and
output tokens), so very long conversations may eventually
require trimming or summarizing parts of the conversation
history to fit within these constraints.
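Trimming can be sketched as follows: keep any system messages, then keep as many of the most recent turns as fit in the budget. The token estimate here is a rough assumption (one token per whitespace-separated word); a real application would count tokens with the tokenizer that matches its model. The function names are illustrative.

```python
def estimate_tokens(text):
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())

def trim_history(messages, max_tokens):
    """Keep system messages plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    # Walk backwards so the newest messages are kept first.
    for message in reversed(others):
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))

conversation = [
    {"role": "system", "content": "be brief"},
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six seven eight nine"},
]
trimmed = trim_history(conversation, max_tokens=8)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

Dropping the oldest turns is the simplest policy; replacing them with a short summary message is an alternative when early context still matters.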
3. Response Consistency:
Conclusion
The Chat Completion API is a powerful tool for building dynamic, multi-
turn conversational applications. It offers flexibility in structuring
interactions, maintains context, and provides a high level of control over
the assistant's behavior. While it requires developers to manage
conversation history, its ability to handle complex interactions and tailor
responses to specific roles makes it ideal for chatbots, customer service
agents, educational tools, and much more.
As OpenAI’s models and APIs have evolved, there’s been a shift from
traditional chat-based interactions toward function-based models. This
evolution broadens the range of use cases for AI, allowing it to perform
more complex, integrated tasks beyond simple conversational exchanges.
Moving beyond chat to functions introduces capabilities where the model
can invoke actions, interact with other systems, or provide structured
outputs, making it a more powerful and flexible tool for developers.
2. Action-Oriented Interactions:
o As AI integrates more deeply into business and software
applications, it needs to move from just responding to
requests conversationally to actually doing things—like
triggering workflows, generating code snippets, retrieving
data, making API calls, and more.
1. Defining Functions:
o Developers can define functions (in their code) that they want
the model to call when needed. These functions can perform
actions like retrieving data from a database, executing
commands, performing calculations, interacting with external
APIs, etc.
User Query: “Can you book me a flight from New York to Paris for
next Monday?”
Function Call: The model recognizes the need to interact with a
flight-booking API and calls a predefined function book_flight(origin,
destination, date).
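The flow above can be sketched in Python as three pieces: the developer's function, a description of it that is sent to the model, and a dispatcher that runs whatever function the model asks for. No API call is made here; the model's output is hard-coded as a hypothetical example. The description format and the shape of the function call follow the general JSON Schema pattern, but the exact field names should be treated as assumptions and checked against the current API reference.

```python
import json

def book_flight(origin, destination, date):
    """Stand-in for a real booking call; returns a confirmation record."""
    return {"status": "booked", "origin": origin,
            "destination": destination, "date": date}

# Description of the function that would be sent along with the messages.
BOOK_FLIGHT_SPEC = {
    "name": "book_flight",
    "description": "Book a flight between two cities on a given date.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
            "date": {"type": "string"},
        },
        "required": ["origin", "destination", "date"],
    },
}

AVAILABLE_FUNCTIONS = {"book_flight": book_flight}

def dispatch(function_call):
    """Run the requested function with its JSON-encoded arguments."""
    name = function_call["name"]
    if name not in AVAILABLE_FUNCTIONS:
        raise ValueError(f"model requested unknown function: {name}")
    arguments = json.loads(function_call["arguments"])
    return AVAILABLE_FUNCTIONS[name](**arguments)

# Hypothetical model output for the user query above.
model_function_call = {
    "name": "book_flight",
    "arguments": '{"origin": "New York", "destination": "Paris", "date": "next Monday"}',
}
result = dispatch(model_function_call)
print(result["status"])  # booked
```

The model never executes anything itself: it only names a function and supplies arguments, and the application decides whether and how to run it.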
1. Task Automation:
2. Precision in Responses:
1. Improved AI Utility:
o Example: “Give me the sales data for the last quarter” would
trigger a function to access the relevant data, format it, and
return a summary or detailed report.
3. Financial Services:
Conclusion
Example:
3. Dialogues as Prompts:
Example:
Example:
Example:
Example:
To take the analogy a step further, here are some playwriting techniques
that can inform and improve your prompt engineering:
Prompt Example:
“You are a knowledgeable historian specializing in ancient Egypt. Explain
the significance of the pyramids to a high school student.”
Prompt Example:
“You are a lawyer, and your client insists on pleading guilty despite
evidence proving their innocence. Advise your client on the best course of
action.”
Prompt Example:
“You are a life coach. The user is feeling unmotivated and unsure about
their career. Offer encouragement and practical advice to help them find
direction.”
Just as no play unfolds exactly the same way every time it’s performed, AI
responses vary with each input, even if the prompts are similar. The goal
is to balance structure and creativity, just as a playwright gives actors the
space to interpret and bring a script to life. This dynamic, creative
interaction is what makes prompt engineering similar to the art of
playwriting.
Sample Questions: