Exposition Wholesale
Agenda
Project Introduction and Setup [12 min]
Intro & Overview of the Extensions project
Task Overview
New Dimensions
Prompt Requests
Multi-turn Conversations
Punts and Blank Responses
Key Model Assumptions
Knowledge Checks
Step 2: Read the prompt carefully to understand what the user needs. Look up terms
you don't know. If there is a link, look through it as you evaluate each response.
See below for an example prompt.
Make an extensive and effective summary of the following video:
https://youtu.be/0-3SlufYHwA?si=nX4ORzISCGR5p3vj
There are four new dimensions to be rated for the response: Content Conciseness &
Relevance, Content Completeness, Collaborativity, Contextual Awareness (more
details on these new dimensions in the rubric course).
Step 6: Write a justification for your decision on which model response you prefer.
Step 7: Select the error categories for the response and the tool log.
Prompt Requests
Links in the prompt and "@"
Links – A user usually includes a link in the prompt for a very specific reason. Be
sure to always read through the link in the prompt! It's a critical part of the
user's request.
"@" – If you see the "@" symbol before the name of a tool like "@Google Hotels" or
"@travel arrow," that means the user is requesting the model to use that specific
tool to fulfill their request.
Request Importance
Not all requests are made equal! Some are more important than others.
Imagine a prompt requests 10 hotels under $300 in LA. Let's say one response offers
5 hotels under $300 in LA and another response offers 10 hotels above $300 in LA.
Which response would a user probably prefer? Probably the first, because in this
context the price limit is important.
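The hotel example above can be read as a count of how many offered items meet the price requirement. The sketch below is only an illustration: the price lists and the helper `count_within_budget` are hypothetical and are not part of the project's rating tooling.

```python
def count_within_budget(prices, budget):
    """Return how many offered prices fall strictly under the budget."""
    return sum(1 for price in prices if price < budget)


# Hypothetical nightly prices for the two responses in the example.
response_a = [180, 220, 250, 275, 299]  # 5 hotels, all under $300
response_b = [320, 340, 360, 380, 400, 420, 450, 480, 500, 550]  # 10 hotels, all above $300

BUDGET = 300
a_matches = count_within_budget(response_a, BUDGET)
b_matches = count_within_budget(response_b, BUDGET)
print(a_matches, b_matches)  # prints "5 0"
```

The first response meets the price requirement for every hotel it lists, while the second meets the requested count but none of its hotels fit the budget.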
Multi-turn Conversations
In a multi-turn conversation, the user and chatbot exchange messages before the
final prompt. This exchange is the chat (conversation) history. Sometimes, the
conversation history will appear in the response itself as
"==Conversation History==". Sometimes, it will appear above the other responses.
Always read the chat history for multi-turn conversations carefully: it provides
important context on the latest prompt and the 2 responses you'll rate.
Embedded UI Guidance
Table of Contents
Embedded UI Guidance
Tips
Conclusion
This guidance covers responses in which no flights, maps, videos, pictures, or
music are listed. In such a response, the model says it has options for flights or
hotels, but none are shown…
No Issues
If the text says there is an embedded UI, check the tool log output. If the tool
log output makes sense and is accurate, rate Truthfulness as “No Issues.”
If the tool log exists and the text DOES NOT say there is an embedded UI, rate
“Major Issues.”
If no tool log exists and the text DOES SAY that an embedded UI exists, rate
“Cannot Assess.”
If no tool log exists and the text DOES NOT say there is an embedded UI, rate
“Major Issues.”
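The four rules above can be sketched as a small lookup function. This is a minimal illustration, not project tooling: the function name `rate_truthfulness` and its boolean parameters are hypothetical, and the sketch assumes the first rule applies when a tool log exists. Combinations the rules do not cover (for example, a tool log that exists but is inaccurate) return `None` rather than a guessed rating.

```python
from typing import Optional


def rate_truthfulness(tool_log_exists: bool,
                      text_claims_ui: bool,
                      tool_log_accurate: Optional[bool] = None) -> Optional[str]:
    """Map the four embedded UI rules to a Truthfulness rating.

    Returns None when the rules above do not specify a rating.
    """
    if tool_log_exists and text_claims_ui:
        # Rule 1: the log must make sense and be accurate for "No Issues".
        return "No Issues" if tool_log_accurate else None
    if tool_log_exists and not text_claims_ui:
        return "Major Issues"   # Rule 2
    if not tool_log_exists and text_claims_ui:
        return "Cannot Assess"  # Rule 3
    return "Major Issues"       # Rule 4: no tool log, no UI claim
```

For example, `rate_truthfulness(False, True)` returns `"Cannot Assess"`, matching the third rule.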
Tips
Whenever the prompt requests that the model show, play, or display something, you
should check for an embedded UI.
Introduction
In this supplemental course, we will cover the most common tasking errors we see on
this project and how to address them.
Table of Contents
Common Errors
Verifying Prompt Requirements
Handling Punt Ratings
Embedded UI
Carefully Checking the Code and Code Output
Ignoring the Last Step with an Empty `code`, `tool_executions`, `error`, and
`observation`
Conclusion
Reminder: Whether you are new to this project or have been tasking for a while,
please review the material carefully. Even simple mistakes can create bad training
data, and contributors who consistently submit low-quality tasks may be removed
from the project.
Common Errors
The common errors that we will address in this course are:
Not verifying that all requirements in the prompt were addressed
Not rating punts correctly according to the instructions
Not knowing how to rate responses with an Embedded UI
Not checking the Code and Code Output carefully
Not checking every code step in the Code and Code Output
Not knowing to ignore the last code step with all fields set to blank values
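The last point can be sketched in code. This is a minimal illustration under stated assumptions: the tool log is modeled as a list of dictionaries keyed by `code`, `tool_executions`, `error`, and `observation`, and "blank" is taken to mean missing, `None`, an empty string, or an empty list. The real log format is not shown in this course, and the helper names `is_blank_step` and `steps_to_review` are hypothetical.

```python
STEP_FIELDS = ("code", "tool_executions", "error", "observation")


def is_blank_step(step):
    """True when every tracked field is missing or empty."""
    return all(not step.get(field) for field in STEP_FIELDS)


def steps_to_review(steps):
    """Drop only a trailing blank step; keep every other step for review."""
    if steps and is_blank_step(steps[-1]):
        return steps[:-1]
    return steps


# Hypothetical two-step log whose final step has all fields blank.
example_log = [
    {"code": "print(1)", "tool_executions": [], "error": "", "observation": "1"},
    {"code": "", "tool_executions": [], "error": "", "observation": ""},
]
remaining = steps_to_review(example_log)
print(len(remaining))  # prints "1"
```

Only the final step is ever dropped, so a blank step in the middle of the log would still be kept for review.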