dolphin Genesis Image-to-Text
Attempter Specifications
Table of Contents:
Project Overview
Task Specifications
Step 1: Understanding the Competency
Competency View in the PAT
Step 2: Source an Image
Baseline image-to-text prompt requirements:
Image Permissibility requirements
Step 3: Write a Prompt
The prompt must meet all of the below requirements:
Baseline image-to-text prompt requirements:
Image Permissibility requirements
Prompt quality requirements
Do NOT proceed to step 4 until you have received back your approved prompt to submit
to arcade!
Competency View in Arcade
How to exit a task in Arcade
Step 4: Check System Instructions (SI)
Step 5: Rate the Model Responses
Appendix
Competency Guidance
Please upload high-quality images only. The image should be necessary to solve the
problem. Prompts that the AI Assistant can still solve while ignoring the image are not
helpful.
Creative Writing with Image Inputs
Reasoning with Multimedia Inputs (Text + Images)
Understanding Infographics and Graphs
Patterns with Multimedia Inputs (Text + Image)
Science with Multimedia Inputs (Text + Images)
Counting with Image Inputs
Math with Multimedia Inputs (Text + Images)
Scene Understanding with Image Inputs
Extracting Data from Images
System Instructions Ratings (to rate how well model response follows System Instructions)
Overall Response Ratings (to rate the accuracy and helpfulness of the model response)
Assistant Limitations and Capabilities
Prompt Quality (how Reviewers and QA will rate YOUR prompt)
Complexity vs. Specificity
Image Sourcing
Examples
Creative Writing With Image Inputs
Reasoning With Multimedia Inputs (Text + Images)
Understanding Infographics and Graphs
Science With Multimedia Inputs (Text + Images)
Math With Multimedia Inputs (Text + Images)
Counting with image inputs
Scene Understanding With Image Inputs
Patterns With Multimedia Inputs (Text + Images)
Extracting Data from Images
Project Overview
Welcome to the Genesis “Image & Text”-to-Text Project! This project focuses on creating
prompts that ask questions based on an attached image(s). There are nine competencies
from which you may choose to submit a prompt. Each competency has its own set of unique
instructions, so it is essential to carefully follow the specific guidance for each one. Please refer
to this Competency Guidance Table in the Appendix to understand the purpose and
requirements of each competency.
Please note that prompts should be direct, simple, and natural
sounding. There is no need to include any context to justify why the request is being
made!
The image must be necessary to answer the prompt. Your prompt should refer to
the image in a general way such that the model is forced to refer to it in order to
answer the prompt. If the prompt refers to the image too specifically, then the image
is unnecessary because the written prompt contains all of the relevant information. If
the prompt refers to the image too vaguely, then it is likely that the prompt can be
answered by referring to any image, or the image is not relevant to the question at
hand.
Once the prompt and image pairings are submitted, the model will generate two responses.
These responses will be rated based on System Instructions and Overall Quality.
Responses should not contain information that the prompt did not request,
chit-chat, or irrelevant details that are beside the point of our question (even
if they are interesting).
Just because a response is longer does not mean it is necessarily
better!
Task Specifications
Step 1: Understanding the Competency
Competency View in the PAT
You will receive a task in the prompt approval tool (PAT) with one of the competencies filled in.
Please see the Competency Guidance table in the Appendix to understand each competency,
or navigate to the specific Competency via these links:
Images can be added anywhere within the prompt, e.g. at the beginning, end, or even in the
middle of the prompt. If you choose to upload multiple images, you may decide to put them in
several different places within the prompt. Please be creative and vary the position of the image
within the text.
To upload an image, go to the user input text box and click the ‘Insert Image’
button outlined above. This should be a square polaroid outline with a ‘plus’ sign in the
top right corner. You will see a Markdown insert indicator where the image is applied to the text
input, e.g. <image-1>
To verify that the image is rendered correctly, please refer to the ‘Markdown Preview’ on
the right-hand side of the screen.
The prompt should be a natural, non-obvious request of at most 40 words that:
Aim to maximize diversity across the prompts you submit!
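The 40-word limit stated above can be checked mechanically before a prompt is submitted. The following is a minimal sketch in Python, not part of any project tooling: the function names are hypothetical, and it assumes that image placeholders such as <image-1> do not count toward the limit, which these guidelines do not state either way.

```python
import re

# Assumption: image placeholders look like <image-1>, <image-2>, ...
# and are NOT counted as words. The guidelines do not specify this.
IMAGE_PLACEHOLDER = re.compile(r"^<image-\d+>$")


def word_count(prompt: str) -> int:
    """Count whitespace-separated words, skipping image placeholders."""
    return sum(
        1 for token in prompt.split() if not IMAGE_PLACEHOLDER.match(token)
    )


def within_word_limit(prompt: str, max_words: int = 40) -> bool:
    """Return True if the prompt has at most max_words words."""
    return word_count(prompt) <= max_words


example = "Write a short poem about the tree in <image-1> using an AABB rhyme scheme."
print(word_count(example))         # 13
print(within_word_limit(example))  # True
```

A simple whitespace split is used here for illustration. If placeholders should count as words, drop the placeholder filter.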
For the following four (4) competencies, the prompt should request a reasoning process
needed to arrive at the final answer:
Note: Your prompt does not have to contain an explicit request that contains the words
"step by step." Prompts can also implicitly request a step-by-step process for the answer.
Please refer to this document for examples.
Do NOT proceed to Step 4 until you have received back your approved
prompt to submit to Arcade!
Key factors to assess Overall Response Quality include helpfulness, format, presentation,
conciseness, factual accuracy, harmlessness, and focus.
Responses that fully meet the prompt’s requests and are well-structured with no
errors should be rated higher than those with mistakes or missing details.
The preferred response should be direct, simple, and straightforward. It
should provide enough helpful information to fulfill the prompt request and satisfy
the User.
Responses should not contain information that the prompt did not
request, chit-chat, or irrelevant details that are beside the point of our
question (even if they are interesting).
Just because a response is longer does not mean it is necessarily
better!
For the following four (4) competencies, the AI Assistant should write out a
reasoning process needed to arrive at the final answer, followed by the
answer itself. Your prompt should include a request for this reasoning process.
CAN do any sort of text-based task, including sharing information, creative
writing, math, and reasoning.
CANNOT browse the internet
CANNOT use tools, such as a calendar application, notepad, etc.
CANNOT take actions in the real world (e.g. send emails or book
holidays)
DOES NOT have any knowledge about recent events within the last 6-12
months. (If asked about something more recent the assistant should
politely explain that it doesn’t know.)
DOES NOT have access to previous conversations that were ended. It
CAN only see the current conversation.
DOES NOT have any information about the User.
SHOULD NOT pretend to be human, express emotions or opinions, or
build relationships with the user.
Appendix
Competency Guidance
Please upload high-quality images only. The image should be necessary to solve the
problem. Prompts that the AI Assistant can still solve while ignoring the image
are not helpful.
Creative Writing with Image Inputs

Design tasks which require the AI Assistant to produce a creative output using an image
as input in some way. Be creative with both how the image is used in the prompt as well as
with the requirements you set for the Assistant!

Please try to choose images that are diverse in medium, content, topic, etc. You can use
images of any kind: paintings, drawings, cartoons, illustrations, collages, etc. They can
capture a wide range of content: still life, landscapes, architecture, portraits,
screenshots, graphs, charts, etc., or any other image that you think is suitable for your task.

Ask the AI Assistant to produce creative output in any format, based on the attached
image(s). Suggested output can include poems (haiku, ballad, ode, etc.), songs (e.g. rap,
pop, ballad), stories (short story, chapter, movie scene), scripts (for a movie or play),
jokes or memes (or any other funny piece), letters, etc.

Make sure to:
- Specify how the image should be incorporated into the creative output (e.g. as
inspiration for the piece, as an illustration to accompany the piece, etc.)
- Include constraints for the expected output, such as length, style, tone, mood, rhyme
scheme for poetry, etc.
- Vary your prompts in language, length, level of detail and types of images and tasks.
- The prompt specifies constraints or requires reasoning that the response should follow.

Example: Creative Writing with Image Inputs Example
Math with Multimedia Inputs (Text + Images)

Design tasks for the AI Assistant that test the ability to perform math with multimedia
inputs.

Requests for this competency can include the following:
- Answering math questions using a diagram, figure, or graph (algebra, calculus,
geometry, topology, set theory, tables, etc.)
- Finding errors in an image containing equations of mathematical reasoning
- Making statistical estimates of a quantity in an image using mathematical reasoning
and approximations
- Evaluating graphical proofs

The prompt must request a reasoning process in how the answer was determined, in
addition to the answer itself.
The AI Assistant should write out the reasoning process needed to arrive at the final
answer, followed by the answer itself.
The prompt specifies constraints or requires reasoning that the response should follow.

Example: Math with Multimedia Inputs Example
Scene Understanding with Image Inputs

Design tasks focused on testing the Assistant’s ability to understand spatial
relationships (front vs back, left vs right, top vs bottom, etc.; size, color, and other
elements in the image).

Create tasks that test whether the Assistant can describe where objects are in an image in
relation to each other. Focus on spatial relationships such as front vs back, left vs
right, and top vs bottom. Include size, color, and other distinctive features of the objects
when relevant.
- For example, you could ask the AI Assistant to ‘Tell me what objects are located to the
left of the chair, under the table.’
- For example, you can ask for extremely detailed or concise descriptions. You can also
focus on specific parts of the image or the image as a whole.
- Choose tasks that real users would ask the AI Assistant that would be helpful or useful.

The image should be self-contained and not require any additional information to be
able to answer the question.
The prompt specifies constraints or requires reasoning that the response should follow.
Do NOT request JSON/CSV file formats!

Example: Scene Understanding with Image Inputs Example1
Treat the System Instruction ratings independently of the Overall Response Quality
ratings.
SI Rating: Mostly
SI (System Instructions): Follows almost all System Instructions, but has some minor
mistakes or omissions; violates one System Instruction
Overall Response Ratings (to rate the accuracy and helpfulness of the
model response)
Do NOT rate down Overall Response Quality due to a response ignoring System
Instructions.
Response Rating (Model Response)

Very Bad:
- Major Factuality Errors
- Did not follow the prompt requests at all.
- Response violates more than 1 of the Assistant Limitations and Capabilities (shown in
the table below)

Bad:
- Fails to meet several of the parameters or requests in the prompt
- Invalid CSV or JSON
- Factual Inaccuracies
- Response includes information that is NSFW (This includes responses that are harmful,
offensive, violent, or profane)
- Response violates one of the Assistant Limitations and Capabilities (shown in the table
below)

OK:
- Follows some parameters and requests in the prompt, but with minor inconsistencies
- Could be better structured or phrased
- Some grammar/spelling errors

Good:
- Fulfills All major prompt requests/constraints, but could miss a smaller request
- Minor spelling or grammar errors
- Could have included more relevant details/specifics

Very Good:
- Fulfills ALL prompt requests
- Well-formatted, concise, and has NO grammar or spelling mistakes
- Has minor room for improvements

Amazing:
- This rating is RARE. Response is incredible and nothing could make it better.
- There are NO follow up questions. All requests are clear.
Cannot provide personal information about the user (without the information being in the prompt)
The model cannot pretend to be human, express emotions or opinions, or build relationships with the user
(Note: the model using phrases such as “I”, “we”, etc. does not necessarily count as “pretending to be
human”)
Response cannot mislead the user into thinking the assistant can complete real-world actions (sending an
email, booking a trip, etc.)
Response cannot mislead the user into thinking the assistant can use a tool, such as booking a meeting on a
calendar
Response cannot mislead the user into thinking the assistant can browse the internet
Cannot reference previous conversations with the model (Only the current conversation is accessible)
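The Overall Response Ratings rubric above lists a violation of one Assistant Limitation under "Bad" and a violation of more than one under "Very Bad". As a rough sketch only, that single criterion could be expressed as follows in Python. The function name is hypothetical, and the other rubric criteria (factuality, formatting, NSFW content, and so on) still apply and are not modeled here.

```python
from typing import Optional


def rating_from_limitation_violations(violations: int) -> Optional[str]:
    """Map the number of violated Assistant Limitations to the rubric
    column that lists it. Returns None when there are no violations,
    since this criterion alone then says nothing about the rating.
    """
    if violations < 0:
        raise ValueError("violations must be non-negative")
    if violations == 0:
        return None
    if violations == 1:
        return "Bad"
    return "Very Bad"


print(rating_from_limitation_violations(0))  # None
print(rating_from_limitation_violations(1))  # Bad
print(rating_from_limitation_violations(3))  # Very Bad
```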
Prompt Quality (how Reviewers and QA will rate YOUR prompt)
Specificity: The main subject/concern of the prompt request

Rating 0: Not Specific At All
- The prompt ask or request is completely unclear.
- The prompt request does not rely on the media attached.

Rating 1: Minimally Specific
- The prompt is too vague or broad, hinting at the desired outcome without providing
enough detail for a clear and actionable response.
- It requires multiple follow-up clarifications due to unclear or
impractical/unattainable analysis requests.
- The prompt requests an active URL.
- For ITT, the prompt does not include the correct number of images required to fulfill
the task requirements.

Rating 2: Mostly Specific
- The request is specific and clear enough, but could be improved.
- Follow up questions after reading, but can still be answered.
(When deciding between 1 or 2 for the score, weigh up the decision based on how many
follow-up questions you have.)

Rating 3: Fully Specific
- Completely specific, concise, and clear in what is being asked for.
- There are no follow up questions necessary to understand the request.
- Prompt is fully related to the media when applicable.

Image Gen Examples: Orientation, “realistic-looking”, image type (photo, cartoon,
sketch, etc.)
Examples
Specificity Complexity
No Complexity / Minimally
Specific
Write something based on the image.
Note:
Complexity: No parameters of complexity.

Specifics: The prompt is simple, but does have some elements of specificity (how to paint
gaming miniatures) and (information on what to expect), but there are still follow-up
questions.
Complexity: There is only 1 parameter of complexity, which is the type (blog post).
Image Sourcing
Examples
Creative Writing With Image Inputs
User:
Write a short poem about the tree in the picture that identifies what kind of tree it is. Incorporate
an AABB rhyme scheme, and make sure you mention Autumn.
Assistant:
Camouflage bark, a mottled map,
Of peeling greens and creamy wraps.
A sycamore, so tall and grand,
With its rich crown seeds that land,
Whirling down on autumn's breeze,
To scatter life among the trees.
Assistant:
There are two pieces that can currently capture an opponent's piece. The white knight on the f3
square could capture the black pawn on the e5 square. Additionally, the white bishop on the a4
square could capture the black knight on the c6 square.
A decrease in the amount of water vapor in the atmosphere above the deforested
area due to lower transpiration rates. This can reduce cloud formation and
ultimately lead to less precipitation.
An increase in surface runoff and decrease in infiltration due to lack of
vegetation, leading to higher erosion and a decline in groundwater levels.
Desertification, from the combination of reduced precipitation, increased surface
runoff, and decreased groundwater levels.
Assistant:
The diagram shows a cross-section of an airfoil, with a fluid deflected above and below it at
different velocities, creating lift.
Lift is a mechanical aerodynamic force produced by the motion of an airplane through the air. It
is the force that directly opposes the weight of an airplane and holds the airplane in the air.
In the diagram, lift is represented by the black arrows pointing up. Because lift is a force, it is a
vector quantity, having both a magnitude and a direction associated with it.
Lift occurs when a moving flow of gas is turned by a solid object. The flow is turned in one
direction, and the lift is generated in the opposite direction, according to Newton’s Third Law of
action and reaction. For an aircraft wing, both the upper and lower surfaces contribute to the
flow turning.
While lift is generated by every part of the airplane, most of the lift on a normal airliner is
generated by the wings which have an airfoil shape, like the object in the diagram.
Introduction
Welcome to the Genesis Image to Text November 8 Customer Feedback course! This course will present
the latest feedback from the customer and test your understanding with six (6) questions at the end of
the updates. Then, the course will continue with a refresher on how to task for the Image to Text
project!
Overall Quality (OQ) Rating No Longer Incorporates the System Instruction (SI) Rating
Prompts in the "Counting" competency should test the model's ability to correctly count the number of
objects depicted in an image.
On the other hand, prompts in the "Scene Understanding" competency should test the model's spatial
reasoning ability, such as identifying where an object in an image is located relative to others.
Please do not submit counting requests in the "Scene Understanding" competency!
Example: "Counting" Requests Incorrectly Submitted in "Scene
Understanding" Competency
Here's an example of a "Counting" prompt that was that is better suited for the
Give the name and count of objects kept on the shelves from left to right provided in
the images. Provide a separate numbered list for the 2 different shelves.
While this prompt contains an element of spatial reasoning ("objects kept on the
shelves from left to right"), the request for a "count of objects" makes this prompt a
This prompt could also be improved by specifically referring to the object(s) that should
be counted. Note that it would still be necessary to refer to the images to answer the
Understanding" competency.
Although we should avoid unrealistically long requests about how the information should be presented,
organized, and formatted, it's important to add some instruction about how the response should be
structured.
Example: How the Response Should be Formatted Still Matters!
Here's an example of a prompt submitted in the "Counting" competency that could be
While prompts should be realistic and to the point, indicating how the response
Please strive to be creative in your prompts by considering various and realistic use-case scenarios.
One use-case scenario that has been particularly overused is requests for recipes and calorie counts 👎.
Consider the following example, which was submitted in the "Reasoning" competency:
Make a recipe using all the ingredients in the images provided. Make a main course and a dessert.
Please avoid these kinds of requests for now. Instead, consider prompts that address different kinds of
topics and use-case scenarios.
4. Over-Describing the Image in Prompts
While prompts should pose a question about the attached image(s), they should avoid
describing the image too specifically, which makes it unnecessary to refer to the
image.
Using the attached image as inspiration, write a short story about fishing adventures.
The short story should use an excited tone and should not be less than 500 words.
In this example, the image is unnecessary because the model can easily create a short
story about fishing, without looking at the image. Instead, prompts should refer to
the content of the image in a more general way, such that it is still necessary to
5. Overall Quality (OQ) Rating No Longer Incorporates the System Instruction (SI) Rating
The Overall Quality Rating is now independent of the SI Rating: there is no need to consider the SI
Rating in OQ!
Prior guidance was to incorporate the SI rating into the OQ rating by rating down the initial OQ
rating if the SI rating was less than "Completely". However, these two ratings are now completely
separate from each other.
In other words, the Overall Quality (OQ) rating only considers the helpfulness and accuracy of the
response, without considering how well it follows System Instructions.
On the other hand, the System Instructions (SI) rating only considers how well the response follows the
System Instructions, or how the information should be conveyed.
How to Rate Responses on Overall Quality (OQ) and System Instructions (SI)
The purpose of this course is to guide you on how to rate the model's responses on
efficient models.
System Instructions
A response that contains any of these (chit-chat, pleasantries, or
of System Instructions.
project to project, or even from task to task! This will impact how you rate
😀 Prompt:
Summarize the information in both images, highlighting differences in the food groups included
and portion sizes recommended. Output your response in paragraph form, format highlighted
🤖 Response:
Both images present nutritional pyramids that offer guidelines for a balanced diet, with similar
categories of food. Both highlight fruits and vegetables as foundational, abundant components
of a healthy eating plan. The first image specifies 3-5 servings of vegetables and 2-4 servings of
fruit, while the second image simply depicts them as the base without specific portion sizes.
However, the grain recommendations differ significantly. The first image suggests a larger
portion of grains, with 6-11 servings, placing them as the base of the pyramid, whereas the
second image displays grains above vegetables but lower than the other categories. The
second image also includes legumes in the vegetable section, while the first image does not
distinguish them from vegetables. Overall, both pyramids offer similar nutritional guidelines
with few key differences in the suggested quantities of food groups that are emphasized.
Need Help?
Determining the System Instructions (SI) Rating
Here is another set of System Instructions, which states that responses should present
information with a conversational tone, a polite, casual, and approachable tone, and
project to project, or even from task to task! This will impact how you rate
😀 Prompt:
Based on the pattern in the image, what color comes after the final color shown at the
Give the answer in a concise 3-sentence paragraph that includes step-by-step reasoning
🤖 Response:
Certainly! Here is the step-by-step reasoning to identify the final color shown in the
The pattern in the image consists of four colors that repeat in the following order: pink,
yellow, purple, and teal. Since the final color shown at the bottom of the image is teal,
the next color in the sequence will be pink. Therefore, the color that comes after the
Hope this helps! Let me know if you have any other questions.
Instruction rating!
- For example, a response can be "Very Bad" in Overall Quality, but still
All that matters in considering the SI rating is how well the response follows the
System In
Overall Quality Rubric
reasoning process. This guide focuses on writing effective justifications for your
ratings.
the specific aspects of the response that led you to your decision. Here are some
Example 1:
Good: "Factually correct and followed SI, but did not address the prompt's
Why it's good: This justification identifies both positive aspects (accuracy,
Example 2:
Good: "Mostly accurate information and follows my user requests pretty well. A
few mistakes here and there but overall, it is on point. There is a system
of the user prompt. The layout and format are pleasing. I would be satisfied with
Example 3:
identifies the host of the show and his political views, and lists the two
memberships mentioned in the audio clip. However, the costs are incorrect and
the membership benefits for each tier are not completely accurate. Due to these
two errors, it makes the response relevant, but largely unhelpful. Based on the
Why it's good: This justification provides a thorough analysis of the response,
mentioning specific details from both the instructions and the response content.
The reasoning for the "Bad" rating is clearly linked to the identified inaccuracies.
Decision-Making: Explain how you chose between the two LLM responses. Did
one response directly answer the question better, or was it more generally useful
Correcting Errors: If an answer is incorrect and you know the right one, share it