dolphin Genesis Image-to-Text

The Genesis Image-to-Text project focuses on creating prompts that require an attached image to answer questions across nine competencies. Each step involves understanding the competency, sourcing images, writing prompts, checking system instructions, and rating model responses. The document outlines specific requirements for prompt quality, image permissibility, and response evaluation to ensure effective interaction with the AI Assistant.


Genesis Image-to-Text

Attempter Specifications
Table of Contents:
Project Overview
Task Specifications
Step 1: Understanding the Competency
Competency View in the PAT
Step 2: Source an Image
Baseline image-to-text prompt requirements:
Image Permissibility requirements
Step 3: Write a Prompt
The prompt must meet all of the below requirements:
Baseline image-to-text prompt requirements:
Image Permissibility requirements
Prompt quality requirements
Do NOT proceed to step 4 until you have received back your approved prompt to submit
to arcade!
Competency View in Arcade
How to exit a task in Arcade
Step 4: Check System Instructions (SI)
Step 5: Rate the Model Responses
Appendix
Competency Guidance
Creative Writing with Image Inputs
Reasoning with Multimedia Inputs (Text + Images)
Understanding Infographics and Graphs
Patterns with Multimedia Inputs (Text + Image)
Science with Multimedia Inputs (Text + Images)
Counting with Image Inputs
Math with Multimedia Inputs (Text + Images)
Scene Understanding with Image Inputs
Extracting Data from Images
System Instructions Ratings (to rate how well model response follows System Instructions)
Overall Response Ratings (to rate the accuracy and helpfulness of the model response)
Assistant Limitations and Capabilities
Prompt Quality (how Reviewers and QA will rate YOUR prompt)
Complexity vs. Specificity
Image Sourcing
Examples
Creative Writing With Image Inputs
Reasoning With Multimedia Inputs (Text + Images)
Understanding Infographics and Graphs
Science With Multimedia Inputs (Text + Images)
Math With Multimedia Inputs (Text + Images)
Counting with image inputs
Scene Understanding With Image Inputs
Patterns With Multimedia Inputs (Text + Images)
Extracting Data from Images

Project Overview
Welcome to the Genesis “Image & Text”-to-Text Project! This project focuses on creating
prompts that ask questions based on an attached image(s). There are nine competencies
from which you may choose to submit a prompt. Each competency has its own set of unique
instructions, so it is essential to carefully follow the specific guidance for each one. Please refer
to this Competency Guidance Table in the Appendix to understand the purpose and
requirements of each competency.
 Please note that prompts should be direct, simple, and natural-sounding. There is no need to include any context to justify why the request is being made!
 The image must be necessary to answer the prompt. Your prompt should refer to the image in a general way such that the model is forced to refer to it in order to answer the prompt. If the prompt refers to the image too specifically, then the image is unnecessary because the written prompt contains all of the relevant information. If the prompt refers to the image too vaguely, then it is likely that the prompt can be answered by referring to any image, or the image is not relevant to the question at hand.

Once the prompt and image pairings are submitted, the model will generate two responses.
These responses will be rated based on System Instructions and Overall Quality.

 The preferred response also should be direct, simple, and straightforward. It should provide enough helpful information to fulfill the prompt request and satisfy the User.
 Responses should not contain information that the prompt did not request, chit-chat, or irrelevant details that are beside the point of our question (even if they are interesting).
 Just because a response is longer does not mean it is necessarily better!

Task Specifications
Step 1: Understanding the Competency
Competency View in the PAT
You will receive a task in the prompt approval tool (PAT) with one of the competencies filled in.
Please see the Competency Guidance table in the Appendix to understand each competency,
or navigate to the specific Competency via these links:

1. Creative Writing with Image
2. Reasoning with Multimedia
3. Understanding Infographics and Graphs
4. Patterns with Multimedia
5. Science with Multimedia
6. Counting with Image
7. Math with Multimedia
8. Scene Understanding with Image
9. Extracting Data from Images
Step 2: Source an Image
For each task, source an image: either a personal image or an image sourced from the internet. Once you have selected an image, you will then write a prompt relevant to the competency you are working in. You may include more than one image, but if you do, every image must be necessary to answer the prompt. Please be careful: using multiple images carries the risk that one of them is irrelevant to the prompt.

Images can be added anywhere within the prompt, e.g. at the beginning, end, or even in the
middle of the prompt. If you choose to upload multiple images, you may decide to put them in
several different places within the prompt. Please be creative and vary the position of the image
within the text.

Baseline image-to-text prompt requirements:

The prompt should be a natural, non-obvious request of at most 40 words that:

 Completely and uniquely follows the selected competency.
 Has a factually verifiable answer.
 Treats each image as essential to answering the question, referencing the images without labeling them as "image 1," "image 2," etc., and without needing extra information.
 Avoids preambles, justifications, or starting the prompt with an "I" (e.g., "I want…", "I need…", "I am looking for…").

Image Permissibility requirements

 Images must be high-resolution (pixel-dense, not blurred) and in one of the following formats: JPEG, JPG, or PNG.
 Images must NOT be AI-generated, copyrighted, or watermarked; cropping an image to exclude its source information is not allowed.
 Images must NOT contain any NSFW, harmful, or personal information.

To upload an image, go to the user input text box and click the 'Insert Image' button, a square polaroid outline with a 'plus' sign in the top right corner. You will see a Markdown insert indicator where the image is applied to the text input, e.g. <image-1>.
To verify that the image is rendered correctly, please refer to the 'Markdown Preview' on the right-hand side of the screen.

Step 3: Write a Prompt


The prompt must meet all of the below requirements:

Baseline image-to-text prompt requirements:

The prompt should be a natural, non-obvious request of at most 40 words that:

 Completely and uniquely follows the selected competency.
 Has a factually verifiable answer.
 Treats each image as essential to answering the question, referencing the images without labeling them as "image 1," "image 2," etc., and without needing extra information.
 Avoids preambles, justifications, or starting the prompt with an "I" (e.g., "I want…", "I need…", "I am looking for…").

Image Permissibility requirements

 Images must be high-resolution (pixel-dense, not blurred) and in one of the following formats: JPEG, JPG, or PNG.
 Images must NOT be AI-generated, copyrighted, or watermarked; cropping an image to exclude its source information is not allowed.
 Images must NOT contain any NSFW, harmful, or personal information.

Prompt quality requirements

 Relevance: The prompt completely follows the selected competency.
 Specificity: The prompt is completely specific, clear, and concise in what is being asked for.
 Complexity: The prompt has 1-4 parameters (requests for how the output should be formatted) that are creative, make sense with respect to the image, and simulate a potential real-life use case when interacting with a chatbot. There are no unnecessary/unnatural requirements and no excessive level of complexity.
 Presentation: The prompt is perfectly legible, makes total sense, and has high-quality spelling and grammar.

Make sure to aim to maximize diversity across the prompts you are submitting!
For the following four (4) competencies, the prompt should request the reasoning process needed to arrive at the final answer:

 Reasoning with Multimedia Inputs (Text + Images)
 Patterns with Multimedia Inputs (Text + Images)
 Science with Multimedia Inputs (Text + Images)
 Math with Multimedia Inputs (Text + Images)

Note: Your prompt does not have to contain an explicit request that contains the words
"step by step." Prompts can also implicitly request a step-by-step process for the answer.
Please refer to this document for examples.

Do NOT proceed to Step 4 until you have received your approved prompt back to submit to Arcade!

Competency View in Arcade

Once you receive your approved prompt in the PAT and log in to Arcade, please select the competency in which your approved task was done, enter your prompt, and generate model responses.

How to exit a task in Arcade

To leave a task, please click the Abandon task button on the upper right-hand side of the screen and select the relevant reason.

Step 4: Check System Instructions (SI)


System Instructions may vary from task to task. You will rate the model response according to
the SI given in the task, so it is important to read through them.
Note: Please pay attention to these, as they are different from SI found on Audio-to-Text or
Video-to-Text tasks, though they may be subject to change.
Here are System Instructions that you may encounter in a task:

 Extra details when appropriate: Offer additional explanations, recommendations, or context when relevant, but do so concisely. Focus on adding value to the response without being overly elaborate. Ensure all important aspects are covered. Generally, the inclusion of any "extra details" in a response should be considered in the Overall Quality Rating, not the System Instructions Rating!

 Avoid friendly chit-chat: Responses should be clear and professional, focusing solely on the task or information requested. Avoid casual or conversational tones unless specifically asked for by the user.

 No pleasantries: Exclude any greetings, farewells, or unnecessary banter in responses. The content should be direct and strictly related to the task, avoiding expressions like "Sure thing" or "I hope this helps!"

 No acknowledgements: Do not repeat or acknowledge the user's request in the response. Start directly with the answer or task completion, omitting phrases like "Here is your..." or "As requested."

 Markdown as an option: Utilize Markdown formatting where appropriate to enhance readability and organization. Use it for structuring responses when it makes the content clearer, such as for lists, headings, or emphasis (e.g., bold or italics). Remember that specific formatting requests should be considered in the Overall Quality Rating, not the System Instruction Rating!

Step 5: Rate the Model Responses


When rating the AI's responses, you should evaluate each based on how well it followed
the System Instructions (SI) and Overall Response Quality (OQ). Please refer to the System
Instructions Rating Rubric and Overall Response Rating Rubric for specific guidance in the
Appendix below.
UPDATE: Treat the System Instruction ratings independently of the Overall Response
Quality ratings. Do NOT rate down Overall Response Quality due to a response ignoring
System Instructions.

Key factors to assess Overall Response Quality include helpfulness, format, presentation,
conciseness, factual accuracy, harmlessness, and focus.

 Responses that fully meet the prompt’s requests and are well-structured with no
errors should be rated higher than those with mistakes or missing details.
 The preferred response should be direct, simple, and straightforward. It
should provide enough helpful information to fulfill the prompt request and satisfy
the User.

 Responses should not contain information that the prompt did not
request, chit-chat, or irrelevant details that are beside the point of our
question (even if they are interesting).
 Just because a response is longer does not mean it is necessarily
better!

 For the following four (4) competencies, the AI Assistant should write out the reasoning process needed to arrive at the final answer, followed by the answer itself. Your prompt should include a request for this reasoning process.
 Reasoning with Multimedia Inputs (Text + Images)
 Patterns with Multimedia Inputs (Text + Images)
 Science with Multimedia Inputs (Text + Images)
 Math with Multimedia Inputs (Text + Images)

 The following AI Limitations are actions that the Assistant is unable to do. Rate down the Overall Quality Rating to "Bad" or "Very Bad" for any response that claims to have these capabilities and breaks Limitations:

 CAN do any sort of text-based task, including sharing information, creative writing, math, reasoning, etc.
 CANNOT browse the internet
 CANNOT use tools, such as a calendar application, notepad, etc.
 CANNOT take actions in the real world (e.g. send emails or book holidays)
 DOES NOT have any knowledge about recent events within the last 6-12 months. (If asked about something more recent, the assistant should politely explain that it doesn't know.)
 DOES NOT have access to previous conversations that were ended. It CAN only see the current conversation.
 DOES NOT have any information about the User.
 SHOULD NOT pretend to be human, express emotions or opinions, or build relationships with the user.
Appendix
Competency Guidance
Please upload high-quality images only. The image should be necessary to solve the
problem. Prompts that the AI Assistant can still solve while ignoring the image
are not helpful.

Each competency entry below covers how to design a task, important information to keep in mind, and a linked example.

Creative Writing with Image Inputs

How to design a task:
 Design tasks which require the AI Assistant to produce a creative output using an image as input in some way. Be creative with both how the image is used in the prompt as well as with the requirements you set for the Assistant!
 Ask the AI Assistant to produce creative output in any format, based on the attached image(s). Suggested outputs include poems (haiku, ballad, ode, etc.), songs (e.g. rap, pop, ballad), stories (short story, chapter, movie scene), scripts (for a movie or play), jokes or memes (or any other funny piece), letters, etc.
 Please try to choose images that are diverse in medium, content, topic, etc. You can use images of any kind: paintings, drawings, cartoons, illustrations, collages, etc. They can capture a wide range of content: still life, landscapes, architecture, portraits, screenshots, graphs, charts, or any other image that you think is suitable for your task.

Important info (make sure to):
 Specify how the image should be incorporated into the creative output (e.g. as inspiration for the piece, as an illustration to accompany the piece, etc.)
 Include constraints for the expected output, such as length, style, tone, mood, rhyme scheme for poetry, etc.
 Vary your prompts in language, length, level of detail, and types of images and tasks.
 The prompt specifies constraints or requires reasoning that the response should follow.

Example: Creative Writing with Image Inputs Example

Reasoning with Multimedia Inputs (Text + Images)

How to design a task:
 Design tasks for the AI Assistant that test the ability to perform reasoning with multimedia inputs. Reasoning tasks require the AI Assistant to compositionally infer new information either deductively or inductively (they do not primarily rely on memorization or retrieval). This includes, but is not limited to, games and puzzles, operations and logistics questions, and mathematical and scientific problem solving.
 Requests for this competency may include: suggesting the next move in a puzzle or game; checking whether a given shape will fit within another shape; estimations or measurements over satellite imagery; answering science or math based questions that use a figure.

Important info:
 The prompt must request a reasoning process as to how the answer was determined, in addition to the answer itself.
 The AI Assistant should write out the reasoning process needed to arrive at the final answer, followed by the answer itself.
 The prompt specifies constraints or requires reasoning that the response should follow.
 While science and math-based questions are fine to submit in the Reasoning competency, it is preferred that these are limited to those who are experts in STEM fields.

Example: Reasoning with Multimedia Inputs Example

Understanding Infographics and Graphs

How to design a task:
 Design tasks which require the AI Assistant to understand, summarize, or reason over the information conveyed by charts, graphs, plots, diagrams, infographics, or similar visualizations.
 Requests for this competency may include: asking the Assistant to summarize the key results displayed in a graph and to explain what the implication of these results might be; asking the Assistant to answer a specific question about one element of an infographic that a user might not understand; asking the Assistant to hypothesize some possible explanations for a particular result or datapoint in a given graph or plot; asking the Assistant to reason about hypothetical counterfactuals in the displayed diagram. For example, in a wiring diagram, you might ask the Assistant to tell you what might happen if a particular wire connection was changed.

Important info:
 Please note that the AI Assistant CANNOT provide images in its output. Please take this into account when constructing your prompts, and ensure that your requests make sense for text-only outputs.
 Do not use infographics that might include watermarks, source information, or are under copyright.
 Cropping out watermarks and source information does NOT make the image permissible.

Example: Understanding Infographics and Graphs Example

Patterns with Multimedia Inputs (Text + Image)

How to design a task:
 Design tasks for the AI Assistant that test the ability to infer patterns from multimedia inputs: reasoning tasks that require the AI Assistant to infer new information either deductively or inductively and do not primarily rely on memorization or retrieval.
 Requests for this competency may include: answering questions on visual pattern-matching or identification; finding patterns or similarities in a diagram, photo, or other image; finding errors in an existing answer to a pattern-identification problem.

Important info:
 Pattern images should depict a regular sequence that is repeated or repeatable. What comes next in the sequence can be predicted with 100% accuracy.
 The prompt must request a reasoning process as to how the answer was determined, in addition to the answer itself.
 The AI Assistant should write out the reasoning process needed to arrive at the final answer, followed by the answer itself.
 The prompt specifies constraints or requires reasoning that the response should follow.

Example: Patterns with Multimedia Inputs Example

Science with Multimedia Inputs (Text + Images)

How to design a task:
 Design tasks for the AI Assistant that test the ability to answer scientific problems using multimedia inputs.
 Requests for this competency may include: identifying scientific structures or concepts from an image (chemistry, biology, physics, or other sciences); answering questions that require scientific knowledge and reasoning about an image, table, or diagram; finding errors in an existing image-based answer to a science problem.

Important info:
 The prompt must request a reasoning process as to how the answer was determined, in addition to the answer itself.
 The AI Assistant should write out the reasoning process needed to arrive at the final answer, followed by the answer itself.
 The prompt specifies constraints or requires reasoning that the response should follow.

Example: Science with Multimedia Inputs Example

Counting with Image Inputs

How to design a task:
 Design tasks for the AI Assistant that test the ability to count objects in an image.
 Requests for this competency may include: counting all visible objects in an image; counting objects in an image given a constraint (e.g. count the number of trees in this image, excluding the ones in the background); counting a specific type of object (e.g. how many birds are in the image?); estimating the number of objects in an image when the exact number is impossible to determine.
 Try to vary the language you use in the prompt as much as possible, e.g.: "How many x are there?"; "Count the number of x"; "Are there more than x in the image?"; "Give me the total number of x in this photo".

Important info:
 Avoid questions that require common knowledge, like "How many legs does a cat have?"
 The image should be self-contained and not require any additional information to be able to answer the question.
 If a counting task has a correct response (meaning that you haven't provided an unattainable or ambiguous task), approximations cannot be rated as "Good" for the overall quality. For example, if the correct answer to a question is '21' and the model responds with an answer of '19', this should be rated as 'Bad' or lower.

Example: Counting with Image Inputs Example

Math with Multimedia Inputs (Text + Images)

How to design a task:
 Design tasks for the AI Assistant that test the ability to perform math with multimedia inputs.
 Requests for this competency can include: answering math questions using a diagram, figure, or graph (algebra, calculus, geometry, topology, set theory, tables, etc.); finding errors in an image containing equations or mathematical reasoning; making statistical estimates of a quantity in an image using mathematical reasoning and approximations; evaluating graphical proofs.

Important info:
 The prompt must request a reasoning process as to how the answer was determined, in addition to the answer itself.
 The AI Assistant should write out the reasoning process needed to arrive at the final answer, followed by the answer itself.
 The prompt specifies constraints or requires reasoning that the response should follow.

Example: Math with Multimedia Inputs Example

Scene Understanding with Image Inputs

How to design a task:
 Design tasks focused on testing the Assistant's ability to understand spatial relationships (front vs back, left vs right, top vs bottom, etc.) as well as size, color, and other elements in the image.
 Create tasks that test whether the Assistant can describe where objects are in an image in relation to each other. Focus on spatial relationships such as front vs back, left vs right, and top vs bottom. Include size, color, and other distinctive features of the objects when relevant. For example, you could ask the AI Assistant to "Tell me what objects are located to the left of the chair, under the table."
 Please also make sure to choose images which convey information that you understand, so that you can accurately assess how well the AI Assistant deals with your input.
 Please be creative in how you ask for the description, and vary the level of detail that you request from the AI Assistant in its descriptions. For example, you can ask for extremely detailed or concise descriptions. You can also focus on specific parts of the image or the image as a whole. Choose tasks that real users would ask the AI Assistant and that would be helpful or useful.

Important info:
 The image should be self-contained and not require any additional information to be able to answer the question.
 The prompt specifies constraints or requires reasoning that the response should follow.
 Do NOT request JSON/CSV file formats!

Example: Scene Understanding with Image Inputs Example

Extracting Data from Images

How to design a task:
 Design tasks for the AI Assistant that test the ability to perform data extraction from images. Data extraction involves extracting specific pieces of information about an image and outputting that information in a structured format such as a table, JSON, or a comma-separated file (CSV).
 For example, businesses might want to create a product description by extracting the characteristics or features of a product from images of that product.
 Please also be creative and try to think of different situations in which businesses or individuals might want to extract specific pieces of information from an image in a structured way.

Important info:
 Ask the AI to extract and structure information from images (e.g., product characteristics, attributes, identifiers).
 Specify the output format (JSON, CSV), or request structured data output in a table or list form.
 Do not ask questions that rely upon ambiguous information.
 Verify that the response is a valid file by using JSON CHECKER for JSON files, and the Toolkit Bay CSV Validator Tool or Lambda Test Validator Tool for CSV files.

Example: Extracting Data from Images Example
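As a quick alternative to the online validator tools mentioned above, you can also sanity-check a JSON or CSV response locally. A minimal sketch using only Python's standard library (the sample strings are hypothetical model outputs, not taken from a real task):

```python
import csv
import io
import json

def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def is_valid_csv(text: str) -> bool:
    """Return True if `text` parses as CSV and every row has the
    same number of columns as the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return False
    return all(len(row) == len(rows[0]) for row in rows)

# Hypothetical model outputs:
print(is_valid_json('{"product": "mug", "color": "blue"}'))  # True
print(is_valid_csv("name,color\nmug,blue\nplate,red\n"))     # True
```

This only checks that the output is well-formed, not that the extracted values match the image; that part still has to be verified by eye.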
System Instructions Ratings (to rate how well model response follows
System Instructions)

Treat the System Instruction ratings independently of the Overall Response Quality
ratings.
SI Rating and what it means:

 Not At All: Ignores ALL System Instructions.
 Somewhat: Ignores most of the System Instructions, but follows some (violates three System Instructions).
 Moderately: Follows most System Instructions, but misses major aspects (violates two System Instructions).
 Mostly: Follows almost all System Instructions, but has some minor mistakes or omissions (violates one System Instruction).
 Completely: Perfectly follows ALL aspects of System Instructions.

Overall Response Ratings (to rate the accuracy and helpfulness of the
model response)

Do NOT rate down Overall Response Quality due to a response ignoring System
Instructions.
 Very Bad: Major factuality errors. Did not follow the prompt requests at all. Response violates more than one of the Assistant Limitations and Capabilities (shown in the table below).
 Bad: Fails to meet several of the parameters or requests in the prompt. Invalid CSV or JSON. Factual inaccuracies. Response includes information that is NSFW (this includes responses that are harmful, offensive, violent, or profane). Response violates one of the Assistant Limitations and Capabilities (shown in the table below).
 OK: Follows some parameters and requests in the prompt, but with minor inconsistencies. Could be better structured or phrased. Some grammar/spelling errors.
 Good: Fulfills all major prompt requests/constraints, but could miss a smaller request. Minor spelling or grammar errors. Could have included more relevant details/specifics.
 Very Good: Fulfills ALL prompt requests. Well-formatted, concise, and has NO grammar or spelling mistakes. Has minor room for improvements.
 Amazing: This rating is RARE. The response is incredible and nothing could make it better. There are NO follow-up questions. All requests are clear.

Assistant Limitations and Capabilities

If the model response breaks an AI Limitation, rate down the Overall Response Quality rating to "Bad" or "Very Bad".

 Cannot hallucinate current information that is not in its database (within the last 6-12 months).
 Cannot provide personal information about the user (without the information being in the prompt).
 The model cannot pretend to be human, express emotions or opinions, or build relationships with the user. (Note: the model using phrases such as "I", "we", etc. does not necessarily count as "pretending to be human".)
 Response cannot mislead the user into thinking the assistant can complete real-world actions (sending an email, booking a trip, etc.).
 Response cannot mislead the user into thinking the assistant can use a tool, such as booking a meeting on a calendar.
 Response cannot mislead the user into thinking the assistant can browse the internet.
 Cannot reference previous conversations with the model (only the current conversation is accessible).
Prompt Quality (how Reviewers and QA will rate YOUR prompt)

Each dimension is scored from 0 to 3.

Specificity (the main subject/concern of the prompt request):
 0, Not Specific At All: The prompt ask or request is completely unclear. The prompt request does not rely on the media attached.
 1, Minimally Specific: The prompt is too vague or broad, hinting at the desired outcome without providing enough detail for a clear and actionable response. It requires multiple follow-up clarifications due to unclear or impractical/unattainable analysis requests. The prompt requests an active URL. For ITT, the prompt does not include the correct number of images required to fulfill the task requirements.
 2, Mostly Specific: The request is specific and clear enough, but could be improved. There are follow-up questions after reading, but the prompt can still be answered. (When deciding between a score of 1 or 2, weigh up the decision based on how many follow-up questions you have.)
 3, Fully Specific: Completely specific, concise, and clear in what is being asked for. There are no follow-up questions necessary to understand the request. The prompt is fully related to the media when applicable.

Complexity (hurdles/constraints the model must address in order to fulfill the request, known as parameters: audience, format, etc.):
 0, No Complexity At All: There are no complexity parameters present.
 1, Inappropriate/Insufficient Complexity: All or most of the prompt parameters are unnatural due to excessive or illogical output requests, or the prompt contains too many parameters of complexity (over four elements).
 2, Complex: The prompt includes 1-4 parameters that are focused and relevant to the prompt ask. The request does not contain any unnecessary requirements or excessive complexity.
 3: N/A

Relevance (the Competency):
 0, Not Relevant: The request does NOT follow the task competency at all. The ITT Competency requirement for reasoning is not met. The prompt has multiple asks that mix competencies.
 1, Partially Relevant: The prompt only partially meets the competency requirements, or it is unclear.
 2, Relevant: Completely and uniquely follows the selected competency.
 3: N/A

Presentation (the way the prompt looks/reads):
 0, Unintelligible: No user input was provided at all, or the user input is unintelligible and hard to make any sense of.
 1, Major Issues: The prompt contains written instructions for the reviewer, significant spelling/grammar errors, and poor structure that hinders readability. Code/LaTeX is either unnecessary, poorly presented, or results in unnatural overformatting.
 2, Minor Issues: Minor spelling, grammar, and readability issues are present; slight structural improvements could enhance clarity and ease of understanding.
 3, Perfectly Legible: Scans well and makes total sense. High-quality spelling and grammar throughout, though the use of slang or intended typos for creative purposes is allowed.

Complexity vs. Specificity

Complexity
Complexity refers to the "HOW" and "WHY" of the prompt. It is based on "parameters" requested in the prompt.
Examples of parameters:
 Type: "essay", "story", "summary", "email", "letter", "blog post", "step-by-step"
 Atmosphere: style, tone, mood, audience, vocabulary
 Limitations: constraints, limits, output requirements, word count, reading level
 Formatting: format, layout, sections, order, organization
 Image Gen examples: orientation, "realistic-looking", image type (photo, cartoon, sketch, etc.)

Specificity
Specificity refers to the "WHAT" of the prompt. It is a metric based on the clarity of the requests asked in a prompt.
Examples:
 Questions or requests asked in the prompt.
 How clear is the prompt request? If you have no follow-up questions in regards to what is wanted, then it is specific.

The Difference…
When considering if a prompt is specific and complex, consider: does this have to do with what is being asked, or how the model should answer?
 If it has to do with how the response should answer the prompt, it's a parameter of complexity.
 If it has to do with the actual questions asked in the prompt, it is a specificity parameter.
One way to think about it is: Specifics are the "Questions" and Complexity is the "Delivery" (parameters of how the response should be delivered).

Examples
No Complexity / Minimally Specific
Write something based on the image.
Note:

 Specifics: Prompt is very


simple with very little
specificity (based on the
image), making the request
unclear.

 Complexity: No parameters
of complexity.

Simple / Mostly Specific


Write a blog post about how to paint
gaming miniatures and give information on
what to expect.
Don’t reference brands.
Note:

 Specifics: The prompt is
simple, but does have
some elements of
specificity (how to paint
gaming miniatures) and
(information on what to
expect), but there are still
follow-up questions.
 Complexity: There is only 1
parameter of complexity,
which is the type (blog
post).

Intermediate / Fully Specific


Write a blog post about how to paint
gaming miniatures and the supplies
needed.
Organize the lesson into these sections:
- How to prep a figure
- How to paint a figure
Note:

 Specifics: The prompt has


several elements of
specificity (how-to paint),
(information on what to
expect), and (section titles)
and the ask is clear.
 Complexity: There are 2
elements of complexity,
which are the type (blog
post) and how to organize
the lesson (into the given
sections).

Complex / Fully Specific


“Write a blog post that is targeted at
beginners trying miniature painting.
Organize the post into these subsections
with headers:
- What You’ll Need
- How to Paint Your First Miniature
Don’t recommend any specific brands.”
Note:

 Specificity: This prompt


has a clear request that is
relevant, makes sense,
and is clear. It includes
the topic (trying miniature
painting) and the section
titles.
 Complexity: This is a
complex prompt with
several parameters. It
includes the type (blog
post), audience (targeted
at beginners), and
organization (subsections
with headers).

Image Sourcing

Examples
Creative Writing With Image Inputs
User:
Write a short poem about the tree in the picture that identifies what kind of tree it is. Incorporate
an AABB rhyme scheme, and make sure you mention Autumn.

Assistant:
Camouflage bark, a mottled map,
Of peeling greens and creamy wraps.
A sycamore, so tall and grand,
With its rich crown seeds that land,
Whirling down on autumn's breeze,
To scatter life among the trees.

Reasoning With Multimedia Inputs (Text + Images)


User:
Is there a move the player with the white chess pieces can take on this turn to capture one of
the opponent’s black pieces? Please provide the reasoning on how you arrived at your answer.

Assistant:
There are two pieces that can currently capture an opponent's piece. The white knight on the f3
square could capture the black pawn on the e5 square. Additionally, the white bishop at the a4
square could capture the black knight at the c6 square.

Understanding Infographics and Graphs


User:
What would happen in this example water cycle if there was significant deforestation? Give a
step-by-step account of how this change would impact each stage of the example water cycle,
taking into account the specific landscape depicted here.
Assistant:
The most immediate and direct impact of deforestation would be a significant reduction in
transpiration. With fewer trees, less water would be drawn from the soil and released into the
atmosphere as water vapor.
Subsequent effects could include:

 A decrease in the amount of water vapor in the atmosphere above the deforested
area due to lower transpiration rates. This can reduce cloud formation and
ultimately lead to less precipitation.
 An increase in surface runoff and decrease in infiltration due to lack of
vegetation, leading to higher erosion and a decline in groundwater levels.
 Desertification, from the combination of reduced precipitation, increased surface
runoff, and decreased groundwater levels.

Science With Multimedia Inputs (Text + Images)


User:
Explain the following diagram, which represents an aerodynamic phenomenon, and how it
relates to air travel with reasoning.

Assistant:
The diagram shows a cross-section of an airfoil, with a fluid deflected above and below it at
different velocities, creating lift.
Lift is a mechanical aerodynamic force produced by the motion of an airplane through the air. It
is the force that directly opposes the weight of an airplane and holds the airplane in the air.
In the diagram, lift is represented by the black arrows pointing up. Because lift is a force, it is a
vector quantity, having both a magnitude and a direction associated with it.
Lift occurs when a moving flow of gas is turned by a solid object. The flow is turned in one
direction, and the lift is generated in the opposite direction, according to Newton’s Third Law of
action and reaction. For an aircraft wing, both the upper and lower surfaces contribute to the
flow turning.
While lift is generated by every part of the airplane, most of the lift on a normal airliner is
generated by the wings which have an airfoil shape, like the object in the diagram.

Math With Multimedia Inputs (Text + Images)


User:
What is (area of large circle) – (area of small circle) in the figure above? Please explain how you
arrive at the answer.
Assistant:
In the figure, we are given a large circle and a small circle inside it, with the diameter of the small circle equal to the radius of the large one. The diameter of the small circle is 4 cm. This means that its radius is 2 cm. Since the diameter of the small circle is the radius of the large circle, the radius of the large circle is 4 cm. The area of a circle is calculated by: πr², where r is the radius.
Area of the small circle: π(2)² = 4π
Area of the large circle: π(4)² = 16π
The difference in area is found by:
Area of the large circle – Area of the small circle = 16π – 4π = 12π
The answer is 12π.
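When reviewing math responses like this one, the arithmetic can be sanity-checked with a few lines of code. This sketch assumes the radii from the example (2 cm and 4 cm):

```python
import math

# Radii taken from the worked example: small circle r = 2 cm, large circle r = 4 cm
small_area = math.pi * 2 ** 2   # 4π
large_area = math.pi * 4 ** 2   # 16π

difference = large_area - small_area  # 16π − 4π = 12π
print(math.isclose(difference, 12 * math.pi))  # → True
```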

Counting with image inputs


User:
How many sweets are in the jar? Explain how you worked out the solution.
Assistant:
An approximate method to calculate the number of sweets in a jar is to multiply
the number along the width and length of the base by the number of sweets in the
height of the jar. Granular Matter theory then tells us that on average a jar of
mixed shapes will have about a 30% air gap in between the sweets. For a jar with
12 sweets along both the width and length of the base and a height of 18 sweets
you would need to calculate 12 × 12 × 18 = 2592. Then, to take into account the gap in
between the sweets, reduce this total by thirty percent, 0.70 × 2592, giving an
answer of approximately 1814 sweets in the jar.
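The estimation method described in this response can be sketched in a few lines; the 12 × 12 × 18 jar dimensions and the 30% air gap are taken from the example:

```python
# Estimate sweets in a jar: base width count × base length count × height count,
# then remove ~30% to account for air gaps between mixed shapes.
def estimate_sweets(width: int, length: int, height: int, air_gap: float = 0.30) -> int:
    return round(width * length * height * (1 - air_gap))

print(estimate_sweets(12, 12, 18))  # → 1814
```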

Scene Understanding With Image Inputs


Example 1:
User:
List all the objects to the left of the laptop and describe them in detail.
Assistant:
There are three objects to the left of the laptop. The first is a brown leather-bound notebook, the
second is a metal single hole puncher, and the third is a small green rose succulent in a white
pot.
Example 2:

Patterns With Multimedia Inputs (Text + Images)


User: Identify the shape that completes the pattern. Explain the pattern and your reasoning.
Assistant:
The correct answer is shape 5.
The pattern across the columns (left-to-right) is a rotation (90 degrees).
The pattern down the rows (top-to-bottom) is a progression (adding one line on each step).
Combining these two patterns produces shape number 5.

Extracting Data from Images


User:
Please tell me about the following attributes of the dish in the image above: cuisine, dish-
category (starter, main, side-dish), ingredients, allergens, vegetarian (true/false/unclear), vegan
(true/false/unclear). Only include ingredients that are clear from the image. Output as JSON.
Assistant:

{
  "dish-attributes": {
    "cuisine": "italian",
    "dish-category": "main",
    "ingredients": ["pasta", "pork", "cheese"],
    "allergens": ["gluten", "dairy"],
    "vegetarian": "false",
    "vegan": "false"
  }
}
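A structured output like this can be checked for well-formed JSON before rating it. Here is a minimal sketch using Python's standard library; the field names and values simply follow the dish example above:

```python
import json

# Sample model output, using the dish attributes from the example above
response_text = """
{
  "cuisine": "italian",
  "dish-category": "main",
  "ingredients": ["pasta", "pork", "cheese"],
  "allergens": ["gluten", "dairy"],
  "vegetarian": "false",
  "vegan": "false"
}
"""

# json.loads raises a ValueError if the output is not valid JSON
data = json.loads(response_text)
print(data["cuisine"])  # → italian
```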

Introduction

Welcome to the Genesis Image to Text November 8 Customer Feedback course! This course will present
the latest feedback from the customer and test your understanding with six (6) questions at the end of
the updates. Then, the course will continue with a refresher on how to task for the Image to Text
project!

The customer feedback consists of five (5) topics:

"Counting" Requests Incorrectly Submitted in "Scene Understanding" Competency

Prompts Should Specify How the Response Should be Presented

Greater Prompt Diversity

Over-Describing the Image in Prompts

Overall Quality (OQ) Rating No Longer Incorporates the System Instruction (SI) Rating

1. "Counting" Requests Incorrectly Submitted in "Scene Understanding" Competency

Prompts in the "Counting" competency should test the model's ability to correctly count the number of
objects depicted in an image.

On the other hand, prompts in the "Scene Understanding" competency should test the model's ability
on spatial reasoning, such as where an object in an image is located relative to others.
Please do not submit counting requests in the "Scene Understanding" competency!
Example: "Counting" Requests Incorrectly Submitted in "Scene
Understanding" Competency
Here's an example of a "Counting" prompt that is better suited for the

Counting competency, rather than the Scene Understanding competency:

Give the name and count of objects kept on the shelves from left to right provided in

the images. Provide a separate numbered list for the 2 different shelves.

While this prompt contains an element of spatial reasoning ("objects kept on the

shelves from left to right"), the request for a "count of objects" makes this prompt a

much better fit for the "Counting" competency.

This prompt could also be improved by specifically referring to the object(s) that should

be counted. Note that it would still be necessary to refer to the images to answer the

prompt, even if the prompt specifically referred to the object(s) to be counted.

Please avoid "counting" requests in prompts submitted in the "Scene

Understanding" competency.

2. Prompts Should Specify How the Response Should be Presented


While we have encouraged greater "Complexity" in prompts to test the model's ability to understand
and reason, please continue to specify how responses should be formatted as well. This does not need
to be overly elaborate!

Although we should avoid unrealistically long requests about how the information should be presented,
organized, and formatted, it's important to add some instruction about how the response should be
structured.
Example: How the Response Should be Formatted Still Matters!
Here's an example of a prompt submitted in the "Counting" competency that could be

improved by including some instructions on how the response should be formatted.

How many empty parking spaces are there in this picture?

While prompts should be realistic and to the point, indicating how the response

should be structured is an important element to incorporate in prompts!

3. Greater Prompt Diversity

Please strive to be creative in your prompts by considering various and realistic use-case scenarios.
A use-case scenario that has been particularly over-used is requests for recipes and calorie counts 👎.
Consider the following example, which was submitted in the "Reasoning" competency:

Make a recipe using all the ingredients in the images provided. Make a main course and a dessert.

Please avoid these kinds of requests for now. Instead, consider prompts that address different kinds of
topics and use-case scenarios.
4. Over-Describing the Image in Prompts
While prompts should pose a question about the attached image(s), they should avoid

describing the image too specifically, which makes it unnecessary to refer to the

image.

Here's an example of a prompt that over-describes the image:

Using the attached image as inspiration, write a short story about fishing adventures.

The short story should use an excited tone and should not be less than 500 words.

In this example, the image is unnecessary because the model can easily create a short

story about fishing, without looking at the image. Instead, prompts should refer to
the content of the image in a more general way, such that it is still necessary to

refer to the image to provide a response.

5. Overall Quality (OQ) Rating No Longer Incorporates the System Instruction (SI) Rating

The Overall Quality Rating is now independent of the SI Rating: there is no need to consider the SI Rating in OQ!

Prior guidance directed incorporating the SI rating into the OQ rating by rating down the initial OQ
rating, if the SI rating was less than "Completely". However, both of these ratings are now completely
separate from each other.

In other words, the Overall Quality (OQ) rating only considers the helpfulness and accuracy of the
response, without considering how well it follows System Instructions.

On the other hand, the System Instructions (SI) rating only considers how well the response follows the
System Instructions, or how the information should be conveyed.

How to Rate Responses on Overall Quality (OQ) and System Instructions (SI)
The purpose of this course is to guide you on how to rate the model's responses on

Overall Quality (OQ), and on following System Instructions (SI).

Why Rate Model Responses?


Rating model responses correctly is essential for teaching the AI to improve its accuracy

and effectiveness. High-quality ratings improve AI responses, making them practical,


accurate, and helpful. Consistent and accurate ratings lead to more reliable and

efficient models.

You will rate model responses on two measures:

 System Instructions

 Overall Response Quality

 Determining the System Instructions (SI) Rating


 The System Instructions (SI) rating measures how well the response follows the

System Instructions, or how the response is conveyed.

 This set of System Instructions requires responses to present information

directly. Avoid chit-chat, pleasantries, or acknowledgments.


 A response that contains any of these (chit-chat, pleasantries, or

acknowledgements) should be rated down in the SI Rating with this set

of System Instructions.

 Please be advised that the System Instructions may change from

project to project, or even from task to task! This will impact how you rate

responses in the System Instruction rating!

Example of direct SI instructions


The following is an example of a response that follows "direct" System Instructions, with no textual evidence

of any chit-chat, pleasantries, or acknowledgements.

😀 Prompt:

Summarize the information in both images, highlighting differences in the food groups included

and portion sizes recommended. Output your response in paragraph form, format highlighted

differences in bold text and italicize information consistent in both images.

🤖 Response:
Both images present nutritional pyramids that offer guidelines for a balanced diet, with similar

categories of food. Both highlight fruits and vegetables as foundational, abundant components

of a healthy eating plan. The first image specifies 3-5 servings of vegetables and 2-4 servings of

fruit, while the second image simply depicts them as the base without specific portion sizes.

However, the grain recommendations differ significantly. The first image suggests a larger

portion of grains, with 6-11 servings, placing them as the base of the pyramid, whereas the

second image displays grains above vegetables but lower than the other categories. The

second image also includes legumes in the vegetable section, while the first image does not

distinguish them from vegetables. Overall, both pyramids offer similar nutritional guidelines

with few key differences in the suggested quantities of food groups that are emphasized.

Determining the System Instructions (SI) Rating
Here is another set of System Instructions, which states that responses should present

information with a conversational tone, a polite, casual, and approachable tone, and

also include acknowledgements. All of these elements must be included in the

response to receive a "Completely" rating under this set of SI!


Again, please be advised that the System Instructions may change from

project to project, or even from task to task! This will impact how you rate

responses in the System Instruction rating!

Example of different set of instructions


The following is an example of a response that includes the chit-chat, pleasantries, and acknowledgments required by this set of System Instructions.

😀 Prompt:
Based on the pattern in the image, what color comes after the final color shown at the

bottom of the image?

Give the answer in a concise 3-sentence paragraph that includes step-by-step reasoning

that identifies each color in the pattern.

🤖 Response:

Certainly! Here is the step-by-step reasoning to identify the final color shown in the

bottom of the image:

The pattern in the image consists of four colors that repeat in the following order: pink,

yellow, purple, and teal. Since the final color shown at the bottom of the image is teal,

the next color in the sequence will be pink. Therefore, the color that comes after the

final color shown is pink.

Hope this helps! Let me know if you have any other questions.

Determining the System Instructions (SI) Rating


Rate down the SI rating by one level for each System Instruction break, as shown

by the rubric below.


Remember, the helpfulness or accuracy of the response does not affect the System

Instruction rating!

 SI rating is independent of the Overall Quality rating!

- For example, a response can be "Very Bad" in Overall Quality, but still

meet the System Instructions "Completely"!

 All that matters in considering the SI rating is how well the response follows the System Instructions!
 Overall Quality Rubric

 Writing Effective Justifications


 Introduction

 Welcome to the course on writing effective justifications! Your dedication to

providing high-quality ratings is crucial in training our Assistant to deliver


accurate and insightful responses. While assigning scores is essential, your

written feedback provides invaluable context and helps us understand your

reasoning process. This guide focuses on writing effective justifications for your

ratings.

What Makes a Good Justification?


A good justification clearly explains the rationale behind your rating. It should highlight

the specific aspects of the response that led you to your decision. Here are some

examples of clear and helpful justifications:

Example 1:

 Good: "Factually correct and followed SI, but did not address the prompt's

request to end with a conclusion."

 Why it's good: This justification identifies both positive aspects (accuracy,

adherence to system instructions) and the specific shortcoming (missing

conclusion) that influenced the rating.

Example 2:

 Good: "Mostly accurate information and follows my user requests pretty well. A

few mistakes here and there but overall, it is on point. There is a system

instruction violation in that first sentence which sounds like an acknowledgment

of the user prompt. The layout and format are pleasing. I would be satisfied with

this result in real life."


 Why it's good: This justification provides a balanced assessment,

acknowledging both strengths and weaknesses. It also highlights specific

examples (system instruction violation, formatting) to support the rating.

Example 3:

 Good: "This response completely meets the system instructions, correctly

identifies the host of the show and his political views, and lists the two

memberships mentioned in the audio clip. However, the costs are incorrect and

the membership benefits for each tier are not completely accurate. Due to these

two errors, it makes the response relevant, but largely unhelpful. Based on the

rubric, this deserves a 'Bad' rating."

 Why it's good: This justification provides a thorough analysis of the response,

mentioning specific details from both the instructions and the response content.

The reasoning for the "Bad" rating is clearly linked to the identified inaccuracies.

Potential Elements to Include


When writing your justifications, consider including the following elements:

 Errors: Highlight any factual inaccuracies or inconsistencies in the response.

 System Instruction Adherence: Note whether the response follows all

specified system instructions.


 User Request Fulfillment: Assess how well the response addresses the user's

prompt and specific requests.

 Formatting and Layout: Mention if the response's formatting and presentation

are clear and effective.

 Helpfulness and Relevance: Evaluate the overall helpfulness and relevance of

the information provided.

 Decision-Making: Explain how you chose between the two LLM responses. Did

one response directly answer the question better, or was it more generally useful

and informative overall?

 Correcting Errors: If an answer is incorrect and you know the right one, share it

to help maintain accuracy.
