
ITT

The document provides feedback for contributors on improving prompts for the Image-to-Text project, emphasizing that prompts should challenge the model and avoid simplicity. It outlines principles for crafting effective prompts, including the use of natural language, clarity, and ensuring requests are attainable based on the provided image information. Contributors are advised to avoid overly simplistic prompts and to incorporate real-world scenarios to enhance the model's reasoning capabilities.

Genesis Image-to-Text Feedback, 03/21

Dear Contributors, please take a moment to review the following customer feedback for the
Image-to-Text (ITT) project! The main takeaway is that prompts must appropriately
challenge the model to reason through the question to get to the answer! In other words,
the answer should NOT be obvious! As always, prompts should draw upon real-world
scenarios where you might ask an AI for guidance, using simple, direct, and natural-
sounding language.

Please do NOT attempt tasking any further before reading this feedback!

1. Scene Understanding/ Counting Prompts should NOT be simple or easy to answer 🫥
2. Prompts should use natural-sounding language 🌿
3. Prompts should be clear and unambiguous! 🎯 🔍
4. No Unattainable Requests! 🙅

1. Scene Understanding/ Counting Prompts should NOT be simple or easy to answer 🫥
In order to improve the model’s abilities, we need to challenge it by submitting well-thought-out, creative prompts that test its understanding of what is being asked. Please avoid submitting overly simplistic prompts that can be easily answered, particularly in the Scene Understanding and Counting competencies ⛔.

Here’s an example of a too-simple prompt that was submitted in the Scene Understanding competency:

Prompt Example
While this prompt is appropriate for the Scene Understanding competency in that it asks about the position of the car relative to the garage door, the answer is almost immediately evident just by looking at the image 👎. Please avoid overly simple prompts like this, particularly in the Scene Understanding/ Counting competencies! This kind of prompt does not challenge the model or help it improve.

Here are some principles that you might consider incorporating into prompts for the Scene
Understanding competency:
● Hierarchical Spatial Reasoning: Eg, “Can you tell which objects are on the floor, which
are on furniture, and which are stacked on each other?”
● Occlusion and Depth Reasoning: Eg, “Which building looks closest to the viewer, and
which looks farthest away?”
● Relational Layout with Directional Anchoring: Eg, “Using the tree in the middle as a reference, where are the animals located around it: left, right, behind, or in front?”
● Symmetry/ Alignment Detection: Eg, “Are the benches lined up even with the
fountain? If not, how are they positioned differently?”
● Navigation and Pathfinding Reasoning: Eg, “If someone walks from the red door to
the green gate, what’s the best clear path they can take? Are there any obstacles?”
● Object Orientation and Facing Direction: Eg, “Who’s facing the camera, who’s turned
sideways, and who has their back to us?”
● Nested Spatial Structures: Eg, “What items are inside other items in this image, like a spoon in a cup or a cup in a cabinet? Can you describe the full nesting?”
● Motion & Temporal Spatial Inference: Eg, “Looking at the positions of the child and the
ball, who’s likely to reach it first?”

(Note that while the prompt example above does test the model’s ability to understand the
relational layout of the image, it only asks about two objects relative to each other. A better
prompt might ask about the relational layout of multiple objects in the image, relative to a
reference point.)

2. Prompts should use natural-sounding language 🌿

Prompts should draw upon real-world scenarios where you might ask an AI for guidance, using
simple, direct, and natural-sounding language. While sometimes it is necessary to use more
technical language for clarity, prompts should generally sound just as if someone in the real
world is using an AI model to ask a question about a problem that they are encountering, and
how to solve it.

In the same vein, many Attempters have relied heavily on formatting requests to fulfill Complexity, resulting in unnatural-sounding prompts. Include a formatting request only when it genuinely makes sense for the task; do NOT add one simply to fulfill Complexity. The Complexity reviewer measure will be changed soon to reflect that prompts should contain sufficient complexity without relying on formatting requests.

3. Prompts should be clear and unambiguous! 🎯 🔍

Make sure your prompts are specific and precise in what you are asking for! Vague
prompts can be interpreted in many ways, leading to responses that may be off-target, and thus
difficult to rate. Be as specific as possible about the kind of information that you are
seeking in the response!

4. No Unattainable Requests! 🙅

Prompts should be answerable based on information that is visually provided by the image.
Please do not submit prompts that cannot be answered by information contained in the
image. This includes requests whose correct answer is not among the options shown in the image.

The following “unattainable” prompt was submitted in the Patterns competency. However, it is
not possible to identify a correct answer based on the given options:

Prompt Example

The correct answer is not found in the second image. The answer is that the white horse inside the white diamond is the odd knight.

Additionally, the prompt does not provide any explicit instructions on what the model should do if none of the options are correct. This lack of clear guidance may mislead the model when the answer is unattainable.
