Unit-4 Part3
The primary focus has been on indexing, searching, and clustering, rather than information
display.
Display changes in data while maintaining the same representation (e.g., showing new
linkages between clusters).
Enable interactive user input for dynamic movement between information spaces.
Perfect search algorithms achieving near 100% precision and recall have not been realized,
as shown by TREC and other information forums.
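As a reminder of what the two measures mean, the following minimal sketch computes precision and recall for a single query. The function name and the item ids are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: item ids returned by the search
    relevant:  item ids judged relevant (the ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 items retrieved, 3 of them relevant, 6 relevant overall.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d1", "d2", "d3", "d7", "d8", "d9"])
print(f"precision={p:.2f} recall={r:.2f}")
```

A perfect search would return 1.0 for both values; in practice the user has to sift out the false hits and never sees the missed items.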
Information Visualization helps reduce user overhead by optimally displaying search results,
aiding users in selecting the most relevant items.
Visual displays consolidate search results into a form that is easier for users to process,
though they do not directly answer specific retrieval needs.
Cognitive engineering applies design principles based on human cognitive processes such as
attention, memory, imagery, and information processing.
Research from 1989 highlighted the importance of mental depiction in cognition, showing
that visual representation is as important as symbolic description.
Types of Visualization:
Historical Roots:
o The concept of visualization dates back over 2400 years to Plato, who believed that
perception involves translating physical energy from the environment into neural
signals, interpreted and categorized by the mind. The mind processes not only
physical stimuli but also computer-generated inputs.
o Text-only interfaces simplify user interaction but limit the mind's powerful
information-processing functions, which have evolved over time.
Development of Information Visualization:
o Information visualization emerged as a discipline in the 1970s, influenced by
debates about how the brain processes mental images.
o Advancements in technology and information retrieval were necessary for this field
to become viable.
o One of the earliest pioneers, Doyle (1962), introduced the idea of semantic road
maps to help users visualize the entire database. These maps show items related to
specific semantic themes, enabling focused querying of the database.
o In the late 1960s, the concept expanded to spatial organization of information,
leading to mapping techniques that visually represented data.
o Sammon (1969) implemented a non-linear mapping algorithm to reveal document
associations, further enhancing the spatial organization of data.
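Sammon's mapping looks for a low-dimensional layout whose pairwise distances stay close to the original distances. The sketch below only evaluates the stress (error) of a given layout; it does not perform the iterative optimization, and the function name and sample points are assumptions made for illustration.

```python
import math
from itertools import combinations


def sammon_stress(high_dim, low_dim):
    """Sammon's stress between original points and their low-dimensional layout."""
    total = 0.0  # sum of the original pairwise distances
    error = 0.0  # squared distance differences, weighted by 1 / original distance
    for i, j in combinations(range(len(high_dim)), 2):
        d_orig = math.dist(high_dim[i], high_dim[j])
        d_map = math.dist(low_dim[i], low_dim[j])
        if d_orig == 0:
            continue  # assumption: identical items are skipped
        total += d_orig
        error += (d_orig - d_map) ** 2 / d_orig
    return error / total if total else 0.0


# Hypothetical document vectors and two candidate 2-D layouts.
docs = [(0, 0, 0), (3, 4, 0), (6, 8, 0)]
faithful = [(0, 0), (3, 4), (6, 8)]    # preserves every distance
collapsed = [(0, 0), (0, 0), (0, 0)]   # puts every document on one spot
print(sammon_stress(docs, faithful), sammon_stress(docs, collapsed))
```

A stress near 0 means the map preserves document associations well; a layout that collapses all items onto one point scores 1.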
Advancements in the 1990s:
o The 1990s saw technological advancements and the exponential growth of available
information, leading to practical research and commercialization in information
visualization.
o The WIMP interface (Windows, Icons, Menus, and Pointing devices) revolutionized
how users interact with computers, although it still required human activities to be
optimized for computer understanding.
o Donald A. Norman (1996) advocated for reversing this trend, stating that
technology should conform to people, not the other way around.
Importance of Information Visualization:
o Information visualization has the potential to enhance the user’s ability to
efficiently find needed information, minimizing resource expenditure and optimizing
interaction with complex data.
Norman's emphasis: To optimize information retrieval, first understand how users themselves
interface with and process information, then carry that understanding over to the computer interface.
Challenges with text-only interfaces:
o Text reduces complexity but restricts the more powerful cognitive functions of the
mind.
o Textual search results often present large numbers of items, requiring users to
manually sift through them.
Information visualization aids the user by:
o Reducing time to understand search results and identify clusters of relevant
information.
o Revealing relationships between items, as opposed to treating them as
independent.
o Performing simple actions that enhance search functions.
Fox et al. study findings:
o Users wanted systems that allowed them to locate and explore patterns in
document databases.
o Visual tools should help users focus on areas of personal interest and identify
emerging topics.
o Users wanted interfaces that make it easy to identify trends and topics of interest.
Benefits of visual representation:
o Cognitive parallel processing of multiple facts and data relationships.
o Provides high-level summaries of search results, allowing users to focus on relevant
items.
o Enables visualization of relationships between items that would be missed in text-
only formats.
Usage of aggregates in visualization:
o Allows users to view items in clusters, refining searches by focusing on relevant
categories.
o Helps correlate search terms with retrieved items, showing their influence on search
results.
Limitations of textual display:
o No mechanism to show relationships between items (e.g., "date" and "researcher"
for polio vaccine development).
o Text can only sort items by one attribute, typically relevance rank.
Human cognition as a basis for visualization:
o Visualization techniques are developed heuristically and are tested based on
cognitive principles.
o Commercial pressures drive the development of intuitive visualization techniques.
o The 1970s debate centered on whether vision is purely data collection or also
involves information processing.
o Arnheim proposed that treating perception and thinking as separate functions leads
to treating the mind as a serial automaton, where perception deals with instances
and thinking handles generalizations.
Visualization as a Process:
o Visualization involves transforming information into a visual form that enables users
to observe and understand it.
o Visual input is treated as part of the understanding process, not as discrete facts.
o Gestalt Psychology: The mind follows specific rules to combine stimuli into a
coherent mental representation.
o Shifting the cognitive load from slower cognitive processes to faster perceptual
systems enhances human-computer interactions.
Example: The visual system can easily detect borders between changes in orientation of the
same object.
Orientation and Grouping:
If information semantics are organized in orientations, the brain's clustering function can
detect groupings more easily than with different objects, assuming orientations are
meaningful.
This uses the feature detectors in the retina for maximum efficiency in recognizing patterns.
Pre-attentive Processing in Boundary Detection:
The preattentive process detects boundaries between orientation groups of the same
object.
Identifying rotated objects, such as a rotated square, requires more effort than detecting
boundaries between different orientations of the same object.
Character Recognition:
When dealing with characters, identifying a rotated character (in an uncommon direction) is
more challenging.
Conscious Processing:
Conscious processing capabilities come into play when detecting the different shapes and
borders, such as in distinguishing between different boundaries in Figure 8.1.
The time it takes to visualize and recognize different boundaries can be measured, showing
the difference between pre-attentive and conscious processing.
A light object on a dark background appears larger than when the object is dark and the
background is light.
This optical illusion suggests using bright colors for small objects in visual displays to
enhance their visibility.
Color Attributes:
o Saturation: The degree to which a hue differs from gray with the same lightness.
Complementary Colors: Two colors that combine to form white or gray (e.g., red/green,
yellow/blue).
Color in Visualization: Colors are frequently used to organize, classify, and enhance features.
o Humans are naturally attracted to primary colors (red, blue, green, yellow), and they
retain images associated with these colors longer.
o However, colors can evoke emotional responses and cause aversion in some people,
and color blindness should be considered when using color in displays.
Depth in Visualization:
Depth can be represented using monocular cues like shading, blurring, perspective, motion,
and texture.
Shading and contrast: These cues are more affected by lightness than by contrast, so using
bright colors against a contrasting background can enhance depth representation.
Innate Depth Perception: Depth and size recognition are learned early in life; for instance,
six-month-old children already understand depth (Gibson-60).
Depth is a natural cognitive process used universally to classify and process information in
the real world.
Configural Clues: These are arrangements of objects that allow easy recognition of high-
level abstract conditions, replacing more concentrated lower-level visual processes.
Example: In a regular polygon (like a square), modifying the sides creates a configural effect,
where visual processing quickly detects deviations from the normal shape.
These effects are useful in detecting changes in operational systems or visual displays that
require monitoring.
Human Visual System: The system is sensitive to spatial frequency, which refers to the
number of light-dark cycles per degree of visual field. A cycle represents one complete light-
dark change.
Sensitivity Limits: The human visual system is less sensitive to spatial frequencies of about 5-6
cycles per degree. Not processing the higher spatial frequencies reduces cognitive processing
time, which allows faster reactions to motion (e.g., by animals such as cats).
Motion and Pattern Extraction: Distinct, well-defined images are easier to detect for motion
or changes than blurred images. Certain spatial frequencies aid in extracting patterns of
interest, especially when motion is used to convey information.
Application in Aircraft Displays: Dr. Mary Kaiser from NASA Ames is researching
perceptually derived displays, focusing on human vision filters like spatial resolution,
stereopsis, and attentional focus, to improve aircraft information displays.
Learning from Usage: The human sensory system learns and adapts based on usage, making
it easier to process familiar orientations (e.g., horizontal and vertical) than others, which
require additional cognitive effort.
Bright Colors: Bright colors in displays naturally attract attention, similar to how brightly
colored flowers are noticed in a garden, enhancing focus.
Depth Representation: The cognitive system is adept at interpreting depth in objects, such
as the use of rectangular depth for information representation, which is easier for the visual
system to process than abstract three-dimensional forms (e.g., spheres).
Pre-existing Biases: For example, if a user has been working with clusters of items, they may
see non-existent clusters in new presentations. This could lead to misinterpretation.
Past Experiences Influence Interpretation: Users may interpret visual information based on
what they commonly encounter in their daily lives, which may differ from the designer’s
intended representation. This highlights the need to consider context when designing
visualizations to avoid confusion and misinterpretation.
1. Document Clustering:
o Goal: To visually represent the document space defined by search criteria. This
space contains clusters of documents grouped by their content.
o Visualization Techniques: The clusters are displayed with indications of their size
and topic, helping users navigate towards items of interest. This method is
analogous to browsing a library’s index and then exploring the relevant books in
various sections retrieved by the search.
o Benefits: Allows users to get an overview of related documents, helping them to
locate the most relevant items more efficiently.
2. Search Statement Analysis:
o Goal: To assist users in understanding why specific items were retrieved in response
to a query. This analysis provides insights into how search results are related to the
query terms, especially with modern search algorithms and ranking techniques.
o Challenges: Unlike traditional Boolean systems, modern search engines use
techniques like relevance feedback and thesaurus expansion, making it harder for
users to correlate the expanded terms in a query with the retrieved results.
o Visualization Techniques: Tools are used to display the full set of terms, including
additional ones introduced through relevance feedback or thesaurus expansion.
Alongside these terms, the retrieved documents are shown, with indications of how
important each term is in the ranking and retrieval process.
o Benefits: This approach helps users understand the reasoning behind the retrieval
results, aiding in refining queries and improving search accuracy.
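A minimal sketch of the search statement analysis idea: given query term weights (including a term added by expansion) and one document's term weights, report how much each term contributed to the document's score. The dot-product scoring, the function name, and all weights are assumptions for illustration, not the ranking formula of any particular system.

```python
def term_contributions(query_weights, doc_weights):
    """Per-term share of a document's score under a simple dot-product ranking."""
    parts = {t: w * doc_weights.get(t, 0.0) for t, w in query_weights.items()}
    score = sum(parts.values())
    shares = {t: (v / score if score else 0.0) for t, v in parts.items()}
    return score, shares


# Hypothetical query: "oil" and "spill" typed by the user,
# "tanker" added by thesaurus expansion with a lower weight.
query = {"oil": 1.0, "spill": 1.0, "tanker": 0.5}
doc = {"oil": 0.2, "tanker": 0.8, "harbor": 0.4}

score, shares = term_contributions(query, doc)
for term, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{term:<8}{share:6.1%}")
```

In this example the expanded term "tanker" accounts for most of the score, which is exactly the kind of effect a user cannot see from a ranked list alone.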
Structured databases and link analysis both improve the effectiveness of information retrieval
(IR) systems by organizing and correlating data so that users can locate relevant documents and
understand their context.
Structured Databases:
Purpose: Structured databases are essential for storing and managing citation and semantic
data about documents. This data often includes metadata such as author, publication date,
topic, and other descriptors that help in categorizing and retrieving information.
Role: These structured files allow efficient access and manipulation of the data, which is
crucial when dealing with large datasets like academic citations or historical records. A well-
organized database improves the speed and accuracy of search results by ensuring that
information is easily accessible and related queries can be processed more effectively.
Link Analysis:
Purpose: Link analysis examines the relationships between multiple documents, recognizing
that information is often connected across different sources. This technique is particularly
useful when searching for topics that involve multiple interrelated documents.
Example: A time/event link analysis can correlate documents discussing a specific event,
such as an oil spill caused by a tanker. While individual documents may discuss aspects of
the spill, the link analysis identifies the temporal relationships between documents,
revealing patterns or sequences of events that are critical but may not be explicitly
mentioned in any single document.
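The time/event example can be sketched as follows: documents tagged with the same event are ordered by date and chained, so the sequence of events becomes visible even though no single document states it. The record layout, ids, dates, and event tags are all hypothetical.

```python
from datetime import date


def time_event_links(docs, event):
    """Chain documents mentioning `event` in date order; return (earlier, later) id pairs."""
    related = sorted((d for d in docs if event in d["events"]),
                     key=lambda d: d["date"])
    return [(a["id"], b["id"]) for a, b in zip(related, related[1:])]


# Hypothetical documents, deliberately stored out of order.
docs = [
    {"id": "cleanup-report", "date": date(2001, 3, 20), "events": {"oil spill"}},
    {"id": "tanker-grounding", "date": date(2001, 3, 2), "events": {"oil spill"}},
    {"id": "port-budget", "date": date(2001, 3, 5), "events": {"budget"}},
    {"id": "spill-sighted", "date": date(2001, 3, 4), "events": {"oil spill"}},
]
print(time_event_links(docs, "oil spill"))
```

Each returned pair is one temporal link that a display could draw as an arrow between two document icons.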
Hierarchical structures are commonly used to represent information that has inherent relationships
or dependencies, such as time, organization, or classification systems. These structures are
especially useful when representing large datasets or complex topics. The following techniques are
designed to help users visualize and navigate hierarchical data more easily:
1. Cone-Tree:
o Advantages: The Cone-Tree allows users to quickly visualize the entire hierarchy,
making it easier to understand the size and relationships of different subtrees.
Compared to traditional 2D node-and-link trees, the Cone-Tree maximizes the
available space and provides a more intuitive perspective of large datasets.
2. Perspective Wall:
o Description: The Perspective Wall divides information into three visual areas, with
the primary area in focus and the others out of focus. This creates a layered
representation of the data, allowing users to focus on one part of the hierarchy
while keeping the context of the surrounding information visible.
o Advantages: This technique helps maintain an overview of the data while also
allowing in-depth exploration of a specific area. It's particularly useful when users
need to keep track of large datasets without losing context.
3. TreeMaps:
o Advantages: TreeMaps make optimal use of available screen space and provide an
efficient way to visualize large hierarchies. They are especially useful for displaying
categories that can be grouped into subcategories, such as topics within a document
collection.
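To illustrate how a TreeMap fills the screen, here is a minimal sketch of a slice-and-dice layout, one common way to draw treemaps: each category gets a rectangle whose area is proportional to its size, and the split direction alternates at each level. The tuple-based tree format, the function names, and the sample topics are assumptions; sizes are assumed to be positive.

```python
def size_of(node):
    """Total size of a node: a leaf's own size or the sum of its children."""
    name, payload = node
    return sum(size_of(c) for c in payload) if isinstance(payload, list) else payload


def treemap(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout: returns [(name, x, y, w, h)] for every node.

    node is (name, size) for a leaf or (name, [children]) for a category.
    """
    if out is None:
        out = []
    name, payload = node
    out.append((name, x, y, w, h))
    if isinstance(payload, list):
        total = sum(size_of(c) for c in payload)
        offset = 0.0
        for child in payload:
            frac = size_of(child) / total
            if depth % 2 == 0:  # even depth: lay children out left to right
                treemap(child, x + offset * w, y, w * frac, h, depth + 1, out)
            else:               # odd depth: lay children out top to bottom
                treemap(child, x, y + offset * h, w, h * frac, depth + 1, out)
            offset += frac
    return out


# Hypothetical document collection: topic sizes are document counts.
tree = ("collection", [("sports", 30),
                       ("science", [("physics", 50), ("biology", 20)])])
for name, x, y, w, h in treemap(tree, 0, 0, 100, 100):
    print(f"{name:<12} x={x:6.1f} y={y:6.1f} w={w:6.1f} h={h:6.1f}")
```

Because every pixel of the root rectangle is assigned to some leaf, no screen space is wasted on links or empty margins.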
When data has network-type relationships (i.e., items are interrelated), cluster-based approaches
can visually represent these relationships. Semantic scatterplots are a common method for
visualizing clustering patterns:
1. Vineta and Bead Systems: These systems display clustering using three-dimensional
scatterplots, helping users visualize the relationships between documents or items within a
multidimensional space.
2. Embedded Coordinate Spaces: Feiner’s “worlds within worlds” approach suggests that
larger coordinate spaces can be redefined with subspaces nested inside other spaces,
allowing for better organization of multidimensional data.
3. Other Methods: Techniques such as semantic regions (Kohonen), linked trees (Narcissus
system), or non-Euclidean landscapes (Lamping and Rao) aim to tackle the issue of
visualizing complex multidimensional data.
Search-Driven Visualizations:
Information Crystal: A visualization technique inspired by Venn diagrams, which helps users
detect patterns of term relationships in a search result file (referred to as a Hit file). It
constrains the data within a specific search space, allowing the user to better understand
how search terms relate to the results.
An important aspect of visualization is helping users refine their search statements and understand
the effects of different search terms. The challenge with systems that use similarity measures (e.g.,
relevance scoring) is that it can be difficult for users to discern how specific terms impact the
selection and ranking of documents in their search results. Visualization systems aim to address this
by:
Graphical Display of Item Characteristics: A visualization tool can display the characteristics
(such as term frequency, relevance, or weight) of the retrieved items to show how search
terms influenced the results.
VIBE System: A system designed for visualizing term relationships by spatially positioning
documents in a way that shows their relevance to specific query terms. This allows users to
see the term relationships and the distribution of documents based on their relevance to the
search terms.
Self-Organization and Kohonen’s Algorithm: Lin applied Kohonen’s self-organizing
algorithm to automatically generate a table of contents
(TOC) for documents, which is displayed in a map format. This technique helps visualize
document groupings based on their content.
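The spatial positioning used by VIBE-style displays can be approximated as follows: each query term is pinned to a fixed anchor point, and a document is placed at the average of the anchors weighted by its score for each term. This is a simplified sketch; the function name, anchor coordinates, and scores are invented for illustration.

```python
def vibe_position(anchors, doc_scores):
    """Place a document at the score-weighted average of the query-term anchors."""
    total = sum(doc_scores.get(t, 0.0) for t in anchors)
    if total == 0:
        return None  # the document matches none of the displayed terms
    x = sum(anchors[t][0] * doc_scores.get(t, 0.0) for t in anchors) / total
    y = sum(anchors[t][1] * doc_scores.get(t, 0.0) for t in anchors) / total
    return (x, y)


# Hypothetical screen positions for three query terms.
anchors = {"oil": (0.0, 0.0), "spill": (10.0, 0.0), "tanker": (5.0, 8.0)}

print(vibe_position(anchors, {"oil": 1.0, "spill": 1.0}))   # midway between oil and spill
print(vibe_position(anchors, {"tanker": 3.0, "oil": 1.0}))  # pulled toward tanker
```

Documents that land close to one anchor are dominated by that term, while documents in the middle of the display relate to several terms at once.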
Envision System Overview:
The Envision system aims to present a user-friendly, graphical representation of search results
that works across a variety of computer platforms. The system features three interactive
windows for displaying search results:
1. Query Window: This window provides an editable version of the user's query, allowing easy
modification of the search terms.
2. Item Summary Window: This window shows bibliographic citation information for items
selected in the Graphic View Window.
3. Graphic View Window: This window is a two-dimensional scatterplot, where each item in
the search results (Hit file) is represented by an icon. Items are depicted as circles (for single
items) or ellipses (for clusters of multiple items). The X-axis represents the estimated
relevance, and the Y-axis shows the author’s name. The weight of each item or cluster is
displayed below the icon or ellipse, and selecting an item reveals its bibliographic
information.
While the system is simple and user-friendly, the display becomes crowded when the number of
items grows large. To address this, a "zoom" feature is planned for Envision that lets users
view larger areas of the scatterplot at lower detail.
Relevance and Cluster Representation: The system uses scatterplot graphs where circles
represent individual items, and ellipses represent clusters of items that share similar
attributes or relevance scores. This allows users to visually assess both individual items and
groups of related documents.
Interactive User Experience: Selecting an item in the Graphic View Window not only
highlights the item but also updates the Item Summary Window with detailed bibliographic
information. This interactive system enhances user engagement and ease of navigation.
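A rough sketch of how hits could be turned into circles and ellipses for a Graphic View style scatterplot. The grouping rule used here (same author and same relevance after rounding) is an assumption made for illustration and is not taken from Envision itself; the function name and data are also hypothetical.

```python
from collections import defaultdict


def graphic_view_icons(hits, precision=1):
    """Group hits that land on the same (relevance, author) spot of the scatterplot."""
    cells = defaultdict(list)
    for item_id, relevance, author in hits:
        cells[(round(relevance, precision), author)].append(item_id)
    icons = []
    for (relevance, author), ids in sorted(cells.items()):
        shape = "circle" if len(ids) == 1 else "ellipse"  # ellipse = cluster of items
        icons.append({"x": relevance, "y": author, "shape": shape, "items": ids})
    return icons


# Hypothetical Hit file: (item id, estimated relevance, author).
hits = [("a1", 0.91, "Fox"), ("a2", 0.93, "Fox"), ("a3", 0.62, "Lin")]
for icon in graphic_view_icons(hits):
    print(icon)
```

Selecting an icon would then look up the ids in its "items" list to fill the Item Summary Window.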
Veerasamy and Belkin (1996) use a similar technique in a different format: a series of
vertical columns of bars, one column per retrieved document:
Rows Represent Index Terms: Each row corresponds to an index term (word or phrase) used
in the search.
Bar Height Indicates Term Weight: The height of the bar in each column shows the weight
of the corresponding index term in the document represented by that column.
Relevance Feedback: This approach also displays terms added by relevance feedback,
helping users identify which terms have the most significant impact on the retrieval of
specific documents.
With this display, users can:
1. Identify Key Terms: Quickly determine which search terms were most influential in
retrieving a specific document by scanning the columns.
2. Refine Search Terms: See how various terms contributed to the retrieval process, making it
easier to adjust query terms that are underperforming or causing false hits. Users can
remove or reduce the weight of terms that aren't contributing positively to the results.
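The bar display described above can be imitated in plain text: one row per index term, one column per document, with a taller block glyph standing for a larger term weight. This is only a sketch of the layout; the function name, the weights, and the use of Unicode block characters are assumptions.

```python
BLOCKS = " ▁▂▃▄▅▆▇█"  # nine glyphs, from empty to full height


def term_bars(weights, terms, docs):
    """One text row per term, one column per document; taller glyph = larger weight."""
    peak = max((w for row in weights.values() for w in row.values()), default=0) or 1
    lines = []
    for term in terms:
        cells = []
        for doc in docs:
            w = weights.get(doc, {}).get(term, 0.0)
            cells.append(BLOCKS[round(w / peak * (len(BLOCKS) - 1))])
        lines.append(f"{term:<10}" + " ".join(cells))
    return lines


# Hypothetical weights: document -> {index term: weight}.
# "tanker" stands for a term added by relevance feedback.
weights = {"d1": {"oil": 0.8, "spill": 0.4},
           "d2": {"oil": 0.2, "tanker": 0.8}}
for line in term_bars(weights, ["oil", "spill", "tanker"], ["d1", "d2"]):
    print(line)
```

Scanning down a column shows which terms pulled in that document; scanning along a row shows which documents a term affected, which helps decide whether to drop or down-weight it.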
Several advanced visualization techniques are used in commercial and research systems for
document analysis and retrieval. These systems are designed to enhance the user’s ability to
understand and navigate complex information spaces.
Visualization: DCARS displays query results as a histogram where each row represents an
item, and the width of a tile bar indicates the contribution of each term to the selection.
This provides insight into why a particular item was found.
User Interface: The system offers a friendly interface but struggles with allowing users to
effectively modify search queries to improve results. While it shows term contributions, it’s
challenging for users to use this information to refine or improve their search statements.
Cityscape Visualization:
Structure: Concepts are drawn as skyscrapers, interconnected with lines that vary in
representation to show interrelationships between concepts. The colors or fill designs of
buildings can add another layer of meaning, with buildings sharing the same color possibly
representing higher-level concepts.
Zooming: Users can move through the cityscape, zooming in on specific areas of interest,
and uncovering additional structures that may be hidden in other views.
Library Metaphor:
User-Friendly Navigation: Another widely understood metaphor is the library. In this model,
the information is represented as a space within a library, and users can navigate through
these areas.
Viewing Information: Once in an information room, users can view virtual “books” placed on
a shelf. After selecting a book, the user can scan related items, with each item represented
as a page within the book. The ability to fan the pages out allows users to explore the
content in a spatial manner. This is exemplified by the WebBook system.
Clustering and Statistical Term Relationships:
Matrix Representation: When correlating items or terms, a large matrix is often created
where each cell represents the similarity between two terms or items. However, displaying
this matrix in traditional table form is impractical due to its size and complexity.
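A small sketch of such a matrix, using cosine similarity between term vectors as the similarity measure. The choice of cosine similarity, the function names, and the sample term-by-document counts are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Return (names, matrix) where matrix[i][j] is the similarity of names i and j."""
    names = list(vectors)
    return names, [[cosine(vectors[a], vectors[b]) for b in names] for a in names]


# Hypothetical term vectors: occurrences of each term in three documents.
vectors = {"oil": [2, 1, 0], "spill": [2, 1, 0], "budget": [0, 0, 3]}
names, matrix = similarity_matrix(vectors)
for name, row in zip(names, matrix):
    print(f"{name:<8}" + " ".join(f"{value:4.2f}" for value in row))
```

With n terms the matrix has n × n cells, so a vocabulary of 10,000 terms already yields 100 million cells, which is why a table display does not scale and a visual summary is needed.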
Information visualization techniques are also used in systems for analyzing and interacting
with large datasets, covering text, citation data, structured databases, hyperlinks, and
pattern/linkage analysis.
SeeSoft System: Created by AT&T Bell Laboratories, this system visualizes software code by
using columns and color codes to show changes in lines of code over time. This allows
developers to easily track modifications across large codebases.
DEC FUSE/SoftVis: Built on SeeSoft's approach, this tool provides small pictures of code files
with size proportional to the number of lines. It uses color coding (e.g., green for comments)
to help visualize the structure and complexity of code modules.
TileBars Tool: Developed by Xerox PARC, this tool visualizes the distribution of query terms
within each item in a Hit file, helping users quickly find the most relevant sections of
documents.
Query By Example: IBM’s early tool for structured database queries used a two-dimensional
table where users defined values of interest. The system would then complete the search
based on these values.
IVEE (Information Visualization and Exploration Environment): This tool uses three-
dimensional representations of structured databases. For example, a box representing a
department can have smaller boxes inside it representing employees. The system allows for
additional visualizations like maps and starfields, with interactive sliders and toggles for
manipulating the search.
HomeFinder System: Used for displaying homes for sale, this tool combines a starfield
display of homes with a city map to show the geographic location of each home.
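The slider-driven filtering behind displays such as IVEE and HomeFinder can be sketched as a simple range filter that is re-run whenever a control moves. The field names, thresholds, and coordinates below are hypothetical and only meant to show the idea.

```python
def dynamic_query(homes, max_price, min_bedrooms):
    """Return the homes that stay visible for the current slider settings."""
    return [h for h in homes
            if h["price"] <= max_price and h["bedrooms"] >= min_bedrooms]


# Hypothetical listings; lat/lon would place each point on the city map.
homes = [
    {"id": "h1", "price": 180_000, "bedrooms": 2, "lat": 38.90, "lon": -77.03},
    {"id": "h2", "price": 320_000, "bedrooms": 4, "lat": 38.95, "lon": -77.10},
    {"id": "h3", "price": 240_000, "bedrooms": 3, "lat": 38.88, "lon": -77.00},
]

visible = dynamic_query(homes, max_price=250_000, min_bedrooms=3)
print([(h["id"], h["lat"], h["lon"]) for h in visible])
```

Each slider movement simply calls the filter again and redraws the remaining points, so the user sees the effect of a constraint immediately.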
Navigation and Hyperlink Visualization:
Navigational Visualization: Hyperlinks in information retrieval systems can lead users to feel
"lost in cyberspace." One solution is to provide a tree structure visualization of the
information space. MITRE Corporation has developed a tool for web browsers that visually
represents the path users have followed and helps them navigate effectively.
Pathfinder Project: A system sponsored by the Army that incorporates various visualization
techniques for analyzing patterns and linkages. It includes several tools:
o Document Browser: Uses color density in text to indicate the importance of an item
relative to a user's query.
o CAMEO: Models analytic processes with nodes and links, where the color of nodes
changes based on how well the found items satisfy the query.
o Counts: Plots statistical information about words and phrases over time, helping
visualize trends and developments.
o CrossField Matrix: A two-dimensional matrix that correlates data from two fields
(e.g., countries and products), using color to represent time span and showing how
long a country has been producing a specific product.
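A minimal sketch of the data behind a CrossField Matrix: records are reduced to one cell per (country, product) pair holding the first and last year seen, from which a time span can be mapped to a color. The record format, function name, and sample data are assumptions.

```python
def crossfield_matrix(records):
    """Map (country, product) -> (first_year, last_year) seen in the records."""
    spans = {}
    for country, product, year in records:
        first, last = spans.get((country, product), (year, year))
        spans[(country, product)] = (min(first, year), max(last, year))
    return spans


# Hypothetical extracted facts: (country, product, year mentioned).
records = [
    ("Norway", "oil", 1975),
    ("Norway", "oil", 1990),
    ("Japan", "cars", 1960),
    ("Norway", "oil", 1982),
]

for (country, product), (first, last) in crossfield_matrix(records).items():
    years = last - first + 1  # this span would drive the cell color
    print(f"{country:<8}{product:<6}{first}-{last} ({years} years)")
```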
The use of maps and geographic data is another area of information visualization.
Geographic Information Systems (GIS) provide a way to visualize and analyze data in the
context of geographic locations.
SPIRE tool:
Information Visualization: A technique used to represent complex data visually to help users
understand relationships, trends, and important concepts.
Metaphors for Aggregation: Visualization often uses metaphors like peaks, valleys, or cityscapes to
highlight major concepts and relationships, providing users with an overview before diving into
details.
Cognitive Engineering: Visual representation must consider users' cultural and physiological
differences, such as color blindness or familiarity with certain environments, to ensure the
metaphor is understandable.
Complexity of Search Algorithms: As search algorithms evolve, the role of visualization will expand
to clarify not just the retrieved information but also the relationships between the search statement
and the items.
Hypertext Linkages: The growth of hyperlinks in information systems will require new visualization
tools to help users understand the network of relationships between linked items.
Technical Limitations:
Interactive Clustering: Users expect the ability to interact with clusters, drilling down from high-level
to more granular levels with increasing precision.
Real-time Processing: The system must provide near-instant feedback (e.g., under 20 seconds) to
meet user expectations for speed.
Key Challenges:
2. Increased Precision: As users drill deeper into clusters, the system needs to display results
with higher precision, often requiring natural language processing for better understanding
and categorization.