Article
Enhancing Multi-Modal Perception and Interaction: An
Augmented Reality Visualization System for Complex
Decision Making
Liru Chen 1, Hantao Zhao 1,2,*, Chenhui Shi 1, Youbo Wu 1, Xuewen Yu 1, Wenze Ren 1, Ziyi Zhang 3 and Xiaomeng Shi 4

1 School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China;
[email protected] (L.C.)
2 Purple Mountain Laboratories, Nanjing 211189, China
3 School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
4 School of Transportation, Southeast University, Nanjing 211189, China
* Correspondence: [email protected]

Abstract: Visualization systems play a crucial role in industry, education, and research domains by
offering valuable insights and enhancing decision making. These systems enable the representation
of complex workflows and data in a visually intuitive manner, facilitating better understanding,
analysis, and communication of information. This paper explores the potential of augmented
reality (AR) visualization systems that enhance multi-modal perception and interaction for complex
decision making. The proposed system combines the physicality and intuitiveness of the real world
with the immersive and interactive capabilities of AR systems. By integrating physical objects
and virtual elements, users can engage in natural and intuitive interactions, leveraging multiple
sensory modalities. Specifically, the system incorporates vision, touch, eye-tracking, and sound as
multi-modal interaction methods to further improve the user experience. This multi-modal nature
enables users to perceive and interact in a more holistic and immersive manner. The software and
hardware engineering of the proposed system are elaborated in detail, and the system’s architecture
and preliminary function testing results are also included in the manuscript. The findings aim to
aid visualization system designers, researchers, and practitioners in exploring and harnessing the capabilities of this integrated approach, ultimately leading to more engaging and immersive user experiences in various application domains.

Keywords: visualization systems; AR visualization systems; multi-modal perception and interaction; user experience

Citation: Chen, L.; Zhao, H.; Shi, C.; Wu, Y.; Yu, X.; Ren, W.; Zhang, Z.; Shi, X. Enhancing Multi-Modal Perception and Interaction: An Augmented Reality Visualization System for Complex Decision Making. Systems 2024, 12, 7. https://doi.org/10.3390/systems12010007

Academic Editor: William T. Scherer

Received: 15 November 2023; Revised: 21 December 2023; Accepted: 22 December 2023; Published: 25 December 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Visualization systems can provide valuable insights and aid in decision-making processes [1]; therefore, they have become indispensable tools in various domains including industry, education, and research. These systems enable the representation of complex workflows and data in a visually intuitive manner, enhancing understanding, analysis, and communication of information [2]. However, traditional visualization systems rely mainly on visual perception, which limits users' ability to engage fully with the data and constrains the potential of the system.

To overcome these limitations, researchers have explored the integration of augmented reality and visualization systems to enhance the user experience and improve multi-modal perception and interaction. An AR system overlays virtual elements onto a real-world environment, creating an immersive interactive experience [3]. Tangible objects, on the other hand, are physical objects that the user can directly observe and touch, and the visualization feature allows users to make adjustments directly on real-world objects, improving their ability to face complex decisions. By combining these two methods, users can utilize
multiple sensory modes for natural and intuitive interaction. A previous tangible user interface (TUI) study explored the "tangible virtual interaction" between tangible Earth instruments and virtual data visualization and proposed that head-worn AR displays allow seamless integration between virtual visualization and contextual tangible references such as physical Earth instruments [4]. In addition to enhancing the user experience, the
integration of AR and visualization systems also brings benefits in terms of accessibility and
inclusivity. Users with motion impairments can use their body posture and movements to
manipulate virtual objects, enabling them to interact more effectively with virtual elements,
thereby overcoming the limitations of traditional input devices.
This study proposed an advanced augmented reality visualization system that in-
corporates multi-modal perception and interaction methods. This cutting-edge system
seamlessly integrates virtual elements into the real-world environment, enhancing users’
interaction with their surroundings. By employing various multi-modal interaction meth-
ods, including visual, tactile, and auditory, users can easily identify and engage with virtual
elements superimposed onto their physical reality. The system also enables interactive
feedback, allowing users to physically interact with virtual objects, enhancing the overall
sense of realism. In addition, our system incorporates eye-tracking technology, which
provides a more intuitive and natural interactive visualization, increasing a certain degree
of convenience.
The rest of this article is organized as follows. First, we review the literature on visualization systems, AR technology, and virtual user interfaces. Then, we introduce the AR visualization system for improving the user experience, including its system architecture and functional design. Subsequently, we present the results of the AR interaction and eye-tracking experiments conducted with the system. Finally, we discuss the data analysis and functionality of the system, as well as its limitations and future research prospects. Overall, the system
described in this article provides a deep understanding of the integration aspects of AR vi-
sualization systems, showcasing their functionality and potential applications. By exploring
and harnessing the capabilities of this integrated approach, we can unlock new possibilities
for enhancing multi-modal perception and interaction, ultimately revolutionizing the way
we interact with visualized data and workflows.

2. Literature Review
2.1. Visualization Systems
Visualization systems play a crucial role in aiding the comprehension and analysis
of data [5]. These systems allow users to transform raw data into visual representations,
providing a more intuitive and interactive way to explore and understand information.
Visualization systems offer numerous benefits that contribute to their widespread adoption
in various domains. One of the primary advantages is the ability to uncover patterns
and relationships that may not be apparent in raw data. By presenting data in a visual
form, users can easily identify trends, outliers, and correlations, leading to more informed
decision-making processes [6].
Visualization systems are applied in a wide range of domains, including scientific research,
business analytics, and healthcare. In scientific research, visualization systems have been
instrumental in understanding complex phenomena, such as environmental monitoring [7].
In business analytics, visualization systems are used for communication, information seek-
ing, analysis, and decision support [8]. In healthcare, visualization systems aid in electronic
medical records and medical decision making, enhancing patient care and outcomes [9].
As technology continues to advance, visualization systems are expected to evolve in
various ways. One emerging trend is the integration of virtual reality (VR) and augmented
reality technologies into visualization systems [10]. In the context of augmented reality,
visualization systems can leverage the capabilities of AR technology to present data in a
more intuitive and context-aware manner. Various visualization techniques, such as 3D
models, graphs, charts, and spatial layouts, have been explored to enhance data exploration
and understanding. Martins [11] proposed a visualization framework for AR that enhances
data exploration and analysis. The framework leverages the capabilities of AR to provide
interactive visualizations in real-time, allowing users to manipulate and explore data
from different perspectives. By combining AR with visualization techniques, users can
gain deeper insights and make more informed decisions. Additionally, there has been a
growing interest in collaborative visualization systems using AR. Chen [12] developed a
collaborative AR visualization system that enables multiple users to interact and visualize
data simultaneously. The system supports co-located and remote collaborations, enhancing
communication and understanding among users.
In recent years, there has been a growing interest in incorporating multi-modal feed-
back, including visual and tactile cues, to create more immersive and intuitive experiences.
Haptic feedback has been explored as an essential component of visualization systems to
provide users with a tactile sense of virtual objects. Haptic feedback can enhance the user’s
perception of shape, texture, and force, allowing for a more realistic and immersive experi-
ence. Several studies have investigated the integration of haptic feedback into visualization
systems, such as the use of force feedback devices [13] or vibrotactile feedback [14]. These ap-
proaches enable users to feel and manipulate virtual objects, enhancing their understanding
and engagement with the data. In summary, the integration of multi-modal perception and
interaction in visualization systems, particularly through the use of augmented reality and
user interfaces, has been an active area of research. Previous studies have demonstrated the
benefits of combining visual and tactile feedback to create more immersive and intuitive
experiences [14]. The visualization system proposed in this study presents data that are challenging to convey in text or in everyday settings, such as the sensing range of a device, in a 3D format, in contrast to previous approaches in which the information is entirely virtual or detached from the associated equipment. This study provides timely and reliable information assistance for user decision making by mapping virtual information onto tangible physical objects and through multi-modal feedback, including visual and auditory cues.

2.2. Augmented Reality Technology


Augmented reality is a technology that overlays virtual information onto the real
world, enhancing the user’s perception and interaction with the environment [15]. This in-
tegration is achieved through the use of computer vision techniques, tracking technologies,
and display devices. Azuma [16] introduced the concept of AR as a combination of real
and virtual environments, where virtual objects are seamlessly integrated into the physical
world. AR enables users to perceive and manipulate virtual objects in a real-world context,
leading to improved spatial understanding and an enhanced user experience.
AR has gained significant attention in various domains, including education [17],
healthcare [15], and entertainment [18]. Several studies have explored the benefits and chal-
lenges of AR in different applications [19,20]. In recent years, advancements in hardware,
such as smartphones and head-mounted displays (HMDs), have made AR more accessible
and widely adopted. HMDs, like Microsoft HoloLens and Magic Leap, provide immersive
experiences by overlaying virtual objects directly into the user’s field of view [21]. These
devices offer a wide range of possibilities for visualization and interaction in AR systems.
Researchers have explored different AR techniques [22–24] to improve the user’s visual per-
ception and engagement. Studies have shown that AR can provide a more immersive and
interactive experience by combining virtual objects with real-world surroundings, offering
opportunities for enhanced learning, training, and decision-making processes [25]. How-
ever, some challenges need to be addressed for the successful implementation of AR. One
major challenge is the accurate and robust tracking of the user’s position and orientation in
real time [26]. Various tracking techniques, such as marker-based [27], sensor-based [28],
and Simultaneous Localization and Mapping (SLAM) [29], have been developed to over-
come this challenge. Our system is designed for versatility and scalability, applicable across
various fields such as industry, education, and healthcare. This universal design enhances
the practical value of our system, addressing diverse user needs in different domains and
increasing its utility and potential for widespread application.
Another challenge is the design and development of intuitive and natural user in-
terfaces for AR systems. Traditional input devices, such as keyboards and mice, may not
be suitable for AR interactions. Therefore, researchers have explored alternative input
methods, including gesture recognition and voice commands, to enhance user engagement
and interaction [30–32]. However, gesture recognition or voice commands alone offer only limited interaction options, restrict how users can operate the system, and raise reliability and accuracy issues, while lacking diversity and flexibility. Therefore, multiple interaction methods should be provided in AR applications to ensure the widespread adoption and successful implementation of AR technology across applications.
To address these challenges and provide users with a richer and more immersive
experience, this study combines AR and visualization systems. By integrating visual,
auditory, and tactile multi-modal perception and interaction, AR applications can offer a
more comprehensive and engaging user experience. This approach expands the possibilities
for interaction and enhances the user’s ability to manipulate and explore virtual objects in
the real world.

2.3. Virtual User Interfaces


Virtual User Interfaces, including tangible user interfaces, provide a physical and tangible
means for users to interact with digital information [33]. TUIs enable users to manipulate virtual
objects or control digital systems through physical artifacts or objects [34]. This interaction
paradigm enables users to leverage their existing knowledge and skills to manipulate and
participate in digital content in a more natural and meaningful way, making technology
easier to use and user-friendly [35].
Unlike traditional graphical user interfaces (GUIs), TUIs provide a more embodied
and tangible interaction experience by utilizing physical objects as input and output de-
vices. These physical objects, also known as “affordances”, are designed to represent and
convey digital information in a perceptible and manipulable form [36]. TUIs offer several
benefits over traditional interaction methods. One of the key advantages is their ability to
leverage humans’ innate physical and sensorimotor skills, enabling a more natural and
intuitive interaction. By providing physical objects that users can grasp, touch, and move,
TUIs engage multiple senses and enhance the user’s spatial awareness and cognitive en-
gagement [37]. Additionally, TUIs facilitate a more tangible and embodied understanding
of digital information, as users can directly manipulate and explore physical objects that
represent abstract data [34].
TUIs offer multi-modal feedback and intuitive manipulation of virtual objects, making
them suitable for AR environments. Research has shown that TUIs in AR visualization
systems can enhance user collaboration, spatial cognition, and overall user experience.
Sketched Reality [32] combines AR technology and TUI technology to achieve bidirectional
interaction through tactile feedback and physical interaction. This bidirectional interaction
method enables users to feel the existence of virtual objects more realistically, enhancing
the immersion and interactivity of AR applications. Ubi Edge is an edge-based augmented
reality touchable user interface authoring tool. This system allows users to control aug-
mented reality elements by sliding or clicking on the edges of physical objects. For example,
users can change the color of virtual light bulbs by sliding on the edge of a coffee cup or
activate AR shooting animations by clicking on the edge of a toy airplane. These examples
demonstrate the potential and application of multi-modal perception TUI in augmented
reality environments [38]. The combination of eye-tracking and user interaction feedback is
a promising development direction. Utilizing the fixation point of the eyes, the system can
discern the user’s intention and offer corresponding feedback. By embedding interactive
objects in the TUI, when the user gazes at a specific object, the system can detect the user’s
fixation point through eye-tracking technology and provide relevant feedback, enabling
control through pseudo-ideation. This enables users to tailor interaction methods based
Systems 2024, 12, 7 5 of 24

on preferences, fostering a tighter connection between the system and users, ultimately
enhancing user satisfaction and improving overall experiences.

3. Materials and Methods


3.1. System Framework
The study proposed an Augmented Reality Visualization System with multi-modal
perception and interaction, aiming to elevate the capabilities of this integrated approach.
The system is developed by combining Unity, HoloLens, and the Augmented Reality
Toolkit. By leveraging these technologies, we aim to provide more intuitive, accurate, and
comprehensive support for complex decisions. The system consists of several modules,
each playing a crucial role in achieving our goals. These modules include the member
management module, the augmented reality interface module, the user behavior interac-
tion module, the eye-tracking data acquisition module, and the AR experimental process
management module, as shown in Figure 1.

Figure 1. The main design and implementation module of the system. The illustration integrates
three components: performance layer, business layer, and data layer.

• Member management module: This module is a comprehensive system that includes system tutorials, system experiments, data recording, and data processing and
analysis. Participants can familiarize themselves with augmented reality systems
through this module, conduct interactive experiments, and record real-time data. The
module provides a foundation for analyzing the behavior and attention distribution
of participants, ensuring the accuracy and reliability of experimental results.
• Augmented reality interface module: This module provides researchers with a user-
friendly and reliable platform for conducting experiments and refining AR experiences.
It utilizes Unity, HoloLens device, Vuforia platform, and the Mixed Reality Toolkit
to create immersive AR scenes, seamlessly integrating virtual objects into real environments and enabling device locomotion-based virtual content tracking, specific image
recognition, and various interaction modalities. This integration establishes a uni-
fied framework, enhancing the overall cohesion and functionality of the augmented
reality system.
• User behavior interaction module: This module enables users to interact with the
augmented reality environment through various input methods, including voice
commands, gestures, and user interfaces. It provides a flexible and intuitive way for
users to manipulate virtual objects and navigate the system.
• Eye-tracking data acquisition module: This module stores the aggregated gaze data
locally, providing spatial location and timing information for subsequent statistical
analyses. This accurate and convenient platform offers researchers valuable insights
into users’ visual behavior patterns and interface design issues in virtual environments.
• AR experiment process management module: This module ensures the smooth
execution and management of augmented reality experiments. It provides tools for
designing and conducting experiments, collecting data, and managing experimental
processes, helping to improve the efficiency and accuracy of experiments while also
promoting the work of researchers.
To implement this system, the following software and hardware configurations are
required: the Unity development platform, HoloLens headset, and Augmented Reality
Toolkit. Unity is a powerful game engine and development platform that enables the
creation of interactive and immersive experiences, serving as the foundation for developing
the augmented reality visualization system. The HoloLens is a wearable mixed reality
device developed by Microsoft that combines virtual reality and augmented reality capabili-
ties, allowing users to interact with virtual objects in the real world. The Augmented Reality
Toolkit is a software library that provides tools and resources for developing augmented
reality applications, including features for 3D object recognition, tracking, and interaction,
which are essential for our system’s functionality.
After the scene data and system settings are completed, the system enhances multi-
modal perception and interaction by incorporating various modes of interaction. Users
wear HoloLens glasses to access the AR scene, and the system recognizes device informa-
tion through Vuforia scanning. Users can navigate and interact using voice commands,
and hand gestures are detected for precise manipulation. Physical props can also be used
to interact with virtual objects. Eye-tracking data are recorded for analysis, and an experi-
mental process management module streamlines the evaluation and improvement of the
system. This comprehensive approach improves the user experience and usability.

3.2. Member Management Module


The member management module is the foundation of the business process control
mechanism. This module can provide clear guidance and assistance to the participating
members, leading them to fully engage in the experimental environment of this augmented
reality visualization system. The member management module mainly includes four key
parts: system tutorial, system experiment, data recording, and data processing and analysis,
as shown in Figure 2.
Figure 2. Member management module. The illustration integrates four components: system tutorial,
system experiment, data recording, and data processing and analysis.

• System tutorial: Before conducting the visualization system experiment, participating members first undergo system tutorial learning. Through the system tutorial, partici-
pants can gain a detailed understanding of the operational steps and processes in the
augmented reality experimental environment using the HoloLens device. The system
tutorial aims to provide necessary guidance, enabling participants to familiarize them-
selves with the system’s functionality and interaction methods, and ensuring their
correct usage of the system for subsequent experiments.
• System experiment: After completing the system tutorial, participants enter the
formal system experiment phase. Participating members interact with the augmented
reality system scenario through actions such as clicking, gazing, and voice commands.
The experiment design allows participants to freely explore the system’s features and
characteristics, collecting data during the experiment.
• Data recording: During the experiment, the system can record real-time experimental
data of participating members. This includes recording system interaction videos, eye
gaze coordinates, gaze duration, and eye gaze trajectory heatmaps. Accurate recording
of participants’ behavior and attention focus provides a necessary foundation for
subsequent data analysis.
• Data processing and analysis: After the experiment, the experimental data for each
participant are processed and analyzed. This includes experiment replays, analysis of
participants’ gaze data, and plotting scatter diagrams representing participants’ eye
gaze ranges. Through statistical analysis and visualization techniques, the behavior
patterns and attention distributions of participants during the experiment can be
revealed, supporting further analysis and conclusions.
The member management module of this system ensures the controllability and re-
peatability of the augmented reality visualization system experiment, ensuring the accuracy
and reliability of the experimental results. Additionally, valuable empirical data and refer-
ences are provided for future research work and improvements in system performance.
3.3. Augmented Reality Interface Module


The AR interface module acts as a conduit between the system and the HoloLens
device, augmenting multi-modal perception and interactivity for a more immersive user
experience. It brings virtual reality scenes into users’ actual visual field by overlaying
AR content onto the HoloLens headset display. The module offers researchers a reliable,
user-friendly HoloLens research platform, enabling concentration on experimental design
and AR experience refinement without contending with convoluted technical intricacies.
To achieve this, the module capitalizes on Unity’s integration with HoloLens to render and
showcase captivating AR scenes. As shown in Figure 3, the AR interface module serves
well as a bridge between the system and the device.

Figure 3. Composition of the Augmented Reality Interface Module. Unity and MRTK provide
technical support for the AR interface to achieve scene construction and multi-modal interaction,
connecting systems and devices through the AR interface, offering a reliable and convenient tool for
AR research and development.

Unity, a prevalent cross-platform game engine, constitutes the primary development framework. Its abundant tools and engine support facilitate the crafting of 3D
scenes and user interfaces. Unity is leveraged to create and render AR scenes encompassing
virtual objects, 3D models, and user interface elements. Its robust graphics engine assimi-
lates virtual content into the HoloLens headset, while the device’s innate spatial mapping
and gesture recognition integrate virtual objects seamlessly into real environments for
remarkably authentic AR experiences.
The module’s development harnessed the Mixed Reality Toolkit (MRTK), an open-
source toolkit furnishing fundamental components and features to streamline cross-platform
AR application development. The module buttresses diverse interaction modalities, includ-
ing gesture control, air tap, voice commands, and eye-tracking. Catering to varied research
requirements, it offers flexible customization capabilities to introduce novel virtual objects
and adjust scene layouts while facilitating the storage and visualization of user behavior
data for analysis.
In order to enrich the AR function of the system and device, we also added device
locomotion features to the AR Interface Module. The device locomotion features in AR
systems can track real objects and overlay virtual content on them to enhance interactivity
and tangibility through recognition and positioning methods. In this system, we primarily
utilize Vuforia’s scanning capability to implement device locomotion. Vuforia is a cross-
platform AR application development platform with robust tracking and performance on
various hardware, including mobile devices and mixed-reality head-mounted displays
(HMDs), such as Microsoft HoloLens [39].
In this system, the objects to be recognized are imported into the Vuforia recognition
library to generate a corresponding Unity package with star ratings reflecting recognition
quality. The Unity package is then imported into Unity, where Vuforia is set up and the AR camera is configured to detect the objects to be recognized. When a real-world
object is recognized, virtual content is bound to it, and users can interact through touch,
rotation, tilt, or other gestures. Specifically, Vuforia’s scanning function first performs
image recognition. The user uses the camera on the mobile device to scan and recognize
specific images, logos, objects, or scenes from the real world. These images are usually
specific patterns or markers used to determine the user’s position and orientation. Next,
Vuforia extracts visual feature points such as corners and edges from the recognized images.
These feature points are used to build a feature database for matching virtual content to
physical objects. By matching the real-time image against feature points in the feature
database, Vuforia tracks the position and orientation of the user’s device in real-time. This
ensures the alignment of virtual content with the physical world. Finally, once the user’s
position and orientation are determined, Vuforia overlays the virtual content in the user’s
view aligned with the physical object, using rendering techniques to ensure consistency of
lighting, perspective, and scale between the virtual and real world.
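To make the recognize, extract, match, and track pipeline described above concrete, the following Python sketch approximates the same steps with OpenCV feature matching. It is only an illustration: Vuforia's internal implementation is proprietary, and the target image file name and camera index used here are assumptions.

# Illustrative sketch only: Vuforia's pipeline is proprietary, but the same
# recognize -> extract features -> match -> estimate pose steps can be
# approximated with OpenCV. "target.png" and the camera index are assumptions.
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)

# 1. Build a feature "database" for the registered target image.
target = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)
target_kp, target_desc = orb.detectAndCompute(target, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def locate_target(frame_gray):
    """Return the homography mapping the target into the camera frame, or None."""
    frame_kp, frame_desc = orb.detectAndCompute(frame_gray, None)
    if frame_desc is None:
        return None
    matches = sorted(matcher.match(target_desc, frame_desc), key=lambda m: m.distance)
    if len(matches) < 15:  # not enough evidence that the target is visible
        return None
    src = np.float32([target_kp[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([frame_kp[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return homography  # basis for placing virtual content over the object

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    H = locate_target(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    print("target visible:", H is not None)
cap.release()

In the deployed system, this role is played by the pose that Vuforia reports to Unity, which is what anchors the virtual content on the recognized physical object.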

3.4. User Behavior Interaction Module


User behavior involves multi-modal perception and interaction, including clicking,
voice, gesture recognition, physical manipulation, etc. It needs to be integrated with the
AR module to enable users to observe virtual information and interact with it on mobile
devices. This provides users with a more intuitive and efficient interactive experience. The
module consists of three main parts: multi-modal perception, interactive objects, and a
feedback mechanism, as shown in Figure 4.

Figure 4. Composition of user behavior interaction module. The illustration integrates three compo-
nents: multi-modal perception, interactive objects, and a feedback mechanism.
In multi-modal perception, the module recognizes user input methods such as voice,
gestures, and touch to capture real-time behavior and needs. This allows users to select 3D
content by clicking or using a ray emitted from their hand. Interactive objects in the 3D
world can trigger events, such as touching buttons and 3D objects, allowing users to directly
interact with the system through wearable devices. The feedback mechanism provides
users with timely feedback on their operations. This feedback can be visual, such as highlighting and finger-cursor feedback, or auditory, with sound effects for different selection states (including observation, hovering, touch start, and touch end).
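A minimal sketch of this state-to-feedback mapping is shown below. The production module is implemented on Unity/MRTK in C#, so the Python class and cue names used here (FeedbackRouter, highlight_outline, tap_down.wav) are illustrative assumptions rather than the system's actual API.

# Minimal sketch of the feedback mapping described above; names are assumptions.
from enum import Enum, auto

class SelectionState(Enum):
    OBSERVATION = auto()
    HOVERING = auto()
    TOUCH_START = auto()
    TOUCH_END = auto()

class FeedbackRouter:
    """Maps a user's selection state on an interactive object to feedback cues."""

    VISUAL = {
        SelectionState.HOVERING: "highlight_outline",
        SelectionState.TOUCH_START: "finger_cursor",
    }
    AUDIO = {
        SelectionState.TOUCH_START: "tap_down.wav",
        SelectionState.TOUCH_END: "tap_up.wav",
    }

    def on_state_change(self, obj_id, state):
        # Collect the visual and auditory cues that apply to this state change.
        cues = []
        if state in self.VISUAL:
            cues.append(f"visual:{self.VISUAL[state]}:{obj_id}")
        if state in self.AUDIO:
            cues.append(f"audio:{self.AUDIO[state]}")
        return cues

router = FeedbackRouter()
print(router.on_state_change("device_status_panel", SelectionState.TOUCH_START))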
By combining these modules, the system offers users a highly intelligent and person-
alized interactive experience. By integrating various input methods, users can interact
with virtual information more intuitively and efficiently. Additionally, the inclusion of
interactive objects and a feedback mechanism ensures that users receive timely and infor-
mative feedback on their actions, further enhancing their understanding and control of
the system. Overall, the multi-modal perception and interaction module greatly enhances
the interactive experience and enables users to effectively collaborate and innovate in
real-world scenarios.

3.5. Eye-Tracking Data Acquisition Module


The eye-tracking data acquisition module capitalizes on the integrated eye-tracking
system of the HoloLens augmented reality headset to gather gaze data required by re-
searchers through customized eye-tracking scripts. The system encompasses dedicated
eye-tracking cameras and sensors that enable high-fidelity, low-latency tracking, along
with automated pupil finding and head movement compensation. This module consists of
a data collection sub-module and a data processing sub-module.

3.5.1. Data Collection Methods


The data collection sub-module activates when users interact with the augmented
reality environment, producing real-time heat maps based on user gaze patterns. It overlays
these patterns on augmented reality objects and UI elements to reflect user interactions. It
continuously seizes the 3D spatial coordinates of users’ gaze points in the augmented reality
scene to dissect visual exploration behavior. The module pinpoints specific elements and
areas attended to by users during interactions, gauging the time users spend looking at them
to evaluate the appeal and cognitive load, with more prolonged gaze duration typically
indicating greater interest or cognitive load. Furthermore, the module investigates gaze
point sequences to gain insights into users’ information processing tactics and attention
distribution in the augmented reality environment.
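As a simple illustration of how such heat maps can be produced from logged gaze samples, the sketch below bins 2D gaze positions into a histogram; the panel size, bin counts, and sample data are assumptions, not values taken from the system.

# Sketch (not the HoloLens implementation) of turning logged gaze points into a
# heat map over an interface panel; gaze_points and the panel size are assumed.
import numpy as np
import matplotlib.pyplot as plt

# Each record: (x, y) gaze position projected onto the panel, in metres.
gaze_points = np.array([[0.12, 0.30], [0.14, 0.31], [0.40, 0.22], [0.13, 0.29]])

panel_w, panel_h = 0.6, 0.4  # assumed panel size in metres
heat, xedges, yedges = np.histogram2d(
    gaze_points[:, 0], gaze_points[:, 1],
    bins=(60, 40), range=[[0, panel_w], [0, panel_h]],
)

plt.imshow(heat.T, origin="lower", extent=[0, panel_w, 0, panel_h], cmap="hot")
plt.colorbar(label="gaze samples per cell")
plt.title("Gaze heat map overlaid on a UI panel")
plt.savefig("gaze_heatmap.png")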

3.5.2. Data Processing Methods


The data processing sub-module stores the aggregated gaze data locally on the
HoloLens device, containing spatial location and timing information. Location information
logs the x, y, and z coordinates of users’ gaze points during all interactions with augmented
reality objects and UI elements, along with corresponding timestamps. This enables the
generation of scatter plots, scan paths, and areas of interest based on aggregated gaze points
over time. Timing information embodies the duration users spent looking at various AR
interface components and augmented reality objects throughout the interaction. Contrast-
ing total gaze times on different interface elements can identify areas needing optimization
to refine the user experience. Gaze duration furnishes quantitative temporal insights into
visual information processing during AR interactions. The preserved eye-tracking data fa-
cilitates subsequent statistical analyses to uncover users’ visual behavior patterns, interface
design issues, and more. This HoloLens-based eye-tracking approach offers researchers a
handy and accurate platform for gathering virtual environment interaction data.
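The post-processing described here can be illustrated with a short sketch that aggregates dwell time per interface element and extracts the gaze points belonging to one element; the record layout and sampling interval below are assumptions for illustration only.

# Sketch of the kind of post-processing described above, assuming gaze samples
# were exported as rows of (timestamp_s, x, y, z, target_element).
from collections import defaultdict

samples = [
    (0.00, 0.10, 1.40, 2.0, "device_status_panel"),
    (0.02, 0.11, 1.41, 2.0, "device_status_panel"),
    (0.04, 0.52, 1.10, 1.8, "scene_switch_button"),
    (0.06, 0.53, 1.11, 1.8, "scene_switch_button"),
]

SAMPLE_INTERVAL = 0.02  # seconds between gaze samples (assumed logging rate)

def dwell_times(rows):
    """Total gaze duration per interface element, in seconds."""
    totals = defaultdict(float)
    for _, _, _, _, target in rows:
        totals[target] += SAMPLE_INTERVAL
    return dict(totals)

def scatter_points(rows, target):
    """x/y/z coordinates of all gaze points that landed on one element."""
    return [(x, y, z) for _, x, y, z, t in rows if t == target]

print(dwell_times(samples))
print(scatter_points(samples, "device_status_panel"))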
3.6. AR Experiment Process Management Module


The augmented reality experiment process management module is designed for ad-
ministrators, including the following functions, as shown in Figure 5. Administrators can
create experiment projects by inputting basic information such as the name, description,
and objectives of the experiment. This allows for better organization and tracking of dif-
ferent experiments, providing a clear understanding of each project’s purpose and goals.
In addition, administrators have full control over the design of the experiment process.
They can add, edit, and delete experiment steps, allowing for customization and tailoring
of the experiment process to meet specific requirements. Detailed information, such as
step names, descriptions, keywords, images, and videos, can be provided for each step,
ensuring clarity and accuracy in the experiment design.
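As a hypothetical sketch of how such experiment projects and steps might be represented, the following data structures mirror the fields listed above (name, description, objectives, and per-step keywords, images, and videos); the paper does not specify the module's actual storage format.

# Hypothetical representation of an experiment project and its steps; the
# module's real data model is not described in the paper.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExperimentStep:
    name: str
    description: str
    keywords: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)   # file paths to step images
    videos: List[str] = field(default_factory=list)   # file paths to step videos

@dataclass
class ExperimentProject:
    name: str
    description: str
    objectives: str
    steps: List[ExperimentStep] = field(default_factory=list)

    def add_step(self, step: ExperimentStep, position: Optional[int] = None) -> None:
        # Insert at the requested position, or append to the end of the process.
        self.steps.insert(len(self.steps) if position is None else position, step)

    def delete_step(self, name: str) -> None:
        self.steps = [s for s in self.steps if s.name != name]

project = ExperimentProject("Smart home study", "AR scene-setting tasks", "Assess usability")
project.add_step(ExperimentStep("Privacy scene", "Minimize the risk of privacy exposure"))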

Figure 5. Experimental process management flowchart. This comprehensive flowchart outlines the process of designing, executing, and monitoring augmented reality experiments. It enables customization, real-time monitoring, overall process management, and data collection for in-depth analysis, ensuring efficient and insightful experimental administration.

Once the experiment is designed, users can execute it using the module’s interface.
The interface provides real-time information on the progress and results of the experiment,
allowing users to stay on track and monitor the experiment’s execution. This ensures that
the experiment is carried out smoothly and effectively. Administrators can also manage the
overall experiment process. They can create, edit, and delete experiment processes, setting
the sequence and steps for the experiment. This ensures a logical and efficient flow of the
experiment, improving organization and management.
The module automatically collects data during the experiment process. This includes
user operations, eye-tracking data, and user feedback. The collected data can be used
for further analysis and evaluation of the experiment, providing valuable insights for
administrators and researchers. The module offers a range of functions that empower
administrators to create, design, execute, and analyze augmented reality experiments.
Its goal is to enhance the efficiency and quality of experiments, benefiting both users
and researchers.

4. Results
4.1. AR Interactive Experiment
4.1.1. Interactive Experiment Design
To test the system’s performance, we conducted a user study experiment to examine
the usability of the different modules. A total of 25 people were recruited through social
networks and student organizations on the university campus. We integrated the system
into a smart home scenario, as shown in Figure 6, and designed a series of experiments
for participants to test the system’s functionality and user experience. First, the real-time
status of the smart device was visualized and displayed above the device, and second,
the user could set up the smart device using the user interface. In order to enhance the
user’s perception of the environment, the system also visualizes the sensor sensing range
in relation to data communication. In addition, to simplify the interaction, the system
is designed with a scene-switching function, which realizes the rapid transformation of
device configuration in different scenes. During the experiment, participants’ interactive
actions were recorded, and time markers were used. After the experiment, participants
were asked to rate the performance of the system. All subjects gave their informed consent
for inclusion before they participated in the study. The protocol was approved by the Ethics
Committee of the affiliated university (2023ZDSYLL354-P01).

Figure 6. AR-based smart home scene-setting environment.

All participants were first required to complete basic AR operation tutorials to help
them learn and become familiar with the AR system and its functions. Firstly, users
were asked to observe the visualization state of the device and perform the interaction
test of the user interface; secondly, users were asked to observe the sensing perception
range and remove the physical objects that we had placed in the range in advance; and
lastly, users were asked to switch and observe the communication relationship of the
device as well as the status information during different scenes. This was followed by a
formal experimental session in which participants were required to configure their smart
devices according to different scenario descriptions and prompts in conjunction with their
personal needs, as shown in Table 1. They were also asked to explain the reasons for
their settings to the experimenter after completing the tasks. In the formal experiment,
participants made corresponding smart device configuration decisions by observing the
smart device status, sensing range, and communication relationships in the scenarios and
were prompted by the experiment descriptions. The participants were tested separately.
Firstly, the experimenter introduced the research background, and the participants read and
signed the informed consent form. Subsequently, the participants in the AR group wore AR
glasses and underwent glasses calibration. After familiarizing themselves with the basic
operations, they completed the above scenario, setting tasks in sequence. Instructions for
each task were given throughout the experiment, and participants were instructed by the
experimenter to proceed to the next task after completing the previous one.

Table 1. Task scene and description.

Scene            Description
Privacy Scene    Participants were asked to imagine setting up corresponding settings in privacy scenarios to minimize the risk of privacy exposure.
Leaving Scene    Participants were asked to imagine setting up energy-saving, home-cleaning, and house safety functions when leaving home for work.
Parlor Scene     Participants were asked to imagine having friends as guests at home and to provide a light and comfortable environment. They were also asked to make corresponding settings while confidently chatting.
Sleeping Scene   Participants were asked to imagine preparing to sleep at night and needing a quiet environment. They were also asked to make corresponding settings to avoid exposing their privacy.

4.1.2. Experiment Results


Multi-modal interaction data were collected from users in interactive experiments,
including click counts, consumed time, and other operational data. The counts of clicks
and the time consumed by users in different task sessions are shown in Figure 7. Since
these two factors are non-normally distributed, we conducted a Wilcoxon analysis and the
results of the correlation analysis are shown in Table 2. The results showed that in different
scenario setting tasks, the counts of clicks were positively correlated with the consumed
time (p < 0.0001, r = 0.73).


Figure 7. Results of various tasks. (a) Distribution of consumed time for different tasks. (b) Distribu-
tion of click counts for different tasks.
Table 2. Statistical data of consumed time and click counts for different tasks. We conducted Shapiro tests on users' consumed time and click counts in the different scenarios and found that they do not follow a normal distribution, so we used a nonparametric two-sample Wilcoxon rank test to check whether the results were significant and calculated the correlation coefficient between the two.

                  Click Counts          Time Consumed (s)
                  M        SD           M         SD           p            r
Privacy Scene     34.08    17.04        89.2      53.69        p < 0.0001   0.78
Leaving Scene     13.72    7.84         56.24     37.66        p < 0.0001   0.83
Parlor Scene      12.12    8.27         58.40     39.46        p < 0.0001   0.71
Sleeping Scene    13.28    7.93         54.64     39.61        p < 0.0001   0.56
Total             73.2     41.08        258.48    170.42       p < 0.0001   0.73
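The statistical procedure summarized in the table caption can be reproduced in outline with SciPy, as sketched below. The data are placeholders, and the exact test variants used in the study (e.g., a paired Wilcoxon signed-rank test and Spearman's rank correlation) are assumptions.

# Sketch of the analysis described in the caption; placeholder data, not the
# study's measurements, and the specific test variants are assumptions.
import numpy as np
from scipy import stats

click_counts = np.array([34, 28, 51, 19, 40, 22, 37, 45, 30, 26])
time_consumed = np.array([89, 70, 130, 55, 101, 60, 95, 120, 80, 72])

# 1. Normality check for each variable.
print("Shapiro clicks:", stats.shapiro(click_counts).pvalue)
print("Shapiro time:  ", stats.shapiro(time_consumed).pvalue)

# 2. Nonparametric paired comparison between the two measures.
print("Wilcoxon:", stats.wilcoxon(click_counts, time_consumed))

# 3. Rank correlation between click counts and consumed time.
rho, p = stats.spearmanr(click_counts, time_consumed)
print(f"Spearman rho = {rho:.2f}, p = {p:.4g}")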

The users also demonstrated their exploration of and adaptability to the experimental environment. During the experiment, users could click to switch scenes, select devices, and perform setting operations to complete the experimental tasks. Users performed multiple click operations, and the specific distribution is shown in Figure 8.
[Figure 8 is a pie chart showing the shares of setting clicks, device clicks, and mode clicks across the Privacy, Leaving, Parlor, and Sleeping scenes.]

Figure 8. Distribution of click types among users in different tasks. This figure shows the user’s
operations on different tasks during the experiment.

The details of time and clicks consumed by different users to complete the tasks are
shown in Figure 9. In summary, by analyzing data on user click operations and consump-
tion time, the functional performance of the system can be evaluated, and improvement
suggestions can be proposed based on the evaluation results to optimize system perfor-
mance and user experience.

Figure 9. Time consumed and click counts by different users. By observing this figure, we can
understand the level of user participation in these tasks during the experiment.

In addition, during the testing of the sensing perception range, it was found that
there were errors in the user’s perception of the boundary of the virtual 3D range. In
the corresponding test session, users were asked to move the physical objects placed in
the sensing range along different distances and angles under the premise of moving the
smallest possible distance, and we recorded the position of the final object, measured
it, and obtained the distribution of the error perception, as shown in Figure 10. Since
the boundary perception errors are non-normally distributed, we performed a Wilcoxon
analysis to examine the user’s perceptions of straight surface perception errors (angle error)
and curved surface perception errors (distance error). The data show that the straight
surface error is significantly different from the curved surface error (p < 0.0001, r = 0.09).


Figure 10. Errors in the user’s perception of the boundaries of the virtual 3D range. (a) Distribution
frequency of user distance perception error. (b) Distribution frequency of user angle perception error.

User evaluations of the ease of learning and interactivity of the system were collected
in different experimental sessions and the results are shown in Figure 11. Since the user
evaluation scores were non-normally distributed, we conducted a Wilcoxon analysis to test
this. Participants rated both the ease of learning and interactivity of the system highly, with
no significant difference between the two (p < 0.01, r = 0.38), indicating that their effects
are largely independent of each other. The experimental results suggest that there may be
limitations in how well people can perceive boundaries within virtual objects, but they also
demonstrate that the system is highly usable and engaging.
[Figure 11 is a stacked bar chart of evaluation scores (Medium, Agree, Strongly Agree) for learnability and interactivity across the Visualization, Control, Connectivity, and Scene Switch functions.]

Figure 11. User evaluation of the ease of learning and interactivity of different functions.

4.2. Eye-Tracking Experiment


4.2.1. Eye-Tracking Experiment Design
The visual module is one of the most intuitive and important interaction modalities for
humans. The visual modality utilizes human visual perception and attention to guide users’
focus and their interactions. By leveraging visual cues, such as highlighting, animation, and
visual hierarchy, important information can be emphasized and capture users’ attention
within the interface.
To investigate relevant information about user attention and cognitive processes,
we designed a comprehensive eye-tracking data collection method in the context of this
augmented reality visualization system. During the experimental process, the system
records real-time gaze focus data on the graphical interface for each participant based on
their interactions with the system. This aids in identifying specific elements and areas
that users focus on during the interaction. Additionally, we designed the recording of the
duration participants gazed at each graphical interface to assess their levels of attention.
Longer gaze duration typically indicates greater interest or cognitive load.
In conducting eye-tracking experiments for augmented reality visualization systems,
we invited six participants to ensure breadth and reliability of the experiment. Initially,
participants underwent eye-tracking calibration on the AR device to accurately track and
record their gaze points during the experiment, accounting for variations in individuals’
eye characteristics. Following eye-tracking calibration, participants engaged in the formal
eye-tracking experiment integrated within the smart home scenario. In this experiment,
users first activated the real-time status visualization interface of the actual devices using
the AR device, which, in turn, triggered the eye-tracking module of the system. Building
on the aforementioned experimental steps, participants were free to explore the scene
based on their interests and carry out exploratory trials. Additionally, when users activated
the real-time status visualization interfaces of multiple devices through eye-tracking, we
prompted them to activate the system’s status management interface by fixating on either
the left or right palm, allowing them to view the real-time statuses of all devices in the
system. Throughout the eye-tracking interactions and experimental process described
above, the system recorded each user’s eye-tracking behavior, focus, and gaze data using
the eye-tracking data recording method we designed. Furthermore, to visually demonstrate
participants’ usage and manipulation of the visualization control panel within the system,
a heatmap of their eye gaze, generated based on their gaze duration and frequency on each
visualization interface, was overlaid on the interface, as shown in Figure 12.
In summary, in this experiment, participants were allowed to engage in exploratory attempts within the scenario, in addition to the tasks they were prompted to complete.
This open design provided participants with greater autonomy, encouraging more natural
interaction with the system in the augmented reality scenario. This design approach
facilitated the collection of more authentic eye-tracking behavior data, allowing us to better
understand user needs and improve the interaction design of the augmented reality system.
We evaluated the usability of the interaction interfaces in the AR system and captured users’
most authentic eye-tracking data in the augmented reality scenario. By analyzing users’
fixation points, fixation duration, and gaze areas, we assessed the interaction effectiveness
and efficiency between users and the interface. This evaluation helped us identify potential
issues in the interface design and provide improvement suggestions to enhance user
interaction experience and task completion efficiency.

Figure 12. Eye-tracking heatmap of participants in augmented reality system. It provides a visual
representation of the specific elements and areas that users focus on during eye-tracking experiments
in the interaction process.

4.2.2. Experiment Results


Through the comprehensive and detailed eye-tracking data collection method men-
tioned above, we collected experimental data for each participant. The collected eye-
tracking data serve as the basis for subsequent analysis. By using relevant data processing
tools and techniques, researchers can gain insights into users’ attention distribution, infor-
mation processing patterns, and cognitive load. These insights are crucial for improving
system interface design, optimizing user experiences, and conducting user research.
We processed the gaze point data obtained from participants. Since augmented reality visualization systems overlay virtual content onto the real world, the gaze coordinates recorded while each user focuses on a graphical interface in the system may vary with the user's position. Therefore, we normalized each user's gaze point coordinates with respect to the recorded central coordinate point of the scene. The
processed data are then plotted on the same scatter plot to visually depict the eye’s visual
behavior and points of attention for each user. As the system enhances the real-world
three-dimensional environment, users’ graphical interface gaze data may present different
results in a two-dimensional plane compared to the three-dimensional environment. Thus,
we normalized participants’ eye attention data and plotted both the scatter plot in the
two-dimensional plane and the scatter plot in the three-dimensional environment. We
selected gaze data from each participant on two visual interfaces during the experiment, as
shown in Figure 13.
Figure 13. These scatter plots depict participants’ eye-tracking gaze data on the visual interface,
generated during the eye-tracking experiment. They showcase the eye movement data results from
two different graphical interfaces within the system. The data have been normalized and standardized,
aligning it to a consistent coordinate system. On the left side, the scatter plot displays the eye-tracking
data results of the system’s graphical interface in a two-dimensional plane. On the right side, the
results present users’ three-dimensional eye-tracking data on the graphical interface in an augmented
reality setting. This chart reflects each user’s focal points during the eye-tracking experiment.
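A minimal sketch of the normalization step described above is given below: world-space gaze points are re-expressed relative to the recorded scene center so that data from users standing at different positions can be plotted in a common frame. The coordinate values and the simple drop-one-axis 2D projection are assumptions for illustration.

# Sketch of the normalisation described above; coordinates are placeholders.
import numpy as np

scene_center = np.array([1.20, 1.50, 2.80])  # recorded centre of the AR scene

gaze_world = np.array([                       # placeholder gaze points (x, y, z)
    [1.32, 1.62, 2.75],
    [1.05, 1.48, 2.90],
    [1.25, 1.70, 2.60],
])

# Re-express gaze points in a common, user-independent frame.
gaze_normalized = gaze_world - scene_center

# 2D view for the planar scatter plot: keep the horizontal and vertical offsets.
gaze_2d = gaze_normalized[:, [0, 1]]

print(gaze_normalized)
print(gaze_2d)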

Different participants exhibited varying levels of interest in different scenes or interfaces during the experiment. To evaluate users' attention intensity towards different
elements, we recorded the duration of each participant’s gaze on each graphical interface.
Additionally, since participants may have multiple discontinuous fixations on the same
interface, we calculated the total gaze duration for each participant by summing the gaze
times on that particular interface. Longer gaze durations typically indicate higher interest
or cognitive load. The results are shown in Figure 14.
Through the eye-tracking experiments conducted in this system, we have collected
a substantial amount of user gaze data. Analyzing and discussing these data allows
us to gain a deeper understanding of the participants’ attention distribution and usage
preferences while using the system, providing valuable insights for optimizing the system’s
user experience and interface design.
Figure 14. Total gaze duration of different participants on five different graphical interfaces in the
eye-tracking experiment. By calculating the total gaze duration of each participant on each interface,
we can extract the attention intensity of each user towards different elements. Longer gaze durations
typically indicate higher interest or cognitive load.

5. Discussion
5.1. Analysis of System Advantages and Usability Experiment Results
Previous research has highlighted that interactive scene visualization in immersive
virtual environments can offer decision support [40]. Using AR visualization systems for
complex decisions provides a more intuitive, real-time, multi-modal, and collaborative
decision-making environment, improving both the quality and the efficiency of decision
making. In the context of smart homes, visualizing the privacy-invasive devices around a
user can help the user recognize their presence and make the necessary adjustments [41].
Moreover, some researchers have implemented data-type visualization
for common privacy-invasive devices, such as cameras and smart assistants, to aid user
decision making [42]. The system proposed in this paper can be applied to a variety of
scenarios, simplifying complex situations and helping users make decisions in an immersive
manner. By superimposing virtual objects onto the real world, it offers users a more
immersive and engaging experience, allowing them to perceive information more intuitively
and naturally and thereby deepening their understanding and retention of the content. In
addition, augmented reality visualization systems can combine multiple sensory technologies,
such as vision, hearing, and touch, to present information from multiple perspectives. This
enhances users' perception of the environment, helps them better understand complex
situations, and enables decision-makers to analyze data, weigh factors and variables more
comprehensively, and make more informed and accurate decisions.
The results of the user experiments provide valuable insights into the effectiveness
of the system. The finding that users tended to click on the menu for device selection
indicates that the system successfully provided users with the flexibility to choose and
switch between different devices for interaction. This demonstrates the system’s capability
to support multi-modal perception and interaction, allowing users to utilize different input
devices based on their preferences or specific task requirements. Furthermore, during
the experiment, the number of user interactions and the time consumed in the subsequent
scene-setting tasks were significantly lower than in the first scene, suggesting that users
quickly became familiar with the system after an initial learning phase. This indicates that
the system has a learning curve, and with practice users become more adept at navigating and
interacting with the augmented reality environment. Users' ratings of the ease of learning
to operate the system's different functions also increased, which suggests that the system
is user-friendly and easy to learn: users found it relatively easy to grasp the system's
functionalities and felt comfortable interacting with the virtual objects. The higher
ratings of interactivity compared to ease of learning suggest that users perceived the
system as highly interactive and engaging, even though it may have required some initial
effort to learn. Experimental participants indicated that the system provided a
user-friendly and engaging experience and that the smart-device-based visualization system
offered an important reference for decision making in their scenarios. In summary, the
experimental data and analysis provided valuable insights into the effectiveness and
usability of the system.

5.2. System Modules’ Usability Analysis


The user management module of this augmented reality visualization system com-
prises four parts: system tutorials, system experiments, data recording, and data analysis.
At the current stage, it demonstrates good system robustness. Considering participants’
backgrounds and skill levels, the system tutorials aim to help experiment participants
quickly grasp the operational procedures within the augmented reality interaction system
environment. They provide clear guidance and instructions, using concise and understand-
able language and illustrations, ensuring accurate execution of experimental tasks and
reducing misunderstandings about the experimental process.
Additionally, the system’s experimental design aligns with participants’ actual needs
and interaction styles. Employing multi-modal perceptual interaction experiments greatly
reduces the difficulty of participants’ experiments and improves their user experience.
The comprehensive experimental design allows precise recording of each participant’s
focus on specific elements and areas during the interaction experiment. By analyzing the
recorded user attention and gaze points, we can further refine the design of the system’s
visual graphical interface, thereby elevating users’ experience and immersion. Moreover,
the comprehensive data recording module captures participants’ real-time interaction
data using various formats such as images, text, and videos. The analysis of obtained
eye-tracking data enables a prompt understanding of participants’ actual usage patterns,
providing valuable empirical data and user recommendations for future research and
system performance improvements.
Furthermore, the system integrates a comprehensive data processing and analysis
module for a thorough examination of participants’ experimental data. Continuous en-
hancements to the user management module and the overall interaction experience are
derived from insights gained through participants’ data analysis results. These improve-
ments aim to meet the varied usage and exploration needs of participants with different
backgrounds. The visual representation of each participant’s attention distribution, in-
formation processing patterns, and cognitive load, based on eye-tracking data analysis
results, guides further optimization of the system’s interaction modes and visual graphical
interface distribution. Overall, this data-driven approach furnishes accurate evidence for
system optimization and future research.
In summary, while the user management module of the system performs well at this stage,
we continue to pursue optimization and enhancement of the augmented reality visualization
system. Designing personalized user management is our next research direction: accounting
for the specific needs and interaction preferences of different participants, for example
by providing system tutorials and experiment difficulty levels tailored to participants'
skill levels and proficiency to meet their learning and exploration needs.

5.3. Multi-Modal User Data Acquisition Method


The proposed AR visualization system aims to enhance multi-modal perception and
interaction by incorporating diverse sensory modalities. In this study, we conducted
two pivotal experiments, namely the AR interactive experiment and the eye-tracking
experiment, to comprehensively assess the functionality of the overall system. Through these
two experiments, a diverse set of multi-modal data, including gestures, air taps, perception,
and eye-tracking data, was collected to understand the cognitive processes and experiences
of users during the interaction, facilitating a better overall evaluation of the system. In
the AR interactive experiment, the obtained experimental data reflected the system’s good
learnability and interactivity. It laid the foundation for improvements in boundary percep-
tion and user adaptability. Traditional multi-modal interaction research has predominantly
concentrated on visual, auditory, and tactile aspects [14,32], often overlooking eye-tracking
technology. In response, our system introduces eye-tracking technology to capture the
user’s gaze, facilitating a more natural and intuitive interaction. A multi-modal system that
incorporates sophisticated eye-tracking enables users to engage through gaze positioning,
gesture control, and tactile feedback, thereby enhancing user participation and immersive
experiences. The eye-tracking experiment, built on the system's eye-tracking data acquisition
module, collects data such as gaze trajectories and heatmaps on the AR panel, giving analysts
valuable insight into users' visual behavior patterns and a basis for evaluating and refining
the system's UI design. Through multi-user eye-tracking experiments, we also discovered the
system's sensitivity to eye-movement calibration, which provides significant guidance for
subsequent system improvements.
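As an illustration of how such trajectory and heatmap data could be rendered for analysts, the following sketch bins two-dimensional gaze points on the AR panel into a smoothed heatmap. The panel dimensions, bin count, smoothing parameter, and simulated gaze clusters are assumptions chosen only for illustration, not the system's actual settings.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter

def gaze_heatmap(gaze_xy, panel_width=1.0, panel_height=0.6, bins=80, sigma=2.0):
    """Bin normalized gaze points on the AR panel and smooth them into a heatmap."""
    heat, _, _ = np.histogram2d(
        gaze_xy[:, 0], gaze_xy[:, 1],
        bins=bins,
        range=[[0, panel_width], [0, panel_height]],
    )
    return gaussian_filter(heat.T, sigma=sigma)  # transpose so rows map to the y axis

# Hypothetical gaze samples clustered around two UI elements on the panel
rng = np.random.default_rng(0)
gaze = np.vstack([
    rng.normal([0.3, 0.4], 0.03, size=(400, 2)),
    rng.normal([0.7, 0.2], 0.05, size=(300, 2)),
])

plt.imshow(gaze_heatmap(gaze), origin="lower", extent=[0, 1.0, 0, 0.6], cmap="hot")
plt.colorbar(label="smoothed fixation density")
plt.title("Gaze heatmap on the AR panel")
plt.show()
```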
Overall, the system showed promise in capturing diverse user data modalities. How-
ever, the testing highlighted opportunities to improve multi-modal data collection accuracy
and reliability through adaptive calibrations, spatial reference standardization, and multi-
modal input fusion. Enhancing the system’s capabilities to seamlessly integrate these
modalities into natural interactions would further augment users’ sense of immersion and
engagement. The user data provide valuable insights to inform the iterative refinement of
the system’s multi-modal interaction design.

5.4. System Limitations and Future Expansion Points


The results of the user experiments indicate several limitations and areas for future
expansion in the augmented reality visualization system. One limitation is the potential
difficulty in accurately perceiving the depth and spatial relationships between virtual
and real-world objects. The lack of depth perception in the augmented reality overlays
can hinder users’ ability to interact effectively with the virtual objects. To address this,
future development could explore the integration of tangible user interfaces combined with
depth sensing technologies such as depth cameras or sensors, to provide users with more
accurate depth perception in the augmented reality environment. This would enable more
precise interaction with virtual objects and enhance the system’s multi-modal perception
capabilities.
Another limiting factor is that eye-tracking in the current system is used only for data
visualization. In terms of future expansion, the system could benefit from the integration
of gaze-based interaction techniques. By combining eye-tracking with the user interface,
users could perform actions such as object selection, navigation, and menu control through
their gaze. This would provide a more natural and intuitive interaction modality, reducing
the reliance on physical manipulation and further enhancing the system’s multi-modal
perception and interaction capabilities. Additionally, future development could focus on
incorporating advanced visualizations and overlays that take advantage of eye-tracking
data. For example, the system could dynamically adjust the size, position, or content of
augmented reality overlays based on users’ gaze patterns and visual attention. This would
enable a more personalized and context-aware augmented reality experience, enhancing
users’ perception and interaction with the virtual objects. Furthermore, the system could
benefit from the integration of machine learning algorithms to analyze and interpret users’
gaze data. By leveraging machine learning, the system could learn and adapt to individual
users’ gaze patterns, preferences, and behavior, further enhancing the personalized and
adaptive nature of the augmented reality experience.
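To make the proposed gaze-based interaction concrete, the following sketch shows one common approach, dwell-time selection, in which an object is selected once the gaze has rested on it for a threshold duration. This is a hypothetical illustration of the technique, not part of the current system; the class name, threshold value, and frame-loop interface are assumptions.

```python
import time

class DwellSelector:
    """Select a gazed-at target once the gaze has dwelt on it long enough."""

    def __init__(self, dwell_threshold_s=0.8):
        self.dwell_threshold_s = dwell_threshold_s
        self.current_target = None
        self.dwell_start = None

    def update(self, gazed_target, now=None):
        """Feed the currently gazed-at target each frame; return it when selected."""
        now = time.monotonic() if now is None else now
        if gazed_target != self.current_target:
            # Gaze moved to a new target (or off any target): restart the dwell timer.
            self.current_target = gazed_target
            self.dwell_start = now
            return None
        if gazed_target is not None and now - self.dwell_start >= self.dwell_threshold_s:
            self.dwell_start = now  # avoid re-triggering on every subsequent frame
            return gazed_target
        return None

# Example frame loop with simulated gaze hits (in practice from the eye tracker / raycast)
selector = DwellSelector(dwell_threshold_s=0.8)
for t, target in [(0.0, "menu"), (0.3, "menu"), (0.9, "menu"), (1.2, None)]:
    selected = selector.update(target, now=t)
    if selected:
        print(f"t={t:.1f}s: selected '{selected}'")
```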

In conclusion, while the augmented reality visualization system has the potential
to enhance multi-modal perception and interaction as well as improve complex decision
making, there are limitations and areas for future expansion. Improving eye-tracking
accuracy and depth perception, incorporating gaze-based interaction techniques,
and leveraging machine learning algorithms are key areas to address. By expanding the
system’s capabilities in these areas, it can provide users with a more immersive, intuitive,
and personalized augmented reality experience and thus provide better support and
assistance for complex decision making.

6. Conclusions
This study proposes an augmented reality visualization system that focuses on the
potential of augmented reality visualization technologies in improving human decision
making through enhanced multi-modal perception and interaction. We conducted a se-
ries of experiments, including a multi-modal interaction experiment and an eye-tracking
experiment, within the context of a smart home system scenario, to evaluate the system’s
performance. The visualization system provides decision-aiding information, and the
multi-modal perception and interaction methods under AR, especially the eye-tracking
technology, create an immersive decision-making environment within the scene,
comprehensively improving users' ability to understand the information needed for decision
making. Our study contributes to the advancement of augmented reality and human–
computer interaction, presenting new possibilities for interactive visualization systems.
The results of the experiments indicate that the integration of eye-tracking enhances the
user experience and provides immersive interaction, allowing for a broader analysis of user
behavior. However, there are limitations to consider, such as boundary perception errors
and the limited application of eye-tracking, which restrict the system’s usability in certain
scenarios. To further advance this field, future research should focus on improving reality
perception, target recognition, and tracking to achieve diverse and natural interactions,
thereby enhancing the quality and efficiency of complex decision making.

Author Contributions: Conceptualization, L.C., Z.Z., H.Z. and X.S.; methodology, L.C., H.Z., Z.Z.
and X.S.; software, L.C., C.S., Y.W., X.Y., W.R. and Z.Z.; validation, L.C. and H.Z.; formal analysis, L.C.,
Z.Z., C.S. and Y.W.; writing: L.C., H.Z., C.S., Y.W., X.Y., W.R. and X.S.; visualization, L.C., C.S., Y.W.,
X.Y. and W.R.; supervision, H.Z. and X.S.; project administration, H.Z. and X.S.; funding acquisition,
H.Z. and X.S. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by National Key R&D Program of China (No. 2021YFB1600500)
and Marine Science and Technology Innovation Program of Jiangsu Province (No. JSZRHYKJ202308).
Institutional Review Board Statement: The study was conducted in accordance with the Declaration
of Helsinki, and the protocol was approved by the Ethics Committee of the affiliated university
(No. 2023ZDSYLL354-P01).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data will be made available on request.
Conflicts of Interest: The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript:

AR Augmented Reality
UI User interfaces
TUIs Tangible user interfaces
VR Virtual Reality
GUIs Graphical user interfaces
HMDs Head-mounted displays
MRTK Mixed Reality Toolkit

References
1. Cui, W. Visual Analytics: A Comprehensive Overview. IEEE Access 2019, 7, 81555–81573. [CrossRef]
2. Chen, C. Information visualization. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 387–403. [CrossRef]
3. Zhan, T.; Yin, K.; Xiong, J.; He, Z.; Wu, S.T. Augmented Reality and Virtual Reality Displays: Perspectives and Challenges.
iScience 2020, 23, 101397. [CrossRef] [PubMed]
4. Satriadi, K.A.; Smiley, J.; Ens, B.; Cordeil, M.; Czauderna, T.; Lee, B.; Yang, Y.; Dwyer, T.; Jenny, B. Tangible Globes for Data
Visualisation in Augmented Reality. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems,
New York, NY, USA, 29 April–5 May 2022; [CrossRef]
5. Sadiku, M.; Shadare, A.; Musa, S.; Akujuobi, C.; Perry, R. Data Visualization. Int. J. Eng. Res. Adv. Technol. (IJERAT) 2016,
12, 2454–6135.
6. Keim, D. Information visualization and visual data mining. IEEE Trans. Vis. Comput. Graph. 2002, 8, 1–8. [CrossRef]
7. Xu, H.; Berres, A.; Liu, Y.; Allen-Dumas, M.R.; Sanyal, J. An overview of visualization and visual analytics applications in water
resources management. Environ. Model. Softw. 2022, 153, 105396. [CrossRef]
8. Zheng, J.G. Data visualization for business intelligence. In Global Business Intelligence; Routledge: London, UK, 2017; pp. 67–82.
[CrossRef]
9. Preim, B.; Lawonn, K. A Survey of Visual Analytics for Public Health. Comput. Graph. Forum 2020, 39, 543–580. [CrossRef]
10. White, S.; Kalkofen, D.; Sandor, C. Visualization in mixed reality environments. In Proceedings of the 2011 10th IEEE International
Symposium on Mixed and Augmented Reality, Basel, Switzerland, 26–29 October 2011; p. 1. [CrossRef]
11. Martins, N.C.; Marques, B.; Alves, J.; Araújo, T.; Dias, P.; Santos, B.S. Augmented Reality Situated Visualization in Decision-
Making. Multimed. Tools Appl. 2022, 81, 14749–14772. [CrossRef]
12. Chen, K.; Chen, W.; Li, C.; Cheng, J. A BIM-based location aware AR collaborative framework for facility maintenance
management. Electron. J. Inf. Technol. Constr. 2019, 24, 360–380.
13. Ma, N.; Liu, Y.; Qiao, A.; Du, J. Design of Three-Dimensional Interactive Visualization System Based on Force Feedback Device.
In Proceedings of the 2008 2nd International Conference on Bioinformatics and Biomedical Engineering, Shanghai, China, 16–18
May 2008; pp. 1780–1783. [CrossRef]
14. Han, W.; Schulz, H.J. Exploring Vibrotactile Cues for Interactive Guidance in Data Visualization. In Proceedings of the
13th International Symposium on Visual Information Communication and Interaction, VINCI ’20, New York, NY, USA, 8–10
December 2020. [CrossRef]
15. Su, Y.P.; Chen, X.Q.; Zhou, C.; Pearson, L.H.; Pretty, C.G.; Chase, J.G. Integrating Virtual, Mixed, and Augmented Reality into
Remote Robotic Applications: A Brief Review of Extended Reality-Enhanced Robotic Systems for Intuitive Telemanipulation and
Telemanufacturing Tasks in Hazardous Conditions. Appl. Sci. 2023, 13, 12129. [CrossRef]
16. Azuma, R.T. A Survey of Augmented Reality. Presence Teleoper. Virtual Environ. 1997, 6, 355–385. [CrossRef]
17. Tarng, W.; Tseng, Y.C.; Ou, K.L. Application of Augmented Reality for Learning Material Structures and Chemical Equilibrium in
High School Chemistry. Systems 2022, 10, 141. [CrossRef]
18. Gavish, N. The Dark Side of Using Augmented Reality (AR) Training Systems in Industry. In Systems Engineering in the Fourth
Industrial Revolution: Big Data, Novel Technologies, and Modern Systems Engineering; Wiley Online Library: Hoboken, NJ, USA, 2020;
pp. 191–201. [CrossRef]
19. Wu, H.K.; Lee, S.W.Y.; Chang, H.Y.; Liang, J.C. Current status, opportunities and challenges of augmented reality in education.
Comput. Educ. 2013, 62, 41–49. [CrossRef]
20. Akçayır, M.; Akçayır, G. Advantages and challenges associated with augmented reality for education: A systematic review of the literature. Educ. Res. Rev. 2017, 20, 1–11. [CrossRef]
21. Nishimoto, A.; Johnson, A.E. Extending Virtual Reality Display Wall Environments Using Augmented Reality. In Proceedings of
the Symposium on Spatial User Interaction, SUI ’19, New York, NY, USA, 19–20 October 2019. [CrossRef]
22. Liu, B.; Tanaka, J. Virtual Marker Technique to Enhance User Interactions in a Marker-Based AR System. Appl. Sci. 2021, 11, 4379.
[CrossRef]
23. Gao, Q.H.; Wan, T.R.; Tang, W.; Chen, L. A Stable and Accurate Marker-Less Augmented Reality Registration Method. In
Proceedings of the 2017 International Conference on Cyberworlds (CW), Chester, UK, 20–22 September 2017; pp. 41–47. [CrossRef]
24. Ye, H.; Leng, J.; Xiao, C.; Wang, L.; Fu, H. ProObjAR: Prototyping Spatially-Aware Interactions of Smart Objects with AR-HMD.
In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York, NY, USA, 23–28
April 2023. [CrossRef]
25. Al-Ansi, A.M.; Jaboob, M.; Garad, A.; Al-Ansi, A. Analyzing augmented reality (AR) and virtual reality (VR) recent development
in education. Soc. Sci. Humanit. Open 2023, 8, 100532. [CrossRef]
26. Goh, E.S.; Sunar, M.S.; Ismail, A.W. Tracking Techniques in Augmented Reality for Handheld Interfaces. In Encyclopedia of
Computer Graphics and Games; Lee, N., Ed.; Springer International Publishing: Cham, Switzerland, 2019; pp. 1–10. [CrossRef]
27. Moro, M.; Marchesi, G.; Hesse, F.; Odone, F.; Casadio, M. Markerless vs. Marker-Based Gait Analysis: A Proof of Concept Study.
Sensors 2022, 22, 2011. [CrossRef]
28. Zhang, Z.; Wen, F.; Sun, Z.; Guo, X.; He, T.; Lee, C. Artificial Intelligence-Enabled Sensing Technologies in the 5G/Internet of
Things Era: From Virtual Reality/Augmented Reality to the Digital Twin. Adv. Intell. Syst. 2022, 4, 2100228. [CrossRef]
29. Syed, T.A.; Siddiqui, M.S.; Abdullah, H.B.; Jan, S.; Namoun, A.; Alzahrani, A.; Nadeem, A.; Alkhodre, A.B. In-Depth Review of
Augmented Reality: Tracking Technologies, Development Tools, AR Displays, Collaborative AR, and Security Concerns. Sensors
2023, 23, 146. [CrossRef]
30. Khurshid, A.; Grunitzki, R.; Estrada Leyva, R.G.; Marinho, F.; Matthaus Maia Souto Orlando, B. Hand Gesture Recognition for
User Interaction in Augmented Reality (AR) Experience. In Virtual, Augmented and Mixed Reality: Design and Development; Chen,
J.Y.C., Fragomeni, G., Eds.; Springer: Cham, Switzerland, 2022; pp. 306–316.
31. Aouam, D.; Benbelkacem, S.; Zenati, N.; Zakaria, S.; Meftah, Z. Voice-based Augmented Reality Interactive System for Car’s
Components Assembly. In Proceedings of the 2018 3rd International Conference on Pattern Analysis and Intelligent Systems
(PAIS), Tebessa, Algeria, 24–25 October 2018; pp. 1–5. [CrossRef]
32. Kaimoto, H.; Monteiro, K.; Faridan, M.; Li, J.; Farajian, S.; Kakehi, Y.; Nakagaki, K.; Suzuki, R. Sketched Reality: Sketching
Bi-Directional Interactions Between Virtual and Physical Worlds with AR and Actuated Tangible UI. In Proceedings of the
35th Annual ACM Symposium on User Interface Software and Technology, UIST ’22, New York, NY, USA, 29 October–2
November 2022. [CrossRef]
33. Ishii, H.; Ullmer, B. Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms. In Proceedings of the ACM
SIGCHI Conference on Human Factors in Computing Systems, CHI ’97, New York, NY, USA, 22–27 March 1997; pp. 234–241.
[CrossRef]
34. Löffler, D.; Tscharn, R.; Hurtienne, J. Multimodal Effects of Color and Haptics on Intuitive Interaction with Tangible User
Interfaces. In Proceedings of the Twelfth International Conference on Tangible, Embedded, and Embodied Interaction, TEI ’18,
New York, NY, USA, 18–21 March 2018; pp. 647–655. [CrossRef]
35. Shaer, O.; Hornecker, E. Tangible User Interfaces: Past, Present, and Future Directions. Found. Trends Hum.-Comput. Interact. 2010,
3, 1–137. [CrossRef]
36. Zuckerman, O.; Gal-Oz, A. To TUI or not to TUI: Evaluating performance and preference in tangible vs. graphical user interfaces.
Int. J. Hum.-Comput. Stud. 2013, 71, 803–820. [CrossRef]
37. Baykal, G.; Alaca, I.V.; Yantaç, A.; Göksun, T. A review on complementary natures of tangible user interfaces (TUIs) and early
spatial learning. Int. J. Child-Comput. Interact. 2018, 16, 104–113. [CrossRef]
38. He, F.; Hu, X.; Shi, J.; Qian, X.; Wang, T.; Ramani, K. Ubi Edge: Authoring Edge-Based Opportunistic Tangible User Interfaces in
Augmented Reality. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York,
NY, USA, 23–28 April 2023. [CrossRef]
39. Unity. Vuforia SDK Overview. Available online: https://docs.unity3d.com/2018.4/Documentation/Manual/vuforia-sdk-overview.html (accessed on 13 November 2023).
40. Filonik, D.; Buchan, A.; Ogden-Doyle, L.; Bednarz, T. Interactive Scenario Visualisation in Immersive Virtual Environments for
Decision Making Support. In Proceedings of the 16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum
and Its Applications in Industry, VRCAI ’18, New York, NY, USA, 2–3 December 2018. [CrossRef]
41. Prange, S.; Shams, A.; Piening, R.; Abdelrahman, Y.; Alt, F. PriView—Exploring Visualisations to Support Users’ Privacy
Awareness. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, CHI ’21, New York, NY, USA,
8–13 May 2021. [CrossRef]
42. Bermejo Fernandez, C.; Lee, L.H.; Nurmi, P.; Hui, P. PARA: Privacy Management and Control in Emerging IoT Ecosystems Using
Augmented Reality. In Proceedings of the 2021 International Conference on Multimodal Interaction, ICMI ’21, New York, NY,
USA, 18–22 October 2021; pp. 478–486. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
