Article
Enhancing Multi-Modal Perception and Interaction: An
Augmented Reality Visualization System for Complex
Decision Making
Liru Chen 1 , Hantao Zhao 1,2, *, Chenhui Shi 1 , Youbo Wu 1 , Xuewen Yu 1 , Wenze Ren 1 , Ziyi Zhang 3
and Xiaomeng Shi 4
1 School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China;
[email protected] (L.C.)
2 Purple Mountain Laboratories, Nanjing 211189, China
3 School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
4 School of Transportation, Southeast University, Nanjing 211189, China
* Correspondence: [email protected]
Abstract: Visualization systems play a crucial role in industry, education, and research domains by
offering valuable insights and enhancing decision making. These systems enable the representation
of complex workflows and data in a visually intuitive manner, facilitating better understanding,
analysis, and communication of information. This paper explores the potential of augmented
reality (AR) visualization systems that enhance multi-modal perception and interaction for complex
decision making. The proposed system combines the physicality and intuitiveness of the real world
with the immersive and interactive capabilities of AR systems. By integrating physical objects
and virtual elements, users can engage in natural and intuitive interactions, leveraging multiple
sensory modalities. Specifically, the system incorporates vision, touch, eye-tracking, and sound as
multi-modal interaction methods to further improve the user experience. This multi-modal nature
enables users to perceive and interact in a more holistic and immersive manner. The software and
hardware engineering of the proposed system are elaborated in detail, and the system’s architecture
and preliminary function testing results are also included in the manuscript. The findings aim to
aid visualization system designers, researchers, and practitioners in exploring and harnessing the
capabilities of this integrated approach, ultimately leading to more engaging and immersive user
experiences in various application domains.
Keywords: visualization systems; AR visualization systems; multi-modal perception and interaction;
user experience
ability to face complex decisions. By combining these two methods, users can utilize
multiple sensory modes for natural and intuitive interaction. A previous tangible user
interface (TUI) study explored the “tangible virtual interaction” between tangible Earth
instruments and virtual data visualization and proposed that head-worn AR displays
allow seamless integration between virtual visualization and contextual tangible references
such as physical Earth instruments [4]. In addition to enhancing the user experience, the
integration of AR and visualization systems also brings benefits in terms of accessibility and
inclusivity. Users with motion impairments can use their body posture and movements to
manipulate virtual objects, enabling them to interact more effectively with virtual elements,
thereby overcoming the limitations of traditional input devices.
This study proposed an advanced augmented reality visualization system that in-
corporates multi-modal perception and interaction methods. This cutting-edge system
seamlessly integrates virtual elements into the real-world environment, enhancing users’
interaction with their surroundings. By employing various multi-modal interaction meth-
ods, including visual, tactile, and auditory, users can easily identify and engage with virtual
elements superimposed onto their physical reality. The system also enables interactive
feedback, allowing users to physically interact with virtual objects, enhancing the overall
sense of realism. In addition, our system incorporates eye-tracking technology, which
provides more intuitive and natural interactive visualization and adds a further degree
of convenience.
The rest of this article is organized as follows. First, we conduct a literature review
on visualization systems, AR technology, and virtual user interfaces. Then, we introduce
the AR visualization system for improving the user experience, including its system archi-
tecture and functional design. Subsequently, the results of the AR interactive and eye-tracking
experiments conducted with the system are presented. Finally, the data analysis and functionality of the system
are discussed, along with its limitations and future research prospects. Overall, the system
described in this article provides a deep understanding of the integration aspects of AR vi-
sualization systems, showcasing their functionality and potential applications. By exploring
and harnessing the capabilities of this integrated approach, we can unlock new possibilities
for enhancing multi-modal perception and interaction, ultimately revolutionizing the way
we interact with visualized data and workflows.
2. Literature Review
2.1. Visualization Systems
Visualization systems play a crucial role in aiding the comprehension and analysis
of data [5]. These systems allow users to transform raw data into visual representations,
providing a more intuitive and interactive way to explore and understand information.
Visualization systems offer numerous benefits that contribute to their widespread adoption
in various domains. One of the primary advantages is the ability to uncover patterns
and relationships that may not be apparent in raw data. By presenting data in a visual
form, users can easily identify trends, outliers, and correlations, leading to more informed
decision-making processes [6].
Visualization systems are applied in a wide range of domains, including scientific research,
business analytics, and healthcare. In scientific research, visualization systems have been
instrumental in understanding complex phenomena, such as environmental monitoring [7].
In business analytics, visualization systems are used for communication, information seek-
ing, analysis, and decision support [8]. In healthcare, visualization systems aid in electronic
medical records and medical decision making, enhancing patient care and outcomes [9].
As technology continues to advance, visualization systems are expected to evolve in
various ways. One emerging trend is the integration of virtual reality (VR) and augmented
reality technologies into visualization systems [10]. In the context of augmented reality,
visualization systems can leverage the capabilities of AR technology to present data in a
more intuitive and context-aware manner. Various visualization techniques, such as 3D
models, graphs, charts, and spatial layouts, have been explored to enhance data exploration
and understanding. Martins [11] proposed a visualization framework for AR that enhances
data exploration and analysis. The framework leverages the capabilities of AR to provide
interactive visualizations in real-time, allowing users to manipulate and explore data
from different perspectives. By combining AR with visualization techniques, users can
gain deeper insights and make more informed decisions. Additionally, there has been a
growing interest in collaborative visualization systems using AR. Chen [12] developed a
collaborative AR visualization system that enables multiple users to interact and visualize
data simultaneously. The system supports co-located and remote collaborations, enhancing
communication and understanding among users.
In recent years, there has been a growing interest in incorporating multi-modal feed-
back, including visual and tactile cues, to create more immersive and intuitive experiences.
Haptic feedback has been explored as an essential component of visualization systems to
provide users with a tactile sense of virtual objects. Haptic feedback can enhance the user’s
perception of shape, texture, and force, allowing for a more realistic and immersive experi-
ence. Several studies have investigated the integration of haptic feedback into visualization
systems, such as the use of force feedback devices [13] or vibrotactile feedback [14]. These ap-
proaches enable users to feel and manipulate virtual objects, enhancing their understanding
and engagement with the data. In summary, the integration of multi-modal perception and
interaction in visualization systems, particularly through the use of augmented reality and
user interfaces, has been an active area of research. Previous studies have demonstrated the
benefits of combining visual and tactile feedback to create more immersive and intuitive
experiences [14]. The visualization system proposed in this study presents data that are
challenging to convey in text or in everyday settings, such as the spatial sensing range of a device,
and renders them in a 3D format, in contrast to previous approaches in which the presented information
is entirely virtual or detached from the associated equipment. This study provides timely and reliable information assistance in
user decision making by mapping virtual information onto tangible physical objects and
through multi-modal feedback, including visual and auditory cues.
the practical value of our system, addressing diverse user needs in different domains and
increasing its utility and potential for widespread application.
Another challenge is the design and development of intuitive and natural user in-
terfaces for AR systems. Traditional input devices, such as keyboards and mice, may not
be suitable for AR interactions. Therefore, researchers have explored alternative input
methods, including gesture recognition and voice commands, to enhance user engagement
and interaction [30–32]. However, relying on gesture recognition or voice commands alone
provides limited interaction options, constrains how users can operate the system, and raises reliability
and accuracy issues, while lacking diversity and flexibility. Therefore, multiple interaction
methods should be provided in AR applications to ensure the widespread adoption and
successful implementation of AR technology in various applications.
To address these challenges and provide users with a richer and more immersive
experience, this study combines AR and visualization systems. By integrating visual,
auditory, and tactile multi-modal perception and interaction, AR applications can offer a
more comprehensive and engaging user experience. This approach expands the possibilities
for interaction and enhances the user’s ability to manipulate and explore virtual objects in
the real world.
on preferences, fostering a tighter connection between the system and users, ultimately
enhancing user satisfaction and improving overall experiences.
Figure 1. The main design and implementation module of the system. The illustration integrates
three components: performance layer, business layer, and data layer.
module provides a foundation for analyzing the behavior and attention distribution
of participants, ensuring the accuracy and reliability of experimental results.
• Augmented reality interface module: This module provides researchers with a user-
friendly and reliable platform for conducting experiments and refining AR experiences.
It utilizes Unity, HoloLens device, Vuforia platform, and the Mixed Reality Toolkit
to create immersive AR scenes, seamlessly integrating virtual objects into real envi-
ronments and enabling device locomotion-based virtual content tracking, specific image
recognition, and various interaction modalities. This integration establishes a uni-
fied framework, enhancing the overall cohesion and functionality of the augmented
reality system.
• User behavior interaction module: This module enables users to interact with the
augmented reality environment through various input methods, including voice
commands, gestures, and user interfaces. It provides a flexible and intuitive way for
users to manipulate virtual objects and navigate the system.
• Eye-tracking data acquisition module: This module stores the aggregated gaze data
locally, providing spatial location and timing information for subsequent statistical
analyses; a minimal sketch of such a record is shown after this list. This accurate and
convenient platform offers researchers valuable insights into users’ visual behavior
patterns and interface design issues in virtual environments.
• AR experiment process management module: This module ensures the smooth
execution and management of augmented reality experiments. It provides tools for
designing and conducting experiments, collecting data, and managing experimental
processes, helping to improve the efficiency and accuracy of experiments while also
supporting the work of researchers.
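To make the stored record concrete, the following minimal Python sketch shows one plausible layout for locally logged gaze samples with spatial location and timing information. The field names, file format, and logging interface are illustrative assumptions, not the system's actual implementation, which runs on HoloLens within Unity.

```python
# Minimal sketch of a locally stored gaze record (field names are assumptions).
import csv
import time
from dataclasses import dataclass, asdict

@dataclass
class GazeSample:
    timestamp: float   # sample time in seconds
    target_id: str     # identifier of the gazed interface element
    x: float           # gaze hit point in a unified spatial coordinate system
    y: float
    z: float

def log_samples(samples, path="gaze_log.csv"):
    """Append gaze samples to a local CSV file for later statistical analysis."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp", "target_id", "x", "y", "z"])
        if f.tell() == 0:          # write the header only for a new file
            writer.writeheader()
        for s in samples:
            writer.writerow(asdict(s))

# Example: record a single fixation on a hypothetical "device_panel" element.
log_samples([GazeSample(time.time(), "device_panel", 0.12, 1.45, 2.30)])
```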
To implement this system, the following software and hardware configurations are
required: the Unity development platform, HoloLens headset, and Augmented Reality
Toolkit. Unity is a powerful game engine and development platform that enables the
creation of interactive and immersive experiences, serving as the foundation for developing
the augmented reality visualization system. The HoloLens is a wearable mixed reality
device developed by Microsoft that combines virtual reality and augmented reality capabili-
ties, allowing users to interact with virtual objects in the real world. The Augmented Reality
Toolkit is a software library that provides tools and resources for developing augmented
reality applications, including features for 3D object recognition, tracking, and interaction,
which are essential for our system’s functionality.
After the scene data and system settings are completed, the system enhances multi-
modal perception and interaction by incorporating various modes of interaction. Users
wear HoloLens glasses to access the AR scene, and the system recognizes device informa-
tion through Vuforia scanning. Users can navigate and interact using voice commands,
and hand gestures are detected for precise manipulation. Physical props can also be used
to interact with virtual objects. Eye-tracking data are recorded for analysis, and an experi-
mental process management module streamlines the evaluation and improvement of the
system. This comprehensive approach improves the user experience and usability.
Figure 2. Member management module. The illustration integrates four components: system tutorial,
system experiment, data recording, and data processing and analysis.
Figure 3. Composition of the Augmented Reality Interface Module. Unity and MRTK provide
technical support for the AR interface to achieve scene construction and multi-modal interaction,
connecting systems and devices through the AR interface, offering a reliable and convenient tool for
AR research and development.
Unity, as a prevalent cross-platform game engine, serves as the primary devel-
opment framework. Its abundant tools and engine support facilitate the crafting of 3D
scenes and user interfaces. Unity is leveraged to create and render AR scenes encompassing
virtual objects, 3D models, and user interface elements. Its robust graphics engine assimi-
lates virtual content into the HoloLens headset, while the device’s innate spatial mapping
and gesture recognition integrate virtual objects seamlessly into real environments for
remarkably authentic AR experiences.
The module’s development harnessed the Mixed Reality Toolkit (MRTK), an open-
source toolkit furnishing fundamental components and features to streamline cross-platform
AR application development. The module supports diverse interaction modalities, includ-
ing gesture control, air tap, voice commands, and eye-tracking. Catering to varied research
requirements, it offers flexible customization capabilities to introduce novel virtual objects
and adjust scene layouts while facilitating the storage and visualization of user behavior
data for analysis.
In order to enrich the AR functions of the system and devices, we also added device
locomotion features to the AR Interface Module. The device locomotion features in AR
systems can track real objects and overlay virtual content on them to enhance interactivity
and tangibility through recognition and positioning methods. In this system, we primarily
utilize Vuforia’s scanning capability to implement device locomotion. Vuforia is a cross-
platform augmented reality development kit that provides image recognition and tracking capabilities [39].
Figure 4. Composition of user behavior interaction module. The illustration integrates three compo-
nents: multi-modal perception, interactive objects, and a feedback mechanism.
In multi-modal perception, the module recognizes user input methods such as voice,
gestures, and touch to capture real-time behavior and needs. This allows users to select 3D
content by clicking or using a ray emitted from their hand. Interactive objects in the 3D
world can trigger events, such as touching buttons and 3D objects, allowing users to directly
interact with the system through wearable devices. The feedback mechanism provides
users with timely feedback on their operations. This can be visual, such as highlighting and
finger cursor feedback, or auditory, with sound effects at different user selection statuses
(including observation, hovering, touch start, touch end, etc.).
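As an illustration of this feedback mechanism, the short Python sketch below maps the selection states named above to paired visual and auditory cues. The state machine, cue names, and sound files are assumptions for illustration; the actual mapping in the system is implemented within Unity/MRTK and is not shown in the paper.

```python
# Minimal sketch of mapping selection states to multi-modal feedback cues.
# The state names follow the text; cue names and sound files are hypothetical.
from enum import Enum, auto

class SelectionState(Enum):
    OBSERVATION = auto()
    HOVERING = auto()
    TOUCH_START = auto()
    TOUCH_END = auto()

# Each state pairs a visual cue (e.g., highlighting, finger cursor) with a sound effect.
FEEDBACK = {
    SelectionState.OBSERVATION: ("none", None),
    SelectionState.HOVERING: ("highlight", "hover.wav"),
    SelectionState.TOUCH_START: ("finger_cursor", "select.wav"),
    SelectionState.TOUCH_END: ("confirm_flash", "release.wav"),
}

def on_state_change(target_id: str, state: SelectionState):
    """Return and report the feedback to present when an interactive object changes state."""
    visual, sound = FEEDBACK[state]
    print(f"{target_id}: visual cue '{visual}', sound effect {sound}")
    return visual, sound

on_state_change("scene_switch_button", SelectionState.HOVERING)
```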
By combining these modules, the system offers users a highly intelligent and person-
alized interactive experience. The integration of various input methods lets users interact
with virtual information more intuitively and efficiently. Additionally, the inclusion of
interactive objects and a feedback mechanism ensures that users receive timely and infor-
mative feedback on their actions, further enhancing their understanding and control of
the system. Overall, the multi-modal perception and interaction module greatly enhances
the interactive experience and enables users to effectively collaborate and innovate in
real-world scenarios.
Figure 5. Workflow of the AR experiment process management module: experiment project creation,
experiment design, experiment execution, and experiment process management (variable order, skip,
repeat, and end).
Once the experiment is designed, users can execute it using the module’s interface.
The interface provides real-time information on the progress and results of the experiment,
allowing users to stay on track and monitor the experiment’s execution. This ensures that
the experiment is carried out smoothly and effectively. Administrators can also manage the
overall experiment process. They can create, edit, and delete experiment processes, setting
the sequence and steps for the experiment. This ensures a logical and efficient flow of the
experiment, improving organization and management.
The module automatically collects data during the experiment process. This includes
user operations, eye-tracking data, and user feedback. The collected data can be used
for further analysis and evaluation of the experiment, providing valuable insights for
administrators and researchers. The module offers a range of functions that empower
administrators to create, design, execute, and analyze augmented reality experiments.
Its goal is to enhance the efficiency and quality of experiments, benefiting both users
and researchers.
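The sequencing behavior described above (creating, ordering, skipping, and repeating experiment steps while recording what was run) can be sketched as follows. This is a minimal illustration under assumed step names and a simplified interface, not the module's actual code.

```python
# Minimal sketch of an experiment process runner with ordering, skip, and repeat
# controls; step names and the interface are illustrative assumptions only.

def run_experiment(steps, order=None, skip=(), repeats=None):
    """Execute experiment steps in a configurable order, skipping or repeating
    individual steps, and return a log of what was run for later analysis."""
    order = order if order is not None else list(range(len(steps)))
    repeats = repeats or {}
    log = []
    for idx in order:
        name, action = steps[idx]
        if name in skip:
            continue
        for _ in range(repeats.get(name, 1)):
            log.append(name)   # automatic data collection: record executed steps
            action()
    return log

steps = [
    ("tutorial", lambda: print("run basic AR operation tutorial")),
    ("privacy_scene", lambda: print("run privacy scene task")),
    ("leaving_scene", lambda: print("run leaving scene task")),
]
print(run_experiment(steps, skip={"tutorial"}, repeats={"privacy_scene": 2}))
```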
4. Results
4.1. AR Interactive Experiment
4.1.1. Interactive Experiment Design
To test the system’s performance, we conducted a user study experiment to examine
the usability of the different modules. A total of 25 people were recruited through social
networks and student organizations on the university campus. We integrated the system
into a smart home scenario, as shown in Figure 6, and designed a series of experiments
for participants to test the system’s functionality and user experience. First, the real-time
status of the smart device was visualized and displayed above the device, and second,
the user could set up the smart device using the user interface. In order to enhance the
user’s perception of the environment, the system also visualizes the sensor sensing range
in relation to data communication. In addition, to simplify the interaction, the system
is designed with a scene-switching function, which realizes the rapid transformation of
device configuration in different scenes. During the experiment, participants’ interactive
actions were recorded, and time markers were used. After the experiment, participants
were asked to rate the performance of the system. All subjects gave their informed consent
for inclusion before they participated in the study. The protocol was approved by the Ethics
Committee of the affiliated university (2023ZDSYLL354-P01).
All participants were first required to complete basic AR operation tutorials to help
them learn and become familiar with the AR system and its functions. Firstly, users
were asked to observe the visualization state of the device and perform the interaction
test of the user interface; secondly, users were asked to observe the sensing perception
range and remove the physical objects that we had placed in the range in advance; and
lastly, users were asked to switch and observe the communication relationship of the
device as well as the status information during different scenes. This was followed by a
formal experimental session in which participants were required to configure their smart
devices according to different scenario descriptions and prompts in conjunction with their
personal needs, as shown in Table 1. They were also asked to explain the reasons for
their settings to the experimenter after completing the tasks. In the formal experiment,
participants made corresponding smart device configuration decisions by observing the
smart device status, sensing range, and communication relationships in the scenarios and
were prompted by the experiment descriptions. The participants were tested separately.
Firstly, the experimenter introduced the research background, and the participants read and
signed the informed consent form. Subsequently, the participants in the AR group wore AR
glasses and underwent glasses calibration. After familiarizing themselves with the basic
operations, they completed the above scenario, setting tasks in sequence. Instructions for
each task were given throughout the experiment, and participants were instructed by the
experimenter to proceed to the next task after completing the previous one.
Table 1. Scene descriptions.
Privacy Scene: Participants were asked to imagine setting up corresponding settings in privacy
scenarios to minimize the risk of privacy exposure.
Leaving Scene: Participants were asked to imagine setting up energy-saving, home-cleaning, and
house safety functions when leaving home for work.
Parlor Scene: Participants were asked to imagine having friends as guests at home and to provide a
light and comfortable environment. They were also asked to make corresponding settings while
confidently chatting.
Sleeping Scene: Participants were asked to imagine preparing to sleep at night and needing a quiet
environment. They were also asked to make corresponding settings to avoid exposing their privacy.
Figure 7. Results of various tasks. (a) Distribution of consumed time for different tasks. (b) Distribu-
tion of click counts for different tasks.
Table 2. Statistical data of consumed time and click counts by different tasks. We conducted the
Shapiro test for user consumption time and number of clicks in different scenarios and found that they
do not conform to a normal distribution, so we used a nonparametric two-sample Wilcoxon rank
test to check whether the results were significant and calculated the correlation coefficient of
the two.
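The statistical procedure summarized in the table caption can be reproduced with standard tools. The following Python sketch, using placeholder values rather than the study's data, shows a Shapiro-Wilk normality check, a two-sample Wilcoxon rank-sum test with the effect size r = |z|/sqrt(n) that is commonly reported alongside p, and a Spearman correlation as one reading of "the correlation coefficient of the two".

```python
# Sketch of the analysis described in Table 2: a Shapiro-Wilk normality check,
# a nonparametric two-sample Wilcoxon rank-sum test with effect size r = |z|/sqrt(n),
# and a Spearman rank correlation. The arrays are placeholders, not the study's data.
import numpy as np
from scipy import stats

time_privacy = np.array([150.0, 120.0, 95.0, 180.0, 110.0, 130.0, 160.0, 140.0])
time_leaving = np.array([70.0, 55.0, 60.0, 80.0, 65.0, 50.0, 75.0, 68.0])
clicks_privacy = np.array([45, 38, 30, 52, 33, 40, 48, 42])

# Normality check (a small p-value indicates departure from normality).
print(stats.shapiro(time_privacy))

# Wilcoxon rank-sum test comparing consumed time between two scenes.
z, p = stats.ranksums(time_privacy, time_leaving)
r = abs(z) / np.sqrt(len(time_privacy) + len(time_leaving))
print(f"p = {p:.4f}, r = {r:.2f}")

# One reading of "the correlation coefficient of the two": Spearman correlation
# between consumed time and click counts within a scene.
rho, p_rho = stats.spearmanr(time_privacy, clicks_privacy)
print(f"rho = {rho:.2f}, p = {p_rho:.4f}")
```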
The users also demonstrated exploration of and adaptability to the experimental
environment. During the experimental process, users could click to switch scenes, select
devices, and perform setting operations to complete the experimental tasks. During the experiment,
users performed multiple click operations, and the specific distribution is shown in Figure 8.
Figure 8. Distribution of click types among users in different tasks. This figure shows the user’s
operations on different tasks during the experiment.
The details of time and clicks consumed by different users to complete the tasks are
shown in Figure 9. In summary, by analyzing data on user click operations and consump-
tion time, the functional performance of the system can be evaluated, and improvement
suggestions can be proposed based on the evaluation results to optimize system perfor-
mance and user experience.
Figure 9. Time consumed and click counts by different users. By observing this figure, we can
understand the level of user participation in these tasks during the experiment.
In addition, during the testing of the sensing perception range, it was found that
there were errors in the user’s perception of the boundary of the virtual 3D range. In
the corresponding test session, users were asked to move the physical objects placed in
the sensing range out of it along different distances and angles while moving them as
little as possible; we recorded the final position of each object, measured it,
and obtained the distribution of the perception errors, as shown in Figure 10. Since
the boundary perception errors are non-normally distributed, we performed a Wilcoxon
analysis to compare the users’ straight surface perception errors (angle error)
and curved surface perception errors (distance error). The data show that the straight
surface error is significantly different from the curved surface error (p < 0.0001, r = 0.09).
Figure 10. Errors in the user’s perception of the boundaries of the virtual 3D range. (a) Distribution
frequency of user distance perception error. (b) Distribution frequency of user angle perception error.
User evaluations of the ease of learning and interactivity of the system were collected
in different experimental sessions and the results are shown in Figure 11. Since the user
evaluation scores were non-normally distributed, we conducted a Wilcoxon analysis to test
this. Participants rated both the ease of learning and the interactivity of the system highly, with
a significant difference between the two (p < 0.01, r = 0.38), interactivity being rated
somewhat higher than ease of learning. The experimental results suggest that there may be
limitations in how well people can perceive boundaries within virtual objects, but they also
demonstrate that the system is highly usable and engaging.
Figure 11. User evaluation of the ease of learning and interactivity of different functions.
Figure 12. Eye-tracking heatmap of participants in augmented reality system. It provides a visual
representation of the specific elements and areas that users focus on during eye-tracking experiments
in the interaction process.
Figure 13. These scatter plots depict participants’ eye-tracking gaze data on the visual interface,
generated during the eye-tracking experiment. They showcase the eye movement data results from
two different graphical interfaces within the system. The data have been normalized and standardized,
aligning them to a consistent coordinate system. On the left side, the scatter plot displays the eye-tracking
data results of the system’s graphical interface in a two-dimensional plane. On the right side, the
results present users’ three-dimensional eye-tracking data on the graphical interface in an augmented
reality setting. This chart reflects each user’s focal points during the eye-tracking experiment.
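A minimal sketch of the normalization and standardization step mentioned in the caption is given below, assuming gaze samples are available as numeric arrays; the specific scaling choices are assumptions for illustration.

```python
# Minimal sketch of normalizing and standardizing gaze samples so that 2D and 3D
# data share a consistent coordinate system; the scaling choices are assumptions.
import numpy as np

def normalize_points(points):
    """Shift points to the interface origin and scale each axis to [0, 1]."""
    mins = points.min(axis=0)
    ranges = np.ptp(points, axis=0)
    ranges = np.where(ranges == 0, 1.0, ranges)   # avoid division by zero
    return (points - mins) / ranges

def standardize_points(points):
    """Zero-mean, unit-variance standardization per axis."""
    std = points.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    return (points - points.mean(axis=0)) / std

# Placeholder samples: 2D screen-space gaze points and 3D AR gaze hit points.
gaze_2d = np.array([[512.0, 300.0], [620.0, 410.0], [480.0, 280.0]])
gaze_3d = np.array([[0.1, 1.4, 2.2], [0.3, 1.5, 2.4], [0.2, 1.3, 2.1]])

print(normalize_points(gaze_2d))
print(standardize_points(gaze_3d))
```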
Figure 14. Total gaze duration of different participants on five different graphical interfaces in the
eye-tracking experiment. By calculating the total gaze duration of each participant on each interface,
we can extract the attention intensity of each user towards different elements. Longer gaze durations
typically indicate higher interest or cognitive load.
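Aggregating total gaze duration per participant and interface, as described in the caption, amounts to a simple group-by over logged fixation records; the sketch below uses assumed column names and placeholder values.

```python
# Sketch of computing total gaze duration per participant and interface from
# logged fixation records; column names are illustrative assumptions.
import pandas as pd

fixations = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2],
    "interface":   ["UI-1", "UI-1", "UI-2", "UI-1", "UI-2"],
    "duration_s":  [0.35, 0.80, 0.42, 1.10, 0.25],   # fixation durations in seconds
})

# Total gaze duration of each participant on each graphical interface.
total_gaze = (fixations
              .groupby(["participant", "interface"])["duration_s"]
              .sum()
              .unstack(fill_value=0))
print(total_gaze)
```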
5. Discussion
5.1. Analysis of System Advantages and Usability Experiment Results
Previous research has highlighted that interactive scene visualization in immersive
virtual environments can offer decision support [40]. By utilizing AR visualization systems
for complex decisions, a more intuitive, real-time, multimodal, and collaborative decision-
making environment is provided, thus improving the quality and efficiency of decision
making. In the context of smart homes, visualizing privacy-invasive devices around the
user can assist the user in recognizing the presence of privacy devices and making necessary
adjustments [41]. Moreover, some researchers have implemented data type visualization
for common privacy-invasive devices, such as cameras and smart assistants, to aid user
decision making [42]. This paper introduces a visualization system
that can be applied to various scenarios, simplifying complex situations and helping users
make decisions in an immersive manner. The system provides users with a more immersive
and engaging experience by superimposing virtual objects onto the real world. This
enables users to perceive information more intuitively and naturally, thus deepening their
understanding and memory of the content. In addition, augmented reality visualization
systems can combine multiple sensory technologies, such as vision, hearing, and touch,
to provide information from multiple perspectives. This can enhance users’ perception of
the environment and help them better understand complex situations. This will enable
decision-makers to better understand and analyze data, consider factors and variables
more comprehensively, and make more informed and accurate decisions.
The results of the user experiments provide valuable insights into the effectiveness
of the system. The finding that users tended to click on the menu for device selection
indicates that the system successfully provided users with the flexibility to choose and
switch between different devices for interaction. This demonstrates the system’s capability
to support multi-modal perception and interaction, allowing users to utilize different input
devices based on their preferences or specific task requirements. Furthermore, during
the experiment, the number of user interactions and time consumption in the subsequent
scene setting tasks were significantly lower than in the first scene, suggesting that after
learning, users quickly become familiar with the system. This indicates that the system
has a learning curve, and with practice, users can become more adept at navigating and
interacting with the augmented reality environment. Users’ ratings of the ease of learning
to operate the system’s different functions became higher over the course of the tasks, which also suggests that
the system is user-friendly and easy to learn. Users found it relatively easy to grasp the
system’s functionalities and felt comfortable interacting with the virtual objects. The higher
ratings of interactivity compared to ease of learning suggest that users perceived the system
as highly interactive and engaging, even though it might have required some initial effort
to learn. In summary, the experimental data and analysis provided valuable insights into
the effectiveness and usability of the system. Experimental participants indicated that
the system provided a user-friendly and engaging experience and that the smart device-
based visualization system provided an important reference for decision making in their
scenarios.
periment, to comprehensively assess the functionality of the overall system. Through these
two experiments, a diverse set of multi-modal data, including gestures, air taps, perception,
and eye-tracking data, was collected to understand the cognitive processes and experiences
of users during the interaction, facilitating a better overall evaluation of the system. In
the AR interactive experiment, the obtained experimental data reflected the system’s good
learnability and interactivity. It laid the foundation for improvements in boundary percep-
tion and user adaptability. Traditional multi-modal interaction research has predominantly
concentrated on visual, auditory, and tactile aspects [14,32], often overlooking eye-tracking
technology. In response, our system introduces eye-tracking technology to capture the
user’s gaze, facilitating a more natural and intuitive interaction. A multi-modal system that
incorporates sophisticated eye-tracking enables users to engage through gaze positioning,
gesture control, and tactile feedback, thereby enhancing user participation and immersive
experiences. Our eye-tracking experiment, built on the system’s eye-tracking data acquisition
module, collects data such as gaze trajectories and heatmaps on the AR
panel, providing valuable insights for analysts to examine users’ visual behavior patterns
and to evaluate and enhance the system’s UI design. Through multi-user eye-
tracking experiments, we discovered the system’s sensitivity to eye movement calibration,
providing significant assistance for subsequent system improvements.
Overall, the system showed promise in capturing diverse user data modalities. How-
ever, the testing highlighted opportunities to improve multi-modal data collection accuracy
and reliability through adaptive calibrations, spatial reference standardization, and multi-
modal input fusion. Enhancing the system’s capabilities to seamlessly integrate these
modalities into natural interactions would further augment users’ sense of immersion and
engagement. The user data provide valuable insights to inform the iterative refinement of
the system’s multi-modal interaction design.
In conclusion, while the augmented reality visualization system has the potential
to enhance multi-modal perception and interaction as well as improve complex decision
making, there are limitations and areas for future expansion. Improving eye-tracking
accuracy, enhancing depth perception, incorporating gaze-based interaction techniques,
and leveraging machine learning algorithms are key areas to address. By expanding the
system’s capabilities in these areas, it can provide users with a more immersive, intuitive,
and personalized augmented reality experience and thus provide better support and
assistance for complex decision making.
6. Conclusions
This study proposes an augmented reality visualization system that focuses on the
potential of augmented reality visualization technologies in improving human decision
making through enhanced multi-modal perception and interaction. We conducted a se-
ries of experiments, including a multi-modal interaction experiment and an eye-tracking
experiment, within the context of a smart home system scenario, to evaluate the system’s
performance. The visualization system provides decision-aiding information, and the
multi-modal perception and interaction methods under AR, especially the eye-tracking
technology, provide an immersive in-scene decision-making environment, which
comprehensively improves the user’s ability to understand the information needed for decision
making. Our study contributes to the advancement of augmented reality and human–
computer interaction, presenting new possibilities for interactive visualization systems.
The results of the experiments indicate that the integration of eye-tracking enhances the
user experience and provides immersive interaction, allowing for a broader analysis of user
behavior. However, there are limitations to consider, such as boundary perception errors
and the limited application of eye-tracking, which restrict the system’s usability in certain
scenarios. To further advance this field, future research should focus on improving reality
perception, target recognition, and tracking to achieve diverse and natural interactions,
thereby enhancing the quality and efficiency of complex decision making.
Author Contributions: Conceptualization, L.C., Z.Z., H.Z. and X.S.; methodology, L.C., H.Z., Z.Z.
and X.S.; software, L.C., C.S., Y.W., X.Y., W.R. and Z.Z.; validation, L.C. and H.Z.; formal analysis, L.C.,
Z.Z., C.S. and Y.W.; writing: L.C., H.Z., C.S., Y.W., X.Y., W.R. and X.S.; visualization, L.C., C.S., Y.W.,
X.Y. and W.R.; supervision, H.Z. and X.S.; project administration, H.Z. and X.S.; funding acquisition,
H.Z. and X.S. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by National Key R&D Program of China (No. 2021YFB1600500)
and Marine Science and Technology Innovation Program of Jiangsu Province (No. JSZRHYKJ202308).
Institutional Review Board Statement: The study was conducted in accordance with the Declaration
of Helsinki, and the protocol was approved by the Ethics Committee of the affiliated university
(No. 2023ZDSYLL354-P01).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data will be made available on request.
Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
VR Virtual Reality
GUIs Graphical user interfaces
HMDs Head-mounted displays
MRTK Mixed Reality Toolkit
References
1. Cui, W. Visual Analytics: A Comprehensive Overview. IEEE Access 2019, 7, 81555–81573. [CrossRef]
2. Chen, C. Information visualization. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 387–403. [CrossRef]
3. Zhan, T.; Yin, K.; Xiong, J.; He, Z.; Wu, S.T. Augmented Reality and Virtual Reality Displays: Perspectives and Challenges.
iScience 2020, 23, 101397. [CrossRef] [PubMed]
4. Satriadi, K.A.; Smiley, J.; Ens, B.; Cordeil, M.; Czauderna, T.; Lee, B.; Yang, Y.; Dwyer, T.; Jenny, B. Tangible Globes for Data
Visualisation in Augmented Reality. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems,
New York, NY, USA, 29 April–5 May 2022; [CrossRef]
5. Sadiku, M.; Shadare, A.; Musa, S.; Akujuobi, C.; Perry, R. Data Visualization. Int. J. Eng. Res. Adv. Technol. (IJERAT) 2016,
12, 2454–6135.
6. Keim, D. Information visualization and visual data mining. IEEE Trans. Vis. Comput. Graph. 2002, 8, 1–8. [CrossRef]
7. Xu, H.; Berres, A.; Liu, Y.; Allen-Dumas, M.R.; Sanyal, J. An overview of visualization and visual analytics applications in water
resources management. Environ. Model. Softw. 2022, 153, 105396. [CrossRef]
8. Zheng, J.G. Data visualization for business intelligence. In Global Business Intelligence; Routledge: London, UK, 2017; pp. 67–82.
[CrossRef]
9. Preim, B.; Lawonn, K. A Survey of Visual Analytics for Public Health. Comput. Graph. Forum 2020, 39, 543–580. [CrossRef]
10. White, S.; Kalkofen, D.; Sandor, C. Visualization in mixed reality environments. In Proceedings of the 2011 10th IEEE International
Symposium on Mixed and Augmented Reality, Basel, Switzerland, 26–29 October 2011; p. 1. [CrossRef]
11. Martins, N.C.; Marques, B.; Alves, J.; Araújo, T.; Dias, P.; Santos, B.S. Augmented Reality Situated Visualization in Decision-
Making. Multimed. Tools Appl. 2022, 81, 14749–14772. [CrossRef]
12. Chen, K.; Chen, W.; Li, C.; Cheng, J. A BIM-based location aware AR collaborative framework for facility maintenance
management. Electron. J. Inf. Technol. Constr. 2019, 24, 360–380.
13. Ma, N.; Liu, Y.; Qiao, A.; Du, J. Design of Three-Dimensional Interactive Visualization System Based on Force Feedback Device.
In Proceedings of the 2008 2nd International Conference on Bioinformatics and Biomedical Engineering, Shanghai, China, 16–18
May 2008; pp. 1780–1783. [CrossRef]
14. Han, W.; Schulz, H.J. Exploring Vibrotactile Cues for Interactive Guidance in Data Visualization. In Proceedings of the
13th International Symposium on Visual Information Communication and Interaction, VINCI ’20, New York, NY, USA, 8–10
December 2020. [CrossRef]
15. Su, Y.P.; Chen, X.Q.; Zhou, C.; Pearson, L.H.; Pretty, C.G.; Chase, J.G. Integrating Virtual, Mixed, and Augmented Reality into
Remote Robotic Applications: A Brief Review of Extended Reality-Enhanced Robotic Systems for Intuitive Telemanipulation and
Telemanufacturing Tasks in Hazardous Conditions. Appl. Sci. 2023, 13, 12129. [CrossRef]
16. Azuma, R.T. A Survey of Augmented Reality. Presence Teleoper. Virtual Environ. 1997, 6, 355–385. [CrossRef]
17. Tarng, W.; Tseng, Y.C.; Ou, K.L. Application of Augmented Reality for Learning Material Structures and Chemical Equilibrium in
High School Chemistry. Systems 2022, 10, 141. [CrossRef]
18. Gavish, N. The Dark Side of Using Augmented Reality (AR) Training Systems in Industry. In Systems Engineering in the Fourth
Industrial Revolution: Big Data, Novel Technologies, and Modern Systems Engineering; Wiley Online Library: Hoboken, NJ, USA, 2020;
pp. 191–201. [CrossRef]
19. Wu, H.K.; Lee, S.W.Y.; Chang, H.Y.; Liang, J.C. Current status, opportunities and challenges of augmented reality in education.
Comput. Educ. 2013, 62, 41–49. [CrossRef]
20. Akcayr, M.; Akcayır, G. Advantages and challenges associated with augmented reality for education: A systematic review of the
literature. Educ. Res. Rev. 2017, 20, 1–11. [CrossRef]
21. Nishimoto, A.; Johnson, A.E. Extending Virtual Reality Display Wall Environments Using Augmented Reality. In Proceedings of
the Symposium on Spatial User Interaction, SUI ’19, New York, NY, USA, 19–20 October 2019. [CrossRef]
22. Liu, B.; Tanaka, J. Virtual Marker Technique to Enhance User Interactions in a Marker-Based AR System. Appl. Sci. 2021, 11, 4379.
[CrossRef]
23. Gao, Q.H.; Wan, T.R.; Tang, W.; Chen, L. A Stable and Accurate Marker-Less Augmented Reality Registration Method. In
Proceedings of the 2017 International Conference on Cyberworlds (CW), Chester, UK, 20–22 September 2017; pp. 41–47. [CrossRef]
24. Ye, H.; Leng, J.; Xiao, C.; Wang, L.; Fu, H. ProObjAR: Prototyping Spatially-Aware Interactions of Smart Objects with AR-HMD.
In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York, NY, USA, 23–28
April 2023. [CrossRef]
25. Al-Ansi, A.M.; Jaboob, M.; Garad, A.; Al-Ansi, A. Analyzing augmented reality (AR) and virtual reality (VR) recent development
in education. Soc. Sci. Humanit. Open 2023, 8, 100532. [CrossRef]
26. Goh, E.S.; Sunar, M.S.; Ismail, A.W. Tracking Techniques in Augmented Reality for Handheld Interfaces. In Encyclopedia of
Computer Graphics and Games; Lee, N., Ed.; Springer International Publishing: Cham, Switzerland, 2019; pp. 1–10. [CrossRef]
27. Moro, M.; Marchesi, G.; Hesse, F.; Odone, F.; Casadio, M. Markerless vs. Marker-Based Gait Analysis: A Proof of Concept Study.
Sensors 2022, 22, 2011. [CrossRef]
28. Zhang, Z.; Wen, F.; Sun, Z.; Guo, X.; He, T.; Lee, C. Artificial Intelligence-Enabled Sensing Technologies in the 5G/Internet of
Things Era: From Virtual Reality/Augmented Reality to the Digital Twin. Adv. Intell. Syst. 2022, 4, 2100228. [CrossRef]
29. Syed, T.A.; Siddiqui, M.S.; Abdullah, H.B.; Jan, S.; Namoun, A.; Alzahrani, A.; Nadeem, A.; Alkhodre, A.B. In-Depth Review of
Augmented Reality: Tracking Technologies, Development Tools, AR Displays, Collaborative AR, and Security Concerns. Sensors
2023, 23, 146. [CrossRef]
30. Khurshid, A.; Grunitzki, R.; Estrada Leyva, R.G.; Marinho, F.; Matthaus Maia Souto Orlando, B. Hand Gesture Recognition for
User Interaction in Augmented Reality (AR) Experience. In Virtual, Augmented and Mixed Reality: Design and Development; Chen,
J.Y.C., Fragomeni, G., Eds.; Springer: Cham, Switzerland, 2022; pp. 306–316.
31. Aouam, D.; Benbelkacem, S.; Zenati, N.; Zakaria, S.; Meftah, Z. Voice-based Augmented Reality Interactive System for Car’s
Components Assembly. In Proceedings of the 2018 3rd International Conference on Pattern Analysis and Intelligent Systems
(PAIS), Tebessa, Algeria, 24–25 October 2018; pp. 1–5. [CrossRef]
32. Kaimoto, H.; Monteiro, K.; Faridan, M.; Li, J.; Farajian, S.; Kakehi, Y.; Nakagaki, K.; Suzuki, R. Sketched Reality: Sketching
Bi-Directional Interactions Between Virtual and Physical Worlds with AR and Actuated Tangible UI. In Proceedings of the
35th Annual ACM Symposium on User Interface Software and Technology, UIST ’22, New York, NY, USA, 29 October–2
November 2022. [CrossRef]
33. Ishii, H.; Ullmer, B. Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms. In Proceedings of the ACM
SIGCHI Conference on Human Factors in Computing Systems, CHI ’97, New York, NY, USA, 22–27 March 1997; pp. 234–241.
[CrossRef]
34. Löffler, D.; Tscharn, R.; Hurtienne, J. Multimodal Effects of Color and Haptics on Intuitive Interaction with Tangible User
Interfaces. In Proceedings of the Twelfth International Conference on Tangible, Embedded, and Embodied Interaction, TEI ’18,
New York, NY, USA, 18–21 March 2018; pp. 647–655. [CrossRef]
35. Shaer, O.; Hornecker, E. Tangible User Interfaces: Past, Present, and Future Directions. Found. Trends Hum.-Comput. Interact. 2010,
3, 1–137. [CrossRef]
36. Zuckerman, O.; Gal-Oz, A. To TUI or not to TUI: Evaluating performance and preference in tangible vs. graphical user interfaces.
Int. J. Hum.-Comput. Stud. 2013, 71, 803–820. [CrossRef]
37. Baykal, G.; Alaca, I.V.; Yantaç, A.; Göksun, T. A review on complementary natures of tangible user interfaces (TUIs) and early
spatial learning. Int. J. Child-Comput. Interact. 2018, 16, 104–113. [CrossRef]
38. He, F.; Hu, X.; Shi, J.; Qian, X.; Wang, T.; Ramani, K. Ubi Edge: Authoring Edge-Based Opportunistic Tangible User Interfaces in
Augmented Reality. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York,
NY, USA, 23–28 April 2023. [CrossRef]
39. Unity. Vuforia SDK Overview. Available online: https://docs.unity3d.com/2018.4/Documentation/Manual/vuforia-sdk-
overview.html (accessed on 13 November 2023).
40. Filonik, D.; Buchan, A.; Ogden-Doyle, L.; Bednarz, T. Interactive Scenario Visualisation in Immersive Virtual Environments for
Decision Making Support. In Proceedings of the 16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum
and Its Applications in Industry, VRCAI ’18, New York, NY, USA, 2–3 December 2018. [CrossRef]
41. Prange, S.; Shams, A.; Piening, R.; Abdelrahman, Y.; Alt, F. PriView—Exploring Visualisations to Support Users’ Privacy
Awareness. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, CHI ’21, New York, NY, USA,
8–13 May 2021. [CrossRef]
42. Bermejo Fernandez, C.; Lee, L.H.; Nurmi, P.; Hui, P. PARA: Privacy Management and Control in Emerging IoT Ecosystems Using
Augmented Reality. In Proceedings of the 2021 International Conference on Multimodal Interaction, ICMI ’21, New York, NY,
USA, 18–22 October 2021; pp. 478–486. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.