Adaptive Projection AR with Object Recognition Based on Deep Learning
Figure 1: Adaptive Projection AR system scenarios. (a) Our hardware and window recognition: weather broadcast. (b) Clock recognition: to-do list UI. (c) Potted plant recognition: water the pot. (d) Sports ball recognition: shooting game.
2 OBJECT ADAPTIVE PROJECTION AR SYSTEM

2.1 Hardware system
The system must cover the user's entire surroundings and recognize both the user and the space. To this end, a 360-degree steerable unit consists of a depth camera and a projector mounted on a pan-tilt system with two servo motors. An Arduino is attached to drive the pan-tilt system by controlling the two servo motors. The system was also given location mobility by installing it on a portable wheel-based stand.
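As a rough illustration of the hardware control loop, the following Python sketch sends pan and tilt angles to the Arduino over a serial link. The command format, port name, and angle ranges are assumptions for illustration only; the actual firmware protocol is not described here.

```python
# Minimal sketch of driving the pan-tilt unit from the host PC.
# Assumes (hypothetically) that the Arduino firmware parses lines of
# the form "PAN,TILT\n" (angles in degrees) and forwards them to the
# two servos; the port name and command format are illustrative.
import serial
import time

class PanTiltController:
    def __init__(self, port="/dev/ttyUSB0", baud=115200):
        self.conn = serial.Serial(port, baud, timeout=1)
        time.sleep(2)  # allow the Arduino to reset after the port opens

    def point_at(self, pan_deg: float, tilt_deg: float) -> None:
        """Steer the projector/camera rig toward the given angles."""
        pan = max(0.0, min(180.0, pan_deg))    # clamp to servo range
        tilt = max(0.0, min(180.0, tilt_deg))
        self.conn.write(f"{pan:.1f},{tilt:.1f}\n".encode("ascii"))

    def close(self) -> None:
        self.conn.close()

if __name__ == "__main__":
    rig = PanTiltController()
    rig.point_at(90, 45)  # face the wall in front of the stand
    rig.close()
```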
2.2 Construction of the AR environment
To deliver projected AR data to an environment that has not been pre-defined, a 3D map is constructed. After extracting features from a recognized image, a feature-matching method [4] is applied, and the pose is found by comparing features with those from the previous frame. In this study, in order to determine the projection matrix, surface feature detection was used to extract meaningful data from the 3D environment point cloud. Using the constructed 3D map, candidate projection locations are recommended and, after enhancement, data is delivered to users. Surface domains were extracted using the RHT algorithm based on segmentation; then, from the extracted surfaces, the maximum projection domains were extracted. The final location is determined by minimizing projection distortion against domains perpendicular to the surface domains. The coordinates of the selected optimal surfaces are kept and used to determine the most suitable surface from an analysis of the user's location and interactions with objects.
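The surface-selection step can be summarized as a scoring problem: among the extracted surface domains, prefer those that face the projector (low distortion) and lie close to the user. The sketch below assumes the planes have already been segmented from the point cloud; the weighting between distortion and user distance is an illustrative choice, not a value taken from the system.

```python
# Sketch of ranking candidate projection surfaces. Assumes each surface
# has a centroid, a unit normal, and a usable area, already extracted
# from the point cloud (e.g., by RHT-based segmentation). The weighting
# between distortion and user distance is illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class Surface:
    centroid: np.ndarray  # (3,) world coordinates, metres
    normal: np.ndarray    # (3,) unit normal
    area: float           # usable projection area in m^2

def distortion_score(surface: Surface, projector_pos: np.ndarray) -> float:
    """0 when the projector faces the surface head-on, 1 at grazing angles."""
    view_dir = surface.centroid - projector_pos
    view_dir = view_dir / np.linalg.norm(view_dir)
    # |cos| of the angle between the viewing direction and the surface normal
    alignment = abs(float(np.dot(view_dir, surface.normal)))
    return 1.0 - alignment

def select_surface(surfaces, projector_pos, user_pos,
                   w_distortion=0.6, w_distance=0.4, min_area=0.1):
    """Pick the surface with the lowest combined distortion/distance cost."""
    best, best_cost = None, float("inf")
    for s in surfaces:
        if s.area < min_area:
            continue  # too small to hold the projected UI
        cost = (w_distortion * distortion_score(s, projector_pos)
                + w_distance * np.linalg.norm(s.centroid - user_pos))
        if cost < best_cost:
            best, best_cost = s, cost
    return best
```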
2.3 Deep-learning-based object recognition
With the proposed hardware system, it is possible to deliver a projected AR space into the real world through the construction of 3D spatial data. However, it is difficult to deliver appropriate information, contents, and UI to real-life objects through AR projection in un-predefined settings outside the laboratory. Deep-learning-based object recognition aims to overcome this drawback. In the suggested research, YOLOv3 [3], trained on the COCO and Open Images datasets and offering fast processing times, is used to deliver real-time information for real-world objects. It recognizes objects instantaneously with the RGB-D camera attached to the hardware, selects the appropriate data for the objects the user is in contact with, and delivers it to the most suitable surface through projection.
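One common way to run a pre-trained YOLOv3 model in real time is through OpenCV's DNN module, as sketched below. The weight, configuration, and class-name file paths and the confidence threshold are placeholders; the sketch only illustrates how per-frame detections with bounding boxes can be obtained from the RGB stream.

```python
# Sketch of running a pre-trained YOLOv3 model on the RGB stream with
# OpenCV's DNN module. File paths and thresholds are placeholders,
# not values from the paper.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
output_layers = net.getUnconnectedOutLayersNames()
with open("coco.names") as f:
    class_names = [line.strip() for line in f]

def detect_objects(frame_bgr, conf_threshold=0.5, nms_threshold=0.4):
    """Return a list of (class_name, confidence, (x, y, w, h)) detections."""
    h, w = frame_bgr.shape[:2]
    blob = cv2.dnn.blobFromImage(frame_bgr, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    boxes, confidences, class_ids = [], [], []
    for output in net.forward(output_layers):
        for det in output:
            scores = det[5:]
            cls = int(np.argmax(scores))
            conf = float(scores[cls])
            if conf < conf_threshold:
                continue
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)
            class_ids.append(cls)
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)
    return [(class_names[class_ids[i]], confidences[i], tuple(boxes[i]))
            for i in np.array(keep).flatten().astype(int)]
```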
2.4 Interaction
2.4.1 Object-User Interaction. The system must deliver the most relevant information and interface for real-life objects through interactions between the user and those objects. Hand detection is made possible by obtaining the center and tip of the user's hand through the Kinect SDK. This enables data delivery and interaction based on the object-recognition bounding box and the presence of an interaction (the user's hand). When an interaction between the user and an object is detected, the system provides the appropriate information, user interface, and contents for that object.
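The object-user interaction check reduces to testing whether the tracked hand position, projected into the image, falls inside (or within a small margin of) a recognized object's bounding box. The sketch below assumes hand coordinates are already available from skeletal tracking; the class-to-content mapping and the margin are hypothetical.

```python
# Sketch of the object-user interaction check: content for an object is
# triggered when the tracked hand position (in colour-image coordinates)
# falls inside, or within a margin of, that object's bounding box.
# The content mapping and margin are illustrative placeholders.
from typing import Optional, Tuple

# Hypothetical mapping from recognized class to the UI/content to project.
CONTENT_FOR_CLASS = {
    "clock": "todo_list_ui",
    "potted plant": "watering_reminder",
    "sports ball": "shooting_game",
}

def hand_in_box(hand_xy: Tuple[int, int],
                box: Tuple[int, int, int, int],
                margin: int = 20) -> bool:
    """True if the hand point lies inside the box expanded by `margin` pixels."""
    hx, hy = hand_xy
    x, y, w, h = box
    return (x - margin <= hx <= x + w + margin and
            y - margin <= hy <= y + h + margin)

def interaction_content(hand_xy, detections) -> Optional[str]:
    """Return the content key for the first detected object the hand touches."""
    for class_name, _conf, box in detections:
        if class_name in CONTENT_FOR_CLASS and hand_in_box(hand_xy, box):
            return CONTENT_FOR_CLASS[class_name]
    return None
```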
2.4.2 Spatial User Interaction. To make use of real-world surfaces without touch sensors, touch interactions are realized with a depth camera. The Wilson method [6] was applied for touch detection between the user and the projected space: depth data is captured with the depth camera, and when the user's hand comes within a defined threshold of a surface, a touch is recognized. Through such a spatial interface, interactions between the user and the projected data can be handled.
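In the spirit of the depth-camera touch detection in [6], a touch can be recognized when depth pixels sit within a narrow band above a pre-captured background model of the surface. The band limits and minimum blob size below are illustrative values, not parameters of the described system.

```python
# Sketch of depth-threshold touch detection in the spirit of [6]:
# a pixel counts as "touching" when it sits slightly above the
# pre-captured surface depth. The band limits and minimum blob size
# are illustrative, not values from the paper.
import numpy as np
from scipy import ndimage

def detect_touches(depth_mm: np.ndarray,
                   background_mm: np.ndarray,
                   near_mm: float = 5.0,
                   far_mm: float = 25.0,
                   min_pixels: int = 40):
    """Return (row, col) centroids of touch blobs on the projected surface."""
    # Height of each pixel above the background surface model.
    height = background_mm.astype(np.float32) - depth_mm.astype(np.float32)
    touching = (height > near_mm) & (height < far_mm) & (depth_mm > 0)

    # Group touching pixels into blobs via connected-component labelling.
    labels, n = ndimage.label(touching)
    centroids = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if ys.size >= min_pixels:
            centroids.append((float(ys.mean()), float(xs.mean())))
    return centroids
```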
3 SCENARIO
The proposed system can operate as an adaptive information-delivery tool. In the first scenario, when a user interacts with a window (Figure 1(a)), the weather forecast is sent to the surface that is closest to the user and best suited for delivery. In the second scenario, the smart assistant provides important events or plans to the user through interaction with a clock (Figure 1(b)) or a potted plant (Figure 1(c)). The last scenario delivers entertainment contents: if the user touches or grabs a ball, the system projects a target board at various places in the space, creating an aiming-and-throwing game (Figure 1(d)).

4 CONCLUSION
Through the suggested AR system, it is possible to achieve a 360-degree reconstruction of the user's surroundings together with deep-learning-based object recognition, thereby delivering appropriate data, contents, and user interactions in real time. Object recognition based on deep learning makes it possible to deliver highly relevant data to real-life objects, and spatial interactions were constructed to provide intuitive interaction with the system. This lays a foundation for bringing a pervasive AR environment into everyday life. In the future, seamless delivery based on user-movement recognition and context-awareness techniques must be designed, which can substitute for explicit interactions between the user and real-world objects. In addition, by designing and constructing a wider variety of interaction methods, further scenarios beyond those suggested here will be designed and verified.

ACKNOWLEDGMENTS
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2018R1A2A1A05078628).

REFERENCES
[1] Andreas Rene Fender, Hrvoje Benko, and Andy Wilson. 2017. MeetAlive: Room-Scale Omni-Directional Display System for Multi-User Content and Control Sharing. In Proceedings of the 2017 ACM International Conference on Interactive Surfaces and Spaces. ACM, 106–115.
[2] Willow Garage. 2017. Ork: Object recognition kitchen. https://wg-perception.github.io/object_recognition_core/
[3] Joseph Redmon and Ali Farhadi. 2018. YOLOv3: An Incremental Improvement. arXiv (2018).
[4] Thomas Whelan, Michael Kaess, Hordur Johannsson, Maurice Fallon, John J. Leonard, and John McDonald. 2015. Real-time large-scale dense RGB-D SLAM with volumetric fusion. The International Journal of Robotics Research 34, 4-5 (2015), 598–626.
[5] Andrew Wilson, Hrvoje Benko, Shahram Izadi, and Otmar Hilliges. 2012. Steerable augmented reality with the beamatron. In Proceedings of the 25th annual ACM symposium on User interface software and technology. ACM, 413–422.
[6] Andrew D. Wilson. 2010. Using a depth camera as a touch sensor. In ACM international conference on interactive tabletops and surfaces. ACM, 69–72.