N0410010204
Received June 27, 2022 · Revised July 8, 2022 · Accepted July 8, 2022
ABSTRACT
In developing a deep reinforcement learning-based autonomous operation model for cranes, controlling the horizontal sway of the suspended heavy load is an important issue that directly affects crane operation safety. Kinematically, however, moving a heavy load that swings like a pendulum is an underactuated control problem: the load has more degrees of freedom of motion than the controller has actions with which to manipulate it. This increases the variance of the rewards expected from sampled states and actions in reinforcement learning, and it raises the problem of sample efficiency, that is, how many of the collected samples are actually effective for learning. This study therefore analyzes the sample efficiency of reinforcement learning models for crane sway control trained with Proximal Policy Optimization (PPO) and Generative Adversarial Imitation Learning (GAIL). To this end, a virtual physical environment capable of simulating the movement of a construction crane was built, and expert demonstration samples were collected for GAIL. Finally, the effect of PPO and GAIL on sample efficiency was analyzed experimentally. The results show that the reinforcement learning techniques applied in the experiment are effective in improving the sample efficiency and learning performance of the crane model.
Keywords: Autonomous Crane, Deep Reinforcement Learning, Sample Efficiency, Generative Adversarial Imitation Learning
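To make the underactuation described in the abstract concrete, here is a minimal planar sketch. It is not the authors' simulator, which was built in a virtual physical environment. The model is a trolley carrying a point-mass load on a rigid, massless rope of fixed length. The only command is the trolley acceleration `accel`. The system, however, has two degrees of freedom: trolley position `x` and sway angle `theta`. As a result, the sway can only be influenced indirectly through the trolley. The rope length, time step, and the names `CraneState` and `step` are illustrative assumptions. Damping and hoisting are ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class CraneState:
    x: float = 0.0      # trolley position [m]
    v: float = 0.0      # trolley velocity [m/s]
    theta: float = 0.0  # sway angle of the load from vertical [rad]
    omega: float = 0.0  # sway angular velocity [rad/s]


def step(s: CraneState, accel: float, rope_len: float = 10.0,
         dt: float = 0.01, g: float = 9.81) -> CraneState:
    """Advance the trolley-pendulum model by one step (semi-implicit Euler).

    `accel` (trolley acceleration) is the only control input. The sway angle
    has no actuator and follows theta'' = -(g*sin(theta) + accel*cos(theta)) / L.
    """
    alpha = -(g * math.sin(s.theta) + accel * math.cos(s.theta)) / rope_len
    v = s.v + accel * dt
    omega = s.omega + alpha * dt
    return CraneState(x=s.x + v * dt, v=v, theta=s.theta + omega * dt, omega=omega)


# Usage: accelerate at 1 m/s^2 for 1 s, then hold a constant speed for 1 s.
s = CraneState()
for _ in range(100):
    s = step(s, accel=1.0)
for _ in range(100):
    s = step(s, accel=0.0)
# The input is zero again, yet the load is still swinging (theta != 0).
print(f"v = {s.v:.2f} m/s, theta = {math.degrees(s.theta):.2f} deg")
```

The load lags behind the accelerating trolley and keeps oscillating after the input stops. This residual sway is exactly what a sway-control policy must learn to cancel using the trolley motion alone.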
1. Introduction
… underactuated system … (Wu and Xia, 2014) … (Sawodny et al., 2002). … (Fang and Cho, 2017; Ramli et al., 2017). … deep reinforcement learning (DRL) …
1 Master's Student, Department of Architecture and Architectural Engineering, Seoul National University, inwon33@snu.ac.kr
2 Ph.D. Student, Department of Architecture and Architectural Engineering, Seoul National University, dewichon@naver.com
3 Corresponding Author, Research Professor, Department of Architecture and Architectural Engineering, Seoul National University, [email protected]
4 Professor, Department of Architecture and Architectural Engineering, Seoul National University, [email protected]
5 Professor, Department of Architecture and Architectural Engineering, Seoul National University, [email protected]
Copyright © 2022 Korean Society of Automation and Robotics in Construction. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
| Vol.1 No.2 | 19
김인원 ․ 김남균 ․ 정민혁 ․ 안창범 ․ 박문서
… Proximal Policy Optimization (PPO) … Inverse Reinforcement Learning (IRL) … Generative Adversarial Imitation Learning (GAIL) …
… Unity ML-Agents (Juliani et al., 2018) …
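PPO, named above, updates the policy by maximizing a clipped surrogate objective. The clipping keeps each update from moving action probabilities too far from the policy that collected the samples. The NumPy sketch below only evaluates that objective, expressed as a loss to minimize, for one batch. In a real trainer it would be computed inside an automatic-differentiation framework so that gradients flow through `logp_new`. The function name and `clip_eps=0.2` (a commonly used default) are assumptions for illustration.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative of PPO's clipped surrogate objective, averaged over a batch.

    objective = mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)),
    with probability ratio r = pi_new(a|s) / pi_old(a|s) = exp(logp_new - logp_old).
    """
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # The element-wise minimum makes the objective pessimistic: a large ratio
    # earns no extra credit for positive advantages.
    return -float(np.mean(np.minimum(unclipped, clipped)))


# Usage: the new policy doubled an action's probability (ratio = 2).
print(ppo_clip_loss([np.log(2.0)], [0.0], [1.0]))   # gain capped at 1 + eps
print(ppo_clip_loss([np.log(2.0)], [0.0], [-1.0]))  # penalty not capped
```

The asymmetry is intentional. Improvements beyond the trust region are ignored, while deteriorations are still fully penalized. This is one reason PPO tends to learn stably from a limited number of samples.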
2.2 Sample Efficiency in Reinforcement Learning
2. Review of Previous Studies
2.1 Reinforcement Learning Based on Virtual Physical Environments
… (Shah et al., 2018). … (Sawodny et al., 2002). … (Matsumoto et al., 2020). … (Yang et al., 2019).
Dosovitskiy et al. (2017) … Unreal Engine 4 … CARLA … classical modular pipeline, end-to-end … 3 … Savva et al. (2017) … MINOS …
… 15° …
… GAIL …
… 1) … hook … 2) … 3) …
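GAIL, which the study uses together with the collected expert demonstrations, trains a discriminator to tell expert state-action pairs apart from the agent's own. The discriminator's output is then turned into a reward for the policy. The sketch below is a deliberately simple stand-in with these parts:

- a linear logistic discriminator over concatenated state-action feature vectors, trained by gradient ascent;
- the common reward convention r = -log(1 - D(s, a));
- a helper that weights the extrinsic and GAIL rewards with strength factors, analogous to the per-signal `strength` setting in ML-Agents' reward-signal configuration.

The class and function names, the learning rate, and the default strengths are hypothetical. Real GAIL discriminators are usually small neural networks.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LogisticDiscriminator:
    """Hypothetical linear discriminator D(x) = sigmoid(w . x + b).

    Each row of x is a concatenated [state, action] feature vector.
    D is pushed toward 1 on expert pairs and toward 0 on policy pairs.
    """

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def prob(self, x):
        return sigmoid(np.asarray(x, dtype=float) @ self.w + self.b)

    def update(self, expert_x, policy_x):
        # One gradient-ascent step on mean log D(expert) + mean log(1 - D(policy)).
        expert_x = np.asarray(expert_x, dtype=float)
        policy_x = np.asarray(policy_x, dtype=float)
        d_e = self.prob(expert_x)
        d_p = self.prob(policy_x)
        grad_w = (((1.0 - d_e)[:, None] * expert_x).mean(axis=0)
                  - (d_p[:, None] * policy_x).mean(axis=0))
        grad_b = (1.0 - d_e).mean() - d_p.mean()
        self.w += self.lr * grad_w
        self.b += self.lr * grad_b


def gail_reward(disc, x, eps=1e-8):
    # Common convention r = -log(1 - D(s, a)): larger when a pair looks expert-like.
    return -np.log(1.0 - disc.prob(x) + eps)


def combined_reward(r_ext, r_gail, ext_strength=1.0, gail_strength=0.5):
    # Weighted sum of reward signals, analogous to ML-Agents' per-signal `strength`.
    return ext_strength * r_ext + gail_strength * r_gail
```

In a training loop, the discriminator and the policy are updated in alternation. As the policy imitates the demonstrations better, the discriminator's task becomes harder and the GAIL reward signal shrinks accordingly.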
3.2 State Space and Reward Function
… (state space) … Table 1 … GAIL …
State space
    Position: …
    Sway: …
Reward function
    Sparse
        +1.0: distance … and … & …
        -1.0: distance … and …
    Dense
        +0.15: distance … and …
        -0.2: distance … and … (after the agent has received +0.15)
        -1.0/maxstep at each step
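Table 1 combines sparse terminal rewards with dense shaping terms. The distance and sway thresholds were lost from the table, so the sketch below fills them with clearly hypothetical values in `RewardConfig`. It also makes three assumptions:

- +1.0 is paid on arriving close to the target with small sway.
- -1.0 is paid when the load strays beyond an out-of-bounds distance.
- The +0.15 bonus can be re-earned only after the -0.2 penalty has been applied.

These conditions are one plausible reading of the table, not the authors' exact definitions.

```python
from dataclasses import dataclass


@dataclass
class RewardConfig:
    # Hypothetical thresholds: the values in Table 1 were not recoverable.
    goal_dist: float = 0.5   # [m] distance treated as "arrived at target"
    near_dist: float = 3.0   # [m] radius that triggers the one-off dense bonus
    max_sway: float = 0.05   # [rad] sway allowed at arrival
    fail_dist: float = 30.0  # [m] assumed out-of-bounds distance
    max_step: int = 5000     # episode step limit (maxstep in Table 1)


@dataclass
class RewardState:
    got_near_bonus: bool = False


def step_reward(dist, sway, cfg, st):
    """Return (reward, done) for one step in the layout of Table 1."""
    r = -1.0 / cfg.max_step  # dense time penalty applied at every step
    # Dense shaping: +0.15 on entering the near zone, -0.2 on leaving it
    # after the bonus was paid (the bonus can then be earned again).
    if dist < cfg.near_dist and not st.got_near_bonus:
        r += 0.15
        st.got_near_bonus = True
    elif dist >= cfg.near_dist and st.got_near_bonus:
        r -= 0.2
        st.got_near_bonus = False
    # Sparse terminal rewards (assumed conditions).
    if dist < cfg.goal_dist and abs(sway) < cfg.max_sway:
        return r + 1.0, True
    if dist > cfg.fail_dist:
        return r - 1.0, True
    return r, False
```

Leaving the near zone costs more (-0.2) than entering it earns (+0.15). Under this sketch, oscillating across the threshold to farm the bonus is therefore a net loss. The per-step -1.0/maxstep term adds up to at most -1.0 per episode and nudges the agent to finish quickly.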
Figure 2. Training environment for collecting expert demonstrations: (1) front view, (2) trolley top view, (3) side view, and (4) rear view
5. Conclusion
… DRL … GAIL …
Figure 3. (a) Learning curves and (b) episode lengths of each model
… GAIL … GAIL …
… 1–2M … 1.0 … GAIL 1.0 … 3M … GAIL 0.25 … GAIL 0.5 …
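Figure 3 compares learning curves across models at a scale of millions of training steps. One simple way to turn such curves into a sample-efficiency number is to count the environment steps needed before the smoothed mean reward first reaches a target level. The helper below sketches that metric. The window size, threshold, and function names are assumptions, and this is not necessarily the criterion used in the study.

```python
import numpy as np


def moving_average(values, window):
    """Trailing moving average; output has len(values) - window + 1 points."""
    if window < 1:
        raise ValueError("window must be >= 1")
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")


def steps_to_threshold(steps, rewards, threshold, window=5):
    """First training step at which the smoothed mean reward reaches `threshold`.

    Returns None if it is never reached. Under this (assumed) criterion,
    fewer steps means better sample efficiency.
    """
    smooth = moving_average(rewards, window)
    aligned_steps = np.asarray(steps)[window - 1:]  # step at the end of each window
    hits = np.nonzero(smooth >= threshold)[0]
    return int(aligned_steps[hits[0]]) if hits.size else None
```

Suppose mean rewards are logged at regular checkpoints for each model. Calling `steps_to_threshold` with the same threshold and window for every model then yields directly comparable numbers. Smoothing matters here, because a single noisy checkpoint could otherwise cross the threshold early.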
Summary
Keywords: