
Journal of Construction Automation and Robotics pISSN: 2800-0552, eISSN: 2951-116X

Vol. 1 No. 2, pp. 19-24 / July, 2022 | DOI: https://doi.org/10.55785/JCAR.1.2.19

Deep Reinforcement Learning Methods for Suppressing Horizontal Sway of Construction Cranes in Virtual Physical Environment

Kim, In Won1 · Kim, Nam Kyoun2 · Jung, Min Hyuk3 · Ahn, Chang Bum4 · Park, Moon Seo5

Received June 27, 2022 · Revised July 8, 2022 · Accepted July 8, 2022

ABSTRACT
In the development of a deep reinforcement learning-based autonomous crane operation model, controlling the horizontal sway of the suspended load is an important issue that directly affects crane operation safety. Kinematically, however, the motion control of a load undergoing pendulum motion is classified as an underactuated system, in which the load has more degrees of freedom than the controller has manipulable actions. This increases the variance of the rewards expected from action and state samples in reinforcement learning and raises the problem of sample efficiency, that is, how many samples are needed for effective learning. This study therefore analyzes the sample efficiency obtained when training a reinforcement learning model for crane sway control with Proximal Policy Optimization (PPO) and Generative Adversarial Imitation Learning (GAIL). To this end, a virtual physical environment capable of simulating the motion of a construction crane was established, and expert demonstration samples were collected for GAIL. The effects of PPO and GAIL on sample efficiency were then analyzed experimentally. The results show that the applied reinforcement learning techniques are effective in improving the sample efficiency and learning performance of the crane model.
Keywords: Autonomous Crane, Deep Reinforcement Learning, Sample Efficiency, Generative Adversarial Imitation Learning

1. Introduction

A crane load suspended on a cable behaves as a pendulum, and its motion control is classified as an underactuated problem: the load can move in more directions than the operator can directly actuate. Horizontal sway of the load therefore directly affects crane operation safety, and a large body of work has addressed its suppression, from optimal motion planning for overhead cranes (Wu and Xia, 2014) and automated gantry crane control (Sawodny et al., 2002) to safety assistance systems and comprehensive reviews of crane control strategies (Fang and Cho, 2017; Ramli et al., 2017). More recently, deep reinforcement learning (DRL) has emerged as an approach for learning such control policies directly from interaction with the environment.
1 Master's Student, Department of Architecture and Architectural Engineering, Seoul National University ([email protected])
2 Ph.D. Student, Department of Architecture and Architectural Engineering, Seoul National University ([email protected])
3 Corresponding Author, Research Professor, Department of Architecture and Architectural Engineering, Seoul National University ([email protected])
4 Professor, Department of Architecture and Architectural Engineering, Seoul National University ([email protected])
5 Professor, Department of Architecture and Architectural Engineering, Seoul National University ([email protected])

Copyright © 2022 Korean Society of Automation and Robotics in Construction. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Applications of DRL to autonomous control have grown rapidly, notably in robotics and autonomous driving (Zhao et al., 2020; Sallab et al., 2017). A persistent obstacle, however, is sample efficiency: reinforcement learning typically needs a very large number of experience samples before reaching useful performance (Botvinick et al., 2019; Yu, 2018; Kiran et al., 2021). This study therefore analyzes the sample efficiency of a DRL model for crane sway control trained with Proximal Policy Optimization (PPO) and with Generative Adversarial Imitation Learning (GAIL), an imitation-based technique rooted in inverse reinforcement learning (IRL).

2. Related Work

2.1 Reinforcement Learning in Virtual Physical Environments

Because trial-and-error learning on real machinery is costly and unsafe, reinforcement learning agents are commonly trained in virtual physical environments. Dosovitskiy et al. (2017) built CARLA on Unreal Engine 4 as an open urban driving simulator and used it to compare a classical modular pipeline with end-to-end learning approaches. AirSim provides high-fidelity visual and physical simulation for autonomous vehicles (Shah et al., 2018), Savva et al. (2017) proposed MINOS, a multimodal indoor simulator for navigation in complex environments, and Rong et al. (2020) presented the LGSVL simulator, a high-fidelity, multi-sensory platform for end-to-end autonomous driving. Unity ML-Agents offers a general platform for training intelligent agents in Unity-based environments (Juliani et al., 2018), and simulation-based reinforcement learning has also been applied to construction machine automation (Matsumoto et al., 2020).

2.2 Sample Efficiency in Reinforcement Learning

Sample efficiency is a particular concern for underactuated systems such as cranes. The suspended load moves in three-dimensional space with more degrees of freedom than the two control inputs available to the operator (trolley travel and hoisting), so the load's sway cannot be commanded directly (Sawodny et al., 2002). As a result, the same action can lead to widely different outcomes depending on the load's current swing state, which increases the variance of the rewards observed for given state-action samples and lowers the proportion of samples that contribute effectively to learning. Flexible-rope crane experiment systems have also been developed to study this behavior physically (Yang et al., 2019).

2.3 Imitation Learning for Improving Sample Efficiency

Imitation learning improves sample efficiency by exploiting demonstrations of the desired behavior instead of relying solely on the agent's own trial and error (Kiran et al., 2021; Kober and Peters, 2010). The simplest form, behavior cloning, treats the problem as supervised learning on expert demonstration data, mapping observed states directly to the expert's actions. Because a cloned policy only sees states visited by the expert, however, its errors compound once the agent drifts into states outside the demonstration distribution (Bhattacharyya et al., 2020). Generative Adversarial Imitation Learning (GAIL) addresses this by training a discriminator to distinguish expert state-action pairs from those generated by the learning policy and by using the discriminator's output as a reward signal for the policy (Ho and Ermon, 2016). The policy is thereby driven to produce trajectories that the discriminator cannot tell apart from the expert's, so expert knowledge guides exploration and reduces the number of environment samples required for learning.
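As a minimal, illustrative sketch of the GAIL mechanism described above (not the implementation used in this study; the network sizes and function names are assumptions), a discriminator can be trained to separate expert state-action pairs from the agent's, and its output can then serve as a surrogate reward for the policy:

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores (state, action) pairs: trained toward 1 for expert data and 0 for agent data."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def discriminator_loss(disc, expert_obs, expert_act, agent_obs, agent_act):
    """Binary cross-entropy pushing D(expert) toward 1 and D(agent) toward 0."""
    bce = nn.BCEWithLogitsLoss()
    e = disc(expert_obs, expert_act)
    a = disc(agent_obs, agent_act)
    return bce(e, torch.ones_like(e)) + bce(a, torch.zeros_like(a))

def gail_reward(disc, obs, act, eps=1e-8):
    """Surrogate reward -log(1 - D(s, a)): large when the agent is hard to distinguish from the expert."""
    with torch.no_grad():
        d = torch.sigmoid(disc(obs, act))
    return -torch.log(1.0 - d + eps)

In practice this surrogate reward is mixed with the environment reward, which is what the GAIL strength parameter varied in Section 4.1 controls.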

3. DRL Model Development

3.1 Crane and Task Definition in the Virtual Physical Environment

A virtual physical environment that simulates the motion of a construction crane was established, and the parameters of the training environment and the crane model are defined as shown in Fig. 1. The crane operation task is defined in terms of the hook, the suspended load, and a target position: the agent must bring the hook to the target while keeping the horizontal sway of the load within a limit of 15°.

Figure 1. Parameters of the training environment and the crane model

3.2 State Space and Reward Function

The state space and the reward function of the model are summarized in Table 1. The state observed by the agent consists of position information for the hook and the target together with the sway of the suspended load.

Table 1. State space and reward function

State space
  Position: positions of the hook and the target
  Sway: sway of the suspended load

Reward function
  Sparse
    +1.0: given when the distance-to-target and sway conditions are both satisfied
    -1.0: given when the distance condition is not satisfied
  Dense
    +0.15: given when the hook first satisfies the intermediate distance condition
    -0.2: given when that distance condition is subsequently violated (after the agent has received +0.15)
    -1.0/max step: given at each time step

, (), () 3.3 정책 네트워크


.
10 . .
(Reward function) , 256 2
.
Table 1 . . (PPO)
(Schulman et al., 2017).
() () ()
, () () 4. 실 험
. .
4.1 실험 설정
  PPO GAIL
     cos          (1)

GAIL
.
(m: , g: , l: )
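A minimal sketch of how the sparse and dense terms in Table 1 could be combined into a single step reward, assuming illustrative threshold values, variable names, and a sway-energy shaping term based on Eq. (1) (near_dist and energy_scale are assumptions, not values from the study):

import math

def step_reward(dist_to_target, sway_angle_rad, cable_length, load_mass,
                reached_before, done, success, max_step=1000, g=9.81,
                near_dist=1.0, energy_scale=0.01):
    """Illustrative reward combining the sparse and dense terms of Table 1."""
    r = -1.0 / max_step                               # small per-step penalty (dense)
    if dist_to_target < near_dist and not reached_before:
        r += 0.15                                     # first time the hook gets near the target (dense)
    elif reached_before and dist_to_target >= near_dist:
        r -= 0.2                                      # drifting away after having been near (dense)
    # assumed shaping term: penalize the pendulum energy of the load, Eq. (1)
    r -= energy_scale * load_mass * g * cable_length * (1.0 - math.cos(sway_angle_rad))
    if done:                                          # sparse terminal reward
        r += 1.0 if success else -1.0
    return r

The reached_before flag would be tracked by the environment between steps so that the +0.15 bonus is paid out only once per episode.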
3.3 Policy Network

The policy network consists of two hidden layers of 256 units each. The policy is trained with Proximal Policy Optimization (PPO), which clips each policy update so that the new policy stays close to the previous one, stabilizing learning (Schulman et al., 2017).

4. Experiments

4.1 Experimental Setup

To compare the sample efficiency of PPO alone and PPO combined with GAIL, expert demonstration data were first collected by recording expert operation of the crane in the training environment shown in Fig. 2. The GAIL reward signal derived from these demonstrations was then added to the PPO baseline at strengths of 0.25, 0.5, 0.75, and 1.0, yielding four GAIL models that were trained and compared against the PPO-only baseline, referred to below as Default.


Figure 2. Training environment to collect experts' demonstrations: (1) front view, (2) trolley top view, (3) side view, and (4) rear view
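For reference, a training configuration in the style of the Unity ML-Agents toolkit reviewed in Section 2.1 could look like the following Python dictionary, which mirrors the ML-Agents YAML schema; the two hidden layers of 256 units and the GAIL strength sweep come from Sections 3.3 and 4.1, while the behavior name, demonstration path, and remaining hyperparameter values are assumptions:

# Sketch of an ML-Agents-style trainer configuration expressed as a Python dict.
ppo_gail_config = {
    "behaviors": {
        "CraneAgent": {                                  # hypothetical behavior name
            "trainer_type": "ppo",
            "hyperparameters": {
                "batch_size": 1024,                      # assumed
                "buffer_size": 10240,                    # assumed
                "learning_rate": 3.0e-4,                 # assumed
            },
            "network_settings": {
                "hidden_units": 256,                     # two hidden layers of 256 units (Section 3.3)
                "num_layers": 2,
            },
            "reward_signals": {
                "extrinsic": {"gamma": 0.99, "strength": 1.0},
                "gail": {
                    "gamma": 0.99,
                    "strength": 0.5,                     # swept over 0.25, 0.5, 0.75, 1.0 (Section 4.1)
                    "demo_path": "crane_expert.demo",    # hypothetical demonstration file
                },
            },
            "max_steps": 3000000,                        # training budget of 3M steps (Section 4.2)
        }
    }
}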

4.2 Results and Discussion

Fig. 3 shows the learning curves (cumulative reward over time steps) and the episode lengths of the PPO baseline (Default) and the four GAIL models. The models trained with the GAIL reward signal learned the task faster than the Default model, improving their cumulative reward noticeably within roughly 1~2M time steps of the 3M-step training runs, and the learning behavior varied with the GAIL strength, with the GAIL 1.0 model differing from the GAIL 0.25 and GAIL 0.5 models in both learning speed and episode length.

Figure 3. (a) Learning curves and (b) episode length of each model

5. Conclusion

This study developed deep reinforcement learning models for suppressing the horizontal sway of a construction crane in a virtual physical environment and analyzed their sample efficiency. A virtual environment simulating crane motion was established, expert demonstration data were collected, and PPO was combined with a GAIL reward signal at several strengths and compared against a PPO-only baseline. The experiments showed that the applied techniques are effective in improving the sample efficiency and learning performance of the crane sway-control model.

Acknowledgments

This work was supported by a research grant (No. 21CTAP-C163785-01).

References

Bhattacharyya, R., Wulfe, B., Phillips, D., Kuefler, A., Morton, J., Senanayake, R., and Kochenderfer, M. (2020). Modeling human driving behavior through generative adversarial imitation learning. arXiv preprint arXiv:2006.06412.
Botvinick, M., Ritter, S., Wang, J. X., Kurth-Nelson, Z., Blundell, C., and Hassabis, D. (2019). Reinforcement learning, fast and slow. Trends in Cognitive Sciences, 23(5), pp. 408-422.
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V. (2017). CARLA: An open urban driving simulator. In Conference on Robot Learning (pp. 1-16), PMLR.
Fang, Y., and Cho, Y. K. (2017). Effectiveness analysis from a cognitive perspective for a real-time safety assistance system for mobile crane lifting operations. Journal of Construction Engineering and Management, 143(4), 05016025.
Ho, J., and Ermon, S. (2016). Generative adversarial imitation learning. Advances in Neural Information Processing Systems, 29.
Juliani, A., Berges, V. P., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., Mattar, M., and Lange, D. (2018). Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627.
Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Al Sallab, A. A., Yogamani, S., and Pérez, P. (2021). Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems.
Kober, J., and Peters, J. (2010). Imitation and reinforcement learning. IEEE Robotics & Automation Magazine, 17(2), pp. 55-62.
Matsumoto, K., Yamaguchi, A., Oka, T., Yasumoto, M., Hara, S., Iida, M., and Teichmann, M. (2020). Simulation-based reinforcement learning approach towards construction machine automation. In ISARC. Proceedings of the International Symposium on Automation and Robotics in Construction, IAARC Publications, 37, pp. 457-464.
Ramli, L., Mohamed, Z., Abdullahi, A. M., Jaafar, H. I., and Lazim, I. M. (2017). Control strategies for crane systems: A comprehensive review. Mechanical Systems and Signal Processing, 95, pp. 1-23.
Rong, G., Shin, B. H., Tabatabaee, H., Lu, Q., Lemke, S., Možeiko, M., Boise, E., Uhm, G., Gerow, M., Mehta, S., and Agafonov, E. (2020). LGSVL Simulator: A high fidelity simulator for autonomous driving. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) (pp. 1-6), IEEE.
Sallab, A. E., Abdou, M., Perot, E., and Yogamani, S. (2017). Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 19, pp. 70-76.
Savva, M., Chang, A. X., Dosovitskiy, A., Funkhouser, T., and Koltun, V. (2017). MINOS: Multimodal indoor simulator for navigation in complex environments. arXiv preprint arXiv:1712.03931.
Sawodny, O., Aschemann, H., and Lahres, S. (2002). An automated gantry crane as a large workspace robot. Control Engineering Practice, 10(12), pp. 1323-1338.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Shah, S., Dey, D., Lovett, C., and Kapoor, A. (2018). AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and Service Robotics (pp. 621-635), Springer, Cham.
Wu, Z., and Xia, X. (2014). Optimal motion planning for overhead cranes. IET Control Theory & Applications, 8(17), pp. 1833-1842.
Yang, R., Jiang, C., Miao, Y., Ma, J., Zhang, X., Yang, T., and Sun, N. (2019). A flexible rope crane experiment system. Applications of Modeling and Simulation, 3(1), pp. 11-17.
Yu, Y. (2018). Towards sample efficient reinforcement learning. In IJCAI (pp. 5739-5743).
Zhao, W., Queralta, J. P., and Westerlund, T. (2020). Sim-to-real transfer in deep reinforcement learning for robotics: A survey. In 2020 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 737-744), IEEE.
