
Autonomous Car Racing in Simulation Environment Using Deep Reinforcement Learning

Kıvanç Güçkıran¹, Bülent Bolat²

¹ Electronic and Communication Engineering Department, Yildiz Technical University, Istanbul, Turkey
[email protected]
² Electronic and Communication Engineering Department, Yildiz Technical University, Istanbul, Turkey
[email protected]

Abstract—Self-driving cars are currently a hot topic around the globe thanks to advancements in Deep Learning techniques for computer vision problems. Since driving simulations are fairly important before real-life autonomous implementations, there are multiple driving and racing simulations for testing purposes. The Open Racing Car Simulator (TORCS) is a highly portable, open-source car racing and self-driving simulation. While it can be used as a game in which human players compete with scripted agents, TORCS also provides an observation and action API for developing artificial intelligence agents. This study explores near-optimal Deep Reinforcement Learning agents for the TORCS environment using the Soft Actor-Critic and Rainbow DQN algorithms together with exploration and generalization techniques.

Keywords—Deep Reinforcement Learning, TORCS, Self-Driving Car

I. INTRODUCTION

Self-driving cars are one of the most important and promising assets of our time. They have been a challenge and an inspiration for researchers and engineers throughout the world for decades. With the introduction of Deep Learning practices and computer vision techniques, autonomous vehicles in the near future are no longer a dream [1]. The development cycle of self-driving cars must start from simulations, since creating driving data for training would otherwise be inefficient, risky and time-consuming. TORCS, The Open Racing Car Simulator, is open, flexible and has a portable interface for AI development [2]. An example race is shown in Figure 1.

Figure 1: Example race

Achieving autonomous agents is a very complicated task. There are multiple practices in machine learning to train agents to learn and act. The first is supervised learning, in which the dataset contains the data and the ground-truth labels, and agents try to predict the true labels. The second is unsupervised learning, where only data is provided, without labels, and agents need to correlate and group the data. The last one, which is our method, is reinforcement learning, which creates its own data by interacting with the environment [3]. This is the best-suited approach, since most of the time there will not be any ground-truth actions for agents to learn from.

The latest improvements in Deep Learning have affected every area of computation, and one of the affected areas is Reinforcement Learning. With the huge performance increase of GPUs, Reinforcement Learning practices combined with Deep Learning techniques became accessible. In this study, we try to find a near-optimal driver for the TORCS environment using Deep Reinforcement Learning techniques. The rest of the paper is organized as follows. In Section II, information regarding our approaches is given. Our methodology, algorithms and the strategies we have used are explained in Section III. In Section IV, the results of our approaches are given, and the last section has concluding remarks.

II. PRELIMINARIES

Reinforcement Learning (RL) is a goal-oriented machine learning practice which tries to maximize the agent's cumulative reward. The reward is given when an agent behaves towards the goal or achieves the goal. A negative reward as punishment is also plausible in some cases.

RL agents also receive observations/states from the environment and act upon them. This cycle goes on until an optimal agent is found. Figure 2 depicts this behaviour. This behaviour is formulated as a Markov Decision Process (MDP), and MDPs are defined with five components:

• S: State space
• A: Action space
• R: Reward function
• P: Transition function
• γ: Discount factor

Figure 2: Reinforcement Learning Setting [3]

The state space depends on the environment: all possible perceptions of the environment by the agent form the state space. Similarly, the action space is defined as all possible actions that the agent can use. The reward function depends on state and action, and defines when and how much reward is received for a state-action pair. The transition function addresses which state is transitioned to after an action is taken in a state. The last component is the discount factor, which determines how much an agent takes future rewards into consideration.

When the MDP is known, there are common practices such as Dynamic Programming to solve the MDP by visiting all state-action pairs recursively [4]. On the other hand, when the MDP is not known to the agent, Reinforcement Learning practices are used. In RL, trajectories are sampled from the environment and agents learn from them. There are mainly two approaches to Reinforcement Learning problems: value-based methods and policy-based methods. There are also hybrid methods like Actor-Critic, in which value networks are present within the algorithm in addition to policies.

Value-based methods define their policy by acting greedily with respect to the value function; this way, the agent exploits its current knowledge of the environment. This can lead to sub-optimal policies, since agents need to explore new trajectories to reach optimality. There are multiple approaches to this dilemma, such as the epsilon-greedy strategy, where the agent sometimes acts randomly according to an epsilon value.

Policy-based methods try to maximize the cumulative reward by directly mapping states to actions. In this case, exploration is generally done by adding noise to the actions. There are also hybrid methods which utilize a value function within policy-based methods; these are called Actor-Critic methods.
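To make the sampled-trajectory setting concrete, a minimal Python sketch of the agent-environment loop is given below. The `env` and `agent` objects are generic placeholders following a Gym-style reset/step interface; this illustrates the cycle depicted in Figure 2 and is not code from our repository.

```python
# Minimal agent-environment loop (sketch). `env` follows a Gym-style
# reset()/step() interface and `agent` exposes act()/learn(); both are
# hypothetical placeholders, not this paper's implementation.
def run_episode(env, agent, max_steps=10_000):
    state = env.reset()
    episode_return = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                       # policy picks an action
        next_state, reward, done, _ = env.step(action)  # MDP transition + reward
        agent.learn(state, action, reward, next_state, done)
        episode_return += reward
        state = next_state
        if done:
            break
    return episode_return
```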
action pairs recursively [4]. On the other hand, when MDP with Q learning mostly depends on the action-
is not known to the agent, Reinforcement Learning practices exploration with epsilon-greedy methods. Noisy Net-
are used. In RL practices trajectories are sampled from envi- works introduces the capability of parameter space
ronment and agents learn from them. There are mainly two exploration. Parameter change drives state and action
approaches to Reinforcement Learning problems, value-based exploration indirectly.
methods, and policy-based methods. There are also, hybrid
methods like Actor-Critic, which in addition to policies, value These algorithms together form the Rainbow DQN ap-
networks are also present within the algorithms. proach. We have also used C51 output to obtain further
Value-based methods define their policy via acting greedily improvements on Q value distribution [13].
to the value function and this way, it uses its current knowledge
on the environment. This leads to sub-optimal policies since
agents need to explore new trajectories to obtain optimality. C. TORCS
There are multiple approaches to this dilemma like epsilon-
greedy strategy, where sometimes agent acts randomly using TORCS provides an API for AI agents to act and learn
epsilon value. from. This API has several observations like angle, speed,
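The Polyak-averaged target update mentioned above can be written in a few lines. The sketch below assumes PyTorch modules and uses the tau value of 10^-3 listed in Section III; it is illustrative, not the exact implementation in our repository.

```python
# Polyak (exponential moving average) update of a target value network.
# `value_net` and `target_value_net` are assumed to be PyTorch modules;
# tau = 1e-3 matches the hyperparameter listed in Section III.
import torch

@torch.no_grad()
def polyak_update(value_net, target_value_net, tau=1e-3):
    for param, target_param in zip(value_net.parameters(),
                                   target_value_net.parameters()):
        # target <- tau * online + (1 - tau) * target
        target_param.mul_(1.0 - tau).add_(tau * param)
```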
B. Rainbow DQN

Rainbow DQN [8] combines multiple improvements to DQN [9]. These improvements are:

• Double Q Learning [6] - This method is used to overcome the overestimation problem of Q networks.

• Prioritized Experience Replay [10] - Experiences for updates are picked with a priority. The most commonly used priority measure is the TD error.

• Dueling Networks [11] - Sometimes choosing the exact action does not matter much, but the value function estimate is still important. This method guarantees value estimation in all cases.

• Noisy Networks [12] - Exploration with Q learning mostly depends on action exploration with epsilon-greedy methods. Noisy Networks add the capability of parameter-space exploration; the parameter perturbations drive state and action exploration indirectly.

These components together form the Rainbow DQN approach. We have also used a C51 output to obtain further improvements through a distributional view of the Q values [13].
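As an illustration of how prioritized replay selects experiences, the sketch below turns absolute TD errors into sampling probabilities and importance-sampling weights. The alpha, beta and epsilon values are the PER hyperparameters listed in Section III; the function itself is a simplified stand-in for a real sum-tree buffer, not our implementation.

```python
# Prioritized Experience Replay sampling probabilities (illustrative sketch).
# alpha/beta/eps match the PER hyperparameters listed in Section III.
import numpy as np

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    priorities = (np.abs(td_errors) + eps) ** alpha   # p_i = (|delta_i| + eps)^alpha
    probs = priorities / priorities.sum()             # P(i) = p_i / sum_k p_k
    idx = np.random.choice(len(td_errors), batch_size, p=probs)
    # Importance-sampling weights correct the bias introduced by prioritization.
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()                          # normalize for stability
    return idx, weights
```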
C. TORCS

TORCS provides an API for AI agents to act and learn from. This API exposes several observations such as angle, speed, damage and gear. We are using 6 observation groups with a total of 29 dimensions from the API, which are:

• Angle - 1: Angle between the tangent of the track and the car

• Track - 19: Lidar sensor on the front of the car scanning 180 degrees

• TrackPos - 1: Distance from the middle of the track, greater than 0.5 if off the track

• Speed - 3: Cartesian speeds, where the x-axis always points to the front of the car

• Wheel speeds - 4: Angular speeds (rad/s) for each wheel

• RPM - 1: Engine speed

For actions, we are using acceleration, brake and steer.
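A sketch of how these readings can be flattened into the 29-dimensional state vector is given below. The dictionary keys are illustrative placeholders for the TORCS client fields; the exact wrapper used in our experiments lives in the repository linked in Section III.

```python
# Assemble the 29-dimensional state vector from TORCS sensor readings.
# The dictionary keys are illustrative placeholders for the client API fields.
import numpy as np

def make_state(obs):
    state = np.concatenate([
        [obs["angle"]],        # 1: angle to the track tangent
        obs["track"],          # 19: range-finder readings over 180 degrees
        [obs["trackPos"]],     # 1: lateral offset from the track center
        [obs["speedX"], obs["speedY"], obs["speedZ"]],  # 3: Cartesian speeds
        obs["wheelSpinVel"],   # 4: wheel angular speeds (rad/s)
        [obs["rpm"]],          # 1: engine speed
    ]).astype(np.float32)
    assert state.shape == (29,)
    return state
```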
III. METHODOLOGY

This section describes our implementation of the approaches explained above. First, the architectures of the algorithms used in this study are explained; then reward shaping and termination are detailed; and lastly, exploration, generalization and environmental changes are explained thoroughly. Our codebase and implementation can be found at https://github.com/kivancguckiran/torcs-rl-agent.

A. Architecture

1) SAC: Each neural network used by SAC consists of fully connected layers with 512, 256 and 128 units respectively, as seen in Figure 3. We used ReLU activations on the hidden layers and a Gaussian distribution over actions with a TanH activation on the output layer. We observed improvements on SAC when we added a single LSTM layer before the output layer; a sketch of this network is given after the hyperparameter list below.

Figure 3: SAC Architecture - (a) SAC V Network, (b) SAC Q Network, (c) SAC Policy Network; each is a 512-256-128 fully connected stack followed by an LSTM layer.

The hyperparameters for SAC-LSTM are listed below. These parameters were selected via trial and error using a simple grid search.

• Gamma: 0.99
• Tau: 10^-3
• Batch Size: 32
• Step Size: 16
• Episode Buffer: 10^3
• Actor Learning Rate: 3·10^-4
• Value Learning Rate: 3·10^-4
• Q Learning Rate: 3·10^-4
• Entropy Learning Rate: 3·10^-4
• Policy Update Interval: 2
• Initial Random Actions: 10^4
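For concreteness, a PyTorch sketch of a policy network with the shape described above (512-256-128 fully connected layers, a single LSTM layer before the output, and a tanh-squashed Gaussian head using the re-parameterization trick) is given below. The class and attribute names are ours; this is an illustration, not the repository code.

```python
# Sketch of the SAC-LSTM policy network shape described above (PyTorch).
# Module and attribute names are illustrative, not the repository's code.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, state_dim=29, action_dim=2, hidden=(512, 256, 128)):
        # action_dim=2 matches the continuous action space in Section III-E.
        super().__init__()
        layers, last = [], state_dim
        for h in hidden:                      # 512 -> 256 -> 128 with ReLU
            layers += [nn.Linear(last, h), nn.ReLU()]
            last = h
        self.body = nn.Sequential(*layers)
        self.lstm = nn.LSTM(last, last, batch_first=True)  # single LSTM layer
        self.mu = nn.Linear(last, action_dim)       # Gaussian mean
        self.log_std = nn.Linear(last, action_dim)  # Gaussian log-std

    def forward(self, states, hidden_state=None):
        # states: (batch, seq_len, state_dim)
        feats = self.body(states)
        feats, hidden_state = self.lstm(feats, hidden_state)
        mean = self.mu(feats)
        std = self.log_std(feats).clamp(-20, 2).exp()
        dist = torch.distributions.Normal(mean, std)
        action = torch.tanh(dist.rsample())  # re-parameterization trick + TanH squash
        return action, hidden_state
```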
We have implemented a custom replay buffer for the LSTM. Since the LSTM needs sequential samples, a standard experience replay is inappropriate. Thus, we buffer whole episodes sequentially and, at training time, select episodes randomly according to the batch size. We then randomly pick an index inside each episode and train on the following 16 samples.

We used automatic entropy tuning based on log probabilities. Before the LSTM, we also deployed an NSTACK mechanism, in which the 4 most recent states are stacked and used as the input. When we noticed that the LSTM outperforms the NSTACK approach, we abandoned it and continued with the LSTM.
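The episode-wise sampling described above can be sketched as follows, assuming each stored episode is a list of transition tuples; the data layout is an illustrative assumption, not the repository's buffer class.

```python
# Sketch of episode-wise sampling for the LSTM buffer described above.
# Each stored episode is assumed to be a list of
# (state, action, reward, next_state, done) tuples.
import random

def sample_sequences(episode_buffer, batch_size=32, step_size=16):
    batch = []
    episodes = random.choices(episode_buffer, k=batch_size)  # pick episodes at random
    for episode in episodes:
        if len(episode) < step_size:
            continue                                         # skip episodes that are too short
        start = random.randint(0, len(episode) - step_size)  # random window start
        batch.append(episode[start:start + step_size])       # 16 consecutive transitions
    return batch
```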
2) Rainbow DQN: The DQN network consists of three fully connected layers of 128 units each, as seen in Figure 4. The activation functions of the hidden layers are ReLU, as in the SAC architecture, but the output is a 51-atom distribution over the Q values, namely C51. As stated before, we use NoisyNet for exploration instead of the epsilon-greedy mechanism.

Figure 4: DQN Architecture - three 128-unit fully connected layers followed by C51 value distributions.

The hyperparameters for Rainbow DQN are listed below. These parameters were selected via trial and error using a simple grid search.

• N-Step: 3
• Gamma: 0.99
• Tau: 10^-3
• N-Step Weight Parameter: 1
• N-Step Q Regularization Parameter: 10^-7
• Buffer Size: 10^5
• Batch Size: 32
• Learning Rate: 10^-4
• Adam Epsilon: 10^-8
• Adam Weight Decay: 10^-7
• PER Alpha: 0.6
• PER Beta: 0.4
• PER Epsilon: 10^-6
• Gradient Clip: 10
• Prefill Buffer Size: 10^4
• C51 - V Minimum: -300
• C51 - V Maximum: 300
• C51 - Atom Size: 1530
• NoisyNet Initial Variance: 0.5

B. Reward Shaping

Like many TORCS AI developers, we noticed fast left-right maneuvers (slaloming) on straight track segments. We tried multiple reward functions to stabilize the car. In addition, when the agent steers off the track and turns backward, the environment resets; in this situation the agent does not try to recover from that state. We also tried to address this.

1) Reward Functions: The parameters used in the reward functions are defined as:

• Vx: longitudinal velocity
• Vy: lateral velocity
• θ: angle between the car and the track axis
• trackpos: distance between the center of the road and the car

The reward functions we tried are formulated below; a code sketch of the one we finally used follows the list.

• No Trackpos: The track position is ignored.
  Vx·cos θ - |Vx·sin θ|

• Trackpos: The track position is taken into consideration.
  Vx·cos θ - |Vx·sin θ| - |Vx·trackpos|

• EndToEnd [14]: The car's angle is not used as a penalty.
  Vx·(cos θ - |trackpos|)

• DeepRLTorcs [15]: The track position penalty is discounted with the car's angle, and the car's angle is used as a penalty. Additionally, the lateral velocity is used as a penalty, discounted towards the car's angle.
  Vx·cos θ - |Vx·sin θ| - |2·Vx·sin θ·trackpos| - Vy·cos θ

• Sigmoid: Same as the previous reward function; the only difference is that rewards are flattened near the track center to overcome slaloming.
  Vx·sigmoid(3·cos θ) - Vx·sin θ - Vy·sigmoid(3·cos θ)
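The sketch below implements the DeepRLTorcs-style reward exactly as written in the bullet above; the variable names are ours and the function is illustrative rather than the repository's exact code.

```python
# DeepRLTorcs-style reward, transcribed from the formula above (illustrative).
import math

def deeprl_torcs_reward(vx, vy, theta, trackpos):
    # vx, vy: longitudinal / lateral speed; theta: angle to the track axis;
    # trackpos: normalized offset from the track center.
    return (vx * math.cos(theta)
            - abs(vx * math.sin(theta))                   # penalize heading error
            - abs(2.0 * vx * math.sin(theta) * trackpos)  # off-center penalty, angle-discounted
            - vy * math.cos(theta))                       # penalize lateral (sliding) velocity
```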
Our agent uses the "DeepRLTorcs" reward function formulated above. We did not see any improvement with the "Sigmoid" function, but it looks promising for overcoming slaloming on straight track segments, since it soft-clips the cosine term applied to the longitudinal velocity. Reward shaping with plateaus in the center of the track might overcome slaloming in the future.

2) Termination: The active episode is terminated if no progress is achieved within 100 timesteps, where progress is defined as reaching 5 km/h. Similarly, the agent is given an additional 100 timesteps to recover from turning backward. This way we want to see the agent try to get back on track after spins.

C. Exploration

Exploration in this environment is done by maximizing entropy in SAC and by NoisyNets in the DQN algorithm. However, learning to utilize the brake is a challenge, since using the brake action decreases the reward, so the agent avoids using the brake action altogether. We employed the Try-Brake mechanism to overcome this problem.

1) Try-Brake: The Try-Brake mechanism is similar to Stochastic Braking [15]. After a certain number of timesteps, the agent is forced to use the brake 10% of the time, again for a certain amount of time. This way we hope the agent will learn to speed up on straight segments and brake before and during turns. These forced trials are scheduled according to a Gaussian-shaped curve, as seen in Figure 5.

Figure 5: Try-Brake Distribution - forced brake percentage (0 to 0.10) over training timesteps (0 to 200000).
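One possible implementation of such a schedule is sketched below: a Gaussian-shaped curve over the training timesteps gives the probability of forcing the brake, peaking at 10%. The center and width of the curve are illustrative guesses roughly consistent with Figure 5, not values taken from our configuration.

```python
# Gaussian-shaped Try-Brake schedule (sketch). Only the 10% peak comes from
# the text; the center and width below are illustrative guesses.
import math
import random

def forced_brake(timestep, peak=0.10, center=100_000, width=35_000):
    # Probability of overriding the agent's brake action at this timestep.
    p = peak * math.exp(-0.5 * ((timestep - center) / width) ** 2)
    return random.random() < p

# Usage sketch: if forced_brake(t): override the policy's brake output with 1.0
```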
D. Generalization

We added nearly every road track to the training and test set in order to generalize the agent's behavior to unseen tracks. This way, we try to prevent the agent from overfitting and memorizing the tracks. We avoided the Spring track since it is very long.

The tracks used for training and testing are listed in the first column of Table IV. Agents are trained on these tracks in a circular fashion: each track is trained for 5 episodes, then training skips to the next track.

E. Action Spaces

We prepared tailored action spaces to make learning easier and faster for the agent. Below are two of the environment variants we tried. State and action values are normalized between -1 and +1.

1) Continuous Action Space: In this environment, we reduced the action size to 2. The first action value is used for both accelerating and braking; since the agent should not use them together, a single value is sufficient for both. Values smaller than zero are used for braking and greater values are used for accelerating. The second action value is used for steering. We use this environment for the SAC algorithm.
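The mapping from this 2-dimensional action to TORCS controls can be sketched as follows; the output field names are illustrative placeholders.

```python
# Map the 2-dimensional continuous action in [-1, 1] to TORCS controls (sketch).
import numpy as np

def to_torcs_controls(action):
    accel_brake, steer = np.clip(action, -1.0, 1.0)
    return {
        "accel": max(accel_brake, 0.0),   # positive part -> throttle
        "brake": max(-accel_brake, 0.0),  # negative part -> brake
        "steer": float(steer),            # steering stays in [-1, 1]
    }
```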
2) Discretized Action Space: Since our DQN algorithm requires discrete actions, we discretized the action space into 21 actions: 7 steering points in each of 3 intervals. The first interval is accelerating and steering, the second is steering only, and the last is braking and steering. The actions are listed in Table I; a short construction sketch follows the table.

Acceleration  Brake  Steer
+1            -1     {-1, -0.66, -0.33, 0, 0.33, 0.66, 1}
-1            +1     {-1, -0.66, -0.33, 0, 0.33, 0.66, 1}
-1            -1     {-1, -0.66, -0.33, 0, 0.33, 0.66, 1}

Table I: Discretized Actions
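Building the 21-action set is a small cross product, as sketched below; the tuple layout is ours.

```python
# Build the 21 discretized (acceleration, brake, steer) actions from Table I.
STEER_POINTS = [-1, -0.66, -0.33, 0, 0.33, 0.66, 1]
INTERVALS = [(+1, -1), (-1, +1), (-1, -1)]  # accelerate / brake / coast

DISCRETE_ACTIONS = [(accel, brake, steer)
                    for accel, brake in INTERVALS
                    for steer in STEER_POINTS]
assert len(DISCRETE_ACTIONS) == 21
```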

IV. RESULTS

We have achieved significant success with both approaches. The SAC-LSTM agent was more data-efficient and more performant in terms of track consistency and speed. Episode versus score can be seen in Figure 6. Scores, i.e. the accumulated rewards, are aggregated over 250 episodes for brevity.

Figure 6: Episode vs Score. Maximum and standard deviation can be seen in Table II.

Algorithm   Standard Deviation  Maximum
DQN         306.25              1255.94
SAC-LSTM    433.22              1395.0

Table II: Standard deviation and maximum values of scores

Further analysis is given for the SAC-LSTM agent. Since this study is designed around a car race, speed is the crucial factor. The performance of the agent in terms of speed versus episode can be seen in Figure 7. Values are aggregated over 250 episodes.

Figure 7: Episode vs Speeds. Maximum and standard deviation can be seen in Table III.

Type       Standard Deviation  Maximum
Max Speed  24.94               216.42
Avg Speed  28.52               164.59

Table III: Standard deviation and maximum values of speeds

As discussed before, there are multiple tracks in the environment with different difficulties. Table IV shows the results of the SAC-LSTM agent on each track.

Track        Max Score  Avg Score  Max Speed  Avg Speed
forza        1395.0     742.95     216.0      143.27
g-track-1    954.0      611.36     200.36     131.56
g-track-2    1182.0     912.06     205.65     149.32
g-track-3    1154.0     574.16     182.0      109.11
ole-road-1   1103.0     209.70     216.42     121.65
ruudskogen   1294.0     575.58     199.78     117.08
street-1     1208.0     597.32     198.0      117.44
wheel-1      1248.0     758.45     206.05     139.16
wheel-2      1152.0     636.49     216.19     128.87
aalborg      922.0      183.84     189.16     80.54
alpine-1     1264.0     822.09     203.18     118.94
alpine-2     1119.0     677.13     194.43     103.02
e-track-1    1080.0     325.40     207.54     119.02
e-track-2    1212.0     907.00     179.65     104.55
e-track-4    1293.0     881.47     214.73     152.13
e-track-6    1201.0     583.34     213.59     130.12
eroad        1264.0     883.87     200.59     131.87
e-track-3    1383.0     895.41     211.87     137.35

Table IV: Scores of the SAC-LSTM agent on different tracks

The final agents were trained for around 6·10^6 timesteps and 5·10^3 episodes on hardware with an Intel i9-9900K CPU and a GeForce RTX 2060 GPU.

V. CONCLUSION

We have implemented two different algorithms for TORCS with great success. The agents complete tracks most of the time at around 140 km/h average speed and around 190 km/h maximum speed. It is observed that the agents generalize well to unseen tracks. Before the Try-Brake implementation, the agent did not use the brake action at all, in order to avoid low rewards; this was fixed by the Try-Brake mechanism.

Another problem we faced was the fast left-right maneuvering, also referred to as slaloming, which was partly solved via reward shaping and the LSTM. We noticed that this problem is an issue of the reward function, so better reward shaping might overcome it in the near future. Since these maneuvers happen frequently at high speeds, use cases other than racing might not face this problem. A race of the SAC-LSTM agent against scripted bots, and races between the Rainbow DQN and SAC-LSTM agents recorded from our trained agent's perspective, can be viewed at https://youtu.be/f82EBvPKyDI.

We argue that the reasons SAC and SAC-LSTM performed better than the Rainbow DQN algorithm are the exploration method and the continuous action space. SAC tries to maximize entropy, which allows the agent to explore uncertain regions of the action space. Additionally, since SAC's policy network outputs continuous actions, braking, accelerating and steering can be controlled continuously, unlike Rainbow DQN's 21 discretized actions. This difference might have helped obtain stability on the road.
This study is a step towards using Deep Reinforcement Learning practices for self-driving cars. It can be seen that these methods are capable of learning to drive without supervision. Furthermore, these agents are easily transferable to real-world robotics platforms. Together with other machine learning practices, Deep Reinforcement Learning methods are expected to be used actively in the autonomous car industry.

ACKNOWLEDGEMENTS

This work was part of the term project for the BLG604E Deep Reinforcement Learning course at ITU. The project consisted of an autonomous car race with the other participants from the class; our agent was the fastest and won first place.

We thank Can Erhan for his contributions to the code base and the implementations. We also want to thank Onur Karadeli for his points regarding the reward functions.

REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, p. 436, 2015.
[2] B. Wymann, E. Espié, C. Guionneau, C. Dimitrakakis, R. Coulom, and A. Sumner, "TORCS, The Open Racing Car Simulator," software available at http://torcs.sourceforge.net, vol. 4, no. 6, 2000.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[4] R. Bellman, "Dynamic programming," Science, vol. 153, no. 3731, pp. 34–37, 1966.
[5] T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel et al., "Soft actor-critic algorithms and applications," arXiv preprint arXiv:1812.05905, 2018.
[6] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[7] B. T. Polyak and A. B. Juditsky, "Acceleration of stochastic approximation by averaging," SIAM Journal on Control and Optimization, vol. 30, no. 4, pp. 838–855, 1992.
[8] M. Hessel, J. Modayil, H. Van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar, and D. Silver, "Rainbow: Combining improvements in deep reinforcement learning," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[9] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529, 2015.
[10] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized experience replay," arXiv preprint arXiv:1511.05952, 2015.
[11] Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. De Freitas, "Dueling network architectures for deep reinforcement learning," arXiv preprint arXiv:1511.06581, 2015.
[12] M. Fortunato, M. G. Azar, B. Piot, J. Menick, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin et al., "Noisy networks for exploration," arXiv preprint arXiv:1706.10295, 2017.
[13] M. G. Bellemare, W. Dabney, and R. Munos, "A distributional perspective on reinforcement learning," in Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 2017, pp. 449–458.
[14] M. Jaritz, R. De Charette, M. Toromanoff, E. Perot, and F. Nashashibi, "End-to-end race driving with deep reinforcement learning," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 2070–2075.
[15] B. Renukuntla, S. Sharma, S. Gadiyaram, V. Elango, and V. Sakaray, "The road to be taken, a deep reinforcement learning approach towards autonomous navigation," https://github.com/charlespwd/project-title, 2017.
