Reinforcement Learning in Controls

Conor Healy
Department of Electrical and Computer Engineering, Kennesaw State University
1100 South Marietta Pkwy SE, Marietta, GA 30060, United States of America
[email protected]

Abstract: The purpose of this article is to explore in depth the topic of reinforcement learning within the field of machine learning. It discusses its origins, its development over time, its uses across a number of different industries and its various applications to control systems.

Keywords: Include at least 5 keywords or phrases

I. INTRODUCTION

Reinforcement learning is a concept in machine learning that primarily deals with the careful selection of training criteria and the implementation of machine learning models that adapt to the designated goals of a specific project. Typically a system has a clearly defined end goal, usually expressed as a numerical reward, so that during training and testing each action it takes can be evaluated by how well it advances that goal.

Oftentimes a reinforcement learning system needs its performance criteria broken down so that it is also rewarded for the intermediate steps required to reach the final goal. Careful selection of the training criteria is therefore an absolute must in order to have a functioning product at the end of the day.

With the ever-growing prevalence of capable computing devices, the industry has found itself in a situation where theoretical algorithms of the past are proving more feasible than was ever thought possible. Modern AI programming makes use of hardware that is drastically more powerful than that of decades past. This increase in hardware capability has allowed for the practical use of image recognition algorithms, natural language processing algorithms and code enabling robotic systems to move around as naturally as living organisms do across irregular terrain and within industrial environments.

II. BACKGROUND

In order to properly discuss the ins and outs of reinforcement learning, one must first discuss the origins of both machine learning and computer science.

Believe it or not, the foundations of machine learning were laid very early in the history of computer science. Some of the initial mathematical foundations of neural network architecture were hypothesized in the 1940s. The primary reason the approach did not take off at that time was a lack of powerful computing hardware. The ideas involved were capable of doing the job in theory, but the field would need to wait until the late 1900s and early 2000s for more significant progress to be made.

A typical reinforcement learning program requires a very capable graphics card, along with a specialized desktop computer built around hardware that is typically associated with visual processing. That being said, though some of this hardware can be pricey, it is still within the range of affordability for the everyday working person, which lowers the barrier to entry for people looking to investigate these programs. In fact, the average video game enthusiast likely already has the necessary
hardware to run these algorithms if they desired to.

As of right now the field primarily concerns itself with teaching systems to visually recognize objects, perform tasks in simulated environments and generally optimize the behavior of systems so that they produce the desired output. Many of these systems are trained only on highly idealized scenarios or do not have a broad enough knowledge base to respond effectively to unforeseen events, so a large quantity of varied training data is necessary.

III. IMPLEMENTATION AND EVALUATION

Essentially, the process requires four main steps. The designer of the system must collect a large amount of data and then organize and label that data. After this they must implement a training program that repeatedly tests, evaluates and modifies the decision making of the autonomous system. After enough time spent training, the program would ideally be capable of real-world applications.

For step one, accumulating vast amounts of data, researchers tend to scour the internet for usable image and audio data with web scraping software. The larger the data set, the more capable the program can be, given that it is exposed to a large number of different cases for each defined class of object. Alternatively, many programmers opt to use previously accumulated data sets that have been made open to the public. They then explore different code architectures so that they can produce more efficient algorithms using the same data sets.

In the second step, the system designer must take the collection of data and organize and label it according to the criteria that they hope their algorithms will learn and handle during typical operation. As one might expect, this is very time-consuming and labor-intensive, so many individuals find it preferable to make use of the aforementioned pre-made data sets, often created by research organizations who have graciously open-sourced their hard work in hopes of fostering a community of machine learning programmers with different ideas on how to implement efficient code.

The next step in the process deals primarily with training. At first the program does not know how to do its job well in the slightest. It essentially makes poor guesses in hopes of getting the correct answer. Over time though, after running test after test, receiving the results and modifying the behavior for countless generations, the versions of the program that improve are held on to while the ones that do worse get discarded. This process continues until the most successful version of the algorithm is chosen as the final product.

Once a desirable, or simply acceptable, model has been produced, it gets put to use and tested in the field. For image recognition systems this is done by testing the model's ability to recognize images that it has never seen before but that still belong to the classes it was trained on. For audio recognition systems it means testing new audio files from the categories the system was made to listen for.

When these methods are used to optimize a control system, they typically strive to maximize the desired outcome and minimize inaccuracies. For example, in a robotic system the controller driving the actuators responsible for balancing and moving a device may need to tune its kinematic equations so that the system stays balanced. Alternatively, it may need to learn when and where to move its parts in order to traverse an environment. These systems can take multiple inputs and use them as indicators for the desired output. This is currently used a lot on Wall Street. Stock trading bots look at a number of factors for different stocks, the trends in those stocks and the market as a whole, then use that information to determine whether to invest in or divest from a particular company.
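
To make the training idea concrete for a control system, the following is a minimal sketch under stated assumptions, not a definitive implementation. It applies the loop of keeping the versions that improve and discarding the ones that do worse to two controller gains on a toy balancing task. The search used here is a simple random-search hill climb, swapped in for illustration in place of a full reinforcement learning algorithm. The plant model, the reward, the function names run_episode and train, and every numeric constant are hypothetical choices made for this example and do not come from the article.

```python
import random


def run_episode(kp, kd, steps=200, dt=0.02):
    """Score one attempt at a toy 1-D balancing task (higher is better).

    Assumed plant: a linearised inverted pendulum whose offset x from
    upright is pushed away by gravity (accel = 9.81 * x) and opposed by a
    controller force of -kp * x - kd * v. All values are illustrative.
    """
    x, v = 0.1, 0.0                      # start slightly off balance, at rest
    total_reward = 0.0
    for t in range(steps):
        force = -kp * x - kd * v         # controller being evaluated
        accel = 9.81 * x + force         # unstable dynamics plus control effort
        v += accel * dt                  # semi-implicit Euler integration
        x += v * dt
        if abs(x) > 1.0:                 # fell over: penalise remaining steps, stop
            total_reward -= steps - t
            break
        total_reward -= x * x            # small penalty for drifting off upright
    return total_reward


def train(iterations=300, seed=0):
    """Hill-climb the gains: keep a candidate only if it scores better."""
    rng = random.Random(seed)            # seeded so runs are repeatable
    best_gains = (0.0, 0.0)              # the untrained controller does nothing
    best_reward = run_episode(*best_gains)
    history = [best_reward]
    for _ in range(iterations):
        # Propose a randomly perturbed version of the best controller so far.
        candidate = (best_gains[0] + rng.gauss(0.0, 2.0),
                     best_gains[1] + rng.gauss(0.0, 2.0))
        reward = run_episode(*candidate)
        if reward > best_reward:         # improved versions are held on to
            best_gains, best_reward = candidate, reward
        history.append(best_reward)      # worse versions are simply discarded
    return best_gains, best_reward, history


if __name__ == "__main__":
    gains, reward, _ = train()
    print("tuned gains (kp, kd):", gains)
    print("best total reward:", reward)
```

Because a candidate is accepted only when it scores higher, the recorded score can never get worse from one generation to the next. A full reinforcement learning method would go further and learn from the reward received at each time step rather than from one score per attempt.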
Strikingly, stock trading bots can carry out hundreds of such transactions over the course of seconds.

IV. CONCLUSION

In conclusion, reinforcement learning is the process of training a machine learning algorithm to perform an optimized task. Although many of these algorithms deal with visual and audio data, these systems have applications to physical control systems as well. They can manage everything from climate control in warehouses to autonomous cars and robotic systems.

V. FUTURE OF THE FIELD

Given the sheer amount of research and development being poured into this field of study, the future seems to hold a lot of promise. There are plenty of applications to the medical industry. Systems can be made to identify tumors automatically within MRI scans. New medical drugs could be developed faster and more cheaply than ever. Even a kind of automated doctor could be produced in the likeness of IBM's Watson.

One of the major advancements being lauded by those researching medical applications of machine learning is drug development. There is a problem in medicine referred to as the protein folding problem, which refers to the difficulty of predicting the three-dimensional shape a protein will fold into from its chemical makeup. There are methods of determining the chemical makeup of certain proteins, but there are a number of ways in which a protein can fold, and that affects the way it interacts with drugs and with its surroundings. A professional can develop a protein with the exact same chemical makeup, but if it is not folded correctly its behavior can become erratic and damaging. After years of study, however, a research organization known as DeepMind was able to develop an AI system that predicts the proper structure of proteins with a reliability of about 90 percent. This could speed up the pace of drug development significantly while also increasing the reliability of said drugs. With more research the accuracy of this system will only increase, so who knows what could be on the horizon?

Another major area of development deals with IBM's Watson. This is a machine learning system that is capable of processing millions of medical documents, whether they be research papers or the documented medical histories of patients. It can identify even the most obscure symptoms, cross-reference them with its knowledge base and identify helpful treatments that would otherwise have required countless medical experts to find.

Although this research may enable many positive outcomes for society at large, there could also be a number of unforeseen negative consequences to address and plan for. There are concerns that facial recognition software may further erode people's privacy. Also, with Deepfake algorithms, misinformation could foreseeably get worse.

Facial recognition software is a fairly interesting application of machine learning. It could potentially be used to find lost people, or as a form of biometric scanner that allows a person access to something they want to keep secure. That being said, it can also be implemented in ways that would be harmful. Some governments around the world have already been using facial recognition software to identify protesters and clamp down on political dissent. It does not always get used for this purpose. In fact, sometimes it is used to identify violent rioters or other outright criminals, but it is clear that some boundaries need to be drawn in order to prevent the abuse of this technology.

Another interesting, yet concerning, application of machine learning is a concept known as Deepfakes. A Deepfake is produced by using deep learning to transfer the likeness of someone's face onto footage of another person. Movie production agencies have been using it for overlaying star actors' faces onto stunt
doubles to make a movie more immersive. It has also been used to digitally de-age people so that they look younger on camera. It is easy to see the benefit in these applications. However, there is also something troubling about how easily this could increase the amount of disinformation through the creation of false video footage of influential figures. Falsified videos could show politicians instigating violence, or could strain treaties between allied nations.

So, after everything just discussed, the benefits and shortcomings of reinforcement learning and machine learning in general should be evident. The field holds the promise of bettering the medical industry, robotics and manufacturing, but it also needs decent, well-thought-out regulation to ensure the best outcomes.
