AGAMDEEP
On
TO STUDY IOT SECURITY
Submitted in partial fulfilment of the requirements for the award of the degree of
Bachelor of Information Technology
in
2023
DECLARATION
I, Agamdeep Singh of B.Tech (IT), hereby declare that the in-house project report "TO STUDY IOT SECURITY", which is submitted by me to the Department of Computer Science & Engineering, Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in Computer Science & Engineering, has not previously formed the basis for the award of any degree, diploma or other similar title or recognition.
Agamdeep Singh
A23053220016
CERTIFICATE
This is to certify that Agamdeep Singh, student of B.Tech, Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Haryana, has completed his NTCC project entitled "TO STUDY IOT SECURITY" under my guidance and supervision. The work was satisfactory. He has shown complete dedication and devotion to the given project work.
ACKNOWLEDGEMENT
Inspiration and motivation have always played a key role in the success of any venture, and the right guidance, assistance and encouragement of other people have played an essential part.
Agamdeep Singh
A23053220016
TABLE OF CONTENTS
1. DECLARATION ii
2. CERTIFICATE iii
3. ACKNOWLEDGEMENT iv
4. ABSTRACT 1
5. INTRODUCTION 2-10
6. REVIEW OF LITERATURE 11-12
7. METHODOLOGY 13
8. DISCUSSION 14-15
9. CONCLUSION 16-18
10. REFERENCES 19-20
ABSTRACT
CHAPTER-1
INTRODUCTION
In artificial intelligence (AI), reinforcement learning (RL) is a well-known method for teaching agents how to behave optimally through practice and feedback from their environment. The feedback mechanism is central to RL, since it directs the learning process and enhances an AI system's decision-making skills. This review explains why and how reinforcement learning uses feedback mechanisms, and what this means for the future of artificial intelligence. In RL, feedback mechanisms provide an agent with evaluative signals about the value of its actions and direct it towards its objectives. Reward signals, value functions, and policy gradients are all examples of such mechanisms. Because they can learn from their experiences via feedback, RL agents are able to maximise rewards while minimising costs. Feedback plays a crucial role in enabling agents to navigate challenging environments, adjust their strategies, and ultimately improve their performance over time.
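To make this feedback loop concrete, the following minimal Python sketch shows the canonical interaction in which the environment returns an evaluative reward signal after every action. The toy environment and random placeholder policy are invented purely for illustration and are not drawn from any particular library.

    import random

    class LineWorld:
        """Toy environment: the agent walks on positions 0..4 and is rewarded at position 4."""
        def __init__(self):
            self.state = 0

        def step(self, action):
            # action is -1 (left) or +1 (right); the returned reward is the feedback signal
            self.state = max(0, min(4, self.state + action))
            reward = 1.0 if self.state == 4 else 0.0
            done = self.state == 4
            return self.state, reward, done

    env = LineWorld()
    state, total_reward, done = env.state, 0.0, False
    while not done:
        action = random.choice([-1, 1])         # placeholder policy; RL would improve this
        state, reward, done = env.step(action)  # evaluative feedback from the environment
        total_reward += reward
    print("episode return:", total_reward)

A learning agent would replace the random choice with a policy that is updated from the stream of rewards, which is exactly the role of the mechanisms discussed below.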
The efficacy and efficiency of feedback mechanisms have a major bearing on an RL algorithm's ability to learn and on its overall performance. Reward shaping, exploration-exploitation trade-offs, and credit assignment are only a few of the obstacles that must be overcome when designing successful feedback systems. The type and quality of feedback shape an agent's learning path and the rate at which it converges to optimal behaviour.
Remarkable progress has been made in many fields, including game playing, robotics, and even autonomous driving, thanks to recent developments in RL research, notably the integration of powerful neural networks with reinforcement learning. The design and implementation of dynamic feedback mechanisms are becoming more important for these complex RL systems, which have intricate reward structures, hierarchical policies, and multi-agent interactions. Artificial intelligence (AI) systems can now make informed judgements based on raw data thanks to deep reinforcement learning methods, which have shown that they can learn directly from high-dimensional sensory input.
Although significant strides have been made, many problems and difficulties still exist in the field of RL feedback systems. Maintaining a focus on the interpretability and clarity of the feedback given to RL agents is essential. Another active area of study is the incorporation of human preferences and ethical concerns into the feedback loop, with the goal of ensuring that AI system behaviours are consistent with societal values and free of bias.
The field of artificial intelligence (AI) has seen the rise of a paradigm called reinforcement learning (RL), which helps agents acquire optimal behaviours by interacting with their environment and adapting to the feedback they receive. Game playing, robotics, recommendation systems, and autonomous vehicles are just a few of the areas where RL has shown promising results. The feedback mechanism is crucial to RL since it guides the learning process and ultimately the agent's decision-making. The foundations of RL's use of feedback can be traced to the earliest days of the field. The concept of operant conditioning serves as inspiration for RL algorithms, which learn from experience with the help of rewards and punishments. The agent's objective is to develop a policy that maximises long-term rewards.
While RL has come a long way, there are still many unanswered questions and obstacles to be tackled. Reward shaping is one such difficulty: it entails creating useful reward structures that drive learning with informative feedback. Agents need to explore their environment to discover new strategies, but they also need to exploit existing knowledge. Another difficulty is credit assignment, which is attributing outcomes to the actions responsible for them. The need to understand how AI systems arrive at their conclusions has also increased the focus on the interpretability and explainability of the feedback given to RL agents. Especially in safety-critical fields, it is crucial that RL algorithms can offer human-comprehensible explanations for their decisions. These technical obstacles have been joined by the growing weight of ethical concerns. The feedback given to RL agents might inadvertently magnify biases in the data or cause them to behave inappropriately. For RL systems to be developed and used ethically, it is essential that feedback takes into account fairness, transparency, and accountability.
When studying feedback mechanisms within reinforcement learning for AI, reward design is an important subtopic. It entails designing reward functions that aid RL agents' learning by providing informative feedback. The design of agents' rewards has a significant impact on their actions, which in turn affects their efficiency and success. In reinforcement learning (RL), rewards are used to motivate the agent to behave in a certain way. The agent may be guided towards actions that maximise cumulative rewards or minimise costs with the feedback provided by a well-designed reward function.
Fig.1
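As a simple illustration (the grid world, goal location, and penalty values below are hypothetical, chosen only to demonstrate the idea), a reward function encodes the designer's objective and thereby determines what feedback the agent receives:

    def goal_reward(state, goal=(3, 3)):
        """A sparse reward: +1 only when the agent reaches the goal cell, 0 otherwise."""
        return 1.0 if state == goal else 0.0

    def step_penalised_reward(state, goal=(3, 3)):
        """A denser alternative: a small penalty per step nudges the agent
        towards reaching the goal quickly (maximising reward = minimising cost)."""
        return 1.0 if state == goal else -0.01

    # The same trajectory scored under both designs
    trajectory = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
    print(sum(goal_reward(s) for s in trajectory))            # 1.0
    print(sum(step_penalised_reward(s) for s in trajectory))  # 1.0 - 6 * 0.01 = 0.94

Which of the two designs is preferable depends on the task: the first gives feedback only at the goal, while the second continuously signals that shorter paths are better.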
However, establishing effective rewards may be difficult, since it calls for thought about many different elements and trade-offs. The reward design process involves striking a balance between the two extremes of reward density and reward sparsity. It is challenging for an agent to properly gain insight into its environment when rewards are sparse and only offered at specific stages or when certain objectives are fulfilled. Dense rewards, on the other hand, offer feedback for every time step or action and can speed up the learning process, but they risk problems like reward overfitting and suboptimal behaviour. Finding the sweet spot between the two is essential for promoting efficient learning and preventing undesirable agent behaviour.
Shaped rewards, or composite rewards, are another component of reward design. Additional rewards or penalties may be used to shape the agent's behaviour towards the intended goals. These supplementary rewards may be based on heuristics and domain expertise to further steer the learning process. To account for the long-term results of an agent's actions, composite rewards are built by combining reward signals received immediately with those received later. The problem of sparse rewards may be mitigated by the use of reward shaping or composite rewards, which provide the agent with more direct feedback.
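One widely used formulation of this idea is potential-based reward shaping, in which a bonus derived from a heuristic potential function is added to the base reward. The sketch below is a minimal illustration under assumed names and values, not a prescription:

    def sparse_reward(state, goal=(3, 3)):
        """Base reward: +1 only at the goal."""
        return 1.0 if state == goal else 0.0

    def potential(state, goal=(3, 3)):
        """Heuristic potential: negative Manhattan distance to the goal."""
        return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

    def shaped_reward(state, next_state, gamma=0.99, goal=(3, 3)):
        """Potential-based shaping: F = gamma * phi(s') - phi(s) is added to the
        sparse reward, giving denser feedback while preserving the underlying goal."""
        shaping = gamma * potential(next_state, goal) - potential(state, goal)
        return sparse_reward(next_state, goal) + shaping

    # A single transition towards the goal now yields positive feedback immediately.
    print(shaped_reward((0, 0), (1, 0)))  # > 0 even though the goal is not yet reached

The shaping term rewards progress towards the goal at every step, which addresses the sparse-reward problem described above.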
Motivating agents based on intrinsic interest, rather than external rewards alone, is a relatively new idea in reward design. An intrinsic drive to learn new things, try out new experiences, or broaden one's skill set is itself rewarded. By adding intrinsic rewards, agents can be motivated to explore and learn from their surroundings even in the absence of explicit external incentives. This method encourages curiosity-driven learning and may improve RL agents' capacity for generalisation and adaptation. It is important to keep biases and unforeseen effects in mind while designing reward systems: careful consideration is needed to avoid inadvertently rewarding or punishing specific behaviours or causing unintended results. Ethical considerations and aligning incentives with human values are important features of reward design.
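A simple way to operationalise intrinsic motivation, shown here purely as an assumed illustration, is a count-based novelty bonus that rewards visiting rarely seen states:

    from collections import defaultdict
    import math

    visit_counts = defaultdict(int)

    def intrinsic_bonus(state, scale=0.1):
        """Count-based novelty bonus: states visited rarely yield a larger
        intrinsic reward, encouraging exploration even without external reward."""
        visit_counts[state] += 1
        return scale / math.sqrt(visit_counts[state])

    def total_reward(extrinsic, state):
        """The agent learns from the sum of external and intrinsic feedback."""
        return extrinsic + intrinsic_bonus(state)

    print(total_reward(0.0, (2, 1)))  # first visit: 0.1
    print(total_reward(0.0, (2, 1)))  # repeat visit: ~0.07, the novelty has decayed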
1.4 EXPLORATION AND EXPLOITATION STRATEGIES
Finding a balance between exploring uncharted options and exploiting past successes is a central goal of reinforcement learning. Key exploration and exploitation strategies are outlined below.
Fig.2
Trade-off: There is tension between the goals of exploration and exploitation. Finding a
happy medium between the two will allow for optimal learning and decision-making.
Overexploration may waste resources, while overexploitation can cause poor decision-
making.
Upper Confidence Bound (UCB): By estimating the uncertainty and a confidence interval for each action's value, the UCB algorithm strikes a compromise between exploration and exploitation. Actions are chosen using an estimate of their value plus an exploration bonus that depends on the estimated level of uncertainty. This approach emphasises exploration at the outset but shifts towards exploitation as more data is collected.
Multi-Armed Bandits: The multi-armed bandit problem is a time-tested analytical tool for studying exploration-exploitation strategies. To maximise the overall payout, the agent must strategically choose actions (pulling arms) among a collection of slot machines (bandits). The many algorithms and techniques developed to address the multi-armed bandit problem provide insight into exploration and reward trade-offs.
Contextual Exploration: In certain circumstances, the agent may have to take background knowledge into account during exploration and exploitation. To make better choices, contextual exploration techniques take into account extra data, such as state features. These approaches aim to carry what is learnt in one context over to related sets of actions.
Exploration Decay: Slowing down the pace of exploration can be useful in certain situations. In the early stages, the agent learns about its environment through exploration, but in later stages it uses what it has learnt to its advantage. As the agent gains experience, exploration decay strategies lower the exploration weight to a more manageable level.
The suitability of each strategy depends on the AI context: the agent's prior knowledge, the nature of the task at hand, and the desired balance between exploration and exploitation all factor into the decision of which approach to use. A minimal sketch of two of these strategies follows.
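The following sketch (arm payoffs and parameters are invented for illustration) contrasts epsilon-greedy and UCB action selection on a simple multi-armed bandit, where each arm's value estimate is updated from the reward feedback it produces:

    import math
    import random

    true_means = [0.2, 0.5, 0.8]          # hypothetical arm payoffs (unknown to the agent)
    counts = [0] * len(true_means)        # times each arm has been pulled
    values = [0.0] * len(true_means)      # running estimate of each arm's value

    def pull(arm):
        """Bernoulli reward drawn from the arm's true mean."""
        return 1.0 if random.random() < true_means[arm] else 0.0

    def epsilon_greedy(epsilon=0.1):
        """Explore a random arm with probability epsilon, otherwise exploit."""
        if random.random() < epsilon:
            return random.randrange(len(values))
        return max(range(len(values)), key=lambda a: values[a])

    def ucb(t):
        """Pick the arm with the highest value estimate plus an uncertainty bonus."""
        for a in range(len(values)):
            if counts[a] == 0:            # try every arm at least once
                return a
        return max(range(len(values)),
                   key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

    for t in range(1, 1001):
        arm = ucb(t)                      # swap in epsilon_greedy() to compare
        reward = pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update

    print("estimated arm values:", [round(v, 2) for v in values])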
1.5 POLICY IMPROVEMENT AND OPTIMISATION
Reinforcement learning (RL) is a technique for teaching an agent to make better decisions based on past experience in order to maximise rewards or minimise costs. In this step, the agent's policy is refined in light of new information gleaned from its environment, and different algorithms and methods are used to improve the agent's decision-making skills.
Fig.3
Policy Iteration: Policy iteration is an iterative method for improving a policy. The process consists of an evaluation phase followed by an improvement phase. The value function of the current policy is computed during the policy evaluation phase. Then, the policy is improved by taking the value function into account. This procedure is iterated until the policy eventually settles on the best possible solution.
Q-Learning: The model-free RL technique known as Q-learning is particularly popular because of its emphasis on learning the action-value function (Q-function). Using temporal difference learning, Q-values are updated iteratively based on the rewards that have been observed and the highest Q-value of the following state. Q-learning equips the agent with the knowledge it needs to maximise long-term rewards by acting on the Q-values it has learnt.
Fig.4
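A minimal tabular Q-learning sketch of the update described above; the chain environment, learning rate, and discount factor are illustrative assumptions:

    import random
    from collections import defaultdict

    alpha, gamma, epsilon = 0.1, 0.95, 0.1   # learning rate, discount, exploration rate
    actions = [-1, +1]                        # move left / right on a chain of states 0..5
    Q = defaultdict(float)                    # Q[(state, action)] -> action-value estimate

    def step(state, action):
        """Toy environment: reward 1 for reaching state 5, where the episode ends."""
        next_state = max(0, min(5, state + action))
        reward = 1.0 if next_state == 5 else 0.0
        return next_state, reward, next_state == 5

    for episode in range(500):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection over the current Q-values
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action)
            # temporal-difference update towards reward + discounted best next value
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state

    print("Q-value of moving right from state 4:", round(Q[(4, +1)], 2))  # close to 1.0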
Policy Gradient Approaches: The policy parameters are directly optimised using gradient information in policy gradient approaches. The policy is updated by following the gradient of the expected cumulative reward with respect to the policy parameters. To estimate this gradient and fine-tune the policy iteratively, policy gradient algorithms use methods such as Monte Carlo sampling and the REINFORCE algorithm.
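A minimal REINFORCE sketch on a two-armed bandit with a softmax policy (arm payoffs, learning rate, and episode count are illustrative assumptions):

    import math
    import random

    true_means = [0.3, 0.7]                 # hypothetical arm payoffs
    theta = [0.0, 0.0]                      # policy parameters (softmax preferences)

    def policy():
        """Softmax over preferences: probability of choosing each arm."""
        exps = [math.exp(t) for t in theta]
        z = sum(exps)
        return [e / z for e in exps]

    lr = 0.05
    for episode in range(2000):
        probs = policy()
        # sample an action from the current stochastic policy
        action = 0 if random.random() < probs[0] else 1
        reward = 1.0 if random.random() < true_means[action] else 0.0
        # REINFORCE: move parameters along grad log pi(action) scaled by the return
        for a in range(2):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * reward * grad_log

    print("learned action probabilities:", [round(p, 2) for p in policy()])  # should favour arm 1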
Deep reinforcement learning extends these ideas through the use of methods like deep Q-networks (DQN), deterministic policy gradients (DPG), and actor-critic architectures.
The essential components of RL are policy optimisation and improvement, which allow agents to gain knowledge from experience and make better choices in the future. Researchers strive to improve the performance and flexibility of RL agents in a variety of AI applications by using algorithms including value iteration, policy iteration, Q-learning, and policy gradient approaches, as well as cutting-edge techniques like PPO and deep RL.
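For completeness, a compact value-iteration sketch on a toy chain environment (states, rewards, and convergence threshold are hypothetical):

    # Value iteration on a 5-state chain: moving into the rightmost state pays 1.
    n_states, gamma, theta = 5, 0.9, 1e-6
    actions = [-1, +1]

    def transition(s, a):
        """Deterministic dynamics with the terminal state at the right end."""
        s2 = max(0, min(n_states - 1, s + a))
        reward = 1.0 if s2 == n_states - 1 and s != n_states - 1 else 0.0
        return s2, reward

    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states - 1):               # terminal state keeps value 0
            best = max(r + gamma * V[s2] for s2, r in (transition(s, a) for a in actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:                           # stop once values have converged
            break

    print([round(v, 3) for v in V])  # roughly [0.729, 0.81, 0.9, 1.0, 0.0]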
CHAPTER-2
REVIEW OF LITERATURE
The need for fast and effective learning algorithms is rising as the field of artificial intelligence develops. Reinforcement learning is a crucial paradigm in artificial intelligence; it allows decision-making entities, known as agents, to interact with their surroundings and learn by adapting their behaviour based on the feedback they get. How quickly agents learn is an important question for real-world applications. While quantum mechanics has been used in a number of studies to speed up an agent's decision-making process, it had not previously been shown to speed up the learning process itself. Saggio et al. (2021) present a reinforcement learning experiment in which a quantum communication channel between the agent and its environment speeds up the agent's learning.
Parallel adaptive processes operating on different scales but with comparable feedback mechanisms may be seen in both the natural and artificial worlds, where they go by the names of evolution and reinforcement learning. Studying these phenomena in tandem is thus mutually beneficial. Although it precedes all learning, evolution in nature is not immune to its effects. While evolutionary computing and reinforcement learning emerged separately in AI, several studies have since examined their mutual benefits. Understanding the history and future of reinforcement learning is becoming more important as it gains traction in machine learning and is incorporated into more complicated learning systems like deep neural networks. Hougen and Shah (2019) review key developments in the history of RL and propose promising avenues for further study.
AI agents are becoming more autonomous and more general. Dewey (2014) proposes the "Reward Engineering Principle," which states that the design of reward mechanisms that elicit desirable behaviours becomes both more crucial and more difficult as reinforcement-learning-based AI systems grow more general and autonomous. The reward engineering principle will have a medium- to long-term impact on contemporary artificial intelligence (AI) research, both theoretical and applied. This is in contrast to early AI research, which could ignore reward design and focus only on the problem of efficiently, flexibly, and effectively attaining arbitrary goals in varied environments. The paper formalises these intuitions by introducing notation and deriving early findings.
CHAPTER-3
METHODOLOGY
Aim of the study:
The aim of the study is to investigate and explore the feedback mechanism in reinforcement
learning for artificial intelligence (AI) systems. The study aims to gain a deeper
understanding of how feedback mechanisms influence the learning process, decision-making
capabilities, and overall performance of RL agents. It seeks to contribute to the advancement
of RL algorithms by examining various aspects of feedback, including reward design,
exploration-exploitation strategies, and policy improvement and optimization.
The specific objectives of the study are:
- To analyze and evaluate different reward design approaches in reinforcement learning and their impact on the learning process and performance of RL agents.
- To investigate and compare exploration and exploitation strategies in RL, examining their effectiveness in balancing the trade-off between gathering new information and leveraging existing knowledge.
CHAPTER-4
DISCUSSION
The discussion section provides an opportunity to interpret and analyse the results, address the research questions and goals, and consider the implications of a study of feedback mechanisms in reinforcement learning for artificial intelligence (AI). Such a discussion is provided below.
The purpose of this research was to examine the role of feedback mechanisms in reinforcement learning for artificial intelligence. The goal was to develop an understanding of how agent performance, learning, and decision-making are all affected by feedback. We aimed to aid in the creation of better RL algorithms and more trustworthy AI systems by investigating many facets of feedback, such as reward design, exploration-exploitation tactics, and policy refinement and optimisation. Our research shows that the way rewards are designed in reinforcement learning is crucial. We found that the learning effectiveness and efficiency of RL agents were greatly impacted by the design of their reward systems. Agents were able to learn more efficiently and converge on improved policies thanks to the careful design of rewards, the provision of useful feedback, and the handling of difficulties like sparse rewards or reward overfitting. This demonstrates how crucial reward design is to the success of RL applications.
Moreover, our investigation of exploration and exploitation techniques illustrated the need to strike a balance between probing uncharted territory and capitalising on established knowledge. Epsilon-greedy, Thompson sampling, and Upper Confidence Bound (UCB) are just a few examples of tactics that have shown varied degrees of success in striking this equilibrium. The agent's prior knowledge and the nature of the task informed the strategy selection process. We found that choosing and tuning exploration-exploitation techniques is important to help RL agents find optimal policies and avoid getting stuck in suboptimal behaviour.
Value iteration, policy iteration, Q-learning, and policy gradient approaches were some of the well-known algorithms for policy optimisation and improvement that we investigated in this work. We found that these methods provide powerful tools for continuously refining the agent's policy in light of experience. Our trials demonstrated both the capacity to handle complicated decision-making problems and convergence to optimal policies. Further, agents were able to learn from high-dimensional sensory input to arrive at nuanced judgements thanks to the implementation of advanced reinforcement learning algorithms, which demonstrated encouraging results. There are a number of obstacles and possibilities for further research, despite the fact that our work adds to the knowledge of feedback processes in RL. Feedback given to RL agents still has to be made easier to understand and explain. Tackling these issues may result in more open and reliable AI systems. To guarantee ethical AI behaviour, it is crucial to include ethical factors like fairness and human values in the feedback loop.
CHAPTER-5
CONCLUSION
In conclusion, reinforcement learning for AI systems relies heavily on feedback mechanisms. We have learned a great deal about how feedback affects the learning process, decision-making ability, and overall performance of RL agents via the investigation of reward design, exploration-exploitation techniques, and policy improvement and optimisation.
Within feedback systems, reward design is crucial, since it has a major influence on RL agents' ability to learn and perform. Agents may learn more efficiently and converge to optimal policies if rewards are shaped appropriately, difficulties like sparse rewards are dealt with, and intrinsic motivation is included. Finding the sweet spot between exploring uncharted territory and capitalising on established knowledge requires a well-thought-out exploration and exploitation strategy. Strategies like epsilon-greedy, Thompson sampling, and Upper Confidence Bound (UCB) provide alternatives for discovering and capitalising on information gained from the environment. For RL agents to learn optimal policies while avoiding suboptimal solutions, the selection and tuning of these techniques is critical.
Techniques for policy optimisation and improvement provide efficient ways of iteratively bettering the agent's policy in light of feedback. Optimal policies may be converged upon using algorithms like value iteration, policy iteration, Q-learning, and policy gradient approaches. The agent's learning abilities are further bolstered by deep reinforcement learning methods, which enable effective learning from high-dimensional sensory input.
While we made some progress in understanding the mechanics of feedback in RL, there are still many open questions and areas for investigation. Ethical issues, as well as the interpretability and explainability of the feedback given to RL agents, are crucial. Solving these problems is crucial for developing AI systems that are open, trustworthy, and morally sound. Improvements in the efficiency, dependability, and accountability of AI systems may be achieved by further research into and use of feedback mechanisms for reinforcement learning. Progress will be driven by the investigation of reward design, exploration-exploitation tactics, and policy improvement and optimisation, paving the way for future developments in reinforcement learning and artificial intelligence.
FUTURE SCOPE
Several new lines of inquiry are made possible by the study of feedback mechanisms in reinforcement learning for AI. Future work might include the following areas:
Transfer Learning and Generalisation: It would be beneficial to learn more about feedback mechanisms in the setting of transfer learning and generalisation. How RL agents may use information obtained from one task or environment to speed up their learning in other tasks or settings is an open research topic. Better-performing and more versatile AI systems may result from exploring strategies for transferring policies, value functions, and reward structures across domains.
Interpretable and Explainable Feedback: Building reliable and open AI systems necessitates giving special attention to the interpretability and explainability of the feedback given to RL agents. The development of methods and tools to explain the reasoning behind RL agents' behaviours and choices might be the subject of future study. Facilitating human-AI cooperation and allowing domain experts to assess and modify the behaviour of AI systems may be accomplished by exploring techniques to visualise and understand the learnt policies and reward structures.
Ethical Considerations: A crucial next step for RL agents will be to incorporate ethical considerations into the feedback loop. Fairness, accountability, and alignment with human values may be promoted via the study of suitable institutional structures and methods. Responsible and ethical deployment of AI systems may be aided by research into techniques to reduce bias, manage sensitive data, and incorporate ethical norms and laws into the feedback process.
Hybrid Methods: Future studies might benefit from hybrid approaches that integrate reinforcement learning with other machine learning methods such as supervised or unsupervised learning. More robust and flexible AI systems may be achieved by researching how feedback from many sources can be combined and exploited to improve learning and decision-making.
Real-World Applications: One key area for future research is the practical implementation and validation of feedback mechanism findings. Feedback mechanisms in complex and dynamic systems present a number of difficulties and possibilities that may be better understood via experimentation and study in fields including robotics, autonomous vehicles, healthcare, and finance.
In sum, there is a huge and multifaceted potential for study into feedback mechanisms in RL. The more these topics are investigated, the more progress may be made in the field of artificial intelligence, leading to systems that are better able to learn and make judgements in a broad variety of real-world contexts while also being more efficient, comprehensible, and ethically aligned.
REFERENCES
Saggio, V., Asenbeck, B. E., Hamann, A., Strömberg, T., Schiansky, P., Dunjko, V., ... &
Walther, P. (2021). Experimental quantum speed-up in reinforcement learning
agents. Nature, 591(7849), 229-233.
Huang, X., Yuan, T., Qiao, G., & Ren, Y. (2018). Deep reinforcement learning for multimedia
traffic control in software defined networking. IEEE Network, 32(6), 35-41.
Hougen, D. F., & Shah, S. N. H. (2019, December). The evolution of reinforcement learning.
In 2019 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1457-1464).
IEEE.
Dewey, D. (2014, March). Reinforcement learning and the reward engineering principle.
In 2014 AAAI Spring Symposium Series.
Li, X., Lv, Z., Wang, S., Wei, Z., & Wu, L. (2019). A reinforcement learning model based on temporal difference algorithm. IEEE Access, 7, 121922-121930.
Bragg, J., & Habli, I. (2018). What is acceptably safe for reinforcement learning? In Computer Safety, Reliability, and Security: SAFECOMP 2018 Workshops, ASSURE, DECSoS, SASSUR, STRIVE, and WAISE, Västerås, Sweden, September 18, 2018, Proceedings 37 (pp. 418-430). Springer International Publishing.
Reinhard, P., Li, M. M., Dickhaut, E., Reh, C., Peters, C., & Leimeister, J. M. (2023, May). A Conceptual Model for Labeling in Reinforcement Learning Systems: A Value Co-creation Perspective. In International Conference on Design Science Research in Information Systems and Technology (pp. 123-137). Cham: Springer Nature Switzerland.