Artificial Intelligence Operated Elevator Using RL AIOERL
Abstract - Our paper explores the implementation of an Artificial Intelligence (AI) operated elevator system aimed at reducing user waiting times in a residential complex. With two elevators servicing a 14-story building, each floor accommodating six flats with approximately four residents per home, efficiency is paramount. Leveraging AI algorithms, our system dynamically adjusts elevator operations based on user demand patterns, traffic flow, and predictive analysis, ensuring minimal wait times and optimal passenger distribution. By integrating AI into elevator management, we aim to enhance user experience and streamline vertical transportation in high-density residential settings.

Key Words: Artificial Intelligence, elevator optimization, residential complexes, waiting time reduction, predictive analysis, passenger distribution, efficiency improvement, traffic flow management.

INTRODUCTION
The advent of Artificial Intelligence (AI) has revolutionized various domains, and its application in elevator systems holds significant promise for enhancing vertical transportation efficiency in high-rise residential complexes. With the proliferation of urbanization and the construction of taller buildings, the demand for efficient elevator operations has intensified. Our paper delves into the integration of AI technology to mitigate user waiting times within a residential complex comprising a 14-story building. With six flats per floor and an average of four occupants per home, optimizing elevator operations becomes imperative to ensure smooth passenger flow and minimal congestion. This introduction outlines the necessity and potential benefits of employing AI in elevator management to address the challenges of vertical transportation in densely populated residential environments.
RL and Model-Free RL

Reinforcement Learning (RL) is a branch of machine learning concerned with decision-making and control processes. Unlike supervised learning, where an algorithm learns from labeled input-output pairs, and unsupervised learning, where the algorithm discovers patterns in unlabeled data, RL focuses on learning from interactions with an environment to maximize a cumulative reward. At the core of RL is the concept of an agent, which learns to navigate an environment through trial and error, aiming to maximize its cumulative reward over time.

Model-Free Reinforcement Learning (MFRL) is a subset of RL that does not require knowledge of the environment's dynamics or transition probabilities. In other words, the agent learns directly from experience without explicitly modeling the environment. MFRL algorithms are particularly useful in scenarios where the environment is complex and obtaining a precise model is difficult or impractical. Instead, these algorithms focus on learning optimal policies through exploration and exploitation of the environment's state-action space.

Exploration of Model-Free Reinforcement Learning Methods

Various MFRL methods have been developed to tackle different types of problems, each with its strengths and weaknesses. Some common MFRL algorithms include Q-Learning, SARSA (State-Action-Reward-State-Action), Deep Q-Networks (DQN), and Policy Gradient methods such as REINFORCE. Each of these approaches has unique characteristics and is suited to specific types of tasks and environments.
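Before turning to DQN, a minimal sketch of the model-free idea may help: the tabular Q-Learning update below learns action-values from observed transitions alone, without any model of the environment. The state and action sizes, learning rate, and sample transition are placeholder values for illustration, not parameters of our elevator system.

import numpy as np

# Illustrative tabular Q-Learning update; sizes and values are placeholders.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # table of estimated action-values
alpha, gamma = 0.1, 0.95              # learning rate and discount factor

def q_learning_update(s, a, r, s_next):
    # Model-free: only the observed transition (s, a, r, s_next) is used.
    td_target = r + gamma * np.max(Q[s_next])    # bootstrap from best next action
    Q[s, a] += alpha * (td_target - Q[s, a])     # move estimate toward the target

# Example transition: in state 0, action 1 gave reward -1 and led to state 2.
q_learning_update(0, 1, -1.0, 2)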
For the elevator optimization problem described in our paper, a suitable MFRL method would be Deep Q-Networks (DQN). DQN is a powerful algorithm that combines Q-learning with deep neural networks, enabling it to handle large state-action spaces efficiently. Here are a few reasons why DQN is preferable for this application:

• Complex State-Action Space: Elevator systems operate in dynamic environments with multiple floors, varying passenger demand, and different elevator states (e.g., idle, moving, loading/unloading). DQN's ability to approximate the optimal action-values for large state spaces makes it well-suited for handling the complexity of elevator control.

• Continuous Learning: Elevator systems are subject to changing traffic patterns and user preferences, requiring continuous adaptation to optimize performance. DQN's iterative learning process allows the agent to update its policy based on new experiences, enabling it to adapt to evolving conditions over time.

• Exploration and Exploitation: Balancing exploration (trying new actions to discover optimal strategies) and exploitation (leveraging known information to maximize rewards) is crucial for elevator optimization. DQN incorporates epsilon-greedy exploration strategies, allowing the agent to explore different actions while gradually shifting towards exploiting learned policies as it gains experience.

• Scalability and Efficiency: With two elevators serving a 14-story building with multiple flats on each floor, scalability and computational efficiency are essential considerations. DQN's use of deep neural networks enables it to scale to larger environments while efficiently approximating Q-values, making it suitable for real-time elevator control.
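To give a rough sense of why a lookup table is impractical here, the short calculation below counts configurations for the building described in our paper (two elevators, 14 floors) under a deliberately coarse, assumed encoding (car floor, travel direction, and one call flag per floor); even this simplification yields tens of millions of states.

# Rough, illustrative count of system configurations (assumed coarse encoding).
floors = 14
positions = floors             # current floor of one car
directions = 3                 # up, down, idle
per_elevator = positions * directions            # 42 configurations per car
hall_calls = 2 ** floors                         # call button pressed or not per floor
total_states = (per_elevator ** 2) * hall_calls  # two cars
print(total_states)                              # 42 * 42 * 16384 = 28,901,376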
Algorithm for AIOERL

• State Representation: Define the state space for the elevator system. This could include information such as the current floor of each elevator, the direction of each elevator, the number of passengers in each elevator, the destination floors of the passengers, and the waiting time of passengers in the lobby (one possible encoding, together with an action set and a reward sketch, follows this list).

• Action Space: Define the action space for the elevators. Actions could include moving up, moving down, stopping at a floor, or opening/closing doors.

• Reward Function: Design a reward function that incentivizes efficient elevator operation. For example, rewards could be based on minimizing the waiting time of passengers, minimizing the time taken to reach destinations, and minimizing energy consumption.

• Q-Network: Implement a deep neural network (DNN) to approximate the Q-values for state-action pairs. The input to the network is the state representation, and the output is the Q-values for each possible action.

• Experience Replay: Implement experience replay to store and sample past experiences (state, action, reward, next state) for training the Q-network. This helps stabilize training and improve sample efficiency.

• Target Q-Network: Use a separate target Q-network to stabilize training. Periodically update the parameters of the target network with the parameters of the main Q-network.

• Epsilon-Greedy Exploration: Implement epsilon-greedy exploration to balance exploration and exploitation. With probability epsilon, select a random action (explore); otherwise, select the action with the highest Q-value (exploit).

• Training Procedure: Train the Q-network using a variant of the DQN algorithm such as Double DQN or Dueling DQN. Use gradient descent to minimize the temporal-difference error between the predicted Q-values and the target Q-values (a sketch of such a training loop follows the code listing below).

• Deployment: Once the Q-network is trained, deploy it to control the elevators in real time. At each time step, use the trained Q-network to select actions for the elevators based on the current state of the system.
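As a minimal sketch of the first three steps, the fragment below shows one possible way to encode a single elevator's state vector, enumerate its actions, and shape its reward. The chosen features, normalization constants, and penalty weights are illustrative assumptions rather than a fixed specification.

import numpy as np

NUM_FLOORS = 14
# Hypothetical action set matching the Action Space step above.
ACTIONS = ["up", "down", "stop", "open_door"]

def encode_state(car_floor, direction, passengers, max_wait, hall_calls):
    # Illustrative state vector for one elevator.
    # direction: -1 down, 0 idle, +1 up; hall_calls: 14 binary flags.
    features = [
        car_floor / NUM_FLOORS,   # normalized car position
        direction,                # travel direction
        passengers / 10.0,        # assumed car capacity of 10, for scaling
        max_wait / 60.0,          # longest lobby wait in seconds, scaled
    ]
    return np.array(features + list(hall_calls), dtype=np.float32)

def reward(total_wait_time, travel_time, energy_used):
    # Negative cost: penalize waiting, travel time, and energy (weights assumed).
    return -(1.0 * total_wait_time + 0.5 * travel_time + 0.1 * energy_used)

With this encoding the state vector has 4 + 14 = 18 entries, so state_size would be 18 rather than the placeholder value of 10 used in the code listing below.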
Python code for AIOERL

import numpy as np
import random
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam


class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)   # replay buffer of past transitions
        self.gamma = 0.95                  # discount rate
        self.epsilon = 1.0                 # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()

    def _build_model(self):
        # Feed-forward network approximating Q-values, one output per action.
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse',
                      optimizer=Adam(learning_rate=self.learning_rate))
        return model

    def remember(self, state, action, reward, next_state, done):
        # Store a transition for experience replay.
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        act_values = self.model.predict(state)
        return np.argmax(act_values[0])

    def replay(self, batch_size):
        # Train on a random minibatch of stored transitions.
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                target = (reward + self.gamma *
                          np.amax(self.model.predict(next_state)[0]))
            target_f = self.model.predict(state)
            target_f[0][action] = target   # update only the chosen action's value
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay   # decay exploration over time


# Define state and action sizes
state_size = 10   # Example: number of elevator state features
action_size = 4   # Example: number of elevator actions (up, down, stop, open door)
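To connect the agent above with the Training Procedure and Deployment steps, the fragment below sketches one possible training loop. ElevatorEnv is a hypothetical simulator with a reset()/step() interface and is not part of the code above; the episode count and batch size are arbitrary placeholders.

# Hypothetical training loop for the agent above. ElevatorEnv is an assumed
# simulator exposing reset() -> state and step(action) -> (next_state, reward, done);
# it is not defined in this paper.
env = ElevatorEnv(num_floors=14, num_elevators=2)   # assumed constructor
agent = DQNAgent(state_size, action_size)
batch_size = 32                                     # placeholder minibatch size

for episode in range(500):                          # placeholder episode count
    state = np.reshape(env.reset(), [1, state_size])
    done = False
    while not done:
        action = agent.act(state)                   # epsilon-greedy selection
        next_state, reward, done = env.step(action)
        next_state = np.reshape(next_state, [1, state_size])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)                # learn from replayed experience

# Deployment: after training, act greedily on the live system's current state,
# e.g. by setting agent.epsilon = 0.0 before calling agent.act(current_state).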
BIOGRAPHIES