
Dependency-Aware CAV Task Scheduling via Diffusion-Based Reinforcement Learning

Xiang Cheng∗†, Zhi Mao∗, Ying Wang†, and Wen Wu∗
∗Frontier Research Center, Pengcheng Laboratory, Shenzhen, China
†School of Information and Communication Engineering, Beijing University of Posts and Telecommunications
Email: {chengx01, maozh, wuw02}@pcl.ac.cn∗, [email protected]†

Abstract—In this paper, we propose a novel dependency-aware task scheduling strategy for dynamic unmanned aerial vehicle-assisted connected autonomous vehicles (CAVs). Specifically, computation tasks of CAVs, each consisting of multiple dependent subtasks, are judiciously assigned to nearby CAVs or the base station so that tasks are completed promptly. We formulate a joint scheduling-priority and subtask-assignment optimization problem with the objective of minimizing the average task completion time. Since the problem targets long-term system performance, it is reformulated as a Markov decision process. To solve the problem, we further propose a diffusion-based reinforcement learning algorithm, named Synthetic DDQN based Subtasks Scheduling (SDSS), which makes adaptive task scheduling decisions in real time. A diffusion model-based synthetic experience replay is integrated into the reinforcement learning framework to generate sufficient synthetic data in the experience replay buffer, thereby significantly accelerating convergence and improving sample efficiency. Simulation results demonstrate the effectiveness of the proposed algorithm in reducing task completion time compared with benchmark schemes.

I. INTRODUCTION

With advancements in communication and autonomous driving technologies, connected autonomous vehicles (CAVs) have become increasingly prevalent, catering to people's traffic demands [1]. CAVs must execute various computation-intensive and delay-sensitive tasks, including perception fusion, real-time navigation based on video or augmented reality (AR), and multimedia entertainment [2]. These tasks necessitate joint processing to guarantee safe driving while satisfying quality-of-service requirements [3].

Due to the limited computing resources of CAVs, task completion time is inevitably prolonged when multiple tasks must be processed simultaneously. To minimize completion time, some studies [4], [5] used mobile edge computing (MEC) to offload an entire task directly to the base station (BS) over the vehicle-to-infrastructure (V2I) link, which may extend completion time because of the additional task transmission. Other works [6], [7] proposed partial offloading, in which one part of a task is processed on the local vehicle while the remainder is offloaded to edge servers. Although partial offloading reduces transmission delay, guaranteeing efficient task scheduling remains challenging under highly dynamic conditions. Consequently, vehicle edge computing (VEC)-based task scheduling is gradually emerging: other vehicles with available computing resources act as extensions of the edge and are termed service vehicles (SVs) [8], so the tasks generated by task vehicles (TVs) can be offloaded to nearby SVs. Furthermore, fine-grained vehicular task partitioning and scheduling accelerates task completion [9], [10], since a task can be divided into several dependent subtasks, modelled as a directed acyclic graph (DAG) [11] describing their interdependency, and offloaded to other SVs or the BS. Existing research illustrates the advantages of combining the Internet of Vehicles (IoV) with VEC task offloading. However, owing to geographical limitations, deploying many BSs along a highway may not be economically feasible [12]. Thus, for tasks with diverse computation requirements, optimizing task scheduling across high-mobility SVs and limited BS servers is a challenging problem.

In this paper, we investigate the task scheduling problem for highway CAVs. TVs offload subtasks to SVs with available computing resources, but time-varying task computation requirements and computing resources make it challenging to take adaptive task scheduling decisions in real time, and completion time is prolonged whenever the computing resources required by a subtask exceed the capacity of the SVs. Owing to its flexibility and ease of deployment, an unmanned aerial vehicle (UAV) can be deployed to relay subtasks to the surrounding BS server, compensating for the offloading needs of TVs when the number of SVs is insufficient or subtasks carry a high workload. Firstly, with the goal of on-demand task scheduling, we construct a task scheduling model, i.e., two-side priority adjustment, that determines the scheduling priority of subtasks while considering mobility and resources to select optimal offloading targets. Secondly, we formulate a long-term subtask scheduling problem that minimizes the average completion time of all tasks, and propose the deep reinforcement learning (DRL)-driven Synthetic DDQN based Subtasks Scheduling (SDSS) algorithm to solve it in a dynamic environment. Thirdly, simulation results demonstrate the effectiveness of the proposed algorithm in reducing task completion time. The main contributions of this paper are summarized as follows:

• We design a dependency-aware task scheduling strategy to reduce task completion time for CAV networks;
• We formulate a long-term optimization problem for minimizing the average completion time of all tasks and then reformulate it as a Markov decision process (MDP);
• We propose the SDSS algorithm, which integrates RL and a diffusion model to make adaptive task scheduling decisions in real time.

The remainder of this paper is organized as follows. Section II presents the system model and optimization problem. Section III presents the SDSS algorithm design in detail. Simulation results are presented and analyzed in Section IV. Section V concludes the paper.

Fig. 1. System scenario for UAV-assisted highway CAVs task scheduling. (Figure: task vehicles and service vehicles with available computing resources travel along a highway section not covered by the BS server; a relay UAV covers the section and forwards offloaded computing tasks to the BS server over an offloading/backhaul link.)

II. SYSTEM MODEL AND PROBLEM FORMULATION

We consider a UAV-assisted highway CAV task scheduling scenario, as depicted in Fig. 1, where a set of TVs offload their generated computation-intensive tasks, each composed of several dependent subtasks, either directly to SVs via vehicle-to-vehicle (V2V) links or to the BS server with the help of the relay UAV, satisfying the TVs' offloading demands as needed. The sets of TVs and SVs are denoted by N and S, respectively.

A. Communication Model

By allocating communication resources to different transmitting vehicles, the orthogonal frequency-division multiple access technique [13] is adopted to avoid interference among multiple vehicles. Two types of links are considered in this paper: V2V links and UAV-assisted V2I links. A TV can offload tasks using no more than one link type. In V2V links, the data transmission rate for task offloading from TV n to SV s is

r_{n,s} = B_{n,s} \log_2\left(1 + \frac{P_n h_{n,s}^2}{σ_n^2}\right),   (1)

where B_{n,s}, P_n, h_{n,s}, and σ_n^2 represent the channel bandwidth, the transmit power of TV n, the channel gain of the link between TV n and SV s, and the noise power, respectively.

In V2I links, the UAV relays tasks from TV n to the BS server j. The data rates from TV n to the UAV and from the UAV to BS j are

r_{n,u} = B_{n,u} \log_2\left(1 + \frac{P_n h_{n,u}^2}{σ_n^2}\right),   (2)

r_{u,j} = B_{u,j} \log_2\left(1 + \frac{P_u h_{u,j}^2}{σ_n^2}\right),   (3)

where B_{n,u} and B_{u,j} represent the channel bandwidths of the links from TV n to UAV u and from UAV u to BS server j, respectively, P_u is the transmit power of UAV u, and h_{n,u} and h_{u,j} are the channel gains from n to u and from u to j.
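As a numerical illustration of Eqs. (1)–(3), the sketch below evaluates the three link rates. The bandwidths and transmit powers follow Table I, while the noise power and channel gains are placeholder values of our own, not parameters given in the paper.

```python
import math

def shannon_rate(bandwidth_hz: float, tx_power_w: float,
                 channel_gain: float, noise_power_w: float) -> float:
    """Rate of Eqs. (1)-(3): B * log2(1 + P * h^2 / sigma^2), in bit/s."""
    snr = tx_power_w * channel_gain ** 2 / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)

# Link parameters: bandwidth/power from Table I, the rest assumed.
B_v2v, B_uav = 5e6, 10e6                 # 5 MHz V2V, 10 MHz UAV links
P_tv, P_uav = 0.1, 1.0                   # 20 dBm and 30 dBm, in watts
sigma2 = 1e-13                           # noise power (assumed)
h_ns, h_nu, h_uj = 1e-6, 5e-7, 8e-7      # channel gains (assumed)

r_ns = shannon_rate(B_v2v, P_tv, h_ns, sigma2)    # TV -> SV,  Eq. (1)
r_nu = shannon_rate(B_uav, P_tv, h_nu, sigma2)    # TV -> UAV, Eq. (2)
r_uj = shannon_rate(B_uav, P_uav, h_uj, sigma2)   # UAV -> BS, Eq. (3)
print(f"V2V {r_ns/1e6:.1f} Mbit/s, relay {min(r_nu, r_uj)/1e6:.1f} Mbit/s")
```

Note that the relayed path is rate-limited by the slower of its two hops, which is why the transmission term of Eq. (6) later sums the delays of both hops.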
B. Task Dependency Model

A TV generates M consecutive computation-intensive tasks simultaneously. Let Φ_m = (ω_m, τ_m, f_m) represent the characteristics of the m-th task, 1 ≤ m ≤ M, where ω_m, τ_m, and f_m denote the computation workload, the maximum tolerable delay, and the computing resource demand of the m-th task, respectively. Fig. 2 shows examples of vehicular task DAGs for a perception fusion task and an AR navigation task. Each task is composed of multiple interdependent subtasks; the subtask set of task m is denoted by W = {1, · · · , w, · · · , W}. The subtasks are modelled as a DAG describing their interdependency, with parameters φ_m^w = (ω_m^w, λ_m^w), where ω_m^w and λ_m^w represent the workload size and the computing resources required by the w-th subtask of task m. Fig. 2(b) shows an example of an AR navigation task divided into six subtasks. Control input is the initial subtask, i.e., Subtask 1; its output is used as the input of Subtask 2 (map loading) and Subtask 3 (traffic perception). Subtask 4 (path selection) then outputs the optimal driving path based on the results of Subtasks 2 and 3, while Subtask 5 performs video processing based on the perception results of Subtask 3. Finally, Subtask 6 executes the AR navigation using the results of Subtasks 4 and 5.

Fig. 2. Example of vehicular task DAG. (a) Vehicle perception fusion task (subtasks include data collection, pre-processing, object detection, environment modeling, data fusion, and decision planning). (b) Vehicle AR navigation task (control input → map loading and traffic perception → path selection and video processing → AR navigation).
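To make the dependency model concrete, the sketch below encodes the six-subtask AR navigation DAG of Fig. 2(b) as a predecessor map and derives a valid execution order; the workload and resource values attached to each φ_m^w are illustrative placeholders, not numbers from the paper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# AR navigation DAG of Fig. 2(b): subtask -> set of predecessor subtasks.
predecessors = {
    1: set(),        # control input
    2: {1},          # map loading
    3: {1},          # traffic perception
    4: {2, 3},       # path selection
    5: {3},          # video processing
    6: {4, 5},       # AR navigation
}

# phi_m^w = (workload size ω_m^w in KB, required resources λ_m^w in GHz);
# placeholder values for illustration only.
phi = {w: (800 + 100 * w, 0.5 * w) for w in predecessors}

# A subtask may start only after all its predecessors finish, which is
# exactly constraint (10d) below; any topological order respects it.
order = list(TopologicalSorter(predecessors).static_order())
print("feasible execution order:", order)   # e.g. [1, 2, 3, 4, 5, 6]
```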
C. Task Scheduling Model

1) Scheduling Time: Each subtask can be scheduled from TV n to a nearby SV, relayed to the BS by the UAV, or executed on the local TV. The completion time of a subtask consists of the transmission delay and the computing delay; the transmission delay is ignored when the subtask is executed locally. When subtask w is executed locally on TV n, the computing time is

T_w^n = λ_m^w / f_n,   (4)

where λ_m^w and f_n are the computing resources required to complete w and the available computing capacity of TV n. When subtask w is offloaded to an SV, its completion time is

T_w^{s,total} = T_w^{n,s} + T_w^s,   (5)

where T_w^{n,s} and T_w^s represent the data transmission and computing delays, i.e., T_w^{n,s} = ω_m^w / r_{n,s} and T_w^s = λ_m^w / f_s, with f_s the available computing capacity of SV s. When the computing demand exceeds the capacity of the SV, the subtask can be relayed to the BS. The completion time of a subtask executed on BS server j is

T_w^{j,total} = T_w^{n,u,j} + T_w^j,   (6)

where T_w^{n,u,j} and T_w^j represent the transmission delay from the TV via the UAV to the BS and the computing delay, i.e., T_w^{n,u,j} = ω_m^w / r_{n,u} + ω_m^w / r_{u,j} and T_w^j = λ_m^w / f_j, where f_j denotes the available computing capacity of BS server j.
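The following sketch evaluates Eqs. (4)–(6) for a single subtask and picks the fastest target. The unit convention (workload in kbit, rates in kbit/s, compute demand in giga-cycles and capacity in GHz, so λ/f is already seconds) and the helper name are our assumptions for illustration.

```python
def completion_times(omega_kb: float, lam_gc: float,
                     f_n: float, f_s: float, f_j: float,
                     r_ns: float, r_nu: float, r_uj: float) -> dict:
    """Per-target delay in seconds for one subtask (Eqs. (4)-(6))."""
    return {
        "local": lam_gc / f_n,                                      # Eq. (4)
        "sv":    omega_kb / r_ns + lam_gc / f_s,                    # Eq. (5)
        "bs":    omega_kb / r_nu + omega_kb / r_uj + lam_gc / f_j,  # Eq. (6)
    }

# Placeholder subtask: 4000 kbit workload, 4 giga-cycle compute demand.
delays = completion_times(omega_kb=4000, lam_gc=4.0,
                          f_n=2.0, f_s=6.0, f_j=50.0,
                          r_ns=5e3, r_nu=8e3, r_uj=20e3)
best = min(delays, key=delays.get)
print(delays, "-> offload to:", best)
```

The BS is by far the fastest processor here, yet the two-hop transmission term of Eq. (6) can still make the SV the better target, which is the trade-off the scheduling strategy below must resolve.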
2) Scheduling Strategy: For dependent subtasks, a successor subtask can only be executed after its predecessors complete. Owing to the dynamic network topology and the time-varying computing resources of SVs, when and to whom to offload determines whether the lowest completion time can be achieved. Thus, the scheduling priority of subtasks must be arranged jointly with the selection of offloading targets. We design a two-side priority adjustment mechanism, aiming at selecting the optimal offloading targets for all subtasks. It comprises three parts (a sketch of the target-ranking step follows this list):

Subtask scheduling priority: For serial dependencies, a predecessor subtask has higher priority than its successors. The scheduling priority of subtask w is denoted by α_w and must satisfy

α_w > α_{w+1},  ∀w ∈ W,   (7)

where a successor subtask can only begin to execute after its predecessors complete, i.e., T_{w+1}^b ≥ T_w^e, with T_{w+1}^b and T_w^e the beginning time of successor subtask w+1 and the ending time of predecessor subtask w, respectively.

Offloading target selection priority: First, the priorities of SVs are determined by comparing the distance between each SV and the TV. The priorities are then re-sorted according to the available computing resources f_s. Specifically, SVs with smaller distances and higher computing resources are marked as the highest selection priority.

Two-side priority adjustment: To guarantee the minimum completion time of all tasks, the priority of offloading targets is adaptively adjusted based on computing demands, the dynamic topology, and the available resources, after the scheduling priorities of subtasks have been determined. The goal is to minimize task completion time through rapid adjustment of the scheduling and offloading orders.
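A minimal sketch of the offloading-target ranking described above: SVs are first ordered by distance and then re-sorted by available computing resources. The concrete tie-breaking rule (lexicographic sort on resources, then distance) is our assumption, since the paper does not pin down how the two criteria are combined.

```python
from dataclasses import dataclass

@dataclass
class ServiceVehicle:
    sv_id: int
    distance_m: float   # distance between this SV and the TV
    f_s_ghz: float      # available computing resources f_s

def rank_offloading_targets(svs: list[ServiceVehicle]) -> list[ServiceVehicle]:
    # Highest priority: more resources first, smaller distance breaks ties
    # (one plausible reading of the two-side rule; an assumption).
    return sorted(svs, key=lambda v: (-v.f_s_ghz, v.distance_m))

svs = [ServiceVehicle(1, 120.0, 2.0), ServiceVehicle(2, 60.0, 8.0),
       ServiceVehicle(3, 90.0, 8.0), ServiceVehicle(4, 40.0, 3.0)]
for v in rank_offloading_targets(svs):
    print(f"SV{v.sv_id}: f_s={v.f_s_ghz} GHz, d={v.distance_m} m")
# SV2 (8 GHz, 60 m) outranks SV3 (8 GHz, 90 m); SV4 is closer but weaker.
```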
D. Problem Formulation

In this part, we formulate the optimization objective of minimizing the average completion time of all tasks. Firstly, the completion time τ_m of a task m composed of multiple dependent subtasks is

τ_m = \sum_{w=1}^{W} (x_{w,s} T_w^{s,total} + y_{w,j} T_w^{j,total} + z_{w,n} T_w^n),   (8)

where V2V scheduling of subtask w, UAV-assisted V2I scheduling, and local execution are indicated by the binary variables x_{w,s}, y_{w,j}, and z_{w,n}, which equal 1 if the corresponding option is used and 0 otherwise. It is assumed that a subtask w can be executed at only one target: the local TV n, one SV s, or the BS server j. Eq. (9) must hold to guarantee that each subtask is offloaded to exactly one target:

x_{w,s} + y_{w,j} + z_{w,n} = 1.   (9)

Therefore, considering the scheduling and offloading of subtasks, the problem of minimizing the overall average completion time of all subtasks is formulated as

P:  \min_{α_w, x_{w,s}, y_{w,j}, z_{w,n}}  \frac{1}{M} \sum_{m=1}^{M} \sum_{w=1}^{W} τ_{m,w}   (10a)
s.t.  λ_m^w ≤ f_s,  ∀w ∈ W, ∀s ∈ S,   (10b)
      \sum_{w=1}^{W} τ_w ≤ τ_m,  ∀m,   (10c)
      T_{w+1}^b ≥ T_w^e,   (10d)
      x_{w,s} + y_{w,j} + z_{w,n} = 1,   (10e)
      x_{w,s}, y_{w,j}, z_{w,n} ∈ {0, 1},   (10f)
      α_w ∈ N,   (10g)

where α_w, x_{w,s}, y_{w,j}, and z_{w,n} are the optimization variables of the two-side priority adjustment: α_w is the scheduling decision of subtasks, and x_{w,s}, y_{w,j}, z_{w,n} are the offloading decisions. Constraint (10b) states that the computing resources of the selected SV cannot be smaller than those required by subtask w. (10c) guarantees that the completion time of a task does not exceed the maximum tolerable delay. (10d) requires that successor subtasks execute only after their predecessors complete. (10e) ensures that a subtask is assigned to exactly one target. (10f) and (10g) define the ranges of the variables.

Since the optimization problem P is nonlinear and contains multiple integer variables, it is difficult to find the optimal solution in polynomial time under dynamic subtask workloads, network conditions, and SV computing resources. Moreover, the selection of offloading targets must also account for the status of each SV and the subtask workloads. These factors make it difficult to find the optimal scheduling strategy with traditional optimization methods. Therefore, we resort to a data-driven DRL algorithm [14] to solve the problem. The key idea is to model the long-term scheduling process of dependent subtasks in the dynamic environment as an MDP.
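To see why exhaustive search over P scales poorly, the toy sketch below enumerates every assignment of W subtasks to the three target types under constraint (10e) and scores the serialized Eq. (8) cost. Even with one SV and one BS server the space is already 3^W, and it grows further with each additional SV; the delay table is illustrative, and dependency overlap (10d) is deliberately ignored to keep the sketch short.

```python
from itertools import product

# Per-subtask delays (T_w^n, T_w^{s,total}, T_w^{j,total}) precomputed
# from Eqs. (4)-(6); placeholder values.
delay = [
    {"local": 0.9, "sv": 0.4, "bs": 0.7},   # subtask 1
    {"local": 1.5, "sv": 0.6, "bs": 0.5},   # subtask 2
    {"local": 0.3, "sv": 0.5, "bs": 0.8},   # subtask 3
    {"local": 2.0, "sv": 0.9, "bs": 0.6},   # subtask 4
]

def cost(plan):
    # Serialized Eq. (8) objective; ignores constraint (10d) timing overlap.
    return sum(delay[w][t] for w, t in enumerate(plan))

# Constraint (10e): exactly one target per subtask -> 3^W assignments.
best = min(product(("local", "sv", "bs"), repeat=len(delay)), key=cost)
print("best plan:", best, "cost:", cost(best))
```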
III. ALGORITHM DESIGN

In this section, we design the SDSS algorithm based on synthetic experience replay (SER) [15] and the double deep Q-network (DDQN) [16], where DDQN is a representative DRL algorithm for discrete decisions. This novel combination aims to adaptively control the discrete scheduling decisions while guaranteeing efficient strategy exploration by additionally generating high-reward transitions from the diffusion-based SER module. We describe the long-term subtask scheduling and offloading problem as an MDP and then propose the SDSS algorithm to solve it.

Fig. 3. Model structure of the SDSS algorithm. (Figure: the Q-value network θ and the target Q-value network θ′ interact with the environment through an experience replay buffer; real transitions (s, a, r, s′) feed the forward diffusion process p(x_t | x_{t−1}; σ_t), and the reverse denoising process q(x_{t−1} | x_t, x_0) generates synthetic transitions that are added back to the buffer. The Q network is trained by gradient descent on the loss L(θ), with batches sampled from the mixed real and synthetic experience.)
A. MDP Formulation

The optimization problem P is described as an MDP four-tuple {S, A, P, R}, comprising state S, action A, transition probability P, and reward R. The main components are as follows.

1) State: The state combines subtask information, scheduling decisions, and offloading target information. The state for subtask w is

S = {I_w, a_w, O_r},   (11)

where I_w denotes subtask information, including the workload ω_m^w and the interdependency, i.e., the indicators of the predecessor and successor subtasks Pre_w and Suc_w; a_w denotes the scheduling decision of the subtask; and O_r is the offloading target information, including the available computing resources of the SVs and the BS server, f_s and f_j, and the distances between the TVs and the SVs.

2) Action: Each subtask w of vehicle task m can be computed on the local TV n or offloaded to an SV s or the BS server j. For a whole vehicle task, the scheduling action of all subtasks is represented as

a_w = {α_w, B_w},   (12)

where α_w is the scheduling priority of the subtask and the array B_w = {x_{w,s}, y_{w,j}, z_{w,n}} denotes the selection of the offloading target.

3) Reward Function: The reward is designed as the negative increment of delay after taking a scheduling action,

R(s_w, a_w) = −Δτ_w,   (13)

where Δτ_w is the difference in completion delay between two adjacent subtask scheduling decisions, i.e., Δτ_w = τ_{w+1}^{a_{w+1}} − τ_w^{a_w}. The goal is to find the optimal decisions that maximize the cumulative reward.
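A minimal environment sketch for this MDP, assuming a flattened numeric state vector and the reward of Eq. (13); the class layout, field names, and the simplified delay model inside step() are ours, not the paper's.

```python
import numpy as np

class SubtaskSchedulingEnv:
    """Toy MDP of Eqs. (11)-(13): state = (subtask info, target resources);
    action = index of the offloading target; reward = -delta delay."""

    def __init__(self, workloads, f_targets):
        self.workloads = workloads      # omega_m^w per subtask (part of I_w)
        self.f_targets = f_targets      # f of [local TV, SV, ..., BS] (O_r)
        self.w = 0
        self.prev_delay = 0.0

    def _state(self):
        return np.concatenate(([self.workloads[self.w]], self.f_targets))

    def reset(self):
        self.w, self.prev_delay = 0, 0.0
        return self._state()

    def step(self, action: int):
        # Simplified completion delay at the chosen target (stand-in for
        # Eqs. (4)-(6); a full env would add the transmission terms).
        delay = self.workloads[self.w] / self.f_targets[action]
        reward = -(delay - self.prev_delay)          # Eq. (13)
        self.prev_delay = delay
        self.w += 1
        done = self.w >= len(self.workloads)
        return (None if done else self._state()), reward, done

env = SubtaskSchedulingEnv(workloads=[4.0, 2.5, 5.0],
                           f_targets=[2.0, 6.0, 50.0])
s = env.reset()
s, r, done = env.step(action=1)   # offload subtask 1 to the SV
```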
B. SDSS Algorithm

The model structure of the SDSS algorithm, based on Synthetic-DDQN, is shown in Fig. 3. The architecture mainly comprises a Q-value network, a target Q-value network, and an improved experience replay buffer.

On the one hand, the two networks are the Q-value network θ, i.e., the policy network, and the target Q-value network θ′. The policy network estimates the Q value of all possible actions a_w in state S_w and outputs the action corresponding to the maximum Q value; the action selection is

a_{max}(s, θ) = \arg\max_{a′} Q_{estim}(s, a′, θ),   (14)

and the target Q-value network θ′ then uses the action with the maximum Q value to calculate the expected value of the next state S_{w+1}; the target Q value is

y_{target} = r + γ Q(s′, \arg\max_{a′} Q_{estim}(s′, a′, θ), θ′),   (15)

where γ represents the discount factor, which adjusts the long-term reward and is set within γ ∈ (0.1, 0.99). Subsequently, the loss function is obtained by comparing the target Q value y_{target} with the output Q_{estim}(s, a, θ) of the Q network,

L(θ) = E_{s,a,s′} [ ½ ( r + γ Q(s′, \arg\max_{a′} Q_{estim}(s′, a′, θ); θ′) − Q_{estim}(s, a; θ) )² ].   (16)

Decoupling action selection from value estimation in this way effectively avoids overestimation of the target Q value, which improves the reliability of high-dimensional discrete strategies.

On the other hand, the main purpose of placing the experience replay buffer between the networks and the environment is to store the transitions obtained from the agent's interaction with the environment. These stored transitions (s, a, r, s′), as exploration experience, provide training data for the networks and give the training samples a certain diversity.

In practice, a large number of ineffective transitions arise in the initial strategy exploration stage, and sampling these data for agent training inevitably hampers convergence. Therefore, to improve convergence and sample efficiency, we use the emerging generative diffusion model [17] to provide both real and synthetic transitions for agent training through diffusion-based transition generation, i.e., the SER module. The two steps of the SER module are as follows.

1) The forward noising: For an original data distribution p(x) with standard deviation σ, where x represents the real transition data, consider the noised distribution p(x; σ) obtained by adding independent and identically distributed Gaussian noise of deviation σ to p(x), i.e., p(x_t | x_0; σ_t); this turns the final data distribution into indistinguishable random noise.
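A sketch of the double-DQN update of Eqs. (14)–(16) in PyTorch; the network sizes, batch data, and hyperparameters are placeholders. Note that the online network θ selects the next action while the target network θ′ evaluates it, which is exactly the decoupling described above.

```python
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 8, 7, 0.95
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, n_actions))          # theta
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                           nn.Linear(64, n_actions))     # theta'
target_net.load_state_dict(q_net.state_dict())

def double_dqn_loss(s, a, r, s_next, done):
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q_estim(s, a; theta)
    with torch.no_grad():
        a_star = q_net(s_next).argmax(dim=1)              # Eq. (14) on s'
        q_next = target_net(s_next).gather(1, a_star.unsqueeze(1)).squeeze(1)
        y = r + gamma * (1 - done) * q_next               # Eq. (15)
    return 0.5 * (y - q_sa).pow(2).mean()                 # Eq. (16)

# One gradient computation on a random mini-batch (placeholder data).
s = torch.randn(32, state_dim); s2 = torch.randn(32, state_dim)
a = torch.randint(n_actions, (32,)); r = torch.randn(32)
done = torch.zeros(32)
loss = double_dqn_loss(s, a, r, s2, done)
loss.backward()   # followed by an optimizer step in a full training loop
```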
2) The reverse denoising: The essence of the denoising process is to iteratively learn to reverse the forward noising process and generate samples from unknown noise, i.e., q(x_{t−1} | x_t, x_0). The training of the denoising model begins with the training of the DDQN, and a score-function-based ordinary differential equation solver [18] is applied to perform the denoising process.

Algorithm 1 is designed to solve the subtask scheduling optimization problem for continuous CAV tasks.

Algorithm 1 Proposed SDSS algorithm
Input: The subtask info I_w, offloading target info O_r, data ratio r, and discount factor γ;
Output: The estimation values of the two Q networks;
1: Initialize experience replay buffer D, the denoising model, and network parameters θ and θ′;
2: for episode = 1 : M do
3:    Reset the environment and initialize state space S;
4:    for step = 1 : T do
5:       The agent outputs actions, then selects and executes \arg\max_{a′} Q_{estim}(s, a′, θ);
6:       Complete subtask scheduling and offloading target selection while calculating r_w, and S_w ← S_{w+1};
7:       Store the transition (s, a, r, s′) in D;
8:       Sample real transitions from D into the SER module to update the forward diffusion process;
9:       Generate samples from the reverse denoising process by iterating over the diffusion steps, and add them to D: D ← D ∪ D_syn;
10:      Train the agent's scheduling policy by sampling from D with ratio r;
11:      Calculate the target Q value using Eq. (15);
12:      Update the Q network θ by gradient descent on the loss function in Eq. (16);
13:      step ← step + 1;
14:   end for
15:   episode ← episode + 1;
16: end for
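A compact sketch of the SER module used in lines 8–9 of Algorithm 1, under simplifying assumptions: a standard variance-preserving forward noising on flattened transition vectors and a learned denoiser run in reverse (ancestral sampling) to synthesize transitions. A faithful implementation would follow [15] and the ODE solver of [18]; here the denoiser is an untrained placeholder so the example stays self-contained, and the noise schedule is our assumption.

```python
import torch
import torch.nn as nn

T = 10                                   # denoising steps (Table I)
betas = torch.linspace(1e-4, 0.2, T)     # noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

dim = 20                                 # flattened (s, a, r, s') transition
denoiser = nn.Sequential(nn.Linear(dim + 1, 128), nn.ReLU(),
                         nn.Linear(128, dim))   # predicts the added noise

def forward_noise(x0, t):
    """Forward process p(x_t | x_0): add i.i.d. Gaussian noise (step 1)."""
    eps = torch.randn_like(x0)
    xt = alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * eps
    return xt, eps   # (xt, eps) pairs are the denoiser's training targets

@torch.no_grad()
def sample_synthetic(batch: int):
    """Reverse process q(x_{t-1} | x_t): iterate from pure noise to a
    synthetic transition for the replay buffer (step 2)."""
    x = torch.randn(batch, dim)
    for t in reversed(range(T)):
        t_in = torch.full((batch, 1), float(t) / T)
        eps_hat = denoiser(torch.cat([x, t_in], dim=1))
        x = (x - betas[t] / (1 - alphas_bar[t]).sqrt() * eps_hat) \
            / (1.0 - betas[t]).sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x   # synthetic (s, a, r, s') rows, mixed with real ones at ratio r

synthetic = sample_synthetic(batch=64)
```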

IV. PERFORMANCE EVALUATION

In this section, simulation results are provided to demonstrate the effectiveness of the proposed SDSS algorithm. The traffic simulator Simulation of Urban MObility (SUMO) [19] is used to generate vehicle mobility, and a DAG generator [20] is used to produce task DAGs with different dependencies. We consider CAV task scheduling on a one-km highway section lacking BS coverage, where a single relay UAV hovers to cover the whole section. The simulation parameters and algorithm hyperparameter configurations are listed in Table I. We first compare the convergence of the proposed SDSS algorithm with the original DDQN algorithm [16] (validation of analytical results), and then compare the algorithms against a random subtask scheduling scheme under diverse workload and computing capacity conditions (performance comparisons).

TABLE I: Simulation parameters.
Number of TVs and SVs (N, S): {2, 5}
Number of subtasks of a single task (W): 4∼6
Computing power of TVs and SVs (f_n, f_s): {2, 2∼8} GHz
Computing power of BS server (f_j): 50 GHz
Transmit power of vehicle and UAV (P_n, P_u): {20, 30} dBm
Bandwidth of vehicle and UAV (B_n, B_u): {5, 10} MHz
Maximum tolerable delay of a single task (τ_m): 650 ms
Workload size of a subtask (ω_m^w): 500∼5000 KB
Length of highway section: 1 km
Learning rate and discount factor: {0.001, 0.95}
Experience replay size: 100000
Denoising steps: 10

A. Validation of Analytical Results

Figure 4 shows the reward convergence curves of the SDSS algorithm and DDQN, where each curve is the mean over five random-seed simulations and the shaded region represents one standard deviation. The proposed SDSS algorithm achieves faster convergence and higher reward, and its stability is better than that of the original DDQN. The main reason is that SDSS can train the agent with generated experience data during the initial strategy exploration stage. When the agent begins to learn the discrete subtask offloading decisions, the transitions generated from real interaction experience help it avoid inefficient trial and error, which both reduces training time and improves the quality of the discrete decisions.

Fig. 4. The algorithm convergence curve.

B. Performance Comparisons

In this part, the average completion time of the proposed SDSS algorithm is compared with that of the DDQN and random offloading schemes. We evaluate performance under different subtask workload sizes and SV computing capacities: 1) subtask workload size (unit: KB): four subtasks with three combinations of low, high,
and mixed workload sizes, i.e., W1 (500, 600, 700, 800), W2 (4000, 4300, 4600, 4900), and W3 (800, 2500, 1200, 4500); 2) SV computing capacity (unit: GHz): five SVs with three combinations of low, middle, and high available computing resources, i.e., C1 (2, 3, 4, 3, 2), C2 (4, 5, 5, 8, 3), and C3 (6, 7, 5, 8, 8).

Figure 5 shows the performance comparison with the two baseline schemes under different subtask workload sizes and SV computing capacities. In Fig. 5(a), we compare the average completion time of the different algorithms. The proposed SDSS algorithm achieves a lower task completion time under different computing requirements, demonstrating the higher robustness of its scheduling decisions. The main reason is that SDSS adaptively adjusts the offloading target selection while determining the subtask scheduling priority; it achieves efficient subtask offloading by estimating the target states, the subtask completion times, and the overall task completion time.

Figure 5(b) illustrates that subtasks with high workloads can be relayed to the BS server through the UAV, but the data transmission overhead cannot be ignored; the computing resources of the SVs must be utilized effectively to complete all subtasks within the tolerable delay. For combination C1, owing to the low computing resources of the SVs, SDSS chooses to offload some high-workload subtasks to the BS via the UAV, which leads to a higher completion delay. When an SV with high computing resources is available, SDSS adaptively makes subtask offloading decisions that reduce the overall average task completion delay.

Fig. 5. The performance comparison. (a) Delay vs. subtask workload size. (b) Delay vs. SV computing capacity.

V. CONCLUSION

In this paper, we have studied the task scheduling problem for offloading CAV dependent subtasks in a highway scenario. Firstly, the task dependency and scheduling models were established to formulate the optimization problem of minimizing the average completion time of all tasks. The long-term optimization problem was then reformulated as an MDP to find the optimal subtask scheduling strategies. Moreover, to achieve the objective, we designed the SDSS algorithm based on Synthetic-DDQN. Finally, we have shown that SDSS yields faster scheduling decision exploration and lower task completion time in a dynamic environment than the other schemes under different subtask workload sizes and SV computing capacities.

REFERENCES

[1] M. Ahmed, M. A. Mirza, S. Raza, H. Ahmad, F. Xu, W. U. Khan, Q. Lin, and Z. Han, "Vehicular Communication Network Enabled CAV Data Offloading: A Review," IEEE Trans. Intell. Transp. Syst., vol. 24, no. 8, pp. 7869–7897, 2023.
[2] P. Abdisarabshali, M. Liwang, A. Rajabzadeh, M. Ahmadi, and S. Hosseinalipour, "Decomposition Theory Meets Reliability Analysis: Processing of Computation-Intensive Dependent Tasks Over Vehicular Clouds with Dynamic Resources," IEEE/ACM Trans. Netw., vol. 32, no. 1, pp. 475–490, 2024.
[3] K. Qu, W. Zhuang, Q. Ye, W. Wu, and X. Shen, "Model-Assisted Learning for Adaptive Cooperative Perception of Connected Autonomous Vehicles," IEEE Trans. Wireless Commun., vol. 23, no. 8, pp. 8820–8835, 2024.
[4] G. Ma, M. Hu, X. Wang, H. Li, Y. Bian, K. Zhu, and D. Wu, "Joint Partial Offloading and Resource Allocation for Vehicular Federated Learning Tasks," IEEE Trans. Intell. Transp. Syst., vol. 25, no. 8, pp. 8444–8459, 2024.
[5] Y. Sun, Z. Wu, K. Meng, and Y. Zheng, "Vehicular Task Offloading and Job Scheduling Method Based on Cloud-Edge Computing," IEEE Trans. Intell. Transp. Syst., vol. 24, no. 12, pp. 14651–14662, 2023.
[6] L. Zhao, Z. Zhao, E. Zhang, A. Hawbani, A. Y. Al-Dubai, Z. Tan, and A. Hussain, "A Digital Twin-Assisted Intelligent Partial Offloading Approach for Vehicular Edge Computing," IEEE J. Sel. Areas Commun., vol. 41, no. 11, pp. 3386–3400, 2023.
[7] H. Zhang, X. Liu, Y. Xu, D. Li, C. Yuen, and Q. Xue, "Partial Offloading and Resource Allocation for MEC-Assisted Vehicular Networks," IEEE Trans. Veh. Technol., vol. 73, no. 1, pp. 1276–1288, 2024.
[8] B. Hazarika, K. Singh, C.-P. Li, A. Schmeink, and K. F. Tsang, "RADiT: Resource Allocation in Digital Twin-Driven UAV-Aided Internet of Vehicle Networks," IEEE J. Sel. Areas Commun., 2023.
[9] Q. Shen, B.-J. Hu, and E. Xia, "Dependency-Aware Task Offloading and Service Caching in Vehicular Edge Computing," IEEE Trans. Veh. Technol., vol. 71, no. 12, pp. 13182–13197, 2022.
[10] Z. Wang, G. Sun, H. Su, H. Yu, B. Lei, and M. Guizani, "Low-Latency Scheduling Approach for Dependent Tasks in MEC-Enabled 5G Vehicular Networks," IEEE Internet Things J., vol. 11, no. 4, pp. 6278–6289, 2024.
[11] S. E. Mahmoodi, R. N. Uma, and K. P. Subbalakshmi, "Optimal Joint Scheduling and Cloud Offloading for Mobile Applications," IEEE Trans. Cloud Comput., vol. 7, no. 2, pp. 301–313, 2019.
[12] M. Samir, D. Ebrahimi, C. Assi, S. Sharafeddine, and A. Ghrayeb, "Leveraging UAVs for Coverage in Cell-Free Vehicular Networks: A Deep Reinforcement Learning Approach," IEEE Trans. Mobile Comput., vol. 20, no. 9, pp. 2835–2847, 2021.
[13] Y. Liu, S. Wang, Q. Zhao, S. Du, A. Zhou, X. Ma, and F. Yang, "Dependency-Aware Task Scheduling in Vehicular Edge Computing," IEEE Internet Things J., vol. 7, no. 6, pp. 4961–4971, 2020.
[14] R. S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction," A Bradford Book, 2018.
[15] C. Lu, P. Ball, Y. W. Teh, and J. Parker-Holder, "Synthetic Experience Replay," in Proc. of NeurIPS, vol. 36, 2024.
[16] H. Van Hasselt, A. Guez, and D. Silver, "Deep Reinforcement Learning with Double Q-Learning," in Proc. of AAAI, 2016.
[17] H. Cao, C. Tan, Z. Gao, Y. Xu, G. Chen, P.-A. Heng, and S. Z. Li, "A Survey on Generative Diffusion Models," IEEE Trans. Knowl. Data Eng., vol. 36, no. 7, pp. 2814–2830, 2024.
[18] T. Karras, M. Aittala, T. Aila, and S. Laine, "Elucidating the Design Space of Diffusion-Based Generative Models," in Proc. of NeurIPS, 2022.
[19] P. A. Lopez, M. Behrisch, L. Bieker-Walz, J. Erdmann, Y.-P. Flötteröd, R. Hilbrich, L. Lücken, J. Rummel, P. Wagner, and E. Wießner, "Microscopic Traffic Simulation Using SUMO," in Proc. of IEEE ITSC, 2018, pp. 2575–2582.
[20] C. Shu, Z. Zhao, Y. Han, G. Min, and H. Duan, "Multi-User Offloading for Edge Computing Networks: A Dependency-Aware and Latency-Optimal Approach," IEEE Internet Things J., vol. 7, no. 3, pp. 1678–1689, 2020.
