Art 10 - Repeated Stochastic Bargaining Game To Share Queuing Network Resources
T. B. Ravaliminoarimalalason, PhD School in Sciences and Technical Engineering and Innovation, University of Antananarivo, Antananarivo, Madagascar, +261340016433.
H. Z. Andriamanohisoa, PhD School in Sciences and Technical Engineering and Innovation, University of Antananarivo, Antananarivo, Madagascar, +261341033943.
F. Randimbindrainibe, PhD School in Sciences and Technical Engineering and Innovation, University of Antananarivo, Antananarivo, Madagascar, +261340646690.

g_i(p_i^*, \mu) \ge g_i(p_i, \mu), \quad p_i \ge 0, \quad i = 1, \ldots, n, \quad \mu = \frac{\sum_i p_i}{C} \qquad (5)

In that case, the solution c_i^* = p_i^*/\mu is an optimal solution satisfying (1). To prove it, we can use the Lagrangian method to establish that the conditions in (5) are identical to the conditions in (1) with c_i^* = p_i^*/\mu.
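To make the proportional pricing behind (5) concrete: each player's allocation is his proposal divided by the price µ = Σ_j p_j / C, so the allocations always exhaust the capacity C. A minimal Python sketch, with purely illustrative bid values:

```python
# Minimal sketch of the proportional resource allocation behind (5):
# the price is mu = sum(bids) / C and player i receives c_i = p_i / mu.
# Bid values are purely illustrative.

def allocate(bids, capacity):
    """Return the price mu and the allocations c_i = p_i / mu."""
    mu = sum(bids) / capacity          # price per unit of resource
    return mu, [p / mu for p in bids]  # allocations sum to `capacity`

bids = [2.0, 1.0, 1.0]                 # hypothetical proposals p_i
mu, c = allocate(bids, capacity=10.0)
print(mu, c, sum(c))                   # sum(c) == 10.0
```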
B. A repeated game

With repeated games we can model situations where players interact repeatedly, playing the same game [4]-[6]. They choose their actions simultaneously, without knowing the choices of the other players. So, a basic game is played in each period of a discrete time t = 1, 2, ..., and at the end of each period the players observe the performed actions.

We are given a game in normal form J = (N, (S_i)_{i \in N}, (g_i)_{i \in N}), where N is the set of players, S_i the set of possible strategies for player i, and g_i his associated gain.

At step t = 1, each player i chooses an action s_i^1 \in S_i independently of the other players. Let us denote s^1 = (s_i^1)_{i \in N} the vector of joint actions played in step 1. At the end of this step, s^1 is revealed to all players.

At step t (t \ge 2), knowing the history h^t = (s^1, s^2, \ldots, s^{t-1}) of the actions played in the past, each player i chooses an action s_i^t \in S_i independently of the other players. Let us denote H^t the set of possible histories at step t.

It remains to define the gain function. For this, we should determine how the players evaluate the result of an infinite history. Indeed, if (s^1, s^2, \ldots) \in H^\infty, player i receives a gain g_i(s^t) at step t. In a discounted game model, the player gives more weight to a unit of gain received today than to a unit of gain received tomorrow. For this, we use a discount factor \delta \in ]0,1[. Thus, one unit of gain owned at step 2 is worth only \delta units of gain at step 1, and one unit of gain owned at step t is worth \delta^{t-1} units of gain at step 1.

In this context, the gain owned by player i during a play h = (s^1, s^2, \ldots), evaluated at time t = 1, is expressed by (6) [6]:

g_i^\delta(h) = (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} g_i(s^t) \qquad (6)

In (6), the factor (1 - \delta) is a normalization factor that brings the gain back to the same unit at any step.
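As a concrete reading of (6), the sketch below evaluates the normalized discounted gain of one player over a truncated history; the stage gains and discount factor are invented for illustration.

```python
# Sketch of the normalized discounted gain (6) over a finite prefix
# of the history; stage gains are arbitrary illustrative numbers.

def discounted_gain(stage_gains, delta):
    """(1 - delta) * sum_t delta^(t-1) * g_i(s^t), with t starting at 1."""
    return (1.0 - delta) * sum(
        (delta ** (t - 1)) * g for t, g in enumerate(stage_gains, start=1)
    )

history_gains = [3.0, 1.0, 2.0, 2.0]   # g_i(s^t) for t = 1..4 (made up)
print(discounted_gain(history_gains, delta=0.9))
```

With the (1 − δ) factor, a constant stage gain g yields a discounted gain that converges to g as the horizon grows, which is exactly the "same unit at any step" property noted above.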
C. A stochastic game

Stochastic games are an extension of Markov processes to the case of several agents, called players, acting in a common environment [7]-[9]. The players play a joint action which defines the owned gains and the new state of the environment.

A stochastic game can be defined by a quintuplet (N, E, (S_i)_{i \in N}, (g_i)_{i \in N}, T) where:
- N is the set of players who act in the game;
- E is the finite set of states of the game;
- S_i is the set of possible strategies (actions) for player i;
- g_i is the gain function, which is a function of the state of the game and the strategies played by all players: g_i : E \times S_1 \times \cdots \times S_N \to \mathbb{R};
- T is the transition model between states, which depends on the joint strategies: T : E \times S_1 \times \cdots \times S_N \times E \to [0,1].

At each step of the game, given the current state e \in E, the players choose strategies s = (s_1, \ldots, s_N) to play. Each player i owns a gain g_i(e, s), and then the system goes from state e to state e' according to the transition model T, which satisfies (7):

\sum_{e' \in E} T(e, s, e') = 1 \qquad (7)

We call a policy \pi_i : E \to [0,1]^{\mathrm{Card}\, S_i} the vector whose elements define a probability distribution over the strategies of player i, specific to the game in normal form defined by the state e. For player i, the policy defines a local strategy in each state, in the sense of game theory. The expected utility refers to the gain expected over the strategies of the opposing players. For a joint policy \pi = (\pi_1, \ldots, \pi_N), we define the expected utility of player i in each state e as expressed in (8):

u_i(\pi, e) = \mathbb{E}_{s \in S}\left[ g_i(e, s) \right] \qquad (8)

where S = S_1 \times \cdots \times S_N and \mathbb{E} denotes the expectation operator.

Therefore, we can also define the utility U_i(\pi, e) of the states for player i, associated with a joint policy \pi, as the expected utility for player i from the state e if all players follow this joint policy:

U_i(\pi, e) = \mathbb{E}_{s \in S}\!\left[ \sum_{t=1}^{\infty} \delta^{t-1} u_i(\pi, e) \,\middle|\, e^0 = e \right] = u_i(\pi, e) + \delta \sum_{s \in S} \sum_{e' \in E} T(e, s, e')\, \pi(e, s)\, U_i(\pi, e') \qquad (9)

where \pi(e, s) designates the probability of the joint strategies s in the state e according to the joint policy \pi, and \delta is the discount factor.
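A direct way to compute U_i(π, ·) is to iterate the recursion in (9) until it converges, exactly as in policy evaluation for Markov decision processes. The sketch below does this for a toy game whose states, joint strategy profiles, gains, and transition table are all invented for illustration.

```python
# Policy evaluation for one player via the fixed point of (9).
# States, joint strategy profiles, gains, and transitions are toy data.

DELTA = 0.9
states = ["e0", "e1"]
joint_strats = ["sA", "sB"]            # joint strategy profiles s

# pi[e][s]: probability of joint profile s in state e (joint policy)
pi = {"e0": {"sA": 0.7, "sB": 0.3}, "e1": {"sA": 0.4, "sB": 0.6}}
# g[e][s]: gain of player i for profile s in state e
g = {"e0": {"sA": 1.0, "sB": 0.0}, "e1": {"sA": 0.0, "sB": 2.0}}
# T[e][s][e']: transition probabilities, each row summing to 1 as in (7)
T = {"e0": {"sA": {"e0": 0.8, "e1": 0.2}, "sB": {"e0": 0.1, "e1": 0.9}},
     "e1": {"sA": {"e0": 0.5, "e1": 0.5}, "sB": {"e0": 0.0, "e1": 1.0}}}

def expected_utility(e):
    """u_i(pi, e) from (8): expected stage gain under the joint policy."""
    return sum(pi[e][s] * g[e][s] for s in joint_strats)

U = {e: 0.0 for e in states}
for _ in range(1000):                  # iterate the recursion (9)
    U = {e: expected_utility(e)
            + DELTA * sum(pi[e][s] * T[e][s][e2] * U[e2]
                          for s in joint_strats for e2 in states)
         for e in states}
print(U)                               # converged state utilities U_i(pi, e)
```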
In a stochastic game, a Nash equilibrium is a vector of policies \pi^* = (\pi_1^*, \ldots, \pi_N^*) such that, for every state e \in E and every player i [9]:

U_i\left( (\pi_i^*, \pi_{-i}^*), e \right) \ge U_i\left( (\pi_i, \pi_{-i}^*), e \right), \quad \forall \pi_i \in \Pi_i \qquad (10)

where \Pi_i is the set of policies available to player i. The notation (\pi_i^*, \pi_{-i}^*) means the vector of policies \pi^* where \pi_i^* is the policy of player i and \pi_{-i}^* = (\pi_j^*)_{j \ne i} is the joint policy of the players other than i.
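Given a way to evaluate U_i (such as the iteration sketched above), checking whether a candidate joint policy satisfies (10) over finite policy sets reduces to testing every unilateral deviation. The sketch below is schematic: `evaluate_U(i, joint_policy, e)` is an assumed stand-in for the policy evaluation routine, and all policy sets are assumed finite.

```python
# Schematic check of the Nash condition (10) over finite policy sets.
# `evaluate_U(i, joint_policy, e)` is assumed to implement (9) for
# player i; everything here is illustrative scaffolding.

def is_nash(joint_policy, policy_sets, states, evaluate_U, eps=1e-9):
    """True if no player can gain by a unilateral policy deviation."""
    for i, alternatives in enumerate(policy_sets):
        for e in states:
            base = evaluate_U(i, joint_policy, e)
            for pi_i in alternatives:              # unilateral deviation
                deviated = list(joint_policy)
                deviated[i] = pi_i
                if evaluate_U(i, tuple(deviated), e) > base + eps:
                    return False                   # (10) is violated
    return True
```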
III. MODEL OF CUSTOMERS AND QUEUING NETWORK

A. Principle

In a given queue, customers arrive and others leave. Each customer needs a total amount of resource b_i to fulfill the requirement he has asked of the server. In the following, we assume that these resources are sampled and shareable, in order to work in the discrete domain. At the beginning of each time interval t, each player i sends his strategy s_i^t for the bargaining of the resource C of the server. We restrict ourselves to a finite, countable strategy set S_i. The server evaluates these proposals to compute the resources c_i^t that it will assign to the players.
Each resource c_i^t has a price p_i^t that the server sends along with the resource to the player. Once received, each resource is used by its player, and the players calculate their gains. They also assess the remainder of their respective requirements as a function of the consumed resources. Note that the player's requirement at the beginning of time t is assessed at the end of time t - 1, after using the resource c_i^{t-1} allocated at that time.

As we consider the mobility of the players, each of them decides, from his requirement, whether he stays in his current queue or moves to the next queue. Initially, when customer i arrives in a queue, his first requirement is noted b_i^0. Over time, depending on the allocated resources, that requirement becomes b_i^t, with:

b_i^t = b_i^{t-1} - c_i^{t-1} \qquad (11)

The decision of the customer is determined by (12):

\mathrm{dec}(b_i^t) = \begin{cases} \text{stay} & \text{if } b_i^t > 0 \\ \text{move} & \text{if } b_i^t \le 0 \end{cases} \qquad (12)

This principle is illustrated in Fig. 1.

Fig. 1. Games at time t
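A minimal sketch of the local dynamics (11) and the stay/move decision (12); the initial requirement and per-step allocations are invented values.

```python
# Sketch of the requirement update (11) and stay/move decision (12).
# Initial requirement and per-step allocations are illustrative values.

def step_requirement(b_prev, c_prev):
    """b_i^t = b_i^{t-1} - c_i^{t-1}: the remaining requirement, as in (11)."""
    return b_prev - c_prev

def decision(b_t):
    """Stay in the queue while some requirement remains, as in (12)."""
    return "stay" if b_t > 0 else "move"

b = 10.0                               # b_i^0: initial requirement
for c in [4.0, 3.5, 3.0]:              # allocated resources c_i^t
    b = step_requirement(b, c)
    print(b, decision(b))              # moves once b falls to <= 0
```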
B. The game formulation

Let us model these actions and movements by a stochastic game between N^t players. The number of players N^t varies over time, as players arrive in or depart from the queue according to their requirements.

At time t, for player i, the local state of the game is defined by the requirement b_i^t. As it has finite cardinality, let us denote B_i the set of possible requirements of player i. For player i, let s_i^t be the strategy that he proposes to the server. The set of possible proposals, noted S_i = \{ s_i^t \}, is also the set of possible strategies for that player. This strategy is developed in the next paragraph.

Let g_i^t : B_i \times S_1 \times \cdots \times S_{N^t} \to \mathbb{R} be the gain function of the player, a function of the local state b_i^t and of the joint strategies s^t = (s_1^t, \ldots, s_{N^t}^t). We evaluate this gain from the allocated resources, which are themselves based on the price proposals s^t made by all players.

The transition model between local states is defined by the function T_i^t : B_i \times S_1 \times \cdots \times S_{N^t} \times B_i \to [0,1], as expressed by (13):

\sum_{b_i^{t+1} \in B_i} T_i^t(b_i^t, s^t, b_i^{t+1}) = 1 \qquad (13)

Since the requirements (local states) b_i^{t+1} and b_i^t are dependent, and are also functions of the joint strategy s^t, this function is well defined; we evaluate this transition function further on. From these data, it is possible to model the actions and movements of the players with a stochastic game defined by the quintuplet (N^t, (B_i), (S_i), (g_i), (T_i)).
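Since the local dynamics (11) are deterministic once the allocation is known, one consistent reading of (13) is a transition function that puts all its probability mass on b_i^{t+1} = b_i^t − c_i^t. The sketch below builds such a function from the proportional allocation rule; all values are illustrative.

```python
# One consistent reading of (13): given the allocation rule, the update
# (11) makes the local transition deterministic -- all probability mass
# sits on b_i^{t+1} = b_i^t - c_i^t. All values are illustrative.

def allocation(i, bids, capacity):
    """Proportional share c_i = p_i / mu, with mu = sum(p) / C as in (5)."""
    mu = sum(bids) / capacity
    return bids[i] / mu

def transition(i, b, bids, b_next, capacity, tol=1e-9):
    """T_i^t(b_i^t, s^t, b_i^{t+1}): 1 on the successor given by (11), else 0."""
    c = allocation(i, bids, capacity)
    return 1.0 if abs(b_next - (b - c)) < tol else 0.0

bids = [2.0, 2.0]                      # illustrative joint proposals s^t
print(transition(0, 10.0, bids, 5.0, capacity=10.0))   # -> 1.0
```

By construction, the mass over the successor states sums to 1, as (13) requires.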
C. Bargaining of the resource of the server

A given resource C of the server is bargained over among the customers of the queue, here called players. At time t, each player must maximize his gain over the set of possible proposals P_i, as shown in (14), obtained from (4):

g_i^t(p_i^t, \mu^t) = u_i\!\left( \frac{p_i^t}{\mu^t} \right) - p_i^t \qquad (14)

In (14), the function u_i(c_i^t) is the utility function of player i regarding the resource c_i^t that the server has allocated after the bargaining computation. This function must be concave, as described in paragraph II. And to better assess the allocated resource, it is necessary that this utility function u_i also be a function of the requirement b_i^t. We can use, for example, the logarithmic or quadratic valuations given in (15) and (16):

u_i(c_i^t, b_i^t) = A \, \frac{\log(c_i^t)}{\log(b_i^t)} \qquad (15)

u_i(c_i^t, b_i^t) = A - B \left( \frac{c_i^t}{b_i^t} \right)^2 \qquad (16)

where A and B are arbitrary positive constants. The proof of the concavity of these functions comes from the negativity of their second derivatives.

The players send the server their proposals p^t = (p_1^t, \ldots, p_{N^t}^t) \in \mathbb{R}^{N^t}. Once they are received, the server computes the resources to allocate, c^t = (c_1^t, \ldots, c_{N^t}^t).

In (14), the price \mu^t is not known beforehand, so the players are not able to compute the optimal proposal p_i^t. We suppose that the price anticipation described by (3) is used. The gain function that each player must maximize is then defined by (17), assuming that \sum_{j \ne i} p_j^t is constant.
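To make (14)-(16) concrete, the sketch below evaluates both candidate utilities and the resulting bargaining gain under the proportional price µ^t = Σ_j p_j^t / C; the constants A, B and the bids are illustrative choices, not values from the paper.

```python
import math

# Candidate concave utilities (15) and (16) and the bargaining gain (14),
# with the price mu^t = sum(p) / C. Constants A, B and bids are made up.

A, B = 5.0, 1.0

def u_log(c, b):
    """(15): logarithmic valuation of the allocated resource."""
    return A * math.log(c) / math.log(b)

def u_quad(c, b):
    """(16): quadratic valuation of the allocated resource."""
    return A - B * (c / b) ** 2

def gain(i, bids, b_i, capacity, u):
    """(14): g_i^t = u_i(p_i / mu) - p_i, with mu = sum(p) / C."""
    mu = sum(bids) / capacity
    return u(bids[i] / mu, b_i) - bids[i]

bids = [2.0, 1.5, 1.0]                 # proposals p^t of the N^t players
print(gain(0, bids, b_i=8.0, capacity=10.0, u=u_log))
print(gain(0, bids, b_i=8.0, capacity=10.0, u=u_quad))
```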
... and \frac{d_i^t}{\sum_j d_j^t}\, C means the resource allocated to player i at time t. The gain owned by that player at this time is expressed by (22):

g_i^t(d_i^t) = u_i(d_i^t) - \mu^t d_i^t \qquad (22)

The problem is how player i can find this optimal policy \pi_i^*. Equation (25) shows that the policy of player i ensuring the best response depends on the other players' policies, which are functions of the other players' states. Player i does not know the other players' states, so he is not able to compute his optimal policy. However, he can optimize only his immediate gain g_i^t(b_i^t, d_i^t). The expected gain is therefore discounted by a
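The surviving fragment describes the myopic behaviour: lacking the other players' states, player i maximizes only the immediate gain (22). A minimal sketch, assuming a finite set of candidate demands and the logarithmic valuation (15):

```python
import math

# Myopic best-response sketch: the player maximizes only his immediate
# gain (22), u_i(d) - mu^t * d, over a finite demand set. Toy values.

def myopic_demand(demands, u, mu):
    """Demand maximizing the immediate gain u(d) - mu * d."""
    return max(demands, key=lambda d: u(d) - mu * d)

A, b_i = 5.0, 8.0
u = lambda d: A * math.log(d) / math.log(b_i)  # valuation as in (15)
candidates = [1.0, 2.0, 3.0, 4.0]              # illustrative demand set
print(myopic_demand(candidates, u, mu=1.0))    # -> 2.0 here
```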
= \sum_{t = t_i^i}^{t_i^f} \delta^{t_i - t} \left( u_i(\pi_m) - \mu^t \pi_m \right)

During 10 minutes of simulation, we get the results below.
Fig. 4. Evolution of the average sojourn time on each queue, on a stable system

Fig. 6. Evolution of the average number of packets on each queue, on an unstable system

As shown in Fig. 4, the average sojourn time is almost identical for the PS queue and the MYOPIC queue. There is a stable difference of around 0.7 second between them, the sojourn time of the PS queue being greater than that of the MYOPIC queue.
(Table residue — number of packets: Source, 607.)
Note that the Source sends the same number of packets, at the same times and with the same distribution; the big difference is due to the scheduling and processing in each queue.

We can say that our MYOPIC system manages packets better than the classic PS queue in case of instability (e.g., during temporary congestion).
V. CONCLUSION

Our contribution consists of a new way to manage the resource of a queue. Our methodology is based on a repeated stochastic bargaining game to share the resources of a queuing network. We introduced the new principle of a myopic player, who does not optimize his future gain through the history of the game. The simulations show the performance of our model, which achieves better scheduling during an instability period.
REFERENCES

[1] J. F. Nash, "The Bargaining Problem", Econometrica, vol. 18, 1950, pp. 155-162.
[2] T. B. Ravaliminoarimalalason, H. Z. Andriamanohisoa, F. Randimbindrainibe, "Modélisation de partage de ressources dans un réseau de files d'attente à processeur partagé par la théorie des jeux de marchandage. Résolution algorithmique." [Modeling resource sharing in a processor-sharing queuing network by bargaining game theory. Algorithmic resolution.], MADA-ETI, vol. 2, 2014, pp. 28-39.
[3] R. Johari, "The price of anarchy and the design of scalable resource allocation mechanisms", in Algorithmic Game Theory, Cambridge University Press, 2007, pp. 543-568.
[4] G. J. Mailath, L. Samuelson, Repeated Games and Reputations: Long-Run Relationships, Oxford University Press, 2006.
[5] J.-F. Mertens, S. Sorin, S. Zamir, "Repeated games", CORE D.P., Université Catholique de Louvain, 1994, pp. 20-22.
[6] D. Abreu, "Theory of Infinitely Repeated Games with Discounting", Econometrica, vol. 56, 1988, pp. 383-396.
[7] L. S. Shapley, "Stochastic games", Proceedings of the National Academy of Sciences of the U.S.A., vol. 39, 1953, pp. 1095-1100.
[8] A. Neyman, S. Sorin, Stochastic Games and Applications, NATO Science Series, Kluwer Academic Publishers, 2003.
[9] A. M. Fink, "Equilibrium in a stochastic n-person game", Journal of Science of the Hiroshima University, vol. 28, 1964, pp. 89-93.
[10] K. A. Chew, Introduction to OPNET Modeler, Centre for Communication Systems Research, University of Surrey, Guildford, UK, 2002.
[11] E. Kalai, M. Smorodinsky, "Other Solutions to Nash's Bargaining Problem", Econometrica, vol. 43, 1975, pp. 513-518.