05 Cart Pole

The document provides the correct equations that describe the dynamics of the cart-pole system, as the classic papers that introduced this problem contained mistakes in their equations. It identifies two mistakes in the equations presented in a seminal 1983 paper: the friction force was incorrectly modeled, and the gravitational acceleration was mistakenly given a negative value. The document then derives the fully correct equations of motion for the cart position, pole angle, and friction force from first principles using Newton's laws of motion.

Uploaded by

sonytin
Copyright
© Attribution Non-Commercial (BY-NC)

Correct equations for the dynamics of the cart-pole system

Răzvan V. Florian
Center for Cognitive and Neural Studies (Coneural), Str. Saturn 24, 400504 Cluj-Napoca, Romania
[email protected]
July 11, 2005; updated February 10, 2007

Abstract

The problem of balancing a pole on a moving cart is a widely used benchmark problem for testing reinforcement learning algorithms. The classic papers that introduced this problem contain mistakes in the equations that govern the dynamics of the cart-pole system, and these mistakes propagated in other studies that used the same problem as a benchmark. Here we provide the equations that describe correctly the dynamics of the system.

Introduction

The control of a cart-pole system is widely used as a benchmark problem for testing the efficiency of reinforcement learning algorithms. It seems to have been first used as a test problem in adaptive control by Michie and Chambers (1968a,b), and it became better known after its use in the paper of Barto et al. (1983). Google Scholar¹ reports about 500 papers citing this paper, and about 100 papers containing the words "cart pole" or "cartpole".

There are, however, two mistakes in the equations from Barto et al. (1983) that describe the dynamics of the cart pole. One mistake introduces a difference between the reported equations and the equations describing a correct physical model; the other is probably a typo. The mistakes propagated to other papers that followed the original paper of Barto et al. (e.g., Anderson, 1986; Schmidhuber, 1990; Si and Wang, 2001). The existence of these mistakes does not affect the validity of the reinforcement learning algorithms presented in the papers using them, because the problematic equations still describe a complex dynamical system. However, we believe that it is useful to correct them, for the sake of scientific rigor. Here we provide the equations that correctly describe, from a physical point of view, the dynamics of the system.

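Figure 1: The cart-pole system. [Figure not reproduced; its labels are the laboratory unit vectors $\mathbf{u}_x$, $\mathbf{u}_y$, $\mathbf{u}_z$, the unit vectors $\mathbf{u}_x'$, $\mathbf{u}_y'$ of the frame rotating with the pole, and the forces $\mathbf{G}_p$, $\mathbf{N}$, $-\mathbf{N}$, $\mathbf{N}_c$, $\mathbf{F}_f$, $\mathbf{G}_c$, $\mathbf{F}$.]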

The system

The studied system is a cart to which a rigid pole is hinged (see figure). The cart is free to move within the bounds of a one-dimensional track. The pole can move in the vertical plane parallel to the track. The controller can apply a force $F$ to the cart, parallel to the track. The cart has mass $m_c$; the pole has mass $m_p$ and length $2l$. We denote by $x$ the position of the cart along the track. The angle between the pole and the vertical is $\theta$. The friction coefficient between the cart and the track is $\mu_c$; there is also friction in the articulation connecting the pole to the cart, which leads to a torque $\mu_p \dot{\theta}$.

The wrong equations and the mistakes

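The equations reported in Barto et al. (1983) are:

$$\ddot{\theta} = \frac{g \sin\theta + \cos\theta \left[ \dfrac{-F - m_p l \dot{\theta}^2 \sin\theta + \mu_c \operatorname{sgn}(\dot{x})}{m_c + m_p} \right] - \dfrac{\mu_p \dot{\theta}}{m_p l}}{l \left[ \dfrac{4}{3} - \dfrac{m_p \cos^2\theta}{m_c + m_p} \right]} \tag{1}$$

$$\ddot{x} = \frac{F + m_p l \left[ \dot{\theta}^2 \sin\theta - \ddot{\theta} \cos\theta \right] - \mu_c \operatorname{sgn}(\dot{x})}{m_c + m_p} \tag{2}$$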

The mistake inherent in these equations is related to the friction force between the cart and the track, which is taken to be $\mu_c \operatorname{sgn}(\dot{x})$. The mistake is apparent even from dimensional analysis: $\mu_c \operatorname{sgn}(\dot{x})$ is dimensionless instead of having the dimension of a force. In fact, the friction force is the product of the friction coefficient and the magnitude of the force normal to the track; and the force normal to the track is not constant, because the movement of the pole changes its magnitude.

¹ http://scholar.google.com/

A second mistake in Barto et al. (1983) is to consider the gravitational acceleration $g$ in these equations to be negative (the paper of Barto et al. specifies $g = -9.8\ \mathrm{m/s^2}$). In fact, these equations are consistent with a positive value of $g$. Otherwise, the term $g \sin\theta$ would produce a negative $\ddot{\theta}$ for a small positive $\theta$, according to Eq. 1, meaning that gravity pushes the pole towards the vertical position; this is, of course, wrong.
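The sign argument for $g$ can be checked numerically. The sketch below evaluates Eq. 1 with both friction terms set to zero, for a pole at rest and no applied force. The function name and the parameter values (cart mass 1 kg, pole mass 0.1 kg, half-length 0.5 m) are illustrative assumptions, not values taken from the text.

```python
import math

def theta_ddot_frictionless(theta, theta_dot, force, g,
                            m_c=1.0, m_p=0.1, l=0.5):
    """Angular acceleration from Eq. 1 with both friction terms set to zero."""
    total_mass = m_c + m_p
    numerator = g * math.sin(theta) + math.cos(theta) * (
        (-force - m_p * l * theta_dot**2 * math.sin(theta)) / total_mass)
    denominator = l * (4.0 / 3.0 - m_p * math.cos(theta)**2 / total_mass)
    return numerator / denominator

small_angle = 0.05  # rad, pole tilted slightly away from the vertical

# Positive g: the tilt grows, so the pole falls, as expected physically.
falls = theta_ddot_frictionless(small_angle, 0.0, 0.0, g=9.8)
# Negative g: the tilt shrinks, as if gravity pushed the pole upright.
restores = theta_ddot_frictionless(small_angle, 0.0, 0.0, g=-9.8)

print(falls > 0, restores < 0)
```

With a positive $g$ the angular acceleration has the same sign as the tilt, so an unforced pole falls; with a negative $g$ the same equation would make the upright position stable.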

Correct equations

We consider that the cart acts with a reaction force $\mathbf{N}$ on the pole, at the articulation. According to the law of action and reaction, the pole acts on the cart with a force $-\mathbf{N}$. By applying Newton's second law to the cart we get:

$$\mathbf{F} + \mathbf{F}_f + \mathbf{G}_c - \mathbf{N} + \mathbf{N}_c = m_c \mathbf{a}_c, \tag{3}$$

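where $\mathbf{F}_f$ is the friction force between the cart and the track that acts on the cart, and $\mathbf{a}_c$ is the acceleration of the cart. We have $\mathbf{F} = F \mathbf{u}_x$; $\mathbf{F}_f = -F_f \mathbf{u}_x$; $\mathbf{G}_c = m_c g \mathbf{u}_y$; $\mathbf{N} = N_x \mathbf{u}_x - N_y \mathbf{u}_y$; $\mathbf{N}_c = -N_c \mathbf{u}_y$; $\mathbf{a}_c = \ddot{x} \mathbf{u}_x$. Here $\mathbf{u}_x$, $\mathbf{u}_y$ and $\mathbf{u}_z$ are the unit vectors of the laboratory frame of reference (see figure); since $g$ is positive and the weight is $m_c g \mathbf{u}_y$, $\mathbf{u}_y$ points downwards. Decomposing the previous equation on the $x$ and $y$ axes we get

$$F - F_f - N_x = m_c \ddot{x} \tag{4}$$

$$m_c g + N_y - N_c = 0. \tag{5}$$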

According to the Coulomb model of friction, and assuming that the track constrains the movement of the cart both downwards and upwards, the friction force is

$$F_f = \mu_c |N_c| \operatorname{sgn}(\dot{x}) = \mu_c N_c \operatorname{sgn}(N_c \dot{x}). \tag{6}$$

By applying Newton's second law to the linear movement of the pole we get:

$$\mathbf{N} + \mathbf{G}_p = m_p \mathbf{a}_p, \tag{7}$$

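where $\mathbf{G}_p = m_p g \mathbf{u}_y$. The acceleration $\mathbf{a}_p$ of the center of mass of the pole results from the combined effects of the acceleration of the cart it is attached to and of the rotation of the pole with angular velocity $\boldsymbol{\omega} = \dot{\theta} \mathbf{u}_z$ and angular acceleration $\boldsymbol{\varepsilon} = \ddot{\theta} \mathbf{u}_z$:

$$\mathbf{a}_p = \mathbf{a}_c + \boldsymbol{\varepsilon} \times \mathbf{r}_p + \boldsymbol{\omega} \times (\boldsymbol{\omega} \times \mathbf{r}_p), \tag{8}$$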

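where $\mathbf{r}_p = l (\sin\theta\, \mathbf{u}_x - \cos\theta\, \mathbf{u}_y)$ is the vector representing the position of the center of mass of the pole relative to the articulation around which the pole rotates. Thus, we get

$$\mathbf{a}_p = \ddot{x}\, \mathbf{u}_x + l \ddot{\theta}\, \mathbf{u}_z \times (\sin\theta\, \mathbf{u}_x - \cos\theta\, \mathbf{u}_y) + l \dot{\theta}^2\, \mathbf{u}_z \times [\mathbf{u}_z \times (\sin\theta\, \mathbf{u}_x - \cos\theta\, \mathbf{u}_y)]. \tag{9}$$

We have $\mathbf{u}_z \times \mathbf{u}_x = \mathbf{u}_y$ and $\mathbf{u}_z \times \mathbf{u}_y = -\mathbf{u}_x$. Hence,

$$\mathbf{a}_p = \ddot{x}\, \mathbf{u}_x + l \ddot{\theta} (\sin\theta\, \mathbf{u}_y + \cos\theta\, \mathbf{u}_x) - l \dot{\theta}^2 (\sin\theta\, \mathbf{u}_x - \cos\theta\, \mathbf{u}_y). \tag{10}$$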

An alternative way to obtain the previous equation is to compute directly the acceleration component due to the angular velocity, $-l \dot{\theta}^2 \mathbf{u}_x'$, and the acceleration component due to the angular acceleration, $l \ddot{\theta}\, \mathbf{u}_y'$, and to express the unit vectors $\mathbf{u}_x'$ and $\mathbf{u}_y'$ of the frame of reference rotating with the pole in the laboratory frame of reference.

By introducing Eq. 10 into Eq. 7 and decomposing on the $x$ and $y$ axes we get

$$N_x = m_p (\ddot{x} + l \ddot{\theta} \cos\theta - l \dot{\theta}^2 \sin\theta) \tag{11}$$

$$m_p g - N_y = m_p (l \ddot{\theta} \sin\theta + l \dot{\theta}^2 \cos\theta). \tag{12}$$
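The step from Eq. 8 to Eq. 10 can be verified numerically. The sketch below evaluates the cross products of Eq. 8 directly in the laboratory basis $(\mathbf{u}_x, \mathbf{u}_y, \mathbf{u}_z)$ and compares the result with the closed form of Eq. 10. The function names and the test values are illustrative assumptions.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pole_acceleration_eq8(x_ddot, theta, theta_dot, theta_ddot, l):
    """a_p = a_c + eps x r_p + omega x (omega x r_p), components along u_x, u_y."""
    r_p = (l * math.sin(theta), -l * math.cos(theta), 0.0)
    omega = (0.0, 0.0, theta_dot)   # angular velocity along u_z
    eps = (0.0, 0.0, theta_ddot)    # angular acceleration along u_z
    tangential = cross(eps, r_p)
    centripetal = cross(omega, cross(omega, r_p))
    return (x_ddot + tangential[0] + centripetal[0],
            tangential[1] + centripetal[1])

def pole_acceleration_eq10(x_ddot, theta, theta_dot, theta_ddot, l):
    """Closed form of Eq. 10, components along u_x and u_y."""
    a_x = (x_ddot + l * theta_ddot * math.cos(theta)
           - l * theta_dot**2 * math.sin(theta))
    a_y = (l * theta_ddot * math.sin(theta)
           + l * theta_dot**2 * math.cos(theta))
    return (a_x, a_y)

# Arbitrary test state (illustrative values).
state = dict(x_ddot=0.7, theta=0.4, theta_dot=-1.2, theta_ddot=2.5, l=0.5)
from_eq8 = pole_acceleration_eq8(**state)
from_eq10 = pole_acceleration_eq10(**state)
agree = all(math.isclose(u, v, rel_tol=1e-9, abs_tol=1e-12)
            for u, v in zip(from_eq8, from_eq10))
print(agree)
```

The component formula used in `cross` holds in any right-handed orthonormal basis, which is the case here because $\mathbf{u}_z \times \mathbf{u}_x = \mathbf{u}_y$.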

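By applying Newton's second law to the rotational movement of the pole around the articulation (which moves with acceleration $\mathbf{a}_c$ relative to the laboratory frame of reference) we get:

$$\mathbf{M} = I \boldsymbol{\varepsilon} + m_p\, \mathbf{r}_p \times \mathbf{a}_c, \tag{13}$$

where $\mathbf{M} = \mathbf{r}_p \times \mathbf{G}_p - \mu_p \dot{\theta}\, \mathbf{u}_z$ is the sum of the non-inertial torques acting on the pole relative to the articulation, $I = \tfrac{4}{3} m_p l^2$ is the moment of inertia of the pole relative to the articulation, and the term $m_p\, \mathbf{r}_p \times \mathbf{a}_c$ accounts for the torque generated by the inertial force caused by the acceleration of the cart. Hence, we get

$$m_p g l \sin\theta - \mu_p \dot{\theta} = \tfrac{4}{3} m_p l^2 \ddot{\theta} + m_p \ddot{x}\, l \cos\theta. \tag{14}$$

From Eq. 4 and Eq. 11 we get:

$$\ddot{x} = \frac{F + m_p l (\dot{\theta}^2 \sin\theta - \ddot{\theta} \cos\theta) - F_f}{m_c + m_p}, \tag{15}$$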

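and by introducing this into Eq. 14 we get

$$\ddot{\theta} = \frac{g \sin\theta + \cos\theta \left[ \dfrac{-F - m_p l \dot{\theta}^2 \sin\theta + F_f}{m_c + m_p} \right] - \dfrac{\mu_p \dot{\theta}}{m_p l}}{l \left[ \dfrac{4}{3} - \dfrac{m_p \cos^2\theta}{m_c + m_p} \right]} \tag{16}$$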
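The elimination that leads from Eq. 14 and Eq. 15 to Eq. 16 can be checked numerically. The sketch below treats Eq. 14 and Eq. 15 as a linear system in the two unknown accelerations, solves it by Cramer's rule, and compares the angular acceleration with the closed form of Eq. 16. The friction force is passed in as a given number; the function names and all numerical values are illustrative assumptions.

```python
import math

def solve_eq14_eq15(theta, theta_dot, force, f_f, m_c, m_p, l, mu_p, g):
    """Solve Eq. 14 and Eq. 15 as a linear system for (x_ddot, theta_ddot)."""
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Eq. 14: (m_p l cos) x_ddot + (4/3 m_p l^2) theta_ddot = m_p g l sin - mu_p theta_dot
    a11, a12 = m_p * l * cos_t, 4.0 / 3.0 * m_p * l**2
    b1 = m_p * g * l * sin_t - mu_p * theta_dot
    # Eq. 15: (m_c + m_p) x_ddot + (m_p l cos) theta_ddot = F + m_p l theta_dot^2 sin - F_f
    a21, a22 = m_c + m_p, m_p * l * cos_t
    b2 = force + m_p * l * theta_dot**2 * sin_t - f_f
    det = a11 * a22 - a12 * a21
    x_ddot = (b1 * a22 - a12 * b2) / det
    theta_ddot = (a11 * b2 - a21 * b1) / det
    return x_ddot, theta_ddot

def theta_ddot_eq16(theta, theta_dot, force, f_f, m_c, m_p, l, mu_p, g):
    """Closed form of Eq. 16 for a given friction force F_f."""
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    total_mass = m_c + m_p
    numerator = (g * sin_t
                 + cos_t * (-force - m_p * l * theta_dot**2 * sin_t + f_f) / total_mass
                 - mu_p * theta_dot / (m_p * l))
    denominator = l * (4.0 / 3.0 - m_p * cos_t**2 / total_mass)
    return numerator / denominator

# Arbitrary test state and parameters (illustrative values).
args = dict(theta=0.3, theta_dot=-0.8, force=5.0, f_f=0.2,
            m_c=1.0, m_p=0.1, l=0.5, mu_p=0.01, g=9.8)
x_ddot, theta_ddot = solve_eq14_eq15(**args)
matches_eq16 = math.isclose(theta_ddot, theta_ddot_eq16(**args), rel_tol=1e-9)
print(matches_eq16)
```

The determinant of the system is nonzero for any pole angle, because $m_p \cos^2\theta$ is always smaller than $\tfrac{4}{3}(m_c + m_p)$.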

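We see that the last two equations are the same as Barto's equations 1 and 2, with the difference that Barto used a form of the friction force $F_f$ that is wrong. Indeed, from Eq. 6, 5 and 12 we get

$$N_c = (m_c + m_p)\, g - m_p l (\ddot{\theta} \sin\theta + \dot{\theta}^2 \cos\theta) \tag{17}$$

$$F_f = \mu_c \left[ (m_c + m_p)\, g - m_p l (\ddot{\theta} \sin\theta + \dot{\theta}^2 \cos\theta) \right] \operatorname{sgn}(N_c \dot{x}). \tag{18}$$

By introducing the previous equation into Eq. 15 and then into Eq. 16, and collecting the terms in $\ddot{\theta}$, we get

$$\ddot{\theta} = \frac{g \sin\theta + \cos\theta \left\{ \dfrac{-F - m_p l \dot{\theta}^2 \left[ \sin\theta + \mu_c \operatorname{sgn}(N_c \dot{x}) \cos\theta \right]}{m_c + m_p} + \mu_c\, g \operatorname{sgn}(N_c \dot{x}) \right\} - \dfrac{\mu_p \dot{\theta}}{m_p l}}{l \left\{ \dfrac{4}{3} - \dfrac{m_p \cos\theta}{m_c + m_p} \left[ \cos\theta - \mu_c \operatorname{sgn}(N_c \dot{x}) \sin\theta \right] \right\}} \tag{19}$$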
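The substitution of the friction force into Eq. 16 can be written out step by step. In the derivation below, $s = \mu_c \operatorname{sgn}(N_c \dot{x})$ is a shorthand introduced here only for brevity; it is not part of the notation used in the rest of the text. The friction force contains $\ddot{\theta}$ through $N_c$, so the terms in $\ddot{\theta}$ have to be collected on the left-hand side before dividing.

```latex
% Eq. 16, multiplied by its denominator:
l\,\ddot{\theta}\left[\frac{4}{3} - \frac{m_p \cos^2\theta}{m_c + m_p}\right]
  = g\sin\theta
  + \cos\theta\,\frac{-F - m_p l \dot{\theta}^2 \sin\theta}{m_c + m_p}
  + \frac{F_f \cos\theta}{m_c + m_p}
  - \frac{\mu_p \dot{\theta}}{m_p l}

% Friction term, using F_f = s [(m_c + m_p) g - m_p l (\ddot\theta \sin\theta + \dot\theta^2 \cos\theta)]:
\frac{F_f \cos\theta}{m_c + m_p}
  = s\,g\cos\theta
  - \frac{m_p l\, s \sin\theta \cos\theta}{m_c + m_p}\,\ddot{\theta}
  - \frac{m_p l\, s \cos^2\theta}{m_c + m_p}\,\dot{\theta}^2

% Moving the \ddot\theta term to the left-hand side:
l\,\ddot{\theta}\left[\frac{4}{3}
  - \frac{m_p \cos\theta}{m_c + m_p}\left(\cos\theta - s\sin\theta\right)\right]
  = g\sin\theta
  + \cos\theta\left[\frac{-F - m_p l \dot{\theta}^2\left(\sin\theta + s\cos\theta\right)}{m_c + m_p}
  + s\,g\right]
  - \frac{\mu_p \dot{\theta}}{m_p l}
```

At $\theta = 0$ the normal force does not depend on $\ddot{\theta}$, and the bracket on the left-hand side correctly reduces to its frictionless value $\tfrac{4}{3} - m_p/(m_c + m_p)$.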

Conclusion

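In conclusion, we have provided dynamical equations for the cart-pole system that are correct from a physical point of view. They are:

$$N_c = (m_c + m_p)\, g - m_p l (\ddot{\theta} \sin\theta + \dot{\theta}^2 \cos\theta) \tag{20}$$

$$\ddot{\theta} = \frac{g \sin\theta + \cos\theta \left\{ \dfrac{-F - m_p l \dot{\theta}^2 \left[ \sin\theta + \mu_c \operatorname{sgn}(N_c \dot{x}) \cos\theta \right]}{m_c + m_p} + \mu_c\, g \operatorname{sgn}(N_c \dot{x}) \right\} - \dfrac{\mu_p \dot{\theta}}{m_p l}}{l \left\{ \dfrac{4}{3} - \dfrac{m_p \cos\theta}{m_c + m_p} \left[ \cos\theta - \mu_c \operatorname{sgn}(N_c \dot{x}) \sin\theta \right] \right\}} \tag{21}$$

$$\ddot{x} = \frac{F + m_p l (\dot{\theta}^2 \sin\theta - \ddot{\theta} \cos\theta) - \mu_c N_c \operatorname{sgn}(N_c \dot{x})}{m_c + m_p}. \tag{22}$$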

During a simulation of the system, at each timestep, we may assume that $N_c$ has the same sign as at the previous timestep (we may consider it to be positive at the beginning of the simulation) and compute $\ddot{\theta}$ according to Eq. 21. We then compute $N_c$ using the value of $\ddot{\theta}$ that we obtained, according to Eq. 20; if $N_c$ changes sign, we compute $\ddot{\theta}$ again, taking into account the new sign. Finally, we compute $\ddot{x}$ according to Eq. 22. Usually, for common choices of the parameters, $N_c$ will always be positive, as the cart should not try to jump off the track.

If we neglect friction, the equations are

$$\ddot{\theta} = \frac{g \sin\theta + \cos\theta \left[ \dfrac{-F - m_p l \dot{\theta}^2 \sin\theta}{m_c + m_p} \right]}{l \left[ \dfrac{4}{3} - \dfrac{m_p \cos^2\theta}{m_c + m_p} \right]} \tag{23}$$

$$\ddot{x} = \frac{F + m_p l (\dot{\theta}^2 \sin\theta - \ddot{\theta} \cos\theta)}{m_c + m_p}. \tag{24}$$

In these equations, g is positive, and not negative, as mistakenly indicated in the paper of Barto.
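The per-timestep procedure described above can be sketched in code. The example below computes the two accelerations for one timestep: it evaluates the angular acceleration with the previous sign of $N_c$, checks the sign through Eq. 20, recomputes if the sign changed, and then evaluates Eq. 22. Several things are assumptions of this sketch rather than statements from the text: the class and function names, the default parameter values, the convention that `sgn(0)` is 0 (so no friction is applied at exactly zero velocity), and the choice to refresh $N_c$ after a sign change. The denominator of the angular acceleration is written in the form obtained by substituting the friction force of Eq. 18 into Eq. 16. Time integration is left out.

```python
import math
from dataclasses import dataclass

@dataclass
class CartPoleParams:
    # Illustrative values, not taken from the text above.
    m_c: float = 1.0        # cart mass (kg)
    m_p: float = 0.1        # pole mass (kg)
    l: float = 0.5          # half-length of the pole (m)
    mu_c: float = 0.0005    # cart-track friction coefficient
    mu_p: float = 0.000002  # friction coefficient of the articulation
    g: float = 9.8          # gravitational acceleration (m/s^2), positive

def sgn(value):
    """Sign function returning -1, 0 or +1."""
    return (value > 0) - (value < 0)

def theta_ddot_eq21(p, x_dot, theta, theta_dot, force, nc_sign):
    """Angular acceleration for an assumed sign of N_c."""
    s = p.mu_c * sgn(nc_sign * x_dot)
    total_mass = p.m_c + p.m_p
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    numerator = (p.g * sin_t
                 + cos_t * ((-force - p.m_p * p.l * theta_dot**2
                             * (sin_t + s * cos_t)) / total_mass + s * p.g)
                 - p.mu_p * theta_dot / (p.m_p * p.l))
    denominator = p.l * (4.0 / 3.0
                         - p.m_p * cos_t * (cos_t - s * sin_t) / total_mass)
    return numerator / denominator

def normal_force_eq20(p, theta, theta_dot, theta_ddot):
    """Normal force N_c exerted by the track on the cart (Eq. 20)."""
    return ((p.m_c + p.m_p) * p.g
            - p.m_p * p.l * (theta_ddot * math.sin(theta)
                             + theta_dot**2 * math.cos(theta)))

def accelerations(p, x_dot, theta, theta_dot, force, prev_nc_sign=1):
    """Return (x_ddot, theta_ddot, N_c) for one timestep."""
    theta_ddot = theta_ddot_eq21(p, x_dot, theta, theta_dot, force, prev_nc_sign)
    n_c = normal_force_eq20(p, theta, theta_dot, theta_ddot)
    if sgn(n_c) != prev_nc_sign:
        # N_c changed sign: recompute with the new sign.
        theta_ddot = theta_ddot_eq21(p, x_dot, theta, theta_dot, force, sgn(n_c))
        # Assumption of this sketch: refresh N_c so that Eq. 22 uses a value
        # consistent with the recomputed angular acceleration.
        n_c = normal_force_eq20(p, theta, theta_dot, theta_ddot)
    x_ddot = (force
              + p.m_p * p.l * (theta_dot**2 * math.sin(theta)
                               - theta_ddot * math.cos(theta))
              - p.mu_c * n_c * sgn(n_c * x_dot)) / (p.m_c + p.m_p)
    return x_ddot, theta_ddot, n_c

params = CartPoleParams()
x_ddot, theta_ddot, n_c = accelerations(params, x_dot=0.3, theta=0.05,
                                        theta_dot=0.0, force=10.0)
print(x_ddot, theta_ddot, n_c)
```

The returned sign of `n_c` can be passed as `prev_nc_sign` at the next timestep. Setting `mu_c` and `mu_p` to zero reduces the computation to the frictionless equations 23 and 24.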

Acknowledgements

I thank James Knight for spotting a typo in a previous version of this paper.

References

Anderson, C. W. (1986), Learning and Problem Solving with Multilayer Connectionist Systems, PhD thesis, University of Massachusetts, Department of Computer and Information Science. http://www.cs.colostate.edu/anderson/res/rl/chuck-diss.pdf

Barto, A. G., Sutton, R. S. and Anderson, C. (1983), Neuron-like adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics 13, 834-846. http://www.cs.ualberta.ca/sutton/papers/barto-sutton-anderson-83.pdf.gz

Michie, D. and Chambers, R. A. (1968a), BOXES: An experiment in adaptive control, in E. Dale and D. Michie, eds, Machine Intelligence 2, Oliver and Boyd, Edinburgh, pp. 137-152.

Michie, D. and Chambers, R. A. (1968b), Boxes as a model of pattern formation, in C. H. Waddington, ed., Towards a theoretical biology, Vol. 1, Edinburgh University Press, Edinburgh, pp. 206-215.

Schmidhuber, J. (1990), Networks adjusting networks, in J. Kindermann and A. Linden, eds, Proceedings of Distributed Adaptive Neural Information Processing, Oldenbourg, pp. 197-208. Extended version: TR FKI-125-90 (revised), Institut für Informatik, TUM. ftp://ftp.idsia.ch/pub/juergen/fki125.ps.gz

Si, J. and Wang, Y.-T. (2001), On-line learning control by association and reinforcement, IEEE Transactions on Neural Networks 12(2), 264-276. http://ebrains.la.asu.edu/jennie/siandwang01.pdf
