Efficient C Tip #13 - Use The Modulus (%) Operator With Caution Stack Overflow
Tuesday, February 8th, 2011 by Nigel Jones
This is the thirteenth in a series of tips on writing efficient C for embedded systems. As the title suggests, if you are interested in
writing efficient C, you need to be cautious about using the modulus operator. Why is this? Well, a little thought shows that C = A %
B is equivalent to C = A - B * (A / B). In other words, the modulus operator is functionally equivalent to three operations: a division,
a multiplication and a subtraction. As a result it’s hardly surprising that code that uses the modulus operator can take a long time
to execute. Now in some cases you absolutely have to use the modulus operator. However, in many cases it’s possible to restructure
the code such that the modulus operator is not needed. To demonstrate what I mean, some background information is in order as to
how this blog posting came about.
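A minimal sketch of the kind of code in question, assuming the task is to split a 32 bit count of seconds into days, hours, minutes and seconds. The names `time_parts_t`, `mod_by_identity` and `split_time_mod` are hypothetical and chosen only for illustration; this is not the original listing.

```c
#include <stdint.h>

/* Hypothetical container for the broken-down time (illustration only). */
typedef struct {
    uint32_t days;
    uint32_t hours;
    uint32_t minutes;
    uint32_t seconds;
} time_parts_t;

/* A % B written out as the three operations it is equivalent to.
 * b must be non-zero, exactly as for the % operator itself. */
uint32_t mod_by_identity(uint32_t a, uint32_t b)
{
    return a - b * (a / b);
}

/* Split a count of seconds using three modulus and three division operations. */
time_parts_t split_time_mod(uint32_t time)
{
    time_parts_t t;

    t.seconds = time % 60UL;
    time /= 60UL;
    t.minutes = time % 60UL;
    time /= 60UL;
    t.hours = time % 24UL;
    time /= 24UL;
    t.days = time;

    return t;
}
```

Each remainder is taken first and the same variable is then divided, which is where the pairs of modulus and division operations come from.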
This approach has a nice looking symmetry to it. However, it contains three divisions and three modulus operations. I was thus
rather concerned about its performance, so I measured its speed on three different architectures: AVR (8 bit), MSP430 (16
bit), and ARM Cortex (32 bit). In all three cases I used an IAR compiler with full speed optimization. The cycle counts quoted
are for 10 invocations of the test code and include the test harness overhead:
No, that isn’t a misprint. The ARM was nearly two orders of magnitude more cycle efficient than the MSP430 and AVR. Thus my
claim that the modulus operator can be very inefficient is true for some architectures, but not all. If you are using the
modulus operator on an ARM processor then it’s probably not worth worrying about. However, if you are working on smaller
processors then clearly something needs to be done, and so I investigated some alternatives.
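One such alternative can be sketched as follows: keep each division, but recover the remainder from the quotient with a multiplication and a subtraction instead of a second, modulus-driven division. This is a sketch under the same assumptions as before (a 32 bit count of seconds); `time_parts_t` and `split_time_submul` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical container for the broken-down time (illustration only). */
typedef struct {
    uint32_t days;
    uint32_t hours;
    uint32_t minutes;
    uint32_t seconds;
} time_parts_t;

/* Split a count of seconds using three divisions, three multiplications
 * and three subtractions, with no modulus operator. */
time_parts_t split_time_submul(uint32_t time)
{
    time_parts_t t;
    uint32_t quotient;

    quotient = time / 60UL;                 /* whole minutes */
    t.seconds = time - quotient * 60UL;     /* remainder without using % */
    time = quotient;

    quotient = time / 60UL;                 /* whole hours */
    t.minutes = time - quotient * 60UL;
    time = quotient;

    quotient = time / 24UL;                 /* whole days */
    t.hours = time - quotient * 24UL;
    t.days = quotient;

    return t;
}
```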
In this case I have replaced three mods with three subtractions and three multiplications. Thus although I have replaced a single
operator (%) with two operations (- *), I still expected an increase in speed because the modulus operator is actually three
operations in one (- * /). Effectively I have eliminated three divisions, and so I expected a significant improvement in speed. The
results, however, were a little surprising:
Thus while this technique yielded roughly a factor of two improvement for the AVR and MSP430 processors, it had essentially no
impact on the ARM code. Presumably this is because the ARM core I tested has a hardware divide instruction, so the compiler can
already implement the modulus as a divide followed by a multiply and subtract. Notwithstanding the ARM results, it’s clear that,
at least in this example, it’s possible to significantly speed up an algorithm by eliminating the modulus operator.
I could of course just stop at this point. However, examination of attempt 2 shows that further optimizations are possible by
observing that if the input time in seconds is a 32 bit variable, then days fits in a 16 bit variable (2^32 seconds is a little
under 49,711 days). Furthermore, hours, minutes and seconds are inherently limited to an 8 bit range. I thus recoded attempt 2 to
use smaller data types.
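A sketch of what such a recoding might look like, assuming the same subtract-and-multiply structure as before. The names `time_parts_small_t` and `split_time_small` are hypothetical. Note that the running totals of minutes and hours still need 32 bits (2^32 seconds is over a million hours), so only the results are narrowed.

```c
#include <stdint.h>

/* Hypothetical container using the smallest types that hold each field. */
typedef struct {
    uint16_t days;      /* at most 49710 for a 32 bit count of seconds */
    uint8_t  hours;     /* 0..23 */
    uint8_t  minutes;   /* 0..59 */
    uint8_t  seconds;   /* 0..59 */
} time_parts_small_t;

time_parts_small_t split_time_small(uint32_t time)
{
    time_parts_small_t t;
    uint32_t minutes_total;
    uint32_t hours_total;

    minutes_total = time / 60UL;
    t.seconds = (uint8_t)(time - minutes_total * 60UL);

    hours_total = minutes_total / 60UL;
    t.minutes = (uint8_t)(minutes_total - hours_total * 60UL);

    t.days = (uint16_t)(hours_total / 24UL);
    /* Widen days before multiplying: days * 24 can exceed 16 bits. */
    t.hours = (uint8_t)(hours_total - (uint32_t)t.days * 24UL);

    return t;
}
```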
All I have done is change the data types and add casts where appropriate. The results were interesting:
Thus while this resulted in a significant improvement for the AVR & MSP430, it resulted in a significant worsening for the ARM.
Clearly the ARM doesn’t like working with variables that are not 32 bits wide. This suggested an improvement that would make the
code a lot more portable, and that is to use the C99 fast types. Doing this gives the following code:
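A sketch of what that might look like, assuming the `uint_fastN_t` types from `<stdint.h>` simply replace the fixed width types. The names `time_parts_fast_t` and `split_time_fast` are hypothetical. Each fast type is at least the stated width, and the compiler picks whatever width is quickest on the target.

```c
#include <stdint.h>

/* Hypothetical container using the C99 "fast" minimum-width types. */
typedef struct {
    uint_fast16_t days;
    uint_fast8_t  hours;
    uint_fast8_t  minutes;
    uint_fast8_t  seconds;
} time_parts_fast_t;

time_parts_fast_t split_time_fast(uint_fast32_t time)
{
    time_parts_fast_t t;
    uint_fast32_t minutes_total;
    uint_fast32_t hours_total;

    minutes_total = time / 60UL;
    t.seconds = (uint_fast8_t)(time - minutes_total * 60UL);

    hours_total = minutes_total / 60UL;
    t.minutes = (uint_fast8_t)(minutes_total - hours_total * 60UL);

    t.days = (uint_fast16_t)(hours_total / 24UL);
    /* Widen days before multiplying: days * 24 can exceed 16 bits. */
    t.hours = (uint_fast8_t)(hours_total - (uint_fast32_t)t.days * 24UL);

    return t;
}
```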
All I have done is change the data types to the C99 fast types. The results were encouraging:
Although the MSP430 time increased very slightly, the AVR and ARM stayed at their fastest speeds. Thus attempt #4 is both fast
and portable.
Conclusion
Not only did replacing the modulus operator with alternative operations result in faster code, it also opened up the possibility for
further optimizations. As a result, with the AVR & MSP430 I was able to more than halve the execution time.
However, using the technique espoused above, we can rewrite this much more efficiently as:
lsd = value;
/* Now display the digits */
}
If you benchmark this you should find it considerably faster than the modulus based approach.
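A minimal sketch of this style of digit extraction, assuming a value in the range 0 to 999 that is to be split into three decimal digits. The names `digits_t` and `split_digits` are hypothetical; only the final `lsd = value` step is taken from the fragment above.

```c
#include <stdint.h>

/* Hypothetical container for the three decimal digits (illustration only). */
typedef struct {
    uint8_t msd;    /* hundreds */
    uint8_t nsd;    /* tens */
    uint8_t lsd;    /* units */
} digits_t;

/* Split value (assumed to be 0..999) into digits without using %. */
digits_t split_digits(uint16_t value)
{
    digits_t d;

    d.msd = (uint8_t)(value / 100U);
    value = (uint16_t)(value - d.msd * 100U);   /* strip the hundreds */

    d.nsd = (uint8_t)(value / 10U);
    value = (uint16_t)(value - d.nsd * 10U);    /* strip the tens */

    d.lsd = (uint8_t)value;                     /* what is left is the units */

    return d;
}
```

For example, `split_digits(907)` gives the digits 9, 0 and 7 using two divisions, two multiplications and two subtractions.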