
Efficient C Tip #13 - Use The Modulus (%) Operator With Caution Stack Overflow

The document discusses four attempts at writing a C function to convert seconds to days, hours, minutes and seconds. The first attempt used the modulus operator, which was inefficient. The second attempt replaced the modulus operator with subtraction and multiplication, improving performance. The third attempt used smaller data types, with mixed results. The fourth attempt used C99 fast types and achieved good performance across architectures.


Efficient C Tip #13 – use the modulus (%) operator with caution
Tuesday, February 8th, 2011 by Nigel Jones

This is the thirteenth in a series of tips on writing efficient C for embedded systems. As the title suggests, if you are interested in
writing efficient C, you need to be cautious about using the modulus operator. Why is this? A little thought shows that C = A % B
is equivalent to C = A - B * (A / B). In other words, the modulus operator is functionally equivalent to three operations: a division,
a multiplication and a subtraction. As a result, it’s hardly surprising that code that uses the modulus operator can take a long time
to execute. Now in some cases you absolutely have to use the modulus operator. However, in many cases it’s possible to restructure
the code such that the modulus operator is not needed. To demonstrate what I mean, some background information is in order as to
how this blog posting came about.
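
To make the equivalence concrete, here is a minimal sketch of the expansion written out by hand. The function name mod_by_hand is invented purely for illustration, and the caller is assumed to guarantee that b is non-zero.

```c
#include <stdint.h>

/* Hypothetical helper: the remainder computed the long way, using one
 * division, one multiplication and one subtraction.
 * Assumption: the caller guarantees b != 0. */
uint32_t mod_by_hand(uint32_t a, uint32_t b)
{
    uint32_t quotient = a / b;   /* the division hidden inside % */

    /* b * quotient can never exceed a, so the product does not overflow. */
    return a - b * quotient;     /* multiply, then subtract */
}
```

For unsigned operands this should agree with the built-in operator, so mod_by_hand(3661, 60) and 3661 % 60 both give 1.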

Converting seconds to days, hours, minutes and seconds


Embedded systems increasingly need some form of real time clock. The designer typically implements the time as a 32 bit variable
containing the number of seconds since a particular date. Once this is done, it’s not usually long before one has to convert the
‘time’ into days, hours, minutes and seconds. I found myself in just such a situation recently, so I thought a quick internet search
was in order to find the ‘best’ way of doing the conversion. The code I found wasn’t great and, as usual, was highly PC centric.
I thus sat down to write my own code.

Attempt #1 – Using the modulus operator


My first attempt used the ‘obvious’ algorithm and employed the modulus operator. The relevant code fragment appears below.

void compute_time(uint32_t time)
{
    uint32_t days, hours, minutes, seconds;

    seconds = time % 60UL;
    time /= 60UL;
    minutes = time % 60UL;
    time /= 60UL;
    hours = time % 24UL;
    time /= 24UL;
    days = time;
}

This approach has a nice looking symmetry to it. However, it contains three divisions and three modulus operations. I was thus
rather concerned about its performance, so I measured its speed on three different architectures: AVR (8 bit), MSP430 (16 bit),
and ARM Cortex (32 bit). In all three cases I used an IAR compiler with full speed optimization. The cycle counts quoted
are for 10 invocations of the test code and include the test harness overhead:

AVR: 29,825 cycles

MSP430: 27,019 cycles

ARM Cortex: 390 cycles

No, that isn’t a misprint. The ARM was nearly two orders of magnitude more cycle efficient than the MSP430 and AVR (roughly 69
and 76 times fewer cycles, respectively). So my claim that the modulus operator can be very inefficient is true for some
architectures, but not all. If you are using the modulus operator on an ARM processor then it’s probably not worth worrying
about. However, if you are working on smaller processors then clearly something needs to be done, and so I investigated some
alternatives.

Attempt #2 – Replace the modulus operator


As mentioned in the introduction, C = A % B is equivalent to C = A - B * (A / B). If we compare this to the code in attempt 1, then it
should be apparent that the intermediate value (A / B) computed as part of the modulus operation is in fact needed in the next line of
code. This suggests a simple optimization to the algorithm.

void compute_time(uint32_t time)
{
    uint32_t days, hours, minutes, seconds;

    days = time / (24UL * 3600UL);
    time -= days * 24UL * 3600UL;
    /* time now contains the number of seconds in the last day */
    hours = time / 3600UL;
    time -= (hours * 3600UL);
    /* time now contains the number of seconds in the last hour */
    minutes = time / 60U;
    seconds = time - minutes * 60U;
}

In this case I have replaced three mods with three subtractions and three multiplications. Although each single operator (%) has
been replaced with two operations (- *), I still expected an increase in speed because the modulus operator is actually three
operations in one (- * /). Effectively I have eliminated three divisions, and so I expected a significant improvement in speed.
The results, however, were a little surprising:

AVR: 18,720 cycles

MSP430: 14,805 cycles

ARM Cortex: 384 cycles

Thus while this technique cut the cycle count by roughly 37% on the AVR and 45% on the MSP430, it had essentially no
impact on the ARM code. Presumably this is because the ARM has native support for the modulus operation. Notwithstanding the
ARM results, it’s clear that at least in this example, it’s possible to significantly speed up an algorithm by eliminating the modulus
operator.
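
Before trusting a rewrite like this, it is worth confirming that both versions produce the same answers. The sketch below is one way to do that on a host machine. The struct dhms and the three function names are hypothetical, introduced only so that the results can be returned and compared. The original fragments keep their results in local variables.

```c
#include <stdint.h>

/* Hypothetical result type so the two variants can be compared. */
struct dhms {
    uint32_t days, hours, minutes, seconds;
};

/* Attempt #1 style: a modulus and a division at every step. */
struct dhms split_with_mod(uint32_t time)
{
    struct dhms r;

    r.seconds = time % 60UL;
    time /= 60UL;
    r.minutes = time % 60UL;
    time /= 60UL;
    r.hours = time % 24UL;
    time /= 24UL;
    r.days = time;
    return r;
}

/* Attempt #2 style: divide once per field, then multiply and subtract. */
struct dhms split_without_mod(uint32_t time)
{
    struct dhms r;

    r.days = time / (24UL * 3600UL);
    time -= r.days * 24UL * 3600UL;
    r.hours = time / 3600UL;
    time -= r.hours * 3600UL;
    r.minutes = time / 60UL;
    r.seconds = time - r.minutes * 60UL;
    return r;
}

/* Returns 1 if both variants agree for the given input, 0 otherwise. */
int variants_agree(uint32_t time)
{
    struct dhms a = split_with_mod(time);
    struct dhms b = split_without_mod(time);

    return a.days == b.days && a.hours == b.hours &&
           a.minutes == b.minutes && a.seconds == b.seconds;
}
```

Sensible inputs to try are the boundaries: 0, 86,399 (the last second of a day), 86,400 and the largest 32 bit value.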

I could of course just stop at this point. However, examination of attempt 2 shows that further optimizations are possible by
observing that if the time in seconds is a 32 bit variable, then days fits in a 16 bit variable. Furthermore, hours, minutes and
seconds are inherently limited to an 8 bit range. I thus recoded attempt 2 to use smaller data types.
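
The range argument is easy to check with a little arithmetic. The following sketch, with hypothetical function names, computes the largest day count a 32 bit seconds value can produce and compares it with the 16 bit limit.

```c
#include <stdint.h>

#define SECONDS_PER_DAY (24UL * 3600UL)

/* Hypothetical helper: the largest day count that a 32 bit count of
 * seconds can ever produce. */
uint32_t max_days(void)
{
    return (uint32_t)(UINT32_MAX / SECONDS_PER_DAY);
}

/* Returns 1 if every possible day count fits in a uint16_t. */
int days_fit_in_16_bits(void)
{
    return max_days() <= UINT16_MAX;
}
```

The largest day count works out to 49,710, comfortably below the 16 bit limit of 65,535. Hours (at most 23), minutes and seconds (at most 59) fit in 8 bits, and the seconds remaining within an hour (at most 3,599) fit in 16 bits.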

Attempt #3 – Data type size reduction


My naive implementation of the code looked like this:

void compute_time(uint32_t time)
{
    uint16_t days;
    uint8_t hours, minutes, seconds;
    uint16_t stime;

    days = (uint16_t)(time / (24UL * 3600UL));
    time -= (uint32_t)days * 24UL * 3600UL;
    /* time now contains the number of seconds in the last day */
    hours = (uint8_t)(time / 3600UL);
    stime = time - ((uint32_t)hours * 3600UL);
    /* stime now contains the number of seconds in the last hour */
    minutes = stime / 60U;
    seconds = stime - minutes * 60U;
}

All I have done is change the data types and add casts where appropriate. The results were interesting:

AVR: 14,400 cycles

MSP430: 11,457 cycles

ARM Cortex: 434 cycles

Thus while this resulted in a significant improvement for the AVR & MSP430, it made the ARM about 13% slower (434 cycles
versus 384). Clearly the ARM doesn’t like working with non 32 bit variables. This suggested an improvement that would make the
code a lot more portable: use the C99 fast types. Doing this gives the following code:

Attempt #4 – Using the C99 fast data types


void display_time(uint32_t time)
{
    uint_fast16_t days;
    uint_fast8_t hours, minutes, seconds;
    uint_fast16_t stime;

    days = (uint_fast16_t)(time / (24UL * 3600UL));
    time -= (uint32_t)days * 24UL * 3600UL;
    /* time now contains the number of seconds in the last day */
    hours = (uint_fast8_t)(time / 3600UL);
    stime = time - ((uint32_t)hours * 3600UL);
    /* stime now contains the number of seconds in the last hour */
    minutes = stime / 60U;
    seconds = stime - minutes * 60U;
}

All I have done is change the data types to the C99 fast types. The results were encouraging:

AVR: 14,400 cycles

MSP430: 11,595 cycles

ARM Cortex: 384 cycles

Although the MSP430 time increased very slightly (11,595 cycles versus 11,457 for attempt #3), the AVR and ARM stayed at their
fastest speeds. Thus attempt #4 is both fast and portable.
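
The fast types only promise a minimum width. The toolchain is free to pick something wider if that is quicker on the target, which is presumably why the ARM figure recovered. The sketch below is one way to see what a given compiler chose, alongside a variant of attempt #4 that returns its results. The struct fast_dhms and all function names here are hypothetical, added only for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type built from the C99 fast types. */
struct fast_dhms {
    uint_fast16_t days;
    uint_fast8_t hours, minutes, seconds;
};

/* Attempt #4 style conversion, returning the fields instead of
 * leaving them in local variables. */
struct fast_dhms split_fast(uint32_t time)
{
    struct fast_dhms r;
    uint_fast16_t stime;

    r.days = (uint_fast16_t)(time / (24UL * 3600UL));
    time -= (uint32_t)r.days * 24UL * 3600UL;
    /* time now contains the number of seconds in the last day */
    r.hours = (uint_fast8_t)(time / 3600UL);
    stime = (uint_fast16_t)(time - (uint32_t)r.hours * 3600UL);
    /* stime now contains the number of seconds in the last hour */
    r.minutes = (uint_fast8_t)(stime / 60U);
    r.seconds = (uint_fast8_t)(stime - r.minutes * 60U);
    return r;
}

/* Width in bits that this toolchain chose for uint_fast8_t. */
size_t fast8_bits(void)
{
    return sizeof(uint_fast8_t) * CHAR_BIT;
}

/* Width in bits that this toolchain chose for uint_fast16_t. */
size_t fast16_bits(void)
{
    return sizeof(uint_fast16_t) * CHAR_BIT;
}
```

The two width helpers will report at least 8 and 16 bits respectively. Anything beyond that is the implementation's choice, so the numbers may differ between the three targets measured above.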

Conclusion
Not only did replacing the modulus operator with alternative operations result in faster code, it also opened up the possibility for
further optimizations. As a result, on the AVR & MSP430 I was able to more than halve the execution time.

Converting Integers for Display


A similar problem (with a similar solution) occurs when one wants to show integers on a display. For example, if you are using a
custom LCD panel with, say, a 3 digit numeric field, then the problem arises as to how to determine the value of each digit. The
obvious way, using the modulus operator, is as follows:

void display_value(uint16_t value)
{
    uint8_t msd, nsd, lsd;

    if (value > 999)
    {
        value = 999;
    }

    lsd = value % 10;
    value /= 10;
    nsd = value % 10;
    value /= 10;
    msd = value;

    /* Now display the digits */
}

However, using the technique espoused above, we can rewrite this much more efficiently as:

void display_value(uint16_t value)
{
    uint8_t msd, nsd, lsd;

    if (value > 999U)
    {
        value = 999U;
    }

    msd = value / 100U;
    value -= msd * 100U;

    nsd = value / 10U;
    value -= nsd * 10U;

    lsd = value;
    /* Now display the digits */
}

If you benchmark this you should find it considerably faster than the modulus based approach.
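
As with the time conversion, it is worth checking the digits before benchmarking. Here is a minimal sketch of the modulus-free version that returns the digits so they can be inspected. The struct digits3 and the name split_digits are hypothetical and stand in for whatever the display driver actually needs.

```c
#include <stdint.h>

/* Hypothetical container for the three digits of the numeric field. */
struct digits3 {
    uint8_t msd, nsd, lsd;
};

/* Splits a value into hundreds, tens and units without using %.
 * Values above 999 are clamped to 999, as in the code above. */
struct digits3 split_digits(uint16_t value)
{
    struct digits3 d;

    if (value > 999U)
    {
        value = 999U;
    }

    d.msd = (uint8_t)(value / 100U);
    value = (uint16_t)(value - d.msd * 100U);   /* 0..99 remain */

    d.nsd = (uint8_t)(value / 10U);
    value = (uint16_t)(value - d.nsd * 10U);    /* 0..9 remain */

    d.lsd = (uint8_t)value;
    return d;
}
```

For an input of 407 this yields the digits 4, 0 and 7, while anything over 999 comes back as 9, 9 and 9.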
