Writing Low-Power Code


Many embedded applications are power sensitive, particularly battery-powered devices, where power consumption determines the available run time. While hardware has a major influence on power consumption, there are many ways software can help reduce it. In this article I will discuss some of them.

Watch Your Current

One obvious way to reduce power consumption is to turn off power when it is not in use. To do so, one must know the different power rails in the system. The rails are usually separated into analog and digital, and by common voltages such as 1.8V, 3.3V, and 5V. Many micro-controllers (MCUs) are mixed-signal devices requiring both analog and digital power. Power to each rail could originate from AC-DC and DC-DC power supplies, voltage regulators, or standard power buses such as USB. Lower voltage rails can be derived from higher voltage rails through buck converters. Similarly, higher voltage rails can be derived from lower voltage rails through boost converters.

Figure: Simple load switch

When a portable device is turned OFF, one would expect its power consumption to go to zero. But often there is leakage current through the device, or through the passive circuit elements, that can drain the battery. To reduce leakage current, GPIO-controlled FETs, or load switches, are used to block current on all power rails except those needed for the device to return to full operation. When possible, unused on-chip and off-chip peripherals should be turned OFF. GPIOs should be put into a high-impedance state (input mode) to avoid sourcing or sinking current inadvertently.
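As a rough sketch of the load-switch idea, the fragment below models each switched rail as one bit of a GPIO output register. The variable gpio_out, the rail names, and the choice of which rail stays powered are assumptions made for illustration; on real hardware gpio_out would be a memory-mapped register defined by the MCU vendor, and the polarity of each enable pin depends on the load switch used.

```c
#include <stdint.h>

/* Hypothetical rail assignments: one load-switch enable per GPIO bit. */
enum {
    RAIL_ALWAYS_ON   = 1u << 0,  /* keeps the wake-up logic powered */
    RAIL_ANALOG_3V3  = 1u << 1,
    RAIL_DIGITAL_1V8 = 1u << 2,
    RAIL_SENSOR_5V   = 1u << 3
};

#define RAILS_ALL (RAIL_ALWAYS_ON | RAIL_ANALOG_3V3 | \
                   RAIL_DIGITAL_1V8 | RAIL_SENSOR_5V)

/* Stand-in for a memory-mapped GPIO output register (assumption). */
static volatile uint32_t gpio_out;

/* Enable every load switch for full operation. */
void rails_power_on(void)
{
    gpio_out |= RAILS_ALL;
}

/* Open every load switch except the one needed to wake the device. */
void rails_power_off(void)
{
    uint32_t value = gpio_out;

    value &= ~(uint32_t)RAILS_ALL;  /* block current on all switched rails */
    value |= RAIL_ALWAYS_ON;        /* keep only the wake-up rail alive */
    gpio_out = value;
}
```

Keeping the rail list in one place makes it easier to audit which rails remain powered in the OFF state.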

Run in the Right Memory

Figure: MMU and Cache

The memory from which code runs can affect power consumption. For instance, running from Flash may burn more power than running from SRAM, depending on how the charge pumps are used in the Flash memory. Running from external SDRAM will burn more power than running from internal SRAM, because the external SDRAM requires refresh cycles and switches more bus lines.

The spatial locality of code also matters. When code runs on a CPU with a memory management unit (MMU) and cache, executing code segments that are too distant from one another may lead to cache replacement and page swaps, and therefore consume more power. In extreme cases, cache replacement and page swaps can happen repeatedly in a loop, creating thrashing, which raises power consumption significantly. Therefore, code segments that call one another should be located close to one another in memory. Frequently running code would ideally fit inside the fast cache memory.

Watch Your Speed

Clocks burn power, so it is important to manage the clocks in the system. Most MCUs and System-on-Chips (SoCs) have high-speed and low-speed clocks. The source of these clocks is an oscillator, a crystal, or both, feeding an internal Phase Locked Loop (PLL) that boosts the clock frequency, usually into the megahertz or even gigahertz range. The PLL output is usually divided down to source the high-speed clock that drives the CPU. The low-speed clock is derived from the high-speed clock, from a crystal, or from an external clock source. It is mostly used by the on-chip peripherals, and has a typical frequency of about 32 kHz (commonly 32.768 kHz).
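The relationship between the reference, the PLL, and the divider can be written as a small calculation. This is only a sketch: the function name cpu_clock_hz and the single multiplier/divider model are assumptions, and real clock trees usually have several dividers and constraints documented in the device's reference manual.

```c
#include <stdint.h>

/* CPU clock derived from a reference clock through a PLL multiplier
 * followed by an output divider. Returns 0 for an invalid divider.
 * The result is truncated to 32 bits, so it must stay below ~4.29 GHz. */
uint32_t cpu_clock_hz(uint32_t ref_hz, uint32_t pll_mul, uint32_t out_div)
{
    uint64_t pll_hz;

    if (out_div == 0) {
        return 0;
    }
    pll_hz = (uint64_t)ref_hz * pll_mul;  /* 64-bit to avoid overflow */
    return (uint32_t)(pll_hz / out_div);
}
```

For example, with a hypothetical 8 MHz crystal, a multiplier of 24, and a divider of 4, the PLL runs at 192 MHz and the CPU at 48 MHz. Raising the divider is often the cheapest way for software to slow the CPU without relocking the PLL.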

Figure: Clocking matters to power consumption

Software can help reduce system power by managing the clocks wisely. One way is to reduce the clock frequency when speed is not required. In some MCUs the PLL can be turned off entirely, leaving only the low-speed clock active so the device can wake on external stimuli; others have built-in low-power modes that slow the system clock. The opposite approach, often called race-to-idle, is to run the system at maximum speed to complete the task sooner, so the system can spend more time in low-power mode. Which approach saves more energy depends on how the device's active and sleep power compare.
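To compare the two approaches for a periodic task, it helps to add up the energy over one period. The sketch below does that; the function name and all of the power and timing numbers in the note that follows are made up for illustration and should be replaced with measured values.

```c
/* Energy in millijoules consumed over one period of a periodic task.
 * Power is in milliwatts and time in seconds (mW x s = mJ). */
double period_energy_mj(double active_mw, double sleep_mw,
                        double active_s, double period_s)
{
    double sleep_s = period_s - active_s;

    if (sleep_s < 0.0) {
        sleep_s = 0.0;  /* task overruns the period: no time to sleep */
    }
    return active_mw * active_s + sleep_mw * sleep_s;
}
```

Suppose a task repeats every second. Running fast at a hypothetical 100 mW for 0.1 s and sleeping at 0.1 mW for the rest costs about 10.09 mJ per period. Running four times slower at 30 mW for 0.4 s costs about 12.06 mJ. In this made-up case finishing early wins because active power did not fall in proportion to the clock; with different numbers the slow clock could come out ahead.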

Watch Your Voltage

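In addition to throttling the clock speed, the supply voltage can be adjusted as well. Dynamic power dissipation is a linear function of the clock frequency but a quadratic function of the voltage, according to the formula

P = C × V² × f

where P is the dynamic power, C is the switched capacitance, V is the supply voltage, and f is the clock frequency.

Figure: Voltage vs frequency of a processor (an example)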

The formula reveals that, while reducing the clock frequency offers a benefit, reducing the voltage has an even greater impact. This observation gave rise to the concept of Dynamic Voltage Scaling (DVS), a feature found in many modern computing devices. The basic premise of DVS is to deliver just enough power when it is needed while meeting the voltage required at the operating speed. This way, excess power that would only turn into heat is minimized.
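A quick calculation makes the difference concrete. The helper below simply evaluates the dynamic power formula; the function name and the capacitance, voltage, and frequency values used in the note are illustrative assumptions, not figures for any particular device.

```c
/* Dynamic power in watts: P = C * V^2 * f.
 * c_farads is the switched capacitance, v_volts the supply voltage,
 * and f_hz the clock frequency. Static (leakage) power is not modeled. */
double dynamic_power_w(double c_farads, double v_volts, double f_hz)
{
    return c_farads * v_volts * v_volts * f_hz;
}
```

With a hypothetical 1 nF of switched capacitance at 1.0 V and 100 MHz, the formula gives 0.1 W. Cutting only the frequency by 20% scales that by 0.8. Cutting only the voltage by 20% scales it by 0.64. Cutting both scales it by 0.512, roughly half the original power.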

To effectively leverage DVS, the voltage of the power source has to be precisely regulated and programmable. The voltage is dynamically adjusted to follow the electrical characteristics of the device being powered. Implementing DVS using discrete components can be very complex; fortunately, there are power management ICs (PMICs) available that support DVS. Most PMICs are controlled through an I2C-like bus. Software can then command the optimal operating clock speed and supply voltage based on the task at hand.
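One common way for software to drive DVS is a table of operating points, each pairing a clock frequency with the lowest voltage that supports it. The sketch below assumes such a table; the values, the opp_t type, and the pmic_set_millivolts and clock_set_khz functions are hypothetical stand-ins for a real PMIC driver (for example over I2C) and clock driver, and here they only update variables so the logic can be followed on a host machine.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t freq_khz;
    uint32_t millivolts;
} opp_t;

/* Hypothetical operating points, sorted by ascending frequency.
 * Real values come from the processor datasheet. */
static const opp_t opp_table[] = {
    {  24000,  900 },
    {  48000, 1000 },
    {  96000, 1100 },
    { 192000, 1200 },
};
#define OPP_COUNT (sizeof opp_table / sizeof opp_table[0])

/* Simulated hardware state, starting at the fastest operating point. */
static uint32_t rail_mv = 1200;
static uint32_t cpu_khz = 192000;

static void pmic_set_millivolts(uint32_t mv) { rail_mv = mv; }  /* stub */
static void clock_set_khz(uint32_t khz)      { cpu_khz = khz; } /* stub */

/* Pick the slowest operating point that still meets the demand. */
const opp_t *opp_select(uint32_t required_khz)
{
    for (size_t i = 0; i < OPP_COUNT; i++) {
        if (opp_table[i].freq_khz >= required_khz) {
            return &opp_table[i];
        }
    }
    return &opp_table[OPP_COUNT - 1];  /* demand exceeds table: run flat out */
}

/* Move to the target point without ever under-volting the core. */
void opp_apply(const opp_t *target)
{
    if (target->freq_khz > cpu_khz) {
        /* Speeding up: raise the voltage first, then the clock. */
        pmic_set_millivolts(target->millivolts);
        clock_set_khz(target->freq_khz);
    } else {
        /* Slowing down: drop the clock first, then the voltage. */
        clock_set_khz(target->freq_khz);
        pmic_set_millivolts(target->millivolts);
    }
}
```

A caller would use it as opp_apply(opp_select(30000)), which in this table lands on 48 MHz at 1000 mV. A real implementation would also wait for the regulator to settle before raising the clock.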

Know Your Hardware

Modern CPUs and MCUs have parallel hardware, instruction and data pipelines, co-processors for special calculations, and buffered input/output. These hardware features should be leveraged to increase parallelism. An example of CPU-level parallelism is the Very Long Instruction Word (VLIW) architecture, where each instruction word carries multiple operations that feed multiple ALUs in the same cycle. Some systems have co-processors to speed up calculations, such as a floating-point co-processor or DSP, or multiple CPU cores for symmetric multiprocessing. In general, when parallel hardware is used, more gets done in the same number of clock cycles, which means greater power efficiency. Of course, it is equally important to shut down any unused hardware.

Leverage Interrupt and Events

Embedded systems are usually required to perform tasks periodically or when triggered by events. While waiting for these events, the system should reduce its power consumption. The software should be architected to facilitate sleep and wake operations. For example, if an OS is used, an idle task with the lowest priority should be created to handle power management. Since it runs only when no other task is ready to run, the idle task is the perfect place to put the system into low-power mode. The low-power mode is exited when there is an interrupt, ideally one corresponding to a valid event for waking the device. If no such interrupt is available, an alternative is to use a timer interrupt to wake the device periodically to check for a wake condition.
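The idle-task pattern described above can be sketched as follows. Everything here is a host-side simulation under stated assumptions: event_pending stands for a flag set by interrupt handlers, and cpu_sleep_until_interrupt is a hypothetical stand-in for the CPU's wait-for-interrupt instruction (WFI on ARM Cortex-M, for instance). In the simulation it just pretends an interrupt arrived.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set by interrupt handlers, cleared by the task that services the event. */
static volatile bool event_pending;

/* Counts how often the CPU was put to sleep (for illustration only). */
static uint32_t sleep_count;

/* Stand-in for the wait-for-interrupt instruction. On a host build it
 * only simulates a wake-up interrupt arriving while asleep. */
static void cpu_sleep_until_interrupt(void)
{
    sleep_count++;
    event_pending = true;  /* simulated interrupt handler sets the flag */
}

/* One pass of the idle task body; an RTOS would call this in a loop
 * whenever no other task is ready to run. */
void idle_task_step(void)
{
    if (!event_pending) {
        /* Nothing to do: enter low-power mode until an interrupt fires. */
        cpu_sleep_until_interrupt();
    }
    /* Otherwise return so the scheduler can run the task that handles it. */
}
```

On real hardware there is a window between checking the flag and going to sleep in which an interrupt can slip through, so implementations usually mask interrupts around the check in a way the sleep instruction can still wake from.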

More Efficient Algorithm

Algorithm efficiency has a significant impact on power consumption. This is especially critical for embedded systems with scarce CPU and memory resources. How to make an algorithm more efficient is mostly domain-specific and a rather large topic; therefore only a few optimization techniques are mentioned here. For example, a linear search of a large data set could be replaced with a more efficient B-tree or m-way tree lookup. A large decision tree built from switch() or if() statements could be replaced with a dispatch/callback table, where the table entry is found using one of the more efficient search algorithms. Loops may be combined (loop jamming) for more efficient execution, with the downside of making the code less readable. Use instructions that transfer data in bursts where they are available, such as the STM and LDM instructions on ARM and hardware stack push/pop. Replacing argument passing through the stack with the use of globals will reduce the number of instructions per call. Finally, when necessary, code in assembly to gain maximum control over code generation.
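As one illustration of the dispatch-table technique, the sketch below keeps handlers in a table sorted by opcode and finds the entry with the standard library's bsearch(). The opcodes, handler names, and their trivial bodies are invented for the example; only the table-plus-binary-search structure is the point.

```c
#include <stdlib.h>

typedef int (*handler_fn)(int arg);

typedef struct {
    int        opcode;
    handler_fn handler;
} dispatch_entry;

/* Hypothetical handlers with placeholder behavior. */
static int handle_read(int arg)  { return arg + 1; }
static int handle_write(int arg) { return arg * 2; }
static int handle_reset(int arg) { (void)arg; return 0; }

/* Must stay sorted by opcode for bsearch() to work. */
static const dispatch_entry dispatch_table[] = {
    { 0x10, handle_read  },
    { 0x20, handle_write },
    { 0x7F, handle_reset },
};

/* Comparison callback: key is an int opcode, elem is a table entry. */
static int compare_opcode(const void *key, const void *elem)
{
    int k = *(const int *)key;
    int e = ((const dispatch_entry *)elem)->opcode;

    return (k > e) - (k < e);
}

/* Returns the handler's result, or -1 if the opcode is unknown. */
int dispatch(int opcode, int arg)
{
    const dispatch_entry *hit = bsearch(
        &opcode, dispatch_table,
        sizeof dispatch_table / sizeof dispatch_table[0],
        sizeof dispatch_table[0], compare_opcode);

    return hit ? hit->handler(arg) : -1;
}
```

For a handful of cases a compiler-generated jump table from switch() may be just as fast, so a table like this pays off mainly when the set of cases is large or sparse.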

 

(The above article is solely the expressed opinion of the author and does not necessarily reflect the position of his current and past employers)