A constant battle fought by embedded developers everywhere is balancing low power consumption against high performance. Power management and workload consolidation are two areas of technology focus that have emerged to help vendors strike this balance.
There were no surprises in a recent report released by the International Energy Agency (IEA) announcing that energy consumption is steadily rising and will continue to do so over the long term. The report estimates that global energy consumption will rise by 2.5% annually through 2015, with fossil fuels continuing to play a dominant role. And while much of this increased consumption can be attributed to rising living standards in developing countries, industry in the developed world continues to draw heavily on the world's dwindling energy supply.
According to the annual reports of leading telecommunications operators, the industry is a heavy contributor to this trend, with some operators ranking among the largest energy-consuming companies in their respective countries. These companies continue to introduce complex information and communications technologies, putting an ever-increasing amount of peripheral equipment online and placing greater demand on the world's energy supply. As a result, both CO2 emissions and energy costs have risen in parallel, putting operators under long-term financial pressure to reduce their consumption, both to meet corporate social responsibility requirements and/or federal regulations and to improve their bottom line. The continued growth in data demand and transfer rates requires ever-faster communications equipment, which in turn amplifies the overall power consumption of the telecommunications industry.
Understanding and investing in power management has never been more important, which has led telecommunications operators and equipment vendors to address the need for emission reduction and focus on developing energy efficiency plans based on sustainable development. Over the lifetime of an AdvancedTCA® (ATCA) chassis used in a networking deployment, the majority of CO2 emissions are attributable to unit operation and the cooling required for heat dissipation. The most energy is consumed during the operational phase, which accounts for about 80% of total CO2 emissions over the life of the product. During the operational phase, power is consumed, and can be managed, at three levels: support facilities, network equipment, and power conversion. Energy efficiency begins with understanding the technologies that can be used to manage that power consumption.

Figure 1. Only about 36% of power consumption goes to network equipment, such as servers, storage, and network devices, with most of that power ending up as heat. Only about 2.4% of total power input is actually used for effective output. Today's vendors offer solutions to improve energy efficiency in ATCA-based network equipment, which also results in improved energy efficiency in equipment facilities and the power conversion process.
Sound design is essential to thermal management. Reducing CPU usage lowers power supply output, which in turn reduces the cooling needed in the equipment room. The end result of this cascading reduction is both lower CO2 emissions and lower cooling costs due to diminished energy consumption.
Power Management Concepts and Technologies
In terms of the equipment itself, several concepts contribute to reduced energy consumption. Perhaps the best known is processor-level dynamic power management, in which a device or system is set into different running modes: performance, on-demand, powersave, or emergency. With this technology, dynamic voltage scaling and dynamic frequency scaling are used to manage processor power efficiently: the processor core voltage, clock frequency, or both can be reduced in real time to decrease power consumption while still meeting system performance requirements. Power-capping refers to the ability of a system or component to keep its peak power usage below a defined limit through policy-based strategies driven by actual service data, such as CPU usage and the number of concurrent sessions.
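On Linux-based blades, these running modes are typically selected through the kernel's cpufreq governors. The following is a minimal sketch, assuming the standard cpufreq sysfs interface is available (paths and the set of supported governors vary by kernel and driver), that switches a blade's cores into powersave mode:

    /* Minimal sketch: select a cpufreq governor per core via sysfs.
     * Assumes a Linux kernel with cpufreq enabled; the available
     * governors (performance, ondemand, powersave, ...) depend on
     * the driver in use. Requires root privileges. */
    #include <stdio.h>

    static int set_governor(int cpu, const char *governor)
    {
        char path[128];
        FILE *fp;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
        fp = fopen(path, "w");
        if (fp == NULL) {
            perror("fopen");
            return -1;
        }
        fprintf(fp, "%s\n", governor);
        fclose(fp);
        return 0;
    }

    int main(void)
    {
        /* Example policy: drop all 8 cores of a blade into powersave mode. */
        for (int cpu = 0; cpu < 8; cpu++)
            set_governor(cpu, "powersave");
        return 0;
    }

The same mechanism can be reversed (writing "performance" or "ondemand") by a policy agent when traffic ramps back up.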
ATCA shelf-level power management policies that include live migration of virtual machines for load consolidation also reduce power usage and the related costs. Live migration allows a server administrator to move a running virtual machine (VM) or application between different physical machines (PMs) without disconnecting the client or application. One of the primary use cases for live migration is resource management in cloud computing, where telecom providers may have thousands of VMs running in their data centers. To save energy and cost, and for load balancing, these providers can move VMs using live migration without disrupting the customer applications running in them.
Policy for live migration can be based on an energy-aware migration model and/or a load-dispatching model, guided by whether the primary goal is energy savings or quality of service. The key to energy savings with live migration is to efficiently pack services onto fewer physical servers, thereby reducing the number of servers drawing power and producing heat.
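As a concrete illustration, the sketch below uses the libvirt C API to live-migrate a running VM from one blade to another. The connection URIs and domain name are hypothetical; in practice, a policy engine would choose the source, destination, and timing automatically based on the models described above.

    /* Hedged sketch: consolidate a VM onto another blade using libvirt
     * live migration. Blade hostnames and the VM name are illustrative.
     * Build with: gcc migrate.c -lvirt */
    #include <stdio.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        /* Source and destination hypervisors (hypothetical blade hostnames). */
        virConnectPtr src = virConnectOpen("qemu+ssh://blade-3/system");
        virConnectPtr dst = virConnectOpen("qemu+ssh://blade-1/system");
        if (src == NULL || dst == NULL) {
            fprintf(stderr, "failed to connect to hypervisor(s)\n");
            return 1;
        }

        virDomainPtr dom = virDomainLookupByName(src, "media-gateway-vm");
        if (dom == NULL) {
            fprintf(stderr, "domain not found\n");
            return 1;
        }

        /* VIR_MIGRATE_LIVE keeps the guest running while its memory is
         * copied, so client connections are never dropped. */
        virDomainPtr moved = virDomainMigrate(dom, dst, VIR_MIGRATE_LIVE,
                                              NULL, NULL, 0);
        if (moved == NULL) {
            fprintf(stderr, "live migration failed\n");
            return 1;
        }

        virDomainFree(moved);
        virDomainFree(dom);
        virConnectClose(dst);
        virConnectClose(src);
        return 0;
    }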
While live VM migration brings multiple benefits, such as resource (CPU, memory, etc.) distribution and energy-aware consolidation, the migration of virtual machines itself consumes extra power. According to a paper on performance and energy modeling for live migration of virtual machines published in the proceedings of the 20th International Symposium on High Performance Distributed Computing, tests measuring power consumption during live migration show that the power overhead of migration is greatly reduced when an energy-aware server consolidation model is employed. In those tests, model-guided decisions reduced migration cost by more than 72.9% while delivering energy savings of 73.6%.
Setting and Controlling Management Policies
Taking the telecom industry as an example, today's ATCA chassis often include high-quality power modules and an intelligent fan system that can be used together to control temperature and power consumption. Based on tests run by ADLINK on a typical ATCA chassis, the power consumption of the fans (about 1/8 of total chassis consumption) can be reduced by 40% with automated policies that vary fan speed according to ambient temperature.
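The sketch below illustrates such a temperature-driven fan policy in its simplest form. It assumes a Linux hwmon-style sensor and PWM interface purely for illustration; an actual ATCA shelf would normally apply the same logic through the shelf manager (IPMI) rather than direct sysfs writes, and the thresholds and duty cycles shown are arbitrary.

    /* Illustrative sketch of a temperature-driven fan policy. The hwmon
     * paths stand in for a shelf-manager interface and differ per platform. */
    #include <stdio.h>

    #define TEMP_SENSOR "/sys/class/hwmon/hwmon0/temp1_input" /* millidegrees C */
    #define FAN_PWM     "/sys/class/hwmon/hwmon0/pwm1"        /* 0..255 duty    */

    static int read_ambient_mdeg(void)
    {
        FILE *fp = fopen(TEMP_SENSOR, "r");
        int mdeg = -1;
        if (fp != NULL) {
            if (fscanf(fp, "%d", &mdeg) != 1)
                mdeg = -1;
            fclose(fp);
        }
        return mdeg;
    }

    static void set_fan_duty(int duty)
    {
        FILE *fp = fopen(FAN_PWM, "w");
        if (fp != NULL) {
            fprintf(fp, "%d\n", duty);
            fclose(fp);
        }
    }

    int main(void)
    {
        int mdeg = read_ambient_mdeg();
        if (mdeg < 0)
            return 1;

        /* Step policy: run fans only as fast as the ambient temperature
         * requires instead of at a fixed full speed. */
        if (mdeg < 25000)
            set_fan_duty(96);        /* ~38% duty below 25 C  */
        else if (mdeg < 35000)
            set_fan_duty(160);       /* ~63% duty below 35 C  */
        else
            set_fan_duty(255);       /* full speed above 35 C */
        return 0;
    }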
For the remaining portion (7/8) of the chassis, embedded software can be used to set the frequencies and operating modes of the CPU, memory, and devices on every blade in order to achieve dynamic power management and/or power-capping. With added intelligence in the firmware and control at the software level, power management policies can be put into place to greatly reduce consumption.
From a systems management perspective, dynamic power management might be done on a scheduled basis when the system workloads are known to be well below the system's full capacity. It might also be used to reduce energy consumption in peak periods. When power (energy) saver mode is enabled, however, the reduction in processor frequency may affect workload performance and throughput.
Power-capping can be done with internal or external processing of monitors and actuators. The actuators might scale processor voltages or scale processor or memory frequencies. They might also "throttle" the processor, delaying instruction processing by injecting dead cycles. When power cap limits are reached and capping techniques are engaged, workload performance may be affected.
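Where the cap is enforced internally on an Intel architecture blade, one possible actuator is the Linux powercap (RAPL) sysfs interface. The following minimal sketch assumes an intel-rapl:0 package zone is exposed by the kernel; the 80 W limit is purely illustrative, and an externally managed cap (via a BMC or shelf manager) would take a different path.

    /* Hedged sketch of software power-capping through the Linux powercap
     * (Intel RAPL) sysfs interface. Zone names are platform dependent. */
    #include <stdio.h>

    int main(void)
    {
        /* Cap CPU package 0 at 80 W (the interface takes microwatts). */
        const long cap_uw = 80L * 1000L * 1000L;
        FILE *fp = fopen(
            "/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw", "w");
        if (fp == NULL) {
            perror("powercap interface not available");
            return 1;
        }
        fprintf(fp, "%ld\n", cap_uw);
        fclose(fp);

        /* Once the cap is active, the hardware enforces it by scaling
         * voltage/frequency or by throttling, as described above. */
        return 0;
    }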
Embedded Power Management Software
The power management software topology consists of multiple system-daemon components, one managing each blade, plus a single client component.

Figure 2: Basic components of embedded power management.
The system-daemon is an application located on each blade that acts as a power management module. It provides CPU, memory, hard disk, network, and virtualization work methods along with power-capping functionality to meet performance requirements with the least possible power consumption. The management client, which can run on a desktop or laptop, collects power-related data from the managed systems and consolidates and displays chassis, board, and sensor (e.g., temperature) information, as well as actual power consumption.

Figure 3: Example of power-capping function.
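A stripped-down sketch of the per-blade system-daemon role is shown below. It simply samples the package energy counter and a temperature sensor once per second and reports the derived power figure; the sysfs paths are illustrative, and a real daemon would also expose the control functions (governor selection, power caps) described above.

    /* Minimal sketch of the per-blade "system-daemon": periodically sample
     * power and temperature and report them for the management client to
     * consolidate. Sensor paths are illustrative. */
    #include <stdio.h>
    #include <unistd.h>

    static long read_long(const char *path)
    {
        FILE *fp = fopen(path, "r");
        long value = -1;
        if (fp != NULL) {
            if (fscanf(fp, "%ld", &value) != 1)
                value = -1;
            fclose(fp);
        }
        return value;
    }

    int main(void)
    {
        const char *energy = "/sys/class/powercap/intel-rapl:0/energy_uj";
        const char *temp   = "/sys/class/thermal/thermal_zone0/temp";

        for (;;) {
            long e0 = read_long(energy);   /* cumulative energy, microjoules */
            sleep(1);
            long e1 = read_long(energy);
            long t  = read_long(temp);     /* millidegrees Celsius */

            if (e0 >= 0 && e1 >= e0 && t >= 0) {
                /* 1 s sampling interval: microjoule delta / 1e6 = average watts. */
                printf("power=%.1f W temp=%.1f C\n",
                       (e1 - e0) / 1e6, t / 1000.0);
                fflush(stdout);
            }
        }
        return 0;
    }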
Active Power Management
Setting a policy to switch the CPU of an ATCA blade into powersave or active power management mode can reduce each blade's consumption by up to 15% compared to running continuously in performance mode (see Figures 4 and 5). Under a typical service load, roughly 0.4 kWh of energy can be saved per blade over 24 hours (see Figure 5). Assuming 10 service blades in a 14-slot ATCA system, about 4 kWh can be saved every day.

Figures 4 & 5: Comparison of power consumption of a CPU in three separate states.
Live Migration
A very powerful way to reduce power consumption is to use only as much equipment as is needed to handle the offered traffic. Using an Erlang probability distribution to model traffic (see Figure 6), phases of lower usage can be detected.

Figure 6: Example of Erlang probability distribution being applied to telecommunications network traffic.
In the Erlang example, usage is very low from 1:00 to 7:00. However, the individual blades still consume power even while running in powersave mode. In this case, each blade consumes 90W under active power management and up to 140W at peak performance. The solution is to use policy-driven live migration to concentrate the workload on the minimum number of CPU blades and send the now-idle blades into a sleep state, achieving an additional 25% power savings over active power management mode alone.
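A simple policy of this kind can be sketched as follows. The hourly load profile and per-blade capacity are hypothetical placeholders for the Erlang-based traffic model; a production policy would also weigh migration cost and service-level constraints before moving VMs and sleeping blades.

    /* Sketch of a consolidation policy driven by an hourly traffic profile
     * (an Erlang-style model, as in Figure 6). Load figures and per-blade
     * capacity are illustrative. */
    #include <stdio.h>

    #define NUM_BLADES 10

    int main(void)
    {
        /* Fraction of peak traffic expected per hour (hypothetical profile
         * with a quiet period from 01:00 to 07:00). */
        const double hourly_load[24] = {
            0.15, 0.08, 0.05, 0.04, 0.04, 0.05, 0.08, 0.20,
            0.45, 0.65, 0.80, 0.90, 0.95, 1.00, 0.95, 0.90,
            0.85, 0.80, 0.75, 0.70, 0.60, 0.45, 0.30, 0.20
        };
        const double blade_capacity = 1.0 / NUM_BLADES;  /* share of peak per blade */

        for (int hour = 0; hour < 24; hour++) {
            /* Blades needed, keeping ~20% headroom before consolidating. */
            int needed = (int)(hourly_load[hour] / (blade_capacity * 0.8)) + 1;
            if (needed > NUM_BLADES)
                needed = NUM_BLADES;
            printf("%02d:00 load %3.0f%% -> keep %2d blade(s) awake, "
                   "migrate VMs off %d and sleep them\n",
                   hour, hourly_load[hour] * 100.0, needed, NUM_BLADES - needed);
        }
        return 0;
    }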
Boosting Performance through Workload Consolidation
In terms of workload and I/O handling, there has been a market and technology trend toward converging network infrastructure onto a common platform or modular components that support multiple network elements and functions, such as application processing, control processing, packet processing, and signal processing. Enhancements to processor architecture and the availability of new software development tools are enabling developers to consolidate their application, control, and packet processing workloads on a unified blade architecture. The huge performance boost achieved by this hardware/software combination is making the processor blade architecture increasingly viable as a packet processing solution.
To illustrate the workload consolidation evolution, we developed a series of tests to verify that an ATCA processor blade combined with a data plane development kit (DPDK) supplied by the CPU manufacturer can provide the required performance and consolidate IP forwarding services with application processing using a single platform. In summary, we compared the Layer 3 forwarding performance of an ATCA blade using native Linux IP forwarding without any additional optimization from software with that obtained using the DPDK. We then analyzed the reasons behind the gains in IP forwarding performance achieved using the DPDK.
Data Plane Development Kit
The data plane development kit provides a lightweight run-time environment for x86 architecture processors, offering low overhead and a run-to-completion mode to maximize packet processing performance. The environment provides a rich selection of optimized and efficient libraries, built around an Environment Abstraction Layer (EAL), which controls low-level resources and provides optimized Poll Mode Drivers (PMDs) as well as full APIs for integration with higher-level applications. The software hierarchy is shown in Figure 7.

Figure 7: EAL and GLIBC in Linux Application Environment
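To give a feel for the programming model, the skeleton below follows the run-to-completion pattern of the DPDK sample applications: EAL initialization, mbuf pool and port setup, then a tight polling loop on a dedicated core. The port numbers, ring sizes, and the trivial port 0-to-port 1 forwarding stand in for a real Layer 3 lookup, error handling is trimmed, and exact initialization APIs vary slightly between DPDK releases.

    /* Skeleton of a DPDK run-to-completion forwarding loop, in the spirit
     * of the basicfwd/l3fwd samples. Configuration values are illustrative. */
    #include <stdlib.h>
    #include <string.h>
    #include <rte_eal.h>
    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>

    #define RX_RING 1024
    #define TX_RING 1024
    #define BURST   32

    int main(int argc, char **argv)
    {
        if (rte_eal_init(argc, argv) < 0)
            rte_exit(EXIT_FAILURE, "EAL init failed\n");

        /* One mbuf pool shared by the RX queues of both ports. */
        struct rte_mempool *pool = rte_pktmbuf_pool_create("mbuf_pool", 8192,
                256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
        if (pool == NULL)
            rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

        struct rte_eth_conf port_conf;
        memset(&port_conf, 0, sizeof(port_conf));  /* default port config */

        for (uint16_t port = 0; port < 2; port++) {
            rte_eth_dev_configure(port, 1, 1, &port_conf);
            rte_eth_rx_queue_setup(port, 0, RX_RING, rte_socket_id(), NULL, pool);
            rte_eth_tx_queue_setup(port, 0, TX_RING, rte_socket_id(), NULL);
            rte_eth_dev_start(port);
        }

        /* Run-to-completion: the same core polls, would perform the route
         * lookup, and transmits, with no interrupts or kernel involvement. */
        struct rte_mbuf *burst[BURST];
        for (;;) {
            uint16_t nb_rx = rte_eth_rx_burst(0, 0, burst, BURST);
            if (nb_rx == 0)
                continue;
            uint16_t nb_tx = rte_eth_tx_burst(1, 0, burst, nb_rx);
            /* Free any packets the TX ring could not accept. */
            for (uint16_t i = nb_tx; i < nb_rx; i++)
                rte_pktmbuf_free(burst[i]);
        }
        return 0;
    }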
Test Topology
To measure the speed at which the ATCA processor blade can process and forward IP packets at Layer 3, we used the test environment shown in Figure 8.

Figure 8: IP Forwarding Test Environment
By analyzing the results of our tests using the ATCA processor blade's two 10GbE external interfaces and two 10GbE fabric interfaces (40G total capacity), with and without the data plane development kit, we can conclude that Linux with the DPDK, using only two CPU cores for IP forwarding, achieves approximately 10 times the IP forwarding performance of native Linux with all CPU threads running on the same hardware platform. With the DPDK, the platform sustains more than 70% of line rate in small-packet Layer 3 forwarding; the highly optimized software stacks in the DPDK are what enable this 10x performance boost. Consolidating the control and data planes on a single IA blade with DPDK enablement also eliminates the need for a separate 40G-throughput NPU blade. Since such an NPU blade typically consumes about 180W, workload consolidation can reduce the associated power consumption by about 56%.
As is evident in Figure 9, the IPv4 forwarding performance achieved by the processor blade with the DPDK makes it cost- and performance-effective for customers to migrate their packet processing applications from network processor-based hardware to x86-based platforms, and to use a uniform platform to deploy different services, such as application processing, control processing, and packet processing. Additional details about our test procedure and results can be found in our white paper, Consolidating Packet Forwarding Services on the ADLINK aTCA-6200 Blade with the Intel® DPDK, at www.adlinktech.com.

Figure 9: IP Forwarding performance comparisons using 4x 10GbE.
Conclusion
There are multiple ways to optimize power usage and power efficiency of a multi-board/multi-processor system. We have seen the possibilities using embedded power management, live migration combined with embedded power management, and workload consolidation with throughput optimization. Since the system configurations and workload demands vary case-by-case, there is no generic solution. For each scenario, the techniques and policies to achieve the desired throughput and power consumption must be selected carefully. In the future, power management will remain an important factor for telecommunications operators, as the power density (watt/cubic inch) per system will continue to increase and, with that, intensify the impact on cooling and operational expense.