
Consolidating Packet Forwarding Services
on the ADLINK aTCA-6200A Blade with the Intel® DPDK

ADLINK Technical White Paper

Platform Integration & Validation
Embedded Computing Product Segment
ADLINK Technology, Inc.

Jack Lin, PIV Manager
Yunxia Guo, Software System Engineer
Xiang Li, Software System Engineer
Introduction

In recent years, there has been a market and technology trend towards the convergence of network infrastructure onto a common platform or modular components that support multiple network elements and functions, such as application processing, control processing, packet processing and signal processing. In addition to cost savings and reduced time-to-market, this approach provides the flexibility of modularity and the ability to independently upgrade system components where and when needed, using a common platform or modular components in shelf systems and networks of varying sizes. In traditional networks, switching modules would be used to route traffic between in-band system modules and out-of-band systems; processor modules for application and control plane functions; packet processing modules for data plane functions; and DSP modules for specialized signal plane functions. By utilizing the Intel® Data Plane Development Kit (Intel® DPDK), Intel® x86 architecture-based processor modules can not only handle traditional application processing and control functions, but can also capably and efficiently perform packet processing functions.

Taking IP forwarding as a packet processing example, this white paper shows how the ADLINK aTCA-6200A blade combined with the Intel DPDK can provide the required performance and consolidate packet processing services using a single platform. First, we compare the Layer 3 forwarding performance of the aTCA-6200A blade using native Linux IP forwarding without any optimization with that obtained using the Intel DPDK. We then analyze the reasons behind the gains in IP forwarding performance achieved using the Intel DPDK. Finally, we introduce ADLINK's own development toolkit based on the Intel DPDK that allows customers to easily develop their own Intel DPDK based applications.

ADLINK aTCA-6200A

The ADLINK aTCA-6200A is a highly integrated AdvancedTCA processor blade with dual Intel® Xeon® processors E5-2648L (Sandy Bridge-EP, 32nm process), each of which provides up to 8 cores and 20MB of shared cache. With Intel® Hyper-Threading Technology (Intel® HT Technology) enabled, up to 16 threads are supported per processor. In addition, eight channels of DDR3-1600 VLP RDIMM are supported for a maximum system memory capacity of 64GB per processor. Network I/O features include two 10 Gigabit Ethernet ports (XAUI, 10GBase-KX4) compliant with PICMG 3.1 option 1/9, and up to six Gigabit Ethernet 10/100/1000BASE-T ports to the front panel, AdvancedTCA Base Interface channels and RTM Gigabit Ethernet interfaces.

The ADLINK aTCA-6200A blade has been designed for carrier-grade Security and Telecommunications applications, as well as use in network infrastructure for IMS Servers, Media Gateways, Packet Inspection Servers, Traffic Management Servers and WLAN Access Point Controllers.

The detailed architecture of the aTCA-6200A is illustrated in the following functional block diagram, Figure 1.


Figure 1: aTCA-6200A Functional Block Diagram

Intel DPDK

The Intel® Data Plane Development Kit (Intel DPDK) is a lightweight run-time environment for Intel® architecture processors, offering low overhead and run-to-completion mode to maximize packet processing performance. It provides a rich selection of optimized and efficient libraries, also known as the Environment Abstraction Layer (EAL), which are responsible for initializing and allocating low-level resources, hiding the environment specifics from the applications and libraries, and gaining access to the low-level resources, such as memory space, PCI devices, timers and consoles.

The EAL provides an optimized Poll Mode Driver (PMD); memory and buffer management; and timer, debug and packet handling APIs, some of which may also be provided by the Linux OS. To facilitate interaction with application layers, the EAL, together with the standard GNU C Library (GLIBC), provides full APIs for integration with higher level applications. The software hierarchy is shown in Figure 2.


Figure 2: EAL and GLIBC in Linux Application Environment
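
To make the EAL's role concrete, the following minimal sketch (our own illustration, not taken from the Intel DPDK documentation) shows a Linux user-space program handing its command-line arguments to the EAL so that hugepage memory, PCI devices and other low-level resources can be reserved; exact header names may vary between Intel DPDK releases.

    #include <stdio.h>
    #include <stdlib.h>
    #include <rte_eal.h>
    #include <rte_lcore.h>
    #include <rte_debug.h>

    int main(int argc, char **argv)
    {
        /* rte_eal_init() parses the EAL arguments (core mask, memory
         * channels, hugepage options) and reserves hugepage memory,
         * PCI devices, timers and other low-level resources. */
        int ret = rte_eal_init(argc, argv);
        if (ret < 0)
            rte_exit(EXIT_FAILURE, "EAL initialization failed\n");

        printf("EAL ready, running on lcore %u\n", rte_lcore_id());
        /* Application and library code built on the EAL starts here. */
        return 0;
    }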

Test Topology

In order to measure the speed at which an aTCA-6200A can process and forward IP packets at Layer 3, we used the test environment shown in Figure 3 below.


Figure 3: IP Forwarding Test Environment

As shown in Figure 3, two ADLINK aTCA-3400 switch blades with FASTPATH® Networking Software provide non-blocking interconnection switches for the 10GbE Fabric and 1GbE Base Interface channels of all three processor blades in the ADLINK aTCA-8505 shelf, which supports a full-mesh topology. Therefore, each aTCA-3400 switch blade can provide at least one Fabric and Base interface connection to each processor blade, such as the aTCA-6200A blade (device under test) installed in slot 5.

An Ixia XM12 test system, compliant with RFC 2544 for throughput benchmarking, is used as a packet simulator to send IP packets with different frame sizes and collect the final statistical data, such as frames per second and throughput.

According to the topology of the test environment shown above, the aTCA-6200A, used as a processor blade, has four Gigabit Ethernet interfaces: two directly from the front panel (Flow1 and Flow2), and another two from the Base Interfaces (Flow3 and Flow4) via the aTCA-3400's Base switches. In addition to these four 1GbE interfaces, the aTCA-6200A has two 10GbE interfaces connected to the Ixia XM12 via the aTCA-3400 switch blade.

In our test setup, the aTCA-6200A is the device under test (DUT) and is responsible for receiving IPv4 packets from the Ixia test system, processing these packets at Layer 3 (e.g. packet decapsulation, IPv4 header checksum validation, routing table look-up and packet encapsulation), then finally sending the packets back to the Ixia XM12 according to the routing table look-up result. All six flows are bi-directional: for example, the Ixia XM12 sends frames from Interfaces 1/2/3/4/5/6 to the aTCA-6200A and receives frames via Interfaces 2/1/4/3/6/5 respectively.

Test Methodology

To evaluate how the Intel DPDK consolidates packet forwarding services on the ADLINK aTCA-6200A, the blade's IP forwarding performance was measured in the following two test cases:

Performance with native Linux

In this test, Ubuntu Server 11.10 64-bit was installed on the aTCA-6200A. As with other current Linux distributions, the IP forwarding feature is disabled by default. To prepare the system, first disable the ufw firewall service and check the current forwarding setting with the following commands:

    # sudo ufw disable
    # sysctl net.ipv4.ip_forward
    net.ipv4.ip_forward = 0

    The value of "0" shown above confirms that IP forwarding is currently disabled in the running kernel. However, it can be enabled immediately as follows:

    # sysctl -w net.ipv4.ip_forward=1
    or
    # echo 1 > /proc/sys/net/ipv4/ip_forward

    To enable IP forwarding by default at boot, set net.ipv4.ip_forward to "1" in /etc/sysctl.conf and restart the network service, as below:

    #echo "net.ipv4.ip_forward = 1">/etc/sysctl.conf
    # /etc/init.d/network restart
Performance with Intel DPDK

The Intel DPDK can run in different modes, such as Bare Metal, Linux with Bare Metal Run-Time and Linux User Space. The Linux User Space mode is the easiest to use in the initial development stages, as described in the Intel® Data Plane Development Kit - Getting Started Guide for Linux. Details of how the Intel DPDK functions in Linux User Space mode are shown in Figure 4.


Figure 4: Intel DPDK running in Linux User Space Mode

To set up the Intel DPDK on the aTCA-6200A blade, the following requirements should be met:

  • GLIBC >= 2.7
  • HPET and HPET MMAP configuration options enabled:

    # grep HPET /boot/config-`uname -r`
    CONFIG_HPET_TIMER=y
    CONFIG_HPET_EMULATE_RTC=y
    CONFIG_HPET=y
    CONFIG_HPET_MMAP=y

  • HUGETLBFS enabled:

    # mkdir /mnt/huge
    # mount -t hugetlbfs nodev /mnt/huge
    # echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

  • Userspace I/O (UIO) kernel driver loaded:

    # sudo /sbin/modprobe uio # needed if uio is built as a module
    # sudo insmod <$RTE_HOME>/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko

After compiling the Intel DPDK target environment, an IP forwarding application can be run as a Linux User Space application. Please refer to the Intel® Data Plane Development Kit - Getting Started Guide for Linux for details.

    # ./build/l3fwd -c 0x1 -- -p 0xF --config="(0,0,0)"

Notes:
  • "-c 0x1" means the CPU mask is 0x1, i.e. only the first CPU thread is used for this Layer 3 forwarding application
  • "-p 0xF" means the port mask is 0xF, i.e. only the first four Gigabit ports are initialized and used for this Layer 3 forwarding application
  • --config="(portid, queueid, coreid)" maps each port's receive queue to the CPU core that services it

Results

After testing the aTCA-6200A blade under native Linux and with the Intel DPDK, we compared the IP forwarding performance of the two configurations on the four 1GbE interfaces (two from the front panel and two from the Base Interface) and the two 10GbE Fabric Interfaces. In addition, we also benchmarked the combined IPv4 forwarding performance of the aTCA-6200A using all six interfaces simultaneously (four 1GbE interfaces and two 10GbE interfaces).

Performance comparison using four 1GbE interfaces


Figure 5: IP Forwarding Performance comparison using 4x 1GbE interfaces

When running IPv4 forwarding on the four 1GbE interfaces of the aTCA-6200A with native Linux IP forwarding enabled, a rate of 1 million frames per second can be sustained at a frame size of 64 bytes. As the frame size is increased to 1024 bytes, native Linux IP forwarding can approach 100% of the line rate. In the real world, however, frame sizes are usually smaller than 1024 bytes, so 100% line rate forwarding is not achievable. With the Intel DPDK running on only two CPU threads under the same Linux OS, the aTCA-6200A can forward frames at 100% line speed without any frame loss, regardless of frame size, as shown in Figure 5 above.

The aTCA-6200A blade with the Intel DPDK enabled provides almost 6 times the IP forwarding performance of native Linux IP forwarding.

Performance comparison using two 10GbE interfaces


Figure 6: IP Forwarding Performance comparison using 2x 10GbE interfaces

Running the IP forwarding test on the two 10GbE Fabric Interfaces shows an even greater performance gap between native Linux and Intel DPDK-based IP forwarding than that seen with the four 1GbE interfaces. As shown above in Figure 6, the aTCA-6200A with the Intel DPDK running on only two threads provides more than 10 times the IP forwarding performance of native Linux using all available CPU threads.

Total IPv4 forwarding performance of the aTCA-6200A


Figure 7: IP Forwarding Performance comparison using 2x 10GbE + 4x 1GbE interfaces

Testing the combined IP forwarding performance of the aTCA-6200A using all available interfaces (two 10GbE Fabric Interfaces, two 1GbE front panel interfaces and two 1GbE Base Interfaces), the aTCA-6200A with the Intel DPDK can forward up to 27 million frames per second at a frame size of 64 bytes. In other words, up to 18 Gbps of the theoretical 24 Gbps throughput can be forwarded (i.e. 75.3% of the line rate). Furthermore, the throughput as a percentage of line rate increases to 92.3% at a frame size of 128 bytes, and up to 99% at 256 bytes.

Analysis

The reasons why the Intel DPDK delivers far higher IP forwarding performance than native Linux come mainly from the Intel DPDK design features described below.

Polling Mode instead of interrupts

Generally, when packets come in, native Linux receives interrupts from the network interface controller (NIC), schedules the softIRQ, performs context switching, and invokes system calls such as read() and write().

In contrast, the Intel DPDK uses an optimized poll mode driver (PMD) instead of the default Ethernet driver to continuously poll for incoming packets, avoiding software interrupts, context switching and system calls. This saves significant CPU resources and reduces latency.
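
A hedged sketch of such a run-to-completion polling loop is shown below, using the DPDK burst receive/transmit API; it assumes the port and its queues have already been configured, and the queue index and burst size are illustrative placeholders.

    #include <stdint.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    /* Run-to-completion loop: poll one RX queue, process, transmit. */
    static void poll_port(uint16_t port_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];

        for (;;) {
            /* Read up to BURST_SIZE frames straight from the NIC ring;
             * no interrupt, no context switch, no system call. */
            uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
            if (nb_rx == 0)
                continue;

            /* ... Layer 3 processing of bufs[0..nb_rx-1] goes here ... */

            /* Send the burst back out and drop whatever could not be sent. */
            uint16_t nb_tx = rte_eth_tx_burst(port_id, 0, bufs, nb_rx);
            for (uint16_t i = nb_tx; i < nb_rx; i++)
                rte_pktmbuf_free(bufs[i]);
        }
    }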

Hugepages instead of traditional pages

Compared to the 4 kB pages of native Linux, using larger pages reduces the time spent on page table look-ups and the likelihood of a translation lookaside buffer (TLB) miss.

The Intel DPDK runs as a user-space application, allocating hugepages in its own memory zone to store frame buffers, rings and other related buffers, which are outside the control of other applications and even the Linux kernel. In the test described in this white paper, a total of 1024 hugepages of 2MB each are reserved for running the IP forwarding application.
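
As an illustration of how packet buffers are carved out of this hugepage-backed memory, the following sketch creates an mbuf pool with the mempool library; rte_pktmbuf_pool_create() is the convenience helper found in later Intel DPDK releases, and the pool and cache sizes are placeholders, not the values used in our tests.

    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    /* Create an mbuf pool in hugepage-backed memory on the local socket. */
    struct rte_mempool *create_pktmbuf_pool(void)
    {
        return rte_pktmbuf_pool_create("pktmbuf_pool",
                                       8192,   /* number of mbufs */
                                       256,    /* per-lcore cache size */
                                       0,      /* private data size */
                                       RTE_MBUF_DEFAULT_BUF_SIZE,
                                       rte_socket_id());
    }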

Zero-copy buffers

In traditional packet processing, native Linux decapsulates the packet header and then copies the data to the user-space buffer according to the socket ID. Once the user-space application finishes processing the data, a write system call is invoked to send the data to the kernel, which copies the data from the user-space buffer to the kernel buffer, encapsulates the packet header, and finally sends it out via the relevant physical port. The native Linux process therefore sacrifices time and resources on buffer copies between kernel and user-space buffers.

In comparison, the Intel DPDK receives packets into its reserved memory zone, which is located in the user-space buffer, and then classifies the packets into flows according to configured rules without copying them to the kernel buffer. After processing the decapsulated packets, it encapsulates the packets with the correct headers in the same user-space buffer, and finally sends them out to the relevant physical ports.
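
The sketch below illustrates this in-place processing for a single forwarded frame: the Ethernet and IPv4 headers are rewritten directly inside the mbuf delivered by the PMD, so the packet never crosses into a kernel buffer. The struct and helper names follow older Intel DPDK releases and the next-hop MAC address is a placeholder supplied by the caller.

    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>

    /* Rewrite L2/L3 headers in place in the received mbuf (zero-copy). */
    static void rewrite_headers(struct rte_mbuf *m,
                                const struct ether_addr *next_hop_mac)
    {
        struct ether_hdr *eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
        struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);

        /* Point the frame at the next hop and age the IP header,
         * all within the same user-space buffer. */
        ether_addr_copy(next_hop_mac, &eth->d_addr);
        ip->time_to_live--;
        ip->hdr_checksum = 0;
        ip->hdr_checksum = rte_ipv4_cksum(ip);
    }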

Run-to-completion and core affinity

Prior to running applications, the Intel DPDK initializes and allocates all low-level resources, such as memory space, PCI devices, timers and consoles, which are reserved for Intel DPDK-based applications only. After initialization, each core (or thread, if Intel® Hyper-Threading Technology is enabled in the BIOS settings) is launched as an execution unit, running the same or different workloads depending on the actual application requirements.

Moreover, the Intel DPDK provides a way to pin each execution unit to a specific core, maintaining core affinity and thus avoiding cache misses. In the tests described in this white paper, the physical ports of the aTCA-6200A blade are bound to two different CPU threads according to this affinity.
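
A minimal sketch of this launch model is shown below: each worker lcore, already pinned by the EAL to its own CPU thread, is handed a run-to-completion routine. The per-lcore port assignment is purely illustrative, and the lcore iteration macro is the one used in Intel DPDK releases contemporary with this paper.

    #include <stdio.h>
    #include <stdint.h>
    #include <rte_eal.h>
    #include <rte_launch.h>
    #include <rte_lcore.h>

    /* Entry point executed on each worker lcore; the EAL has already
     * pinned the lcore to its CPU thread, preserving core affinity. */
    static int lcore_main(void *arg)
    {
        uint16_t port_id = *(uint16_t *)arg;
        printf("lcore %u handling port %u\n", rte_lcore_id(), port_id);
        /* poll_port(port_id);  -- run-to-completion loop, see earlier sketch */
        return 0;
    }

    /* Launch one execution unit per enabled worker lcore. */
    static void launch_workers(uint16_t *port_ids)
    {
        unsigned lcore_id, i = 0;

        RTE_LCORE_FOREACH_SLAVE(lcore_id) {
            rte_eal_remote_launch(lcore_main, &port_ids[i++], lcore_id);
        }
        rte_eal_mp_wait_lcore();   /* wait for all worker lcores */
    }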

Lockless implementation and cache alignment

The libraries and APIs provided by the Intel DPDK are optimized to be lockless, preventing deadlocks in multi-threaded applications. Buffers, rings and other data structures are also optimized to be cache aligned, maximizing cache-line efficiency and minimizing cache-line contention.
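
The sketch below illustrates both ideas: a per-flow statistics structure padded to a cache line to avoid false sharing, and a single-producer/single-consumer ring whose enqueue and dequeue paths are lock-free. The names and sizes are illustrative only.

    #include <stdint.h>
    #include <rte_common.h>
    #include <rte_memory.h>
    #include <rte_lcore.h>
    #include <rte_ring.h>

    /* Per-flow counters aligned to a cache line to avoid false sharing
     * between the CPU threads updating them. */
    struct flow_stats {
        uint64_t rx_packets;
        uint64_t tx_packets;
    } __rte_cache_aligned;

    /* A 1024-slot ring flagged single-producer/single-consumer, so both
     * enqueue and dequeue are lock-free. */
    struct rte_ring *create_flow_ring(void)
    {
        return rte_ring_create("flow_ring", 1024, rte_socket_id(),
                               RING_F_SP_ENQ | RING_F_SC_DEQ);
    }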

Conclusion

By analyzing the results of our tests using the ADLINK aTCA-6200A's four 1GbE interfaces and two 10GbE Fabric Interfaces with and without the Intel DPDK (Figures 5 and 6), we can conclude that running Linux with the Intel DPDK and using only two CPU threads for IP forwarding can achieve approximately 10 times the IP forwarding performance of native Linux with all CPU threads running on the same hardware platform.

As is evident in Figure 7, the IPv4 forwarding performance achieved by the aTCA-6200A with the Intel DPDK makes it cost- and performance-effective for customers to migrate their packet processing applications from NPU-based hardware to Intel x86-based platforms, and to use a uniform platform to deploy different services, such as application processing, control processing and packet processing services.

However, it is important to note that the Intel DPDK is a data plane development kit running in user space, not a complete product that customers can build their applications on directly. In particular, it does not include the implementations required to interact with the control plane, including the kernel and protocol stacks.


Figure 8: How the ADLINK Development Toolkit works with control and data planes

As shown in Figure 8, ADLINK has developed its own development toolkit based on the Intel DPDK to manage both the control plane and the data plane, performing tasks such as cloning virtual NICs at the control plane to synchronize with physical ports at the data plane. Using such a toolkit, customers can easily develop their own Intel DPDK based applications that interact with the control and data planes, not only improving packet processing performance, but also simplifying the development path and reducing time to market.

References
  • Intel® Data Plane Development Kit Software - Architecture Specification (Ref. No. 450255, Dec. 2010)
  • Intel® Data Plane Development Kit - Getting Started Guide for Linux (Ref. No. 450248, Dec. 2010)
  • Wind River White Paper: High-Performance Multi-Core Networking Software Design Options (Dec. 2011)