Ostinato throughput on a 40G NIC

Update (2021): Ostinato now supports 10G, 25G and 40Gbps line-rate traffic using the Turbo Transmit add-on.

A few weeks ago I attended the DPDK Summit in Bangalore, where M Jay from Intel loaned me a dual-port 40G interface to enable Ostinato to become a high-speed packet generator using DPDK. Thanks M Jay and Intel!

If you’ve been following Ostinato for a long time, you might recall I had done a prototype of Ostinato with DPDK back in 2014 which won the 6Wind DPDK Design contest. That work was done on a VM using VNICs instead of physical NICs and the lack of the latter (amongst other things) meant I never productized that code.

Before I (re)start the DPDK work on Ostinato, I wanted to get some baseline measurements with the existing libpcap-based code. This blog post presents those results.

You can also look at the results for a 1G port on Linux, macOS and the Live ISO.

I configured the same single stream as in previous tests -

  • Protocols: Mac, Ethernet II, IPv4, UDP, Pattern
  • Source and Destination Mac addresses populated with the actual Mac addresses of the source and sink
  • Source and Destination IP addresses populated with the actual IP addresses of the source and sink
  • Packets/sec: 0
  • Number of packets: 10 (default)
  • After this stream: Goto first

One difference from the previous tests is that I’m using a single host with a dual-port NIC with the ports connected back to back. Ostinato packets sent on port 1 are received back on port 2 on the same host.

Also, I’m adding two jumbo packet sizes to this test - 4096 bytes and 9018 bytes.

Performance figures

Here are the numbers -

Packet Size (bytes)   Send Rate (Kpps)   Recv Rate (Kpps)   Send Rate (Mbps)   Recv Rate (Mbps)
                 64                755                630                507                423
                128                753                640                892                758
                256                738                635              1,630              1,402
                512                732                636              3,115              2,707
               1024                725                642              6,055              5,362
               1518                725                639              8,920              7,862
               4096                574                566             18,901             18,637
               9018                225                225             16,268             16,268

The Mbps rates were calculated using the formula below, which accounts for the Ethernet line overhead of 20 bytes (7-byte preamble + 1-byte SFD + 12-byte inter-packet gap) -

Rate (in Mbps) = (PacketSize + 20) * KppsRate * 8 / 1000
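As a quick sanity check, the formula can be written as a small Python helper. This is only a sketch: the function and constant names are mine and not part of Ostinato. The two sample calls use the first and last rows of the table above.

```python
# Per-frame overhead on the wire: 7-byte preamble + 1-byte SFD + 12-byte inter-packet gap
ETHERNET_LINE_OVERHEAD_BYTES = 20


def rate_mbps(packet_size_bytes, rate_kpps):
    """Convert a packet rate in Kpps to an on-the-wire rate in Mbps."""
    return (packet_size_bytes + ETHERNET_LINE_OVERHEAD_BYTES) * rate_kpps * 8 / 1000


print(round(rate_mbps(64, 755)))    # 64-byte packets at 755 Kpps -> 507
print(round(rate_mbps(9018, 225)))  # 9018-byte packets at 225 Kpps -> 16268
```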

Specs

Software

Hardware

  • Processor: Xeon E5-2620v4 @ 2.10GHz (8-core)
  • RAM: 32GB
  • NIC: Intel XL710 40Gbps dual-port PCI 3.0, x8

Observations

  • The first thing that jumped out was the large fluctuation in the send rate while transmitting, e.g. for 256-byte packets, I saw rates ranging from 635Kpps to 728Kpps. This is a very wide range and I don’t have an explanation for it. I have used the lower end of the range seen in the above results for all packet sizes.
  • The second thing that jumps out from the results table is that the receive rate gets capped at ~640Kpps even if the send rate is higher. Again, I don’t have an explanation for this.
  • The next surprising thing is that the maximum rate (in Mbps) is achieved for 4096-byte packets, instead of 9018-byte packets.
  • Although it’s an 8-core CPU, only two cores reach 100% CPU utilization during the test. This is expected, since Ostinato doesn’t use more than one core per port and both the transmit and receive ports are on the same host. The first core at 100% utilization (presumably the one handling the tx port) shows 15% user and 85% system usage, indicating that most of the time is spent in the kernel driver. The second core was at 100% utilization in softirq, presumably handling the receive port.
  • RAM usage was not measured.
  • One heartening observation is that Ostinato (even without DPDK) can be used as a 10G traffic generator - if you can use jumbo frames.
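To put the numbers in perspective, here is a rough Python sketch that compares the measured send rates against the theoretical packet rate of a 40G link and lists the packet sizes that cross 10Gbps. The send rates are copied from the results table; the helper names are mine, and the theoretical rate assumes only the 20-byte Ethernet line overhead described earlier.

```python
LINE_OVERHEAD_BYTES = 20   # preamble + SFD + inter-packet gap
LINK_GBPS = 40             # nominal speed of the port under test

# Packet size (bytes) -> measured send rate (Kpps), copied from the results table
MEASURED_SEND_KPPS = {
    64: 755, 128: 753, 256: 738, 512: 732,
    1024: 725, 1518: 725, 4096: 574, 9018: 225,
}


def send_mbps(size, kpps):
    """On-the-wire rate in Mbps for a given packet size and Kpps rate."""
    return (size + LINE_OVERHEAD_BYTES) * kpps * 8 / 1000


def line_rate_kpps(size, link_gbps=LINK_GBPS):
    """Theoretical maximum packet rate (Kpps) of the link for this packet size."""
    return link_gbps * 1e9 / ((size + LINE_OVERHEAD_BYTES) * 8) / 1e3


for size, kpps in MEASURED_SEND_KPPS.items():
    pct = 100 * kpps / line_rate_kpps(size)
    print(f"{size:>5} bytes: {kpps:>4} Kpps sent, {pct:5.1f}% of 40G line rate")

# Packet sizes for which the measured send rate is at least 10Gbps
sizes_at_10g = [s for s, k in MEASURED_SEND_KPPS.items() if send_mbps(s, k) >= 10_000]
print(sizes_at_10g)  # [4096, 9018]
```

Only the two jumbo sizes clear 10Gbps. At 1518 bytes the send rate works out to roughly 8.9Gbps, which is why jumbo frames are needed.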

