The High-Frequency Signals of PCIe 4.0 Demand Higher Performance from Engineers

March 20th, 2018

By Lynnette Reese, Editor-in-Chief, Embedded Systems Engineering, Embedded Intel Solutions

PCIe 4.0 meets growing throughput demands but requires hair-pulling attention to detail from engineers to make it happen.

If a processor has to access a peripheral, it might use Peripheral Component Interconnect Express (PCIe® or PCI Express®). PCIe is a high-speed serial computer expansion bus that replaced PCI (a parallel bus). PCIe, a dual-simplex point-to-point serial connection, is a product of Intel® R&D work begun in the 1990s. It briefly competed with a couple of other standards but gained momentum for broad adoption by the mid-2000s. The latest standard, PCIe 4.0, launched in October 2017, increasing speeds to 16 GT/s per lane. This means that PCIe 4.0 is switching a voltage sixteen billion times per second over a differential pair.

Table 1: PCIe through the years. PCIe 4.0, when scaled up to 16 lanes, has a total throughput of 31.5 GB/s in each direction. *The term “GT/s” means gigatransfers per second. A transfer rate counts every symbol on the wire, including encoding overhead bits that do not contribute to usable throughput, which makes it a more realistic description of the channel than a payload bits-per-second figure.

PCI Express is scalable, as well. PCIe 4.0 can be implemented as anywhere from one lane up to sixteen bidirectional lanes, at 16 GT/s per lane. Each lane comprises two differential pairs, one to transmit and one to receive. For more bandwidth, PCIe is commonly scaled up to two (x2), four (x4), eight (x8), or sixteen (x16) lanes, although the standard allows up to 32 (x32). With 16 lanes of PCIe 4.0 in use, total throughput can reach almost 64 GB/s aggregate (about 31.5 GB/s in each direction), with 16 signals transmitting and 16 signals receiving at the same time.
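The arithmetic behind these figures can be sketched in a few lines. The helper function below is hypothetical, but the transfer rates and encoding schemes come from the published specifications: PCIe 1.0 and 2.0 use 8b/10b encoding (80% efficient), while PCIe 3.0 and 4.0 use 128b/130b (about 98.5% efficient).

```python
def pcie_throughput_gbs(gen, lanes):
    """Approximate usable throughput in GB/s, one direction.

    Hypothetical helper for illustration; rates and encoding
    efficiencies are taken from the published PCIe specifications.
    """
    # (transfer rate in GT/s per lane, encoding efficiency)
    specs = {
        1: (2.5, 8 / 10),     # PCIe 1.0: 8b/10b encoding
        2: (5.0, 8 / 10),     # PCIe 2.0: 8b/10b encoding
        3: (8.0, 128 / 130),  # PCIe 3.0: 128b/130b encoding
        4: (16.0, 128 / 130), # PCIe 4.0: 128b/130b encoding
    }
    gt_s, efficiency = specs[gen]
    # GT/s x efficiency = usable Gb/s per lane; divide by 8 for GB/s
    return gt_s * efficiency * lanes / 8

print(round(pcie_throughput_gbs(4, 16), 1))  # prints 31.5
```

This reproduces the 31.5 GB/s per-direction figure from Table 1; doubling it for simultaneous transmit and receive gives the nearly 64 GB/s aggregate cited above.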

Figure 1: PCIe terminology. A differential signal is carried by a pair of wires, as differential signaling helps maintain signal integrity. Two of these differential pairs are in each lane; one pair to transmit and one pair to receive. A link describes the physical connection between PCIe devices regardless of the number of lanes. (Credit: Ravi Budruk, PCI Express Basics)

Evolving Applications
PCIe has evolved to accommodate the need for speed in other technologies, such as storage. PCI Express is most widely recognized for network cards and gaming-level graphics cards, but beginning with PCIe 3.0 it also found its way into storage on Solid State Drives (SSDs). SSDs are much faster than magnetic hard disk drives (HDDs), which have been in use since before conventional PCI was invented, and they gained a distinct advantage from the faster access a PCIe serial bus provides. What’s more, the PCIe standard was already proven and demonstrated excellent interoperability with other products.

Personal computers with an SSD may have a SATA or SATA Express connection, which can be lower cost than PCIe. However, the SSD industry is moving to PCIe, most commonly over an M.2 connector (previously known as Next Generation Form Factor). In addition to PCIe 3.0, the M.2 connector standard can also accommodate Serial ATA (SATA) 3.0 and USB 3.0 (backward compatible with USB 2.0). Although the SATA interface is well established and widespread in the embedded space, PCIe for storage is growing in high-performance computing and other applications where load times are a concern. SATA is nevertheless expected to coexist with PCIe in the industrial and embedded markets for several more years.

Non-Volatile Memory Express (NVMe) is a specification developed for SSDs that uses PCIe for data transfer. NVMe is employed more with SSDs of substantial capacity and is therefore seen more in the server market than in embedded applications, which typically do not require the huge, fast, and (presently) more costly storage of such drives. Even so, PCIe’s reliability is a large part of why it is becoming the standard interface on the storage side.

Figure 2: PCI-SIG has reliably doubled bandwidth over the years. PCIe 5.0 is expected to yield 128 GB/s (using 16 lanes). (Image: PCI-SIG).

One example of PCIe used in high-performance computing is in weather forecasting. MeteoSwiss, the Swiss Federal Office for Meteorology and Climatology, uses servers densely populated with acceleration devices to compute weather forecast models for simulation. MeteoSwiss achieves significant energy efficiency by connecting multiple accelerator devices (GPUs) using PCIe. Even so, PCIe networks must be engineered for a topology that reduces traffic congestion. A server or supercomputer that houses numerous accelerator devices (e.g., GPGPUs, Intel Xeon Phi) experiences an increased burden on intra-node communication networks using PCIe.[i]

In the consumer market, PCI Express is used every single day, notably in laptops and mobile devices with Thunderbolt™ ports. Thunderbolt combines PCI Express and DisplayPort into one standard; on recent Apple products, the same port also carries charging power. According to Intel, Thunderbolt technology provides flexibility and simplicity by supporting both data (PCIe) and video (DisplayPort) on a single cable connection. “Thunderbolt™ 3 technology is 8x faster than USB 3.0 and provides 4x more video bandwidth than HDMI 1.4.”[ii] USB Type-C ports can also connect and communicate with Thunderbolt devices. Thunderbolt 3 carries four lanes of PCIe 3.0 and eight lanes of DisplayPort in one cable, with an integrated USB 3.1 (10 Gb/s) host controller. PCI Express is an excellent choice for mobile devices: it has some very low power states and offers very high speed when needed, yet barely sips power in standby. Some mobile devices use PCIe internally to drive data to and from the display or elsewhere.

An acceptable eye height for PCIe 3.0 is 25 mV. For PCIe 4.0, it’s reduced to 15 mV.

Higher Performance? How High?
Whereas PCIe 3.0 was a radical departure from PCIe 2.0, the PCIe 4.0 protocol and encoding, along with many other components, are similar to PCIe 3.0. What makes PCIe 4.0 a challenge for the industry is that while throughput has increased dramatically, the channels have not changed much from PCIe 3.0. The average channel length in a desktop computer is about 10 inches (~25 cm); modern servers have channels measuring around 20 inches total. For PCIe 4.0 to increase in speed, higher frequencies must be driven through the same channel that was used for the previous generation. The tradeoff is insertion loss: a frequency-dependent loss in signal strength, usually expressed in decibels, that grows as frequency increases. When the frequency doubles, the loss rises significantly, and the ability to drive a signal into a channel and recover it at the other end is markedly diminished. PCIe 4.0 demands higher performance and considerably more attention to detail in implementation.
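The frequency dependence can be sketched with a toy loss model. PCIe uses NRZ signaling, so the fundamental (Nyquist) frequency is half the transfer rate: 4 GHz for PCIe 3.0 and 8 GHz for PCIe 4.0. In a typical copper channel, conductor (skin-effect) loss grows roughly with the square root of frequency and dielectric loss roughly linearly with it. The coefficients below are illustrative placeholders, not measured values:

```python
import math

# Illustrative per-inch loss coefficients for an FR4-class channel.
# These are placeholders chosen for demonstration, not measurements.
A_SKIN = 0.1   # dB/inch at 1 GHz, conductor (skin-effect) loss
B_DIEL = 0.15  # dB/inch per GHz, dielectric loss

def loss_db(freq_ghz, length_inches):
    """Rough channel insertion loss: skin effect ~ sqrt(f), dielectric ~ f."""
    return (A_SKIN * math.sqrt(freq_ghz) + B_DIEL * freq_ghz) * length_inches

# NRZ signaling: Nyquist frequency is half the transfer rate.
nyquist_gen3 = 8.0 / 2   # 4 GHz for PCIe 3.0 at 8 GT/s
nyquist_gen4 = 16.0 / 2  # 8 GHz for PCIe 4.0 at 16 GT/s

# Same 20-inch server channel, two generations of signaling.
print(round(loss_db(nyquist_gen3, 20), 1), "dB at PCIe 3.0 rates")
print(round(loss_db(nyquist_gen4, 20), 1), "dB at PCIe 4.0 rates")
```

Because the dielectric term dominates at these frequencies and scales linearly, doubling the data rate nearly doubles the channel loss over the same trace length, which is exactly the budget squeeze described above.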

Figure 3: Example of a PCIe 4.0 receiver eye signal. With such a small eye height to work with, tools such as Synopsys’ DesignWare PHY and Controller IP solutions for PCI Express 4.0 technology can help system designers assess their design’s performance variation tolerance early on. (Source: Synopsys.com)

PCIe 3.0 survived in part because receivers have improved over the years. In eye-diagram testing of signal integrity, the acceptable eye height for an equalized PCIe 3.0 signal is 25 mV. For PCIe 4.0, that eye height has dropped to 15 mV.[iii]

Such a low voltage makes even a tiny amount of noise an issue. Considerable attention to detail will therefore be the means of implementing successful products: products that pass standards testing for PCIe 4.0, as well as interoperability testing with other PCIe 4.0 products. The PCIe 4.0 standard specifies a new component, the retimer, which recovers the incoming data and retransmits a clean copy as quickly as possible, allowing the channel to be extended. A retimer has some advantages over a typical redriver with respect to deterministic and random jitter.[iv] (Retimers were offered as an option in subsequent revisions to the original PCIe 3.0 standard.) As Intel states it, “With PCI Express Gen4 (16 GT/s), data rate has increased by 2x compared to previous generation (8 GT/s), resulting in shorter channel reach. Common use cases include channels expanding over system boards, backplanes, cables, risers, and add-in cards.”[v]

Another way to improve PCIe 4.0 performance is through the choice of printed circuit board (PCB) material. Standard FR4, a laminate of woven fiberglass pressed with resin, is inexpensive but lossy at the frequencies PCIe 4.0 requires; lower-loss laminates mitigate the frequency-dependent characteristics of the board, reducing insertion loss to some degree. Even with better materials, the signal integrity requirements of PCIe 4.0 are much higher than what engineers experienced with PCIe 3.0.

The PCIe 4.0 specification does provide guidelines for the new challenges it presents. PCIe is far from being “just an interface,” however. Phenomenal throughput gains have been achieved with every release from PCI-SIG, the PCIe standards body, thanks to the multitude of engineers working to turn the specification into products that work well together. Successfully implementing the more demanding PCIe 4.0 will require not just adding a retimer, but making small changes in many areas to accommodate high-frequency signals, from mounting connectors to meeting the specification’s channel requirements using statistical simulation. PCIe does not get as much attention as GPUs or processors tuned to implement neural networks, but without the significant gains the world has seen from PCIe in the past decade, we might have been headed toward yet another standard to create speed gains with an entirely different approach.

[i] Martinasso, Maxime, et al. “A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers.” SC ’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE, 2016, pp. 739–749.

[ii] Intel. Thunderbolt 3 Overview Brief. PDF, Intel, 2015.

[iii] “Under the Hood of PCI Express – EEs Talk Tech #4.” Keysight Technologies, 26 Jan. 2017.

[iv] “PCIe 3.0 Retimer Frequently Asked Questions.” PDF, IDT, 2014.

[v] “Introduction Overview.” PCIe* 4.0 Retimer Supplemental Features and Standard, PDF, Intel, 2018, p. 6. Document number: 336467-002US.


Lynnette Reese is Editor-in-Chief, Embedded Intel Solutions, and has been working in various roles as an electrical engineer for over two decades. She is interested in open source software and hardware, the maker movement, and in increasing the number of women working in STEM so she has a greater chance of talking about something other than football at the water cooler.