What is PCIe 4.0?
PCIe 4.0 (also referred to as PCIe 4, PCIe Gen 4, PCI 4, or PCI Express 4.0) is the fourth generation of Peripheral Component Interconnect Express (PCIe), a high-speed computer bus technology. It was designed for applications that require higher bandwidth at a lower cost, including AMD platforms, gaming, and flash storage. For those wondering, "is PCIe 4.0 backwards compatible," PCIe 4 remains fully backwards compatible with previous PCIe generations.
The specification defines several lane widths (x1, x2, x4, x8, x16, and x32), giving developers a range of width and speed configurations to match applications with different bandwidth requirements. For example, storage applications typically use PCIe x4 links, whereas high-performance applications that benefit from more bandwidth can use PCIe x16 links.
A compatible PCI Express 4.0 motherboard paired with a PCIe Gen 4 SSD can deliver sequential read and write speeds up to twice as fast as last-generation PCIe 3.0 SSDs and more than ten times those of some slower SATA SSDs.
The PCI Express 4.0 specification was finalized in October of 2017 with a published transfer (bit) rate of 16 GT/sec, double the bit rate of the previous version, PCIe 3.0. In the table below, you can see the aggregate bandwidth options offered by PCIe 3.0 vs PCIe 4.0.
|Bandwidth (GB/sec)||PCIe x1||PCIe x2||PCIe x4||PCIe x8||PCIe x16|
|PCIe 3.0||1||2||4||8||16|
|PCIe 4.0||2||4||8||16||32|
How is PCIe 4.0 Different?
In 2003, the first generation of PCI Express, PCIe 1.0, was released and quickly supplanted the bus standards that directly preceded it, PCI and AGP (Accelerated Graphics Port). The latter was developed specifically for graphics controller connections with higher bandwidth demands.
The improvements over PCI were readily apparent, with a serial interface format replacing the parallel format of PCI, and individual busses for each connected device replacing the lumbering PCI shared-bus architecture. PCIe 1.0, initially known as High Speed Interconnect, boasted a bandwidth specification of 250 MB/sec per lane and a transfer rate of 2.5 GT/sec.
The convention of doubling bandwidth with each full PCIe release was established with versions 2.0 and 3.0, released in 2007 and 2010, respectively. PCIe 3.0 also updated the encoding scheme from 8b/10b to 128b/130b, cutting encoding overhead from 20% to about 1.5%, so a transfer rate of 8.0 GT/sec was enough to double the usable bandwidth of PCIe 2.0. With each successive iteration, PCIe has remained backwards compatible with previous versions, although the lower of the versions supported by the PCIe slot and the connecting card always dictates the actual link speed and bandwidth.
Breaking from the four-year release cadence that had preceded it, the final PCI Express 4.0 specification arrived a full seven years after PCIe 3.0. Maintaining the established PCIe standard for bandwidth doubling with a backwards-compatible mechanical and electrical form factor proved a more daunting task for developers.
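The backwards-compatibility rule can be sketched as a simple "lowest common capability" calculation. This is a simplified model, not the actual link training procedure: the function name `negotiated_link` is hypothetical, and the assumption that lane width is also reduced to the narrower of the two ends goes slightly beyond what the text above states.

```python
# Per-generation transfer rates in GT/sec, using the figures quoted in this article.
TRANSFER_RATE_GT = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}


def negotiated_link(slot_gen, slot_width, card_gen, card_width):
    """Return the (generation, lane count) a slot/card pair can share.

    Simplified model: the link runs at the lower generation and the
    narrower lane width supported by both ends.
    """
    return min(slot_gen, card_gen), min(slot_width, card_width)


# A Gen 4 x4 SSD plugged into a Gen 3 x16 slot.
gen, width = negotiated_link(slot_gen=3, slot_width=16, card_gen=4, card_width=4)
print(f"Link runs at PCIe {gen}.0 x{width}, {TRANSFER_RATE_GT[gen]} GT/sec per lane")
```

In this example the Gen 4 drive still works, but it is limited to PCIe 3.0 speeds by the older slot.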
While meeting this challenging performance expectation, PCIe Gen4 also introduced functional enhancements, including reduced system latency, scalability for added lanes of bandwidth, and lane margining features to evaluate the electrical integrity and reliability for each lane of the PCIe channel.
PCIe Gen4 Speed
Among the many improvements included, the significant increase in PCIe 4.0 speed remains the centerpiece around which numerous benefits and new applications have been realized. The speed improvement with each successive release has now brought PCIe x4 (four lanes) to a level exceeding the first-generation throughput of PCIe x16 (sixteen lanes).
To provide some real-world context, the 16 GT/sec transfer rate of the PCI Express 4.0 standard corresponds to a voltage switch over a differential pair occurring sixteen billion times each second.
This speed-doubling tradition of PCIe has enabled PCIe 4.0 cost savings for high-end applications such as cloud servers and data centers while improving performance, user experience, and real estate efficiencies for standalone devices such as laptops and tablets.
Despite the dramatic increase in bit rate and bandwidth over the past 16 years, more speed will be required to keep pace with other integral elements of the network architecture. 400G Ethernet requires 50 GB/sec in each direction, exceeding the roughly 32 GB/sec that a PCIe 4.0 x16 link can deliver. Since busses can often become the choke point in x86 architecture, advanced applications may already be awaiting the superior bandwidth promised by future generations of PCIe.
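The 400G Ethernet comparison is a simple unit conversion, sketched below. The function name `required_gb_per_sec` is illustrative, and the x16 figure is the rounded value from this article's version table rather than a measured number.

```python
PCIE4_X16_GB_PER_SEC = 32.0  # rounded PCIe 4.0 x16 figure from the version table


def required_gb_per_sec(line_rate_gbit):
    """Convert an Ethernet line rate in Gbit/sec to GB/sec (8 bits per byte)."""
    return line_rate_gbit / 8


needed = required_gb_per_sec(400)
shortfall = needed - PCIE4_X16_GB_PER_SEC
print(f"400G Ethernet needs {needed:.0f} GB/sec per direction")
print(f"PCIe 4.0 x16 falls short by {shortfall:.0f} GB/sec")
```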
|Version||Transfer Rate||Throughput/Lane||x16 Throughput|
|PCIe 1.0||2.5 GT/sec||250 MB/sec||4.0 GB/sec|
|PCIe 2.0||5.0 GT/sec||500 MB/sec||8.0 GB/sec|
|PCIe 3.0||8.0 GT/sec||1.0 GB/sec||16.0 GB/sec|
|PCIe 4.0||16.0 GT/sec||2.0 GB/sec||32.0 GB/sec|
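The table values can be reproduced from the transfer rate and the encoding scheme. The sketch below assumes 8b/10b encoding for PCIe 1.0 and 2.0 and 128b/130b for PCIe 3.0 and 4.0, and it ignores packet and protocol overhead, so real-world throughput will be lower. The names `GENERATIONS` and `lane_throughput_gb` are illustrative.

```python
# (transfer rate in GT/sec, payload bits, total bits on the wire)
GENERATIONS = {
    "PCIe 1.0": (2.5, 8, 10),
    "PCIe 2.0": (5.0, 8, 10),
    "PCIe 3.0": (8.0, 128, 130),
    "PCIe 4.0": (16.0, 128, 130),
}


def lane_throughput_gb(rate_gt, payload_bits, total_bits):
    """Usable GB/sec per lane, per direction, after encoding overhead."""
    return rate_gt * payload_bits / total_bits / 8  # 8 bits per byte


for name, (rate, payload, total) in GENERATIONS.items():
    per_lane = lane_throughput_gb(rate, payload, total)
    print(f"{name}: {per_lane:.3f} GB/sec per lane, {per_lane * 16:.1f} GB/sec at x16")
```

The 128b/130b results come out slightly under 1.0 and 2.0 GB/sec per lane, which the table rounds up. The same function also confirms that a PCIe 4.0 x4 link (about 7.9 GB/sec) exceeds a PCIe 1.0 x16 link (4.0 GB/sec).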
PCIe 4.0 Architecture
The architecture of PCIe 4.0 is intended to provide improved PCIe speed, as well as more economical lane assignment. Any decrease in I/O pin usage enabled by PCIe Gen4 equates to a proportional improvement in power consumption. For example, a GPU has traditionally utilized 16 lanes, while a PCIe 4.0 NVMe drive consumes an additional four lanes. This combination of x16 and x4 devices can quickly deplete the 20-24 PCIe lanes available on a standard motherboard.
With the advent of PCIe Gen4, the user has the option of either doubling the bandwidth or halving the lanes, the latter providing opportunities for more discretionary add-in cards using any combination of PCIe x4, x8, or x16 devices.
The PCI Express Gen4 architecture also includes several additional features intended to improve efficiency and power consumption. Extended tags and credits for service devices are features that can mask latency and optimize bandwidth saturation. Superior Reliability, Availability, and Serviceability (RAS) capabilities for handling system errors and improved I/O virtualization are among the other new elements of PCIe 4.0. By utilizing I/O virtualization, virtual software devices can be substituted for their physical equivalent, such as a network interface card (NIC).
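The "halve the lanes" option can be illustrated with a small lane-budget calculation. The device bandwidth targets below are hypothetical, the per-lane figures are the rounded values from the version table, and `lanes_needed` is an illustrative helper rather than anything defined by the specification.

```python
# Rounded per-lane throughput in GB/sec, from the article's version table.
GB_PER_LANE = {3: 1.0, 4: 2.0}


def lanes_needed(target_gb_per_sec, gen):
    """Smallest common link width that meets a bandwidth target."""
    for width in (1, 2, 4, 8, 16):
        if width * GB_PER_LANE[gen] >= target_gb_per_sec:
            return width
    raise ValueError("target exceeds an x16 link at this generation")


gpu_target = 16.0  # hypothetical GPU sized for a PCIe 3.0 x16 link
ssd_target = 4.0   # hypothetical NVMe drive sized for a PCIe 3.0 x4 link

for gen in (3, 4):
    used = lanes_needed(gpu_target, gen) + lanes_needed(ssd_target, gen)
    print(f"PCIe {gen}.0: {used} lanes used for the same bandwidth")
```

Under these assumptions the same two devices occupy 20 lanes at PCIe 3.0 but only 10 at PCIe 4.0, leaving the remainder free for additional cards.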
Challenges with PCI Express 4.0
The long list of architectural enhancements incorporated in PCIe 4.0 also presents many technical challenges. Insertion loss is higher than had been encountered with previous revisions. Maintaining signal integrity while adapting to tighter margining requirements has been a formidable obstacle to overcome. Due to the higher accompanying frequency of PCIe Gen4, maximum trace lengths went from 16-20 inches with PCIe Gen3 to 10-12 inches.
The latest increase in bandwidth has also proportionally increased the reference clock performance demands. Appropriate clocking architecture decisions now require more analysis to ensure sufficiently low jitter and frequency stability levels, and clocks meeting the requirements of PCIe 3.0 may not necessarily meet the needs of a PCIe 4.0 device.
Testing PCIe 4.0
The challenges inherent to PCIe 4.0 design enhancements have also extended into the realm of testing. With yet another doubling of speed in PCIe 4.0, a higher standard has ensued for data capture, analysis, storage, and visualization. The higher frequency and channel loss inherent to PCIe Gen4 are additional considerations for new test platform applications. Reference clock testing has become more complex, since phase jitter must now be tested against the requirements of all four PCIe generations and their data rates.
Some of the common hardware issues observed during PCIe testing include link speed issues, traffic issues, and link quality issues after recovery. A protocol analyzer is a versatile PCI Express tester and a powerful solution for testing and debugging many PCIe issues. With the enhanced PCIe 4.0 speed, the capabilities to filter out specific packets and record long sequences are increasingly valuable test device features.
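As a lightweight first check for link speed issues, before reaching for a protocol analyzer, a script can compare the speed and width a link is running at against what the device supports. The sketch below assumes a Linux host, where sysfs exposes `current_link_speed`, `max_link_speed`, `current_link_width`, and `max_link_width` for each PCI device; the exact string format varies by kernel version, and the sample strings here are illustrative rather than captured from real hardware. The helper names are hypothetical.

```python
import re


def parse_speed(text):
    """Extract the GT/sec figure from a string such as '16.0 GT/s PCIe'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", text)
    if match is None:
        raise ValueError(f"unrecognized link speed: {text!r}")
    return float(match.group(1))


def is_downtrained(current_speed, max_speed, current_width, max_width):
    """True if the link runs slower or narrower than the device supports."""
    return (parse_speed(current_speed) < parse_speed(max_speed)
            or int(current_width) < int(max_width))


# Sample values in the style of the sysfs attributes (assumed format).
print(is_downtrained("8.0 GT/s PCIe", "16.0 GT/s PCIe", "4", "4"))
print(is_downtrained("16.0 GT/s PCIe", "16.0 GT/s PCIe", "4", "4"))
```

A mismatch is only a hint, since some devices deliberately lower their link speed to save power when idle. It does not replace the packet-level visibility of a protocol analyzer.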
Solutions for Analysis & Test
Throughout the history of the PCIe standard, PCIe bandwidth test equipment has evolved and adapted to meet the demands for speed multiplication and additional complexity within the architecture. In order to ensure accurate and comprehensive testing, multiple tools and processes should be utilized.
Analyzers designed specifically for PCI Express 4.0, with a high level of visibility into traffic flows and advanced trace analysis capabilities, have become indispensable. Test platforms such as the Xgig 4K16 Protocol Analyzer/Jammer also provide error injection capabilities and the means to analyze and jam simultaneously on a single chassis. Memory segmentation enables the capture of multiple traces. 64 GB of memory in both the upstream and downstream directions also provides ample storage capacity for traffic data capture.
Although jamming capability can be an integrated feature of a protocol analyzer, discrete products such as the Xgig Jammer can simulate errors by manipulating live traffic, thereby verifying the responsiveness and robustness of the error recovery process. The timing and types of errors introduced can be precisely preplanned so that automated PCIe speed test routines can be developed.
PCIe 5.0 and Beyond
The final release of the PCIe 5.0 specification was completed in May of 2019. According to PCI-SIG, the group responsible for PCIe releases, PCIe 5.0 has garnered an enthusiastic industry response, leading to accelerated hardware and test solution development cycles. The electrical improvements inherent to PCIe 5.0 have reined in many of the signal integrity issues observed with PCIe Gen4. As is customary, PCIe Gen 5 also provides full backwards compatibility and a transfer rate double that of PCI Express 4.0, or a phenomenal 32.0 GT/sec.
The wait for PCIe 4.0 may have been an unusually long one, but the giant leap forward has justified this prolonged gestation period. With a continuous passing of the torch to PCIe 5.0, PCIe 6.0, and beyond, the demands on design and test professionals will continue unabated. Fortunately, a new generation of protocol analyzers and other versatile testers has proven worthy of the challenge, boding well for future generations.