VIAVI Solutions

White Paper

# Test and Validate FEC Implementations with VIAVI ONT

This white paper shows how the range of applications available for the VIAVI ONT family can be used to develop, test and validate the FEC IP block used in 400G and related Ethernet technology.

Modern communication systems make extensive use of forward error correction (FEC) to ensure reliable, high-performance links. From ultra-long-haul subsea cables to short hops across a backplane, a FEC can be used to increase overall performance. The recent emergence of 400G Ethernet has led to widespread deployment of Ethernet interfaces with a mandatory FEC. This FEC block must be developed, tested and validated to allow an open, multi-vendor, plug-and-play ecosystem. FEC has long been widely deployed in OTN (Optical Transport Network) technology, where VIAVI has extensive experience in FEC test, validation and troubleshooting. VIAVI has extended that experience to 400G Ethernet to match the specific needs and challenges found today.

### What is a FEC?

In a real communication system, a desired message is encoded as a number of symbols (in the simplest case, binary bits). The message is then sent over a communication channel, where it is subject to noise, corruption, distortion and other effects. The noise in the channel is expected to corrupt a random number of the transmitted bits, so the received message will be corrupted. Simple techniques such as parity or a checksum can be used to detect the corruption, but they cannot repair it, so the message would normally have to be retransmitted. This is clearly inefficient for modern communication systems, as errors on any real link could cause continual retransmission of data packets.

When a FEC is used, the desired message is encoded at the sending end (normally in a fixed block size), where additional bits are added according to the chosen FEC algorithm. This additional information is then sent along with the desired message over the noisy channel. As before, several of the transmitted bits could be corrupted, but the receiver FEC algorithm can now use the additional bits to detect and correct the corrupted bits. Depending on the properties of the communication channel and the FEC algorithm used, a modern FEC algorithm can make a noisy channel appear effectively error free at the expense of some additional data 'overhead'.
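The encode, corrupt and correct flow described above can be illustrated with a toy code. The sketch below uses a single-error-correcting Hamming(7,4) code, chosen only because it is small enough to read in full; it is not the Reed-Solomon code used by Ethernet, and the function names `encode` and `decode` are illustrative.

```python
# Toy Hamming(7,4) code: 4 message bits plus 3 parity bits per block.
# NOT the Ethernet RS-FEC; it only illustrates encode -> corrupt -> correct.

def encode(data):
    """Encode 4 message bits into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    # Positions 1..7: parity bits sit at positions 1, 2 and 4.
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(received):
    """Correct up to one flipped bit; return (message, error position or 0)."""
    c = list(received)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the errored bit (0 = no error).
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


message = [1, 0, 1, 1]
codeword = encode(message)

corrupted = list(codeword)
corrupted[4] ^= 1  # the noisy channel flips the bit at position 5

recovered, position = decode(corrupted)
print("recovered correctly:", recovered == message, "error at position", position)
```

The receiver repairs the block without any retransmission, at the cost of sending 7 bits for every 4 message bits. Practical codes such as Reed-Solomon achieve far better protection for far less overhead.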

FEC algorithm performance can be expressed in many ways; one common measure is coding gain. This is the effective improvement in channel signal-to-noise ratio (SNR) obtained by transmitting the data with a particular FEC. The IEEE took many factors into account when choosing the FEC for 400G Ethernet (802.3bs), including implementation complexity, latency, performance, required power and IC area. It decided on a Reed-Solomon based FEC acting on a 514-symbol message, with each symbol consisting of 10 bits, expanded to a 544-symbol codeword by the addition of FEC parity symbols. This allows the code to detect and correct up to 15 errored symbols in each codeword block (the FEC can detect but not correct more than 15 errored symbols and can flag the codeword as corrupted).

The IEEE 802.3 standards document is the reference for the Ethernet FEC and although it does not define an implementation it does set the standards and definitions.

The screenshot (Figure 1) from a VIAVI ONT shows how the FEC decoder fits within the signal flow. The 16 logical lanes (which can be carried as 4 PAM-4 optical lanes (100G per lane, or 56 GBd) and/or 8 PAM-4 electrical lanes at 28 GBd) enter the PCS logic, where each lane has its unique alignment marker (AM) identified and tracked. The logic determines whether each lane is correctly identified and within the limits for skew. The lanes are then passed to a block that reorders them (they may have been reordered and skewed during the mux and demux process used in transmission) and deskews them to realign them into codewords. Each codeword is then passed into the interleaved FEC blocks, which detect and correct any errors that occurred during transmission; any codewords that cannot be corrected due to excessive errors are marked as uncorrectable. The codewords are then transcoded and passed on to the reconciliation layer for further (MAC) processing.

At each stage various alarms and errors can be asserted to track the state machines and flow of the data. The FEC decoder is one of the most complex parts of the receiver logic and the correct functionality and performance of this logical block is a major part of any 400G system development, test and validation.



Figure 1: FEC Decoder Block as shown by the ONT

### Details of the 400GE FEC

The FEC block involves additional and complex logic at both the transmitter (to encode the data) and the receiver (to decode and then detect and correct errors and signal the appropriate information to higher layers).

The table below shows the performance of the KR4 FEC (used for NRZ coded signalling) and the KP4 FEC used with PAM-4. Since the focus here is 400GE, the rest of this section concentrates on the KP4 FEC.

The KP4 FEC takes 514 message symbols of 10 bits each and encodes them with the KP4 FEC logic to form a 544-symbol codeword, adding 30 'parity' symbols (each of 10 bits). The receiver FEC can detect and correct up to and including 15 errored symbols in the received 544-symbol codeword block. The output of the FEC receiver will be a fully corrected codeword if the received codeword has 15 or fewer errored symbols. Highly errored codewords (more than 15 errored symbols) will be detected and flagged, but the receiver can no longer correct those errors.
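The figures above follow directly from the Reed-Solomon parameters: a code with n total symbols and k message symbols can correct up to floor((n - k) / 2) errored symbols. The short sketch below works through that arithmetic for both the KR4 and KP4 codes; the `RsCode` class is purely illustrative and does not perform any encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RsCode:
    """Parameter bookkeeping for an RS(n, k) code (no encoding is done here)."""
    name: str
    n: int  # total symbols per codeword
    k: int  # message symbols per codeword
    m: int  # bits per symbol

    @property
    def parity_symbols(self):
        return self.n - self.k

    @property
    def t(self):
        # Maximum number of errored symbols the code can correct.
        return (self.n - self.k) // 2

    @property
    def codeword_bits(self):
        return self.n * self.m


KR4 = RsCode("KR4", n=528, k=514, m=10)
KP4 = RsCode("KP4", n=544, k=514, m=10)

for code in (KR4, KP4):
    print(f"{code.name}: parity={code.parity_symbols} symbols, "
          f"t={code.t}, codeword={code.codeword_bits} bits")
```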

The FEC encoding process 'expands' the information to obtain the coding gain. In the case of the KP4 FEC an additional 30 symbols are added (300 bits on top of the original 5140-bit message block). This adds the ability to detect and correct a certain number of errors, which is equivalent to a 'coding gain': in effect the link appears as if its SNR had been improved by a few dB. For the KP4 FEC this coding gain is stated as 6.5 dB (assuming a random, Poisson-distributed error pattern) with an expected post-FEC BER of 10^-12. Of course, this coding gain is not free. The FEC has three 'costs':

- It requires extra logic in the PCS layer at the transmitter and receiver. This logic consumes power and area in the ASIC or FPGA and represents additional design cost.
- It adds latency. The FEC logic requires time to encode and decode the codeword. This can become a serious issue for shorter links, where the FEC latency becomes a significant fraction of the overall end-to-end transmission delay. For telecoms and long-haul links this is normally not an issue.
- The additional parity bits mean the transmitted data rate must increase. This higher line rate increases the bandwidth, performance and power requirements of the electronic and photonic elements.

In the vast majority of cases the additional burden of the FEC is more than outweighed by the link coding gain, although in some specialised cases people may use a lighter FEC (lower coding gain but lower latency) with better (engineered) links.
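The third cost, the increase in line rate, can be put into numbers. The sketch below computes the rate expansion caused by the 30 parity symbols alone and then the resulting 400GE line rate. The line-rate step relies on two inputs that are not stated in the text above and are taken here as assumptions from IEEE 802.3bs: a 400 Gb/s MAC rate and a net 257/256 expansion from the 256b/257b transcoding that precedes the FEC encoder.

```python
from fractions import Fraction

MESSAGE_SYMBOLS = 514   # k for the KP4 FEC
CODEWORD_SYMBOLS = 544  # n for the KP4 FEC

# Rate expansion caused by the FEC parity symbols alone.
fec_expansion = Fraction(CODEWORD_SYMBOLS, MESSAGE_SYMBOLS)
fec_overhead_percent = float((fec_expansion - 1) * 100)

# Assumptions (from IEEE 802.3bs, not from the text above):
MAC_RATE_BPS = 400_000_000_000            # 400 Gb/s MAC data rate
TRANSCODE_EXPANSION = Fraction(257, 256)  # net effect of 256b/257b transcoding

line_rate_bps = MAC_RATE_BPS * TRANSCODE_EXPANSION * fec_expansion

print(f"FEC overhead: {fec_overhead_percent:.2f} %")
print(f"Line rate:    {float(line_rate_bps) / 1e9:.1f} Gb/s")
```

Exact fractions are used so that the result is not blurred by floating-point rounding; under these assumptions the parity symbols add a little under 6 % to the rate carried by every electrical and optical lane.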

| RS-FEC property     | Error model | Parameter name | NRZ PHY (KR4 FEC)       | PAM-4 PHY (KP4 FEC)      |
|---------------------|-------------|----------------|-------------------------|--------------------------|
| FEC encoding        |             | -              | RS(528, 514, t=7, m=10) | RS(544, 514, t=15, m=10) |
| Total symbols       |             | n              | 528                     | 544                      |
| Message symbols     |             | k              | 514                     | 514                      |
| Parity symbols      |             | n-k            | 14                      | 30                       |
| Bits per symbol     |             | m              | 10                      | 10                       |
| Correctable symbols |             | t              | 7                       | 15                       |
| Coding gain         | DFE         | -              | 4.9 dB @ 1E-15          | 5.4 dB @ 1E-15           |
|                     | Random      | -              | 5.3 dB @ 1E-12          | 6.5 dB @ 1E-12           |

Figure 2: Performance Comparison of NRZ PHY (KR4-FEC) versus PAM-4 PHY (KP4 FEC)

### Testing the logical FEC implementation

The view in Figure 3, from the ONT FEC tools, gives a concise overview of the FEC on a 400GE link. It shows the transmitter codeword (544 symbols, each of 10 bits) and the ability to 'error' any chosen symbol (the marked symbols). Individual bits in the 10-bit symbols can be errored using the bit mask. It is important to note that 400G uses two interleaved FECs (FEC A and FEC B) to give additional protection against error bursts, as a burst is 'diluted' across the two interleaved codewords.
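The 'dilution' effect of interleaving can be shown with a simplified model. The sketch below assumes plain symbol-by-symbol alternation between codeword A and codeword B; the actual IEEE 802.3 distribution of symbols across PCS lanes is more involved, and the helper names are illustrative.

```python
SYMBOLS_PER_CODEWORD = 544
CORRECTABLE = 15  # t for the KP4 FEC


def interleave(codeword_a, codeword_b):
    """Alternate symbols A0, B0, A1, B1, ... (simplified round-robin model)."""
    stream = []
    for a, b in zip(codeword_a, codeword_b):
        stream.append(("A", a))
        stream.append(("B", b))
    return stream


def burst_hits(stream, start, length):
    """Count how many symbols of each codeword fall inside an error burst."""
    hits = {"A": 0, "B": 0}
    for owner, _symbol in stream[start:start + length]:
        hits[owner] += 1
    return hits


# The symbol values do not matter here, only which codeword owns each slot.
stream = interleave(range(SYMBOLS_PER_CODEWORD), range(SYMBOLS_PER_CODEWORD))

# A burst that corrupts 24 consecutive symbols on the line.
hits = burst_hits(stream, start=100, length=24)
print(hits)
print("both codewords correctable:", max(hits.values()) <= CORRECTABLE)
```

Without interleaving, 24 consecutive errored symbols would land in one codeword and exceed its 15-symbol correction limit. In this model each codeword sees only 12 of them, so both remain correctable.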



Figure 3: ONT FEC stress application user mode screen showing precise error positioning capability

### Simple FEC performance monitoring

Most test sets today show the errored-symbols-per-codeword table view pioneered by VIAVI. The example below shows the output from a 'bad' link (in this case the link had been engineered to cause errors). The vast majority of codewords have no errors (0 errored symbols), and the codeword count falls rapidly (by two to three orders of magnitude for each of the first few additional errored symbols) down to around 5 errored symbols per codeword. The indication that this is a bad link is the very long tail across the 6 to 15 errored symbol counts and the fact that the link still has codewords with uncorrectable (>=16) errored symbols. This link would certainly require further investigation with tools such as the VIAVI "Advanced Error Analysis" suite.

The errored symbol count shows the performance of the system on a given link, but it does not validate or stress the FEC implementation and gives very little insight into the error 'root cause'. The symbol count view only shows the test set's view of the link using the test set's own FEC block, not any DUT FEC receiver implementation.

#### Symbol Errors per Codeword

| No. of Symbols | Count             | Percentage |
|----------------|-------------------|------------|
| 0              | 4,496,897,584,919 | 99.924110  |
| 1              | 3,410,587,388     | 0.075786   |
| 2              | 4,316,290         | 0.000096   |
| 3              | 11,000            | 0.000000   |
| 4              | 211               | 0.000000   |
| 5              | 71                | 0.000000   |
| 6              | 33                | 0.000000   |
| 7              | 17                | 0.000000   |
| 8              | 19                | 0.000000   |
| 9              | 15                | 0.000000   |
| 10             | 8                 | 0.000000   |
| 11             | 7                 | 0.000000   |
| 12             | 8                 | 0.000000   |
| 13             | 2                 | 0.000000   |
| 14             | 5                 | 0.000000   |
| 15             | 0                 | 0.000000   |
| >= 16          | 5                 | 0.000000   |

Figure 4: Classic errored symbol per codeword view of a 400GE link
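Because the percentage column rounds to zero beyond 2 errored symbols, the raw counts are more informative than the percentages. The sketch below takes the counts from the table above and derives a few summary figures; the variable names are illustrative, and the errored-symbol total is only a lower bound because the exact number of errored symbols in uncorrectable codewords is not known.

```python
# Errored-symbol histogram from the table above (key 16 stands for ">= 16").
histogram = {
    0: 4_496_897_584_919, 1: 3_410_587_388, 2: 4_316_290, 3: 11_000,
    4: 211, 5: 71, 6: 33, 7: 17, 8: 19, 9: 15, 10: 8, 11: 7, 12: 8,
    13: 2, 14: 5, 15: 0, 16: 5,
}
T = 15                      # correctable symbols per codeword (KP4 FEC)
SYMBOLS_PER_CODEWORD = 544

total = sum(histogram.values())
corrected = sum(c for n, c in histogram.items() if 1 <= n <= T)
uncorrectable = sum(c for n, c in histogram.items() if n > T)
tail = sum(c for n, c in histogram.items() if 6 <= n <= T)

# Lower bound: each uncorrectable codeword holds at least T + 1 errored symbols.
errored_symbols = sum(min(n, T + 1) * c for n, c in histogram.items())
symbol_error_ratio = errored_symbols / (total * SYMBOLS_PER_CODEWORD)

print(f"codewords observed:         {total:,}")
print(f"corrected codewords:        {corrected:,}")
print(f"codewords with 6-15 errors: {tail:,}")
print(f"uncorrectable codewords:    {uncorrectable:,}")
print(f"uncorrectable ratio:        {uncorrectable / total:.2e}")
print(f"pre-FEC symbol error ratio: {symbol_error_ratio:.2e} (lower bound)")
```

The long tail is visible in the numbers: 114 codewords carry between 6 and 15 errored symbols, which is larger than the 71 codewords at exactly 5, and 5 codewords could not be corrected at all.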

### Properties of a FEC that need to be tested and validated

Two core aspects of a FEC need to be tested and validated during the R&D and validation phase: the logical 'correctness' of the FEC and the stability of the actual implementation. Although they are related and intertwined, each needs distinct test methods to ensure robustness and reliability. VIAVI addresses these with two applications in its FEC test application set:

### **FEC Stress Test**

This test focuses on the logical validation of the FEC. It ensures the FEC has been exercised with realistic coverage across a range of error counts and positions in a given codeword. Given the myriad of potential combinations it is not possible to cover them all, but the applications are 'intelligent' enough to offer solid coverage.

The VIAVI FEC stress application allows the user to precisely probe the FEC by manual positioning of the errors (as shown in Figure 3), but it also supports a comprehensive automatic mode that scans through the huge potential range of error positions in a codeword for 1 to 15 errored symbols, giving solid coverage for validating the logical performance of the FEC. Although the automatic mode cannot possibly cover every combination of errored symbol position and count, the application is carefully tailored to give the best coverage in a reasonable run time. The user can tune the 'depth' of the test to match the available test execution time. If the automated test shows issues, the manual mode can be used to investigate which area of the logic is not performing as expected, and under what conditions.
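A quick count shows why exhaustive coverage is out of reach. The sketch below computes how many distinct ways there are to place a given number of errored symbols in a 544-symbol codeword, and then draws a few random position sets. The random sampling is only a stand-in for illustration; it is not the scan strategy used by the VIAVI application, and the function names are hypothetical.

```python
import math
import random

SYMBOLS_PER_CODEWORD = 544
MAX_CORRECTABLE = 15


def pattern_count(errors):
    """Distinct ways to choose `errors` errored symbol positions in a codeword."""
    return math.comb(SYMBOLS_PER_CODEWORD, errors)


def sample_patterns(errors, how_many, seed=0):
    """Draw random sets of errored symbol positions (illustrative strategy only)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(SYMBOLS_PER_CODEWORD), errors))
            for _ in range(how_many)]


for errors in (1, 2, 3, MAX_CORRECTABLE):
    print(f"{errors:2d} errored symbols: {pattern_count(errors):.3e} position sets")

patterns = sample_patterns(errors=MAX_CORRECTABLE, how_many=3)
for positions in patterns:
    print(positions)
```

These counts cover symbol positions only. Each errored 10-bit symbol can also take any of 1023 non-zero bit masks, which multiplies the space further, so a tuned subset is the only practical option.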

### **Dynamic FEC Stress**

Even if the FEC logic is correct, the hardware under test can still fail due to faults in the implementation, especially with challenges like power supply integrity. The FEC decoding and checking block is often implemented as a wide parallel bus structure with many XOR-based logic gates. Error detection and correction can cause fast changes in logic power demand, and these rapid current spikes may cause power supply integrity issues, especially in FPGAs. They can also expose issues with point-of-load converter output impedance dynamics, PCB layout and decoupling. The application can be used in conjunction with tools such as an oscilloscope to trace power supply dynamics around the IC package.

[Screenshot: ONT-600 400G error insertion page with Error Insertion Type set to "FEC Power Supply Stress", showing controls for symbol errors per codeword (1 to 15), sweep speed (fast to slow) and maximum surge frequency (0.1 kHz to 1000 kHz)]

Figure 5: ONT dynamic FEC stress control page showing the ability to vary the stress in a dynamic manner

The screen above shows some of the important settings required to dynamically stress a FEC. The user dials in the number of errored symbols in the codeword, and additional control is offered over the rate of error injection. This rate is expressed as a frequency because it is the physical rate of the resulting power and current impulses. The ability to drive the power impulses at varying frequencies (and to sweep over ranges) can be used to stress the power supply and related elements associated with the FEC receiver logic. A 'good' implementation will perform as expected under all conditions, while an unstable implementation can give unpredictable and inconsistent results and may even lock up or crash. Such combinations of stressful burst error frequencies may be extremely rare in the field, but when they do occur there they are almost impossible to troubleshoot or reproduce, so heavy and comprehensive stress validation in the R&D and SVT phase is critical to producing a stable and compliant product.
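To relate a burst frequency to what happens on the line, it helps to convert it into codewords. The sketch below does this under an assumed 400GE line rate of 425 Gb/s (including FEC overhead) and with an illustrative list of sweep frequencies. It shows only the arithmetic; it does not describe how the ONT schedules its error bursts, and the function name is hypothetical.

```python
LINE_RATE_BPS = 425_000_000_000  # assumed 400GE line rate, FEC overhead included
CODEWORD_BITS = 544 * 10         # one KP4 codeword: 544 symbols of 10 bits

CODEWORDS_PER_SECOND = LINE_RATE_BPS // CODEWORD_BITS


def codewords_per_burst_period(surge_frequency_hz):
    """Codewords transmitted during one period of the error-burst frequency."""
    return CODEWORDS_PER_SECOND / surge_frequency_hz


# Illustrative sweep points, one per decade from 0.1 kHz to 1000 kHz.
sweep_hz = [100, 1_000, 10_000, 100_000, 1_000_000]

print(f"codeword rate: {CODEWORDS_PER_SECOND:,} codewords/s")
for frequency in sweep_hz:
    period = codewords_per_burst_period(frequency)
    print(f"{frequency / 1000:7.1f} kHz -> one burst every {period:,.3f} codewords")
```

Under these assumptions a 1 kHz burst rate corresponds to one burst roughly every 78,000 codewords, while at 1000 kHz bursts are fewer than 100 codewords apart. The decoder load therefore changes on time scales that range from those of slow regulator loops down to those of local decoupling.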

Such tests can also be performed over temperature and power supply voltage levels to allow full margin testing and understand actual failure modes.

| Test mode                          | Characteristics                                             | Application areas                                                                                                                                                  |
|------------------------------------|-------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| FEC validation                     | Step by step combination test of FEC logic.                 | FEC IP vendor selection<br>FEC IP validation<br>Timing stability over temperature<br>Inter-op debug<br>Host S/W validation for FEC BER                             |
| FEC user mode                      | Ability to precisely position<br>errors within the codeword | Deep troubleshooting of specific logic areas and R&D<br>FPGA team investigation in timing margin stability<br>Firmware teams writing control harness for FEC logic |
| Dynamic FEC power supply integrity | Dynamic error burst                                         | FPGA & ASIC floorplan<br>Signal integrity<br>Power supply design<br>PCB decoupling<br>Maximum power draw & thermal test                                            |

### Summary

FEC is a key element of high-speed Ethernet and plays a critical role in the network. Reliable, interoperable FEC is mandatory for high-speed Ethernet. Without the correct tools, reliable coverage of FEC performance is impossible. The challenges of logical inter-op are further complicated by dynamic concerns: rare and random issues that cause spurious, difficult-to-troubleshoot events.

The simple FEC overview tools give a very basic view of link health but offer little insight into the true FEC and link performance. Only the Advanced Error Analysis applications on the ONT can drill down into the nature of the errors, while the suite of FEC tools is essential to deliver robust and compliant 400GE products that will correctly inter-op across the 400G ecosystem. The logical FEC stress test can automatically scan through stressful error patterns to validate the logic. If issues are found, the manual error placement tools allow engineers to drill down to the root cause. The dynamic FEC stress test loads the FEC logic like no other test and validation application can: dynamic error bursts stress power supply integrity while dynamically loading the FEC logic with maximum errors, maximum power and maximum dynamics.

Gain complete confidence in the FEC design and implementation with the VIAVI ONT FEC applications.



Contact Us +1 844 GO VIAVI (+1 844 468 4284)

To reach the VIAVI office nearest you, visit viavisolutions.com/contact

© 2020 VIAVI Solutions Inc. Product specifications and descriptions in this document are subject to change without notice. test-validate-fec-ont-wo-opt-nse-ae 30191142 900 0620