HPC or Artificial Intelligence Networking

Dynamically Adjusts Explicit Congestion Notification Marking Threshold Values

OcNOS now supports lossless Ethernet fabrics for AI/ML workloads through the Dynamic Explicit Congestion Notification (D-ECN) method on Broadcom Tomahawk5 platforms.

D-ECN lets users adjust the ECN thresholds through the D-ECN-ON-Offset and D-ECN-OFF-Offset settings, enabling precise congestion marking based on shared buffer usage.

Unlike traditional methods that depend on packet drops, D-ECN enhances efficiency by marking IP headers to indicate congestion, prompting receivers to signal senders to adjust transmission rates.
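The marking step described above can be illustrated with a minimal Python sketch. It assumes only the standard ECN codepoints from RFC 3168 (the two low-order bits of the IPv4 TOS or IPv6 Traffic Class byte). The function name `mark_congestion` and the boolean `congested` input are hypothetical; how D-ECN derives the congestion decision from its offsets and shared buffer usage is not modeled here.

```python
from typing import Tuple

# ECN codepoints defined in RFC 3168 (low-order two bits of the TOS byte).
NOT_ECT = 0b00  # transport is not ECN-capable
ECT_1 = 0b01    # ECN-capable transport, codepoint 1
ECT_0 = 0b10    # ECN-capable transport, codepoint 0
CE = 0b11       # congestion experienced
ECN_MASK = 0b11


def mark_congestion(tos: int, congested: bool) -> Tuple[int, bool]:
    """Hypothetical helper: return (new_tos, forward).

    When the queue is congested, ECN-capable packets are marked CE and
    forwarded; packets that are not ECN-capable can only be signalled by
    a drop, so forward is False for them.
    """
    if not congested:
        return tos, True
    if tos & ECN_MASK == NOT_ECT:
        return tos, False
    # Preserve the six DSCP bits and overwrite only the ECN field.
    return (tos & ~ECN_MASK & 0xFF) | CE, True
```

For example, a packet with DSCP EF and ECT(0) (TOS byte 0xBA) leaves a congested queue as 0xBB: the DSCP is unchanged and only the ECN field becomes CE.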

For further details, refer to the Dynamic ECN Marking section in the OcNOS Quality of Service Guide, Release 7.0.0.

PFC Deadlock Detection and Recovery

OcNOS now supports Priority Flow Control (PFC) Deadlock Detection and Recovery. PFC avoids packet loss under congestion by letting a receiver pause the transmitter for a given priority until the receiver is able to process more data. A PFC deadlock occurs when paused queues wait on one another in a loop, so the pause condition never clears and traffic stalls.

This enhancement introduces mechanisms to detect and recover from PFC deadlocks, so that traffic flows are restored automatically without manual intervention. It provides the following capabilities:

• Per-interface enablement of PFC deadlock detection and recovery.
• Timer-based monitoring to identify persistent XOFF conditions.
• PFC State XON mode to restore traffic once congestion clears.
• Global action mode to automatically drop traffic in deadlock scenarios if configured.
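Timer-based detection of a persistent XOFF condition can be sketched as a small state machine. This is an illustrative Python model, not the OcNOS implementation: the class `PfcDeadlockMonitor`, the parameters `detection_time_ms` and `recovery_time_ms`, and the returned state names are all assumptions made for the example. The recovery action itself (restoring XON or dropping traffic) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PfcDeadlockMonitor:
    """Hypothetical per-queue monitor; names and timers are illustrative."""
    detection_time_ms: int  # how long XOFF must persist to count as deadlock
    recovery_time_ms: int   # how long the recovery action stays in effect
    xoff_since_ms: Optional[int] = None
    recovering_until_ms: Optional[int] = None

    def poll(self, now_ms: int, queue_paused: bool) -> str:
        # While a recovery window is open, keep reporting it.
        if self.recovering_until_ms is not None:
            if now_ms < self.recovering_until_ms:
                return "recovering"
            # Recovery window expired: start monitoring from scratch.
            self.recovering_until_ms = None
            self.xoff_since_ms = None

        if not queue_paused:
            # Any XON observation resets the detection timer.
            self.xoff_since_ms = None
            return "normal"

        if self.xoff_since_ms is None:
            self.xoff_since_ms = now_ms

        if now_ms - self.xoff_since_ms >= self.detection_time_ms:
            # XOFF has persisted for the whole detection window.
            self.recovering_until_ms = now_ms + self.recovery_time_ms
            return "deadlock"
        return "paused"
```

With `PfcDeadlockMonitor(100, 50)`, a queue that stays paused from t=0 is reported as `"paused"` until t=100, as `"deadlock"` at t=100, and as `"recovering"` until t=150. A brief XON at any point before the timer expires resets detection.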

For more details, refer to the Priority-based PFC Deadlock Detection and Recovery section in the OcNOS Layer 2 Guide, Release 7.0.0.

For more details, refer to the PFC Deadlock Detection and Recovery topic in the OcNOS Layer 3 Guide, Release 7.0.0.

PFC Frames and ECN Packets Monitoring

OcNOS now supports monitoring of Priority-based Flow Control (PFC) pause frames and Explicit Congestion Notification (ECN) marked packets.

PFC (IEEE 802.1Qbb) provides per-priority flow control by pausing traffic for specific classes, preventing congestion and improving link utilization.

ECN (RFC 3168) enables end-to-end congestion signaling in TCP/IP networks by marking packets instead of dropping them, prompting the sender to reduce its transmission rate until congestion clears.

This feature supports the following capabilities:

• Monitoring of ECN-marked packets on an interface.
• Monitoring of PFC pause frames on an interface.

For more details, refer to the PFC Frames and ECN Packets Monitoring section in the OcNOS Layer 2 Guide, Release 7.0.0.

For more details, refer to the PFC Frames and ECN Packets Monitoring topic in the OcNOS Layer 3 Guide, Release 7.0.0.

Switch Packet Buffer Tuning

This release introduces Network Switch Packet Buffer Tuning, a feature designed to enhance network switch performance by avoiding congestion and packet drops. It allocates packet buffer space per traffic priority class, known as a Priority Group (PG), instead of per physical port.

Key Enhancements Include:

• Custom device responses to Priority-based Flow Control (PFC) pause storms, enabling precise control over when the switch transmits pause frames to prevent packet loss.

• Priority Group (PG) configuration with specific limits on shared memory and the ability to set PFC X-OFF and X-ON offsets to trigger pause frames during congestion.

• Queue-specific buffer limits using a dynamic threshold (alpha value) for fine-grained control over buffer consumption from the shared pool.

• Global adjustment of buffer limits, simplifying configuration.
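The dynamic threshold (alpha value) mentioned above can be illustrated with the classic dynamic-threshold scheme, in which a queue's limit is alpha times the currently free shared buffer. This Python sketch assumes that scheme; the actual chipset computation, units, and permitted alpha values are not stated here and may differ. All function and parameter names are hypothetical.

```python
def dynamic_limit(alpha: float, shared_total: int, shared_used: int) -> float:
    """Queue limit under the assumed scheme: alpha * free shared buffer."""
    free = max(shared_total - shared_used, 0)
    return alpha * free


def admit(queue_depth: int, pkt_cells: int, alpha: float,
          shared_total: int, shared_used: int) -> bool:
    """Accept the packet only if the queue stays within its dynamic limit."""
    limit = dynamic_limit(alpha, shared_total, shared_used)
    return queue_depth + pkt_cells <= limit


def steady_state_share(alpha: float, congested_queues: int) -> float:
    """Fraction of the shared pool each congested queue settles at.

    Solving q = alpha * (B - n * q) for q gives q / B = alpha / (1 + n * alpha).
    """
    return alpha / (1 + congested_queues * alpha)
```

Under this model, a larger alpha lets a single busy queue take more of the shared pool, while a smaller alpha keeps more headroom for other queues. With alpha = 1, a single congested queue settles at half of the pool.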

Supported Platforms: This feature is intended for LTSW chipsets (Tomahawk4 (TH4), Tomahawk5 (TH5), and Trident4 (TD4) platforms) and DC chipsets (Tomahawk3 (TH3) and Trident3 (TD3) platforms).

For more details, refer to the Switch Packet Buffer Tuning section in the OcNOS Quality of Service Guide, Release 7.0.0.

ECN and PFC Support for Lossless VxLAN Transport

OcNOS 7.0 enables Explicit Congestion Notification (ECN) and Priority Flow Control (PFC) operation over VxLAN overlays, allowing operators to extend lossless transport capabilities across multi-tenant AI fabrics and frontend networks.

These enhancements provide:

• Scalable Layer 2 and Layer 3 multi-tenancy.
• End-to-end lossless transport across overlay networks.
• Seamless integration of AI workload isolation with high-performance GPU fabric requirements.
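For ECN to work end to end across an overlay, a congestion mark set on the outer header in the underlay has to reach the inner packet at the tunnel endpoint. The Python sketch below shows the general tunnel behavior defined in RFC 6040 (normal-mode encapsulation and the standard decapsulation rules). It is an illustration of that RFC only; whether and how OcNOS applies these rules to VxLAN is an assumption, and the function names are hypothetical.

```python
from typing import Optional

# ECN codepoints from RFC 3168.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def encapsulate(inner_ecn: int) -> int:
    """RFC 6040 normal mode: copy the inner ECN field to the outer header."""
    return inner_ecn


def decapsulate(inner_ecn: int, outer_ecn: int) -> Optional[int]:
    """Return the ECN field of the forwarded inner packet, or None to drop.

    Follows the RFC 6040 decapsulation table.
    """
    if inner_ecn == NOT_ECT:
        # The endpoint transport cannot react to CE, so a congestion
        # signal on the outer header can only be conveyed by a drop.
        return None if outer_ecn == CE else NOT_ECT
    if outer_ecn == CE:
        # Propagate the underlay congestion mark to the inner packet.
        return CE
    if inner_ecn == ECT_0 and outer_ecn == ECT_1:
        return ECT_1
    return inner_ecn
```

In this model, a packet that enters the tunnel as ECT(0) and is marked CE by a congested underlay switch leaves the tunnel with CE on the inner header, so the receiver can still signal the sender to slow down.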

For more details, refer to the Unified ECN and PFC Support for Lossless VxLAN Transport section in the OcNOS Virtual Extensible LAN Guide, Release 7.0.0.