CPU Monitoring Configuration

This procedure describes how to configure streaming telemetry with CPU monitoring in OcNOS. CPU monitoring helps prevent overload during periods of high utilization by pausing telemetry sessions when CPU usage crosses a configured threshold.

Prerequisites

Ensure that streaming telemetry is correctly configured with the appropriate sensor groups, destination groups, and subscriptions.

1. Enable streaming telemetry on the required VRFs, and set the ports and the tunnel retry interval.
!
ip vrf management
!
ip vrf vrf1
!
feature streaming-telemetry vrf vrf1
 grpc-tunnel-server retry-interval 30
!
feature streaming-telemetry vrf management
 port 35000
 grpc-tunnel-server retry-interval 30
!
feature streaming-telemetry
 port 36000
 grpc-tunnel-server retry-interval 30
!
2. Define sensor groups and paths to device metrics that are published as telemetry data.
sensor-group int vrf vrf1
 sensor-path ipi:/components/component[name="RAM"]/ram/state
 sensor-path ipi:/components/component[name="CPU"]/cpu/state
 sensor-path ipi:/components/component[name="HARD-DISK"]/storage/state
 sensor-path ipi:/components/component[name="PSU-1"]/power-supply/state
 sensor-path ipi:/components/component[name="FAN-3"]/fan/state
 sensor-path ipi:/interfaces/interface[name="eth0"]/state
 sensor-path ipi:/components/component[name="CHASSIS"]/state
 sensor-path /interfaces/interface[name="eth0"]/state
 sensor-path /interfaces/interface[name="xe4"]/state/counters
 sensor-path /components/component[name="RAM"]/state/memory
 sensor-path /components/component[name="CPU"]/cpu/utilization/state
 sensor-path /components/component[name="HARD-DISK"]/state
 sensor-path /components/component[name="HARD-DISK"]/state/memory
 sensor-path /components/component[name="HARD-DISK"]/state/temperature
 sensor-path /components/component[name="CHASSIS"]/state
 sensor-path /components/component[name="CHASSIS"]/state/memory
 sensor-path /components/component[name="CHASSIS"]/state/temperature
!
sensor-group ipi vrf management
 sensor-path ipi:/components/component[name="RAM"]/ram/state
 sensor-path ipi:/components/component[name="CPU"]/cpu/state
 sensor-path ipi:/components/component[name="HARD-DISK"]/storage/state
 sensor-path ipi:/components/component[name="PSU-1"]/power-supply/state
 sensor-path ipi:/components/component[name="FAN-3"]/fan/state
 sensor-path ipi:/interfaces/interface[name="eth0"]/state
 sensor-path ipi:/components/component[name="CHASSIS"]/state
 sensor-path /interfaces/interface[name="eth0"]/state
 sensor-path /interfaces/interface[name="xe4"]/state/counters
 sensor-path /components/component[name="RAM"]/state/memory
 sensor-path /components/component[name="CPU"]/cpu/utilization/state
 sensor-path /components/component[name="HARD-DISK"]/state
 sensor-path /components/component[name="HARD-DISK"]/state/memory
 sensor-path /components/component[name="HARD-DISK"]/state/temperature
 sensor-path /components/component[name="CHASSIS"]/state
 sensor-path /components/component[name="CHASSIS"]/state/memory
 sensor-path /components/component[name="CHASSIS"]/state/temperature
!
sensor-group ipi1
 sensor-path ipi:/components/component[name="RAM"]/ram/state
 sensor-path ipi:/components/component[name="CPU"]/cpu/state
 sensor-path ipi:/components/component[name="HARD-DISK"]/storage/state
 sensor-path ipi:/components/component[name="PSU-1"]/power-supply/state
 sensor-path ipi:/components/component[name="FAN-3"]/fan/state
 sensor-path ipi:/interfaces/interface[name="eth0"]/state
 sensor-path ipi:/components/component[name="CHASSIS"]/state
 sensor-path /interfaces/interface[name="eth0"]/state
 sensor-path /interfaces/interface[name="xe4"]/state/counters
 sensor-path /components/component[name="RAM"]/state/memory
 sensor-path /components/component[name="CPU"]/cpu/utilization/state
 sensor-path /components/component[name="HARD-DISK"]/state
 sensor-path /components/component[name="HARD-DISK"]/state/memory
 sensor-path /components/component[name="HARD-DISK"]/state/temperature
 sensor-path /components/component[name="CHASSIS"]/state
 sensor-path /components/component[name="CHASSIS"]/state/memory
 sensor-path /components/component[name="CHASSIS"]/state/temperature
!
3. Configure destination groups that define where telemetry data is sent. Each destination group specifies the IP address and port of a remote collector (for example, a telemetry server or analytics engine).
destination-group des2 vrf vrf1
 tunnel-server ip 20.1.1.2 port 10300
!
destination-group des1 vrf management
 tunnel-server ip 10.16.99.109 port 10600
!
destination-group des3
 tunnel-server ip 40.1.1.2 port 10200
!
destination-group des1
!
4. Create persistent subscriptions that bind a sensor group to a destination group and specify the sample interval at which data is sent.
subscription-name sub1 vrf vrf1
 destination-group des2
 sensor-group int sample-interval 10
!
subscription-name sub2 vrf management
 destination-group des1
 sensor-group ipi sample-interval 10
!
subscription-name sub3
 destination-group des3
 sensor-group ipi1 sample-interval 10
!
!
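
The relationship among the objects in steps 2 to 4 can be sketched as a small consistency check. The following Python sketch is illustrative only and is not part of OcNOS: it parses configuration text shaped like the examples above and reports subscriptions whose destination group or sensor group is not defined in the same VRF. The same-VRF rule is an assumption inferred from the sample configuration, and the function names are hypothetical.

```python
import re

# Matches top-level lines such as "sensor-group int vrf vrf1" or "destination-group des3".
HEADER = re.compile(
    r"^(sensor-group|destination-group|subscription-name)\s+(\S+)(?:\s+vrf\s+(\S+))?$"
)


def parse_config(text):
    """Collect defined groups and subscription references, keyed by VRF ("default" if none)."""
    defined = {"sensor-group": set(), "destination-group": set()}
    subscriptions = {}  # (name, vrf) -> list of (kind, referenced group name)
    current = None  # key of the subscription block being read, if any
    for line in text.splitlines():
        if not line.strip() or line.strip() == "!":
            current = None
            continue
        if not line.startswith(" "):  # top-level line
            current = None
            m = HEADER.match(line.strip())
            if m:
                kind, name, vrf = m.group(1), m.group(2), m.group(3) or "default"
                if kind == "subscription-name":
                    current = (name, vrf)
                    subscriptions[current] = []
                else:
                    defined[kind].add((name, vrf))
        elif current is not None:
            # Indented line inside a subscription: "<kind> <group-name> [options]".
            parts = line.split()
            if len(parts) >= 2 and parts[0] in defined:
                subscriptions[current].append((parts[0], parts[1]))
    return defined, subscriptions


def find_dangling(defined, subscriptions):
    """List references to groups that are not defined in the subscription's VRF (assumed rule)."""
    problems = []
    for (sub, vrf), refs in subscriptions.items():
        for kind, group in refs:
            if (group, vrf) not in defined[kind]:
                problems.append(f"{sub} (vrf {vrf}): {kind} {group} is not defined in this VRF")
    return problems


# Trimmed sample; "sensor-group ipi vrf management" is deliberately left out.
SAMPLE = """\
sensor-group int vrf vrf1
 sensor-path ipi:/components/component[name="CPU"]/cpu/state
!
destination-group des2 vrf vrf1
 tunnel-server ip 20.1.1.2 port 10300
!
destination-group des1 vrf management
 tunnel-server ip 10.16.99.109 port 10600
!
subscription-name sub1 vrf vrf1
 destination-group des2
 sensor-group int sample-interval 10
!
subscription-name sub2 vrf management
 destination-group des1
 sensor-group ipi sample-interval 10
!
"""

defined, subscriptions = parse_config(SAMPLE)
problems = find_dangling(defined, subscriptions)
for p in problems:
    print(p)
```

Because the trimmed sample omits the `ipi` sensor group for the management VRF, the check flags `sub2`. Running it against the full configuration from steps 2 to 4 should report nothing.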

Topology

The figure below illustrates the OcNOS gNMI telemetry environment with both dial-in and dial-out connections across three VRFs: default, management, and vrf1. The OcNOS target streams telemetry data to gNMI collectors using predefined sensor groups. When the 5-minute average CPU usage exceeds the configured 30% threshold, the gNMI server enters the PAUSE state, suppressing telemetry and rejecting new subscriptions until the CPU load decreases.

Key elements:

Dial-In Clients: gNMI collectors initiate subscriptions through the default VRF, the management VRF (port 35000), and the vrf1 tunnel server.
Dial-Out Subscriptions: OcNOS pushes telemetry to remote collectors at the destinations specified for each VRF.
Telemetry Flow: Each VRF associates a subscription name, a destination collector, and a sensor group.
CPU Monitor Logic: If CPU usage exceeds the configured 30% threshold, telemetry streaming is paused, as shown by the PAUSE state indicator.

Figure 12. Telemetry CPU Monitoring
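
The pause decision described above can be sketched as a rolling average compared against a threshold. This Python sketch is a simplified model, not the OcNOS implementation: the 10-second sampling period, the class name, the "RUNNING" state label, and the rule that streaming resumes as soon as the average is no longer above the threshold are all assumptions made for illustration.

```python
from collections import deque


class CpuMonitor:
    """Toy model: report PAUSE while the rolling average CPU usage is above a threshold."""

    def __init__(self, threshold_pct=30, window_s=300, sample_period_s=10):
        self.threshold_pct = threshold_pct
        # Keep only the samples inside the averaging window (5 minutes by default).
        self.samples = deque(maxlen=window_s // sample_period_s)

    def add_sample(self, cpu_pct):
        self.samples.append(cpu_pct)

    def average(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def state(self):
        # "RUNNING" is a hypothetical label; the source only names the PAUSE state.
        return "PAUSE" if self.average() > self.threshold_pct else "RUNNING"


monitor = CpuMonitor(threshold_pct=30)
for _ in range(30):
    monitor.add_sample(20)  # window filled with 20% samples
state_low = monitor.state()
for _ in range(30):
    monitor.add_sample(45)  # older samples age out; window now holds only 45% samples
state_high = monitor.state()
print(state_low, state_high)
```

Averaging over a window rather than reacting to single samples means a short CPU spike does not pause telemetry on its own in this model.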

Enable CPU Monitoring and Set Suppression Threshold

To prevent high CPU utilization from impacting telemetry responsiveness, enable CPU monitoring and define a suppression threshold (30% CPU usage in this example) above which telemetry is paused.

OcNOS(config)# telemetry cpu-monitor enable
OcNOS(telemetry-cpu-monitor)# suppress-threshold 30
OcNOS(telemetry-cpu-monitor)# commit
OcNOS(telemetry-cpu-monitor)# end

gNMI Client Behavior When CPU Monitoring is Active

If CPU usage exceeds the configured threshold, the gNMI server enters the PAUSE state and rejects new telemetry subscription requests until system CPU usage drops below the threshold. In the following example, a gnmic client attempts a dial-in subscription; the server returns an error stating that it is in the PAUSE state, and the client retries every 10 seconds.

# ./gnmic -a 10.16.154.118:35000 -u ocnos -p ocnos --encoding json_ietf --insecure --mode STREAM --stream-mode sample --sample-interval 90s subscribe --path 'ipi:/interfaces/interface[name="eth0"]/state' --log
2025/06/11 11:57:50.415698 [gnmic] version=dev, commit=none, date=unknown, gitURL=, docs=https://gnmic.openconfig.net
2025/06/11 11:57:50.415717 [gnmic] using config file ""
2025/06/11 11:57:50.415822 [gnmic] starting output type file
2025/06/11 11:57:50.415861 [gnmic] queuing target "10.16.154.118:35000"
2025/06/11 11:57:50.416005 [file_output:default-stdout] initialized file output: {"Cfg":{"FileName":"","FileType":"stdout","Format":"json","Multiline":true,"Indent":"  ","Separator":"\n","OverrideTimestamps":false,"AddTarget":"","TargetTemplate":"","EventProcessors":null,"MsgTemplate":"","ConcurrencyLimit":1000,"EnableMetrics":false,"Debug":false}}
2025/06/11 11:57:50.416085 [gnmic] starting target "10.16.154.118:35000" listener
2025/06/11 11:57:50.416156 [gnmic] subscribing to target: "10.16.154.118:35000"
2025/06/11 11:57:50.417046 [gnmic] target "10.16.154.118:35000" gNMI client created
2025/06/11 11:57:50.417086 [gnmic] sending gNMI SubscribeRequest: subscribe='subscribe:{subscription:{path:{origin:"ipi" elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"\"eth0\""}} elem:{name:"state"}} mode:SAMPLE sample_interval:90000000000} encoding:JSON_IETF}', mode='STREAM', encoding='JSON_IETF', to 10.16.154.118:35000
2025/06/11 11:57:50.418932 [gnmic] target "10.16.154.118:35000": subscription default-1749643070 rcv error: rpc error: code = Unimplemented desc = GNMI Server is in PAUSE state. Please retry later when system CPU load is lesser
2025/06/11 11:57:50.418961 [gnmic] target "10.16.154.118:35000": subscription default-1749643070 rcv error: retrying in 10s
2025/06/11 11:58:00.419755 [gnmic] target "10.16.154.118:35000": subscription default-1749643070 rcv error: rpc error: code = Unimplemented desc = GNMI Server is in PAUSE state. Please retry later when system CPU load is lesser
2025/06/11 11:58:00.419881 [gnmic] target "10.16.154.118:35000": subscription default-1749643070 rcv error: retrying in 10s
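
A collector-side script can recognize this condition from the error text. The Python sketch below is illustrative: it scans gnmic log lines like those above for the PAUSE message and counts the rejected attempts per target. The match relies on the exact message wording shown in this example, which may differ in other releases, and the function name is hypothetical.

```python
import re

# Wording copied from the sample log above; treat it as an assumption for other releases.
PAUSE_TEXT = "GNMI Server is in PAUSE state"
TARGET = re.compile(r'target "([^"]+)"')


def count_pause_rejections(log_lines):
    """Return {target: number of log lines reporting the PAUSE state}."""
    counts = {}
    for line in log_lines:
        if PAUSE_TEXT not in line:
            continue
        m = TARGET.search(line)
        target = m.group(1) if m else "unknown"
        counts[target] = counts.get(target, 0) + 1
    return counts


sample_log = [
    '2025/06/11 11:57:50.418932 [gnmic] target "10.16.154.118:35000": subscription '
    "default-1749643070 rcv error: rpc error: code = Unimplemented desc = GNMI Server "
    "is in PAUSE state. Please retry later when system CPU load is lesser",
    '2025/06/11 11:57:50.418961 [gnmic] target "10.16.154.118:35000": subscription '
    "default-1749643070 rcv error: retrying in 10s",
    '2025/06/11 11:58:00.419755 [gnmic] target "10.16.154.118:35000": subscription '
    "default-1749643070 rcv error: rpc error: code = Unimplemented desc = GNMI Server "
    "is in PAUSE state. Please retry later when system CPU load is lesser",
]

rejections = count_pause_rejections(sample_log)
print(rejections)
```

A count that keeps growing for one target suggests that the device CPU has stayed above the threshold for several retry intervals.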