Fault Management System Configuration

The Fault Management System (FMS) provides a framework for event detection, correlation, and alarm generation. FMS raises alarms from events according to correlation logic parameters specified by the individual protocol modules. Events arrive as OPER_LOG messages relayed by the VLOGd module and are processed according to the correlation rules in the configuration file alarm_def_config.yaml. Generated alarms persist to indicate faults and are stored in a database that can be queried with show commands.
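The event-to-alarm flow described above can be pictured with a small Node.js sketch. It is not the FMS source. The rule table, the event IDs (IF_LINK_DOWN, IF_LINK_UP), the event fields, and the processEvent function are all hypothetical, and no correlation is applied: each raise event maps directly to an alarm and a matching clear event closes it.

```javascript
// Hypothetical rule table mapping OPER_LOG event IDs to the alarm each one raises
// or clears. The real alarm_def_config.yaml schema is not shown in this guide;
// these event IDs and fields are illustrative only.
const rules = new Map([
  ["IF_LINK_DOWN", { alarm: "LINK_DOWN", action: "raise", severity: "major" }],
  ["IF_LINK_UP", { alarm: "LINK_DOWN", action: "clear" }],
]);

// In-memory stand-in for the FMS alarm database, keyed by alarm name and resource.
const activeAlarms = new Map();
const closedAlarms = [];

// Process one event relayed by VLOGd (the event shape is also hypothetical).
// Returns the active or newly closed alarm, or null if no alarm is affected.
function processEvent(event) {
  const rule = rules.get(event.id);
  if (!rule) return null; // no matching rule: the event is ignored
  const key = `${rule.alarm}:${event.resource}`;
  if (rule.action === "raise") {
    if (!activeAlarms.has(key)) {
      activeAlarms.set(key, {
        alarm: rule.alarm,
        resource: event.resource,
        severity: rule.severity,
        raisedAt: event.time,
      });
    }
    return activeAlarms.get(key);
  }
  // Clear event: move the matching active alarm, if any, to the closed list.
  const alarm = activeAlarms.get(key);
  if (!alarm) return null;
  activeAlarms.delete(key);
  const closed = { ...alarm, closedAt: event.time };
  closedAlarms.push(closed);
  return closed;
}

processEvent({ id: "IF_LINK_DOWN", resource: "xe1", time: 100 });
processEvent({ id: "IF_LINK_DOWN", resource: "xe1", time: 105 }); // already active: no duplicate
console.log(activeAlarms.size); // 1
processEvent({ id: "IF_LINK_UP", resource: "xe1", time: 160 });
console.log(activeAlarms.size, closedAlarms.length); // 0 1
```

Keying alarms by alarm name and resource is one simple way to keep a repeated raise event from creating duplicate alarms; the real FMS keying is not documented here.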

  • FMS is disabled by default. Once enabled, it raises alarms for all valid OPER_LOG events received by the FMS Node.js process.
  • The FMS event-alarm correlation configuration is stored in a YAML file (alarm_def_config.yaml) that cannot be modified with CMLSH commands. If changes are required, an operator with the appropriate privileges can edit the file directly, using YAML syntax, while FMS is disabled. Do not edit the file while FMS is active: changes take effect only after FMS is disabled, the file is updated, and FMS is re-enabled.
  • The device’s logging level must be set to at least 4 (NOTIFY) so that FMS receives notification events and can act on them. With a lower logging level, FMS may not receive clear events, leaving the corresponding active alarms unresolved. FMS does not manage the system logging level.
  • FMS relies on the loopback interface (lo0) to communicate with VLOGd, so lo0 must be operational for both FMS and VLOGd.
  • If an Access Control List (ACL) blocks localhost communication, FMS must be disabled. Conversely, if FMS is enabled, no ACL may block localhost traffic.
  • If FMS restarts because of a device reboot, upgrade, downgrade, or manual restart, all active alarms are closed. Use the show alarm closed command to view these closed alarms.
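The logging-level requirement in the notes above follows from how level filtering usually works. The sketch below assumes syslog-style numbering, in which a lower number is more severe and a module forwards a message only when its level is numerically less than or equal to the configured logging level; it also assumes, hypothetically, that clear events are logged at level 4 (NOTIFY). Neither detail is confirmed by this guide for OcNOS, and isForwarded is an illustrative name.

```javascript
// Assumption: syslog-style numbering in which a lower number is more severe, and a
// module forwards a message only when its level is numerically <= the configured
// logging level. Level 4 is labeled NOTIFY, as in the note above.
const NOTIFY = 4;

function isForwarded(messageLevel, configuredLevel) {
  return messageLevel <= configuredLevel;
}

// Hypothetical clear event, assumed to be logged at NOTIFY level.
const clearEvent = { id: "IF_LINK_UP", level: NOTIFY };

for (const configured of [3, 4, 5]) {
  const reaches = isForwarded(clearEvent.level, configured);
  console.log(`logging level ${configured}: clear event ${reaches ? "reaches" : "does not reach"} FMS`);
}
```

Under these assumptions, level 3 filters out the clear event, so the alarm it would have cleared stays active, while levels 4 and above let it through.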

FMS applies the correlation procedures described in the following table:

Table 31. FMS correlation procedures

| Correlation type | Description |
|---|---|
| Generalization | Groups two or more events into a single alarm. The generalized alarm then applies one of the correlation types (none, time-bound, counting, or compression) to the new alarm. |
| Time-bound | When an event is received, a timer is started for that event. While the timer runs, subsequent events of the same type are suppressed. When the timer expires, an alarm is raised for the event, stating how many times the event was received during that interval. |
| Counting | Treats a specified number of similar events as one. The alarm is raised only after the event has occurred the specified number of times (count). |
| Compression | Checks multiple occurrences of the same event for duplicate or redundant information, removes the redundancies, and reports them as a single alarm. |
| Severity | Correlates events based on their severity. |
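The generalization, counting, time-bound, and compression procedures in Table 31 can be illustrated with the following Node.js sketch. It is not the FMS implementation: the class and function names, the event IDs (BGP_PEER_DOWN, BGP_HOLD_TIMER_EXPIRED, FAN_FAIL), and the generalization mapping are hypothetical, and times are plain numbers in seconds passed in explicitly instead of real timers. The counting sketch also assumes that the counter restarts after each alarm, which the table does not specify.

```javascript
// Illustrative correlators (not the FMS source). Times are plain numbers in
// seconds, passed in explicitly so the logic can be followed without real timers.

// Generalization: hypothetical mapping of several event IDs to one alarm key.
const generalize = new Map([
  ["BGP_PEER_DOWN", "BGP_SESSION_FAULT"],
  ["BGP_HOLD_TIMER_EXPIRED", "BGP_SESSION_FAULT"],
]);
const alarmKey = (eventId) => generalize.get(eventId) ?? eventId;

// Counting: raise one alarm once `count` occurrences of the same key are seen.
// Assumption: the counter restarts after each alarm.
class CountingCorrelator {
  constructor(count) {
    this.count = count;
    this.seen = new Map(); // key -> occurrences since the last alarm
  }
  onEvent(key) {
    const n = (this.seen.get(key) || 0) + 1;
    if (n < this.count) {
      this.seen.set(key, n);
      return null;
    }
    this.seen.delete(key);
    return { key, occurrences: n };
  }
}

// Time-bound: the first event starts a timer; repeats are suppressed but counted;
// on expiry one alarm reports how many times the event arrived in the window.
class TimeBoundCorrelator {
  constructor(windowSec) {
    this.windowSec = windowSec;
    this.pending = new Map(); // key -> { expiresAt, count }
  }
  onEvent(key, now) {
    const entry = this.pending.get(key);
    if (entry) entry.count += 1;
    else this.pending.set(key, { expiresAt: now + this.windowSec, count: 1 });
  }
  // Call periodically; returns the alarms whose timers have expired.
  tick(now) {
    const alarms = [];
    for (const [key, entry] of this.pending) {
      if (now >= entry.expiresAt) {
        alarms.push({ key, occurrences: entry.count });
        this.pending.delete(key); // deleting during Map iteration is safe
      }
    }
    return alarms;
  }
}

// Compression: collapse repeated occurrences of an event into one alarm,
// keeping only the distinct detail strings.
function compress(events) {
  const groups = new Map();
  for (const { id, detail } of events) {
    const key = alarmKey(id);
    const g = groups.get(key) || { key, occurrences: 0, details: new Set() };
    g.occurrences += 1;
    g.details.add(detail); // duplicate details collapse in the Set
    groups.set(key, g);
  }
  return [...groups.values()].map((g) => ({ ...g, details: [...g.details] }));
}

// Usage: two different BGP events are generalized into one key and counted together.
const counting = new CountingCorrelator(3);
const results = ["BGP_PEER_DOWN", "BGP_HOLD_TIMER_EXPIRED", "BGP_PEER_DOWN"].map((id) =>
  counting.onEvent(alarmKey(id)));
console.log(results); // only the third event raises the generalized alarm

const timeBound = new TimeBoundCorrelator(60);
timeBound.onEvent("FAN_FAIL", 0);
timeBound.onEvent("FAN_FAIL", 10); // suppressed, but counted
console.log(timeBound.tick(30)); // timer still running: no alarm yet
console.log(timeBound.tick(60)); // one alarm reporting two occurrences
```

Passing time in explicitly keeps the time-bound logic deterministic; a long-running process would instead call tick from a periodic timer. Severity correlation is not sketched because the table does not describe its rule.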

Implementation Example

FMS is implemented in Node.js. Its scripts are JavaScript files (*.js), and its configuration files are YAML files (*.yaml). OcNOS stores these files in the following paths:

Table 32. FMS script and configuration files

| Path | Files |
|---|---|
| /usr/local/bin/js | JavaScript files (*.js) |
| /usr/local/etc | Configuration files (*.yaml) |
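To check which of the files in Table 32 are present on a device, a short Node.js script can list them by extension. This is a sketch that assumes shell access to the device's Linux environment and a node binary on the PATH; FMS itself runs on Node.js, but how that runtime is exposed to operators is not documented here. listByExtension is a hypothetical helper name.

```javascript
// Sketch: list the FMS files from Table 32 by extension.
// Assumes shell access to the device and a `node` binary on the PATH.
const fs = require("fs");
const path = require("path");

// Hypothetical helper: return the sorted full paths of files in `dir` whose
// extension is `ext`, or an empty list if the directory does not exist.
function listByExtension(dir, ext) {
  let names;
  try {
    names = fs.readdirSync(dir);
  } catch (err) {
    if (err.code === "ENOENT") return [];
    throw err;
  }
  return names
    .filter((name) => path.extname(name) === ext)
    .sort()
    .map((name) => path.join(dir, name));
}

console.log(listByExtension("/usr/local/bin/js", ".js"));
console.log(listByExtension("/usr/local/etc", ".yaml"));
```

The /usr/local/etc listing should include alarm_def_config.yaml when FMS is installed; other YAML files in that directory may belong to other components.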

Enabling and Disabling the Fault Management System

Use the following commands to enable or disable FMS:

Enabling FMS

OcNOS#configure terminal
Enter configuration commands, one per line.  End with CNTL/Z.
(config)#
(config)#fault-management enable 
(config)#

Disabling FMS

OcNOS#configure terminal
Enter configuration commands, one per line.  End with CNTL/Z.
(config)#
(config)#fault-management disable 
(config)#
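The enable and disable commands are also how changes to alarm_def_config.yaml are applied. The following hypothetical model assumes, as the configuration notes imply, that FMS reads the file only when it is enabled; the FmsService class and its methods are illustrative and do not correspond to FMS internals. It follows the supported procedure: disable FMS, edit the file, then re-enable FMS.

```javascript
// Hypothetical model of how FMS picks up correlation changes, assuming that
// alarm_def_config.yaml is read only when FMS is enabled.
class FmsService {
  constructor(readConfig) {
    this.readConfig = readConfig; // returns the current contents of the config file
    this.enabled = false;
    this.activeRules = null;
  }
  enable() {
    if (this.enabled) return; // already running: keep the existing snapshot
    this.activeRules = this.readConfig(); // configuration snapshot taken here
    this.enabled = true;
  }
  disable() {
    this.enabled = false;
    this.activeRules = null;
  }
}

// Simulated file contents; a plain string stands in for the YAML document.
let fileOnDisk = "rules: v1";
const fms = new FmsService(() => fileOnDisk);
fms.enable();                 // fault-management enable
console.log(fms.activeRules); // rules: v1

// Supported procedure: disable, edit, then re-enable.
fms.disable();                // fault-management disable
fileOnDisk = "rules: v2";     // operator edits the file while FMS is disabled
fms.enable();                 // fault-management enable
console.log(fms.activeRules); // rules: v2
```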