Fault Management System Configuration
The Fault Management System (FMS) provides a framework for event detection, correlation, and alarm generation. FMS raises alarms from events according to correlation logic parameters specified by individual Protocol Modules. Events arrive as OPER_LOG messages relayed by the VLOGd module and are processed according to the correlation rules in the configuration file alarm_def_config.yaml. Generated alarms persist to indicate faults and are stored in a database that can be viewed with show commands.
- FMS is disabled by default. Once enabled, it raises alarms for all valid OPER_LOG events received by the FMS node.js process.
- The FMS event-alarm correlation configuration is stored in a YAML file (alarm_def_config.yaml), which cannot be modified with CMLSH commands. If changes are required, an operator with the appropriate privileges can edit the file in YAML syntax, but only while FMS is disabled. Do not edit the file while FMS is active: changes take effect only after FMS is disabled, the file is updated, and FMS is re-enabled.
- The device's logging level must be set to at least 4 (NOTIFY) so that FMS receives notification events and can act on them. A lower logging level may prevent FMS from receiving clear events, leaving active alarms unresolved. FMS does not manage the system logging level.
- FMS relies on the loopback interface (lo0) to communicate with VLOGd, so lo0 must be operational for both FMS and VLOGd to work.
- If localhost communication is blocked by an Access Control List (ACL), FMS must be disabled. Conversely, if FMS is enabled, the ACL must not block localhost traffic.
- If FMS restarts because of a device reboot, upgrade, downgrade, or manual restart, active interface alarms are preserved and all other active alarms from the previous session are closed.
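The restart behavior in the last item amounts to a filter over the active-alarm table. The following Python sketch illustrates only that rule; the `Alarm` record, its `is_interface` flag, the `apply_restart_rule` function, and the interface alarm ID are hypothetical and do not reflect the actual FMS data model, which is implemented in JavaScript.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    alarm_id: str
    is_interface: bool  # hypothetical flag: True for interface alarms
    active: bool = True


def apply_restart_rule(alarms):
    """Close every active non-interface alarm; interface alarms stay active."""
    for alarm in alarms:
        if alarm.active and not alarm.is_interface:
            alarm.active = False  # closed by the FMS restart
    return alarms


alarms = [
    Alarm("IFACE_EXAMPLE:oper_down:xe1", is_interface=True),  # hypothetical ID
    Alarm("CMM_MONITOR_CPU:15min_load:CPU", is_interface=False),
]
apply_restart_rule(alarms)
print([(a.alarm_id, a.active) for a in alarms])
# [('IFACE_EXAMPLE:oper_down:xe1', True), ('CMM_MONITOR_CPU:15min_load:CPU', False)]
```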
Alarm Definition Configuration
FMS uses position-based extraction of resource and qualifier values from log messages as specified in the alarm_def_config.yaml file.
- Extraction is position-based, where tokens in the log message are space-separated.
- This method functions correctly only when the log format remains consistent across both active and clear events for a given alarm type.
- The extracted resource and qualifier values are combined to form the alarm ID, which FMS uses for event correlation.
- The position-based logic does not support alarm types where:
  - Log formats differ between event types, such as active and clear messages.
  - Resource or qualifier values occur at different positions across message variants.
- In these cases, configuration alone cannot accurately extract the values. Code-level modifications are required to parse log messages using regular expressions or custom logic.
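A minimal Python sketch of position-based extraction, assuming 1-based positions and whitespace-separated tokens, shows why a layout change between active and clear messages breaks correlation. The `extract_tokens` function and the fan messages are hypothetical illustrations, not FMS code or real OcNOS logs.

```python
def extract_tokens(message, positions):
    """Return the tokens at the given 1-based positions of a log message.

    Tokens are split on whitespace; runs of spaces count as one separator
    (an assumption, since the exact splitting rule is not documented here).
    """
    tokens = message.split()
    picked = []
    for pos in positions:
        if not 1 <= pos <= len(tokens):
            raise ValueError(f"position {pos} out of range for {message!r}")
        picked.append(tokens[pos - 1])
    return picked


# Hypothetical active/clear pair whose layouts differ.
active = "Fan 2 failed on tray A"
clear = "Recovered: Fan 2 on tray A"

# A single fixed position (1) is configured for the resource...
print(extract_tokens(active, [1]))  # ['Fan']
print(extract_tokens(clear, [1]))   # ['Recovered:']
# ...so the two events yield different alarm IDs and never correlate.
```

A regular expression or custom parser that locates the resource by content rather than by position would avoid this, which is the code-level change the list above refers to.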
The following example illustrates the configuration of the CMM_MONITOR_CPU alarm in the alarm_def_config.yaml file.
This configuration tells FMS which token positions in the log message to extract as the resource and qualifier strings.
- ALARM_ID: 1003
  ALARM_TYPE_ID: EQPT
  EVENT: CMM_MONITOR_CPU
  QUALIFIER_STRING_POSITION:
    QUALIFIER_POSITION_1_EVENT_1: 2
    QUALIFIER_POSITION_2_EVENT_1: 3
  RESOURCE_STRING_POSITION:
    RESOURCE_POSITION_1_EVENT_1: 1
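Once the YAML entry is loaded (for example with a YAML library), it becomes a nested mapping. The Python sketch below hard-codes that mapping as a dict to stay self-contained and assumes that the number after `QUALIFIER_POSITION_` or `RESOURCE_POSITION_` sets the order in which tokens are joined. The meaning of the `_EVENT_1` suffix is not described in this section and is ignored here; `ordered_positions` is a hypothetical helper.

```python
import re

# The CMM_MONITOR_CPU entry above, as a YAML parser would load it.
ENTRY = {
    "ALARM_ID": 1003,
    "ALARM_TYPE_ID": "EQPT",
    "EVENT": "CMM_MONITOR_CPU",
    "QUALIFIER_STRING_POSITION": {
        "QUALIFIER_POSITION_1_EVENT_1": 2,
        "QUALIFIER_POSITION_2_EVENT_1": 3,
    },
    "RESOURCE_STRING_POSITION": {
        "RESOURCE_POSITION_1_EVENT_1": 1,
    },
}

# Captures the ordinal in keys such as QUALIFIER_POSITION_2_EVENT_1.
_ORDINAL = re.compile(r"_POSITION_(\d+)_")


def ordered_positions(section):
    """Return token positions sorted by the ordinal embedded in each key."""
    def ordinal(key):
        match = _ORDINAL.search(key)
        if match is None:
            raise ValueError(f"unexpected key: {key}")
        return int(match.group(1))

    return [section[key] for key in sorted(section, key=ordinal)]


print(ordered_positions(ENTRY["QUALIFIER_STRING_POSITION"]))  # [2, 3]
print(ordered_positions(ENTRY["RESOURCE_STRING_POSITION"]))   # [1]
```

Sorting numerically rather than by key string keeps an ordinal of 10 after 2.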
Example Log Messages
CPU 15min load avg in Alert level. [Threshold 80% 15min load 90.000%]
CPU 1min load avg in Critical Level. [Threshold 60% 1min load 79.000%]
Qualifier String Extraction
The qualifier string is derived from the second and third positions in the log message.
Example:
From the first log: "15min" (2nd) and "load" (3rd) → combined as 15min_load
From the second log: "1min" (2nd) and "load" (3rd) → combined as 1min_load
Resource String Extraction
The resource string is extracted from the first position of the log message.
In both examples, the resource string is CPU.
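Applying the configured positions to the two example messages can be sketched in Python as follows. This assumes 1-based positions and that the selected tokens are joined with the `_` white-space delimiter described under Delimiter Configuration, which reproduces the documented results; `extract_field` is a hypothetical helper, not FMS code.

```python
QUALIFIER_POSITIONS = [2, 3]  # from QUALIFIER_STRING_POSITION
RESOURCE_POSITIONS = [1]      # from RESOURCE_STRING_POSITION
WHITE_SPACE_DELIMITER = "_"   # assumed QUALIFIER/RESOURCE white-space delimiter


def extract_field(message, positions, delimiter=WHITE_SPACE_DELIMITER):
    """Join the tokens at 1-based positions, using delimiter in place of spaces."""
    tokens = message.split()
    return delimiter.join(tokens[p - 1] for p in positions)


logs = [
    "CPU 15min load avg in Alert level. [Threshold 80% 15min load 90.000%]",
    "CPU 1min load avg in Critical Level. [Threshold 60% 1min load 79.000%]",
]
for log in logs:
    qualifier = extract_field(log, QUALIFIER_POSITIONS)
    resource = extract_field(log, RESOURCE_POSITIONS)
    print(qualifier, resource)
# 15min_load CPU
# 1min_load CPU
```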
Delimiter Configuration
Delimiters used for constructing alarm identifiers and handling whitespace in resource or qualifier strings are defined under the Other_Configurations section of the fms_config.yaml file.
Other_Configurations:
  # Delimiter used to join Alarm-Type, Qualifier-String, and Resource
  # (e.g., CMM_MONITOR_RAM:usage:Ram)
  # Not allowed delimiters: ('@@', '$', ' ', ';', '_')
  ALARM_ID_DELIMITER: ":"
  # Delimiter that replaces spaces in Qualifier-String (e.g., Uncorrectable_Sector)
  # Not allowed delimiters: ('@@', '$', ' ', ':', ';')
  QUALIFIER_WHITE_SPACE_DELIMITER: "_"
  # Delimiter that replaces spaces in Resource string (e.g., Thermal_Sensor_CPU)
  # Not allowed delimiters: ('@@', '$', ' ', ':', ';')
  RESOURCE_WHITE_SPACE_DELIMITER: "_"
The above configurations provide flexibility for FMS to generate consistent and well-formatted alarm identifiers across various alarm types.
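The disallowed values listed in the comments can be checked before FMS is re-enabled. The Python sketch below is an assumption-based pre-check, not an FMS feature: `check_delimiters` is a hypothetical helper that compares each value exactly against the documented list and does not know whether FMS applies further rules.

```python
# Disallowed values, copied from the comments in Other_Configurations.
DISALLOWED = {
    "ALARM_ID_DELIMITER": {"@@", "$", " ", ";", "_"},
    "QUALIFIER_WHITE_SPACE_DELIMITER": {"@@", "$", " ", ":", ";"},
    "RESOURCE_WHITE_SPACE_DELIMITER": {"@@", "$", " ", ":", ";"},
}


def check_delimiters(config):
    """Return a list of problems found in an Other_Configurations mapping."""
    problems = []
    for key, banned in DISALLOWED.items():
        value = config.get(key)
        if value is None:
            problems.append(f"{key} is missing")
        elif value in banned:
            problems.append(f"{key} uses disallowed value {value!r}")
    return problems


current = {
    "ALARM_ID_DELIMITER": ":",
    "QUALIFIER_WHITE_SPACE_DELIMITER": "_",
    "RESOURCE_WHITE_SPACE_DELIMITER": "_",
}
print(check_delimiters(current))  # []
print(check_delimiters({**current, "ALARM_ID_DELIMITER": "_"}))
# ["ALARM_ID_DELIMITER uses disallowed value '_'"]
```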
Final Alarm ID Format
FMS constructs the complete alarm ID by combining the event name, qualifier string, and resource string in the following format:
<Event_Name>:<Qualifier_String>:<Resource>
Resulting Alarm IDs
CPU 15min load avg in Alert level... → CMM_MONITOR_CPU:15min_load:CPU
CPU 1min load avg in Critical Level... → CMM_MONITOR_CPU:1min_load:CPU
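The assembly step can be sketched in Python as follows, assuming the qualifier and resource arrive as space-separated text and that the default delimiters from Other_Configurations are in effect. `build_alarm_id` is a hypothetical name, not part of FMS.

```python
# Default delimiters from Other_Configurations in fms_config.yaml.
ALARM_ID_DELIMITER = ":"
QUALIFIER_WHITE_SPACE_DELIMITER = "_"
RESOURCE_WHITE_SPACE_DELIMITER = "_"


def build_alarm_id(event, qualifier, resource):
    """Return <Event_Name>:<Qualifier_String>:<Resource>.

    Spaces in the qualifier and resource are replaced with their
    white-space delimiters before the three parts are joined.
    """
    qualifier = qualifier.replace(" ", QUALIFIER_WHITE_SPACE_DELIMITER)
    resource = resource.replace(" ", RESOURCE_WHITE_SPACE_DELIMITER)
    return ALARM_ID_DELIMITER.join([event, qualifier, resource])


print(build_alarm_id("CMM_MONITOR_CPU", "15min load", "CPU"))
# CMM_MONITOR_CPU:15min_load:CPU
print(build_alarm_id("CMM_MONITOR_CPU", "1min load", "CPU"))
# CMM_MONITOR_CPU:1min_load:CPU
```

Because ":" is disallowed for both white-space delimiters and "_" is disallowed for ALARM_ID_DELIMITER, a replaced space can never be mistaken for a part separator.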
FMS applies the correlation types described in the following table:
| Correlation type | Description |
|---|---|
| Generalization | |
| Time-bound | |
| Counting | Considers a specified number of similar events as one. The alarm is raised only after the event has occurred the configured number of times. |
| Compression | Checks multiple occurrences of the same event for duplicate or redundant information, removes the redundancies, and reports them as a single alarm. |
| Severity | Correlates events based on their severity. |
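Counting and compression can be pictured together with a small Python sketch keyed by alarm ID. This is a toy model under stated assumptions, not the FMS implementation: `Correlator` and `count_threshold` are hypothetical, and it does not model generalization, time-bound, or severity correlation, nor how FMS resets counters or clears alarms.

```python
from collections import defaultdict


class Correlator:
    """Toy correlator combining counting and compression per alarm ID."""

    def __init__(self, count_threshold):
        self.count_threshold = count_threshold  # hypothetical per-alarm setting
        self.seen = defaultdict(int)  # occurrences observed per alarm ID
        self.active = set()  # alarm IDs currently raised

    def on_event(self, alarm_id):
        """Return True only when this event raises a new alarm."""
        self.seen[alarm_id] += 1
        if alarm_id in self.active:
            return False  # compression: duplicate of an already raised alarm
        if self.seen[alarm_id] < self.count_threshold:
            return False  # counting: threshold not reached yet
        self.active.add(alarm_id)
        return True


c = Correlator(count_threshold=3)
print([c.on_event("CMM_MONITOR_CPU:1min_load:CPU") for _ in range(5)])
# [False, False, True, False, False]
```

The third occurrence raises the alarm (counting), and later duplicates are absorbed into it rather than raising new alarms (compression).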
Implementation Example
FMS is implemented in Node.js: its scripts are JavaScript files (*.js) and its configuration files are YAML files (*.yaml). These files are located in the following OcNOS paths:
| Path | Contents |
|---|---|
| /usr/local/bin/js | JavaScript files |
| /usr/local/etc | Configuration files |
Enabling and Disabling the Fault Management System
Use the following commands to enable or disable FMS:
Enabling FMS
OcNOS#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
(config)#
(config)#fault-management enable
(config)#
Disabling FMS
OcNOS#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
(config)#
(config)#fault-management disable
(config)#