Alarm Handler Software
The Alarm Handler software is a third-party Experimental Physics and Industrial Control System (EPICS) package (http://www.aps.anl.gov/epics/index.php). It is a client program adapted for LBTO. The Alarm Handler is an interactive GUI that monitors and displays warning and alarm states.
Update 2020/02/05
The EPICS IOC has migrated to a new VM (named ioc.mountain, previously aliased to tcs2).
It now runs a single IOC that contains all the channels for TCS, AO, the instruments and IT, which eliminates the need for a caGateway.
The system is built automatically by Jenkins and deployed as two RPMs: lbto-ahs (the ALH GUI client for the observing workstations) and lbto-ahs-srv (the services running on ioc.mountain).
The IOC restart instructions are now:
ssh ioc@ioc.mountain
sudo systemctl {start|stop|status|restart} lbto-ahs-ioc
Running the GUI on the obs machines remains the same:
ALH &
Using the Alarm Handler on the Mountain
The Alarm Handler consists of a few software components: the Alarm Handler (ALH) GUI, and several processes that collect information to feed the ALH. The Alarm Handler client software receives data from EPICS Input/Output Controllers (IOCs). IOCs are configured from an EPICS "database", which is actually just a file containing record definitions. LBTO currently uses a single IOC containing the TCS subsystems (including some instrument data) and the IT computers.
TCS data is collected from the data dictionary (via the DDS service), and some channels are set by scripts. IT channels are set by a script monitoring Zabbix.
AGW data is set by the oacontrol process running on oac.mountain.
MODS data is set by an alarm cron job running on obs3 using caput. AO data is collected from files generated by cron jobs running on the adsecs machine and copied to ioc.mountain with scp.
digraph Architecture {
rankdir=LR
graph [splines=ortho, nodesep=0.8]
node[shape=record]
struct1 [label=" TCS|| <DDS> DDS| <IOC_IT> IOC (IT)| <IOC_DDS> IOC(DDS)| <IOCDBFILE> IOC DBFile"]
OBS[label="OBS HOST\n\nALH\n(GUI)"]
web[label="web.tucson\n\nStatus Server\nINDI"]
fs[shape=cylinder,label="Filesystem\nMounts\n\n(NAS/SAN)"]
struct1:DDS -> web [dir=none, xlabel="INDI\n/TCP"]
struct1:IOC_IT -> web[style=dashed, dir=none]
struct1:IOC_DDS -> web [style=dashed, dir=none]
fs -> struct1:IOCDBFILE
OBS -> fs
fs -> OBS
}
To use the Alarm Handler effectively, the IOC process must be running. It should start automatically when the ioc.mountain host reboots.
The IOC can be controlled via the following commands:
ssh ioc@ioc.mountain.lbto.org
sudo systemctl {start|stop|status|restart} lbto-ahs-ioc
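The ioc account does not have a password; it is a blessed account.
The IOC logs to journald, where all of the other TCS information is. You can follow all journald output with the first command below, or only the IOC service with the second:
journalctl -f
journalctl -u lbto-ahs-ioc -f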
With the IOC running, a user can then invoke the ALH command script on the observing workstations. Only the telescope operator should run ALH as user telescope: this account can acknowledge alerts, and those actions are recorded in the Alarm Handler log files. All other users should run ALH under a different account; this opens the main Alarm Handler window directly and ensures that other users do not overwrite the Alarm Handler log files. For example, log in to an observing workstation (obsN) with X11 forwarding enabled and start the GUI:
ssh -Y observer@obsN.mountain.lbto.org
ALH &
There are two Alarm Handler log files, ALHAlarm.log.yyyy-mm-dd and ALHOpmod.log.yyyy-mm-dd, where the "Alarm" file catalogues the various alarms and the "Opmod" file catalogues the telescope operator actions. These files are stored in /lbt/data/logs/alh and are gzipped when necessary.
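The naming scheme above is regular enough to script. The following Python sketch locates the log for a given date and reads it whether or not it has been compressed. It assumes that a gzipped log keeps its name and simply gains a ".gz" suffix; the helper names log_path and read_log are hypothetical.

```python
import gzip
from datetime import date
from pathlib import Path
from typing import Iterator, Optional

# Log directory as given on this page.
LOG_DIR = Path("/lbt/data/logs/alh")


def log_path(kind: str, day: date, log_dir: Path = LOG_DIR) -> Optional[Path]:
    """Return the existing ALH log for kind ("Alarm" or "Opmod") on day, or None.

    ASSUMPTION: a gzipped log is named like the plain one plus ".gz".
    """
    if kind not in ("Alarm", "Opmod"):
        raise ValueError(f"unknown log kind: {kind!r}")
    # date.isoformat() gives the yyyy-mm-dd suffix used in the file names.
    plain = log_dir / f"ALH{kind}.log.{day.isoformat()}"
    for candidate in (plain, plain.with_name(plain.name + ".gz")):
        if candidate.exists():
            return candidate
    return None


def read_log(path: Path) -> Iterator[str]:
    """Yield the lines of a plain or gzipped log file without trailing newlines."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as fh:
        for line in fh:
            yield line.rstrip("\n")
```

For example, log_path("Opmod", date(2020, 2, 5)) returns the path of that day's operator-action log, or None if neither form exists.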
The ALH is presented in hierarchical format. For user telescope only, the first GUI to launch is a small single-button GUI; for all other users, the main Alarm Handler window launches.
As user telescope, click the single-button GUI to launch a second GUI, the main Alarm Handler window. It has two panels, left and right. The left panel shows a hierarchical "tree" structure. Each item that has a triangular "arrow" to its right can be expanded by clicking on the triangle; the hierarchy expands downward. Selecting any named button (not the triangle) in the left panel expands its bottom-level signals into the right panel. The lowest-level signals (alarm channels in EPICS jargon) are colored blue. Alarms, if present, have either a Y (yellow) or R (red) indicator next to the name.
The width of the left and right panels can be adjusted using the bottom slider. The GUI window can be resized by dragging its corners with the mouse.
In the right panel, selecting the "G" button typically brings up "Guidance", which can be free-form text or a URL reference. For IT channels the "G" button brings up the ComputerRoom/Rack/U location. Selecting a "P" button launches a TCS GUI or, for the IT channels, opens a browser displaying all the IT alerts detected by Zabbix. It is also possible to launch other commands (shell scripts, etc.) from the "P" buttons.
Alarms can require "acknowledgement". Alarms can be acknowledged at any level of the hierarchy (in the right panel or the left panel, at any level). To acknowledge, click the left button next to the signal. Acknowledging an alarm does nothing to TCS or any of the computer systems. It is an operator-only action that is logged and resets the state in the alarm handler.
For more advanced use cases, refer to the Alarm Handler User Manual (http://www.aps.anl.gov/epics/extensions/alh/index.php).
If there are "white" alarms on your ALH (E or V), either:
- the corresponding interface "IOC" may not be running, or
- the TCS DDS is broken (less likely).
NOTE: Portions of the AO system are currently being checked by scripts written by Xianyu and running on the AdSec machines. These scripts are a temporary stopgap until the AOS subsystem is upgraded. If the AO portion of the ALH displays as "white", you will need to start up these scripts.
Follow the above directions to start the IOC, or invoke the TCSGUI to start the DDS. As noted above, the IOC is controlled from the ioc account on ioc.mountain using systemctl. To verify that the IOC is running, check some channels (note that the MODS channels are only written every 10 minutes, so if you have just started the IOC you will not see them for several minutes):
> source /lbt/epics/setup.csh
> caget lbc:side1:dewarTemp
lbc:side1:dewarTemp 167.207
> caget mods1:rtemp
mods1:rtemp -99
> caget ecs:airCompr:CV0408
ecs:airCompr:CV0408 4
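The same check can be scripted. This Python sketch calls caget once for several channels and parses its default "name value" output. It assumes caget is on the PATH with the EPICS environment already set up (for example, Python started from a shell that has sourced /lbt/epics/setup.csh); the function names parse_caget and read_channels are hypothetical.

```python
import subprocess
from typing import Dict, List


def parse_caget(output: str, wanted: List[str]) -> Dict[str, str]:
    """Map each requested PV name to its value text from caget output.

    Lines whose first token is not a requested PV (e.g. error messages) are skipped.
    """
    values: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in wanted:
            values[parts[0]] = parts[1].strip()
    return values


def read_channels(pvs: List[str]) -> Dict[str, str]:
    """Run caget for all PVs at once; PVs that did not answer are absent from the result."""
    result = subprocess.run(["caget", *pvs], capture_output=True, text=True)
    return parse_caget(result.stdout, pvs)
```

For example, read_channels(["lbc:side1:dewarTemp", "mods1:rtemp", "ecs:airCompr:CV0408"]) should report the same values as the manual caget calls above; a missing key means that channel did not respond.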
Alarm Handler Log File
There are two log files associated with the alarm handler. Both of these files are located in /lbt/data/logs/alh. The alarms are logged in ALHAlarm.log.YYYY-MM-DD, and operator actions are logged in ALHOpmod.log.YYYY-MM-DD. Please note that operator actions are limited to the user "telescope". The alarm log file contains seven basic columns of information: date/time, channel name (aka process variable name), status, severity, unacknowledged severity, acknowledge option, and value.
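If you need to post-process the alarm log, a small parser for the seven columns can help. The line format assumed here is hypothetical: whitespace-separated columns in the order listed above, with the date/time column first and the only one that may contain spaces. Check it against a real ALHAlarm.log file before relying on it; the names AlarmLogEntry and parse_alarm_line are also hypothetical.

```python
from typing import NamedTuple, Optional


class AlarmLogEntry(NamedTuple):
    timestamp: str
    channel: str
    status: str
    severity: str
    unack_severity: str
    ack_option: str
    value: str


def parse_alarm_line(line: str) -> Optional[AlarmLogEntry]:
    """Split one ALHAlarm.log line into the seven documented columns.

    ASSUMPTION: columns are whitespace-separated and only the leading
    date/time column may contain spaces; returns None if the line does not fit.
    """
    # Split from the right so the multi-word timestamp stays in one piece.
    parts = line.strip().rsplit(None, 6)
    if len(parts) != 7:
        return None
    return AlarmLogEntry(*parts)
```

Splitting from the right keeps the timestamp intact without knowing its exact format, but it breaks if a string value itself contains spaces.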
Alarm Handler Configuration Files (LBTO Public Configuration Files)
Instead of having one large configuration file, the LBTO configuration file, ALH-default.alhConfig, lists its constituent configuration files. In this way, each TCS subsystem, IT section, or LBT instrument has its own configuration file that can be maintained independently. All of these "public" configuration files are located in /home/telescope/TCS/Configuration/ALH/LBTALHConfigure and have names such as ALH-xxx.alhConfig, where xxx = AOS, DDS, ECS, etc.
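As an illustration of this layout, the Python sketch below writes a top-level file that pulls in one ALH-xxx.alhConfig file per subsystem. It assumes the constituent files are referenced with INCLUDE lines (see Include File below) and that the top-level group name, here the hypothetical "LBTO", can be chosen freely; compare with the real ALH-default.alhConfig before using it.

```python
from typing import Iterable


def top_level_config(top_group: str, subsystems: Iterable[str]) -> str:
    """Build a top-level alarm configuration that includes one file per subsystem."""
    # The single top-level group must have NULL as its parent.
    lines = [f"GROUP NULL {top_group}"]
    for name in subsystems:
        # Each included file's main group becomes a child of top_group.
        lines.append(f"INCLUDE {top_group} ALH-{name}.alhConfig")
    return "\n".join(lines) + "\n"
```

For example, top_level_config("LBTO", ["AOS", "DDS"]) produces one GROUP NULL line followed by INCLUDE lines for ALH-AOS.alhConfig and ALH-DDS.alhConfig.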
To view the current alarm handler configuration file, select "CONFIGURATION FILE WINDOW" from the GUI "VIEW" pull-down menu.
Configuration File Description
The alarm configuration file is the file used as input to the Alarm Handler. It defines the Alarm Group structure and accepts a flexible input format. The alarm configuration file, which can be prepared with any text editor, defines a complete Alarm Group structure composed of subgroups and channels. The arrangement of channels and subgroups follows the standard tree structure, and subgroups always terminate at channels. The only input format constraint is that the definitions must be in hierarchical order. That is, after a group is defined in the configuration file as belonging to a parent group, all of its subgroups and channels must be defined before a new group belonging to the same parent group can be defined. There can be only one top-level group (the main group), and this group must have NULL as its parent group name. For each group or channel, a set of input specifications defines special events to be handled at start-up time.
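The hierarchical-order rule is equivalent to saying that every definition's parent must be a group on the path from the main group to the most recently defined group. The Python sketch below checks exactly that by keeping the path as a stack. It looks only at GROUP, CHANNEL and INCLUDE lines and skips $GUIDANCE text blocks; the function name check_hierarchy is hypothetical.

```python
from typing import List


def check_hierarchy(text: str) -> List[str]:
    """Return ordering errors found in an alarm configuration file's text."""
    errors: List[str] = []
    path: List[str] = []  # open groups, from the main group down
    in_guidance = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if in_guidance:
            in_guidance = fields != ["$END"]  # guidance text ends at $END
            continue
        if fields == ["$GUIDANCE"]:
            in_guidance = True
            continue
        if not fields or fields[0] not in ("GROUP", "CHANNEL", "INCLUDE"):
            continue  # comments, blank lines and other $-lines
        if len(fields) < 3:
            errors.append(f"line {lineno}: missing parentName or name")
            continue
        keyword, parent, name = fields[:3]
        if parent == "NULL":
            if keyword != "GROUP" or path:
                errors.append(f"line {lineno}: only one top-level GROUP may have parent NULL")
            else:
                path = [name]
            continue
        if not path:
            errors.append(f"line {lineno}: the first definition must be 'GROUP NULL name'")
            continue
        if parent not in path:
            errors.append(f"line {lineno}: parent {parent!r} is not an open group")
            continue
        while path[-1] != parent:
            path.pop()  # subtrees below the parent are now closed for good
        if keyword == "GROUP":
            path.append(name)
    return errors
```

A definition whose parent group has already been closed, such as a channel added to TCS after a sibling group of TCS has started, is reported as an error.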
File Name
When opening a new configuration file, ".alhConfig" will be used as the default suffix. The default file name for the alarm configuration file is ALH-default.alhConfig.
The configuration file statements for a given group or channel take a flexible input format and can consist of the following items:
GROUP parentName GroupName
CHANNEL parentName ChannelName <mask>
INCLUDE parentName fileName
$GUIDANCE
$END
$GUIDANCE urlAddress
$ALIAS anyValidTextString
$COMMAND anyValidCommand
$SEVRCOMMAND severityChangeValue anyValidCommand
$ALARMCOUNTFILTER inputCount inputSeconds
$BEEPSEVERITY ALHbeepSeverity
$BEEPSEVR GroupOrChannelBeepSeverity
Input syntax notes:
- The fields enclosed in <> are optional.
- Blanks can be used to separate the fields for improved readability.
- The GROUP or CHANNEL line must be the first line in a set.
- Lines starting with "#" are comments.
- Lines starting with "$" are optional.
- The [$GUIDANCE ... $END] must be entered as a set if text guidance is present.
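The syntax notes above are enough to split a file into sets. The Python sketch below does this, attaching each $-line to the GROUP, CHANNEL or INCLUDE line that precedes it and collecting $GUIDANCE ... $END text separately. The ConfigSet and parse_sets names are hypothetical, and treating an INCLUDE line as the start of a set is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConfigSet:
    keyword: str  # GROUP, CHANNEL or INCLUDE
    args: List[str]  # parentName, name and optional mask or file name
    options: List[str] = field(default_factory=list)  # other $-lines, verbatim
    guidance: List[str] = field(default_factory=list)  # text between $GUIDANCE and $END


def parse_sets(text: str) -> List[ConfigSet]:
    """Group configuration lines into sets, one per GROUP/CHANNEL/INCLUDE line."""
    sets: List[ConfigSet] = []
    in_guidance = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if in_guidance:
            if line == "$END":
                in_guidance = False
            else:
                sets[-1].guidance.append(raw)
            continue
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        first = line.split()[0]
        if first in ("GROUP", "CHANNEL", "INCLUDE"):
            sets.append(ConfigSet(first, line.split()[1:]))
        elif not sets:
            raise ValueError(f"line {lineno}: line before any GROUP or CHANNEL")
        elif line == "$GUIDANCE":
            in_guidance = True  # text guidance block follows
        else:
            sets[-1].options.append(line)  # includes "$GUIDANCE urlAddress"
    if in_guidance:
        raise ValueError("$GUIDANCE text block is missing its $END line")
    return sets
```

A bare $GUIDANCE line opens a text block, while $GUIDANCE followed by a URL is kept as an ordinary option line.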
Group or Channel
A set of group or channel lines must appear in the alarm configuration file. These lines define the Alarm Group structure. The first line is the top level Alarm Group definition. There can be only one top-level Alarm Group and this Alarm Group must have NULL as the parent group name. Group or Channel lines must start with the keyword GROUP or CHANNEL. The GroupName is the name of a user specified Alarm Group. The ChannelName must be the name of a specific record defined in an EPICS database. The parentName is the name of the parent Alarm Group. There is no restriction on the number of group definitions.
GROUP parentName GroupName
The channel <mask> is optional and defaults to no mask (i.e. -----). It is required only for a channel with a non-default mask setting. Mask settings are described in detail in the Alarm Channel Mask (and Forced Masks) section below.
CHANNEL parentName ChannelName <mask>
Include File
The line starting with INCLUDE allows a user to name, within an alarm configuration file, another alarm configuration file to be read by the Alarm Handler at runtime. The main Alarm Group of the designated file becomes a child group of the parentName group specified on the INCLUDE line.
INCLUDE parentName fileName
Guidance
The lines starting with $GUIDANCE are optional. They are required only when a user wants to display alarm guidance information for a group or channel. The $GUIDANCE line may be followed by a set of ASCII guidance text lines terminated by an $END line, or alternatively, the $GUIDANCE line may contain a URL address.
$GUIDANCE
<text lines>
$END
or
$GUIDANCE urlAddress
Alias
The line starting with $ALIAS is optional. It is required only when the alarm handler should display the specified alias text string in places where it would normally display the Alarm Group or Alarm Channel name.
$ALIAS anyValidTextString
Command
The line starting with $COMMAND is optional. It is required only when a user wants to be able to start a related process for this group or channel. When the ALH operator clicks on a Process Button for an alarm group or channel in the alarm handler Main Window, one of two things occurs. If a single related process was specified on the $COMMAND line for the group or channel, that process is invoked. If multiple processes were specified on the $COMMAND line, a popup menu of the related process names appears and the process selected by the user is invoked.
A single command is specified as follows:
$COMMAND anyValidCommandSyntax
Multiple commands are specified with command names and command strings separated by exclamation points ("!"), using the following syntax:
$COMMAND cmd_1_name!cmd_1_string!cmd_2_name!cmd_2_string!...cmd_n_name!cmd_n_string
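To check a $COMMAND line before loading it, the name/command pairs can be split out as in this Python sketch. It assumes that text without any "!" is a single command and that the command strings themselves contain no "!"; the function name parse_command is hypothetical.

```python
from typing import List, Tuple


def parse_command(spec: str) -> List[Tuple[str, str]]:
    """Split the text following $COMMAND into (name, command) pairs.

    A spec with no "!" is a single command and is returned with an empty name.
    """
    spec = spec.strip()
    if "!" not in spec:
        return [("", spec)]
    fields = [f.strip() for f in spec.split("!")]
    if len(fields) % 2 != 0:
        raise ValueError("expected name!command pairs separated by '!'")
    # Even positions are names, odd positions are the matching commands.
    return list(zip(fields[0::2], fields[1::2]))
```

For example, with the hypothetical names "Dome GUI" and "Zabbix", parse_command("Dome GUI!domegui!Zabbix!firefox") returns [("Dome GUI", "domegui"), ("Zabbix", "firefox")], matching the popup menu entries the operator would see.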
Severity Command
The line starting with $SEVRCOMMAND is optional. It is required if a process should be invoked when the alarm severity value for a group or channel changes. A single group or channel may have multiple $SEVRCOMMAND lines. This line defines the change in the severity necessary to start the process and defines the process to be started.
Valid severity change values are:
UP_INVALID, UP_MAJOR, UP_MINOR, UP_ANY, DOWN_MAJOR, DOWN_MINOR, DOWN_NO_ALARM, DOWN_ANY, UP_ALARM
$SEVRCOMMAND severityChangeValue anyValidCommand
Alarm Count Filter
The line starting with $ALARMCOUNTFILTER is optional. It is required only when the alarm handler should filter the registration of alarms for a channel. This line defines the alarm count and seconds required for alarm registration. To register as an alarm, a channel must remain in an alarm state for more than inputSeconds seconds or the channel must enter into an alarm state from a no-alarm state more than inputCount times within inputSeconds seconds.
If inputCount is zero, it is not used in determining alarm/no-alarm states; only inputSeconds is used to filter a channel going in and out of the alarm state. If inputCount is -1, only inputSeconds is used to filter a channel going from the no-alarm state to the alarm state, and a change from the alarm state to the no-alarm state is not filtered.
$ALARMCOUNTFILTER inputCount inputSeconds
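The registration rule can be easier to follow as code. This Python sketch replays a channel's recorded transitions offline and applies the two conditions described above; it is a simplified reading of the text, not the Alarm Handler's own implementation. It treats an inputCount of 0 or -1 as disabling the count condition and does not model the unfiltered alarm-to-no-alarm transition of the -1 case. The name should_register is hypothetical.

```python
from typing import List, Optional, Tuple


def should_register(transitions: List[Tuple[float, bool]], input_count: int,
                    input_seconds: float, now: float) -> bool:
    """Decide whether a channel's alarm would be registered at time `now`.

    `transitions` holds (time, in_alarm) pairs in time order; the channel is
    assumed to start in the no-alarm state.
    """
    in_alarm = False
    alarm_since: Optional[float] = None
    entries: List[float] = []  # times of no-alarm -> alarm transitions
    for t, state in transitions:
        if state and not in_alarm:
            entries.append(t)
            alarm_since = t
        in_alarm = state
    # Condition 1: continuously in alarm for more than input_seconds.
    if in_alarm and alarm_since is not None and now - alarm_since > input_seconds:
        return True
    # Condition 2: entered alarm more than input_count times within input_seconds.
    # An input_count of 0 or -1 disables this condition (time filter only).
    if input_count > 0:
        recent = sum(1 for t in entries if now - t <= input_seconds)
        return recent > input_count
    return False
```

A channel that flaps three times within the window is registered with an inputCount of 2 even though it is momentarily out of alarm, while the same history with an inputCount of 0 is filtered out.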
Beep Severity
The line starting with $BEEPSEVERITY is optional. It is required only when the alarm handler should filter the beeping if alarms are present. This line defines the minimum severity level required for beeping. Beeping will not occur when the highest outstanding severity is less than the specified severity. Valid severity values are MINOR, MAJOR, INVALID, and ERROR.
$BEEPSEVERITY severity
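The beep decision can be sketched in a few lines of Python. The sketch assumes the severities are ordered as listed above (MINOR < MAJOR < INVALID < ERROR), with NO_ALARM below all of them; the names should_beep and SEVERITY_ORDER are hypothetical.

```python
from typing import Iterable

# ASSUMPTION: ascending order, taken from the order the severities are listed in.
SEVERITY_ORDER = ["NO_ALARM", "MINOR", "MAJOR", "INVALID", "ERROR"]


def should_beep(outstanding: Iterable[str], beep_severity: str) -> bool:
    """Return True if the highest outstanding severity reaches beep_severity."""
    if beep_severity not in SEVERITY_ORDER[1:]:
        raise ValueError(f"invalid $BEEPSEVERITY value: {beep_severity!r}")
    # With no outstanding alarms the highest severity is NO_ALARM (index 0).
    highest = max((SEVERITY_ORDER.index(s) for s in outstanding), default=0)
    return highest >= SEVERITY_ORDER.index(beep_severity)
```

With $BEEPSEVERITY MAJOR, an outstanding MINOR alarm alone stays silent, but adding a MAJOR alarm starts the beeping.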
Alarm Channel Mask (and Forced Masks)
Associated with each Alarm Channel are two five-bit masks (default and current, shown as <-----> values to the right of the alarm in the right-hand ALH panel). The current mask can be changed with force commands issued from the "FORCE MASK" item of the "Action" GUI menu. The default mask is defined in the alarm configuration file. A reset command returns all associated masks to their default values.
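The five mask positions described in this section can be decoded mechanically, for example when checking the <mask> field of CHANNEL lines. This Python sketch assumes the positions appear in the order this section lists them (Cancel, Disable, Ack, Transient, Log) and that the Cancel position uses the letter "C"; only D, A, T and L are given on this page. The names MASK_FLAGS and decode_mask are hypothetical.

```python
from typing import Dict

# Five mask positions in the order this page describes them. The letters D, A, T
# and L come from the text; "C" for the Cancel position is an ASSUMPTION.
MASK_FLAGS = [
    ("C", "cancel"),
    ("D", "disabled"),
    ("A", "no_ack"),
    ("T", "no_ack_transient"),
    ("L", "no_log"),
]


def decode_mask(mask: str) -> Dict[str, bool]:
    """Decode a mask such as "-----" or "<--A-L>" into named flags."""
    body = mask.strip().strip("<>")
    if len(body) != len(MASK_FLAGS):
        raise ValueError(f"expected five mask characters, got {mask!r}")
    flags: Dict[str, bool] = {}
    for char, (letter, name) in zip(body, MASK_FLAGS):
        if char not in ("-", letter):
            raise ValueError(f"{char!r} is not valid where {letter!r} or '-' belongs")
        flags[name] = char == letter  # "-" means the default (flag off)
    return flags
```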
The definition for each bit in the mask value follows:
Add/Cancel Alarm
If cancel is active for a channel, the IOC will not send alarm events to the alarm handler for that channel. This has the effect of suppressing the alarm (only in the alarm handler, not in TCS!).
Enable/Disable Alarm
Alarms aren't/are ignored by the alarm handler. Disabling an alarm has the effect of no display and NoAck. If an alarm is disabled the alarm status and severity are not displayed. Thus, even though an alarm is in effect, it always appears to the operator to be in the NO_ALARM state. Alarm change of states will, however, still be logged unless NoLog is in effect.
" D" means alarm disabled.
Ack/NoAck
The operator is/isn't required to acknowledge alarms.
" A" means alarm acknowledgment is not required.
Ack/NoAck Transient Alarms
The operator is/isn't required to acknowledge transient alarms. A transient alarm is one that enters alarm state and then returns to normal before the operator can acknowledge the alarm.
" T" means acknowledgment of transient alarms is not required.
Log/No Log Alarms
Alarms will/won't be logged.
" L" means no alarm logging.
Analog Data vs Severity Codes
Most TCS alarm data is reported in the form of "severity codes". These values are integers from 1 to 4 (Major, Minor, Intentional, and OK, respectively). The alarm handler is also capable of receiving analog information (temperatures and pressures are common). The acceptable ranges for analog information are contained in the tcs.db file (maintained by the software group). If spurious alarms are triggering for analog values, the ranges may need tuning. The alarm handler's current alarm history gives the value as well as the type of alarm that was triggered. If the values look wrong, send a note to software support and we will update them.
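For scripts that read these channels directly, the severity codes map to names as below. The analog part is a sketch that assumes the tcs.db ranges are ordinary EPICS analog alarm limits (LOLO, LOW, HIGH, HIHI); hysteresis and the per-limit severities are left out, and the names describe_code and analog_alarm are hypothetical.

```python
from typing import Optional

# Severity codes as documented on this page.
SEVERITY_CODES = {1: "Major", 2: "Minor", 3: "Intentional", 4: "OK"}


def describe_code(code: int) -> str:
    """Translate a TCS severity code into its name."""
    return SEVERITY_CODES.get(code, f"unknown code {code}")


def analog_alarm(value: float, lolo: float, low: float,
                 high: float, hihi: float) -> Optional[str]:
    """Return the analog limit a value violates, or None if it is in range.

    ASSUMPTION: lolo <= low < high <= hihi, as for EPICS analog records;
    the outer (more severe) limits are checked first.
    """
    if value >= hihi:
        return "HIHI"
    if value <= lolo:
        return "LOLO"
    if value >= high:
        return "HIGH"
    if value <= low:
        return "LOW"
    return None
```

A value that keeps landing just past one limit in the alarm history is the usual sign that the range in tcs.db needs the tuning mentioned above.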
Alarm Handler Guidance Pages
The guidance pages for the systems monitored by the Alarm Handler are found here: AlarmHandlerGuidance.
EPICS Gateway
With the transition to 64-bit TCS machines, some portions of the TCS network were divided into subnets. On the TCS host subnet, UDP broadcasts were disabled, so the IOCs running on TCS1 were no longer able to function correctly with ALH running on the OBS1 host subnet. To keep all IOCs running on the TCS1 host, an EPICS gateway component was added to TCS1. This component allows all IOCs (now and any in the future) to be "hidden" in the shadow of the gateway.
ALH talks to the gateway, and the gateway talks to the IOCs. The gateway runs on port 5066 to avoid port 5064 contention with the IOCs. Any contention between IOCs for port 5064 does not matter (the gateway finds all IOCs on TCS1 by using the 127.255.255.255 localhost broadcast).
The biggest problem we found putting the gateway into use was that EPICS automatically enforces access controls. We had to introduce a file with the following for the access control:
ASG(DEFAULT) {
RULE(1,READ)
RULE(1,WRITE)
}
This sets the default access security group to allow both READ and WRITE. See Database Access Security in the EPICS documentation. Access control may come in handy some day, but until then it is just a pain that the file above lets us avoid.
Release Notes
Alarm Handler Details for Developers
Software Group Presentations
The following presentations provide some background material on TCS events, data dictionary items, and how these could ultimately be utilized to support the alarm handler facility.
Build and Install
The build is handled by Jenkins (currently on f30, http://cisrv01.tucson.lbto.org/job/f30/job/ahs).
Releases are installed with the standard dnf method from the LBTO yum repository.