IOC and Alarm Handler Release Notes

  • Added IT channels for NFS mount monitoring (IT 6921): it.db, ALH-IT.alhConfig, scripts for setting/getting IT channels, indialh/channelsfiles/IT (generated)

  • Deleted mcs:elDrive:servoPanic record from db and ddChannelMap

  • tcs.db modified to delete MAJOR alarm for SMT weather down

  • Added MODS back into the GUI config file (alhConfig file). LUCI was added back after shutdown.

23 May 2017
  • Temporarily deleted LUCI1 from the main alhConfig file since it's off the telescope now.

17 May 2017
  • Updated the IT IOC to be just the sample IOC program; it no longer reads the Zabbix log. The log reading is replaced with a script that is triggered by Zabbix alarms and caputs the channels directly.
  • Updated it.db and ALH-IT.alhConfig to match the new list of channels.
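  A minimal sketch of how a Zabbix-triggered script could set a channel with caput. The channel naming scheme, the status-to-value mapping, and the function names are assumptions for illustration and are not taken from the actual script; only the general `caput <PV name> <value>` command form is standard EPICS.

```python
import subprocess

# Hypothetical mapping from a Zabbix trigger status to the value written to the channel.
STATUS_TO_VALUE = {"OK": 0, "PROBLEM": 1}


def channel_for_host(host):
    """Build a hypothetical IT channel name for a monitored host."""
    return "it:" + host.lower() + ":status"


def build_caput_command(host, status):
    """Return the argument list for: caput <PV name> <value>."""
    value = STATUS_TO_VALUE[status.upper()]
    return ["caput", channel_for_host(host), str(value)]


def set_channel(host, status, dry_run=True):
    """Build the caput command and run it unless dry_run is set."""
    cmd = build_caput_command(host, status)
    if not dry_run:
        # Requires the EPICS caput tool on the PATH and a reachable IOC.
        subprocess.run(cmd, check=True)
    return cmd
```

  With `dry_run=True` the function only returns the command it would run, which makes the mapping easy to check without a running IOC.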

04 May 2017
  • Updated alhConfig files for updates to guidance and a couple of other minor fixes, including enabling two alarms that were not enabled but should have been:
    ECS channel ecs:snowMelt:FN1008 and MCS RDG "Cable Chain Drive OK" mcs:rotators:rdg:cok

01 October 2016
  • Added Leak Detection subcategory under ECS/IC. There are channels to monitor leak detection, the NONC valves, manual override condition, temporary disabling of the system, and valves intentionally put into the wrong state.
    • Updated files: ddChannelMap.cfg, tcs.db, and ALH-ECS.alhConfig

08/09 September 2016
  • Updated the TCSIOC to accommodate changes for AO: there is only one summary TSS temperature now, added an arbitrator channel (coming from AO script), and there is no longer complex logic for the TSSOn channel.
  • X. Zhang added more thorough "G"uidance, and for some channels an AO GUI will be raised via the "P" button. These are improvements to the ALH-AO.alhConfig file.
  • Used ddChannelMap.cfg to replace a deprecated AOS data dictionary variable: pwr_status (old) --> pwstatus (new).

16 June 2016 (Only new configuration files)
  • Updated the ddChannelMap.cfg, tcs.db, and ALH-LUCI.alhConfig to accommodate LUCI1 and LUCI2 channels which WARN the observatory regarding a mask being left in turnout for too long.

06 June 2016 (Only new configuration files)
  • Modified the ALH-AO.alhConfig file with regard to the guidance provided.
  • Modified the ALH-LUCI.alhConfig file to filter alarms on the MOS unit temperature 1. The channel must be in alarm for at least 480 seconds before the alarm is displayed. Also, the LUCI2 MOS unit temperature 2 channel is blocked from being displayed.
  • The tcs.db was updated for MODS - the low alarms were deleted for the CCD temperatures.

11 March 2016 (Version 1.9)
  • Modified the TCS IOC to read two text files for AO status information, ao_[s|d]x_alh.dat, which will each contain a single line of data: timestamp (seconds since the epoch), tssTemp[0-5], coil current, cooling flow, and telemetry data. The previous mechanism of SSH/caput directly to the EPICS database was too slow on occasion, causing stale data alarms.
    • Added the necessary checks for missing or unreadable files. The IOC will complain after ~180 seconds, but the IOC will not exit. The heartbeat channel will become invalid (white "V").
  • Removed the exit(1) when there is a single error retrieving data dictionary values from the DDS. Keep track of the number of times an error is returned from the DDS (not the same condition as the DDS not being present) and only exit the IOC after DDS_ERROR_THRESHOLD errors have been received. This addresses IT5882.
  • Cleaned up the code to make it more robust, particularly when computing the difference between time_t and long variables.
  • Several servers were added to the IT portion of the ALH for monitoring: MODS1, MODS2, and MODS2DATA. The MCS EL drive torque channel and the domain controller 1 channel were turned back on for display. The LUCI1 MOS Unit Temperature 2 has been temporarily deleted from the ALHGUI as it is a known problem which is going to persist for a long time.
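  A minimal sketch of the AO status file check described above, under stated assumptions: the fields are taken to be whitespace-separated and in the order listed in the notes, everything after the cooling flow is treated as telemetry, and the function and key names are hypothetical. It only shows the per-read classification (ok, stale, unreadable), not the IOC's actual code.

```python
import time

STALE_THRESHOLD_S = 180  # the notes say the IOC complains after ~180 seconds


def read_ao_status(path, now=None):
    """Return (status, data); status is "ok", "stale", or "unreadable"."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            # The file is expected to hold a single line of numeric fields.
            fields = [float(x) for x in f.readline().split()]
    except (OSError, ValueError):
        # Missing, unreadable, or non-numeric file: report it, do not exit.
        return "unreadable", None
    if len(fields) < 10:
        return "unreadable", None
    data = {
        "timestamp": fields[0],      # seconds since the epoch
        "tss_temps": fields[1:7],    # tssTemp[0-5]
        "coil_current": fields[7],
        "cooling_flow": fields[8],
        "telemetry": fields[9:],     # assumption: remaining fields
    }
    age = now - data["timestamp"]
    status = "stale" if age > STALE_THRESHOLD_S else "ok"
    return status, data
```

  Passing `now` explicitly keeps the staleness check testable; a caller would normally omit it and let the function use the current time.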

29 February 2016 (Version 1.8)
  • "Emergency" patch to modify the behavior of the TCS IOC when the IOC receives an error on the data dictionary (e.g., command thread died). This quick fix simply removes the exit(1) action and continues processing. The notice that an error was encountered from the DDS is still issued to the SYSLOG.

23 February 2016
  • Replaced the diagnostic Version 1.7 with the previous code, as the source of the alarms on the AdSecSX/DX monitoring scripts was identified to be on the AO side.
  • Commented out the Mountain Domain Controller 1 channel temporarily as it is off-line for several weeks.

22 February 2016 (Version 1.7)
  • Added diagnostics to be printed to the SYSLOG every time the SX or DX heartbeat updates from the monitoring script running on the AdSecs (~30s), or the TCS IOC has had no update from the monitoring script for the THRESHOLD amount of time (180s). In the latter case, the lack of communication will continue to be reported every ~30s. The diagnostics consist of the heartbeat values actually sent from the SX and DX AdSec machines by the AO mining script, as well as the time(NULL) values from the TCS IOC. This is to help address IT5877.

12 February 2016 (Version 1.6)
  • Updated TCS/MCS channels being monitored: "Net Torque" and "Gallery Doors Drive Interlock".
  • Updated IT channels being monitored: AGW2-CAM and MODS2-CAM. The monitoring of AGW1-CAM has been turned back on.
  • Updated the IT configuration file, ALH-IT.alhConfig, for better support of the "P" button. Instead of invoking a Web page with all the Zabbix alerts, the button now runs several Perl scripts which work in concert with one another. They search the log file read by the IT IOC for the specified machine, collect the last five alerts associated with this machine, and display them in time order (latest to oldest) in a Zenity window. All of this is configurable. The Perl scripts were written by Stephen Hooper.
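  The alert lookup behind the "P" button can be sketched as follows. This is an illustration in Python, not the Perl scripts themselves, and it assumes a hypothetical log format of `<epoch seconds> <machine> <message>` per line; the real Zabbix log layout is not described in these notes.

```python
def last_alerts(lines, machine, count=5):
    """Return up to `count` (timestamp, message) pairs for `machine`, latest first."""
    matches = []
    for line in lines:
        parts = line.strip().split(None, 2)
        if len(parts) < 3:
            continue  # skip lines that do not fit the assumed format
        stamp, host, message = parts
        try:
            when = int(stamp)
        except ValueError:
            continue  # skip lines without a numeric timestamp
        if host == machine:
            matches.append((when, message))
    # Latest to oldest, as displayed in the Zenity window.
    matches.sort(key=lambda item: item[0], reverse=True)
    return matches[:count]
```

  The `count` parameter mirrors the note that the behaviour is configurable; the default of five matches the number of alerts described above.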

02 February 2016 (Version 1.5)
  • Updated the AO configuration file, ALH-AO.alhConfig, the TCS database file, tcs.db, and the TCS IOC to accommodate DX secondary data.

17 December 2015 (Version 1.4)
  • Updated the ALH-MCS.alhConfig file for the left and right enclosure proximity limits. In order to avoid transient alarms, the ALARMCOUNTFILTER has been added to these channels such that any alarm on these items must be persistent for at least 5 seconds before the alarm is revealed on the ALHGUI.
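  The persistence behaviour described above can be sketched as a small filter. This is an illustration of the idea only, not the alarm handler's ALARMCOUNTFILTER implementation; the class and method names are hypothetical.

```python
class PersistenceFilter:
    """Reveal an alarm only after it has been continuously active for hold_seconds."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self.alarm_since = None  # time the current alarm episode started

    def update(self, in_alarm, now):
        """Feed one sample; return True if the alarm should be displayed."""
        if not in_alarm:
            # The alarm cleared, so any transient episode is forgotten.
            self.alarm_since = None
            return False
        if self.alarm_since is None:
            self.alarm_since = now
        return now - self.alarm_since >= self.hold_seconds
```

  A transient alarm that clears before the hold time elapses resets the timer, so only a persistent condition reaches the display.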

30 November 2015
  • Improved the logic in TCSIOC (Version 1.3, 30 November 2015) for the "TSS On/Anemometer Off" channel (as compared against the AOSGUI).
  • Updated some guidance text for the AO channels.

24 November 2015
  • Changes made to the configuration files to supplement safety concerns for the AO/AOS.
    • ddChannelMap.cfg: utilize existing AOS data dictionary items which map to EPICS channels
    • tcs.db: new AO/AOS EPICS channels and thresholds
    • launch-aos[l|r]-gui: (NEW) scripts to launch AOSGUI from the ALHGUI
    • ALH-default.alhConfig: New AO group under the LBTO main group
    • ALH-AO.alhConfig: (NEW) Channels for the new AO/AOS entries being monitored: temperatures, coil current, cooling flow, telemetry data, AO Supervisor state, power status, anemometer status, elevation data availability, and AOS status
  • Changes made to the TCSIOC (Version 1.1, 24 November 2015) to accommodate more complex logic to support the "TSS On, Anemometer Off" channel. This is a temporary change until the logic can be put into the AOS.

16 November 2015
  • MODS "LOW/LOLO" HEB temperature thresholds lowered due to colder weather (tcs.db)
  • Commented MODS1 out of ALH-MODS.alhConfig file since the instrument is unavailable for monitoring
  • mods1-cam, rm525-3, rm525-4, and rm525-5 are now being monitored by Zabbix. These computers were already coded in the ALH-IT.alhConfig configuration file; the lines of code were uncommented.

14 September 2015
  • Added MODS2 to ALH-MODS.alhConfig file

04 September 2014 (Tucson only)
  • Improved the TCS IOC in the following ways:
    • Any errors which happen in the IOC (not the alarm handler) are now reported to the SYSLOG using LOCAL6. The prefix for these messages is ALH_TCS. This means the messages will be reported to stderr (which is only useful if you run the IOC as we are doing on our workstations) and to the TCS SYSLOG (which is useful in general).
    • Added an internal version string to the IOC source which is printed ("TCS ALH/IOC Version 0.2") to the TCS SYSLOG and can be obtained from the IOC executable directly.
    • Any subsystem which is NOT running, as defined by the dds.running.sss variable, now shows up with a white INVALID "V" designation versus the red ERROR "E" designation for better clarity. The "SSS Run State" channel previously used for this purpose still exists.
    • If the DDS is not running when the IOC is started, the ALH will have white ERROR "E" for all channels. This means the IOC was never started. You will need to start the DDS, kill your ALH and invoke the ALH again - not optimum. This is an issue on programmer workstations only.
    • If the DDS is detected to not be running after the IOC has started, all the TCS IOC channels are displayed as white INVALID "V". Since all channels are set, you will need to acknowledge the "V" once you start the DDS. Do this at the top level (LBTO ack button). Recall we are using a mask on the "SSS Run State" channel so when an individual subsystem goes down and is restarted, the operator does not have to acknowledge the restart. In this case all channels are set, so they must be acknowledged. (SSS = the TCS subsystem name)
    • NOTE: The DDS subsystem only has the "DDS Run State" channel which does not have to be acknowledged. This is why the DDS does not have its acknowledge button set.
    • The IOC now tries to catch all errors gracefully and issues an appropriate message for each error.
    • When parsing the ddChannelMap.cfg:
      • Any line where the first character is a blank or a hash (which denotes a comment) is ignored.
      • Any line with too few or too many tokens is ignored (there should be 3 tokens).
      • Blanks surrounding any token are stripped.
      • The datatype must be short or float and specified as all upper- or lower-case characters.
      • The program keeps track of the bad rows and reports them to stdout so the user can fix the issues. The tokens are determined by parsing a line from the input configuration file which has been obtained by getline(). Therefore, the tokens are delimited by a comma.
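  The parsing rules above can be sketched as follows. This is an illustration in Python, not the IOC source, and it makes several assumptions: the datatype is taken to be the third token, empty lines are skipped silently, and the variable and channel names in the usage example are made up.

```python
VALID_TYPES = {"short", "SHORT", "float", "FLOAT"}  # all lower- or all upper-case only


def parse_channel_map(text):
    """Return (entries, bad_rows) for comma-delimited ddChannelMap-style text."""
    entries, bad_rows = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        # Skip empty lines (assumption) and lines starting with a blank or a hash.
        if not line or line[0] in " #":
            continue
        # Tokens are comma-delimited; surrounding blanks are stripped.
        tokens = [t.strip() for t in line.split(",")]
        if len(tokens) != 3 or tokens[2] not in VALID_TYPES:
            bad_rows.append(number)
            continue
        entries.append((tokens[0], tokens[1], tokens[2].lower()))
    # Report the bad rows to stdout so the user can fix them.
    for number in bad_rows:
        print("bad row at line", number)
    return entries, bad_rows
```

  For example, a line such as `example.var , example:channel , SHORT` yields one entry, while `a,b,Float` (mixed case) and `only,two` (too few tokens) are both recorded as bad rows.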

-- %USERSIG{MicheleDeLaPena - 2014-09-04}%
Topic revision: r21 - 23 Aug 2019, PetrKubanek