How to recover ARGOS
Procedure to recover from BCU crash
Usual symptom: CCD Viewer does not update
- Set LAS and LAN in simulation (Arbitrator GUI: Tab Expert -> Tab Commissioning)
- Open the LGSW engineering GUI and go to Tab 'Rack I/O' (press the button beside LGSW)
- Reset the BCU by setting 'BCU Reset' to 'ON' and then back to 'OFF' after a few seconds
- On the 'Main' tab of the arbitrator click on 'Clear Error'.
- Go through the power up sequence until LGSW is in 'STARTED UP'
- Put LAS and LAN back in operation (Arbitrator GUI: Tab Expert -> Tab Commissioning)
- Press 'Prepare'
Troubleshoot:
- Putting LAS and LAN back in operation (step 6) fails with 'not allowed to change state': LAN and LAS have not reached 'STARTED UP' or 'READY_TO_OBSERVE'
- Start the power up sequence from scratch with LAN and LAS in operation; you may have to force the arbitrator state to UNKNOWN first.
- If the connection to AOS, FLAO or AdSec is not working properly:
- Restart the arbitrator and repeat the procedure (remember to restart the arbitrator GUI as well if you restart the arbitrator process)
Procedure to recover from DEAD_DEVICE_SERVER
- Restart the dead server (some fixing may be needed first, since servers normally do not die without a reason)
- Clear the error with 'Clear Error' on the 'Main' tab of the arbitrator
- Open LAS controller GUI, Tab 'Commissioning', skip to the previous state (at night time normally 'READY_TO_OBSERVE'), close GUI
- Open LAN controller GUI, Tab 'Commissioning', skip to the previous state (at night time normally 'READY_TO_OBSERVE'), close GUI
- Set LAS and LAN in simulation (Arbitrator GUI: Tab Expert -> Tab Commissioning)
- Go through power up sequence until subsystems are synchronized again (normally Prepare)
- [Night time only] Put LAS and LAN back in operation (Arbitrator GUI: Tab Expert -> Tab Commissioning)
Procedure to recover from "No slopes are received by ASM" (i.e. failure of the fast link to the ASM)
This could be caused by the fiber switch, but experience has shown that the problem lies with the
BCUs, and so far only the big hammer is known to solve it:
- Open the LGSW engineering GUI and go to Tab 'Rack I/O' (press the button beside LGSW)
- Reset the BCU by setting 'BCU Reset' to 'ON' and then back to 'OFF' after a few seconds
- Do an Adsec_stop and follow the Adsec_stop recovery procedure
Procedure to recover from Adsec_stop
In principle, after an Adsec_stop/Adsec_start everything reconnects and looks fine. However, experience has shown that this is not the full truth. We have often seen FLAO fail on the next preset after such an intervention, and after some debugging we usually ended up restarting FLAO.
Then, on the following preset, ARGOS often had no communication. We therefore recommend restarting FLAO and the ARGOS arbitrator immediately
after such an intervention.
- In parallel:
- Start AdSec
- Set w-unit to "Operate" and start software (w_start)
- Start ARGOS arbitrator (monit interface at argos-sx-lgsw:2812 or argos-dx-lgsw:2812 for the SX or DX side, respectively)
- Reconnect AOS to ARGOS
- Put LAS and LAN in simulation (Arbitrator GUI: Tab Expert -> Tab Commissioning)
- Go through power up sequence until subsystems are synchronized again (normally Prepare)
- [Night time only] Put LAS and LAN back in operation (Arbitrator GUI: Tab Expert -> Tab Commissioning)
Procedure to restart AOS
- Open the loop if it has not already opened by itself
- Restart AOS from TCSGUI
- Set shell from AdSec Controller GUI
- Set focal station to "bentGregorianArgos" in the "Focal station" tab of the AdSec Controller GUI
- Put LAS and LAN in simulation (Arbitrator GUI: Tab Expert -> Tab Commissioning)
- Go through power up sequence until subsystems are synchronized again (normally Prepare)
- [Night time only] Put LAS and LAN back in operation (Arbitrator GUI: Tab Expert -> Tab Commissioning)
Procedure to restart ARGOS arbitrator
- Restart ARGOS arbitrator (monit interface)
- [Optional] Restart the ARGOS arbitrator GUI. Sometimes the arbitrator GUI gets disconnected (i.e. it shows wrong states)
- Put LAS and LAN in simulation (Arbitrator GUI: Tab Expert -> Tab Commissioning)
- Go through power up sequence until subsystems are synchronized again (normally Prepare)
- Put LAS and LAN back in operation (Arbitrator GUI: Tab Expert -> Tab Commissioning)
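Before restarting the arbitrator through monit, it can help to confirm that the monit web interface is reachable at all. The following is a minimal sketch that assumes only the host names and port 2812 given in the Adsec_stop procedure (2812 is also monit's default HTTP port). It only tests whether a TCP connection can be opened; it does not talk to monit or restart anything.

```python
import socket

# Host names and port as listed in the Adsec_stop procedure; adjust if they change.
MONIT_HOSTS = {"SX": "argos-sx-lgsw", "DX": "argos-dx-lgsw"}
MONIT_PORT = 2812


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections and timeouts.
        return False


def check_monit_interfaces(timeout: float = 2.0) -> dict:
    """Map each side (SX/DX) to whether its monit interface accepts TCP connections."""
    return {side: port_open(host, MONIT_PORT, timeout) for side, host in MONIT_HOSTS.items()}
```

Run from a machine on the ARGOS network, check_monit_interfaces() returns a dict such as {'SX': True, 'DX': True} when both interfaces are up. A False entry points at the network or the LGSW machine rather than at the arbitrator itself.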
Troubleshoot LAT
LAT can fail or not align for the following reasons:
- The LAT alignment does not reach the patrol field
- Nominal positions can be wrong
- Rarer case: LM1 is not working properly
- Rarer case: Calibration is wrong
- Alignment algorithm fails
ARGOS is not connected to the AOS
- Is the AOS enabled for ARGOS? This is done automatically when the telescope is authorized for LUCI-ARGOS.
- Was other AO work done during the day (e.g. LBTI or LN)? If so, you must restart the AOS.
Network card failure
- Commands for diagnosing the network and network cards, as well as the configuration of the ARGOS network, can be found at https://wiki.lbto.org/Software/WikiIPAddressesAtLBT and https://wiki.lbto.org/Software/WikiARGOSWorkstations
- The network configuration is also part of the repository; make permanent changes there!
- If the network cards have been reconfigured on an LGSW machine, make sure that argos_?X_BCU.cfg is updated with the proper MAC address for diagnostics.
- If the system does not work right away on the LGSW machine, also try power cycling the LGSW cabinet rather than only restarting processes.
- Don't forget to check config files into the repository for permanent updates!
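After reconfiguring network cards, one way to spot a stale MAC address in argos_?X_BCU.cfg is to compare the MAC addresses the kernel reports with those mentioned in the file. This is a minimal sketch under two assumptions: the LGSW machine runs Linux (MACs are read from /sys/class/net/<iface>/address), and the cfg file writes MACs in the usual colon-separated hex form. The actual file layout is not described here, so the script simply searches for MAC-like tokens; the function names are hypothetical helpers, not part of the ARGOS software.

```python
import re
from pathlib import Path

# Matches colon-separated MACs such as 00:1a:2b:3c:4d:5e (assumed format).
MAC_RE = re.compile(r"\b(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}\b")


def macs_in_file(cfg_path: str) -> set:
    """Return all MAC-like tokens found in a config file, lower-cased."""
    text = Path(cfg_path).read_text(errors="replace")
    return {m.lower() for m in MAC_RE.findall(text)}


def local_macs(sys_net: str = "/sys/class/net") -> dict:
    """Map interface name -> MAC address as reported by Linux sysfs."""
    result = {}
    for iface in Path(sys_net).iterdir():
        addr_file = iface / "address"
        if addr_file.is_file():
            mac = addr_file.read_text().strip().lower()
            # Skip empty and all-zero addresses (e.g. the loopback interface).
            if mac and mac != "00:00:00:00:00:00":
                result[iface.name] = mac
    return result


def stale_macs(cfg_path: str, sys_net: str = "/sys/class/net") -> set:
    """MACs named in the cfg file that no local interface currently has."""
    return macs_in_file(cfg_path) - set(local_macs(sys_net).values())
```

If the cfg file also lists MACs of other devices (for example the BCUs themselves), those will show up in stale_macs() too, so treat the result as a list to inspect rather than a verdict.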
--
LorenzoBusoni - 23 Feb 2018