Full Shutdown and Restart of the TCS

These directions describe the full shutdown and restart procedures for the LBT Telescope Control System (TCS). If the software appears to have serious problems, these procedures are the most reliable way to get a clean restart. Note that some circumstances do not require a full shutdown. This document does not cover how to diagnose specific problems encountered on the mountain systems.

The following text conventions are used in this document.
Descriptive information is in this font.
Words you should type are in a bold fixed font.
Computer responses are in a smaller fixed font.

Note: Please notify the telescope work and instruments mailing lists when the TCS is restarted, because some instrument software then needs to be restarted as well.

Background Information

Before you begin:
  • The downtown cluster computer name is tcs-test
  • On the mountain the cluster computer names are tcs1 and tcs2 but most control can be done from any obs computer.
  • MCSPU machine is jet
  • AGW control is oac
  • Become user "telescope" on robs1 downtown, or on any obs computer on the mountain, before performing these operations.
  • To log in to a mountain cluster computer you must become user telescope on an obs computer and then do
    ssh tcs@tcsn where n is 1 or 2. No password is needed.
    The user tcs has sudo permission.
    (Note, however, that developers may directly log into tcs1, tcs2 as user tcs from their desktops.)
  • To log in to tcs-test first become user telescope on robs1 and then do ssh tcs@tcs-test. No password is needed. The user tcs has sudo permission.
    (Note, however, that developers may directly log into tcs-test as user tcs from their desktops.)
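
The mountain login hop described in the list above could be wrapped in a small shell helper. This is only a sketch: the function names tcs_target and tcs_login are hypothetical (they are not part of the TCS), and it assumes you are already user telescope on an obs computer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a cluster number (1 or 2) to the ssh target
# described above (tcs@tcs1 or tcs@tcs2).
tcs_target() {
    case "$1" in
        1|2) printf 'tcs@tcs%s\n' "$1" ;;
        *)   echo "usage: tcs_target 1|2" >&2; return 1 ;;
    esac
}

# Open a shell on the chosen cluster computer; no password is expected.
tcs_login() {
    local target
    target=$(tcs_target "$1") || return 1
    ssh "$target"
}
```

For example, tcs_login 2 would run ssh tcs@tcs2, and any other argument is rejected before ssh is invoked.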

Software Systems
AOS - Adaptive Optics Subsystem
DDS - Data Dictionary Server subsystem
ECS - Enclosure Control Subsystem
ENV - Environmental Subsystem
GCS - Guiding Control Subsystem
IIF - Instrument Interface Subsystem
LSS - Logging Subsystem
MCS - Mount Control Subsystem
OSS - Optical Support Structure Subsystem
PCS - Pointing Control Subsystem
PMC - Primary Mirror Cell Subsystem
PSF - Point Spread Function Subsystem

The following subsystems have a left and a right component: AOS, GCS, PSF, PMC. As such, when these subsystems appear in on-line information, you will often see PMCL or PMCR for the left and right PMC subsystems (for example).
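
When scripting against subsystem names, the left/right naming can be expanded mechanically. The following is a minimal sketch; expand_sides and SIDED are hypothetical names used only for illustration, and the list of sided subsystems is taken from the paragraph above.

```bash
#!/usr/bin/env bash
# Subsystems with left and right components, as listed above.
SIDED="AOS GCS PSF PMC"

# Print the on-line name(s) for a subsystem: two names (L and R) for a
# sided subsystem, the plain name otherwise.
expand_sides() {
    local name=$1 s
    for s in $SIDED; do
        if [ "$s" = "$name" ]; then
            echo "${name}L ${name}R"
            return 0
        fi
    done
    echo "$name"
}
```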

The TCS system should always be running as the user tcs on the mountain cluster machines.

Shutdown/Startup Using the TCS GUI

The TCSGUI (started from an eyebrow button, or command line) should be used to stop and start subsystems and GUIs in normal operation. Start the required subsystems first, and then the desired GUIs.
If something fails, or hangs, see Using Manual Tools on how to stop/restart manually.


TCSGUI-Annotated.jpg

Stopping the systems

As user telescope on any obs computer, do the following steps...

  1. Click the Stop all GUIs button on the TCSGUI to close all GUIs running on the cluster. Because the user tcs has sudo privileges all GUIs will be stopped.
  2. Click Stop all subsystems on the TCS GUI to stop all running subsystems. Wait for the command to complete. Check for any running subsystems by using the Status box Subsystem status button. When the "Subsystem status" button is clicked a window will pop up with the results of the operation. If any subsystems are listed, kill the subsystem using the TCSGUI. Check "Subsystem status" again and verify that no subsystems are running on any cluster computers. All subsystem boxes should be showing red background with text of "Stopped".
  3. Use the TCSGUI to stop the network servers on all cluster machines by clicking the Stop all button in the Network Servers box. A window will pop up for each computer as its servers are stopped. Double-check that no servers are running by using the Status box Status button with a computer name in the edit box to its right. If any network server will not stop, use killall -9 server_name on the appropriate computer. The TCSGUI background will turn blue.
  4. Close the TCSGUI.
  5. If the IRTC camera is being used, stop the "irs" service.
    Login to the irs machine as telescope.
%CODE{lang="bash"}%
ssh telescope@irs
sudo systemctl stop lbto-irs
%ENDCODE%
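
For step 3, the killall -9 fallback can be guarded so that it only fires when the process is still present. This is a sketch, not an official tool: force_stop is a hypothetical name, and server_name remains a placeholder exactly as in the step above. It relies on the standard pgrep and killall utilities.

```bash
#!/usr/bin/env bash
# Force-stop a process by exact name, but only if it is still running.
# The name passed in is a placeholder; substitute the real server name.
force_stop() {
    local name=$1
    if pgrep -x "$name" >/dev/null 2>&1; then
        killall -9 "$name"
    else
        echo "$name is not running"
    fi
}
```

Run it on the computer where the server refuses to stop, for example force_stop server_name.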

Starting

When the systems have been completely shut down, the TCS can be restarted:
  1. Bring up a new TCS GUI on the obs machine as user telescope using the icon, or via the command line (TCSGUI).
  2. Select Start all from the Network Servers box in the upper left to start the network servers on all computers in the cluster. This operation also clears the contents of shared memory on each computer.
    A window will pop up as the servers are started on each computer. The window should show the three servers. The TCSGUI background should be gray. If it is yellow it is out of sync with the version of the TCS servers that were just started. This is a problem - notify software support.
  3. Pause for several seconds.
  4. Click Start to start the LSS, DDS, IIF, and ENV subsystems in the Subsystems box in the middle of the TCSGUI. The other subsystems should be started as required. If all the subsystems are needed, the Start all subsystems button will start them in the preferred order.
  5. If the IRTC is in use, start the "irs" service. Log in to the irs machine (ssh telescope@irs) and type sudo systemctl start lbto-irs.
  6. If the tcs machines were rebooted, restart the TCS-to-big-displays web display application on tcs2 (see restart the web services).

MCSPU

MCSPU is separate from the TCS subsystems. See the notes on the Operators web for how to start up.

Note that if the rpc server is not running on jet, the MCSPU system will not come up. To test whether the rpc server is running on jet, log in to jet as user telescope and type

%CODE{lang="bash"}% rpcconfig -l %ENDCODE%

If it prints "rpc server not started", then you need to start it. To start the rpc server, log in to jet as user telescope and type

%CODE{lang="sh"}% rpcconfig start 192.168.18.170 %ENDCODE%

The IP argument should be the IP address of jet. To start the MCSPU software, type "./gomcspu".

%CODE{lang="sh"}%
./gomcspu
rpcconfig -l
%ENDCODE%

The ./gomcspu command starts the MCSPU software and opens a copy of the Engineering Interface. It takes a few minutes to come up. If the mcspu is already running on the jet, "./gomcspu" will not start another copy. When MCSPU software is running, the -l option of the rpcconfig command should show you something like:

%CODE{lang="sh"}%
[telescope@jet ~]$ rpcconfig -l
ADDRESS #FUNCTIONS #ALIASES
---------- ---------
192.168.18.170 22 0
%ENDCODE%
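
The two rpcconfig -l outcomes described above can be told apart in a script. The sketch below is based only on the sample output and the "rpc server not started" message quoted in this section; the exact output format, and whether the message goes to stdout or stderr, are assumptions. rpc_state is a hypothetical name.

```bash
#!/usr/bin/env bash
# Read `rpcconfig -l` output on stdin and classify it.
# Prints "not-started" if the quoted message appears, "running" if a line
# starting with an IPv4 address is present, and "unknown" otherwise.
rpc_state() {
    local out
    out=$(cat)
    if printf '%s\n' "$out" | grep -q 'rpc server not started'; then
        echo "not-started"
    elif printf '%s\n' "$out" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[[:space:]]'; then
        echo "running"
    else
        echo "unknown"
    fi
}
```

On jet this would be used as rpcconfig -l 2>&1 | rpc_state. Treat an "unknown" result as a prompt to inspect the output by hand rather than as a failure.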

To stop the mcspu, type "stopall" in the Engineering Interface window. It takes a minute to stop.

In very rare cases two copies of the MCSPU software can be running at the same time, which produces unpredictable behaviour. Use "ps aux | grep mcspu" to see if there is another copy of mcstemp running (the executable is a file called mcstemp). There should never be more than one mcspu running. If there is, kill it before starting mcspu again.
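
A plain grep over ps output also matches the grep process itself, so counting by exact command name can be less ambiguous. The following is a sketch under the assumption, taken from the paragraph above, that the executable is named mcstemp; count_mcstemp is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Count processes whose command name is exactly "mcstemp".
# Reads one command name per line on stdin, e.g. from `ps -eo comm=`.
count_mcstemp() {
    awk '$1 == "mcstemp" { n++ } END { print n + 0 }'
}
```

On jet, ps -eo comm= | count_mcstemp prints the number of running copies; any value greater than 1 indicates the duplicate situation described above.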

Full Manual Shutdown/Startup

The manual process should only be done by members of the software team. Operations personnel should use the TCSGUI for control of the TCS system.
In particular, there should be no need to start up the system manually instead of using the TCSGUI.

If you need to manually shutdown and startup the TCS systems, follow the procedure in the TCS Software Activation page, under "Using Manual Tools".

Restarting TCS on a Single Machine

If just one computer needs restarting (for instance, server after a reboot), you must first run netconfig remove server on another cluster computer, where server is the name of the computer being restarted. This cleans up the cluster status so that all cluster members know the server is not in the cluster. Then, on the computer being restarted (as user tcs):

%CODE{lang="sh"}%
rm /var/tmp/*.conf
gshmconfig -z
netconfig start
%ENDCODE%

The first line removes the TCS state files which keep track of where subsystems are running, and what RPCs are registered. The second line clears the reflective memory segment, and the third starts the TCS network servers. The -z operation is not strictly necessary but recommended. It isn't necessary because when a new computer joins an existing cluster, the entire contents of reflective memory are broadcast so the new system has a current set of data.
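
The three commands above can be wrapped with a user check and a dry-run switch so the sequence can be reviewed before it is executed. This is a sketch only: restart_local_tcs, run, and the DRY_RUN variable are hypothetical names, not part of the TCS tools. The commands themselves and their order are the ones given in this section.

```bash
#!/usr/bin/env bash
# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Rejoin the cluster on this computer; must be run as user tcs.
restart_local_tcs() {
    if [ "${DRY_RUN:-0}" != "1" ] && [ "$(id -un)" != "tcs" ]; then
        echo "must be run as user tcs" >&2
        return 1
    fi
    run rm /var/tmp/*.conf   # remove the TCS state files
    run gshmconfig -z        # clear the reflective memory segment
    run netconfig start      # start the TCS network servers
}
```

With DRY_RUN=1 restart_local_tcs the three commands are printed with a leading "+" instead of being executed. Remember that netconfig remove server must already have been run on another cluster computer.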

When a whole new TCS is started using the TCSGUI, the above commands are performed on each computer in the cluster with a few seconds delay between each one.

DDViewer

There is also a GUI which allows one to view reflective memory in its entirety which can be useful for debugging purposes:

%CODE{lang="bash"}% DDViewer & %ENDCODE%

DDViewer.png

Revision History

Michele De La Peña 17 July 2006
Updated J. Hill 11 Feb 2007
Updated M. De La Peña 04 Dec 2007
Updated C. Biddick 01 Dec 2008
Updated N. Cushing 07 Oct 2009
Updated C. Biddick 08 Oct 2009
Updated C. Biddick 01 Mar 2010
Updated C. Biddick 10 Nov 2010
Updated, K. Summers July 2013
Updated, C. Biddick October 2015
Updated, C. Biddick October 2016
Updated, P. Kubánek October 2019
Topic revision: r50 - 11 Feb 2020, MatthieuBec