General TCS Troubleshooting Tips

Startup

  • All subsystems should be started from the TCSGUI as user tcs. Either invoke the TCSGUI on a cluster computer, or invoke it from an obs computer as user telescope.
  • All subsystems log to the directory /lbt/log: events are written to current.events and log messages to current.log.
  • Some subsystems are configured (in the tcs.conf file) to run on particular servers.
    See the start_on values, for example: map<string,string> start_on IIF:tcs2 DDS:tcs2
    But they can be forced to start on any host by typing the hostname into the box on the GUI before hitting the start button.
  • When subsystems are started normally, stdout and stderr point to the terminal that started the networkserver, which may no longer be available.
    When a subsystem is started on local, both stdout and stderr are available and attached to the controlling terminal. The TCSGUI cannot start a subsystem on local; to do that, run netconfig start sub_name on local from a cluster computer.
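
Since stdout and stderr are often unavailable, the log files are usually the first place to look. The following is a minimal sketch, not a TCS tool: tcs_recent_log is a hypothetical helper name, and it assumes only the /lbt/log directory and the current.log and current.events file names given above. It prints the last few lines in each file that match a pattern such as a subsystem name.

```shell
# Sketch: show recent lines matching a pattern in the two TCS log files.
# tcs_recent_log is a hypothetical helper name, not part of the TCS.
# usage: tcs_recent_log <pattern> [max_lines] [log_dir]
tcs_recent_log() (
    pattern=$1
    max_lines=${2:-20}
    log_dir=${3:-/lbt/log}   # log directory named in the text above
    for f in "$log_dir/current.log" "$log_dir/current.events"; do
        if [ -r "$f" ]; then
            echo "== $f"
            # case-insensitive match, keep only the last max_lines hits
            grep -i -- "$pattern" "$f" | tail -n "$max_lines"
        else
            echo "== $f (not readable)"
        fi
    done
)
```

For example, tcs_recent_log pcs 50 would print up to 50 matching lines from each file.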

Config Files

  • Subsystem public configuration files are located in /home/telescope/TCS/Configuration/<SUBSYSTEM>. These are the files which have been "delivered" to the Science staff, who modify them as necessary.
  • A semi-private configuration file is <subsystem>.conf (e.g., pcs.conf). These subsystem files are part of the build and are located in /lbt/tcs/current/<subsystem>/etc.
  • Any other private configuration files are part of the build and are located in /lbt/tcs/current/<subsystem>/configuration.
  • The locations of the files used by the TCS are found in the <subsystem>.conf files or in the /lbt/tcs/current/tcs/etc/tcs.conf file.
In some cases the configuration area will contain some of the same files as the telescope TCS area. In general, however, these files should NOT be modified: at least in the case of PCS, they are NOT the files actually used by the TCS.
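
The locations above can be listed for one subsystem with a short sketch like the following. tcs_config_paths is a hypothetical helper name. It also assumes that the public directory uses the upper-case subsystem name and the build tree the lower-case name, which is inferred from the <SUBSYSTEM> and <subsystem> placeholders rather than confirmed.

```shell
# Sketch: print the candidate configuration locations for one subsystem
# and whether each exists. tcs_config_paths is a hypothetical helper name.
# usage: tcs_config_paths <subsystem> [build_root] [public_root]
tcs_config_paths() (
    # Assumption: build tree uses lower case, public area uses upper case.
    sub_lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    sub_upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    build_root=${2:-/lbt/tcs/current}
    public_root=${3:-/home/telescope/TCS/Configuration}
    for p in \
        "$public_root/$sub_upper" \
        "$build_root/$sub_lower/etc/$sub_lower.conf" \
        "$build_root/$sub_lower/configuration" \
        "$build_root/tcs/etc/tcs.conf"; do
        if [ -e "$p" ]; then state=present; else state=missing; fi
        # one tab-separated line per location: state, then path
        printf '%s\t%s\n' "$state" "$p"
    done
)
```

For example, tcs_config_paths pcs lists the four locations for PCS. Which file the TCS actually reads must still be checked in the .conf files themselves.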

Network Servers

rpcconfig
rpcconfig is the interface to the rpcserver process, which is the name server for the TCS RPC system. Use rpcconfig to see all the registered methods available on a particular machine.
We have had instances of a subsystem being up and running but having lost some or all of its registered functions. In that case, stop and restart the rpcserver process, then restart the subsystems running on that server.
Usage: rpcconfig command [parameter]
Configure the rpc server

Command                 Description
-------                 -----------
help, -h                Print usage information
start address [warm|passive] [passive|warm]     Start the server
stop address            Stop the server
list, -l                List all parameters
functions, -f [address] List all registered functions
alias, -a [address]     List all aliases 

Use this command on jet to start the rpcserver before starting the MCSPU software: rpcconfig start 192.168.18.170
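
To check whether a subsystem has lost its registered functions, the output of rpcconfig functions can be filtered. The following is a minimal sketch only: rpc_count_functions is a hypothetical helper name, and it assumes that the listing contains one registered function per line and that the subsystem name appears in each function name. The actual output format is not documented here.

```shell
# Sketch: count lines of a function listing (read from stdin) that
# mention a pattern. rpc_count_functions is a hypothetical helper name.
# Assumption: the listing has one registered function per line.
# usage: rpcconfig functions | rpc_count_functions <pattern>
rpc_count_functions() (
    pattern=$1
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps the helper's status at 0
    grep -i -c -- "$pattern" || true
)
```

A count of 0 for a subsystem that is known to be running would suggest the situation described above.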

netconfig
netconfig is the interface to the networkserver process. netconfig provides control of all the subsystems and servers. If you are manually starting/stopping the TCS, use this command. The ps, -l, and -s options are useful to see what's going on. Note that start and stop also start/stop rpcserver and gshmserver, while start/stop all will start/stop all the subsystems on the cluster. The order for starting/stopping all the subsystems is determined by the tcs.conf variable subsystems.

[telescope@tcs2 ~]$ netconfig -h

Usage: netconfig start [left | right] (subsystem | all) [(on | -o) host] [(parameter | -p) param]
Usage: netconfig stop (subsystem | all | local)
Usage: netconfig kill subsystem
Usage: netconfig start syslogserver
Usage: netconfig stop syslogserver

Command                 Description
-------                 -----------
ps                      Show all known TCS processes
top                     Run 'top' to show all known TCS processes
help, -h                Print usage information
version, -v             Print version information
cksum, -k               Print check sum information
config, -c              Print configuration data
list, -l                List all current subsystems
servers, -s             List all current servers
xml, -x                 List all current servers as xml data
delete, -d              Delete the shared memory segment
zero, -z                Zero the shared memory segement
ZERO, -Z                Zero the shared memory segement even if in use
start [active|passive]  Start the server
stop                    Stop the server
status                  Show internal state
remove <server>         Remove server <server> (use with caution!)
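
The ps, -l, and -s options mentioned above can be combined into one snapshot. This is a sketch under the assumption that netconfig is on the PATH of the current user. tcs_snapshot is a hypothetical helper name, and only options listed in the help output above are used.

```shell
# Sketch: run the three read-only netconfig queries in sequence.
# tcs_snapshot is a hypothetical helper name, not part of the TCS.
tcs_snapshot() (
    if ! command -v netconfig >/dev/null 2>&1; then
        echo "netconfig not found in PATH" >&2
        exit 1
    fi
    # ps: known TCS processes, -l: current subsystems, -s: current servers
    for opt in ps -l -s; do
        echo "### netconfig $opt"
        netconfig "$opt"
    done
)
```

Redirecting the output to a file, for example tcs_snapshot > /tmp/tcs_state.txt, keeps a record of the state before any subsystem is restarted.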

gshmconfig
gshmconfig is the interface to the gshmserver process. Use this command to configure and control the reflective memory. See Software/SharedMemoryProblems#gshmconfig

syslogserver
This process is the 'listener' that receives syslog output directed to facility local6 and actually writes the log file. It runs on only one machine in the cluster, determined by the DNS entry tcslog and the environment variable TCSSYSLOGRUNON.

Telemetry

The most likely causes of telemetry errors are failures in directory creation or in changing the group on the files. Telemetry complaints will therefore almost always appear at startup or rollover.

The /lbt/telemetry_data file system should be set up so that the tcs subdirectory has group write permission and belongs to group telemetry. On jet, if the chgrp fails when creating a new file, the group will be domain users. If you see any telemetry directories or files with this group, the chgrp has likely failed. Delete the file and restart the subsystem.
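
Files and directories with the wrong group can be found with find. The following is a minimal sketch: telemetry_wrong_group is a hypothetical helper name, and the default path /lbt/telemetry_data/tcs is an assumption based on the description above.

```shell
# Sketch: list everything under a directory whose group is not the
# expected one. telemetry_wrong_group is a hypothetical helper name.
# usage: telemetry_wrong_group [dir] [group]
telemetry_wrong_group() (
    dir=${1:-/lbt/telemetry_data/tcs}   # assumed location of the tcs subdirectory
    group=${2:-telemetry}
    # "! -group" selects entries that do NOT belong to the given group
    find "$dir" ! -group "$group" -print
)
```

Empty output means every entry has the expected group. Any path that is printed is a candidate for the delete-and-restart step described above.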

sample mcspu log complaints:
  Mar 10 17:07:31.731: EL telem: Exception in Sample_buf constructor! 'step2' : Failed to open file for stream servo_data because unable to create file 
Topic revision: r17 - 22 Aug 2019, PetrKubanek