Software Application-level Checkouts After Power Up

After infrastructure, computers, TCS, and instruments are restarted following a power outage and the basic functionality associated with these items has already been verified, the following additional items should also be checked by TCS/Software personnel.

General

  • All mount points can be checked with "df" and using the reference informationTechnology/MountainNFSMounts.
  • Not all telemetry streams are necessarily collecting data under all circumstances. The notes below suggest the particular stream or place to check in order to ensure telemetry is being written for that software component. If you are checking the actual HDF files, you may have to wait a short while for the buffered data to be written to the file. You can check the file size, or view the actual data stream quickly by using "hdfview-2.9.sh nameOfTelemetryFile". Clicking on the stream name in the left frame (e.g., someData_01) displays the data in tabular format.
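
The file-size check mentioned above can be sketched as a small shell function. This is only an illustration: the function name file_is_growing and the 30 second default wait are assumptions, not part of the existing tools, and the right wait time depends on how long the writer buffers data.

```shell
# file_is_growing FILE [WAIT_SECONDS]
# Hypothetical helper: succeeds (exit 0) if FILE gets larger within
# WAIT_SECONDS (default 30, an assumed value), fails otherwise.
file_is_growing() {
    local file="$1" wait_s="${2:-30}" before after
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 2; }
    before=$(( $(wc -c < "$file") ))   # size in bytes before waiting
    sleep "$wait_s"
    after=$(( $(wc -c < "$file") ))    # size in bytes after waiting
    [ "$after" -gt "$before" ]
}
```

Example use: file_is_growing nameOfTelemetryFile 60 && echo "telemetry is being written"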

Web Clusters

  • The web clusters should start on their own (both Tucson and mountain webs are clusters).
    Check the status of the cluster by running: sudo pcs cluster status on the machine.
[web@web ~]$ sudo pcs cluster status
Cluster Status:
 Last updated: Wed Aug  2 15:12:36 2017         Last change: Fri Jul 21 20:31:02 2017 by root via cibadmin on web1.mountain.lbto.org
 Stack: corosync
 Current DC: web1.mountain.lbto.org (version 1.1.13-a14efad) - partition with quorum
 2 nodes and 3 resources configured
 Online: [ web1.mountain.lbto.org web2.mountain.lbto.org ]

PCSD Status:
  web1.mountain.lbto.org: Online
  web2.mountain.lbto.org: Online
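
To avoid reading the status by eye, the "Online:" line can be checked with a short filter. This is a sketch that assumes the output layout shown above; the function name nodes_online is made up for this example.

```shell
# nodes_online NODE...
# Hypothetical helper: reads `pcs cluster status` text on stdin and succeeds
# only if every NODE appears on the "Online: [ ... ]" line.
nodes_online() {
    local online node
    # Keep only the text between the brackets of the "Online:" line.
    online=$(sed -n 's/^ *Online: *\[\(.*\)].*$/\1/p')
    for node in "$@"; do
        case " $online " in
            *" $node "*) ;;                                   # node is listed
            *) echo "not online: $node" >&2; return 1 ;;
        esac
    done
}
```

Example use: sudo pcs cluster status | nodes_online web1.mountain.lbto.org web2.mountain.lbto.org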

MCSPU

  • Log on to jet.mountain.lbto.org as user telescope and make sure the mcstemp software is running via "ps -ef | grep mcs".
  • Bring up an Engineering GUI by typing "./mcsDisplay/mcsDisplay" (make sure your terminal window is at least 55 lines long). If the I/O Loops indicator at the top of the POS page (azimuth and elevation coordinate display) is incrementing, and there are no error messages regarding the DSPs in the message area, then the MCSPU should be running fine.
  • Make sure the mcspu is writing telemetry. On the above POS page, the telemetry label (about two-thirds of the way down the display) should have incrementing values for azimuth and elevation. You can also check /lbt/data/telemetry/tcs/mcspu from an obs# machine (streams "az.servo_data" or "el.servo_data").
  • Close the Engineering GUI by typing "stop" and log off. If you want to continue to use the terminal window, you may have to type "stty sane" to get the display to act in a normal fashion again.

TCS

  • Check that all the disk mounts are available.
  • Make sure the subsystems are writing telemetry (check /lbt/data/telemetry/tcs/SUB from an obs# machine) where SUB = subsystem name.
  • Make sure the PMC vxworks crates are booting correctly.
    There are notes on /lbt/vxworks/ (mounted on the obs machines) that have the boot parameters and boot script for these machines.
    They may have to be powered up manually.
    If they don't boot, the FTP server may not be available on the boot host (as of May 2016, the host was web.mountain.lbto.org); check that SELinux is disabled.
  • dstats on the TCS machines should be running after a reboot. top, however, is not a service; it is a cron job that gets restarted at 00 UTC. Start it by hand if you like: /home/tcs/bin/top.sh > /dev/null 2>&1

Virtual Machines

  • Log on to vm2.mountain.lbto.org as root.
  • Run the command "vim-cmd vmsvc/getallvms". The VMs of interest are "mountainapp1", "linuxapps" and "mt-archive". Note the value in the vmid column for each of them.
  • For each of the *vmid*s you found above run "vim-cmd vmsvc/power.getstate *vmid*".
  • If the power reports as off, then run "vim-cmd vmsvc/power.on *vmid*".
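
The three steps above can be strung together in a loop. The sketch below assumes that the getallvms listing has a header line followed by one row per VM, with the vmid in the first column and the name in the second, and that the VM names contain no spaces. The function name vmids_for is made up for this example.

```shell
# vmids_for NAME...
# Hypothetical helper: reads `vim-cmd vmsvc/getallvms` output on stdin and
# prints the vmid (column 1) of every VM whose name (column 2) is listed.
vmids_for() {
    awk -v names="$*" '
        BEGIN { n = split(names, want, " "); for (i = 1; i <= n; i++) keep[want[i]] = 1 }
        NR > 1 && ($2 in keep) { print $1 }   # NR > 1 skips the header line
    '
}

# Example loop (commented out so that loading this file changes nothing):
# for id in $(vim-cmd vmsvc/getallvms | vmids_for mountainapp1 linuxapps mt-archive); do
#     vim-cmd vmsvc/power.getstate "$id"
# done
```

Check each reported state before running power.on for that vmid, as described in the last step.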

AGW

  • Make sure the appropriate AIP AGW units are started with startAGW -u n. This powers the UMAC controller as well as the cameras. Check that the UMAC controllers are up: getdata -u n should return various status information about the AGW. If nothing works, check the oac computer and make sure the drivers are loaded and the oacserver service is running. See Commissioning/AGwStages. The oacontrol AGW unit numbers are:
    • 1 - left front (LUCI1)
    • 2 - right front (LUCI2)
    • 3 - left direct (PEPSIPOL1)
    • 4 - right direct (PEPSIPOL2)
    • 7 - left PEPSIPFU
    • 8 - right PEPSIPFU

  • AGWs 3,4,7,8 azcamserver computers are powered on with the startAGW command.
    Note: Reportedly, the AGWs must be restarted first and the AzCam servers (AGW7-CAM and AGW8-CAM) rebooted afterwards, or the cameras will not work properly.
    (Open question: is that really true?)
  • AGWs 1,2,5,6 azcamserver computers should power up with the automated startup.
  • Make sure the AGW cameras are working and that there is communication with GCS. Do this by selecting the desired AGW and using the selectAGW, readGuideCam, and readWFSCam commands (see Commissioning/RunningGCS) to take images with both the Guide and WFS cameras. View the images using ds9. Note: selecting an AGW moves internal components in the AGW, so first make sure it is OK and safe to select that particular AGW. Do this for both LUCI and MODS.
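
When scripting these checks, the unit numbers listed above can be kept in a small lookup so that the output names the focal station. This is a sketch only; the function name agw_unit_name is made up, and units 5 and 6 are deliberately left out because this page does not say what they are.

```shell
# agw_unit_name N
# Hypothetical helper: prints the focal station for oacontrol AGW unit N,
# using only the unit numbers listed on this page.
agw_unit_name() {
    case "$1" in
        1) echo "left front (LUCI1)" ;;
        2) echo "right front (LUCI2)" ;;
        3) echo "left direct (PEPSIPOL1)" ;;
        4) echo "right direct (PEPSIPOL2)" ;;
        7) echo "left PEPSIPFU" ;;
        8) echo "right PEPSIPFU" ;;
        *) echo "unknown AGW unit: $1" >&2; return 1 ;;
    esac
}
```

Example use: for n in 3 4 7 8; do echo "AGW $n: $(agw_unit_name "$n")"; getdata -u "$n"; done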

OBS / Remote OBS Machines

  • Check that the /Repository (/lbt/data/repository) and /newdata mounts are correct.
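
A quick way to confirm that a mount point is really mounted (rather than just an empty directory) is to look it up in the kernel's mount table. The sketch below assumes a Linux machine with /proc/mounts. The function name missing_mounts and the MOUNTS_FILE override are made up for this example. If /Repository is a symbolic link, check the underlying path (/lbt/data/repository) instead.

```shell
# missing_mounts MOUNTPOINT...
# Hypothetical helper: prints every MOUNTPOINT that is not in the mount table
# and returns non-zero if any is missing. MOUNTS_FILE may point at another
# file in /proc/mounts format; it defaults to /proc/mounts.
missing_mounts() {
    local table="${MOUNTS_FILE:-/proc/mounts}" mp rc=0
    for mp in "$@"; do
        # Field 2 of each mount table line is the mount point.
        if ! awk -v mp="$mp" '$2 == mp { found = 1 } END { exit !found }' "$table"; then
            echo "$mp"
            rc=1
        fi
    done
    return $rc
}
```

Example use: missing_mounts /lbt/data/repository /newdata || echo "some mounts are missing"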

IRS

  • If the IRTC instrument is on the telescope, the IRS service must be running on irs. Information about the IRS is found in /lbt/irtc/current/etc.
  • When the IRTC instrument is on the telescope, /newdata must be mounted on irs.

IOC, ALH

  • Verify that the two IOCs and the EPICS gateway are running on ioc:
    ssh tcs@ioc.mountain.lbto.org
    [tcs@tcs1 ~]$ /etc/init.d/epics-ioc status
    tcsioc is running; itioc is running                        [  OK  ]
    [tcs@tcs1 ~]$ ps -ef | grep gateway
    ioc      39187     1  0 Sep08 ?        00:00:00 /lbt/epics/bin/linux-x86_64/gateway -log /lbt/data/logs/alh/gateway.log ...
    ioc      39188 39187  0 Sep08 ?        01:45:14 /lbt/epics/bin/linux-x86_64/gateway -log /lbt/data/logs/alh/gateway.log ... 
  • Check that the IOCs are writing to /lbt/log/current.log. The IOCs run on tcs1, and the syslog daemon there must have the correct configuration for local6.
  • If you want to restart the TO's ALH running on obs1:
    ssh -X telescope@obs1.mountain.lbto.org
    setenv DISPLAY :0
    ALH & 
  • Channels that are only written periodically need to be initialized; this applies to the MODS instrument channels and the IT channels. There are scripts in /home/telescope/bin to initialize these channels:
    ssh telescope@obs3.mountain.lbto.org
    emailMODSErrors.sh
    emailMODS2Errors.sh
    setITChannels.sh 
  • For more info, see the notes at the bottom of: Software/AlarmHandlerBuildInstall64
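
The IOC status line shown above can also be checked by a script. This sketch assumes the status text has the form "NAME is running", as in the sample output; the function name iocs_running is made up for this example.

```shell
# iocs_running NAME...
# Hypothetical helper: reads status text on stdin and succeeds only if each
# NAME is reported as "NAME is running".
iocs_running() {
    local status ioc
    status=$(cat)                       # slurp the whole status text
    for ioc in "$@"; do
        case "$status" in
            *"$ioc is running"*) ;;     # this IOC is reported as running
            *) echo "not running: $ioc" >&2; return 1 ;;
        esac
    done
}
```

Example use: /etc/init.d/epics-ioc status | iocs_running tcsioc itioc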

Big Displays

  • Check that wda.sh is running on tcs1:
    ssh tcs@tcs1.mountain.lbto.org
    [tcs@tcs1 ~]$ ps -ef | grep wda
    tcs      46549     1  0 Jun19 ?        01:03:39 /bin/bash /home/tcs/bin/wda.sh
  • Make sure all of the big displays are running by using "vncviewer" to look at each of them. The current IP addresses for the displays are listed at http://info.mountain.lbto.org/displays/displays.php. When the big displays are rebooted, they automatically load and run the latest version of the designated GUI software.

OVMS

  • Make sure the mounts are OK:
    disk.mountain.lbto.org:/FILESYSTEMS/tel-hdf      .... 89% /lbt/telemetry_data/ovms
    disk.mountain.lbto.org:/FILESYSTEMS/lbt/i386/UT  ...  51% /lbt/UT 
  • See Software/OVMSSoftwareTroubleshooting
    • If the telescope has power, make sure the UEI software is running on the racktangle:
      1. Check the graphs: http://statserv.lbto.org/indi_ao/www/ovms2.html
      2. If the graphs are NOT OK, log on to the UEI and check it, and run ovmsSniffer on the ovms machine.
    • Resume telemetry ( ovmsTelemetryClient -r ) and make sure it writes a file to /lbt/telemetry_data/ovms/YYYY/MM/DD/. You can check the telemetry file from any obs# machine, but the resume of telemetry has to be done from the ovms machine.
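
To find the directory that the resumed telemetry should be writing to, the YYYY/MM/DD path can be built from the date. This sketch assumes the directories are named by UTC date, which this page does not state; the function name ovms_day_dir is made up for this example.

```shell
# ovms_day_dir [BASE] [YYYY-MM-DD]
# Hypothetical helper: prints BASE/YYYY/MM/DD. BASE defaults to
# /lbt/telemetry_data/ovms and the date defaults to today in UTC
# (UTC is an assumption, not confirmed by this page).
ovms_day_dir() {
    local base="${1:-/lbt/telemetry_data/ovms}" day="${2:-$(date -u +%Y-%m-%d)}"
    echo "$base/$(echo "$day" | sed 's|-|/|g')"   # 2017-08-02 becomes 2017/08/02
}
```

Example use: ls -l "$(ovms_day_dir)"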

DIMM

  • Make sure the mounts are correct - DIMM writes to telemetry on /lbt/telemetry_data/tcs/dimm
    Execute mount -a as root if it is not there.
  • Run DIMM (if there are no objections on the mountain - DIMM does move when the mount connects) and connect to the mount, camera, and focuser
    (see Starting up the DIMM control software)

Weather

  • If the power has been off, restart the weather station and AllSky applications
    (see Software/WeatherStationSoftware).
  • If the data doesn't come up when the software starts, try rebooting the PC - especially if it's both front and rear data that doesn't show up.

Archive

  • See Archive Software Startup Procedure for how to make sure the mountain and Tucson archive software is up and running.
    Keep in mind that the Tucson software MUST be restarted AFTER the mountain software is restarted.
  • Email archive@lbto.org to ask Cristina and company to make sure the archive software is happy.
  • Take an image with an instrument and make sure it reaches newdata and is then pushed to Repository.
    Consider doing this with multiple instruments to check the mounts.

Misc

  • Check the mounts on ssh.mountain.lbto.org. If the SAN didn't mount correctly, the telemetry rsyncs will fail.
  • tcs1 is required for ALH.
  • Ping all the Windows machines for AzCam (see AzCamServer).
  • Try an image from each of the guide and WFS cameras that are needed (see the readGuideCam and readWFSCam notes in Commissioning/RunningGCS).

Attachments

  • IMG_20150609_124635.jpg (5 MB, 15 Jul 2015 - 16:36, UnknownUser): OBS1 boot up diagnostics screenshot
  • IMG_20150609_124755.jpg (2 MB, 15 Jul 2015 - 16:28, UnknownUser): OBS1 boot failure screenshot

Topic revision: r42 - 21 Aug 2019, PetrKubanek