File | Where | What |
---|---|---|
lbc.log | on the CMU in /lbccontrol/current/log | Main LBC log file; this is the file parsed by the GUI |
error_log, access_log | on the CMU in /var/log/httpd | Web server logs; GUI problems can be found here |
lbcia.log | on the tech Windows PCs in C:\lbcfpia\src | Output from the IDL procedures for image analysis. This file is appended to with each run of the lbciaRun program and contains timestamps. It is copied to the CMU into /images/tftp/XTech/ and is available from the links Rlbcia.log and Blbcia.log |
blueYYYY-MM-DD.Log, redYYYY-MM-DD.Log | on the mountain machine in /home/lbceng/FPIAlogs | numbers, FITS filenames and filter names |
AOParam.txt | on the tech Windows PCs in D:\ao | The active optics corrections calculated by the IA processing. This file is copied to the CMU into /images/tftp/XTech/ and is available from the links RAOParam.txt and BAOParam.txt. This data is also logged in lbc.log as Active Optics Pnn |
lbciaRun.txt | on the tech Windows PCs in D:\ (the LBCIA_HOME environment variable on the PC) | Output from the program lbciaRun.exe. This file is appended to with each run of the lbciaRun program; it does not currently contain timestamps |
messages | on the CMU in /var/log | The CMU system log: NFS problems, panics, IIF messages, etc. |
filters.log, power.log, housekeeping.log, ... | on the CMU, in any directory | Output from the testxxxx programs |
hkblue.dat | on the CMU in /lbccontrol/current/log | Blue housekeeping data |
hkred.dat | on the CMU in /lbccontrol/current/log | Red housekeeping data |
If the software is restarted (lbckill/lbcstart) and the power.conf file is deleted, the GUI will show nothing as ON, even if some components are powered up. The software only checks the power status when a command (TURN ON/OFF) is issued: if a component is already powered up, the software will see that when it tries to power it up. On the lbcstart command, the software reads the power.conf file to determine what it should bring up automatically. If you delete the power.conf file, it will not bring up anything, and you have to TURN ON via the GUI. If something failed during a TURN ON command, the power.conf file will mark that component as disabled, and lbcstart will not bring it up automatically.
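The lbcstart behaviour described above boils down to a small decision rule. The Python sketch below only illustrates that rule: the real layout of power.conf is not documented on this page, so the one-`component = STATE` line-per-component format, the state names ON/OFF/DISABLED and the component names used in the usage line are all hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional


def parse_power_conf(text: str) -> Dict[str, str]:
    """Parse HYPOTHETICAL 'component = STATE' lines; '#' starts a comment."""
    states: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name, state = (part.strip() for part in line.split("=", 1))
        states[name] = state.upper()
    return states


def read_power_conf(path: Path) -> Optional[str]:
    """Return the file contents, or None when power.conf has been deleted."""
    return path.read_text() if path.exists() else None


def components_to_autostart(conf_text: Optional[str]) -> List[str]:
    """Components an lbcstart-like startup would bring up automatically."""
    if conf_text is None:
        # No power.conf: nothing comes up; use TURN ON in the GUI instead.
        return []
    states = parse_power_conf(conf_text)
    # Components left OFF, or marked DISABLED after a failed TURN ON, are skipped.
    return [name for name, state in states.items() if state == "ON"]
```

For example, `components_to_autostart("shutter = ON\nfilterwheel1 = DISABLED\n")` returns only `["shutter"]`, which is why a component whose TURN ON failed stays off after the next lbcstart.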
The power.py GUI does NOT refresh automatically; it is designed that way. It has a big Refresh button at the bottom. It IS very easy to turn something on or off that you may not want to, which is why it is only a troubleshooting/engineering tool.
The LBC software is stopped with the lbckill command. Nothing is power-cycled. The software is started again with the lbcstart command, which uses the power.conf file to restore the last logged power state. If this file is deleted, nothing is powered up automatically. The GUI commands, TURN ON/OFF, control power to the components, including the Windows PCs.
When troubleshooting a problem, go to the log display on the GUI. The errors are available there; you can ignore all of the Status/Note/Warning messages and look only at the Errors. If it is not a power problem (nothing about power is noted in the log messages), you can likely use lbckill, delete the power.conf file, and lbcstart.
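To pull only the Errors out of lbc.log outside the GUI, a filter along these lines may help. It is a sketch that assumes every entry follows the layout of the log excerpts on this page (date, time, a one-letter level such as W or E, a side letter B, R or -, then the message). Treating any error that mentions "power" as a possible power problem is a rough heuristic of this sketch, not the software's own classification.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# e.g. "2014/09/05 18:29:20.511975 E B hinibit counter updated to 36"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>\S) (?P<side>\S) (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    stamp: str    # "YYYY/MM/DD HH:MM:SS.ffffff"
    level: str    # one letter, e.g. "W" or "E"
    side: str     # "B", "R" or "-"
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    m = LOG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None  # continuation or otherwise unrecognised line
    return LogEntry(f"{m['date']} {m['time']}", m["level"], m["side"], m["message"])


def errors_only(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield only the E-level entries, skipping status/note/warning lines."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level == "E":
            yield entry


def looks_power_related(entry: LogEntry) -> bool:
    return "power" in entry.message.lower()
```

Usage on the CMU might look like `errs = list(errors_only(open("/lbccontrol/current/log/lbc.log", errors="replace")))`, followed by checking `any(looks_power_related(e) for e in errs)` before choosing between the lbckill/lbcstart route and power.py.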
On the other hand, if you see power problems in the log during startup, you can use power.py to see the power states and determine what is actually on or off. You can also use it to manually power a single component, as we have done in the past with shutter failures, when a tracker CCD was having problems, or when a particular filter wheel had a hardware problem.
If you cannot ping one of the dataprobes, the LBC software will fail when it tries to power things up. The dataprobe may have to be power-cycled manually at the electronics box. For example, here are pictures of the red electronics cabinet and the labeling on the red dataprobe power cables.
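A scripted version of that ping check is sketched below. The host names are an assumption: they are the port descriptions shown in the switch status output on this page, and may not be resolvable names on the CMU, so substitute the real names or IP addresses. The `ping -c 1 -W 2` options are those of the Linux iputils ping.

```python
import subprocess
from typing import Callable, Iterable, List

# ASSUMED host names, copied from the switch port descriptions; verify them.
DATAPROBES = ["rdataprobe1.lbc", "rdataprobe2.lbc", "bdataprobe1.lbc", "bdataprobe2.lbc"]


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """True if one ICMP echo request to host gets a reply (Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(hosts: Iterable[str], ping: Callable[[str], bool] = ping_once) -> List[str]:
    """Hosts that did not answer; these are candidates for a manual power-cycle."""
    return [host for host in hosts if not ping(host)]
```

Running `print(unreachable(DATAPROBES))` before lbcstart shows up front which dataprobe would make the power-up fail.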
The /home disk is only 20GB. We keep the whole LBC release there, which is large because of all of the documentation, and we keep multiple releases there too, so it typically stays about 70 to 80% full. We have seen it fill up with unusually large core files (see IT 4943); this appears to manifest itself as a hang of the GUI.
On the CMU (as user root or lbccontrol), run the df -h command. If the /home partition is full, check for a core.xxx file. If you see a large one (it would be in the /home/lbccontrol directory on the old CMU, or in /lbccontrol/current on the new CMU), move it to /images, or delete it if you cannot move it.
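The same check can be scripted with the Python standard library, as sketched below. The 90% threshold is an assumption chosen to sit above the normal 70 to 80% level; the two core-file directories are the ones named above. The sketch only reports candidates and deliberately does not move or delete anything.

```python
import glob
import os
import shutil
from typing import List, Sequence, Tuple


def usage_percent(mount: str) -> float:
    """Percentage of the filesystem holding `mount` that is in use (like df)."""
    total, used, _free = shutil.disk_usage(mount)
    return 100.0 * used / total


def core_files(directory: str) -> List[Tuple[str, int]]:
    """(path, size in bytes) of core.* files in directory, largest first."""
    found = [
        (path, os.path.getsize(path))
        for path in glob.glob(os.path.join(directory, "core.*"))
        if os.path.isfile(path)
    ]
    return sorted(found, key=lambda item: item[1], reverse=True)


def check_home(
    mount: str = "/home",
    core_dirs: Sequence[str] = ("/home/lbccontrol", "/lbccontrol/current"),
    threshold: float = 90.0,  # ASSUMED alarm level; normal is about 70-80%
) -> List[str]:
    pct = usage_percent(mount)
    messages = [f"{mount} is {pct:.0f}% full"]
    if pct >= threshold:
        for directory in core_dirs:
            for path, size in core_files(directory):
                messages.append(f"move to /images: {path} ({size / 2**20:.1f} MiB)")
    return messages
```

Run `print("\n".join(check_home()))` as root or lbccontrol; a missing directory simply contributes no core files.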
watchdog on StatusThread started a new instance
Housekeeping timeouts on both sensors (vacuum and temperature) on one side can cause things to eventually lock up. The software keeps retrying and timing out again:
2014/09/05 18:29:20.511874 W B HKEEPING VACUUM vacuum sensor timeout [src/housekeeping/housekeeping.c:1212]
2014/09/05 18:29:20.511952 W B HKEEPING TEMPERAT could not read temperature value [src/housekeeping/housekeeping.c:1379]
2014/09/05 18:29:20.511975 E B hinibit counter updated to 36
2014/09/05 18:29:20.511990 E B HKEEPING VACUUM pressure: 0.00E+00 [mbar]
2014/09/05 18:29:20.512033 W B HKEEPING VACUUM hinibit state raised due to pressure (0.00E+00) above 1.00E-03mbar threshold or fake reading [src/housekeeping/housekeeping.c:1404]
2014/09/05 18:29:20.512060 E B hinibit counter updated to 37
2014/09/05 18:29:20.512077 E B HKEEPING TEMPERAT temperature: 0.0 [Kelvin]
2014/09/05 18:29:20.512093 W B HKEEPING TEMPERAT hinibit state raised due to temperature ( 0.0) above 240.0K threshold or fake reading [src/housekeeping/housekeeping.c:1419]
2014/09/05 18:29:20.512181 W B HKEEPING subsystem reset start
2014/09/05 18:29:26.411589 W B HKEEPING close vacuum port
2014/09/05 18:29:26.411619 W B HKEEPING close temperature port
2014/09/05 18:29:39.570732 W - watchdog on StatusThread started a new instance
2014/09/05 18:29:42.590730 W B HKEEPING subsystem reset completed
We should take this out or fix it! For now, it requires a restart (lbckill/lbcstart) or TURN OFF/TURN ON.
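Because the lock-up follows timeouts on both housekeeping sensors of the same side, a log scan can flag that combination before things hang. This is a sketch keyed on the message texts in the excerpts on this page ("vacuum sensor timeout", "temperature sensor timeout", "could not read temperature value"); other wordings the software may use are not covered, and it does not look at time windows, so feed it only recent lines (for example the tail of lbc.log).

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

LINE = re.compile(
    r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+ (?P<level>\S) (?P<side>\S) (?P<msg>.*)$"
)

# Message fragments taken from the log excerpts on this page.
VACUUM_FAIL = ("vacuum sensor timeout",)
TEMPERATURE_FAIL = ("temperature sensor timeout", "could not read temperature value")


def sides_with_both_sensors_failing(lines: Iterable[str]) -> Set[str]:
    """Sides ("B" or "R") where both vacuum and temperature reads failed."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        m = LINE.match(line.strip())
        if m is None:
            continue
        msg = m["msg"]
        if any(fragment in msg for fragment in VACUUM_FAIL):
            seen[m["side"]].add("vacuum")
        if any(fragment in msg for fragment in TEMPERATURE_FAIL):
            seen[m["side"]].add("temperature")
    return {side for side, sensors in seen.items() if sensors == {"vacuum", "temperature"}}
```

A non-empty result is a hint to plan the lbckill/lbcstart (or TURN OFF/TURN ON) for that side rather than waiting for the retries to recover.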
lbckill and lbcstart should fix hanging status threads, etc., that can occur when we take an overall network hit.
For instance, when network maintenance was done on 11-Dec-2013, we saw the following errors in the log:
2013/12/11 16:45:46.714904 W R HKEEPING VACUUM vacuum sensor timeout [src/housekeeping/housekeeping.c:1211]
2013/12/11 16:45:47.364932 W B HKEEPING VACUUM vacuum sensor timeout [src/housekeeping/housekeeping.c:1211]
2013/12/11 16:45:56.715227 W R HKEEPING TEMPERAT temperature sensor timeout [src/housekeeping/housekeeping.c:1299]
2013/12/11 16:45:56.715367 E R hinibit counter updated to 1
2013/12/11 16:45:56.715391 E R HKEEPING VACUUM pressure: 0.00E+00 [mbar]
2013/12/11 16:45:56.715424 W R HKEEPING VACUUM hinibit state raised due to pressure (0.00E+00) above 1.00E-03mbar threshold or fake reading [src/housekeeping/housekeeping.c:1403]
2013/12/11 16:45:56.715454 E R hinibit counter updated to 2
2013/12/11 16:45:56.715475 E R HKEEPING TEMPERAT temperature: 0.0 [Kelvin]
2013/12/11 16:45:56.715499 W R HKEEPING TEMPERAT hinibit state raised due to temperature ( 0.0) above 250.0K threshold or fake reading [src/housekeeping/housekeeping.c:1418]
2013/12/11 16:45:56.715618 W R HKEEPING subsystem reset start
...
2013/12/11 16:45:58.499663 W - watchdog on StatusThread started a new instance
2013/12/11 16:46:18.516619 W - watchdog on StatusThread started a new instance
2013/12/11 16:46:38.547300 W - watchdog on StatusThread started a new instance
2013/12/11 16:46:52.917037 E R HKEEPING switching error rc:9 >unable to connect TCP socket< on "192.168.59.123:3,15" [src/power3/power3.c:292]
2013/12/11 16:46:52.917129 E R HKEEPING power cycle error with automatic retry [src/housekeeping/housekeeping.c:1187]
...
2013/12/11 16:47:49.118856 E B HKEEPING switching error rc:9 >unable to connect TCP socket< on "192.168.59.113:3,15" [src/power3/power3.c:292]
...
2013/12/11 16:49:58.732998 W - watchdog on StatusThread started a new instance
2013/12/11 16:50:18.743626 W - watchdog on StatusThread started a new instance
2013/12/11 16:50:38.754280 W - watchdog on StatusThread started a new instance
2013/12/11 16:50:58.764923 W - watchdog on StatusThread started a new instance
Housekeeping was not logging during this time. Everything came back OK after lbckill/lbcstart.
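In both excerpts the status-thread watchdog keeps restarting roughly every 20 seconds. A heuristic check for that pattern is sketched below; the threshold of three restarts within two minutes is an assumption, not a value used by the LBC software, and the timestamp format is taken from the log lines above.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

WATCHDOG = "watchdog on StatusThread started a new instance"
STAMP_FORMAT = "%Y/%m/%d %H:%M:%S.%f"  # e.g. 2013/12/11 16:45:58.499663


def watchdog_times(lines: Iterable[str]) -> List[datetime]:
    times: List[datetime] = []
    for line in lines:
        if WATCHDOG in line:
            # The first two whitespace-separated fields are the date and time.
            stamp = " ".join(line.split()[:2])
            times.append(datetime.strptime(stamp, STAMP_FORMAT))
    return times


def status_thread_looks_hung(
    lines: Iterable[str],
    threshold: int = 3,                        # ASSUMED number of restarts
    window: timedelta = timedelta(minutes=2),  # ASSUMED time window
) -> bool:
    """True if `threshold` watchdog restarts fall inside any `window`."""
    times = sorted(watchdog_times(lines))
    for i in range(len(times) - threshold + 1):
        if times[i + threshold - 1] - times[i] <= window:
            return True
    return False
```

If this returns True for recent log lines, the lbckill/lbcstart restart described above is the documented fix.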
2014/10/14 22:23:28.259913 W B HKEEPING not possible to connect to PortServer
2014/10/14 22:23:36.324235 W B HKEEPING not possible to connect to PortServer
...
On the CMU, a ping to this device (192.168.59.112) came back with "Destination unreachable". It turned out that the switch had disabled the port because it was in an error state. To see the status of the switches in the LBC electronics boxes, log in to the switch and run the show interface status command:
> ssh lbc@swlbcr.network
****** LBTO personnel only -- Authorized access only ******
Password:
swlbcr>show int status

Port      Name               Status       Vlan       Duplex  Speed  Type
Gi0/1     rportserver.lbc    connected    159        a-full  a-100  10/100/1000BaseTX
Gi0/2     rdataprobe1.lbc    connected    159        a-half  a-10   10/100/1000BaseTX
Gi0/3     rdataprobe2.lbc    connected    159        a-half  a-10   10/100/1000BaseTX
Gi0/4                        disabled     1          auto    auto   10/100/1000BaseTX
Gi0/5                        disabled     1          auto    auto   10/100/1000BaseTX
Gi0/6                        disabled     1          auto    auto   10/100/1000BaseTX
Gi0/7     [LBC-diag]         notconnect   159        auto    auto   10/100/1000BaseTX
Gi0/8     [DHCP]             notconnect   149        auto    auto   10/100/1000BaseTX
Gi0/9                        notconnect   1          auto    auto   Not Present
Gi0/10    Uplink to core     connected    trunk      a-full  a-1000 1000BaseSX SFP
swlbcr>

On the blue side, the old switch has been installed, so the output is a little different:
> ssh lbc@swlbcb.network
lbc@swlbcb.network's password:
** ---> Unauthorized Access is Strictly Forbidden <--- **
swlbcb>show int status

Port      Name               Status       Vlan       Duplex  Speed  Type
Fa0/1     bportserver.lbc    connected    159        a-half  a-100  10/100BaseTX
Fa0/2                        notconnect   159        auto    auto   10/100BaseTX
Fa0/3     bdataprobe1.lbc    connected    159        a-half  a-10   10/100BaseTX
Fa0/4                        notconnect   159        auto    auto   10/100BaseTX
Fa0/5     bdataprobe2.lbc    connected    159        a-half  a-10   10/100BaseTX
Fa0/6                        notconnect   159        auto    auto   10/100BaseTX
Fa0/7                        notconnect   159        auto    auto   10/100BaseTX
Fa0/8                        notconnect   159        auto    auto   10/100BaseTX
Fa0/9                        notconnect   159        auto    auto   10/100BaseTX
Fa0/10                       notconnect   159        auto    auto   10/100BaseTX
Fa0/11                       notconnect   159        auto    auto   10/100BaseTX
Fa0/12                       notconnect   159        auto    auto   10/100BaseTX
Fa0/13                       notconnect   159        auto    auto   10/100BaseTX
Fa0/14                       notconnect   159        auto    auto   10/100BaseTX
Fa0/15                       notconnect   159        auto    auto   10/100BaseTX
Fa0/16                       notconnect   159        auto    auto   10/100BaseTX
Fa0/17                       notconnect   159        auto    auto   10/100BaseTX
Fa0/18                       notconnect   159        auto    auto   10/100BaseTX
Fa0/19                       notconnect   159        auto    auto   10/100BaseTX
Fa0/20                       notconnect   159        auto    auto   10/100BaseTX
Fa0/21                       notconnect   159        auto    auto   10/100BaseTX
Fa0/22                       notconnect   159        auto    auto   10/100BaseTX
Fa0/23                       notconnect   159        auto    auto   10/100BaseTX
Fa0/24                       notconnect   159        auto    auto   10/100BaseTX
Gi0/1                        connected    trunk      a-full  a-1000 1000BaseSX
Gi0/2                        notconnect   1          auto    auto   unknown
swlbcb>

If you see a port that should be up but is disabled, contact the IT group and get it reset.
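When the switch output is captured to a file (or pasted), a small parser can pick out the ports worth reporting to IT. This sketch splits each row on whitespace rather than relying on column widths. Unused ports on the red switch (Gi0/4 to Gi0/6) show as disabled with no description, so it flags only disabled ports that have one. It also accepts "err-disabled", the status Cisco IOS normally shows for a port shut down because of an error; whether the LBC switches display exactly that word is an assumption.

```python
import re
from typing import List, NamedTuple

PORT = re.compile(r"^(?:Fa|Gi)\d+/\d+$")
STATUSES = {"connected", "notconnect", "disabled", "err-disabled"}


class PortStatus(NamedTuple):
    port: str
    name: str    # port description, "" when the Name column is empty
    status: str


def parse_int_status(output: str) -> List[PortStatus]:
    rows: List[PortStatus] = []
    for line in output.splitlines():
        tokens = line.split()
        if not tokens or not PORT.match(tokens[0]):
            continue  # header, prompt or login banner
        for i in range(1, len(tokens)):
            if tokens[i] in STATUSES:
                # Everything between the port and the status is the description.
                rows.append(PortStatus(tokens[0], " ".join(tokens[1:i]), tokens[i]))
                break
    return rows


def named_disabled_ports(rows: List[PortStatus]) -> List[PortStatus]:
    """Disabled ports that carry a description, i.e. ones that should be in use."""
    return [r for r in rows if r.status in ("disabled", "err-disabled") and r.name]
```

On the red output above, `named_disabled_ports(parse_int_status(text))` is empty; an entry such as rportserver.lbc would be the one to report to IT.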
Stop the LBC software (run lbckill as telescope on the CMU).
Make sure the component is powered on, via the power.py GUI or via testpower3.
You must be root on the CMU to connect to the serial ports.
Blue Port | Description | Red Port | Protocol |
---|---|---|---|
/dev/ttyB00 | Housekeeping Vacuum | /dev/ttyR00 | 9600 8N1 |
/dev/ttyB01 | Housekeeping Thermometer | /dev/ttyR01 | 2400 8N1 |
/dev/ttyB02 | Balluff | /dev/ttyR02 | |
/dev/ttyB03 | Filter Wheel 1 (lower), closest to primary | /dev/ttyR03 | 9600 8N1 |
/dev/ttyB04 | Filter Wheel 2 (upper), closest to dewar | /dev/ttyR04 | 9600 8N1 |
/dev/ttyB05 | Rotator | /dev/ttyR05 | |
/dev/ttyB06 | Rotator Backlash Recovery | /dev/ttyR06 | |
/dev/ttyB07 | Goya | /dev/ttyR07 | 38400 8N1 |
/dev/ttyB08 | Shutter | /dev/ttyR08 | 38400 8N1 |
/dev/ttyB09 | Focuser (L2 motion) | /dev/ttyR09 | 9600 8N1 |
/dev/ttyB10 | Mitutoyo (L2 readout) | /dev/ttyR10 | 9600 8N1 |
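For scripted checks, a port can also be opened with the settings from the table using Python's standard termios module. This is a sketch, not part of the LBC software. It assumes you are root on the CMU with the LBC software stopped, so nothing else holds the port. It covers only devices whose settings appear in the table (the Balluff and rotator rows list none), and the command protocol of each device is not documented here.

```python
import os
import termios
from typing import Dict, Tuple

# Device index -> (description, baud), from the table; all listed ports are 8N1.
PORT_SETTINGS: Dict[int, Tuple[str, int]] = {
    0: ("Housekeeping Vacuum", 9600),
    1: ("Housekeeping Thermometer", 2400),
    3: ("Filter Wheel 1 (lower)", 9600),
    4: ("Filter Wheel 2 (upper)", 9600),
    7: ("Goya", 38400),
    8: ("Shutter", 38400),
    9: ("Focuser (L2 motion)", 9600),
    10: ("Mitutoyo (L2 readout)", 9600),
}
BAUD_CONSTANTS = {2400: termios.B2400, 9600: termios.B9600, 38400: termios.B38400}


def device_path(side: str, index: int) -> str:
    """'b' (blue) or 'r' (red) plus a device index -> /dev/ttyBnn or /dev/ttyRnn."""
    letter = side.upper()
    if letter not in ("B", "R"):
        raise ValueError("side must be 'b' (blue) or 'r' (red)")
    return f"/dev/tty{letter}{index:02d}"


def configure_8n1(fd: int, baud: int) -> None:
    """Put an open tty into raw 8N1 mode at the given baud rate."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = termios.tcgetattr(fd)
    cflag &= ~(termios.PARENB | termios.CSTOPB | termios.CSIZE)  # no parity, 1 stop bit
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL        # 8 data bits, receiver on
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ISIG)      # raw input, no local echo
    iflag &= ~(termios.IXON | termios.IXOFF | termios.ICRNL)      # no XON/XOFF, no CR mapping
    oflag &= ~termios.OPOST                                        # send bytes unmodified
    speed = BAUD_CONSTANTS[baud]
    termios.tcsetattr(fd, termios.TCSANOW, [iflag, oflag, cflag, lflag, speed, speed, cc])


def open_port(side: str, index: int) -> int:
    """Open and configure a port listed in PORT_SETTINGS; returns a raw fd."""
    _description, baud = PORT_SETTINGS[index]
    fd = os.open(device_path(side, index), os.O_RDWR | os.O_NOCTTY)
    configure_8n1(fd, baud)
    return fd
```

For example, `fd = open_port("r", 3)` opens the red lower filter wheel at 9600 8N1; close it with `os.close(fd)` before restarting the LBC software.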
Use the -s (settings) option (minicom -s); you will not get a failure if the port was not left in a good configuration.
ctrl/A o allows you to set the serial port configuration.
Use ctrl/A e in minicom to enable echoing of commands and answers to the screen.
<ENTER>
and type it again
DOFPIA
program.
See http://wiki.lbto.org/bin/view/Software/LBCSoftwareDescription#IDL_Code for descriptions of the main functions.
The IDL code used by the real-time system is also installed in /home/lbcobs/LBCFPIA/lbcfpia/src. You need to be logged in to a machine with access to the IDL programs and the FITS files. This example assumes you log in as lbceng to one of the mountain OBS machines. You need a local copy of the tec config file for the examples below.
setenv IDL_DIR '/lbt/astronomy/idl'
setenv IDL_PATH '+/lbt/astronomy/idl/lib:+/lbt/astronomy/stow/astron/astron:+/home/lbcobs/LBCFPIA/lbcfpia/src:+/home/lbcobs/LBCFPIA/lbcfpia/lib/mpfit'
cp /home/lbcobs/LBCFPIA/lbcfpia/src/lbcfpia_tec.cfg .
idl
To use a specific dated directory to find files, give a time that you know will select some files dated later than that time.
IDL> dotecia, 1, 'r', 100.00, datadir='/Repository/20141210/tec', today='20141210', cur_time=23411
To give it a list of files (make sure you have the updated lbcfpia.pro for this option):
IDL> dotecia, 1, 'r', 100.00, datadir='/Repository/20141210/tec', fileList='lbcrtec.20141210.015831_2.fits lbcrtec.20141210.015818_2.fits'
It uses datadir to get the files, so don't put the full path in fileList. It writes to a local lbcia.log file.
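Building the fileList string by hand is error-prone, so a helper like the one below can produce it. This is a hypothetical sketch: it assumes the tec files follow the lbc&lt;side&gt;tec.&lt;YYYYMMDD&gt;.&lt;HHMMSS&gt;_&lt;n&gt;.fits pattern of the example names, and that the six-digit field is a time of day. It returns bare file names sorted by that time, since dotecia takes the directory from datadir.

```python
import os
import re

# ASSUMED naming pattern, inferred from lbcrtec.20141210.015831_2.fits.
TEC_NAME = re.compile(r"^lbc(?P<side>[rb])tec\.(?P<date>\d{8})\.(?P<hms>\d{6})_\d+\.fits$")


def tec_file_list(datadir: str, side: str, after_hms: str = "000000") -> str:
    """Space-separated bare names of tec files for one side, at or after HHMMSS."""
    names = []
    for name in os.listdir(datadir):
        m = TEC_NAME.match(name)
        # Zero-padded HHMMSS strings compare correctly as plain strings.
        if m and m["side"] == side.lower() and m["hms"] >= after_hms:
            names.append(name)
    return " ".join(sorted(names))
```

For example, `tec_file_list('/Repository/20141210/tec', 'r', '015800')` gives a string that can be pasted into fileList='...' in the dotecia call above.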
Log in to the CMU as telescope, lbccontrol or root.
Run lbckill in the CMU window.
Run poweroff in the CMU window.
A backup of the CMU system is kept as cmuSystem.tar in /home/ksummers/lbc in Tucson. This is a tar file containing compressed snapshots of the system filesystems of the CMU machine, suitable for restoration with restore. It does not include the /lbccontrol and /images partitions; these can be recreated from SVN:
/lbccontrol/version would be built from SVN according to Software/LBCBuildAndInstall.
/images files can be copied from /lbccontrol/version/src/windowspc/images.
Attachment | Size | Date | Who | Comment |
---|---|---|---|---|
LBCArchitecture.jpg | 3 MB | 04 Dec 2013 - 15:07 | UnknownUser | LBC Architecture |
LBCArchitecture.png | 85 K | 03 Dec 2013 - 23:59 | UnknownUser | LBC Architecture |
LBCArchitecture.tif | 2 MB | 04 Dec 2013 - 15:04 | UnknownUser | |
RedDataProbePowerCables-sm.JPG | 192 K | 19 Sep 2016 - 16:30 | UnknownUser | Red dataprobe power cables |
RedElectronicsBox-bottom-sm.JPG | 420 K | 19 Sep 2016 - 16:30 | UnknownUser | Bottom of the red electronics cabinet, where dataprobes are powered |