Operating the AIP AGw Units and oacontrol
See MountainOperations/AGWCheatSheet for quick reference.
The following only applies to AIP AGWs. The MODS AGWs have a different interface; see Commissioning/MODSAGWControl.
See AzCamServer for information about the guider and WFS cameras inside the AGWs.
Introduction
This describes the operation of oacontrol Version 4.3 as of May 2015, which runs on the computer oac (32-bit CentOS 6.5). oacontrol includes the oacserver, the client interface library, and the command line clients. Both the one-wire devices and the UMAC are managed by the oacserver. Clients for the oacserver include both GCS subsystems and the command line operations discussed here. The gcs.conf configuration file tells the GCS which computer is running oacserver. The command line clients use the environment variable OACONTROLHOST for the same purpose. The command line clients can be used on any obs computer as well as the oac computer. It is only necessary to log into the oac computer to start/stop the oacserver.
Starting in the summer of 2014, the oacserver was extensively reworked to ensure that errors are properly reported and timeouts are reasonable. Errors can occur if the oacserver cannot be reached, the UMAC cannot be reached, or the one-wire file system is not working; there are also some rare internal errors inside the oacserver. The one-wire file system management has also been reworked, making starting/stopping it more reliable and faster. The oac computer should not normally need to be rebooted. However, there are still times when the one-wire file system is so scrambled that a reboot is necessary. This may happen when an AGW is disconnected from the network for an extended period of time and then reconnected.
(N.B. There is an engineering project to replace the one-wire file system devices with another scheme; we hope it will work better.)
Configuration
The oacontrol installation is in /lbt/oacontrol/current/ on oac. The configuration file for the oacserver is /lbt/oacontrol/current/etc/oacontrol.conf.
Currently all six AGWs are active: AGW1, AGW2, AGW3, AGW4, AGW7 and AGW8.
If oacontrol.conf is modified, it can be re-read without stopping/starting the oacserver with the command
rdwrconfig -u n
where n is the AGW unit number of interest. NOTE: The appropriate GCS side must be restarted for the new configuration to be used by GCS. Setting the IP addresses of the Moxa and the UMAC to the special string "0.0.0.0" will disable that AGW: the oacserver will not try to communicate with it.
The configuration file for the Moxa npreal2 driver is /lbt/oacontrol/current/etc/npreal2d.cf. This has to be modified when AGWs are added to or removed from the telescope, or if the network configuration changes. After modification the oacserver and device drivers need to be restarted.
Note that both npreal2d.cf and oacontrol.conf have the Moxa terminal server IP address. If any network changes are made, both files have to be updated.
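Because the same Moxa address has to appear in two files, a quick consistency check can catch a half-finished edit. The following is only a sketch: the function name moxa_ip_in_both is hypothetical, and it assumes nothing about the syntax of either file beyond the IP address appearing literally somewhere in it.

```shell
# Sketch: confirm that one Moxa IP address appears in both configuration files.
# Only the two default paths come from this page; the function name is
# hypothetical and the file syntax is NOT assumed (plain text search only).
moxa_ip_in_both() {
    local ip="$1"
    local oaconf="${2:-/lbt/oacontrol/current/etc/oacontrol.conf}"
    local npconf="${3:-/lbt/oacontrol/current/etc/npreal2d.cf}"
    local f status=0
    for f in "$oaconf" "$npconf"; do
        # -F: fixed string; -w: whole word, so 192.168.2.1 does not match 192.168.2.11
        if grep -qFw -- "$ip" "$f"; then
            echo "ok: $ip found in $f"
        else
            echo "MISSING: $ip not found in $f"
            status=1
        fi
    done
    return "$status"
}
```

For example, `moxa_ip_in_both 192.168.2.11` run on oac checks the default paths; a non-zero exit status means at least one file does not mention that address.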
Unit numbers and IP addressing
The AGW units are specified by number, and there are associated IP addresses for each number. Note that units 5 and 6 are not managed by oacserver. The following table shows the normal addressing assignments:
Location | AGW unit | MOXA IP address | UMAC IP address |
Left Front (LUCI 1) | 1 | 192.168.2.11 | 192.168.2.21 |
Right Front (LUCI 2) | 2 | 192.168.2.12 | 192.168.2.22 |
Left Direct (PEPSIPOL 1) | 3 | 192.168.2.13 | 192.168.2.23 |
Right Direct (PEPSIPOL 2) | 4 | 192.168.2.14 | 192.168.2.24 |
Left Direct (MODS 1) | 5 | Managed by MODS |
Right Direct (MODS 2) | 6 | Managed by MODS |
Left Fiber (PEPSIPFU 1) | 7 | 192.168.2.17 | 192.168.2.27 |
Right Fiber (PEPSIPFU 2) | 8 | 192.168.2.18 | 192.168.2.28 |
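The addressing in the table follows a simple pattern: for unit N, the Moxa is 192.168.2.1N and the UMAC is 192.168.2.2N. A small helper can encode that pattern for use in scripts. This is a sketch only; the function name agw_ips is hypothetical, and the pattern is read off the table rather than taken from oacontrol.conf.

```shell
# Sketch: print "<moxa-ip> <umac-ip>" for an AIP AGW unit number.
# The mapping is read off the table on this page; agw_ips is a hypothetical name.
agw_ips() {
    case "$1" in
        1|2|3|4|7|8)
            echo "192.168.2.1$1 192.168.2.2$1"
            ;;
        5|6)
            echo "AGW $1 is managed by MODS, not by oacserver" >&2
            return 2
            ;;
        *)
            echo "unknown AGW unit: $1" >&2
            return 1
            ;;
    esac
}
```

Usage: `set -- $(agw_ips 7)` leaves the Moxa address in `$1` and the UMAC address in `$2`.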
ERROR RECOVERY that needs intervention by operations staff
The oacontrol log file on the oac machine is /var/log/oacontrol.log.
The npreal log file on the oac machine is /var/log/npreal2d.log.
RPC error
This class of errors means one of the following: the oacserver is not running or is hung, the oac computer is not up or not on the network, the client is talking to the wrong computer, the UMAC is seriously confused, or the AGW is disabled in oacontrol.conf. If all of these are OK you will not get an RPC error. The steps to resolve RPC errors are
- Make sure you can ping the correct computer from the machine issuing the commands
ping oac
If the ping fails, contact IT support.
- Log onto the oac computer (oac@oac) and verify the oacserver is running (see below). If it is not, start the oacserver.
- If a GCS is having problems, verify the gcs.conf file is pointing to the oac computer: GCS[L|R].oacontrol_IP_AGW variables should be oac. If you have to change the value, restart both GCS subsystems.
- Verify the command line clients are using the correct computer
env | grep OACONTROLHOST
OACONTROLHOST=oac
If OACONTROLHOST is not correct notify the software group. If you are comfortable with Linux you can set the environmental variable OACONTROLHOST to the value oac and retry the command. Also note that all command line commands accept a -s host option which can be used to override the default computer. So getdata -s oac -u 1 will force the command to use computer oac.
- Ping the Moxa IP (see above for IP addresses). If the ping fails, check oacontrol.conf for a moxaip address of "0.0.0.0" which means the unit is intentionally disabled. If the address is OK, contact IT support. If the address is "0.0.0.0" and you expect the unit to be available, both oacontrol.conf and npreal2d.cf need to be changed. Contact the software group.
- If the Moxa ping succeeded, ping the UMAC IP (see above for IP addresses). If the ping fails, check oacontrol.conf for an ip address of "0.0.0.0" which means the unit is intentionally disabled. If the address is OK, start the AGW. If the errors continue contact the software group. Note: if the startAGW continues to fail, the oac computer may need to be rebooted. If the address is "0.0.0.0" and you expect the unit to be available contact the software group.
NOTE: if a UMAC command fails with RPC Timed out, and the above conditions are all met, the UMAC in the AGW is in a comatose state where it won't accept commands even though it can be pinged. The solution is to power down the unit and wait for several seconds before powering it back up. Use the stopAGW/startAGW commands to power cycle the unit and the home command. REMEMBER, if you power cycle the AGW unit that was in use with GCS, you have to restart the GCS before it will reconnect to the AGW unit.
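The network-related steps above can be strung together into a first-pass triage. This is a minimal sketch, not part of oacontrol: the function name rpc_triage is hypothetical, the IP addresses are derived from the unit table above, and the ping options (-c count, -W timeout in seconds) assume the Linux iputils ping. It does not check whether the oacserver process is running or what gcs.conf contains.

```shell
# Sketch: walk the pingable steps of the RPC error checklist for one AGW unit.
# rpc_triage is a hypothetical helper; it stops at the first failing step.
rpc_triage() {
    local unit="$1" host="${OACONTROLHOST:-}"
    if [ -z "$host" ]; then
        echo "OACONTROLHOST is not set: notify the software group (or use -s oac)"
        return 1
    fi
    if ! ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
        echo "cannot ping $host: contact IT support"
        return 1
    fi
    if ! ping -c 1 -W 2 "192.168.2.1$unit" >/dev/null 2>&1; then
        echo "Moxa 192.168.2.1$unit not answering: check oacontrol.conf for 0.0.0.0, else contact IT support"
        return 1
    fi
    if ! ping -c 1 -W 2 "192.168.2.2$unit" >/dev/null 2>&1; then
        echo "UMAC 192.168.2.2$unit not answering: check oacontrol.conf for 0.0.0.0, else try startAGW -u $unit"
        return 1
    fi
    echo "network path to AGW $unit looks OK: verify oacserver is running on $host"
}
```

Run it as `rpc_triage 1` from the machine that produced the RPC error, so that the same OACONTROLHOST value and network path are tested.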
UMAC error
This error means the oacserver is OK and the command communicated properly with it, but the oacserver cannot communicate with the UMAC processor. Either the AGW is not connected to the network, the network is not properly configured, the AGW is not powered on, or the UMAC processor is confused. For the latter two cases the AGW must be power cycled and the stages homed (see stopAGW, startAGW, and home below). For the other cases contact IT support.
If you receive a UMAC error following a "getdata -u N" command even after power cycling the unit, do the following:
while true; do sudo netstat -alpn | grep 1025 ; done
As soon as you see a connection made to the UMAC IP address, run the "startAGW -u N" command.
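The loop above spins as fast as it can and has to be interrupted by hand. A gentler variant polls once per second and gives up after a fixed number of tries. This is a sketch under stated assumptions: the function name wait_for_umac is hypothetical, port 1025 is taken from the grep pattern above, and it is an assumption that a usable connection shows up in netstat as ESTABLISHED with the UMAC address as `ip:1025`. It runs netstat without sudo and without -p, since process names are not needed here.

```shell
# Sketch: wait until netstat shows an established connection to <umac-ip>:1025.
# wait_for_umac is a hypothetical helper; the ESTABLISHED match is an assumption.
wait_for_umac() {
    local umac_ip="$1" tries="${2:-60}"
    while [ "$tries" -gt 0 ]; do
        # -a: all sockets, -n: numeric addresses
        if netstat -an 2>/dev/null | grep -F "$umac_ip:1025" | grep -q ESTABLISHED; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, `wait_for_umac 192.168.2.23 && startAGW -u 3` issues the start command as soon as the connection appears, or does nothing if it never appears within about a minute.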
LINUX system error on oac
If you get the error message:
startAGW: init() failed AGW3: LINUX system error on oac
the OSA wisdom is that you need to reboot the oac machine (sudo reboot). After it comes back on, wait 5 minutes longer than you think you need to in order for the processes to connect to all the AGWs. See IT 6657.
Failed to communicate with MOXA on localhost
This error is rarer, so we are not sure that we have the best solution. Try rebooting oac first. If that doesn't work, you may need a hard power cycle of the AGW unit (unplugging the cable).
Starting/stopping/checking the oacserver
Log onto the oac computer from an obs computer as user telescope
ssh oac@oac
No password is required. Check the current status
sudo /etc/init.d/oac status
If the device drivers are loaded and the oacserver is running you should see (the 12xxx numbers (PIDs) will be different)
npreal2 module is loaded.
12630 /lbt/oacontrol/npreal2-1.18/npreal2d -t 1.
There are 6 OWFS mount-points.
oacserver is 12655 /lbt/oacontrol/current/sbin/oacserver.
6 OWFS mount-points is the current number (January 2017).
If this is not what you see, stop the oacserver
sudo /etc/init.d/oac stop
wait 10 seconds (or so)
sudo /etc/init.d/oac status
and you should see
npreal2 module is not loaded.
npreal2d not running.
There are 0 OWFS mount-points.
oacserver is not running.
If this is not what you see, try the stop operation again. If you cannot get everything stopped contact IT support. Once everything is stopped
sudo /etc/init.d/oac start
wait 10 seconds (or so) and verify with the status command described above. After stopping and restarting the oacserver it may be necessary to restart the GCS as its oacontrol library might have lost connection to the oacserver.
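The two status listings above (fully running and fully stopped) can be told apart mechanically, which helps when scripting a stop/status/start cycle. The following is only a sketch: the function name oac_status_state is hypothetical, and it matches the exact wording of the sample output shown on this page, so it would need adjusting if the init script's messages differ.

```shell
# Sketch: classify `oac status` output read from stdin as running, stopped, or partial.
# oac_status_state is a hypothetical helper keyed to the sample output on this page.
oac_status_state() {
    local text
    text=$(cat)
    _has() { printf '%s\n' "$text" | grep -q -- "$1"; }
    if _has 'npreal2 module is loaded' && _has 'oacserver is [0-9]' \
        && ! _has 'There are 0 OWFS mount-points'; then
        echo running
    elif _has 'npreal2 module is not loaded' && _has 'npreal2d not running' \
        && _has 'There are 0 OWFS mount-points' && _has 'oacserver is not running'; then
        echo stopped
    else
        echo partial
    fi
}
```

Usage: `sudo /etc/init.d/oac status | oac_status_state`. A result of `partial` corresponds to the "this is not what you see" cases, where the stop operation should be repeated.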
The oac stop command stops the oacserver, dismounts the one-wire file systems, and unloads the device drivers. The oac start command loads all the necessary drivers, mounts the one-wire file systems, and then starts the oacserver. After starting the oacserver, you may need to stop/start each AGW and home its stages. First, check if the AGW units are responding with the getdata command. If the command returns with data, no further action is needed for that AGW. If the command returns a UMAC error, the AGW must be stopped and started. Issue the stopAGW command, wait a few seconds, issue the startAGW command, wait a few seconds, and try the getdata command again. Sometimes it takes several more seconds before the UMAC will respond. If the UMAC error persists, try stopping/starting the AGW one more time, and if that fails contact the software group. Once the getdata command returns data, the stages must be homed (GCS will not home them unless it had to turn the AGW on). If you have power cycled an AGW that a GCS was using, you should restart that GCS.
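The per-AGW check described above (getdata, then up to two stop/start cycles) can be sketched as a shell function. Two things here are assumptions rather than documented behavior: that getdata exits with a non-zero status when it reports a UMAC error, and the specific wait times of 5 and 10 seconds, which stand in for "a few seconds". The function name recover_agw is hypothetical.

```shell
# Sketch: check one AGW after an oacserver restart and power cycle it if needed.
# ASSUMPTION: getdata returns non-zero on a UMAC error. recover_agw is hypothetical.
recover_agw() {
    local unit="$1" cycle
    if getdata -u "$unit" >/dev/null 2>&1; then
        echo "AGW $unit responds: no action needed"
        return 0
    fi
    for cycle in 1 2; do
        stopAGW -u "$unit"
        sleep 5       # "a few seconds"; the value is a guess
        startAGW -u "$unit"
        sleep 10      # the UMAC can take several more seconds to respond
        if getdata -u "$unit" >/dev/null 2>&1; then
            echo "AGW $unit responds after power cycle $cycle: home its stages and restart any GCS that was using it"
            return 0
        fi
    done
    echo "AGW $unit still failing after 2 power cycles: contact the software group"
    return 1
}
```

Homing is deliberately left to the operator, since the correct -m value depends on the unit.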
A stopAGW/startAGW command also power cycles (restarts) the camera controllers, but not the Windows PCs that run the AzCam processes and hold the frame readout cards. So, in the rare event of a failing AzCam process, the Windows Camera Controller Computer (computer room B, rack #13), which holds the frame grabber card, and the AzCam server (computer room rack #4) have to be restarted manually. After the computer room computers reboot, go back to the control room and ping the machines you rebooted; when they respond, restart the GCS on the side you had trouble with.
Turning on the AGW unit
In a terminal window on any CentOS obs computer (on Fedora, ssh oac@oac)
startAGW -u n
AGW n started
where n specifies the AGW. This command requests the Moxa Terminal Server to turn the power on inside the AGW unit, specifically to power the camera controllers and the UMAC motor controller computer. If this command fails repeatedly with an RPC error the oac computer may need to be rebooted.
Note: All AGW commands require the -u n parameter. See above for unit assignments.
Homing the stages
To home the stages, use the command home. To see the possible parameters, type
home -h
usage: home [-s <host>] -u <UMAC> -m motors
motors: linked by OR
1 for Motors Y & X, 4 for Motor F, 16 for wheel
to home all motors use 21
For AGWs 1,2,3,4 use -m 21. For AGWs 7,8 use -m 20. (The PEPSI PFU AGWs do not have an X/Y stage)
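The -m value is a bitwise OR of the motor flags listed in the usage text: 1 | 4 | 16 = 21 for units with an X/Y stage, and 4 | 16 = 20 for the PEPSI PFU units. A short sketch makes the arithmetic explicit; the function name home_mask is hypothetical.

```shell
# Sketch: compute the -m argument for `home` from the motor flags in `home -h`.
# home_mask is a hypothetical helper; the flag values come from the usage text.
home_mask() {
    local xy=1 focus=4 wheel=16
    case "$1" in
        1|2|3|4) echo $((xy | focus | wheel)) ;;   # all motors
        7|8)     echo $((focus | wheel)) ;;        # no X/Y stage on the PFU AGWs
        *)
            echo "unit $1 is not homed through oacontrol" >&2
            return 1
            ;;
    esac
}
```

Usage: `home -u 7 -m "$(home_mask 7)"`.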
Turning off the AGW unit
To power down an AGW, do
stopAGW -u n
AGW n stopped
This turns off the power in the AGW.
Engineering Operations
getdata -u [AGW] -p
getxy -u [AGW]
setxy -u [AGW] -x [X] -y [Y]
getfilter -u [AGW]
setfilter -u [AGW] -f [FILTER]
getfocus -u [AGW]
setfocus -u [AGW] -p [FOCUS]
getposition -u [AGW] -m [motor]
For each command you can get the usage by adding the parameter -h
$ getdata -h
usage: getdata [-s <host>] -u <UMAC> [-o <options> -p -h]
The p option only checks for power on.
$ getdata -u 1
general state: 0x201
general errors: 0
THETA state: 0x3
THETA errors: 0
R state: 0x3
R errors: 0
FOC state: 0x3
FOC errors: 0
FIL state: 0x3
FIL errors: 0
UMAC Temp: 24 degC
Ambient Temp: 7 degC
UMAC Hum: 26 %RH
Home Status: 0x17
Power Status: 1
OAC Version: 4.3
Note (from Dan Cox <dcox@lbto.org>, Fri, 31 Aug 2018 21:39:14 -0700):
Starting and stopping the AGW works the same way on both original and modified
AGWs.
The ‘getdata’ command is a bit different with the new AGWs, in that you add an extra
‘-n’ switch, i.e. getdata -u4 -n . getdata dumps out a bunch of temperatures and
voltages along with the important “HK Power Status”, 0 or 1. When the AGW is powered off,
that status will be 0 and the +- 15V output will also be 0, as a confirmation that power
is off.
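When only one value from the getdata listing matters (for example the power status), it can be pulled out by its label. This is a sketch based on the "label: value" layout of the sample output above; the function name getdata_field is hypothetical, and the exact labels printed with the -n switch are not shown on this page, so the "HK Power Status" label is taken from the note above.

```shell
# Sketch: print the value of one "label: value" line from getdata output on stdin.
# getdata_field is a hypothetical helper; it exits non-zero if the label is absent.
getdata_field() {
    awk -F': *' -v key="$1" '
        { label = $1; sub(/^[ \t]+/, "", label) }   # tolerate leading whitespace
        label == key { print $2; found = 1 }
        END { exit !found }
    '
}
```

Usage: `getdata -u 1 | getdata_field "Power Status"`, or with the modified AGWs `getdata -u4 -n | getdata_field "HK Power Status"`.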
Use getxy -u 1 to get the guide probe coordinates in mm.
The coordinates of the center of the field are approximately 0.0, 612.5 mm.
$ getxy -u 1
Guide probe position
x: 0.000000
y: 425.001000
Use getxy -u 1 -o 16 to get the answer in encoder coordinates rather than x/y.
Note that the reported x-coordinate is the encoder coordinate of the ROT axis relative to the homeoffset configuration value, and is NOT an absolute encoder reading.
$ getxy -u 1 -o 16
Guide probe position
x: 0.000000
y: 32649.000000
$ setxy -h
usage: setxy [-s <host>] -u <UMAC> [-o <options> -v <velocity> -n] -x <x coordinate> -y <y coordinate>
$ setxy -u 1 -x 10.0 -y 410.0
Command successfully transmitted
$ getfocus -u 1
Focus position: 10.000000
$ setfocus -h
usage: setfocus [-s <host>] -u <UMAC> [-o <options> -v <velocity> -n] -p <position>
$ setfocus -u 1 -p 10.0
Command successfully transmitted
Note that moving the AGW focus stage in the + direction requires a focus correction of -Z4 to refocus the telescope.
$ getfilter -u 1
Filter number: 2
$ setfilter -h
usage: setfilter [-s <host>] -u <UMAC> [-o <options> -v <velocity>] -f <filter>
Possible filter values are 0-4.
$ setfilter -u 1 -f 3
Command successfully transmitted
Note: oacontrol is zero-based, so the filter numbering starts with filter 0. GCS however is one-based, so the corresponding filter for oacontrol.filter 0 is GCS.filter 1.
The default filter to be used for each AGw is defined in the corresponding public, instrument-specific GCS configuration file (e.g., LUCI_R.cfg).
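The off-by-one between the two numbering schemes is easy to get wrong when copying a filter number from a GCS configuration file to the command line. A pair of converters makes the rule explicit. This is a sketch: the function names are hypothetical, the oacontrol range 0-4 comes from the setfilter example above, and the GCS range 1-5 is inferred from it.

```shell
# Sketch: convert filter numbers between oacontrol (zero-based) and GCS (one-based).
# Both function names are hypothetical; the GCS range 1-5 is inferred from 0-4.
oac_to_gcs_filter() {
    case "$1" in
        0|1|2|3|4) echo $(( $1 + 1 )) ;;
        *) echo "oacontrol filter must be 0-4, got: $1" >&2; return 1 ;;
    esac
}

gcs_to_oac_filter() {
    case "$1" in
        1|2|3|4|5) echo $(( $1 - 1 )) ;;
        *) echo "GCS filter must be 1-5, got: $1" >&2; return 1 ;;
    esac
}
```

Usage: `setfilter -u 1 -f "$(gcs_to_oac_filter 4)"` selects the filter that GCS calls filter 4.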
$ getposition -h
usage: getposition [-s <host>] -u <UMAC> [-x winkel_x -y winkel_y -f file -c -t timeout_sec] -m <motor_no>
where "winkel" means "angle". This runs until stopped, gives encoder values and can write them to a file. The purpose of the angles is not known. Be careful using this command or leaving it running unattended as it is capable of overwhelming the oacserver with position requests and can therefore slow down any other request like the ones sent by GCS.
Testing basic connectivity
If the mountain network configuration is changed, first check that the related IP addresses are reachable: log on as oac@oac on the mountain and ping the hardware from there. As the unit could be powered off, the only reliable IP address to ping is the MOXA, which controls the power to the rest of the unit. DNS entries for the AGw machines, MOXAs and PMACs (UMACs) are set up so you don't need to know IP addresses. The names are
AGW1 - AIP Left Front Bent Gregorian
agw1-moxa 192.168.2.11 (in the Agw unit)
agw1-pmac 192.168.2.21 (in the Agw unit, power switched)
agw1-cam 192.168.2.31 (CRB #13, serves both cameras via azcamserver 192.168.2.199). Note: agw1-azcamg and agw1-azcamw retired, now handled by agw1-cam.
AGW2 - AIP Right Front Bent Gregorian
agw2-moxa 192.168.2.12 (in the Agw unit)
agw2-pmac 192.168.2.22 (in the Agw unit, power switched)
agw2-cam 192.168.2.32 (CRB #13, serves both cameras via azcamserver 192.168.2.199). Note: agw2-azcamg and agw2-azcamw retired, now handled by agw2-cam.
AGW3 - AIP Left Direct Gregorian (not on the telescope as of May 2015, in small clean room).
agw3-moxa 192.168.2.13 (in the Agw unit)
agw3-pmac 192.168.2.23 (in the Agw unit, power switched)
agw3-cam 192.168.2.33 (in the Agw unit, serves both cameras without azcamserver 192.168.2.199). Note: agw3-azcamgw retired, now handled by agw3-cam.
AGW4 - AIP Right Direct Gregorian (not on telescope as of May 2015, in downtown lab)
agw4-moxa 192.168.2.14 (in the Agw unit)
agw4-pmac 192.168.2.24 (in the Agw unit, power switched)
agw4-cam 192.168.2.34 (in the Agw unit, serves both cameras without azcamserver 192.168.2.199). Note: agw4-azcamgw retired, now handled by agw4-cam.
AGW5 - MODS Left Direct Gregorian inside MODS1 (not controlled by oacserver.)
mods1-cam 192.168.2.35 (CRB #13, serves both cameras via azcamserver 192.168.2.199)
AGW6 - MODS Right Direct Gregorian inside MODS2 (not controlled by oacserver.)
mods2-cam 192.168.2.36 (CRB #13, serves both cameras via azcamserver 192.168.2.199)
AGW7 - AIP Left Rear Fiber Feed Bent Gregorian
agw7-moxa 192.168.2.17 (in the Agw unit)
agw7-pmac 192.168.2.27 (in the Agw unit, power switched)
agw7-cam 192.168.2.37 (LTH serves both cameras without azcamserver 192.168.2.199)
AGW8 - AIP Right Rear Fiber Feed Bent Gregorian
agw8-moxa 192.168.2.18 (in the Agw unit)
agw8-pmac 192.168.2.28 (in the Agw unit, power switched)
agw8-cam 192.168.2.38 (LTH serves both cameras without azcamserver 192.168.2.199)
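Since the MOXA is the only device guaranteed to be powered, a sweep over the six agwN-moxa names gives a quick picture after a network change. This is a minimal sketch: the function name check_moxas is hypothetical, the host names are the DNS entries listed above, and the ping options (-c count, -W timeout in seconds) assume the Linux iputils ping.

```shell
# Sketch: ping every AIP AGW Moxa by its DNS name and report which ones answer.
# check_moxas is a hypothetical helper; it returns non-zero if any Moxa is silent.
check_moxas() {
    local unit failed=0
    for unit in 1 2 3 4 7 8; do
        if ping -c 1 -W 2 "agw${unit}-moxa" >/dev/null 2>&1; then
            echo "agw${unit}-moxa: reachable"
        else
            echo "agw${unit}-moxa: NO REPLY"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

Run `check_moxas` while logged on as oac@oac. An AGW that is off the telescope will show up as NO REPLY, which is expected.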
MOXA Terminal Server
The MOXA is the ethernet to serial interface in the AGW ring. The MOXA is always powered up. As long as the MOXA is alive, the one-wire mount points should be functional. If the one-wire mount points do not seem to work when the MOXA is pingable, a reboot of oac may be required.
You can control the MOXA via telnet if you need to reset it for some reason, but this should be rare.
Link to AGW Off-Axis Change Log (documenting repairs, upgrades, normal maintenance, etc.):
https://docs.google.com/spreadsheets/d/1eaImLHHYpgfXoVRZAUqUvP4V6ZnCoh3vJDYdgJYmf4Q/edit?usp=sharing
--
ChrisBiddick - 03 Nov 2015
--
ChrisBiddick - 23 May 2015
--
TarasGolota - 29 Jul 2014
--
TorstenLeibold - 20 Jun 2011
--
JohnHill - 19 Apr 2008
--
DougMiller - 18 Apr 2008