Computer Shutdown and Startup Procedures
Computer room shutdowns/startups are mostly automated. A script monitors UPS status and commands shutdown(s) as needed.
An email is generated to telescopework when the UPS goes OnBattery. This is the trigger for non-automated shutdown activities.
Before the computer hosts shut down, the following non-automated activities need to be done (time is of the essence!):
The shutdown script runs from the ups account on the monitor host (/home/ups/servers). As time thresholds are crossed while OnBattery, shutdowns occur (in some cases including software). Hosts are brought down in priority order as defined by a config file (/home/ups/servers/servers.config). Currently, the scheme allows mountain staff enough time to get on generator (15 minutes) before shutdowns begin. Shutdown occurs in waves, finishing about 52 minutes after OnBattery. Emails are sent to the IT staff as the shutdown progresses; IT staff then send emails to telescopework to communicate status. IT staff will also send email to tucson@lbto.org and to mountain@lbto.org to let everyone know the shutdown/restart state.
The defined time thresholds for host group shutdowns are:
| software | 24 minutes | PEPSI, MODS, & AO shell rest/fsm currently |
| group 1 | 30 minutes | ARGOS, oac, ovms, LBC, AGw/AzCam, DIMM |
| group 2 | 34 minutes | OBS hosts (not including OBS1) |
| group 3 | 38 minutes | TCS, AO, LUCI |
| group 4 | 42 minutes | web, ssh, obs1, Weatherstation, VM services |
| group 5 | 46 minutes | DNS, SAN |
| critical_threshold (PDU shutdown) | 52 minutes | |
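The thresholds above can be read as a simple lookup from minutes on battery to the stages that are due. The following is a minimal sketch only, not the actual shutdown script: the names THRESHOLDS, stages_due, and next_stage are hypothetical, and it assumes a stage fires as soon as the elapsed time reaches its threshold.

```python
# Minutes after the UPS goes OnBattery, copied from the threshold table.
# The list name and the function names below are illustrative assumptions.
THRESHOLDS = [
    ("software", 24),
    ("group 1", 30),
    ("group 2", 34),
    ("group 3", 38),
    ("group 4", 42),
    ("group 5", 46),
    ("critical_threshold", 52),
]


def stages_due(minutes_on_battery):
    """Return every stage whose threshold has been reached (assumes >=)."""
    return [name for name, limit in THRESHOLDS if minutes_on_battery >= limit]


def next_stage(minutes_on_battery):
    """Return (stage, minutes remaining) for the next stage, or None if all are due."""
    for name, limit in THRESHOLDS:
        if minutes_on_battery < limit:
            return name, limit - minutes_on_battery
    return None
```

For example, stages_due(35) lists the software stage plus groups 1 and 2, and next_stage(35) reports that group 3 is three minutes away.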
See the InstrumentFacilityHostContacts page for individual contact information if there are issues with any of the instruments.
- Shutdown/Restart Config File header and contents:
# This is the configuration file for automated host shutdown/startup.
# Comment lines are allowed/ignored (# char MUST be in column 1).
# Any line with a space character in column 1 (regardless of remaining text) is treated as a blank line and is ignored.
# All lines that are not blank or commented MUST contain valid config data.
# Do not add extra space at end of lines.
#
# The following fields ranges are currently defined/allowed/intended:
# group grp1 through grp5
# type linux, windows, VM-server, or PDU
# Note: a PDU type only turns OFF the PDU (i.e. no host shutdown command).
# PDU shutdown is done at critical threshold crossing.
# PDUs turn back ON with specified group.
# Note2: A VM type will not include PDU info (it's a VM).
# hostname host name (only from set connected to Toshiba UPS)
# subtype HW, VM (hard host, or Virtual Machine)
# <PDU1_IP_ADDRESS port> <PDU1 info is optional...only if a PDU is to be disabled/enabled.>
# <PDU2_IP_ADDRESS port> <PDU2 info is also optional, only if a 2nd PDU is to be disabled/enabled.>
#
# ------------------------------------------------------------------------------------
#grp type hostname subtype PDU1-IP Port PDU2-IP Port
#-------------------------------------------------------------------------------------
grp1 linux argos-sx-lalas HW 192.168.52.40 10 192.168.52.41 10
grp1 linux argos-sx-lgsw HW 192.168.52.40 19 192.168.52.41 19
grp1 linux argos-dx-lalas HW 192.168.52.40 9 192.168.52.41 9
grp1 linux argos-dx-lgsw HW 192.168.52.40 17 192.168.52.41 17
grp1 linux oac HW 192.168.52.32 5
grp1 linux ovms HW 192.168.52.35 8 192.168.52.35 9
grp1 linux lbccontrol HW 192.168.52.47 23 192.168.52.48 23
grp1 windows agw1-cam HW 192.168.52.50 13 192.168.52.50 14
grp1 windows agw2-cam HW 192.168.52.49 22 192.168.52.49 23
grp1 windows mods1-cam HW 192.168.52.50 22 192.168.52.50 23
grp1 windows mods2-cam HW 192.168.52.50 20 192.168.52.50 21
grp1 windows azcamserver HW 192.168.52.50 11 192.168.52.50 12
grp1 linux dimm HW
grp1 linux lsys.linc.lbto.org HW 192.168.52.62 15 192.168.52.63 15
grp1 linux lsys2.linc.lbto.org HW 192.168.52.62 14 192.168.52.63 14
grp1 linux laos.linc.lbto.org HW 192.168.52.62 22 192.168.52.63 22
grp1 linux laos2.linc.lbto.org HW 192.168.52.62 16 192.168.52.63 16
grp1 linux lircs.linc.lbto.org HW 192.168.52.62 11 192.168.52.63 11
###grp1 linux lircs2.linc.lbto.org HW 192.168.52.62 ?? 192.168.52.63 ??
grp1 linux lffts.linc.lbto.org HW 192.168.52.62 20 192.168.52.63 20
grp1 linux ln-x1.linc.lbto.org HW
grp1 linux ln-x2.linc.lbto.org HW
##grp1 linux ln-x3.linc.lbto.org HW
grp1 linux ln-x4.linc.lbto.org HW
grp2 linux obs6 HW
grp2 linux obs5 HW
grp2 linux obs4 HW
grp2 linux obs3 HW
grp2 linux obs2 HW
grp3 linux tcs1 HW 192.168.52.32 7 192.168.52.33 7
grp3 linux tcs2 HW 192.168.52.32 6 192.168.52.33 6
grp3 linux flao-sxwfs HW 192.168.52.34 3 192.168.52.35 4
grp3 linux flao-dxwfs HW 192.168.52.35 5 192.168.52.35 6
grp3 windows mountainapp1 VM
grp3 linux linuxapps VM
grp3 linux mt-archive VM
grp3 linux luci.luci.lbto.org HW 192.168.52.42 16 192.168.52.43 16
grp3 linux lucix.luci.lbto.org HW 192.168.52.42 15 192.168.52.43 15
grp3 linux sxadsec HW 192.168.52.34 1 192.168.52.34 2
grp3 linux dxadsec HW 192.168.52.35 1 192.168.52.35 2
grp4 linux web1 HW 192.168.52.32 2 192.168.52.32 3
grp4 linux web2 HW 192.168.52.33 2 192.168.52.33 3
grp4 linux ssh HW 192.168.52.33 5 192.168.52.33 8
grp4 linux obs1 HW
grp4 windows weatherstation-pc HW 192.168.52.45 18
grp4 VM-server vm1 HW 192.168.52.32 4 192.168.52.33 4
grp4 VM-server vm2 HW 192.168.52.32 12 192.168.52.33 12
grp4 windows mt-vmmgr HW 192.168.52.34 7 192.168.52.35 7
#---------------------------------------------------------------------------
# grp5 is reserved for DNS & SAN
#---------------------------------------------------------------------------
grp5 windows mountaindc1 HW 192.168.52.60 7 192.168.52.60 8
grp5 windows mountaindc2 HW 192.168.52.61 7 192.168.52.61 8
grp5 linux node1.san HW 192.168.52.60 1 192.168.52.60 2
grp5 linux node2.san HW 192.168.52.61 1 192.168.52.61 2
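The rules in the config file header can be illustrated with a small parser. This is a hedged sketch under stated assumptions, not the parser used by the real script: HostEntry, parse_line, and parse_config are hypothetical names, empty lines are assumed to be ignored like blank lines, and a data line is assumed to carry exactly 4, 6, or 8 whitespace-separated fields (no PDU, one PDU, or two PDUs).

```python
from dataclasses import dataclass, field

# Allowed values taken from the config file header.
GROUPS = {"grp1", "grp2", "grp3", "grp4", "grp5"}
TYPES = {"linux", "windows", "VM-server", "PDU"}
SUBTYPES = {"HW", "VM"}


@dataclass
class HostEntry:  # hypothetical container, not from the real script
    group: str
    type: str
    hostname: str
    subtype: str
    pdus: list = field(default_factory=list)  # [(ip, port), ...], at most two


def parse_line(line):
    """Parse one config line; return a HostEntry, or None for ignored lines."""
    line = line.rstrip("\n")
    # '#' or a space in column 1 means the line is ignored.
    # Assumption: a completely empty line is ignored as well.
    if line == "" or line[0] in "# ":
        return None
    parts = line.split()
    if len(parts) not in (4, 6, 8):
        raise ValueError(f"expected 4, 6, or 8 fields: {line!r}")
    group, type_, hostname, subtype = parts[:4]
    if group not in GROUPS:
        raise ValueError(f"unknown group {group!r}")
    if type_ not in TYPES:
        raise ValueError(f"unknown type {type_!r}")
    if subtype not in SUBTYPES:
        raise ValueError(f"unknown subtype {subtype!r}")
    # Remaining fields come in (PDU IP, port) pairs.
    pdus = [(parts[i], int(parts[i + 1])) for i in range(4, len(parts), 2)]
    return HostEntry(group, type_, hostname, subtype, pdus)


def parse_config(text):
    """Parse the whole file and return entries in file order."""
    entries = (parse_line(line) for line in text.splitlines())
    return [entry for entry in entries if entry is not None]
```

Rejecting any line that is neither ignored nor valid mirrors the header's requirement that all non-blank, non-comment lines contain valid config data.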
To Do
- The following machines may still need to be tested: LN, ARGOS, obs3, vmhost1, vmhost2
All of these machines have been updated since the last test, and some portion of the procedure (shutdown or startup) has not been re-tested.
- MODS computers were added to the software shutdown script (the MODS team wrote the shutdown process separately). As a result, startup is not automated. After handover, a review should be done to standardize the shutdown/restart sequencing. The desire is to have MODS instrument computers come up before the data computers. This will require some kind of "group delay" between startups, because all startups currently happen at once.
- weatherstation-pc: does it come back up on PDU toggle?