HDF5 Telemetry Data Maintenance Procedures
Users / Machines
ksummers is the user for replicating from the mountain to Tucson. We use the ssh machine on the mountain and tel1.tucson.lbto.org.
(The password for the teladmin account is the Tucson root password, backwards.)
The teladmin user is in the telemetry group. All the telemetry files should be read/write by the telemetry group so that it can clean up.
Maybe in Tucson they should all be read-only for everyone? Or is that taken care of by the mount?
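A minimal sketch of enforcing that group read/write convention (not an existing procedure; the path is the mountain tcs tree used by the rsync scripts below, and directory modes are left alone since the yearly cleanup manages its own 755/555 chmods):
# give the telemetry group read/write on data files so cleanup can remove them
chgrp -R telemetry /lbt/telemetry_data/tcs
find /lbt/telemetry_data/tcs -type f -exec chmod g+rw {} +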
The web root is on telemetry.lbto.org at /telemetry/html
Yearly Cleanup / Maintenance
As noted in section 7 of the Telemetry System User's Guide, we can clean up telemetry every year. This is kept a manual process so that we do not lose valuable data.
The procedure has changed each year: sometimes we have had dedicated disks for "current" and "archive" telemetry, and sometimes we have shared a disk with different directories. Likewise when we changed between SAN and NAS systems. But here's what we did in January-2018. All of these steps are done on the Tucson disk systems; the mountain just rolls over its storage.
- create directory /lbt/data/telemetry/2017
- from the new directory: rsync -av --stats /lbt/telemetry_data/tcs/* ./
- delete 2018 data from the new /lbt/data/telemetry/2017 directory
- delete the old 2017 data from the "current" tcs directory /lbt/data/telemetry/tcs
- delete PMC actuator data more than 1 year old (all of 2016 actuator files); a looped version of these commands is sketched after this list:
cd /lbt/data/telemetry/2016/pmcl/2016
find . -type d -exec chmod 755 {} \;
find . -name \*actuator\* -exec chmod 644 {} \;
find . -name \*actuator\* -exec rm {} \;
find . -type d -exec chmod 555 {} \;
cd /lbt/data/telemetry/2016/pmcr/2016
find . -type d -exec chmod 755 {} \;
find . -name \*actuator\* -exec chmod 644 {} \;
find . -name \*actuator\* -exec rm {} \;
find . -type d -exec chmod 555 {} \;
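A minimal sketch (not an existing script) that loops the same four find commands over both PMC sides for a given archive year:
#!/bin/bash
# purge PMC actuator streams for one archive year, both sides
year=2016
for side in pmcl pmcr; do
    cd /lbt/data/telemetry/${year}/${side}/${year} || continue
    find . -type d -exec chmod 755 {} \;            # make directories writable
    find . -name '*actuator*' -exec chmod 644 {} \; # make actuator files deletable
    find . -name '*actuator*' -exec rm {} \;        # delete actuator streams
    find . -type d -exec chmod 555 {} \;            # restore read-only directories
done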
Telemetry Retention Policy
The original plan had iif telemetry on only a three-year retention. But this is the older seeing data, so it should be kept permanently.
(Note that as of May-2015, DIMM is writing the seeing data. IIF stopped writing a separate dimm stream in Oct-2015.)
The following table is in order of retention and daily rate of storage. PMC is the biggest disk-space user of the subsystems. The size numbers for subsystems that are not used all the time (for example, AOS is only used on AO nights, GCS is not used on LBC nights, DIMM is not available on bad-weather nights, etc.) are 2017 full-year numbers.
| subsystem/stream | retention | size (1yr) | notes |
| pmc-actuators | 1 year | 1T x 2 | actuator_NNN_NNN streams |
| pmc-other | 3 years | 350G x 2 | non-actuator streams |
| mcs/mcspu | 3 years | 279G | all streams |
| pcs-nonpointing | 3 years | 138G | stream names are mountequatorial, trajectories, weather, sx.*rotatordemand, dx.*rotatordemand |
| oss | 3 years | 91G | all streams |
| aos | 3 years | 3.5G x 2 | all streams |
| gcs | 3 years | 250M x 2 | all streams |
| dimm-acquisition | 3 years | 3.5M | acquisition stream |
| ecs | permanent | 31G | all streams |
| psf | permanent | 1.5G x 2 | all streams |
| dds | permanent | 4.6G | all streams |
| env | permanent | 2.6G | all streams |
| dimm-seeing | permanent | 36M | seeing stream |
| iif | permanent | 60M | historical seeing data |
| pcs-pointing | permanent | 195M | stream names are sx.telescope, dx.telescope, sx.guide, dx.guide |
| All Sky images | permanent | 16-18G | |
Mountain Disk Space
TCS Data
The mountain disk is currently (2017) a shared space, but it's monitored to stay under 3TB:
192.168.39.20:/volume6/lbto_telemetry_tcs
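A quick way to check usage against that 3TB limit (a sketch; the local mount point is the one the rsync scripts below read from):
df -h /lbt/telemetry_data/tcs
du -sh /lbt/telemetry_data/tcs/* 2>/dev/null | sort -h | tail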
OVMS Data
| date | size | notes |
| 2016 | | shared disk now; we should monitor it and keep it under 2TB |
| June-2015 | | mountain disk is now 2TB |
| Dec 11-31, 2012 | 204G | |
| Jan-2013 | 283G | 9.2G per day; 3 months * 283GB = 849GB (83% of the 1TB disk) |
| Feb-2013 | 189G | 11G one day; 0G Feb 12-18 |
We are using rsync to replicate mountain data down to Tucson every morning - from ssh.mountain.lbto.org to tel1.tucson.lbto.org.
It takes about 20 minutes each morning to rsync TCS data down to Tucson and about 25 minutes for OVMS data (as of May-2013, with the disk 90% full). The rsync jobs began failing at the end of October-2013, so it's impossible to say how long it really takes now. The bwlimit parameter slowed the transfer down but did not make a difference in the failure rate.
The less data on the mountain, the less rsync has to go through to determine what needs replicating. I tried making it smarter, pruning its list to just the files we know are different from day to day, but it still has to go through all the files to find those, so it's not any faster. It's cleaner to just let it go through everything and decide what has to be replicated to Tucson.
Note: When we transitioned to 2014, I had to modify the rsync script to only look at 2014 files on the mountain. Previously, the script didn't have to prune that way; it just brought everything down. When we cleaned up and only had 2014 files left on the mountain, we went back to an rsync command that doesn't prune. A year-restricted variant is sketched below.
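The exact filter used back then isn't recorded here, but one way to restrict the sync to a single year, assuming the mountain tree is laid out as tcs/<subsystem>/<year>/..., is rsync's --relative (-R) with the /./ marker so the per-subsystem paths are preserved on the Tucson side:
rsync -avzR --stats --exclude 'current*.h5' \
    -e "ssh -i /home/ksummers/.ssh/ksummers-dsa-ssh-new.key" \
    /lbt/telemetry_data/tcs/./*/2014 \
    teladmin@web.tucson.lbto.org:/lbt/telemetry_data/tcs/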
Note: When the mounts change, you have to modify the rsync commands to match - for instance, whether ovms is part of the path or not, etc. If you have rsync problems, it could be due to mounts changing on the machines that we rsync between.
As of 26-June-2015, here are the rsync jobs.
The cron table for the rsyncs on ssh.mountain.lbto.org:
# each day sync the mountain telemetry data down to tucson
#
00 14 * * * /home/ksummers/bin/rsync-tcs.sh
30 14 * * * /home/ksummers/bin/rsync-ovms.sh
Here's the rsync-tcs.sh script that runs on ssh:
# rsync the mountain TCS telemetry to Tucson, skipping the in-progress current*.h5 files,
# then mail a summary of how many files were transferred
today=`date +%Y%m%d%H%M`
rsync -avz --stats --exclude 'current*.h5' -e "ssh -i /home/ksummers/.ssh/ksummers-dsa-ssh-new.key" /lbt/telemetry_data/tcs/ teladmin@web.tucson.lbto.org:/lbt/telemetry_data/tcs/ > /home/ksummers/logs-rsync/tcs-${today}.log 2>&1
files=$(grep "files transferred" /home/ksummers/logs-rsync/tcs-${today}.log | cut -f5 -d" ")
tail -15 /home/ksummers/logs-rsync/tcs-${today}.log | /bin/mail -s "TCS: $files telemetry files rsync'ed" ksummers@lbto.org
Here's the rsync-ovms.sh script:
# same pattern as rsync-tcs.sh, for the OVMS telemetry tree
today=`date +%Y%m%d%H%M`
rsync -avz --stats --exclude 'current*.h5' -e "ssh -i /home/ksummers/.ssh/ksummers-dsa-ssh-new.key" /lbt/telemetry_data/ovms/ teladmin@web.tucson.lbto.org:/lbt/telemetry_data/ovms > /home/ksummers/logs-rsync/ovms-${today}.log 2>&1
files=$(grep "files transferred" /home/ksummers/logs-rsync/ovms-${today}.log | cut -f5 -d" ")
tail -15 /home/ksummers/logs-rsync/ovms-${today}.log | /bin/mail -s "OVMS: $files telemetry files rsync'ed" ksummers@lbto.org
Both the TCS and OVMS mountain disks go up about 1% per day.
Note: we are not running any automatic cleanup yet. When we did that, it hung the SAN.
The cleanup procedure I am implementing runs on the first day of each month. It keeps the most recent month of TCS data and deletes the month before that. For example, when it runs at the beginning of April, it deletes all of February's data.
See the scripts cleanup-tcs-telem.sh and cleanup-ovms-telem.sh in /home/teladmin. The scripts can be run manually and will prompt the user for whether to delete or not. A sketch of the monthly logic is below.
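For reference, a minimal sketch of that month-before-last logic (this is not the actual /home/teladmin script, and the <subsystem>/<YYYY>/<MM> layout is an assumption to verify against it):
#!/bin/bash
# delete the TCS telemetry month before last, with a confirmation prompt
target=$(date -d "2 months ago" +%Y/%m)     # run at the start of April -> <year>/02
echo "Directories that would be deleted for ${target}:"
ls -d /lbt/telemetry_data/tcs/*/${target} 2>/dev/null
read -p "Delete these directories? (y/n) " answer
if [ "$answer" = "y" ]; then
    rm -rf /lbt/telemetry_data/tcs/*/${target}
fi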
Tucson Disk Space
Tucson disk is divided between the Starboard, SAN, and Synology NAS. The old data is on the SAN. The "current" data is on the NAS. Here's a df command on web (from 20180108) to illustrate what's on the Synology (134.20) and what's on the SAN (134.11):
10.130.134.11:/disk/lbto_telemetry_tcs 68337154048 42424126464 25913027584 63% /lbt/data/telemetry
10.130.134.20:/volume3/lbto_telemetry/tcs 19433417472 14482291840 4951006848 75% /lbt/data/telemetry/tcs
10.130.134.20:/volume3/lbto_telemetry/ovms 19433417472 14482291840 4951006848 75% /lbt/data/telemetry/ovms
The following numbers include AllSky image data files (about 50MB/day):
| year | size | notes |
| pre2013 | 31G | |
| 2013 | 871G | |
| 2014 | 751G | |
| 2015 | 874G | |
| 2016 | 910G | |
| 2017 | 2.9T | will clean up to <1T after PMC actuator data is deleted |
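These per-year totals can be regenerated with du (a sketch; the per-year directory names are assumed from the archive paths used earlier on this page):
du -sh /lbt/data/telemetry/2*      # archived years on the SAN (2013, 2014, ...)
du -sh /lbt/data/telemetry/tcs     # "current" data on the NAS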
There is a script in /home/ksummers/telemetry/sizing/telemetrySizing.csh that takes a UTC date (e.g., 20140227) and greps out the sizes of the files on the Tucson disk. It produces an output file named TCSOneNightTotal-20140227.csv:
aosr 2631928
aosl 5209992
mcs 1196726856
ecs 59157816
pmcr 4178178616
pmcl 4175054216
iif 105680
oss 312832404
gcsr 0
gcsl 2361496
psfr 404320
psfl 9632104
pcs 452074860
env 6354476
TOTAL
10400724764
The numbers are in bytes, so this total is 10400724764/1024/1024/1024 = 9.69 GB
Since the 2013 data is on a separate disk, use the script telemetrySizing2013.csh to get totals for that.
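If the csh script isn't handy, here is a rough bash sketch of the same idea (not the actual script; it assumes a <subsystem>/<YYYY>/<MM>/<DD> layout under the tcs root, which should be checked against telemetrySizing.csh):
#!/bin/bash
# sum telemetry file sizes per subsystem for one UTC night, in bytes
night=${1:?usage: $0 YYYYMMDD}            # e.g. 20140227
yyyy=${night:0:4}; mm=${night:4:2}; dd=${night:6:2}
root=/lbt/data/telemetry/tcs
out=TCSOneNightTotal-${night}.csv
: > "$out"
total=0
for sub in "$root"/*/; do
    dir="${sub}${yyyy}/${mm}/${dd}"
    bytes=0
    [ -d "$dir" ] && bytes=$(find "$dir" -type f -printf '%s\n' | awk '{s+=$1} END {print s+0}')
    echo "$(basename "$sub") $bytes" >> "$out"
    total=$((total + bytes))
done
echo "TOTAL $total" >> "$out"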