HDF5 Telemetry Data Maintenance Procedures

Users / Machines

ksummers is the user for replicating from the mountain to Tucson. We use the ssh machine on the mountain and tel1.tucson.lbto.org.
( the password for the teladmin account is the Tucson root password, backwards )

The teladmin user is in the telemetry group. All the telemetry files should be readable and writable by the telemetry group so that teladmin can clean them up.
Maybe in Tucson they should all be read only by everyone? Or is that taken care of by the mount?

The web root is on telemetry.lbto.org at /telemetry/html

Yearly Cleanup / Maintenance

As noted in section 7 of the Telemetry System User's Guide, we can clean up telemetry every year. This is a manual process, so that we do not lose valuable data.

The procedure has changed from year to year: sometimes we had dedicated disks for "current" and "archive" telemetry, and sometimes they shared a disk in different directories. It also changed when we moved between SAN and NAS systems. Here's what we did in January-2018. All of these steps are done on the Tucson disk systems; the mountain just rolls over its storage.

  1. create directory /lbt/data/telemetry/2017
  2. from the new directory: rsync -av --stats /lbt/telemetry_data/tcs/* ./
  3. delete 2018 data from the new /lbt/data/telemetry/2017 directory
  4. delete the old 2017 data from the "current" tcs directory /lbt/data/telemetry/tcs
  5. delete PMC actuator data more than 1 year old (all of 2016 actuator files)
        cd /lbt/data/telemetry/2016/pmcl/2016
        find . -type d -exec chmod 755 {} \;
        find . -name \*actuator\* -exec chmod 644  {} \;
        find . -name \*actuator\* -exec rm  {} \;
        find . -type d -exec chmod 555 {} \;
        cd /lbt/data/telemetry/2016/pmcr/2016
        find . -type d -exec chmod 755 {} \;
        find . -name \*actuator\* -exec chmod 644  {} \;
        find . -name \*actuator\* -exec rm  {} \;
        find . -type d -exec chmod 555 {} \;  
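
The pmcl and pmcr commands in step 5 are identical apart from the directory, so they can be wrapped in one function. This is a minimal sketch, not the commands we actually ran: the function name purge_actuator_files is hypothetical, and it uses rm -f in place of the separate chmod 644 pass, on the assumption that the chmod was only there to stop rm prompting on read-only files.

```shell
# Hypothetical helper: delete every *actuator* file under one telemetry tree
# whose directories are normally read-only (mode 555).
purge_actuator_files() {
    tree=$1
    # Open the directories up so that entries can be removed from them.
    find "$tree" -type d -exec chmod 755 {} +
    # Remove only the actuator streams; -f avoids prompts on read-only files.
    find "$tree" -type f -name '*actuator*' -exec rm -f {} +
    # Put the directories back to read-only.
    find "$tree" -type d -exec chmod 555 {} +
}
```

For the cleanup above it would be called once per side, for example purge_actuator_files /lbt/data/telemetry/2016/pmcl/2016 and then the same for pmcr.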

Telemetry Retention Policy

The original plan gave iif telemetry only a three-year retention. But this is the older seeing data, so it should be kept permanently.
(Note that as of May-2015, DIMM is writing the seeing data. IIF stopped writing a separate dimm stream in Oct-2015.)

The following table is in order of retention and daily rate of storage - PMC is the biggest disk space user of the subsystems. The size numbers for subsystems that are not used all the time (for example, AOS is only used on AO nights, GCS is not used on LBC nights, DIMM is not available on bad weather nights, etc.) are 2017 full-year numbers.

| subsystem/stream | retention | size (1 yr) | notes |
| pmc-actuators | 1 year | 1T x 2 | actuator_NNN_NNN streams |
| pmc-other | 3 years | 350G x 2 | non-actuator streams |
| mcs/mcspu | 3 years | 279G | all streams |
| pcs-nonpointing | 3 years | 138G | stream names are |
| oss | 3 years | 91G | all streams |
| aos | 3 years | 3.5G x 2 | all streams |
| gcs | 3 years | 250M x 2 | all streams |
| dimm-acquisition | 3 years | 3.5M | acquisition stream |
| ecs | permanent | 31G | all streams |
| psf | permanent | 1.5G x 2 | all streams |
| dds | permanent | 4.6G | all streams |
| env | permanent | 2.6G | all streams |
| dimm-seeing | permanent | 36M | seeing stream |
| iif | permanent | 60M | historical seeing data |
| pcs-pointing | permanent | 195M | stream names are sx.telescope, dx.telescope, sx.guide, dx.guide |
| All Sky images | permanent | 16-18G | |

Mountain Disk Space

TCS Data

The mountain disk is currently (2017) a shared space, but it's monitored to stay under 3TB.

2016: shared disk now; we should monitor it and keep it under 2TB
June-2015: mountain disk is now 2TB
Dec 11-31, 2012: 204G
Jan-2013: 283G (9.2G per day)
    3 months * 283GB = 849GB (83% of the 1TB disk)
Feb-2013: 189G (11G one day; 0G Feb 12-18)

We use rsync to replicate mountain data down to Tucson every morning, from ssh.mountain.lbto.org to tel1.tucson.lbto.org. It takes about 20 minutes each morning to rsync TCS data down to Tucson and about 25 minutes for OVMS data (as of May-2013, with the disk 90% full).
The rsync jobs began failing at the end of October-2013, so it's impossible to say how long they really take. The bwlimit parameter slowed the transfer down, but made no difference to the failure rate.

The less data on the mountain, the less it has to go through to determine what needs rsync'ing. I tried making it smarter, pruning its list to just the files we know are different from day to day, but it still has to go through all the files to find those, so it's not any faster. It's cleaner to just let it go through everything and decide what has to be replicated to Tucson.

Note: When we transitioned to 2014, I had to modify the rsync script to only look at 2014 files on the mountain. Previously, the script didn't have to prune that way; it just brought everything down. Once we had cleaned up and only had 2014 files on the mountain, we went back to an rsync command that doesn't prune.

Note: When the mounts change, you have to modify the rsync commands to match - for instance, whether the ovms is part of the path or not, etc. If you have rsync problems it could be due to mounts changing on the machines that we rsync between.

Here are the rsync jobs as of 26-June-2015.
The cron table for the rsyncs on ssh.mountain.lbto.org:
# each day sync the mountain telemetry data down to tucson
00 14 * * * /home/ksummers/bin/rsync-tcs.sh
30 14 * * * /home/ksummers/bin/rsync-ovms.sh

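Here's the rsync-tcs.sh script that runs on ssh:

# timestamp used in this run's log file name
today=$(date +%Y%m%d%H%M)
# copy everything except current*.h5 files; stdout and stderr both go to the log
rsync -avz --stats --exclude 'current*.h5' -e "ssh -i /home/ksummers/.ssh/ksummers-dsa-ssh-new.key" /lbt/telemetry_data/tcs/ teladmin@web.tucson.lbto.org:/lbt/telemetry_data/tcs/ > /home/ksummers/logs-rsync/tcs-${today}.log 2>&1
# the count is the 5th field of the "Number of files transferred: N" stats line
files=$(grep "files transferred"  /home/ksummers/logs-rsync/tcs-${today}.log | cut -f5 -d" ")
# mail the end of the log (the stats summary), with the count in the subject
tail -15 /home/ksummers/logs-rsync/tcs-${today}.log | /bin/mail -s "TCS:  $files telemetry files rsync'ed" ksummers@lbto.org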

Here's the rsync-ovms.sh script:
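# timestamp used in this run's log file name
today=$(date +%Y%m%d%H%M)
# copy everything except current*.h5 files; stdout and stderr both go to the log
rsync -avz --stats --exclude 'current*.h5' -e "ssh -i /home/ksummers/.ssh/ksummers-dsa-ssh-new.key" /lbt/telemetry_data/ovms/ teladmin@web.tucson.lbto.org:/lbt/telemetry_data/ovms  > /home/ksummers/logs-rsync/ovms-${today}.log 2>&1
# the count is the 5th field of the "Number of files transferred: N" stats line
files=$(grep "files transferred"  /home/ksummers/logs-rsync/ovms-${today}.log | cut -f5 -d" ")
# mail the end of the log (the stats summary), with the count in the subject
tail -15 /home/ksummers/logs-rsync/ovms-${today}.log | /bin/mail -s "OVMS:  $files telemetry files rsync'ed" ksummers@lbto.org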
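
Both scripts pull the file count out of the rsync --stats summary with grep and cut, which relies on the count being the fifth space-separated field of the "Number of files transferred: N" line. Newer rsync releases (3.1 and later, as far as I know) word that line "Number of regular files transferred: N", which moves the count to the sixth field. Here is a sketch of a parser that accepts either wording; the function name transferred_count is hypothetical and is not part of the scripts above.

```shell
# Hypothetical helper: print the transferred-file count from an rsync
# --stats log read on standard input, whichever wording rsync used.
transferred_count() {
    # On the matching line, strip everything up to and including the colon
    # and print what remains (the count); all other lines are suppressed.
    sed -n 's/^Number of .*files transferred: *//p'
}
```

In the scripts it would replace the grep and cut pipeline, for example files=$(transferred_count < /home/ksummers/logs-rsync/tcs-${today}.log).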

Both the TCS and OVMS mountain disks go up about 1% per day.

Note: we are not running any automatic cleanup yet. When we did that, it hung the SAN.
The cleanup procedure I am implementing runs on the first day of each month. It keeps the most recent month of TCS data and deletes the month before that. For example, when it runs at the beginning of April, it deletes all of February's data.
See the scripts cleanup-tcs-telem.sh and cleanup-ovms-telem.sh in /home/teladmin. The scripts can be run manually, and will prompt the user to confirm whether to delete.
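
The month arithmetic in the April example can be sketched as follows. This is only an illustration, not the contents of cleanup-tcs-telem.sh: the function name month_to_delete and the YYYYMM output format are assumptions, and the sketch only prints the month; it deletes nothing.

```shell
# Hypothetical helper: given the year and month of the run, print the
# month (as YYYYMM) whose data would be deleted, i.e. two months earlier.
month_to_delete() {
    year=$1
    month=${2#0}    # drop a leading zero ("04" -> 4) so it is not read as octal
    # Count months since year 0, then step back two.
    total=$(( year * 12 + (month - 1) - 2 ))
    printf '%04d%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

month_to_delete 2018 04    # run on 1 April 2018: prints 201802
month_to_delete 2018 01    # run on 1 January 2018: prints 201711
```

Counting in whole months avoids any date-library dependence and handles the wrap across the year boundary.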

Tucson Disk Space

Tucson disk is divided between the Starboard, SAN and Synology NAS. The old data is on the SAN. The "current" data is on the NAS. Here's a df command on web (from 20180108) to illustrate what's on the Synology (134.20) and what's SAN (134.11):

    68337154048 42424126464 25913027584  63% /lbt/data/telemetry
    19433417472 14482291840  4951006848  75% /lbt/data/telemetry/tcs
    19433417472 14482291840  4951006848  75% /lbt/data/telemetry/ovms

The following numbers include AllSky image data files (about 50MB/day).
pre2013 31G  
2013 871G
2014 751G
2015 874G
2016 910G
2017 2.9T (will drop to <1T after PMC actuator data is deleted)

There is a script, /home/ksummers/telemetry/sizing/telemetrySizing.csh, that takes a UTC date (e.g., 20140227) and greps out the sizes of the files on the Tucson disk. It writes an output file named TCSOneNightTotal-20140227.csv, for example:
    aosr    2631928
    aosl    5209992
    mcs     1196726856
    ecs     59157816
    pmcr    4178178616
    pmcl    4175054216
    iif     105680
    oss     312832404
    gcsr    0
    gcsl    2361496
    psfr    404320
    psfl    9632104
    pcs     452074860
    env     6354476

The numbers are in bytes, so this total is 10400724764/1024/1024/1024 = 9.69 GB
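
The total above can be reproduced from the listing. The sketch below assumes the file holds one whitespace-separated name and byte count per line, as shown; if the real .csv is comma-separated, the awk field separator would need to change. The function name total_gb is hypothetical.

```shell
# Hypothetical helper: sum the byte counts (second field) read on standard
# input and print the total in bytes and in GB (1 GB = 1024^3 bytes here).
total_gb() {
    awk 'NF >= 2 { sum += $2 }
         END { printf "%.0f bytes = %.2f GB\n", sum, sum / 1024 / 1024 / 1024 }'
}
```

Run as total_gb < TCSOneNightTotal-20140227.csv; for the listing above it prints 10400724764 bytes = 9.69 GB.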

Since the 2013 data is on a separate disk, use the script telemetrySizing2013.csh to get totals for that.
Topic revision: r28 - 08 Jan 2018, KelleeSummers