telemToCSV Command Line Tool

Application used by the web visualization tools to figure out where the HDF5 files are and which ones to process, process each via h5csv, and merge the multiple streams, if necessary.


usage:  telemToCSV --stream1 stream1 [--stream2 stream2] \
                   --bdate YYYY/MM/DD [--edate YYYY/MM/DD] \
                   [-a HH:MM:SS] [-g HH:MM:SS] \
                   [--fields1 "name1,name2,nameN"] [--fields2 "nameX,nameY"] \
                   --csv outputfilename \
                   [--timeconvert]
      where stream1/stream2 are dot-delimited stream names (e.g., pcs.trajectories, mcspu.az.servo_data),
            fields1/fields2 are comma-delimited field lists (e.g., "windspeed,temperature"),
            and HH:MM:SS specifies the start time (-a) and end time (-g)

      For example:
          telemToCSV --stream1 dimm.seeing --bdate 2016/03/19 --timeconvert --csv seeing-temperature.csv --fields1 "seeing" --stream2 env.lbt_weather --fields2 "temperature"
          telemToCSV --stream1 ovms.opd_estimation_mode2 --bdate 2016/04/19 -a 02:00:00 --edate 2016/04/19 -g 02:02:00 --timeconvert --csv opdestimation20160419.csv

      Be aware that time parsing (-a, -g) only works on machines running in UTC,
      and the --timeconvert option is required for merging two streams.

Features

  • The first line of the generated CSV is a comma-delimited string of units for each field; this works for merged as well as single streams
  • Supports finding all TCS subsystems, plus DIMM and OVMS as subsystems, in the current as well as the previous year's files
  • Picks up all the HDF5 files in a directory if a whole day is specified
  • Supports begin and end time parameters (applied to the begin and end dates specified)
  • Uses a Python program (mergeFiles.py) to merge CSV files and to sort them so timestamps come out in order; a minimal sketch follows this list
  • Returns error codes for all error conditions and prints error messages so that the web side can relay them; returns a zero-length CSV when an output file is expected
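
A minimal sketch of the merge-and-sort step, assuming mergeFiles.py uses pandas and that the shared key column is named timestamp (both are assumptions; the real script may differ):

 # Sketch only: merge two telemToCSV outputs on a shared timestamp column
 # and sort so the timestamps come out in order.  pandas tags duplicate
 # non-key columns with _x/_y suffixes (see Issues below).
 import pandas as pd

 a = pd.read_csv("stream1.csv", skiprows=1)   # skip the units line
 b = pd.read_csv("stream2.csv", skiprows=1)

 merged = pd.merge(a, b, on="timestamp", how="outer", suffixes=("_x", "_y"))
 merged.sort_values("timestamp").to_csv("merged.csv", index=False)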

Issues

  • Doesn't handle a time frame that spans a year rollover - e.g., selecting 29-Dec-2015 to 2-Jan-2016
  • When merging two full streams, the output gets two tai_offset fields - one named tai_offset_x and one tai_offset_y. pandas adds those suffixes when it finds the same column in both inputs (see the sketch after this list). Does it matter?
  • I thought tai_offset was not in the whole-stream dump, but it seems to be there
  • Files have to be listed to h5csv in chronological order because it does not sort on timestamp - but if we sort all CSV files, not just the ones being merged, that could fix it
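
For reference, the _x/_y names in the tai_offset issue are pandas' default merge suffixes for columns present in both inputs. Continuing the sketch above, one copy could be dropped after the merge, assuming the two columns really are identical row-for-row (worth checking):

 # Collapse the duplicated column back to a single tai_offset, assuming
 # both streams carried identical values.
 if merged["tai_offset_x"].equals(merged["tai_offset_y"]):
     merged = merged.drop(columns="tai_offset_y").rename(
         columns={"tai_offset_x": "tai_offset"})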

Testing

  1. Verify the comma-delimited units string is the first line of the file
  2. Verify the field names are the comma-delimited second line
  3. Verify the dates are correct
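
A throwaway checker for the three items above, assuming the output file name, the expected date, and a first column of epoch milliseconds (as the awk step in the debug log below suggests):

 # Quick sanity check of a telemToCSV output file; the file name and the
 # expected date are assumptions for illustration.
 import csv
 import datetime

 with open("seeing-temperature.csv") as f:
     units = next(f).rstrip("\n").split(",")    # 1. units line comes first
     fields = next(f).rstrip("\n").split(",")   # 2. field names come second
     assert len(units) == len(fields)
     for row in csv.reader(f):                  # 3. every row on the right date
         ts = datetime.datetime.utcfromtimestamp(float(row[0]) / 1000.0)
         assert ts.date() == datetime.date(2016, 3, 19), ts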

Full Stream
 telemToCSV --stream1 dds.operations --bdate 2016/07/04 --timeconvert --csv operations20160704.csv

Column Selection
 telemToCSV --stream1 dimm.seeing --bdate 2016/07/11 --timeconvert --csv dimm.seeing.20160711.csv --fields1 "seeing" 

Merging Streams

Make sure mergeFiles.py is in the same directory where you are testing.

Test combining two streams:
 telemToCSV --stream1 gcsl.guiding --stream2 env.lbt_weather --csv gcs-weather-fullstreams20160319.csv --bdate 2016/03/19 --edate 2016/03/19 --timeconvert

 telemToCSV -t -a 03:13:45 -g 05:15:35 --stream1 gcsl.guiding --fields1 "sxfwhm" --stream2 env.lbt_weather --fields2 "temperature,windspeed" --bdate 2016/03/19 --edate 2016/03/19 --csv gcs-weather-3fields.csv

 telemToCSV -t -a 03:13:45  --stream1 gcsl.guiding --fields1 "sxfwhm" --stream2 env.lbt_weather --fields2 "temperature,windspeed" --bdate 2016/03/19 --edate 2016/03/19 --csv gcs-weather-3fields-notime.csv

Time Parsing

For time parsing to work, you must be on a UTC machine (the web host in Tucson).
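
A minimal sketch of why, assuming the tool converts the -a/-g strings to epoch seconds with local-time functions (an assumption; the actual parsing code may differ):

 # Hypothetical illustration, not the tool's actual code: time.mktime
 # interprets a struct_time in the host's LOCAL zone, calendar.timegm
 # interprets it as UTC.  The two agree only when the machine runs UTC.
 import calendar
 import time

 t = time.strptime("2016/03/19 03:13:45", "%Y/%m/%d %H:%M:%S")
 print(time.mktime(t))       # local-zone epoch seconds
 print(calendar.timegm(t))   # UTC epoch seconds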

If you enter only a --bdate, you get only that date.
If you enter both --bdate and --edate, you get all the files for those dates, inclusive.
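
The debug log below shows the on-disk layout /lbt/telemetry_data/tcs/<subsystem>/YYYY/MM/DD/<stamp>.<stream>.h5. A sketch of the inclusive date walk under that layout (the helper itself is hypothetical):

 # Collect each day's HDF5 files from bdate to edate inclusive, sorted so
 # h5csv sees them in chronological order (see Issues above).
 import datetime
 import glob
 import os

 def files_for_range(subsys, stream, bdate, edate):
     day, files = bdate, []
     while day <= edate:
         d = os.path.join("/lbt/telemetry_data/tcs", subsys, day.strftime("%Y/%m/%d"))
         files += sorted(glob.glob(os.path.join(d, "*.%s.h5" % stream)))
         day += datetime.timedelta(days=1)
     return files

 print(files_for_range("env", "env.lbt_weather",
                       datetime.date(2016, 8, 1), datetime.date(2016, 8, 20)))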

 telemToCSV --stream1 env.lbt_weather --bdate 2016/08/01 --timeconvert --csv temperature201608.csv --edate 2016/08/20  --fields1 "temperature" 

 telemToCSV --stream1 env.lbt_weather --bdate 2016/08/01 -a 19:00:00 --timeconvert --csv temperature201608-timeslice.csv --edate 2016/08/05 -g 19:00:00 --fields1 "temperature" 

 telemToCSV --stream1 ovms.opd_estimation_mode2 --bdate 2016/07/11 -a 02:00:00 -g 04:00 --timeconvert --csv 20160711-estmode2.csv --fields1 "diff_z_fp" --debug

 telemToCSV -t -a 05:13:45 -g 05:15:35 --stream1 ovms.opd_estimation_mode2  --bdate 2016/03/24 --edate 2016/03/24 --csv ovms-mode2-timesample.csv

Single file, with time specifications (make sure it processes the file only once):
    telemToCSV --stream1 env.lbt_weather --bdate 2016/08/01 -a 17:00:00 -g 19:00 --timeconvert  --fields1 "temperature" --csv temperaturetimeslicetest.csv

Parse over month boundary:
   [teladmin@web telemToCSV]$ telemToCSV --stream1 env.lbt_weather --bdate 2016/07/30 -a 19:00:00 --timeconvert --csv humid201608-timeslice.csv --edate 2016/08/02 -g 19:00:00 --fields1 "humidity" --debug
   /lbt/telemetry_data/tcs/env/2016/07/30/201607300001.env.lbt_weather.h5  fileTime:7300001  begin:7301900  end:8021900
   /lbt/telemetry_data/tcs/env/2016/07/30/201607300001.env.lbt_weather.h5  file too early 
   /lbt/telemetry_data/tcs/env/2016/07/31/201607310000.env.lbt_weather.h5  fileTime:7310000  begin:7301900  end:8021900
   /lbt/telemetry_data/tcs/env/2016/07/31/201607310000.env.lbt_weather.h5  file between times, adding, prevFilename /lbt/telemetry_data/tcs/env/2016/07/30/201607300001.env.lbt_weather.h5
   /lbt/telemetry_data/tcs/env/2016/08/01/201608010001.env.lbt_weather.h5  fileTime:8010001  begin:7301900  end:8021900
   /lbt/telemetry_data/tcs/env/2016/08/01/201608010001.env.lbt_weather.h5  file between times, adding, prevFilename /lbt/telemetry_data/tcs/env/2016/07/31/201607310000.env.lbt_weather.h5
   /lbt/telemetry_data/tcs/env/2016/08/02/201608020000.env.lbt_weather.h5  fileTime:8020000  begin:7301900  end:8021900
   /lbt/telemetry_data/tcs/env/2016/08/02/201608020000.env.lbt_weather.h5  file between times, adding, prevFilename /lbt/telemetry_data/tcs/env/2016/08/01/201608010001.env.lbt_weather.h5
   adding /lbt/telemetry_data/tcs/env/2016/08/02/201608020000.env.lbt_weather.h5  and quitting ...
   Execute:  /web/modules/hdf5-1.10.0/bin/h5csv -A -o  units.csv  -d lbt_weather_01 /lbt/telemetry_data/tcs/env/2016/07/30/201607300001.env.lbt_weather.h5 > /dev/null 2>&1 
   Execute:  /web/modules/hdf5-1.10.0/bin/h5csv -o humid-timeslice.csv --unixtime  --enum  -n  " humidity "  -d lbt_weather_01 /lbt/telemetry_data/tcs/env/2016/07/30/201607300001.env.lbt_weather.h5 /lbt/telemetry_data/tcs/env/2016/07/31/201607310000.env.lbt_weather.h5 /lbt/telemetry_data/tcs/env/2016/08/01/201608010001.env.lbt_weather.h5 /lbt/telemetry_data/tcs/env/2016/08/02/201608020000.env.lbt_weather.h5
   Execute:  cat humid-timeslice.csv | awk  '{time = $1 / 1000.0 ; if ( NR <= 1 || (time >= 1469905200 && time <= 1470164400) ) {print $0} }'  > humid-timeslice.csv.pared
   (don't) Execute:  mv humid-timeslice.csv.pared humid-timeslice.csv
   Execute:  sed -i '1 i\microsecond, percent' humid-timeslice.csv
   (don't ) Execute:  rm -f units.csv columns.txt

   [teladmin@web telemToCSV]$ ll humid*
   -rw-r--r-- 1 teladmin users 8050857 Aug 24 21:53 humid-timeslice.csv
   -rw-r--r-- 1 teladmin users 6032882 Aug 24 21:53 humid-timeslice.csv.pared
   
   [teladmin@web telemToCSV]$ converttime -u 1469905200; converttime -u 1470164399
   *** Unix translation   Unix time:1469905200 is UTC time:Sat Jul 30 19:00:00 2016
   
   *** Unix translation   Unix time:1470164399 is UTC time:Tue Aug  2 18:59:59 2016
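
The awk step in the log pares the rows to the requested window: it treats the first column as epoch milliseconds, always keeps the header row (NR <= 1), and keeps data rows whose time falls between the two epoch bounds that converttime decodes above. A Python equivalent of that filter (file names as in the log):

 # Same windowing as the awk command: keep the header row plus rows whose
 # first column (epoch milliseconds) lies within [BEGIN, END] seconds.
 import csv

 BEGIN, END = 1469905200, 1470164400

 with open("humid-timeslice.csv") as src, \
      open("humid-timeslice.csv.pared", "w", newline="") as dst:
     reader, writer = csv.reader(src), csv.writer(dst)
     for i, row in enumerate(reader):
         if i == 0 or BEGIN <= float(row[0]) / 1000.0 <= END:
             writer.writerow(row)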


Notes

20-July-2015

A few days of work with the latest HDF5 version (1.8.15-patch1) produced a version of h5csv that can dump specified columns. It appears that the file writing is the time waster. Using a large, 100 MB file with 26 fields of its own (PCS trajectories) plus two telemetry fields (timestamp and secs TAI), I get the following results.

-- no column numbers specified
[ksummers@rm580f-1 h5dump]$ time ./h5dump -d trajectories_01 /lbt/telemetry_data/tcs/pcs/2015/06/25/201506250709.pcs.trajectories.h5 > 201506250709.pcs.trajectories.csv

real    0m57.352s
user    0m54.120s
sys     0m0.527s

[ksummers@rm580f-1 h5dump]$ cp colNumbers.txt.save colNumbers.txt                (3 columns)
[ksummers@rm580f-1 h5dump]$ time ./h5dump -d trajectories_01 /lbt/telemetry_data/tcs/pcs/2015/06/25/201506250709.pcs.trajectories.h5 > 201506250709.pcs.trajectories-3cols.csv

real    0m8.960s
user    0m8.791s
sys     0m0.146s

-- make sure we got all the data (515,202 lines in both files)
[ksummers@rm580f-1 h5dump]$ wc 2015*pcs*.csv
   515202    515205  40700926 201506250709.pcs.trajectories-3cols.csv
   515202    515229 271388249 201506250709.pcs.trajectories.csv
  1030404   1030434 312089175 total

-- try 3 columns at the end instead of the beginning - does it affect the timing?
[ksummers@rm580f-1 h5dump]$ time ./h5dump -d trajectories_01 /lbt/telemetry_data/tcs/pcs/2015/06/25/201506250709.pcs.trajectories.h5 > 201506250709.pcs.trajectories-3othercols.csv

real    0m8.616s
user    0m8.455s
sys     0m0.138s
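
So column position makes no difference (8.6 s vs 9.0 s), and the roughly 6x speedup over the full dump tracks the output size (41 MB vs 271 MB per the wc above), which supports the reading that file writing dominates the runtime.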