Application used by the web visualization tools to figure out where the HDF5 files are and which ones to process, process each via h5csv, and merge the multiple streams, if necessary.
usage: telemToCSV --stream1 stream1 --stream2 stream2 \
--bdate YYYY/MM/DD [--edate YYYY/MM/DD] \
[-a HH:MM:SS] [-g HH:MM:SS] \
[--fields1 "name1,name2,nameN"] [--fields2 "nameX,nameY"] \
--csv outputfilename \
[--timeconvert]
where stream1/2 is a dot-delimited string (e.g., pcs.trajectories, mcspu.az.servo_data)
fields1/2 is a comma-delimited string (e.g., "windspeed,temperature")
HH:MM:SS are specifiers for start time (-a) and end time (-g)
For example:
telemToCSV --stream1 dimm.seeing --bdate 2016/03/19 --timeconvert --csv seeing-temperature.csv --fields1 "seeing" --stream2 env.lbt_weather --fields2 "temperature"
telemToCSV --stream1 ovms.opd_estimation_mode2 --bdate 2016/04/19 -a 02:00:00 --edate 2016/04/19 -g 02:02:00 --timeconvert --csv opdestimation20160419.csv
Be aware that time parsing (-a, -g) only works on machines running in UTC,
and the --timeconvert option is required for merging two streams.
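A minimal illustration (not the tool's actual code) of why the UTC requirement exists: naive time parsing goes through the host's local time zone.

import calendar, time

# Parse a begin time the way a naive implementation might.
t = time.strptime("2016/03/19 02:00:00", "%Y/%m/%d %H:%M:%S")

# time.mktime() interprets the struct_time in the host's LOCAL zone, so the
# epoch it returns shifts on a non-UTC machine ...
print(time.mktime(t))
# ... while calendar.timegm() always treats the struct_time as UTC.
print(calendar.timegm(t))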
Features
- First line of the CSV generated is a comma-delimited string of units for each field - works for merged as well as single streams
- Supports finding all TCS subsystems, plus DIMM and OVMS, in the current as well as the previous year's files
- Picks up all the HDF5 files in a directory if a whole day is specified (see the file-discovery sketch after this list)
- Supports begin and end time parameters (applicable to the begin and end dates specified)
- Uses a Python program (mergeFiles.py) to merge CSV files and to sort them so the timestamps come out in order
- Returns a nonzero code for every error condition and prints an error message so that the web side can display it; returns a zero-length CSV when a file is expected
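For reference, a sketch of the day-based file discovery. The path layout is inferred from the debug log in the Time Parsing section below, and files_for_day is a made-up name, not the application's actual code.

import glob

def files_for_day(stream, year, month, day, root="/lbt/telemetry_data/tcs"):
    # Layout inferred from the debug log on this page:
    # /lbt/telemetry_data/tcs/<subsystem>/YYYY/MM/DD/YYYYMMDDhhmm.<stream>.h5
    subsystem = stream.split(".")[0]
    pattern = "%s/%s/%04d/%02d/%02d/*.%s.h5" % (root, subsystem, year, month, day, stream)
    # sorted() so h5csv receives the files in chronological order
    return sorted(glob.glob(pattern))

print(files_for_day("env.lbt_weather", 2016, 7, 30))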
Issues
- Doesn't handle a time frame that spans a year rollover - selecting 29-Dec-2015 to 2-Jan-2016, for instance, will not work
- When I merge two full streams, we get two tai_offset fields in the output - one named tai_offset_x and one named tai_offset_y - I think the Pandas merge is doing that when it finds the column in both streams (see the sketch after this list). Does it matter?
- I thought tai_offset was not in the whole stream dump, but it seems to be there
- Files have to be listed to h5csv in chronological order because it does not sort on timestamp - but if we sort all CSV files, not just the ones being merged, that could fix that
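The _x/_y naming is standard pandas behavior: when the two frames being merged share a non-key column, pandas.merge() appends its default suffixes to disambiguate. A minimal demonstration with made-up values:

import pandas as pd

left = pd.DataFrame({"timestamp": [1, 2], "tai_offset": [36, 36], "seeing": [1.1, 1.3]})
right = pd.DataFrame({"timestamp": [1, 2], "tai_offset": [36, 36], "temperature": [5.2, 5.0]})

merged = pd.merge(left, right, on="timestamp")
print(list(merged.columns))
# ['timestamp', 'tai_offset_x', 'seeing', 'tai_offset_y', 'temperature']

It only matters if a downstream consumer expects a single tai_offset column; passing suffixes= to merge(), or dropping the duplicate column from one frame first, would avoid it.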
Testing
- Verify comma-delimited units string is the first line of the file
- Verify field names are the comma-delimited second line (a quick check of both is sketched after this list)
- Verify dates are correct
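A smoke test for the first two checks might look like this; the file name is hypothetical.

# Hypothetical check of the CSV layout described above.
with open("seeing-temperature.csv") as f:
    units = f.readline().rstrip("\n")
    fields = f.readline().rstrip("\n")

assert "," in units, "first line should be the comma-delimited units string"
assert "," in fields, "second line should be the comma-delimited field names"
assert len(units.split(",")) == len(fields.split(",")), "one unit per field"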
Full Stream
telemToCSV --stream1 dds.operations --bdate 2016/07/04 --timeconvert --csv operations20160704.csv
Column Selection
telemToCSV --stream1 dimm.seeing --bdate 2016/07/11 --timeconvert --csv dimm.seeing.20160711.csv --fields1 "seeing"
Merging Streams
Make sure mergeFiles.py is in the same directory where you are testing.
Test combining two streams:
telemToCSV --stream1 gcsl.guiding --stream2 env.lbt_weather --csv gcs-weather-fullstreams20160319.csv --bdate 2016/03/19 --edate 2016/03/19 --timeconvert
telemToCSV -t -a 03:13:45 -g 05:15:35 --stream1 gcsl.guiding --fields1 "sxfwhm" --stream2 env.lbt_weather --fields2 "temperature,windspeed" --bdate 2016/03/19 --edate 2016/03/19 --csv gcs-weather-3fields.csv
telemToCSV -t -a 03:13:45 --stream1 gcsl.guiding --fields1 "sxfwhm" --stream2 env.lbt_weather --fields2 "temperature,windspeed" --bdate 2016/03/19 --edate 2016/03/19 --csv gcs-weather-3fields-notime.csv
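For orientation, a guess at the shape of the merge step. This is not the actual mergeFiles.py, just a sketch assuming both streams carry a common timestamp column and that the units string is the first row of each file:

import pandas as pd

# Skip the units line (row 0); the field names on the second line become the header.
left = pd.read_csv("stream1.csv", skiprows=1)
right = pd.read_csv("stream2.csv", skiprows=1)

# Merge on the shared timestamp and sort so the timestamps come out in order.
merged = pd.merge(left, right, on="timestamp").sort_values("timestamp")
merged.to_csv("merged.csv", index=False)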
Time Parsing
For time parsing to work, you must be on a UTC machine (the web machine in Tucson runs UTC).
If you enter only a --bdate, you get only that date.
If you enter both --bdate and --edate, you get all the files for those dates, inclusive.
telemToCSV --stream1 env.lbt_weather --bdate 2016/08/01 --timeconvert --csv temperature201608.csv --edate 2016/08/20 --fields1 "temperature"
telemToCSV --stream1 env.lbt_weather --bdate 2016/08/01 -a 19:00:00 --timeconvert --csv temperature201608-timeslice.csv --edate 2016/08/05 -g 19:00:00 --fields1 "temperature"
telemToCSV --stream1 ovms.opd_estimation_mode2 --bdate 2016/07/11 -a 02:00:00 -g 04:00 --timeconvert --csv 20160711-estmode2.csv --fields1 "diff_z_fp" --debug
telemToCSV -t -a 05:13:45 -g 05:15:35 --stream1 ovms.opd_estimation_mode2 --bdate 2016/03/24 --edate 2016/03/24 --csv ovms-mode2-timesample.csv
Single file, with time specifications (make sure it processes the file only once):
telemToCSV --stream1 env.lbt_weather --bdate 2016/08/01 -a 17:00:00 -g 19:00 --timeconvert --fields1 "temperature" --csv temperaturetimeslicetest.csv
Parse over month boundary:
[teladmin@web telemToCSV]$ telemToCSV --stream1 env.lbt_weather --bdate 2016/07/30 -a 19:00:00 --timeconvert --csv humid201608-timeslice.csv --edate 2016/08/02 -g 19:00:00 --fields1 "humidity" --debug
/lbt/telemetry_data/tcs/env/2016/07/30/201607300001.env.lbt_weather.h5 fileTime:7300001 begin:7301900 end:8021900
/lbt/telemetry_data/tcs/env/2016/07/30/201607300001.env.lbt_weather.h5 file too early
/lbt/telemetry_data/tcs/env/2016/07/31/201607310000.env.lbt_weather.h5 fileTime:7310000 begin:7301900 end:8021900
/lbt/telemetry_data/tcs/env/2016/07/31/201607310000.env.lbt_weather.h5 file between times, adding, prevFilename /lbt/telemetry_data/tcs/env/2016/07/30/201607300001.env.lbt_weather.h5
/lbt/telemetry_data/tcs/env/2016/08/01/201608010001.env.lbt_weather.h5 fileTime:8010001 begin:7301900 end:8021900
/lbt/telemetry_data/tcs/env/2016/08/01/201608010001.env.lbt_weather.h5 file between times, adding, prevFilename /lbt/telemetry_data/tcs/env/2016/07/31/201607310000.env.lbt_weather.h5
/lbt/telemetry_data/tcs/env/2016/08/02/201608020000.env.lbt_weather.h5 fileTime:8020000 begin:7301900 end:8021900
/lbt/telemetry_data/tcs/env/2016/08/02/201608020000.env.lbt_weather.h5 file between times, adding, prevFilename /lbt/telemetry_data/tcs/env/2016/08/01/201608010001.env.lbt_weather.h5
adding /lbt/telemetry_data/tcs/env/2016/08/02/201608020000.env.lbt_weather.h5 and quitting ...
Execute: /web/modules/hdf5-1.10.0/bin/h5csv -A -o units.csv -d lbt_weather_01 /lbt/telemetry_data/tcs/env/2016/07/30/201607300001.env.lbt_weather.h5 > /dev/null 2>&1
Execute: /web/modules/hdf5-1.10.0/bin/h5csv -o humid-timeslice.csv --unixtime --enum -n " humidity " -d lbt_weather_01 /lbt/telemetry_data/tcs/env/2016/07/30/201607300001.env.lbt_weather.h5 /lbt/telemetry_data/tcs/env/2016/07/31/201607310000.env.lbt_weather.h5 /lbt/telemetry_data/tcs/env/2016/08/01/201608010001.env.lbt_weather.h5 /lbt/telemetry_data/tcs/env/2016/08/02/201608020000.env.lbt_weather.h5
Execute: cat humid-timeslice.csv | awk '{time = $1 / 1000.0 ; if ( NR <= 1 || (time >= 1469905200 && time <= 1470164400) ) {print $0} }' > humid-timeslice.csv.pared
(don't) Execute: mv humid-timeslice.csv.pared humid-timeslice.csv
Execute: sed -i '1 i\microsecond, percent' humid-timeslice.csv
(don't) Execute: rm -f units.csv columns.txt
[teladmin@web telemToCSV]$ ll humid*
-rw-r--r-- 1 teladmin users 8050857 Aug 24 21:53 humid-timeslice.csv
-rw-r--r-- 1 teladmin users 6032882 Aug 24 21:53 humid-timeslice.csv.pared
[teladmin@web telemToCSV]$ converttime -u 1469905200; converttime -u 1470164399
*** Unix translation Unix time:1469905200 is UTC time:Sat Jul 30 19:00:00 2016
*** Unix translation Unix time:1470164399 is UTC time:Tue Aug 2 18:59:59 2016
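The awk bounds can be cross-checked in Python from the -a/-g arguments:

import calendar, datetime

# The awk pare step keeps rows between these two epochs (in seconds).
begin = datetime.datetime(2016, 7, 30, 19, 0, 0)   # --bdate 2016/07/30 -a 19:00:00
end = datetime.datetime(2016, 8, 2, 19, 0, 0)      # --edate 2016/08/02 -g 19:00:00

print(calendar.timegm(begin.timetuple()))  # 1469905200
print(calendar.timegm(end.timetuple()))    # 1470164400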
Notes
20-July-2015
A few days of work with the latest HDF5 version (1.8.15-patch1) got a version of h5csv working that can dump specified columns. It seems the file writing is the time waster. If I use a large, 100 MB file with 26 of its own fields (PCS trajectories) and two telemetry fields (timestamp and secs TAI), I get the following results.
-- no column numbers specified
[ksummers@rm580f-1 h5dump]$ time ./h5dump -d trajectories_01 /lbt/telemetry_data/tcs/pcs/2015/06/25/201506250709.pcs.trajectories.h5 > 201506250709.pcs.trajectories.csv
real 0m57.352s
user 0m54.120s
sys 0m0.527s
[ksummers@rm580f-1 h5dump]$ cp colNumbers.txt.save colNumbers.txt (3 columns)
[ksummers@rm580f-1 h5dump]$ time ./h5dump -d trajectories_01 /lbt/telemetry_data/tcs/pcs/2015/06/25/201506250709.pcs.trajectories.h5 > 201506250709.pcs.trajectories-3cols.csv
real 0m8.960s
user 0m8.791s
sys 0m0.146s
-- make sure we got all the data (~515,000 lines in both files)
[ksummers@rm580f-1 h5dump]$ wc 2015*pcs*.csv
515202 515205 40700926 201506250709.pcs.trajectories-3cols.csv
515202 515229 271388249 201506250709.pcs.trajectories.csv
1030404 1030434 312089175 total
-- try 3 columns at the end instead of the beginning - does it affect the timing?
[ksummers@rm580f-1 h5dump]$ time ./h5dump -d trajectories_01 /lbt/telemetry_data/tcs/pcs/2015/06/25/201506250709.pcs.trajectories.h5 > 201506250709.pcs.trajectories-3othercols.csv
real 0m8.616s
user 0m8.455s
sys 0m0.138s