h5csv Tool

usage: h5csv [OPTIONS] -d dataset files
     -h,   --help         Print a usage message and exit
     -V,   --version      Print version number and exit
     -o F, --output=F     Output raw data into file F
     -c 4,5,6             Print the specified columns (by number) of the dataset
                          (note that time_stamp (col 1) is included by default)
                          This option only works for flat structures (weather, seeing, etc.) 
                          and cannot be used at the same time as -n 
     -n "name1, name2"    Print the specified columns using the names of the dataset
                          (note that time_stamp is included by default)
                          This option cannot be used at the same time as -c 
     -s,   --strtime      Convert MJD time microsecs to UTC time string
     -u,   --unixtime     Convert MJD time microsecs to Unix UTC time millisecs
     -e,   --enumnumber   Print enum strings as integers (for graphing)
     -f,   --fields       Print only field names and column numbers
                          This option must be used on its own
     -A F, --onlyattr=F   Print only the attributes: field names/numbers, descriptions, units
                          This option must be used on its own and requires a filename F
     -d P, --dataset=P    Use the specified dataset
                          This argument is ALWAYS required and must be the LAST option

--------------- Examples ---------------

  1) Dump the operations dataset from DDS HDF5 file 201601190000.dds.operations.h5 to a CSV file named operations.csv
       h5csv -o operations.csv -d operations_01 /lbt/telemetry_data/tcs/dds/2016/01/19/201601190000.dds.operations.h5 
       h5csv -d operations_01 /lbt/telemetry_data/tcs/dds/2016/01/19/201601190000.dds.operations.h5 > operations.csv 

  2) Select columns 3,4 from dataset seeing_01 and convert the timestamp from MJD microseconds to Unix time milliseconds in 201506290541.dimm.seeing.h5
       h5csv -c 3,4 --unixtime -d seeing_01 /data/201506290541.dimm.seeing.h5 > seeing.csv 

  3) Select column 5 from dataset seeing_01 in 201506290541.dimm.seeing.h5, output to seeing.csv
       h5csv -c 5 -o seeing.csv -d seeing_01 /data/201506290541.dimm.seeing.h5 

  4) Select columns named sxflux and sxfwhm from dataset guiding_01 in 201601191109.gcsl.guiding.h5
       h5csv -n "sxflux, sxfwhm" -d guiding_01 /lbt/telemetry_data/tcs/gcsl/2016/01/19/201601191109.gcsl.guiding.h5 

  5) Dump the attributes from dataset offload_ttf_command_01 in 201601041558.aosl.offload_ttf_command.h5
       h5csv -A columns.txt -d offload_ttf_command_01 /lbt/telemetry_data/tcs/aosl/2016/01/04/201601041558.aosl.offload_ttf_command.h5 



Build and Install

In Summer 2016, the telemetry collection library and h5csv both transitioned to hdf5-1.10.0.
(Note: the HDF Group download link has changed over time and is not always consistent, so we should mirror the tarball somewhere stable.)


The build is set up to use the HDF5 tools directory structure and just put our modified files on top. The main build is for the telemetry visualization running on the 64-bit web host. A 32-bit build can also be done for the mountain, when required, for the LBTplot tools.

KS does the following on shell64 as user ksummers in the directory /home/ksummers/telemetry/h5csv-64bit :

1. Download and untar http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.10.0.tar.bz2 into hdf5-1.10.0

2. Check out the updates from svn:
cd hdf5-1.10.0/tools
cp -pR h5dump h5csv
cd ..
svn export https://svn.lbto.org/repos/tools/trunk/h5csv
    This exports the modified source files into the h5csv directory in hdf5-1.10.0 :

3. Move the files into the appropriate directories (on top of the installed hdf5 files) with the script mvFiles, then configure and make:
./h5csv/mvFiles                            (complains that it cannot delete the directory, but that's OK)

./configure --prefix=<your-path-here>/hdf5-1.10.0  --enable-shared=no \
   --enable-static-exec --enable-threadsafe --with-pthread --disable-hl
    If you want a development version, also add the debug option to configure.
    Then build:
  gmake                          (takes several minutes and produces lots of warnings)
  gmake install                  (puts the executable into the prefix/bin directory)
    This builds a lot more than it needs to, but only builds the one tools executable.

    The executable h5csv is in tools/h5csv and in the top-level bin directory.

4. Copy the executable to /web/modules/hdf5-1.10.0/bin (Tucson) for use by the telemetry visualization application

5. Repeat the steps on a 32-bit host in Tucson (I use rm580f-1), if necessary, using the lbtscm account in /home/lbtscm/lbt32/astronomy/hdf5-1.10.0, with the configure command:
./configure --prefix=/home/lbtscm/lbt32/astronomy/hdf5-1.10.0 \
 --enable-shared=no --with-zlib=/home/lbtscm/lbt32/lbto_runtime/zlib-1.2.7 --enable-static-exec \
 --enable-threadsafe --with-pthread --disable-hl 

6. Copy the executable to /lbt/astronomy/stow/h5csv/bin (Tucson and the mountain)


Feb-2017 Version

This version is called 1.10.0(Feb2017)

Added a required filename parameter to the -A option. The tool used to write to a fixed file called columns.txt, but it is now invoked multiple times at once in the same directory, so each run needs a unique filename. The filename is passed in from the telemetry visualization tools.

Note: There is some odd behavior with this parameter. When used with the -o option (send output to that filename, as telemToCSV does), the column names are not written to the -o file, only to the file specified with -A.
When used without the -o option, redirecting stdout instead, the column names and attributes appear in the redirected output.
Either way, the file specified with the -A option contains only the column names.

Is that OK?

Aug-2016 Version

This version is called 1.10.0(Aug2016)

No functionality mods - just updated to HDF5 1.10.0.

Apr-2016 Version

This version is called 1.8.16(Apr2016)

Minor mods to allow the "Units" attributes to be dumped and parsed; only affects the -A option:
  • delete quotes from strings (h5tools_str.c)
  • h5tools_dump_attribute doesn't do anything unless the name is "Units" (h5tools_dump.c)
  • change call to h5tools_dump_attribute to use rawdatastream instead of rawoutstream;
    delete use of field number in column list dump (h5dump_ddl.c)
  • columns.txt set as h5tools_set_data_output_file if -A is used (h5dump.c)

Jan-2016 Version With Named Column Selection

This version is called 1.8.16(Feb2016)

Column selection using numbered columns didn't work correctly for nested structures. Also, column numbers could change as the applications writing the streams modify them. This version allows column selection by column name, using -> syntax for nested fields.

For instance:
h5csv -u -n "hp3->absenc,hp4->loadcell,hp5->command" -d hardpoints_01 /lbt/telemetry_data/tcs/pmcr/2016/01/19/201601190000.pmcr.hardpoints.h5
 timestamp_utc, hp3->absenc, hp4->loadcell, hp5->command

h5csv -n "temperature,pressure"  -d weather_01 /lbt/telemetry_data/tcs/pcs/2016/01/19/201601190001.pcs.weather.h5
 time_stamp, temperature, pressure

  • lib/h5tools.h/.c: added to the ctx structure a named-field list of structs with parent/name strings
  • new parentInList method so we know to traverse through a data type looking for requested columns
  • changed columnInList method to accept column numbers or names
  • lib/h5tools_dump.c: append new cmpd_fieldsep (a colon) to the end of a column name when dumping types so the column number can be parsed out
  • changed the traversal to also check parentInList so we descend into compound types when individual columns are requested
  • more checking required on some of the output to make sure we don't print a "parent" name or separator when we want the "parent->child" field
  • dump_attribute method modified for our use
  • changed datasetblockbegin to CR so we get a CR after the stream name when dumping attributes only
  • lib/h5tools_str.h/.c: h5tools_print_char modified to replace commas with blanks in case descriptions contain commas (they won't after the next TCS build, but they do now)
  • h5tools_str_sprint: added a parent argument for recursive calls - this function is used for compound data types just like print_datatype, so, as with the print_datatype mods, it checks parentInList to traverse through compound types and avoids printing "parent" or the separator when "parent->child" is wanted
  • when doing field lists, don't use the quote character
  • h5csv/h5dump.c: new arguments implemented - -A for attributes only (used to build the telemetry map), -n for columns by name
  • h5csv/h5dump_ddl.h/.c: handle_datasets and dump_datasets modified to take a list of column names; added parsing of the columns by name

Should go back and revisit how the parent is created and used in h5tools_str.c to make sure it cleans up - there may be memory leaks. But h5csv is a standalone executable, so it doesn't matter a whole lot...
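The column-name parsing described above can be illustrated with a small shell sketch. This is a hypothetical illustration, not the actual C implementation: it splits a -n style argument into the parent/child pairs that parentInList and columnInList match against.

```shell
# Hypothetical sketch of the -n parsing (the real code lives in
# h5csv/h5dump.c): split a "parent->child" column name on the first "->".
split_col() {
  field=$1
  case $field in
    *"->"*)
      # nested column: text before/after the first "->"
      echo "parent=${field%%->*} child=${field#*->}"
      ;;
    *)
      # flat column, no parent
      echo "parent=- child=$field"
      ;;
  esac
}

# Split each entry of a comma-separated -n list:
echo 'hp3->absenc,hp4->loadcell,temperature' | tr ',' '\n' |
while read -r f; do split_col "$f"; done
# parent=hp3 child=absenc
# parent=hp4 child=loadcell
# parent=- child=temperature
```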

October-2015 Version With Column Selection and Time Conversion

Based on hdf5-1.8.15-patch1 (http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.15-patch1/src/hdf5-1.8.15-patch1.tar.bz2)

  • lib/h5tools.h: added to the ctx structure - parent, column info, file number
  • lib/h5tools.c: added columnInList function
  • render_element changed to basically ignore ncols so it doesn't put a line break in the middle of our list
  • lib/h5tools_dump.h: add parent to h5tools_print_datatype
  • lib/h5tools_dump.c: change fmt_double, fmt_float to match our formats
  • cmpd_sep, cmpd_pre, cmpd_suf, cmpd_end, datasetend, datatypeend, fileblockend, datasetblockend, datablockbegin, datablockend, strucblockend changed to get rid of CR and {}
  • dset_format, datasetbegin, databegin changed to not have titles
  • elmt_suf1 changed to newline so multiple files have newlines after each row of data
  • added cmpd_nestsep for the separator between compound data type names and the fields (using ->)
  • h5tools_dump_simple_data: added a need_prefix FALSE before the for loop on elements
  • print_datatype: lots of changes - added column check so we can get the header for only the columns we want; changed ncols (but did that work??); deleted the names of the datatypes printed; tweaking for CRs and delimiters
  • h5tools_dump_datatype: don't print the datatype if it's not the first H5 file we're working on; CR tweaking
  • h5tools_dump_data: string_dataformat.idx_format changed so the indexes for each row do NOT print; CR changes
  • change the name of the first field in the first compound data type to timestamp_utc if time conversion is requested
  • lib/h5tools_str.c: h5tools_str_sprint modified float format, added column check, don't use line_indent between values or cmpd_name
  • new method h5tools_str_sprintUTC created to call if time conversion is requested
  • h5csv/h5dump.c: the main program for h5csv - modified extensively to delete most of the options and call handle_datasets directly instead of through the handle functions set up
  • h5csv/h5dump_ddl.h: dump_dataset and handle_datasets modified to take column list and file number params
  • h5csv/h5dump_ddl.c: dump_datatype sets line_ncols to 1024 -- do we need that?
  • don't include dataset begin/name/blockbegin
  • parse the column list command-line parameter into an array of column numbers
  • call h5tools_dump_datatype with rawdatastream instead of rawoutstream
  • don't call h5tools_dump_dataspace
  • don't iterate over attributes (attr_iteration)
  • send datasetend to rawdatastream instead of rawoutstream
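The time conversion behind -u (implemented by h5tools_str_sprintUTC) can be sketched as follows. This assumes the telemetry timestamps are MJD expressed in microseconds; the offset 40587 is simply the MJD day number of the Unix epoch (1970-01-01), a property of the MJD scale rather than anything specific to h5csv.

```shell
# Sketch of an MJD-microseconds to Unix-milliseconds conversion,
# assuming input timestamps are MJD scaled to microseconds.
mjd_us_to_unix_ms() {
  # $1 = MJD timestamp in microseconds
  awk -v t="$1" 'BEGIN {
    epoch_us = 40587 * 86400 * 1e6   # Unix epoch (MJD 40587) in microseconds
    printf "%.0f\n", (t - epoch_us) / 1000
  }'
}

# MJD 57406.0 (= 4959878400000000 us) is 2016-01-19T00:00:00 UTC:
mjd_us_to_unix_ms 4959878400000000   # prints 1453161600000
```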

Initial Version

The initial version used by John's python tools is based on hdf5-1.8.10 and does only the CSV dump.


Telemetry files used by John's python tools are a good set of data for testing.

With most of the mods, I've been able to directly diff the csv files generated.

See the script: /home/ksummers/telemetry/csv-testing/Apr2016/jmhCSVTesting.sh

Test column selection, nested structures

No columns:
h5csv  -d weather_01 /lbt/telemetry_data/tcs/pcs/2016/01/19/201601190001.pcs.weather.h5 | head
time_stamp, tai_offset, temperature, pressure, humidity, stationid

Multiple columns in a non-nested stream:
h5csv -n "temperature,pressure" -d weather_01 /lbt/telemetry_data/tcs/pcs/2016/01/19/201601190001.pcs.weather.h5 | head
time_stamp, temperature, pressure

Check nested columns:
h5csv -u   -d y_01 /lbt/telemetry_data/tcs/oss/dyb/2016/01/19/201601190000.oss.dyb.y.h5 | more
 timestamp_utc, tai_offset, errors->is_flow_error_active, errors->is_flowmeter_alarm_active, errors->is_flow_out_of_range, errors->is_general_alarm_active, errors->is_latch_interlock_faulted, errors->is_overfl
 is_pump_on, inprocess->is_pump_on_fwd, inprocess->is_pump_on_rev, inprocess->is_tank_overflowing, inprocess->is_valve_closed, inprocess->is_valve_open, tank_level_r, tank_level_f, accum_inbalance, flow, pump_rate, temp, trim, left_sa_moment_0, left_sa_moment_1, left_sa_moment_2, left_sa_moment_3, right_sa_moment_0, right_sa_moment_1, right_sa_moment_2, right_sa_moment_3, plusmomrem, minusmomrem

Check column selection with nested names:
h5csv -n "hp3->absenc,hp4->loadcell,hp5->command" -d hardpoints_01 /lbt/telemetry_data/tcs/pmcr/2016/01/19/201601190000.pmcr.hardpoints.h5 | head


  • Bug when doing lots of h5 files: it seems to put random extra CRs in after every row. The funny thing is the files have the same number of rows when you run wc on them. ??? See the env files for the whole month of June. It doesn't look like a problem when you're only getting a few columns, but wc is the same there too. Is it just a visualization problem with the CSV file?

When using the --fields option, don't use other options. Pipe the output through tr to get a newline-separated file; you can then also substitute the colon with a comma to get what we need for Doug's file.
 /data/hdf5-1.8.15-patch1/tools/h5csv/h5csv --fields -d secondarymirror_collimation_01 /lbt/telemetry_data/tcs/psfl/2015/10/01/201510010200.psfl.secondarymirror_collimation.h5 | tr , '\n'
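Assuming the --fields output is a single comma-separated line of name:number pairs (the colon field separator added in the Apr-2016 h5tools_dump.c changes - the exact format here is an assumption), the post-processing can be sketched as:

```shell
# Hypothetical sample of --fields output; the real line comes from h5csv.
fields='time_stamp:1,temperature:2,pressure:3'

# tr splits the comma-separated pairs onto separate lines,
# sed swaps each colon for a comma:
echo "$fields" | tr ',' '\n' | sed 's/:/,/'
# time_stamp,1
# temperature,2
# pressure,3
```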
Topic revision: r31 - 15 May 2017, KelleeSummers