For the telemetry visualization, we require a kind of dictionary defining all the streams and fields within.
Initially, we thought that field names would have to be mapped to column numbers so that we could choose only particular columns for the CSV file. We would like to present the user with a list of fields; behind the scenes, the code knows which streams each field is in. The user should be able to select fields and then select dates. The code should then figure out the h5 files for that set of fields and dates and build a single CSV file, merging (?) dates. However, fields from different streams will NOT have the same timestamps.
Dygraphs allows disparate timestamps in the CSV file (just leave the field value blank), but this increases the number of rows in the CSV. For instance, if you have temperature at 1-second intervals from one stream and azimuth at 10 Hz from another, you will have a CSV something like:
timestamp, tai_offset, temperature, azimuth
We're going to have to limit the number of columns and streams that we allow in combination.
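To illustrate the row growth described above, here is a minimal sketch that merges two streams with disparate timestamps into one CSV, leaving a blank wherever a stream has no sample. The function names and the sample values are made up for illustration, timestamps are assumed to be integer milliseconds, and the tai_offset column is left out for brevity.

```python
import csv
import io


def merge_streams(temperature, azimuth):
    """Merge two {timestamp: value} mappings into rows, blank where a stream has no sample."""
    rows = []
    for ts in sorted(set(temperature) | set(azimuth)):
        rows.append([ts, temperature.get(ts, ""), azimuth.get(ts, "")])
    return rows


def to_csv(rows):
    """Render the merged rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "temperature", "azimuth"])
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical samples covering 2 seconds: temperature once per second, azimuth at 10 Hz.
# Integer millisecond timestamps avoid floating-point mismatches between dictionary keys.
temperature = {t: 5.0 for t in range(0, 2000, 1000)}
azimuth = {t: 180.0 + t / 1000.0 for t in range(0, 2000, 100)}

rows = merge_streams(temperature, azimuth)
csv_text = to_csv(rows)
print(csv_text)
```

In this sketch only 2 of the 20 rows carry a temperature value; the rest are blank in that column, which is why combining slow and fast streams inflates the file.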
It's easy enough to create a dictionary for the current date. But it has to be more dynamic. Streams have changed over time and will continue to be modified. We have to be able to find the fields for any date of a stream.
It might be nice to have a top-level field-to-column map by date. But our directory structure, with dates at the lowest level instead of the highest level, doesn't make that very convenient. We could create field-to-column maps once a day for each stream, but the streams really do not change often at all. Do we really want to be parsing daily files for each day of someone's request when the stream itself may not have changed during that time?
We could have a scheme where we:
- look at the lowest level for a map file; if it's there, the stream has changed this month
- look in the month-level directory; if it's there, the stream has changed since last month
- look in the year-level directory
- look in the top-level subsystem directory
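The lookup scheme above can be sketched as a walk up the directory tree that stops at the first map file it finds. This is only an illustration: the map file name `field_map.csv` and the `<subsystem>/<year>/<month>/<day>` layout are assumptions, not the actual archive layout.

```python
from pathlib import Path
import tempfile

MAP_NAME = "field_map.csv"  # hypothetical map file name


def find_map(day_dir, subsystem_dir):
    """Return the nearest map file at or above day_dir, never above subsystem_dir."""
    current = Path(day_dir).resolve()
    top = Path(subsystem_dir).resolve()
    while True:
        candidate = current / MAP_NAME
        if candidate.is_file():
            return candidate
        # Stop at the subsystem directory (or the filesystem root as a safety net).
        if current == top or current == current.parent:
            return None
        current = current.parent


# Demo on a throwaway tree: one map at the subsystem level, one at the 2016 year level.
with tempfile.TemporaryDirectory() as tmp:
    subsystem = Path(tmp) / "env"
    day_2016 = subsystem / "2016" / "01" / "19"
    day_2015 = subsystem / "2015" / "07" / "01"
    day_2016.mkdir(parents=True)
    day_2015.mkdir(parents=True)
    (subsystem / MAP_NAME).write_text("top-level map\n")
    (subsystem / "2016" / MAP_NAME).write_text("2016 map\n")

    base = subsystem.resolve()
    found_2016 = find_map(day_2016, subsystem).relative_to(base)
    found_2015 = find_map(day_2015, subsystem).relative_to(base)
    print(found_2016.parts, found_2015.parts)
```

With this approach a request spanning many days only needs to read a new map when the walk resolves to a different file than it did for the previous day.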
Second Cut at Dictionary (January-Mar, 2016)
executable (see h5csv
) was modified to have an attributes option to help build a map of field information. Using the
option and a script (
) that peruses the directory structure, we can build a map (used by the web visualization tools). However, this tool puts in every sided subsystem instead of a generic (non-sided) one, so we have to edit the output to replace the sided subsystems with the non-sided name and delete the entries for the other side.
The script does not put everything in the right order, so there's more manual effort required.
So, the steps required are basically:
- Run extractMap.sh 20160711 (or whatever date you want to use)
- The resulting file will be telemetry_vars_20160711.csv in this case
- Move the mcs,,sys_var streams to be with the other "non-substream" streams of those subsystems
- Add the iif,,dimm entries (to support older data)
- Add any missing streams you can find - things like aos,,offload_ho_command, etc.
The current map is pretty complete now, so when re-running the extraction, it's better to just look at the differences and then maybe just edit the current map.
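One way to look at the differences is to compare the two files as sets of lines. The sketch below assumes each map line is a self-contained entry; the function name and the sample rows (including the `env,,weather` entry) are simplified, hypothetical stand-ins rather than real map contents.

```python
def diff_maps(current_lines, new_lines):
    """Return (entries only in the new extraction, entries only in the current map)."""
    current = {line.strip() for line in current_lines if line.strip()}
    new = {line.strip() for line in new_lines if line.strip()}
    return sorted(new - current), sorted(current - new)


# Hypothetical, simplified map rows for illustration only.
current_map = ["mcs,,sys_var", "iif,,dimm", "aos,,offload_ho_command"]
new_extraction = ["mcs,,sys_var", "env,,weather"]

added, missing = diff_maps(current_map, new_extraction)
print("only in new extraction:", added)
print("only in current map:", missing)
```

Entries that appear only in the current map are typically the hand-added ones (older data, on-demand streams), so they should be reviewed rather than deleted.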
Ran the script on a "busy" night, 20160119 (both sides running, multiple instruments, using DIMM).
Then added in the few that you know of that are only "on-demand".
For example, here's what's missing from 20160119:
Not bad! It looks like 20160104 is a good day to run (all the ao files and both of these VD files).
First Cut at Dictionary (July-2015)
Stephen's first cut at a parser (July-2015) includes field name, subsystem, units, and description (see sample below). It can be sorted by field or subsystem. The full file is captured below as an attachment.
refract,env,arcsecond,Refract calc. from data
humidity,env,none,Relative humidity (%)
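As a rough sketch of how the sample rows above could be loaded and sorted by field or by subsystem, assuming the column order is field name, subsystem, units, description (the key names used here are our own labels, not taken from the parser):

```python
import csv
import io

# The two sample rows from the first-cut dictionary.
SAMPLE = """refract,env,arcsecond,Refract calc. from data
humidity,env,none,Relative humidity (%)
"""

# Assumed column order; these labels are illustrative.
FIELDNAMES = ["field", "subsystem", "units", "description"]


def load_dictionary(text):
    """Parse dictionary rows into a list of dicts keyed by FIELDNAMES."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=FIELDNAMES))


def sort_entries(entries, key):
    """Return entries sorted by the given column, e.g. 'field' or 'subsystem'."""
    return sorted(entries, key=lambda entry: entry[key])


entries = load_dictionary(SAMPLE)
by_field = sort_entries(entries, "field")
print([entry["field"] for entry in by_field])
```

Descriptions that contain commas would need to be quoted in the file for this parsing to hold.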
Yielded some more thoughts:
- We have about 1900 parameters (not too bad).
- We need to ignore the two fields in every stream that are put in by telemetry (
- There aren't too many duplicates outside of sided vars.
- I think a generic dictionary enhancement would be to add the stream name in a column, and then (bigger effort required) add start and end times for each parameter definition (i.e. what period it is defined over). There could be problems in this last item (gaps). Not sure if that would imply a new definition (and redundant parameter) or not.
What I see from sorting on name and on subsystem is that both sorts are interesting and informative in their own right. I suspect a sort on stream name would also be interesting in its own right.
A few more brainstorm thoughts after looking at the telemetry dictionary data more closely:
- we need to eliminate tai_offset (already noted below)
- we need stream name (already noted below)
- we need the column number for the field in the HDF5 file
- we could use beginning date and (if available) end date columns too...can these be provided?
- we need to work on the TCS developers to improve some descriptions and units
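A possible shape for a dictionary entry that carries the extra columns discussed above (stream name, column number, beginning and end dates) is sketched below. The class, its attribute names, the `weather` stream name, and the column numbers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FieldDefinition:
    field: str
    subsystem: str
    stream: str
    column: int                  # column number of the field in the HDF5 file
    units: str
    begin: date                  # first day this definition applies
    end: Optional[date] = None   # None means the definition is still current

    def valid_on(self, day):
        """True if this definition covers the given day."""
        return self.begin <= day and (self.end is None or day <= self.end)


def lookup(definitions, field, day):
    """Return every definition of `field` that is valid on `day`."""
    return [d for d in definitions if d.field == field and d.valid_on(day)]


# Hypothetical history: the column moved when the stream was modified.
definitions = [
    FieldDefinition("humidity", "env", "weather", 2, "none",
                    date(2015, 7, 1), date(2015, 12, 31)),
    FieldDefinition("humidity", "env", "weather", 3, "none", date(2016, 1, 1)),
]

current = lookup(definitions, "humidity", date(2016, 1, 19))
gap = lookup(definitions, "humidity", date(2015, 1, 1))
print(current[0].column, gap)
```

An empty result marks a gap: whether that should become a new definition or simply an undefined period is the open question noted earlier.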
GUIs and (eventual) use of data:
- Our parameter search GUI will likely need optional subsystem and side filters (which simplifies and reduces the resulting parameter list output)
- we can probably allow GUI and graph magic for parameters whose subsystems contain "l" or "r" as the last letter of the subsystem name
- only display one parameter name vs two, and graph both (which then can be optionally selected for display)
- there are some auto unit conversions we need to consider adding (radians, unix times, etc.)
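The side handling and unit conversions listed above might look something like the sketch below. It assumes a sided subsystem is any name ending in "l" or "r"; the subsystem names `abcl` and `abcr` are invented, and a real implementation would probably check against a list of known sided subsystems to avoid false matches.

```python
import math
from datetime import datetime, timezone


def split_side(subsystem):
    """Split a trailing 'l' or 'r' off a subsystem name: 'abcl' -> ('abc', 'l')."""
    if len(subsystem) > 1 and subsystem[-1] in ("l", "r"):
        return subsystem[:-1], subsystem[-1]
    return subsystem, None


def group_sided(params):
    """Group (field, subsystem) pairs so each sided parameter is listed once."""
    grouped = {}
    for field, subsystem in params:
        base, side = split_side(subsystem)
        grouped.setdefault((field, base), []).append(side)
    return grouped


def radians_to_degrees(value):
    return math.degrees(value)


def unix_to_iso(seconds):
    """Convert a Unix time in seconds to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# Hypothetical parameters: one sided pair and one non-sided field.
params = [("temp", "abcl"), ("temp", "abcr"), ("humidity", "env")]
grouped = group_sided(params)
print(grouped)
print(radians_to_degrees(math.pi), unix_to_iso(0))
```

The GUI could then show one entry per key and graph every side in the list, letting the user toggle each side on or off.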