OVMS Software Troubleshooting

Machines / Users

The telemetry processes (and Monitor GUI) run on ovms.mountain.lbto.org as user ovms:

   ovms@ovms.mountain.lbto.org

The UEI RACKtangle interface to the accelerometers runs on ovms-uei.mountain.lbto.org as root:

   root@ovms-uei.mountain.lbto.org

The RACKtangle LEDs go out when the real-time software is down.
When the real-time software is running, a solid red or blinking red LED indicates a bad state, and the data will peg at +/-25 V.
The picture below shows LEDs in bad states for board 2, slots 1 (LED out) and 2 (LED red), and for board 3, slot 1 (no connection at all).

OVMSRacktangleSlotsShowingLEDs-sm.jpg
From James: Here's what the LEDs indicate:
Green - OK
Solid Red - Open Sensor
Flashing Red - Shorted Sensor (could be on the card itself and not necessarily the cable or accelerometer)
Orange - Invalid state
Off - Bias current turned off
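
For quick reference in a script or notebook, the LED meanings listed above can be kept in a small lookup table. This is only an illustrative sketch: the names LED_MEANINGS, led_is_ok and describe_led are hypothetical and are not part of the OVMS software.

```python
# Hypothetical lookup of RACKtangle LED states, transcribed from the list above.
LED_MEANINGS = {
    "green": "OK",
    "solid red": "Open sensor",
    "flashing red": "Shorted sensor (possibly on the card itself)",
    "orange": "Invalid state",
    "off": "Bias current turned off",
}


def led_is_ok(state: str) -> bool:
    """Return True only for the one healthy LED state (green)."""
    return state.strip().lower() == "green"


def describe_led(state: str) -> str:
    """Return the documented meaning, or a marker for undocumented states."""
    return LED_MEANINGS.get(state.strip().lower(), "Unknown (not documented)")
```

For example, describe_led("Solid Red") returns the "Open sensor" entry, while any state not in the list is reported as undocumented rather than guessed.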

Checking / Starting / Stopping

To check OVMS telemetry
  1. Check if real-time data is coming to the ovms telemetry machine.
     If these graphs are updating: http://statserv.lbto.org/indi_ao/www/ovms2.html
     the real-time data is OK; if not, it may be an INDI problem, not necessarily an OVMS problem.
     and/or
     ssh ovms@ovms
     ovmsSniffer -i 192.168.53.60 -f 1000
     Press Ctrl-C to get out of the barrage of data.
     If the data looks OK, there's no need to do anything on the RACKtangle with the real-time software.

  2. Check if telemetry is being recorded
     ssh ovms@ovms
     ovmsTelemetryClient -s
     Checking connection with LBT...
     Ctrl \ to abort the command.
     Telemetry is paused
  3. If you see
     OvmsTelemetryI::telException::Telemetry service is not connected to the LBT
     then you are running -s for the first time after a telemetry restart. Just resume telemetry:
     [ovms@ovms ~]$ ovmsTelemetryClient -r
     Checking connection with LBT...
     Ctrl \ to abort the command.
     Resuming the telemetry service
     Telemetry is running at 1000 Hz
     Max Freq =2600 Hz
     Min Freq =1000 Hz
     and pause it with -p if you really need it to be paused.
  4. If you want to pause/resume telemetry collection, use the -p or -r options to ovmsTelemetryClient (more details below).
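
The status messages quoted in the steps above could be classified automatically, for example when wrapping ovmsTelemetryClient in a monitoring script. The sketch below is an assumption-laden illustration: the function name classify_telemetry_status is hypothetical, and it only recognizes the three message strings shown on this page, not the full set the client may print.

```python
import re


def classify_telemetry_status(output: str):
    """Classify captured ovmsTelemetryClient output as (state, frequency_hz).

    Only the messages quoted on this page are recognized; anything else
    is reported as "unknown" rather than guessed.
    """
    if "not connected to the LBT" in output:
        return ("not_connected", None)
    match = re.search(r"Telemetry is running at (\d+) Hz", output)
    if match:
        return ("running", int(match.group(1)))
    if "Telemetry is paused" in output:
        return ("paused", None)
    return ("unknown", None)
```

A "not_connected" result corresponds to the first -s after a telemetry restart, where the documented remedy is simply to resume with -r.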

To check/stop/restart the real-time software on the racktangle:
  1. Log in to the RACKtangle (root@ovms-uei) and do a ps command (see more detail below)
  2. If it's not running, or you kill it, restart it with
     
      ./ovmsrtService -f 1000 -d 12 -D 3.0 -H 0 -G 0 -P LBTI-noM1 -E 2 &

      Use -d 11 for the 11 slots we used prior to summer 2018. Since summer 2018 we have used 12 slots to include the rovers, so -d 12 is now the routine startup.
      We use -E 2 for OPD estimation mode 2 (separate tip-tilt parameters).
      All other command-line parameters can typically default (IP address of the UEI, 1 kHz frequency, gain set to -12.5/25v, no high-pass filter, LBTI as the instrument).


Note that if you stop/restart the real-time software while telemetry is being recorded (unpaused), the telemetry collection will go down. The telemetry daemon must then be restarted on the ovms host and telemetry resumed:

sudo service ovms-svc stop
sudo service ovms-svc start
ovmsTelemetryClient -r



Configuration

The accelerometers are identified by slot and channel, with 4 channels in each of our 12 slots. Table 5 in the OVMS Hardware Installation and Software Operation manual describes where they were installed. Keep in mind that it may not always be up to date because of troubleshooting, repairs, etc. The slot/channel info maps directly to software configuration:

  • The ovmsMonitor.cfg file (in /lbt/ovms/current/etc/) tells the applications running on the ovms Linux box which slot/channel is which accelerometer. This file is used by the ovms monitor GUI as well as telemetry. This file does not influence the multicast data.
    This file references the channels by slot and number.
    This file must be modified if hardware is modified.

  • The config.txt file on the ovms-uei machine tells the OVMSplus prediction software which channels to use in the calculations.
    This file references the channels by incremental numbers, 1 through 44. Slot 1, channel 1 is 1; slot 2 channel 4 is 8; etc.

  • The sensitivities.txt file on the ovms-uei machine tells the OVMSplus prediction software which gain factors to use in the calculations for particular focal station geometries (specified with the -P command line argument). Beware that focal station geometries are not applied in Mode 3 (see IT 7486).
    Copy in a modified version with scp sensitivities.txt root@ovms-uei: and then restart ovmsrtService.
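
The incremental numbering used in config.txt follows from the two examples given above (slot 1, channel 1 is 1; slot 2, channel 4 is 8) with 4 channels per slot. The sketch below shows that arithmetic; the function names are hypothetical. Note that 11 slots give a maximum of 44, as stated above, while the same formula applied to 12 slots would reach 48, so check config.txt itself for the range actually in use.

```python
CHANNELS_PER_SLOT = 4  # stated above: 4 channels in each slot


def channel_index(slot: int, channel: int) -> int:
    """Map a 1-based (slot, channel) pair to the incremental channel number."""
    if slot < 1 or not 1 <= channel <= CHANNELS_PER_SLOT:
        raise ValueError(f"invalid slot/channel: {slot}/{channel}")
    return (slot - 1) * CHANNELS_PER_SLOT + channel


def slot_and_channel(index: int):
    """Inverse mapping: incremental channel number back to (slot, channel)."""
    if index < 1:
        raise ValueError(f"invalid channel number: {index}")
    slot_zero_based, channel_zero_based = divmod(index - 1, CHANNELS_PER_SLOT)
    return slot_zero_based + 1, channel_zero_based + 1
```

This is handy when cross-checking an entry in ovmsMonitor.cfg (slot and channel) against the corresponding entry in config.txt (incremental number).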

Big OVMS Monitor Displays

Use the INDI ovms page to see the OVMS+ graphs in real-time.

We don't run the OVMS monitor on the big displays anymore. See older versions of this page if you want the historical info.

Telemetry

The OVMS telemetry service typically runs as a daemon, started on reboot with: ovmsTelemetry --daemon
It can also be controlled with: sudo /etc/init.d/ovms-svc [start|stop|restart|status] (why does it have to be root?)

The data is always broadcast by the OVMS application (for use by clients such as the ovmsMonitor). When the ovmsTelemetry service is running, telemetry collection can be run or paused using the ovmsTelemetryClient. Typically the service is left running, and a cron job pauses and resumes the collection using the ovmsTelemetryClient so that it only runs at night (0:00 UT to 14:00 UT; 5:00 pm - 7:00 am local time).
The cron job looks like this (in the ovms user's crontab):
SHELL=/bin/bash
OVMS_TEL_CLIENT_CONFIG=/lbt/ovms/current/etc/ovmsTelClientConfig.cfg
00 00 * * * /lbt/ovms/current/bin/ovmsTelemetryClient -r > /dev/null 2>&1
00 14 * * * /lbt/ovms/current/bin/ovmsTelemetryClient -p > /dev/null 2>&1
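
To reason about whether collection should currently be running, the cron window above can be expressed as a small check. This is a sketch under stated assumptions: the function names are hypothetical, and the fixed UTC-7 offset is inferred from the statement that 0:00 UT corresponds to 5:00 pm local time.

```python
from datetime import datetime, timedelta, timezone

RESUME_HOUR_UT = 0   # cron: "00 00 * * *" runs ovmsTelemetryClient -r
PAUSE_HOUR_UT = 14   # cron: "00 14 * * *" runs ovmsTelemetryClient -p
LOCAL_TZ = timezone(timedelta(hours=-7))  # assumption: fixed UTC-7 local time


def collection_expected(when: datetime) -> bool:
    """True if the cron schedule implies collection is running at `when`.

    `when` must be a timezone-aware datetime.
    """
    hour_ut = when.astimezone(timezone.utc).hour
    return RESUME_HOUR_UT <= hour_ut < PAUSE_HOUR_UT


def to_local(when: datetime) -> datetime:
    """Convert a timezone-aware datetime to the assumed local time."""
    return when.astimezone(LOCAL_TZ)
```

If ovmsTelemetryClient -s reports a paused state at a time when collection_expected returns True, the cron job or the daemon is worth checking.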


[ovms@ovms]$ ovmsTelemetryClient -h
Client for the OVMS Telemetry Interface Service (Ver. 0.6.1)
Usage :  ovmsTelemetryClient [-Option] [  ]
  -p (pause) : Pauses the archive of the vibration data into the Telemetry system.
  -r (resume) : Resumes the archive of the vibration data into the Telemetry system.
  -s (getStatus) : Shows the status of the OVMS Telemetry interface.
  -f Freq (setSamplingRateHz) : Sets the Sampling rate to archive the vibration data into Telemetry.
  -x (shutdown) : Shutdown the OVMS Telemetry interface. This may take some seconds...
  -h (help) : Shows the help page.(See also: man ovmsTelemetryClient)

You can run the ovmsTelemetry manually with logging enabled to get more information if there's a problem:

  log in to ovms as ovms
  > ovmsTelemetryClient -s         to check status
  > ovmsTelemetryClient -p         to pause collection if it is running
  > ovmsTelemetryClient -x         to stop the ovmsTelemetry daemon
  > cd /lbt/ovms/current/etc
  > edit ovmsTelConfig.cfg, setting TelService.Logger.Level to 2
  > cd
  > ovmsTelemetry &                    to run the ovmsTelemetry daemon so that it prints debug/error/trace messages
  > ovmsTelemetryClient -r             to resume telemetry collection and see what the problem is

Make sure /tmp/telemetry_buf is writable by the ovms user; this location is specified in the ovmsTelConfig.cfg file for the ovmsTelemetry daemon.

Reduced rate Telemetry

Telemetry collection normally runs at 1 kHz. Kellee provided these instructions on 20-Dec-2017:

I ran a test of the OVMS telemetry collection at 500 Hz.
This applies only to the HDF5 telemetry collection, not the real-time
broadcast data.

The OVMS telemetry collection frequency defaults to the minimum rate
in the configuration file. A config file change and a bounce of the
telemetry daemon will make it run at the lower rate. I tested it
today, but put the configuration back to a minimum of 1 kHz after my test.

To default to 500Hz:
   1. pause telemetry and stop the daemon - ovmsTelemetryClient -x
   2. change the file  /lbt/ovms/current/etc/ovmsTelConfig.cfg
       TelService.Sample.MinFreq=1000  to  500
   3. restart the daemon - sudo service ovms-svc start
   4. restart telemetry collection - ovmsTelemetryClient -r 
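
Step 2 above is a single key change in ovmsTelConfig.cfg. A minimal sketch of making that change on the file's text is shown below. It assumes the file uses plain key=value lines, as the TelService.Sample.MinFreq=1000 example suggests; the function name set_config_value is hypothetical, and the function works on a string so it can be tried on a copy before touching the real file.

```python
def set_config_value(config_text: str, key: str, value: str) -> str:
    """Return config_text with the first `key=...` line set to `value`.

    Assumes plain key=value lines; all other lines are kept unchanged.
    Raises KeyError if the key is not present.
    """
    lines = config_text.splitlines()
    for i, line in enumerate(lines):
        name, separator, _old_value = line.partition("=")
        if separator and name.strip() == key:
            lines[i] = f"{key}={value}"
            return "\n".join(lines) + "\n"
    raise KeyError(key)
```

The same helper would apply to the TelService.Logger.Level change mentioned earlier on this page. The daemon still has to be stopped before the edit and restarted afterwards, as in steps 1, 3 and 4.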

Real-Time Multicast

Note: ssh root@ovms-uei may not work from every location. Log in to ovms first and ssh from there.

To stop/start the real-time service:

ssh root@ovms-uei.mountain.lbto.org
~ # ./ovmsrtService -f 1000 -d 12 -D 3.0 -H 0 -G 0 -P LBTI-noM1 -E 2 &



==== ovmsrtService version 1.0.3
#Board is set to 12
Estimation mode set to 2 (5 estimation channels)
#Board is set to 12
There are 4 channels specified:   0   1   2   3 
Multicast IP Address is set to 192.168.53.62
RT Service Listen Port is set to 4321
Operation Frequency is set to 1000.000000 Hz
Status Package Period is set to 2
High-Pass filter is set to 0
Gain is set to 0
Verbose level set to 0
ICP current set to 5 mA
Prediction Horizon is set to 0.000000 ms
Estimation mode set to 2 (5 estimation channels)
Configuration "LBTI" used for focal plane sensitivities

Mirror Data initialized
~ # Estimator initialized
Estimation successfully initialized: k=0.98  dt=0.00  T_delay=0.000000
Opening the datagram socket...OK.
Adding multicast group...OK.
Disabling the loopback...OK.
Setting the local interface...OK
Setting the socket priority...OK
Setting the socket multi-cast TTL...OK
One shot mode: Setting task period to 1000000 ns (1 ms)

A ps command will show the process running:

 1098 root      9428 S    ./ovmsrtService -f 1000 -d 12 -D 3.0 -H 0 -G 0 -P LBTI-noM1 -E 2
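
Checking the ps listing can be scripted. The sketch below parses captured ps text and returns the PIDs of any ovmsrtService process. It assumes the five-column layout seen in the sample line above (PID, user, size, state, command); the function name find_rt_service_pids is hypothetical.

```python
def find_rt_service_pids(ps_output: str, name: str = "ovmsrtService"):
    """Return PIDs of processes whose command basename equals `name`.

    Assumes each process line looks like the sample on this page:
    PID USER SIZE STATE COMMAND [ARGS...]
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a process line.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        command = fields[4]
        if command.rsplit("/", 1)[-1] == name:
            pids.append(int(fields[0]))
    return pids
```

Matching on the command basename, rather than searching the whole line, avoids counting an unrelated process that merely has ovmsrtService among its arguments. An empty result means the service needs to be started; more than one PID would be unexpected.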


Options for use in the real-time service:
usage: ./ovmsrtService [options]

        -h : display help
        -d n : selects the number of devices to use (default: 1)
        -f nnnn : set the rate of the UEI operation ([1-5000]Hz, default: 1000 Hz)
        -c "x,y,z,..." : select the number of channels to use (default: 4 channels)
        -s n : set the period to send the status package (default: 2 sec)
        -i "xxx.xxx.xxx.xxx" : set the multicast ip-address of the UEI device (default: 192.168.53.62)
        -p n : set the port of the UEI RT Service  (default: 4321)
        -a n: Set the ICP current value in mA (default: 5mA)
        -G n : set the Gain of the input cards (default: 0) where            
                 n = 0 -> Gain 1 (-12.5v/25v)           
                 n = 1 -> Gain 2 (-12v/12v)           
                 n = 2 -> Gain 5 (-5v/5v)            
                 n = 3 -> Gain 10 (-2.5v/2.5v)
        -H n : set the High-pass filter of the input cards (default: 0) where             
                n = 0 -> DC Coupling             
                n = 1 -> 0.1 Hz             
                n = 2 -> 1 Hz             
                n = 3 -> 10 Hz
        -E n : selects the OPD Estimation Mode (default: 0), where            
                 n = 0 -> no estimation           
                 n = 1 -> estimate differential opd, and focal plane motion (3 values)           
                 n = 2 -> estimate differential opd and SX/DX focal plane motion (5 values)           
                 n = 3 -> estimate displacement and rotation around x- and y-axis for each mirror separately (18 values)
        -v n : Set the verbose mode on for debugging purposes where             
                n = 0 -> Debug messages off             
                n = 1 -> Debug messages on
        -D n : Set the prediction horizon for the delay compensation (in ms, default: 0.0)
        -P n : Set the instrument for focal plane sensitivities configuration to use from "sensitivities.txt" (default: n=LBTI)
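
To avoid typos when restarting the service, the routine command line can be assembled and range-checked from the option list above. This is an illustrative sketch, not part of the OVMS software: build_rt_command is a hypothetical helper, its defaults are the routine startup values used on this page, and the checks cover only the ranges documented in the usage text.

```python
def build_rt_command(devices=12, freq_hz=1000, horizon_ms=3.0,
                     highpass=0, gain=0, instrument="LBTI-noM1", est_mode=2):
    """Return the ovmsrtService argument list for the given settings."""
    if not 1 <= freq_hz <= 5000:
        raise ValueError("-f must be within 1-5000 Hz")
    # -H, -G and -E each accept the values 0 through 3 per the usage text.
    for flag, value in (("-H", highpass), ("-G", gain), ("-E", est_mode)):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{flag} must be 0, 1, 2 or 3")
    return [
        "./ovmsrtService",
        "-f", str(freq_hz),
        "-d", str(devices),
        "-D", str(horizon_ms),
        "-H", str(highpass),
        "-G", str(gain),
        "-P", instrument,
        "-E", str(est_mode),
    ]
```

Joining the default result with spaces reproduces the routine startup command shown earlier in this section (without the trailing &).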

If you see the error:
Error -10 Initializing Communication with IOM

It means the real-time system is already running.

To shut down the RACKtangle:

ssh root@ovms-uei.mountain.lbto.org
~ # poweroff

Hardware Issues and History

Sep-2017 IT6759 Phil, SX - but data looked bad on DX_M2, but all channels were reading incorrectly from slots 7 and 8
31-Aug-2017 IT6735 All the data from racktangle cards 7 and 8 are railed at +-25V.
That means that DX M2 is not contributing anything to the OVMS+ predictions (4 of its 5 accelerometers read zero).
  IT6736 The OVMS+ SX Tip-Tilt prediction is full of noise. However, the only SX accelerometer channel that is obviously noisy is what telemetry calls SX_M3_3 which should be card 3 slot 2 (but it's configured to read slot 3, channel 1 which has nothing plugged in to it). The other four SX_M3 channels are OK. The channel that telemetry calls SX_M3_5 looks normal even though that cable isn't plugged in (because it's reading slot 3, channel 2 which was labeled M3-3).
30-Mar-2017 IT6496 swapped the slots of DX_M2_3 and DX_M2_2 because racktangle card 8 slot 3 was bad.
14-Apr-2016 IT5943 card 3 slot 1 was bad (why nothing is plugged in to it now).
SX_M3_5 is hanging, SX_M3_3 is slot 3, channel 2
Mar-2016 IT5911 swapped DX M2 2 and 3
Mar-2016 IT5912 slot 7 bad
Dec-2014 IT5423 slot 3, channels 1 and 2 red, tightened

Multicast Tools

There are some tools available to "sniff" for simulated or real data. They are installed in /lbt/ovms/current/bin.

For example, the script sniffUEILBT1kHz.sh does this:

 ovmsSniffer -i 192.168.53.60 -f 1000 

where 192.168.53.60 is the IP address of the ovms machine.

The similar script for simulation, called sniffSimulation1kHz.sh, does this:

 ovmsSniffer -i 127.0.0.1 -f 1000

Troubleshooting Notes from User Manual

The following is from the MPIA user manual:


Is the UEI real time service running?
Log in to the ovms-uei RACKtangle:
 > ssh root@ovms-uei

Check if the real-time service is running by typing:
# ps

If you see something like the following, the service is running:
 1098 root      9428 S    ./ovmsrtService -f 1000 -d 12 -D 3.0 -H 0 -G 0 -P LBTI-noM1 -E 2

Which version of the software is running?
This information can be retrieved from the help menu of the Monitor GUI.
Additionally, all the command-line tools report their version when they are invoked with the -h parameter.


Where are the OVMS configuration files located?
All configuration files for OVMS are installed in /lbt/ovms//etc
The corresponding environment variables point to these files.


Can I see the status of the OVMS telemetry service?
There are two ways, from the Monitor GUI or from the command line, using:
> ovmsTelemetryClient -s

To learn more about this command type
> man ovmsTelemetryClient


Why are there no alarms given in the monitor GUI?
It might be that the specified thresholds are simply not exceeded or that the threshold monitoring is not switched on.
Note: The PSD thresholds are only checked in detail when the related PSD check box in the monitor GUI is enabled. For all other active accelerometers there is the option (via menu bar, see earlier) to enable automatic threshold checking with far fewer data points. This is for performance reasons.


Where do I change the thresholds?
The thresholds are specified in the ovmsMonitor.cfg configuration file. A detailed description of these configuration files can be found in 509g503 - Development Reference.


Why is there no logging information in the system log?
The system logging for the monitoring GUI needs to be enabled in the ovmsMonitor.cfg configuration file. See 509g503 - Development Reference for more details.


Telemetry Service: Timeout

[ERROR]: WorkerThread::Exception: [Error] DataBroker [::recv()] Error reading
from socket. No data. Too many timeouts.

This means that the telemetry service cannot contact either the UEI RACKtangle or the UEI simulator due to a network issue and has timed out.
  • Make sure that the real-time service/simulator is properly running.
  • Check the configuration files to make sure the IP address and ports are correct.
  • Make sure the IGMP protocol of the involved computers is version 2.0.
Note: The service keeps running (but in TIMEOUT state). After fixing the problem, the service can be resumed.


Telemetry Clients: UEI Connection Timeout

[ERROR]: Telemetry connection with UEI has timeout.

This message is returned to the clients when the service has timed out due to network issues. See above. Once the issue has been identified and fixed, try to resume the service from the GUI or by running

 > ovmsTelemetryClient -r

Telemetry clients: LBT Telemetry exception

[ERROR]: LBT Telemetry is not working. Once the problem is fixed restart the
OVMS Telemetry service.

An LBT Telemetry exception occurred and the system could not recover. The service must be restarted.


Telemetry client: Unknown exception

[ERROR]: Telemetry service is in an error state. Try to resume it or restart
the OVMS Telemetry service.

An unknown exception occurred and the system could not recover. The service must be resumed or restarted.


Telemetry client: System paused

[INFO]: Telemetry is paused.

The telemetry service is working, but paused. Therefore, the issued command can't be executed. To resume acquisition use the GUI or type:

 > ovmsTelemetryClient -r


Telemetry client: Connection refused

[ERROR]: Connection refused, Telemetry service [ovmsTelemetry] is not accepting
connections or it is not running. 

The service is not running/responding and has to be restarted.
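
The telemetry error messages above and their documented remedies can be collected in one lookup, which may help when scanning logs. This is only a sketch: the names KNOWN_ERRORS and suggest_action are hypothetical, and the table contains nothing beyond the messages and remedies quoted in this section.

```python
# (message fragment, remedy) pairs transcribed from the entries above.
KNOWN_ERRORS = [
    ("Too many timeouts",
     "Check the real-time service/simulator, IP address and ports, and IGMP version; then resume."),
    ("Telemetry connection with UEI has timeout",
     "Fix the network issue, then resume with: ovmsTelemetryClient -r"),
    ("LBT Telemetry is not working",
     "Restart the OVMS Telemetry service."),
    ("Telemetry service is in an error state",
     "Resume or restart the OVMS Telemetry service."),
    ("Telemetry is paused",
     "Resume with: ovmsTelemetryClient -r"),
    ("Connection refused",
     "Restart the OVMS Telemetry service."),
]


def suggest_action(message: str) -> str:
    """Return the documented remedy for a known message, else a fallback."""
    for fragment, remedy in KNOWN_ERRORS:
        if fragment in message:
            return remedy
    return "Not documented on this page."
```

Messages wrapped across lines in a log should be joined before lookup, since each fragment is matched as a single substring.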
Topic revision: r56 - 13 Sep 2022, JohnHill