LBTO INDI Troubleshooting

Restart a Driver

An indiserver process restarts any driver it controls if that driver goes down. So, to bounce a driver, use ps to find it and then kill it.

For instance:
[web@web1 tmp]$ ps -ef | grep indi
web       2518     1 10 21:43 ?        00:03:33 /web/statserv/cgi-bin/indi.fcgi +age=2 +reconnects=100 +log=/web/server/logs/indi.fcgi.log +host=localhost:7624
root      4032     1  0 21:48 ?        00:00:00 /usr/bin/su - web /web/modules/INDI/bin/runindi
web       4034  4032  0 21:48 ?        00:00:00 -bash /web/modules/INDI/bin/runindi
web       4059  4034  2 21:48 ?        00:00:49 /web/modules/INDI/bin/indiserver -l /web/modules/INDI/logs/IS ./indialh LBTO@192.168.3.17:7630 OVMS@192.168.53.60
   flao_sx_ccd39@192.168.39.56 flao_sx_ccd47@192.168.39.56 flao_sx_wfs_msgd@192.168.39.56 flao_sx_wfsbcu@192.168.39.56 flao_sx_adsec_msgd@192.168.39.56 
   flao_dx_ccd39@192.168.39.57 flao_dx_ccd47@192.168.39.57 flao_dx_wfs_msgd@192.168.39.57 flao_dx_wfsbcu@192.168.39.57 flao_dx_adsec_msgd@192.168.39.57
web       4070  4059  1 21:48 ?        00:00:30 ./indialh

You can see from this ps command that the indiserver here is using one local driver (indialh) and several on other machines (LBTO, OVMS, flao_xxxx).
If you kill process ID 4070, the indiserver here will restart it.
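
As a minimal sketch of that find-and-kill step, the shell functions below pick a driver's PID out of ps -ef output and kill it. The names find_driver_pids and bounce_driver are hypothetical helpers, not part of INDI, and the match assumes the driver appears in the CMD column exactly as indiserver launched it (./indialh in the listing above).

```shell
# Read `ps -ef` output on stdin and print the PID (column 2) of every
# process whose command (column 8) is exactly the given driver name.
find_driver_pids() {
    awk -v cmd="$1" '$8 == cmd { print $2 }'
}

# Hypothetical helper: kill each matching local driver so that the
# controlling indiserver restarts it.
bounce_driver() {
    for pid in $(ps -ef | find_driver_pids "$1"); do
        kill "$pid"
    done
}

# Example:
#   bounce_driver ./indialh
```

Matching on the exact command rather than grepping for "indi" keeps the indiserver process itself, whose argument list also mentions ./indialh, out of the result.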

The drivers running on the LBTO, OVMS, and AO machines can be bounced on those machines, if necessary.

Stop a Server running on the web cluster in Tucson

To kill off the indi server running from the cluster (as root from the machine running "web.tucson"):
  • Type: "pcs resource disable web.tucson-indi", and then "pcs status" should show that it is disabled.
  • Find the INDI processes and kill them individually -- "systemctl stop indi" does not seem to kill everything.
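
One way to find the leftover processes for the second step is to walk the process tree under runindi, which also skips the unrelated indi.fcgi process that a plain grep for indi turns up. This is only a sketch: indi_tree_pids is a hypothetical name, and it assumes the ps -ef column order (PID in column 2, PPID in column 3) seen in the listing earlier on this page.

```shell
# Read `ps -ef` output on stdin and print the PID of every process that
# was started through the runindi script, plus all of its descendants.
# The seed pattern is written with a bracketed first letter so that this
# awk process, if it shows up in the ps listing, does not match itself.
indi_tree_pids() {
    awk '
        { ppid[$2] = $3; line[$2] = $0; order[NR] = $2 }
        END {
            for (i = 1; i <= NR; i++) {
                p = order[i]
                if (line[p] ~ /[r]unindi/) keep[p] = 1
            }
            # Keep adding children of kept processes until nothing changes.
            do {
                changed = 0
                for (i = 1; i <= NR; i++) {
                    p = order[i]
                    if (!(p in keep) && (ppid[p] in keep)) {
                        keep[p] = 1
                        changed = 1
                    }
                }
            } while (changed)
            for (i = 1; i <= NR; i++)
                if (order[i] in keep) print order[i]
        }'
}

# Example (as root, after "pcs resource disable web.tucson-indi"):
#   ps -ef | indi_tree_pids            # review the list first
#   kill $(ps -ef | indi_tree_pids)
```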

Start a stopped Server running on the web cluster in Tucson

  • Type: "pcs resource enable web.tucson-indi", and then "pcs status" should show that it is enabled and started (after a bit).
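
Because the resource takes a little while to come up, a small polling helper can stand in for re-running pcs status by hand. This is only a sketch: wait_for is a hypothetical helper, and checking the indi systemd unit with systemctl is-active is an assumption about what "started" should mean here.

```shell
# Retry a command once per second until it succeeds or the tries run out.
# $1 = maximum number of tries, remaining arguments = command to run.
wait_for() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example (as root on the machine running "web.tucson"):
#   pcs resource enable web.tucson-indi
#   wait_for 60 systemctl is-active --quiet indi.service && echo "indi is up"
```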

Restart a Server

To restart an indiserver process on:
  • web.tucson.lbto.org, you can kill it and the indi.service will restart it (using the /web/modules/INDI/bin/runindi script)
  • flao-dxwfs.mountain.lbto.org or flao-sxwfs.mountain.lbto.org ...
  • ovms.mountain.lbto.org
  • the LBTO properties are written from an INDI thread in DDS, there's no INDI server there to restart

Note: If you modify the runindi script, you have to tell systemd to reload it. Otherwise, systemd keeps managing the process with the version of the file it already has loaded.
     sudo systemctl daemon-reload

It is not clear whether the service also has to be stopped and restarted. When I did this, because the flao_xx_wfsbcu drivers were missing from the script, I had the service stopped when I ran the daemon-reload:
  sudo systemctl stop indi.service
  sudo systemctl daemon-reload
  kill 2175                        # kill the indiserver that was still running (2175 was its PID)
  sudo systemctl start indi.service


[web@web1 IS]$ sudo systemctl status indi
 indi.service - Cluster Controlled indi
   Loaded: loaded (/etc/systemd/system/indi.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/indi.service.d
             50-pacemaker.conf
   Active: active (running) since Fri 2017-07-21 17:45:46 UTC; 2 days ago
 Main PID: 21624 (su)
   CGroup: /system.slice/indi.service
            21624 /usr/bin/su - web /web/modules/INDI/bin/runindi

Jul 21 17:45:46 web1.tucson.lbto.org systemd[1]: Started Cluster Controlled indi.
Jul 21 17:45:46 web1.tucson.lbto.org systemd[1]: Starting Cluster Controlled indi...
Jul 21 17:45:46 web1.tucson.lbto.org su[21624]: (to web) root on none

Notes

  • The indiserver on web logs to /web/modules/INDI/logs/IS
  • The indiserver process core dumps occasionally when the AO drivers drop out. It retries many times and eventually crashes.
  • What can we use the config files for?
  • Initially, when trying to chain to the AO virtual machine for indi_sx_ccd39@150.135.245.104 from web, we were getting "No route to host" complaints. I could ping the IP, but the driver wouldn't run. Stephen suggested a firewall problem and found that the firewall was enabled on the AO virtual machine.
  • What are the client numbers in the indiserver log? How do they map to the drivers?
  • Is apache fast enough for the AO GUI?
  • Over a long weekend in Jan-2017, the ovms driver went down. The only messages in the log are:
    2017-01-16T01:50:58.435: Client 5: new arrival from 150.135.245.233:51780 - hello!
    2017-01-16T01:51:37.395: from Client 3: read error: Connection reset by peer
    2017-01-16T01:51:37.398: Client 3: shut down complete - good-bye!
    2017-01-16T02:08:19.056: from Client 5: read error: Connection timed out
    2017-01-16T02:08:19.077: Client 5: shut down complete - good-bye!
    The host IP there (150.135.245.233) is web2, the Tucson host that we are running a chained server on.
  • Why did I have to add state elements? If the LBTI code didn't have them, why would I need them (for instance, in the LBTO.Pointing property)?
  • indidevapi.txt: man page for INDI device driver C-language reference API
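
For log excerpts like the ovms one in the notes above, it can help to reduce the indiserver log to one line per client event. The sketch below assumes the log lines keep the layout shown in that excerpt (a timestamp, then "Client N:" or "from Client N:"). client_events is a hypothetical name, and any other message types are simply ignored.

```shell
# Summarize indiserver log lines as "timestamp client-number event [detail]".
# Reads the files given as arguments, or stdin if none are given.
client_events() {
    awk '
        # Timestamp is the first field, minus its trailing colon.
        { ts = $1; sub(/:$/, "", ts) }
        $2 == "Client" && $4 == "new" && $5 == "arrival" {
            id = $3; sub(/:$/, "", id)
            print ts, id, "arrival", $7      # $7 is the peer host:port
        }
        $2 == "from" && $3 == "Client" && $5 == "read" && $6 == "error:" {
            id = $4; sub(/:$/, "", id)
            msg = $0; sub(/^.*read error: /, "", msg)
            print ts, id, "error", msg
        }
        $2 == "Client" && $4 == "shut" && $5 == "down" {
            id = $3; sub(/:$/, "", id)
            print ts, id, "shutdown"
        }
    ' "$@"
}

# Example (the log path is a placeholder):
#   client_events path/to/indiserver.log | awk '$2 == 5'
```

Filtering the output on the second column, as in the example, gives the history of a single client number.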
Topic revision: r3 - 14 May 2018, StephenHooper