upcoming: adsec - AnemometerMon failed connections: test manual connections to Moxa

  • Hardware used: connected to DX anemometer Moxa while the DX adsec was removed and no processes were running on dxadsec.
  • Software versions used:
    • (vm): stable, only AnemometerMon and msgdrtdb need to run
    • (vm): SOUL-anemometermon-catch-aoexception, only AnemometerMon and msgdrtdb need to run
  • SW: Brandon

Background

This is a duplicate of AOSWTest-202207XX-anemometermon, except that we will run the test using sxadsec and the SX anemometer. Brandon will prepare the new AnemometerMon process to be run from a test directory. If there are any problems, the old AnemometerMon process can be restarted from the system processes GUI.

ao-supervisor issue #144 describes a frequent problem where AnemometerMon repeatedly fails to connect to the Moxa/anemometer until the Moxa is reset. AdamHousekeeper then triggers TSS if the most recent wind speed value was received from AnemometerMon too long ago. This has been recorded in 5 ITs, potentially for different reasons, which are listed in the issue description. PR 143 fixes a bug in Issue #46 where the TCP socket is not closed for every type of connection failure. Subsequent re-connections will then probably fail, because the Moxa allows only one client to be connected at a time and will think the client (sxadsec or dxadsec) is still connected. From the logs, I can see that the one exception type that does delete the connection has never been raised (see my comment in the pull request for details). The connection has therefore never been deleted on connection failures, which include failures to configure the device after successfully establishing the TCP connection.

Code changed (PR 143)

  1. The TCP connection is deleted and the connection flag is toggled off if any type of exception is caught in AnemometerMon::Connect() or while requesting and reading a frame in Anemometer::Periodic(). The connection is re-created the next time Connect() is called. Currently, this only happens if one specific type of exception is caught, which is not the type raised by TCP connection errors or by errors configuring the anemometer.
  2. AnemometerMon::Connect() now properly catches AOExceptions, which include TCPExceptions (e.g. timeouts or connection refused) and the AOExceptions raised by Anemometer::setup() (errors configuring the anemometer). It logs the reason for the connection failure, which should help future debugging.
  3. The TCPConnect class will log when a socket is created, connected, and closed in anemometermon.L.*.log.
  4. AnemometerMon will log when the anemometer and character device (TCP connection) objects are deleted.
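
The effect of change 1 can be illustrated with a toy model. This is only a sketch in Python, not the AnemometerMon C++ code: FakeMoxa, FakeSocket, Monitor, and the close_on_any_error flag are all hypothetical names invented for illustration, and the model assumes the Moxa refuses a new client while it still counts an old one as connected.

```python
class FakeSocket:
    """Hypothetical stand-in for the TCP connection object."""

    def __init__(self, server):
        self.server = server
        self.closed = False

    def close(self):
        # Closing frees the single client slot on the fake server.
        if not self.closed:
            self.closed = True
            self.server.clients -= 1


class FakeMoxa:
    """Toy serial server that allows max_clients connections (assumed behaviour)."""

    def __init__(self, max_clients=1):
        self.max_clients = max_clients
        self.clients = 0

    def open(self):
        if self.clients >= self.max_clients:
            raise ConnectionRefusedError("max connections reached")
        self.clients += 1
        return FakeSocket(self)


class Monitor:
    """Toy monitor; close_on_any_error=True mimics the behaviour described for PR 143."""

    def __init__(self, server, close_on_any_error):
        self.server = server
        self.close_on_any_error = close_on_any_error
        self.sock = None
        self.connected = False
        self.log = []

    def connect(self, setup_fails=False):
        try:
            self.sock = self.server.open()
            if setup_fails:
                # Models a failure configuring the device after TCP connect.
                raise RuntimeError("error configuring anemometer")
            self.connected = True
        except Exception as exc:
            self.connected = False
            self.log.append(f"Failed to connect: {exc}")
            if self.close_on_any_error and self.sock is not None:
                # Delete the connection so the next attempt starts clean.
                self.sock.close()
                self.sock = None
        return self.connected
```

With close_on_any_error=False, a failed setup leaves the socket open, and every later connect() is refused by the fake server. With close_on_any_error=True, the slot is freed and the next connect() succeeds. Whether the real Moxa behaves this way is what the tests below are meant to establish.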

Tests

Before running these tests, make sure the SX adsec shell is rested.

Step Result
Attempt to reproduce the problem with the current AnemometerMon by resetting the Moxa.
1. Watch the anemometermon.L.* logs.  
2. Go to http://192.168.18.164/ and reset the Moxa ("Save/Restart"), which should interrupt the connection.  
3. Verify that the AnemometerMon log reports "Failed to connect" after an error reading the frame with no description of the failure. If it doesn't, skip to #5 to try power cycling the anemometer via the PDU instead.  
4. Verify that the AnemometerMon continues trying to re-connect but fails. If it doesn't, skip to #5 to try power cycling the anemometer via the PDU instead.  
5. If TSS is activated, rest the shell.  
Attempt to reproduce the problem with the current AnemometerMon by power cycling the Moxa and anemometer.
6. Go to http://192.168.52.72/ Device Manager -> Control, select the SX-3D-Anemometer, choose the control action "Off Immediate", and click Next to power off the SX anemometer. Wait about 5 seconds before turning it back on with "On Immediate."  
7. Verify that the AnemometerMon log reports "Failed to connect" after an error reading the frame with no description of the failure. If it doesn't, skip to #14.  
8. Verify that the AnemometerMon continues trying to re-connect but fails. If it doesn't, skip to #14.  
9. If TSS is activated, rest the shell.  
Try increasing the maximum number of connections.
10. Kill AnemometerMon.  
11. Reset the Moxa to clear the old connection if there is one.  
12. Increase the maximum number of connections to 2 under "Operating Settings -> Port 1."  
13. Start AnemometerMon.  
14. Reset the Moxa again to interrupt the connection.  
15. Check the log to see if this still results in re-connection failures or if the re-connection succeeds. If it succeeds, check the Moxa configuration page to see if the first connection is removed when it does connect.  
16. If TSS is activated, rest the shell.  
Try the new AnemometerMon.
17. Kill AnemometerMon.  
18. Set the maximum number of connections back to 1.  
19. Reset the Moxa to clear the old connection if there is one.  
20. Start the new AnemometerMon process. Run "~/soul/test.[tbd].anemometer/aoroot/bin/AnemometerMon -i anemometermon"  
21. Reset the Moxa again.  
22. In the log, check whether AnemometerMon repeatedly fails to connect and reports the reason, or successfully reconnects.  
23. Power cycle the anemometer again via the PDU.  
24. In the log, check whether AnemometerMon repeatedly fails to connect and reports the reason, or successfully reconnects.  
25. If TSS is activated, rest the shell.  
See if the TCP alive check time works in TCP server mode.
The "TCP alive check time" under "Operating Settings -> Port 1" in the Moxa configuration is set to 7 minutes. This will automatically disconnect a host that hasn't responded in 7 minutes. However, it's unclear whether this parameter is only used for TCP Client Mode (Moxa sends data to host) or for both client mode and server mode, which is what we are using (clients request data from Moxa and Moxa replies). If the TCP alive check time is used in server mode, it's possible that the re-connection attempts are resetting the timer, since on 2022-06-24 the connection had been failing for at least 30 minutes.
26. Kill AnemometerMon with a hard kill -9  
27. Wait to see if the Moxa clears the entry from the list of connected clients. If it takes less than 7 minutes, skip to #29.  
28. Reset the Moxa to clear the old connection if there is one.  
29. Decrease the TCP alive check time to 1 minute.  
30. Start AnemometerMon (new or old, or a manual telnet session to 192.168.18.164)  
31. Kill AnemometerMon with a hard kill -9  
32. Wait to see if the Moxa clears the entry from the list of connected clients after the specified amount of time.  
33. Set the TCP alive check time back to 7 minutes.  
All done.
34. Reset the Moxa and make sure TCP alive check time = 7 minutes and the maximum number of connections = 1.  
35. If the new AnemometerMon is running, kill it (Ctrl-C) and restart the old AnemometerMon process via the system processes GUI.  
36. Rest the shell.  
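
As an alternative to the manual telnet session mentioned in step 30, a single connection attempt can be scripted. The following is a minimal sketch using only the Python standard library. The probe function is a hypothetical helper, not part of the AO software, and the Moxa data port is not stated on this page, so it must be supplied by the tester.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try one TCP connection and return (ok, reason).

    The socket is always closed on exit, so a successful probe should not
    leave a client entry behind on the server side.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No answer within the timeout.
        return False, "timed out"
    except ConnectionRefusedError:
        # The remote end actively rejected the connection.
        return False, "connection refused"
    except OSError as exc:
        # Any other network error, e.g. host unreachable.
        return False, f"other error: {exc}"
```

Calling probe("192.168.18.164", port) with the port configured on the Moxa returns a short reason string that can be noted in the Result column. How the Moxa responds to a second client while one is still registered (refusal, timeout, or immediate close) is an open question here, not something this sketch assumes.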

Logs:

Topic revision: r1 - 08 Sep 2022, BrandonMechtley