2022-04-25 - verify AbstractSystem::_cmdResult thread safety bug


Software testing queue

Servers and software versions used:
sxadsec: stable
lbti-sxwfs: stable

Script used: AOeng@sxadsec:~/scripts/tests/bm.20220425.idlsystem-thread-safety/save_pause_resume.sh

Problem: While investigating interrupted offsets, I ran a script that sends Ice commands to the AdSec Arbitrator and the WFS Arbitrator from two processes, and it crashed the AdSec Arbitrator. After some debugging and code inspection, I suspect a thread-safety problem with the command result in IdlSystem (AbstractSystem::_cmdResult): the crash occurs when wfs.saveOpticalLoopData (which calls adsec.savedata) and adsec.PauseAo are sent from two separate threads with no pause in between. IdlSystem has a mutex for executing the command, but no mutex protects the command result, which is re-initialized as soon as a command begins.
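The suspected race can be illustrated with a small sketch. This is a Python analogy, not the actual C++ code: the class names UnsafeSystem and SafeSystem, the result strings, and the before_read hook are all hypothetical and exist only to force the interleaving deterministically. It assumes the pattern described above, where execution is locked but the shared result is read after the lock is released.

```python
import threading


class UnsafeSystem:
    """Hypothetical stand-in: execution is locked, the shared result is not."""

    def __init__(self):
        self._exec_lock = threading.Lock()
        self._cmd_result = None  # one result slot shared by every caller

    def execute(self, command, before_read=None):
        with self._exec_lock:
            self._cmd_result = None              # re-initialized as the command begins
            self._cmd_result = f"{command}: ok"
        # Lock released here: another thread can now overwrite _cmd_result.
        if before_read is not None:
            before_read()                        # hook used only to force the interleaving
        return self._cmd_result


class SafeSystem(UnsafeSystem):
    """Same behaviour, but the result is copied while the lock is still held."""

    def execute(self, command, before_read=None):
        with self._exec_lock:
            self._cmd_result = None
            self._cmd_result = f"{command}: ok"
            result = self._cmd_result            # private copy for this caller
        if before_read is not None:
            before_read()
        return result


def run_interleaved(system):
    """Run 'savedata' and let 'PauseAo' complete just before savedata's result is read."""

    def pause_in_other_thread():
        t = threading.Thread(target=system.execute, args=("PauseAo",))
        t.start()
        t.join()

    return system.execute("savedata", before_read=pause_in_other_thread)
```

With this sketch, run_interleaved(UnsafeSystem()) returns the result of the other thread's command, while run_interleaved(SafeSystem()) returns the result of savedata. In the C++ code an unsynchronized read of a result that is being re-initialized could plausibly do worse than return the wrong value, which would be consistent with a crash, but that is an assumption rather than something this sketch demonstrates.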

Test: To check whether this is a thread-safety issue, I will add a 2 s pause between wfs.saveOpticalLoopData and adsec.PauseAo and confirm that the AdSec Arbitrator no longer crashes.
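The shape of this test can be sketched as follows. This is not the contents of save_pause_resume.sh: the function names run_iteration and run_test are hypothetical, and send_save and send_pause are placeholder callables standing in for whatever actually sends wfs.saveOpticalLoopData and adsec.PauseAo. The sleep function is injectable so the sketch can be exercised without waiting.

```python
import threading
import time


def run_iteration(send_save, send_pause, delay_s=0.0, sleep=time.sleep):
    """Send the save command in a background thread, optionally wait, then send the pause."""
    saver = threading.Thread(target=send_save)
    saver.start()
    if delay_s > 0:
        sleep(delay_s)   # the gap under test, e.g. 2 s
    send_pause()
    saver.join()


def run_test(send_save, send_pause, delay_s, iterations=3, sleep=time.sleep):
    """Repeat the save/pause pair; delay_s=0 reproduces the back-to-back case."""
    for _ in range(iterations):
        run_iteration(send_save, send_pause, delay_s, sleep)
```

Running it once with delay_s=0 and once with delay_s=2.0 corresponds to the two cases compared in this test.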

Result (Log):
  • Everything worked as expected. Without the sleep between wfs.saveOpticalLoopData and adsec.PauseAo, the AdSec Arbitrator crashed as before. With the sleep, the AdSec Arbitrator did not crash in 3 iterations.
  • I'll make a pull request for the fix that we can test another time.
-- BrandonMechtley - 25 Apr 2022
Topic revision: r2 - 19 May 2022, BrandonMechtley