Notes taken while troubleshooting Blue Filter Wheel 1 Error (IT 7294)

The following notes were collected while troubleshooting IT 7294.
  • Querying the status of a dataprobe outlet:
    • From a telnet session:
      • telnet 192.168.59.113
      • get outlets
    • On the CMU command line:
      • powstatus bluefilt2:status
[lbccontrol@cmu current]$ powstatus bluefilt2:status 
[OFF] BFilt2
returning 0
or
[lbccontrol@cmu current]$ powstatus bluefilt1:status 
[ON ] BFilt1
returning 64
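
When checking several outlets in a row, it can help to turn the two-line powstatus report into structured values. The following is a minimal sketch that parses text in the format shown above; the function name parse_powstatus and the idea of capturing the output as a string are assumptions for illustration, not part of the LBC software.

```python
import re

# Patterns for the two-line report shown above, e.g.
#   [ON ] BFilt1
#   returning 64
STATE_LINE = re.compile(r"^\[(ON|OFF)\s*\]\s+(\S+)\s*$")
RETURN_LINE = re.compile(r"^returning\s+(\d+)\s*$")


def parse_powstatus(text):
    """Return (label, is_on, value) from captured powstatus output.

    Hypothetical helper: lines that match neither pattern (such as the
    shell prompt) are ignored.
    """
    label = is_on = value = None
    for line in text.splitlines():
        line = line.strip()
        m = STATE_LINE.match(line)
        if m:
            is_on, label = (m.group(1) == "ON"), m.group(2)
            continue
        m = RETURN_LINE.match(line)
        if m:
            value = int(m.group(1))
    if label is None or value is None:
        raise ValueError("unrecognized powstatus output")
    return label, is_on, value


sample = "[ON ] BFilt1\nreturning 64\n"
print(parse_powstatus(sample))
```

The same function accepts the OFF case ("[OFF] BFilt2" followed by "returning 0") without changes.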

For LBCB, on 192.168.59.113, port 6 is FW2 and port 7 is FW1. bluechannel.conf has this right, but the testpower3 and power.py programs assume that port 6 is FW1 and port 7 is FW2. When using testpower3, go by the port number rather than the label. I created a version of power.py, called power_opk_temp.py, with three lines changed. This version turns the correct FW on and off, as verified by powstatus and get outlets.
[root@cmu current]# diff power.py power_opk_temp.py 
115c115
<         self.set_title("LBC Power Control")
---
>         self.set_title("LBC Power Control (OPK version)")
180,181c180,181
<             self.bfil1_box = cPowerButtonFrame("Filter1", "bluefilt1", (blue1Status&0x40), blue)
<             self.bfil2_box = cPowerButtonFrame("Filter2", "bluefilt2", (blue1Status&0x20), blue)
---
>             self.bfil1_box = cPowerButtonFrame("Filter2", "bluefilt2", (blue1Status&0x40), blue)
>             self.bfil2_box = cPowerButtonFrame("Filter1", "bluefilt1", (blue1Status&0x20), blue)
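
The `blue1Status&0x40` and `blue1Status&0x20` tests in the diff each pick one bit out of a status byte. The sketch below shows that bit test in isolation. It assumes that outlet N corresponds to bit N-1 of the status byte, which agrees with the value 64 (0x40) returned above for bluefilt1 on port 7 but is not confirmed by the source; BLUE_OUTLETS, outlet_mask and decode_status are hypothetical names.

```python
# Port numbers for LBCB as stated in the notes (the bluechannel.conf convention).
BLUE_OUTLETS = {"bluefilt1": 7, "bluefilt2": 6}


def outlet_mask(port):
    """Assumed mapping: outlet N is bit N-1 of the status byte."""
    return 1 << (port - 1)


def decode_status(status, outlets=BLUE_OUTLETS):
    """Return {outlet name: True if its bit is set in status}."""
    return {name: bool(status & outlet_mask(port)) for name, port in outlets.items()}


print(hex(outlet_mask(7)), hex(outlet_mask(6)))  # 0x40 0x20
print(decode_status(0x40))  # only the port-7 bit is set
```

Keeping the port numbers in one table means a cable swap only requires changing the table, not every place that tests a mask.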

Although we can toggle power to the filter wheels ourselves, the LBC SW cannot. Every time we try to turn on LBCB, we get the error:
2018/06/13 22:43:59.398147 E B FILTERS  WHEEL#1  timeout error on check [src/filters/filters.c:2776]
2018/06/13 22:43:59.398202 E B FILTERS           bad communication with serial device [src/filters/filters.c:550]
2018/06/13 22:43:59.398217 W B FILTERS           system has been deactivated

Line 2776 is in the CheckWheel function, in this branch:
 } else if ( rc == 0 ) {
      LogMessage( LOGLEVEL_ERROR, pW->szLogSourceID, "timeout error on check [%s:%d]", __FILE__, __LINE__ );
      retcode = 0;
where, a few lines earlier, rc is set by:
rc = select( pW->port+1, &rset, NULL, NULL, &delay );
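CheckWheel returns retcode=0 if there is an error. Since select() returns 0 when the timeout expires before the descriptor becomes readable, this error means that the wheel did not answer on its serial port within the delay. For wheel 1, CheckWheel is called in the following block, whose first LogMessage presumably produced the "bad communication" message logged from line 550: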
rc = (int)CheckWheel( (void*)&pFS->Wheel[0] );
if ( !rc ) {
      LogMessage( LOGLEVEL_ERROR, pFS->szLogSourceID, "%s [%s:%d]", FiltersErrorMessage(FILTERSERROR_COMMUNICATION), __FILE__, __LINE__ );
      LogMessage( LOGLEVEL_WARNING, pFS->szLogSourceID, "system has been deactivated" );
      pthread_mutex_unlock( &pFS->access_mutex );
      FiltersUninitialize( pFS );
      return (void*)FILTERSERROR_COMMUNICATION;
    }
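
The timeout path above can be reproduced in isolation. The following is a minimal sketch, not the filters.c code: it uses Python's select.select, which returns empty lists where the C call returns 0, and an os.pipe stands in for the wheel's serial port (an assumption for the demo; it needs a POSIX system). The name wait_for_reply is hypothetical.

```python
import os
import select


def wait_for_reply(fd, timeout_s):
    """Return True if fd becomes readable within timeout_s seconds, else False."""
    readable, _, _ = select.select([fd], [], [], timeout_s)
    return bool(readable)


# A pipe stands in for the serial device: the read end is "the wheel's reply".
r, w = os.pipe()
try:
    print(wait_for_reply(r, 0.2))  # nothing written yet, so the wait times out
    os.write(w, b"ok")
    print(wait_for_reply(r, 0.2))  # data is pending, so the descriptor is readable
finally:
    os.close(r)
    os.close(w)
```

A False result here corresponds to the rc == 0 branch: the device was silent for the whole delay, which does not by itself say whether the cause is power, cabling, or the wheel controller.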
There is also a "long" version of CheckWheel called WheelPosition, but it may not be used or needed:
//
// WheelPosition
//   this is identical code to CheckWheel, but the signature is such that it will 
//   return the position as a long. 
Unfortunately, the order of the checks on FW1 and FW2 cannot be reversed. What would happen if we swapped the ports in bluechannel.conf and then ran testfilters or brought up LBCB via the UI? Would it then successfully power up FW2 and fail on the CheckWheel of FW1?

IT 5654 describes the infamous swap of ports for Blue FW1 and FW2, which was done during startup 2015. In that case, the SW threw the same error when trying to check the status of FW1, but at that time FW1 was on port 6. The workaround then was to turn on FW2 in advance; for some reason, the SW then did not complain and both filter wheels came up. Since that time, the ports have been swapped in the bluechannel.conf file and all has worked well until now.
I tried turning on FW1 in advance, and then FW2 in advance, but neither worked. Each time I started testfilters or the UI, I saw the same error on the FW1 check.

[root@cmu current]# powstatus bluefilt1:status
[ON ] BFilt1
returning 64
[root@cmu current]# powstatus bluefilt2:status
[ON ] BFilt2
returning 32
[root@cmu current]# powstatus redfilt2:status
[ON ] RFilt2
returning 64
[root@cmu current]# powstatus redfilt1:status
[ON ] RFilt1
returning 32

-- %USERSIG{OlgaKuhn - 2018-06-14}%

Comments

As of 20180625, it appears that blue FW1 is still plugged into port 7 and FW2 is plugged into port 6.

- power.py had a bug (fixed 20180625) in the "refresh" that swapped the port status of blue FW1 and FW2; however, it was turning on/off the correct ports

- testpower3 was never updated after the cables were swapped, so it was incorrectly reporting that port 6 was FW1 (fixed 20180625)

- powstatus reported the correct mask values for the filter wheel outlets (power.py uses powstatus)

- swapping the config values for FW1 and FW2 in bluechannel.conf was a good test and helped us discover that FW2 powered up fine and FW1 was having a problem -- this ruled out the portserver as a problem.

-- %BUBBLESIG{KelleeSummers - 2018-06-25}%
 
Topic revision: r4 - 25 Jun 2018, KelleeSummers