Special diagnostic session on AdSec672a

List of operations

aospare PC reboot and ao_restart (in order to fully reset any software problem)

10:30 power on --> error found on crate 1 board 6: wrong reading of coil currents and temperature.

logic SW version verified on crate1 board 6, through EngPanel

11:45 currents on crate1 board6 are now ok (temperatures still wrong)
11:50 currents on crate1 board6: wrong values

12:00 currents and temperatures ok: instability of crate1 board6 readings
12:40 fastlink alignment test failed...
power off, sleep(120), power on
fastlink alignment ok

12:49 Error reading BCU. From MIRRORCTRL00.log:
MIRRORCTRL00 |WAR| 3790847|2009-10-09 12:49:24.601574| UDPCONNECTION > UdpConnection: received -1 bytes instead of 1470
MIRRORCTRL00 |WAR| 3790848|2009-10-09 12:49:24.601595| UDPCONNECTION > UdpConnection: recovering from EAGAIN error in recvfrom() - teration
MIRRORCTRL00 |WAR| 3790849|2009-10-09 12:49:24.603744| ROUNDQUEUE_BCU-2 > UnexpectedBcuPacketException:recoverUnexpected
MIRRORCTRL00 |WAR| 3790850|2009-10-09 12:49:24.603759| ROUNDQUEUE_BCU-2 > UNEXPECTED id 97: recovering...

then (this is the typical output when communication is failing):

MIRRORCTRL00 |DEB| 3796235|2009-10-09 12:52:57.258039| BCUCOMMANDHANDLE > --> Waiting reply...
MIRRORCTRL00 |ERR| 3796236|2009-10-09 12:52:57.258382| MAIN > ****** FAILED PACKET FROM BCU 6 ******
MIRRORCTRL00 |ERR| 3796237|2009-10-09 12:52:57.258433| MAIN > ****** FAILED PACKET FROM BCU 5 ******
MIRRORCTRL00 |ERR| 3796238|2009-10-09 12:52:57.258476| MAIN > ****** FAILED PACKET FROM BCU 4 ******
MIRRORCTRL00 |ERR| 3796239|2009-10-09 12:52:57.258510| MAIN > ****** FAILED PACKET FROM BCU 7 ******
MIRRORCTRL00 |ERR| 3796240|2009-10-09 12:52:57.258543| MAIN > ****** FAILED PACKET FROM BCU 8 ******
MIRRORCTRL00 |ERR| 3796241|2009-10-09 12:52:57.258582| BCUCOMMANDHANDLE > BcuCommandHandler::handleBcuReply: REQUEST COMPLETED but FAILED :-(
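
The failing BCUs can be tallied from a log excerpt like the one above. The following is a minimal Python sketch, assuming the line layout seen in the excerpts on this page (process, level, counter, timestamp, component, message); `parse_line` and `failed_bcus` are hypothetical helper names, not part of the AdSec software.

```python
import re
from collections import Counter

# Assumed layout: PROCESS |LVL| counter|timestamp| COMPONENT > message
LOG_LINE = re.compile(
    r"^(?P<process>\S+)\s*\|(?P<level>[A-Z]{3})\|\s*(?P<seq>\d+)\|"
    r"(?P<timestamp>[^|]+)\|\s*(?P<component>\S+)\s*>\s*(?P<message>.*)$"
)
FAILED_BCU = re.compile(r"FAILED PACKET FROM BCU (\d+)")


def parse_line(line):
    """Split one log line into its fields; return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["timestamp"] = record["timestamp"].strip()
    return record


def failed_bcus(lines):
    """Count 'FAILED PACKET FROM BCU n' occurrences per BCU number.

    Every line is scanned, so wrapped lines without the usual prefix
    are counted as well.
    """
    counts = Counter()
    for line in lines:
        for bcu in FAILED_BCU.findall(line):
            counts[int(bcu)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "MIRRORCTRL00 |ERR| 3796236|2009-10-09 12:52:57.258382| MAIN > ****** FAILED PACKET FROM BCU 6 ******",
        "MIRRORCTRL00 |ERR| 3796237|2009-10-09 12:52:57.258433| MAIN > ****** FAILED PACKET FROM BCU 5 ******",
    ]
    for bcu, n in sorted(failed_bcus(sample).items()):
        print(f"BCU {bcu}: {n} failed packet(s)")
```

Scanning every line, rather than only lines that parse cleanly, keeps the count correct when a long message has been wrapped in the log viewer.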
trying to identify the failed DSP --> this is done using the command:

print, read_seq_dsp(nn, '80000'xl, 1l, buffer, /bcu, /UL)

where nn = number of the DSP to read, '80000'xl = physical address to be read, 1L (or 8L) = number of digital words to be read and stored into the buffer variable.

failed DSPs: 48, 49, 50, 51, 68, 69, 96, 97, 124, 125, 152, 153 (both for 1L and 8L reading)
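
To see which boards the failed DSPs belong to, the DSP numbers can be converted to crate/board positions. The sketch below is only an illustration under two assumptions inferred from correspondences noted elsewhere in this log (crate 1 board 6 = board 20, crate 1 board 11 = board 25, DSP 50,51 errors with the fault found on board 25): 2 DSPs per board and 14 boards per crate, both numbered from 0. `locate_dsp` and `group_by_board` are hypothetical helper names.

```python
from collections import defaultdict

DSPS_PER_BOARD = 2     # assumption, inferred from DSP 50,51 <-> board 25
BOARDS_PER_CRATE = 14  # assumption, inferred from crate 1 board 6 = board 20


def locate_dsp(dsp):
    """Return (crate, board_in_crate, global_board) for a DSP number."""
    global_board = dsp // DSPS_PER_BOARD
    crate, board = divmod(global_board, BOARDS_PER_CRATE)
    return crate, board, global_board


def group_by_board(dsps):
    """Group DSP numbers by their (crate, board_in_crate) position."""
    groups = defaultdict(list)
    for dsp in dsps:
        crate, board, _ = locate_dsp(dsp)
        groups[(crate, board)].append(dsp)
    return dict(groups)


if __name__ == "__main__":
    failed = [48, 49, 50, 51, 68, 69, 96, 97, 124, 125, 152, 153]
    for (crate, board), dsps in sorted(group_by_board(failed).items()):
        print(f"crate {crate} board {board}: DSPs {dsps}")
```

Under these assumptions the failed DSPs fall on crate 1 boards 10 and 11, and on board 6 of crates 2 to 5.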

ao_restart, to verify whether this is induced by a software error. The DSPs still fail to respond, as before.

error found in MIRR00001253636210, 2009-09-22 18:05:21: EAGAIN, communication error trapped by SOCKET (insert the logfile line to show the error)


14:00 fastlink alignment test: during 1 hour test no errors detected.

Hypothesis: the errors are induced by some calls during the startup procedure, which also involves the SIGGEN configuration, sleep4update() and the interrupt handling. The fastlink alignment procedure was tested after running these processes; no error was found after multiple iterations.

15:39:59 (view in IDLCTRL.log)

fsm_reset: adam is not responding to commands; adam responds to ping after 2000ms

16:23:21: trying to set the shell
DSP errors: # 50,51

16:31:35 system shutdown commanded by the HOUSEKEEPER, induced by an over-temperature (false probe reading)

HOUSEKEEPER |WAR| 1455367|2009-10-09 16:31:35.512956| FUNCTEMERGENCYST > DSPPowerTemp-0008 = 135.25 [/home/adopt/work/AO/Supervisor/DiagnApp/Funct.h:258]
HOUSEKEEPER |INF| 1455368|2009-10-09 16:31:35.513027| ADAM > Adam: disabling coils...
HOUSEKEEPER |DEB| 1455369|2009-10-09 16:31:35.513069| ADAM > Command to Adam: #011300
HOUSEKEEPER |DEB| 1455370|2009-10-09 16:31:35.513640| ADAM > Answer from Adam: 01^M#011300
HOUSEKEEPER |INF| 1455371|2009-10-09 16:31:35.513675| ADAM > Adam: coils succesfully disabled
HOUSEKEEPER |INF| 1455372|2009-10-09 16:31:35.513687| ADAM > Adam: disabling main power...
HOUSEKEEPER |DEB| 1455373|2009-10-09 16:31:35.513702| ADAM > Command to Adam: #011700
HOUSEKEEPER |DEB| 1455374|2009-10-09 16:31:35.514303| ADAM > Answer from Adam: 01^M#011700
HOUSEKEEPER |INF| 1455375|2009-10-09 16:31:35.514314| ADAM > Adam: main power succesfully disabled
HOUSEKEEPER |ERR| 1455376|2009-10-09 16:31:36.016717| FUNCTEMERGENCYST > Couldn't close hexapod brakes. Error -5001 [/home/adopt/work/AO/Supervisor/DiagnApp/Funct.h:274]
HOUSEKEEPER |ALW| 1455377|2009-10-09 16:31:36.016879| ARB-INTERFACE > My level is 6 (INF)
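
The variable that triggered an emergency stop can be pulled out of a log like the one above. This is a minimal Python sketch, assuming that FUNCTEMERGENCYST warnings always carry a `Name-NNNN = value` field as in the excerpt; the reading of `NNNN` as a channel index is an assumption, and `emergency_trigger` is a hypothetical helper name.

```python
import re

# Assumed message layout: <VariableName>-<index> = <value> [source file:line]
TRIGGER = re.compile(
    r"(?P<name>[A-Za-z]\w*)-(?P<index>\d+)\s*=\s*"
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def emergency_trigger(line):
    """Return (name, index, value) from a FUNCTEMERGENCYST line, else None."""
    if "FUNCTEMERGENCYST" not in line:
        return None
    # Only look at the message part, after the "COMPONENT >" separator.
    message = line.split(">", 1)[-1]
    match = TRIGGER.search(message)
    if match is None:
        return None
    return match["name"], int(match["index"]), float(match["value"])


if __name__ == "__main__":
    line = (
        "HOUSEKEEPER |WAR| 1455367|2009-10-09 16:31:35.512956| FUNCTEMERGENCYST > "
        "DSPPowerTemp-0008 = 135.25 [/home/adopt/work/AO/Supervisor/DiagnApp/Funct.h:258]"
    )
    print(emergency_trigger(line))
```

Lines from the same component that carry no such field, such as the hexapod brake error, return None.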


to avoid future probe errors the housekeeper.param configuration file has been modified

18:00 (...) capsens miscalibration due to a (real) overheating, induced by a cooling system failure.

2009_10_12

10:00 power on
10:01 fastlink alignment passed
10:02 load program failed
10:04 test_fibra passed. The problem is in the /stop keyword, which calls a different program by Mario: with the keyword the test always passes, without the keyword it never does (later verified that this statement is not true, see below)

-- fastlink alignment for 1 s, then at 10:08 read_seq_dsp(sc.all, '8000'xl, 20L, bb) to find the error
error found on board 25



10:21 load_program failed badly; power off, then power on again

10:43 load_program failed, then fsm_reset(), then load_program succeeded.

10:43-->11:07 fastlink alignment and test_fibra always passing, load_program never. Looking for differences in the scripts: INIT_ADSEC_SDRAM is the only difference between startup and test_fibra

check with MG:

board 11 crate 1 = board 25 (watchdog dsp): SN 146, everything ok according to the table
board 6 crate 1 = board 20 (spi problem): SN 141: 2 components replaced, which have no influence on the ADCs

the SDRAM read in proc_startup has been moved to the end

11:53 load program passed

power_off, then power on to test load_program. passed

check with MARIO: if the SIGGEN is stopped, all crates report a DSP Watchdog error

FSM: recovery: give Alfio a test function to check whether the system is powered off

13:43 shell ripped

IDLCTRL00 |ALW| 6687|2009-10-12 13:41:31.509293| MAIN > During the gain raising: pos. amplitude exceeded max level at 155
1
FSM_SET_FLAT: -----------> ERROR ON SETTING PROCEDURE!!! -10001

IDLCTRL00 |ALW| 6688|2009-10-12 13:41:31.509346| MAIN > % RAMP_GAIN: -----------> Erron ramp control loop gains: the gain was reset
to the initial gain.

IDLCTRL00 |ALW| 6689|2009-10-12 13:41:31.511336| MAIN > #0100B7

IDLCTRL00 |ALW| 6690|2009-10-12 13:41:31.511369| MAIN > % Temporary variables are still checked out - cleaning up...

IDLCTRL00 |ALW| 6691|2009-10-12 13:41:31.517291| MAIN > Coils succeffully disabled.
EndCmd

IDLCTRL00 |ALW| 6692|2009-10-12 13:41:31.517333| MAIN > % Temporary variables are still checked out - cleaning up...

oscillation of actuator 155, possibly at values lower than those allowed by the FAST --> the FAST stays up, e.g.

FAST > Processed 70 cycles of 7637 vars in 0.914 s ( 76.6 Hz......................

but the program disables the coils
actuator 115: crate1 board0

15:00 power on, fastlink_alignment: never passed, DSP 50,51 error.

16:05 old logic flashed on boards 24,25 (crate1, board10,11) ver 6.03 (info: the PC network board must be configured with an IP in the same subnet as the secondary, for instance 192.168.1.139.... check in /etc/hosts for existing addresses)

16:29 fastlink_alignment failed, even with old logic installed. DSP #50,51 errors.

trying to work together with W friends:

17:05 rip due to #0 misreading (possibly accumulator error?)

FASTDGN00 |WAR| 24646|2009-10-12 17:05:02.075655| FUNCTEMERGENCYST > ChDistAverage-0000 = 0.00163316 [/home/adopt/work/AO/Supervisor/DiagnApp/Funct.h:258]
FASTDGN00 |INF| 24647|2009-10-12 17:05:02.075699| ADAM > Adam: disabling coils...
FASTDGN00 |DEB| 24648|2009-10-12 17:05:02.075719| ADAM > Command to Adam: #011300
FASTDGN00 |DEB| 24649|2009-10-12 17:05:02.076288| ADAM > Answer from Adam: 01^M#011300
FASTDGN00 |INF| 24650|2009-10-12 17:05:02.076301| ADAM > Adam: coils succesfully disabled

---++ NEW INTMAT

18:40 power_off, power_on

New intmat B1 measured because the old one showed strong instability on the left side already at G=0.5
REC 20091012_181115
this one oscillates only slightly at gain 0.5, then at 0.6 shows conspicuous instability, again on the left side of the pupil.
fastlink_alignment failed

*CLOOP* <_181931>
_182011
IRTC dark = <_182710>
IR exp = 50000us
IR n frames = 100
Bin = <1>
LoopHz = <800>
Pup = <tracking#>
Dist = <OVFREQ800.>
Gain = <0.5>
Mod = 2 l/D
Rec = <REC_20091012_181115>
SN = zero
FW1 = <600>
FW2 =
Phot = 1800 phot/subap
Staged: <OFF>
Nota = <................>

---+ Cooling system failure

2009_10_09 18:30: fridge is found completely frozen and stopped. The cooling system is powered off (pumps, heaters, fridge)

-- RunaBriguglio - 2009-10-10

2009_10_10 11:00 cooling circuit is found ice-free and is powered on to let the compressor warm up. After the warm-up time, the fridge is switched on.

fridge temperature reaches the setpoint (2°C) in a few minutes. Ice forms and immediately melts at every fridge cycle

16:00 bench circuit switched on, pump speed=80% then 100% to avoid fast cooling. See plots for the thermal behavior of the system.

everything seems to work fine

-- RunaBriguglio - 2009-10-10
Topic revision: r7 - 12 Oct 2009, RunaBriguglio