AO Support Call Log

Please add an entry for each support call received. Start with the date, time, type of support (Active Optics, Adaptive Optics) and the person receiving the call, followed by a description of the reason for the call, the explanation or solution if found, or the IssueTrak number created and the course of action to be taken.

Please put the most recent at the top of this document.

All times are in Local Time

Thanks, Doug



27 Nov 2014, 10:00 pm, 20 minutes, SX Adsec, Juan Carlos

The TO called about a problem setting the shell. The shell had been set at the beginning of the night and then ripped. The TO tried to recover the system by setting the shell again and doing adsc_stop/start, but that didn't solve the problem.

The problem was that actuator #229 jumped 0.2m. The actuator has been put in the list of bad actuators for position.

05 Nov 2014, 10:05 pm, 3 hours 20 minutes, DX AdSec, Doug

All offsets while the AO loop was running on the DX side would fail when the ResumeAO was requested. It was found that a gain file was missing. See IT5378.

02 Nov 2014, 20:05, 30 minutes, DX Adsec, Juan Carlos

The unit was in Panic mode and had been powered off. The temperature values for DSP crate 0, board 8 were too high, and the stratixtemperature was 58.4 C. This set of temperature values was jumping, and when the stratixtemperature went above 50 C it triggered the power off of the unit. After an adsc_stop/start and power on, the sensor started to read the right values, but a few minutes later the values went to zero and then back to normal.

This fault has to be investigated, and if the sensor is not working properly it should be disabled in the housekeeper configuration file.

housekeeper.R |ERR| 395874|2014-11-02 02:19:27.492119| FUNCTEMERGENCYST > DSPSTRATIXTEMP-0008 = 58.4375 [Funct.cpp:94]

An issue track is open: #5370
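Log lines such as the housekeeper entry above appear to share a pipe-delimited layout: process name, severity, sequence number, UT timestamp, then SOURCE > message, optionally followed by a [file:line] tag. Below is a minimal Python sketch for splitting such a line. The layout is inferred only from the excerpts in this log, and the function and field names are illustrative, not part of the AdSec software.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the log excerpts in this document:
#   process |LEVEL| seq|YYYY-MM-DD HH:MM:SS.ffffff| SOURCE > message [file:line]
LOG_LINE = re.compile(
    r"^(?P<process>\S+)\s*\|(?P<level>\w+)\|\s*(?P<seq>\d+)\|"
    r"(?P<stamp>[\d-]+ [\d:.]+)\|\s*(?P<source>\S+)\s*>\s*(?P<message>.*?)"
    r"(?:\s*\[(?P<origin>[^\]]+)\])?$"
)

def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match the layout."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["seq"] = int(fields["seq"])
    fields["stamp"] = datetime.strptime(fields["stamp"], "%Y-%m-%d %H:%M:%S.%f")
    return fields

line = ("housekeeper.R |ERR| 395874|2014-11-02 02:19:27.492119| "
        "FUNCTEMERGENCYST > DSPSTRATIXTEMP-0008 = 58.4375 [Funct.cpp:94]")
rec = parse_log_line(line)
print(rec["process"], rec["level"], rec["message"])
```

Lines that do not follow the assumed layout (for example the shortened excerpts without a process name) return None rather than raising, so a whole log file can be filtered in one pass.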

29 Oct 2014, 4:00pm, 15 minutes, DX AdSec, Juan Carlos

Steve called about an issue with the Adsec: it was in a continuous recover failure procedure and the IDL license was reported missing.

I checked the processes: the IDL process was a zombie and the elevation telemetry was not displayed in the Adsec.

To recover it, I told Steve to restart the AOS. After the AOS restart the Adsec was reading the elevation telemetry. The IDL process was restarted from the System Processes GUI.

The Adsec was still in a skip-frame state, cycling through the recover failure process. The command print,fsm_reset() was issued from the IDL terminal, and after the system did one more recover failure it went back to normal. The shell was SET and REST without any further problems.

The trigger of the failure has to be investigated. An issue track is open (IT#5367)

13 Oct 2014, 1:00am, 15 minutes, SX AdSec, Doug

Steve called after the SX AdSec failed twice while Set and observing. It looks like actuator 450 is jumpy. I did not put it in the do-not-use list yet but will wait to see if it occurs again. Steve noted that the seeing was so bad that the observation continued while the AdSec performed a recover fail and then was Set.

fastdiagn.L        |ERR|   2707473|2014-10-13 06:28:56.762070| FUNCTEMERGENCYST > CHDISTAVERAGE-0450 = 2.08498  [Funct.cpp:94]
fastdiagn.L        |ERR|   2717875|2014-10-13 06:40:41.219423| FUNCTEMERGENCYST > CHDISTAVERAGE-0450 = 2.08498  [Funct.cpp:94]
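Deciding whether an actuator belongs on the do-not-use list comes down to how often it trips. The following is a rough Python sketch that tallies FUNCTEMERGENCYST events per variable and channel index from log text. The regular expression is an assumption based on the two lines above, and count_emergencies is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g. "FUNCTEMERGENCYST > CHDISTAVERAGE-0450 = 2.08498"
EMERGENCY = re.compile(r"FUNCTEMERGENCYST\s*>\s*([A-Z]+)-(\d+)\s*=")

def count_emergencies(log_text):
    """Count emergency-stop events per (variable, channel index)."""
    counts = Counter()
    for name, index in EMERGENCY.findall(log_text):
        counts[(name, int(index))] += 1
    return counts

log_text = """\
fastdiagn.L |ERR| 2707473|2014-10-13 06:28:56.762070| FUNCTEMERGENCYST > CHDISTAVERAGE-0450 = 2.08498 [Funct.cpp:94]
fastdiagn.L |ERR| 2717875|2014-10-13 06:40:41.219423| FUNCTEMERGENCYST > CHDISTAVERAGE-0450 = 2.08498 [Funct.cpp:94]
"""
for (name, index), n in count_emergencies(log_text).most_common():
    print(f"{name} channel {index}: {n} event(s)")
```

Running the same tally over several nights of logs would show whether a channel trips repeatedly or only once.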

4 Oct 2014, 7:52pm, 15 minutes, SX AdSec, Juan Carlos

The TO called about a problem bringing up the AdsecSX: the unit couldn't be powered on. The problem came from the housekeeper process not starting up. The housekeeper process was restarted but that didn't solve the problem. The solution was to adsc_stop/start to sync all the other processes. The unit was then powered on and set without further problems. What caused the rip has to be investigated.

4 Oct 2014, 12:00am, 1 h 35 minutes, DX AdSec, Juan Carlos

The TO called about a communication problem with the AdsecDX. The check showed that the unit was losing the elevation telemetry; the remote connection from home was also poor. The connection with the Adsec was slow, and access to the logs directory was too slow as well. LBTI had some issues running the lbti web server, so I think we had a communication (network) issue. We also had two failures of the AdsecDX due to losing the elevation telemetry, and the AOS subsystem had to be restarted. The failure to get the telemetry has to be investigated. The AdsecDX was monitored and didn't show any further problems after 2am.

1 Oct 2014, 8:09pm, 10 minutes, DX AdSec, Doug

David called and said the SX AdSec was continually performing a recover fail. He had tried three times to stop and restart everything, but to no avail. When I logged in, none of the adsc processes were running. I started all processes, powered on, loaded the program and Set the shell with no problems. Observing continued. I did find the error

fastdiagn.L |ERR| 1330978|2014-10-02 02:55:12.167633| FUNCTEMERGENCYST > emergencyReact = 1 [Funct.cpp:94]

but have not tracked down the original cause for the Rip and subsequent recover fails

8 May 2014, 5:32am, x minutes, DX Adsec, Juan Carlos

Call from the mountain: the DX secondary was in Panic state and could not be recovered. The error happened during the dome-closing process. This failure didn't cost any on-sky time.

The failure happened just as the telescope went to horizon to close the dome and back to zenith: the Adsec went into "pie shape". The pie shape didn't recover after several recover failure processes. The unit is powered off from 6:31 up to..

7 May 2014, 11:27pm, 11 minutes, SX Adsec, Juan Carlos

I received a call from the TO reporting that the AdsecSX could not be recovered.

The AdsecSX went to rip state triggered by the fastdiagnostic:

fastdiagn.L |ERR| 413058|2014-05-08 06:16:06.480642| FUNCTALARM > FunctAlarm CHINTCONTROLCURRENT-0352 -0.809165

fastdiagn.L |ERR| 413060|2014-05-08 06:16:06.486303| FUNCTEMERGENCYST > CHCURRAVERAGE-0352 = -0.8 [Funct.cpp:94]

The Adsec was in closed loop and a jump of Act 353 occurred in current.

The Adsec couldn't be restarted properly because the housekeeper process didn't come up, so the system couldn't be recovered.

Jump in current of act #353.

3 May 2014, 12:30am, 5 minutes, SX Adsec, Juan Carlos

I received a call from the instrument tech about a problem setting the shell. They were looking at the error messages in the TCS, and the message reported that the shell could not be set because the elevation telemetry was not available.

This fault most likely means that the AOS is not updating the elevation value to the Adsec control software.

I explained to them that restarting the left AOS would solve the problem. The instrument tech stopped and started the AOSleft, and the Set command from the AOS finished successfully.

The shell could be set.

24 Mar 2014, 12:49 am, 5:00 hours, DX Adsec, Juan Carlos

Mountain called about an issue with the Adsec, the mirror was in a Panic state. The Panic state was due to the "Pie Shape" event.

The mirror could be recovered after removing act #561, but this didn't fix the problem: the pie shape came up again when the telescope moved from zenith to 40 degrees for a test before going back to the observations. The shell RIP occurred at about 70 degrees in elevation.

Armando was called by the Arcetri team at the telescope to look at the problem. We looked at the failure but couldn't find a solution. I told Armando that the accelerometer board may be making the crate fail. This issue has been discussed at LBTO, and we have been talking about doing some tests with the accelerometer board disconnected before the coming LBTI run.

The unit is left with the process running to ensure the TSS is operational.

23 Mar 2014, 9:00 pm, 1:00 hours, DX Adsec, Juan Carlos

The mountain called me at about 9pm about a problem with the AdsecDX. The mirror couldn't be recovered, and the TO had done an adsc_stop/start.

The mirror was in PANIC with a pie shape on crate #5. IDLCTRL reported misfunctioning actuators on crate #5; this error prevented Load Program, and it reported the gap with the pie shape.

idlctrl.R |INF| 42681|2014-03-23 03:55:32.899535| MAIN > Found misfunctioning capacitive sensors

idlctrl.R |INF| 42682|2014-03-23 03:55:32.899612| MAIN > 560:00000177um;561:00000181um;562:00000162um;563:00000148um;564:00000191um;565:00000261um;567:00000183um;568:00000137um;569:00000148um;570:00000151um;574:00000151um;575:00000138um;576:00000154um;577:00000147um;578:00000158um;579:00000163um;580:00000154um;581:00000152um;582:00000157um;583:00000199um;587:00000135um;588:00000146um;589:00000161um;590:00000169um;591:00000283um;592:00000192um;593:00000182um;594:00000176um;596:00000248um;597:00000194um;598:00000176um;599:00000208um;600:00000143um;601:00000162um;6

idlctrl.R |INF| 42683|2014-03-23 03:55:32.899659| MAIN > -10063

The Adsec_stop/Start was done again, with the same pie-shape result.

The mirror was recovered after disabling actuator #560 in position; this is the first actuator of crate #5. The logs have to be processed to better understand this event and the trigger of the pie shape.

The mirror is recovered and SET without problem.

pie_shape_DX_20140323.png
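A "pie shape" shows up as an alarm on the actuators served by one crate. The entry above states that actuator #560 is the first actuator of crate #5, which is consistent with 112 actuators per crate, numbered from zero. That per-crate count is an assumption made here for illustration, not a value taken from the AdSec configuration. Under that assumption, a short Python sketch can group alarm indexes by crate:

```python
from collections import defaultdict

ACTUATORS_PER_CRATE = 112  # assumption: 560 // 112 == 5, matching crate #5

def group_by_crate(indexes):
    """Map crate number -> sorted list of actuator indexes in that crate."""
    crates = defaultdict(list)
    for idx in sorted(indexes):
        crates[idx // ACTUATORS_PER_CRATE].append(idx)
    return dict(crates)

# First few indexes reported by idlctrl in the log above.
alarmed = [560, 561, 562, 563, 564, 565, 567, 568, 569, 570]
for crate, members in group_by_crate(alarmed).items():
    print(f"crate {crate}: {len(members)} actuators, first {members[0]}")
```

If nearly all alarmed indexes fall in a single crate, a crate-level fault may be a more likely explanation than many individual bad actuators.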

09 Mar 2014, 9:35 pm, 0:20 hours, DX Adsec, Juan Carlos

First night of the LBTI run. The TO called me about a pie shape on DX. The loop was closed and the shell RIPped, showing the "pie shape". After a gain command was applied the shell skipped a frame, and 3 minutes later the "pie shape" showed up. We could not see a large tilt, and the loop had closed without problems. Could this be an issue with changing the gain under variable seeing?

The secondary recovered after a recover failure process, and after checking the Adsec for 10 minutes the observations could continue.

Crop of the logs:

adsecarb.R |INF| 52113|2014-03-10 04:27:27.925754| COMMANDHANDLER > FSM (status: AORunning) has received command 2029 (TTOffload)

adsecarb.R |INF| 52114|2014-03-10 04:27:27.951421| COMMANDHANDLER > Command TTOffload (code 2029) successfully completed

adsecarb.R |WAR| 52115|2014-03-10 04:29:10.940792| MAIN > Skip frame detected!

adsecarb.R |WAR| 52116|2014-03-10 04:30:13.313771| MAIN > Skip frame detected!

adsecarb.R |WAR| 52117|2014-03-10 04:30:49.512734| MAIN > Skip frame detected!

adsecarb.R |INF| 52118|2014-03-10 04:30:52.304349| COMMANDHANDLER > FSM (status: AORunning) has received command 2018 (SetGain)

adsecarb.R |INF| 52119|2014-03-10 04:30:52.321247| COMMANDHANDLER > Command SetGain (code 2018) successfully completed

adsecarb.R |WAR| 52120|2014-03-10 04:31:01.067180| MAIN > Skip frame detected!

adsecarb.R |ERR| 52121|2014-03-10 04:34:22.195490| ALERTHANDLER > Received an ERROR alert! Forcing Failure state.

adsecarb.R |ERR| 52122|2014-03-10 04:34:22.195525| ALERTHANDLER > Alert message: COILS DISABLED from fastdiagn.R

adsec.R |INF| 34184|2014-03-10 04:34:24.710726| MAIN > Fam: CHDISTAVERAGE in Alarm over indexes: 560 561 562 563 564 565 566 567 568 569 570 571 572 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 663 664 665 666 667 668 669 670 671

pie_shape_DX20140310.png

20140310_Pie_DX.png

diff_oneframe_before_pie_20140310.png

09 Mar 2014, 9:00 pm, 0:10 hours, SX_DX Adsec, Juan Carlos

First night of the LBTI run. At the beginning of the night the observers contacted me about a failed AO-ACE preset using the AOS. The issue was that the WFS was not selected for LBTI. After talking with them and configuring the AOS for LBTI, a test preset from the AOS command GUI was sent successfully. No time lost.

19 Feb 2014, 7:00 pm, 2:30 hours, SX Adsec, Juan Carlos

At the beginning of the night, while setting the shell, the TO reported that two attempts to set the shell had ended with the Adsec going into failure and recover failure.

The shell couldn't be set. Actuator #37 was jumping, triggering the coil disable and causing the shell to rip.

2014-02-19 00:40:52.008670| FUNCTEMERGENCYST > CHDISTAVERAGE-0037 = 0.244661

2014-02-19 00:42:49.499580| FUNCTEMERGENCYST > CHDISTAVERAGE-0037 = 0.163124

|2014-02-19 00:44:54.164299| FUNCTEMERGENCYST > CHDISTAVERAGE-0037 = 0.326197 [Funct.cpp:94]

|2014-02-19 00:44:54.164392| ADAM-MODBUS > AdamModbus: disabling coils...

2014-02-19 02:10:00.847156| FUNCTEMERGENCYST > CHDISTAVERAGE-0037 = 0.326536

Before removing the actuator I tried to set the shell two more times, and each time the shell went into recover failure with the fastdiagn error about a jump of the actuator.

Actuator #37 was inserted in the list of bad actuators for position. The system was restarted, powered on and the program loaded without problems. Midway through the shell set process the system went to a failure state, with the system processes in the state listed below. The TO got the error message "IDL missing license" in the AOS, but this was not a problem with the IDL server or license.

Check of the processes:

Configuration directory: /home/aoacct/AO/current/conf/adsec/current/processConf

msgdrtdb running
AOARB running and connected to Msgd
adsecarb NOT running
idlctrl running and connected to Msgd
IDL process Zombie process
housekeeper running and connected to Msgd
fastdiagn running but NOT connected to Msgd
masterdiagnostic running and connected to Msgd
mirrorctrl running and connected to Msgd
adamhousekeeper running and connected to Msgd
varsmonitor running and connected to Msgd
anemometermon running and connected to Msgd

With these three processes down, the system went into a failure state and the shell could not be set.
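The process check above can be scanned automatically for anything that is not healthy. This is a small Python sketch, assuming the listing keeps the "name status" wording shown in this entry; the helper name and the regular expression are illustrative only.

```python
import re

# Name is everything before the first status keyword seen in the listing.
STATUS = re.compile(r"^(?P<name>.+?)\s+(?P<state>(?:NOT running|running|Zombie)\b.*)$")

def unhealthy_processes(listing):
    """Return (name, state) pairs for processes that are down, disconnected or zombie."""
    bad = []
    for line in listing.splitlines():
        m = STATUS.match(line.strip())
        if m and ("NOT" in m["state"] or "Zombie" in m["state"]):
            bad.append((m["name"], m["state"]))
    return bad

listing = """\
msgdrtdb running
AOARB running and connected to Msgd
adsecarb NOT running
idlctrl running and connected to Msgd
IDL process Zombie process
housekeeper running and connected to Msgd
fastdiagn running but NOT connected to Msgd
"""
for name, state in unhealthy_processes(listing):
    print(f"{name}: {state}")
```

On this listing the sketch flags adsecarb, the IDL process and fastdiagn, the same three processes identified in the entry.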

* Check of the message daemon. The message daemon looked OK at all times.

tcp 0 0 0.0.0.0:9752 0.0.0.0:* LISTEN 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50702 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50703 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50700 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50701 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50709 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50718 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50719 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50716 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50717 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50714 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50715 ESTABLISHED 8300/msgdrtdb
tcp 0 0 127.0.0.1:9752 127.0.0.1:50713 ESTABLISHED 8300/msgdrtdb
tcp 0 0 192.168.13.12:9752 192.168.58.17:60122 ESTABLISHED 8300/msgdrtdb

The IDLCTRL reported the failure:

MAIN > Error in IDLRPCExecuteStr () [idl_ctrl.cpp:608], cmd = print,test_demo_mode()

I talked with Stephen Hooper to check whether the IDL server and the IDLRPC process were running. He checked the IDL server and nothing was wrong, and an IDL session could be started from the adsec terminal. The IDLRPC process was launched every time the processes were running.

I talked to Doug to find out whether he had any experience with an error like this. We checked a few things, but we didn't find a solution for the problem.

The sequence of errors was repeated many times. In the end, adding a second actuator, #36, to the list of bad actuators solved the issue and the mirror could be set.

3367|2014-02-19 02:09:59.379142| FUNCTWARNING > FunctWarning CHDISTRMS-0036 3.38169e-07

Actuator #36 was detected as having a jump in distance RMS.

This is the first event like this that I can remember.

16 Feb 2014, 08:00 am(UT), 4 hours, SX Adsec, Juan Carlos

I found the AdsecSX in an offline state with the AdsecArb and fastdiagn processes down; the IDL process was also in Zombie state. A telescope work request was submitted to work on the AdsecSX. The telescope was at zenith. The processes were stopped and started, and the shell was set for one hour without problems. The shell was forced to rip and the recover failure process finished successfully. A set of Rest/Set cycles didn't show any problem. The flat for MODS was loaded several times after the shell was reflattened, and the unit applied #439 modes. Open issue track #5064.

  • Adsecarbitrator process stopped communicating with the MsgD:
adsecarb.L |INF| 3330|2014-02-16 13:05:07.488964| MAIN > Communication with MsgD stopped
adsecarb.L |ERR| 3331|2014-02-16 13:05:07.489152| IDL-INTERFACE > Error sending IDL message: -3606 (No connection with MsgD) [IdlCtrlInterface.cpp:87]
adsecarb.L |ERR| 3332|2014-02-16 13:05:07.489549| IDL > [AOException] Failed to send command to IDL (code -3606) No connection with MsgD File IdlCtrlInterface.cpp line 88
  • Fastdiagnostic stopped:
fastdiagn.L |ERR| 1130857|2014-02-16 13:05:46.891649| MASTDIAGN-INTF > Error in thSendMsg: -3606 (No connection with MsgD)

14 Feb 2014, 1:10 am, 1:00 hour, SX Adsec, Juan Carlos

Another call from the mountain about instability of the loops and many RIPs. Checking the log showed that actuator #230 was triggering some rips in current. At the time I connected, the secondary went into a failure state with 100% skip frames and the recover failure procedure running. The recover failure didn't work, so adsc_stop/start was applied. Actuator #230 was disabled in position, and the Adsec recovered and went back to operations. I followed the observations for one hour and the system seemed to be working OK. We had another large tilt failure, but this time it was due to the observers sending a wrong preset by mistake.

14 Feb 2014, 7:10 pm, 4:00 hour, SX Adsec, Juan Carlos

Vanessa called by Skype to check on an earlier error when the shell didn't set properly. A series of jumping actuators (#230, #16) prevented the shell from setting. During the call I saw an event in which the mirror went into a large tilt that didn't offload, and the shell went into a failure mode with 100% skip frames. The AOS showed that the system was in closed loop; it seems that the Adsec didn't recover from the failure. The adsecarbitrator process was stopped and hung, and I couldn't restart it from the processes GUI. To get out of this state it was necessary to adsc_stop/start. At about 8:00 we got another similar event, with the secondary at 100% skipped frames, the shell ripped and the Adsec status in failure. This time Geno called me, and we did adsc_stop/start. The shell recovered OK and went back to operations.

At about 11 pm the Adsec suffered the same event, skipping frames in a failure state, and again it was necessary to adsc_stop/start to get out of the failure state. LBTI is using this approach to exit the failure state and recover the system.

When I disconnected from the mountain the loop was closed without any problem.

Issue track #5056 opened to follow up the failure+100%skipping frame events

13 Feb 2014, 2:00 am, 2:00 hour, SX Adsec, Juan Carlos

The mountain called me about a problem with the left side, probably due to a full disk. The local partition was 100% full. We started cleaning up all the duplicated log files and were able to get about 400G free, which is about 76%. I kept following the restart of the observations. After a restart of the processes (adsec_stop), an IDL error came out:

idlctrl.L |ERR| 1122|2014-02-13 09:57:39.700948| MAIN > Error in IDLRPCExecuteStr () [idl_ctrl.cpp:608], cmd = print,test_demo_mode()
idlctrl.L |WAR| 1123|2014-02-13 09:57:39.700977| MAIN > IDL in demo mode

The IDL process had to be restarted after that.

12 Feb 2014, 2:22 am, 1 hour, SX_DX Adsec, Juan Carlos

The mountain called about problems with the secondaries and many RIPs. I connected and everything seemed to be going right. I have been talking with Vanessa and checking the logs. We have some events that require further investigation tomorrow. While I was connected, neither secondary had any failure. Seeing is improving and getting stable.

12 Feb 2014, 12:08 am, 12 minutes, DX Adsec, Juan Carlos

DX stopped offloading. During the LBTI observing run it was observed that the offloading on DX had stopped. Vanessa called by Skype to talk about the issue. LBTI stopped and started the process LO Offload to TCS from the AdsecControl GUI, but it didn't work. I suggested that the problem was at the AOS level. The right AOS was restarted and the offloading started to work.

08 Feb 2014, 12:58 pm, 2 hours, DX Adsec, Doug

Dust contamination reported of ~60 um at 4:00. Bump later went away and LBTI observing continued. See IT#5047

23 Dec 2013, 9:42 pm, 20 minutes, SX Adsec, Doug

Removed Actuator 116 for CHDISTAVERAGE. LBTI (Vanessa) said 116 had jumped 6 or so times in the last 20 minutes. Logs confirm.

23 Oct 2013, 12:00 am, 1 hour, DX Adsec, Doug

Removed Actuator 358 for CHCURRAVERAGE.

21 Oct 2013, 09:00(UT) am,12 hours + SX Adsec, Juan Carlos

Roberto Biasi answered the email, and after checking the history of this sensor I saw that it has intermittent failures. The solution was to remove this single sensor from the housekeeper configuration file. After talking with Marco Xompero I disabled DSPtemperature sensor-0060, which corresponds to the DSP in crate #4, board #4. The new configuration file was installed (make install-conf); the adsecsx no longer showed the sensor problem and could be set without problems. The adsecsx was under continuous testing all day to make sure that we didn't see any other effects in the mirror. No pie shape showed up. The system was ready to go on sky. I followed the night with the LBTI team to check that the Adsec was working properly, and we didn't have any issues during the whole AO night.

During the day we had a power outage, and when power was restored to the commercial lines we had a failure in the telescope chiller that forced us to power off the right secondary. John Little called me to report the problem with the telescope chiller; the secondary was powered off and then back on once John L. said that the chiller was running fine. The adsecdx temperature and the cooling flow were monitored all day. The system was running fine, with no problems.

At about 9:44 (UT) the mountain called about a rip on the DX side. The dumps generated were checked and showed multiple jumps of actuator #147. Act #146 was removed in position. Back on sky at 9:57.

20 Oct 2013, 22:20(UT) pm,3 hours minutes SX Adsec, Juan Carlos

SX Oct.20 Panic Mode
A call from the mountain saying that the adsecsx could not be set. The LBTI team reported seeing a strange shape in the mirror. The shape is related to the pie shape of crate #5. After an adsc_stop/start the pie shape didn't show up again. What prevented continued operation of the adsecsx was an error in the reading of one of the DSPtemperature sensors, which triggered the panic mode for overcurrent. The system was left powered off, and an email was sent to Roberto Biasi and Arcetri to find out whether the reading from the temperature sensor is real or we have a failure in the electronics.

History of the DSPDrivertemp#60

996|2013-09-24 07:47:20.211321| DSPDRIVERTEMP-0060 = -64
2351|2013-09-24 08:01:56.98740 DSPDRIVERTEMP-0060 = 96.125
303144|2013-09-28 12:22:18.259712| DSPDRIVERTEMP-0060 = 192
3388|2013-10-05 06:21:17.244144| DSPDRIVERTEMP-0060 = 85
62791|2013-10-05 12:38:44.530357| DSPDRIVERTEMP-0060 = 135.5
200665|2013-10-09 08:28:10.349005| DSPDRIVERTEMP-0060 70.25
200714|2013-10-09 08:28:16.910266| DSPDRIVERTEMP-0060 70.25
376952|2013-10-09 15:26:56.162205| DSPDRIVERTEMP-0060 = 208.062
1091299|2013-10-12 02:18:42.989256| DSPDRIVERTEMP-0060 = -237
1838859|2013-10-15 06:29:47.344379DSPDRIVERTEMP-0060 = 132.5
11958|2013-10-15 18:24:02.347625| DSPDRIVERTEMP-0060 = 137.5
361067|2013-10-16 13:24:34.472009| DSPDRIVERTEMP-0060 = 131.5
592144|2013-10-17 14:30:21.261376| DSPDRIVERTEMP-0060 73.5
592650|2013-10-17 14:31:35.229066| DSPDRIVERTEMP-0060 = 147.5
9278|2013-10-17 15:43:26.620377| DSPDRIVERTEMP-0060 = 117
2313|2013-10-17 16:28:52.991566| DSPDRIVERTEMP-0060 = 113
2650|2013-10-20 05:45:36.715820| DSPDRIVERTEMP-0060 = 100.5
1460|2013-10-20 07:28:50.150463| DSPDRIVERTEMP-0060 = 184
5934|2013-10-20 08:27:31.697512| DSPDRIVERTEMP-0060 = 227
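The sensor history above is easier to judge once it is reduced to timestamp and value pairs. The following Python sketch extracts the readings from pasted text. The regular expression is an assumption written to tolerate the irregularities visible in this paste (a missing "|" after the timestamp, a missing "=" before the value), and sensor_history is a hypothetical helper name.

```python
import re

# Tolerates a missing "|" after the timestamp and a missing "=" before the value.
READING = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\|?\s*"
    r"DSPDRIVERTEMP-0060\s*=?\s*(-?\d+(?:\.\d+)?)"
)

def sensor_history(text):
    """Return (timestamp string, value) pairs in order of appearance."""
    return [(stamp, float(value)) for stamp, value in READING.findall(text)]

# Four of the readings above, pasted as one run of text.
sample = (
    "996|2013-09-24 07:47:20.211321| DSPDRIVERTEMP-0060 = -64 "
    "2351|2013-09-24 08:01:56.98740 DSPDRIVERTEMP-0060 = 96.125 "
    "200665|2013-10-09 08:28:10.349005| DSPDRIVERTEMP-0060 70.25 "
    "1838859|2013-10-15 06:29:47.344379DSPDRIVERTEMP-0060 = 132.5"
)
history = sensor_history(sample)
values = [v for _, v in history]
print(len(history), min(values), max(values))
```

Printing the minimum and maximum over the full history gives a quick view of how widely the readings swing.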

15 Oct 2013, 22:20(UT) pm,30 minutes SX Adsec, Juan Carlos

SX Oct. 15 22:20:47 Panic Mode
IDL process Zombie.
SX found in Panic Mode
To get a clean startup, the adsc systems were stopped and restarted (adsc_stop/start). The unit was powered on and the program loaded. During the load program process the unit went into Failure and the Recover Failure process started. Dump saved at 22:55:24. Actuator #32 jumped in CHDISTAVERAGE value. The recover failure process finished successfully. Set Flat: OK. With the shell flat we had a failure due to a jump of three actuators, #149, #150 and #151. Dump saved at 23:03:18. After the failure recovery a new Set Flat was done: OK. The Flat process finished at 23:09:38. The shell was then rest/set.

Shell rest@23:13:20 Shell set@23:24:0 Shell rest@23:27:09

Handover to the telescope at 4:30PM Local time

03 Sept 2013, 10:34 pm, 5 min, SX Adsec, Doug

SX Sept. 4 05:34:47 CHDISTAVERAGE in Alarm over indexes: 032
Recover Failure was successful.
"Set" from AOSGUI left was successful.
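Log timestamps are in UT, while the entry headers use local time. The pair in the entry above (05:34:47 UT on Sept. 4, logged as 10:34 pm on 03 Sept) corresponds to an offset of UT-7 hours. Below is a small Python sketch of the conversion, assuming a fixed UT-7 offset with no daylight saving; the helper name is illustrative.

```python
from datetime import datetime, timedelta, timezone

LOCAL = timezone(timedelta(hours=-7), "MST")  # assumption: fixed UT-7, no DST

def ut_to_local(stamp):
    """Convert a 'YYYY-MM-DD HH:MM:SS' UT string to a local datetime."""
    ut = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return ut.astimezone(LOCAL)

local = ut_to_local("2013-09-04 05:34:47")
print(local.strftime("%d %b %Y, %I:%M %p"))
```

The same helper can be used to line up a log excerpt with the call time written in an entry header.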

01 Sept 2013, 9:45 pm, 1 hour, SX Adsec, Doug

Issue #4779, AO Event: AOSR could not connect to AdSecDX message daemon

01 Sept 2013, 9:44 pm, 5 min, SX Adsec, Doug

The DX AdSec automatically performed a recover fail with the error that CHDISTAVERAGE Alarm over indexes: 595. I have not changed anything. We will see if this happens again and regularly. If so, we should maybe remove this actuator.

27 June 2013, 3:28 am, 3 hour, SX Adsec, Juan Carlos

The TO called me about a problem with the ASM_SX, which was in a PANIC state. Checking the failures that had happened, I saw some problems with IDL which triggered some RIPs of the shell. Another issue I found was that the switchcabinet was giving high temperature values. Crop of the housekeeper log:

housekeeper.L |WAR| 19386|2013-06-27 11:04:03.620238| FUNCTWARNING > FunctWarning SWITCHSTRATIXTEMP-0000 50.375

This is the limit for operating the cabinet, and after this warning we started to get errors reading the BCUs.

To work around the problem, we went to the cabinet and left the door open for about 10 minutes to ventilate it. 15 minutes later the temperature started to rise again, hitting the limit, and the temperature-related failures and the BCU communication failures started. After one minute the observation was interrupted.

We went to the cabinet and tied the door open, leaving it that way for the rest of the night. With this solution the cabinet temperature dropped to 38 Celsius and we could observe during twilight. LBTI used that extra observing time to check some scripting options.

24 May 2013, 1:28 am, 2 hour, SX Adsec, Juan Carlos

Call from the mountain reporting a problem recovering the ASM_SX, which was in a failure state. The system was in closed loop and the shell ripped due to a jump of act #216. This jump put the shell in a recover failure, and right after the recover failure procedure started the IDL went down. With the IDL down the system is unresponsive to any command and remains in the failure state. The system was recovered by opening a new adseceng GUI, opening System Processes, and stopping and starting the IDL process and the adsecarb process.

After putting the system back in operations I monitored it and talked with Vanessa (LBTI observer) about how to deal with this scenario.

21 May 2013, 1:50 am, 5 hour, SX Adsec, Juan Carlos

ASM_SX back in AO operations with the LBTI, I've been checking the behaviour of the ASM and helping the LBTI during the first check of the ASM in AO mode.

It was necessary to remove the act #374 from the position list to make the system more stable. The AO corrections were good and stable with 400 modes corrected.

12 May 2013, 1:40 am, 1 hour, SX Active Optics, Doug

MODS elongated images at high (~80) elevation. No elongation at lower (~60) EL. Turned off Act (mods 8), guiding (mods9), step focus (mods 10), but all show elongation. May have been some improvement when step focus turned off. Science field down to 75 El now and elongation less.

Collimated on-axis. Preset off-axis, but did not send zernikes. wfsc images 184-194. Calculate 900 nm of astigmatism. Is off-axis corrections incorrect? Earlier had changed off-axis guide stars. x=13, y=-131 starting at wfsc image 148. Before that we were off-axis at x=-58, x=-134.

12 May 2013, 11:30 am, 30 minutes, SX Active Optics, Doug

Install new IE/CA and X,Y,Z global offsets in PT and COL model files

11 May 2013, 1:03 am, 1.25 hours, SX Active Optics, Doug

Trying to determine new global offsets for the rigid secondary on SX. Found a problem with the UMAC code and scaling of motion. Stopped, and John made IT4641.

6 May 2013, 1:30 pm, 7 hours, SX Adaptive Secondary, Doug

Either dust contamination in the gap or a bad DSP/Ribbon Cable/Distribution board. See IT#4633

30 April 2013, 12:24 am, 5 minutes, Active Optics, Doug

Change MOD_L.cfg back to the original wfsc_hotspot_x value for next target

30 April 2013, 10:15 pm, 1.25 hours, Active Optics, Doug

Geno called: the MODS Active Optics diverged. Seeing was ~2.5, so the pinhole in front of the wfsc was filled. Need to determine when to send Act Opt corrections and when not to (residuals?). Geno added a note to IT#4358.

30 April 2013, 7:27 pm, 10 minutes, SX Adaptive Secondary, Doug

Geno had done a closed dome preset with shell set. Successful. Then Rested, went to horizon and opened dome. Back to Zenith and AdSec was in Panic

On Diagnostic info panel of AdSec Control GUI:
  • 02:15:59 AO System was in Panic: CHDISTAVERAGE 309

  1. Stopped AdSec Arbitrator
  2. Loaded Program from AdSec Control Gui
  3. Set Shell from AOS GUI.

All worked fine. Nothing changed on the system. 309 must have jumped just once.

16 April 2013, 10:00 am, 6 hours, SX Adaptive Secondary, Juan Carlos

Log check for LBTI to get a better understanding of the failure rates during the last two LBTI nights. Two actuators were also put in the list of act_wo_pos (act #371 and #243). After removing the actuators I monitored the behaviour of the shell.

14 April 2013, 11:20 am, 2.5 hours, DX Adaptive Secondary, Doug

Frost on DX AdSec shell, Coolant temperature Too Cold IT#4593

8 April 2013, 3:30 pm, 15 minutes, Active Optics, Doug

MODS not converging. Cleared Active Optics on Primary. Collimation fine.

3 April 2013, 1:20 am, 5 minutes, SX AdSec, Doug

Check Adsec and check with David to see if they will switch to Gregorian Observing. They will not.

3 April 2013, 9:00 pm, 2 hours, SX AdSec, Doug

SX AdSec will not set for 1.5 hours. Finally Set, but no confidence that it will remain Ready. Updated IT 4582.

3 April 2013, 5:15 pm, 40 minutes, SX Adsec, Doug

SX AdSec in Panic mode. Can not load program on BCU crate # 1. Stop when telescope moved to horizon

30 Mar. 2013, 1:23 am, 1.25 hours, SX Adsec, Doug

Assist Julian, who is on call, to get the SX AdSec up and running. Many failures with crate #1 BCU. Finally communication is fine and AdSec in Ready state. Description in IT#4582

26 Mar. 2013, 6:00 pm, 20 minutes, DX Adsec, Juan Carlos

3 Mar. 2013, 6:00 pm, 20 minutes, DX Adsec, Doug

Geno called and said the DX AdSec was in Panic. The IDL process was still working properly so I tried fsm_load_program(/auto), but it failed. Tried power off, power on, fsm_load_program, but failed. Tried power off, adsc_stop, adsc_start, power on, fsm_load_program, but failed. Found in logs:
adsecarb.R.00001362322084.log:adsecarb.R         |ERR|       312|2013-03-03 14:38:25.346152|              IDL > Command returns error : IDL Dust detected in the mirror
adsecarb.R.00001362359611.log:adsecarb.R         |ERR|        98|2013-03-04 01:10:04.806898|              IDL > Command returns error : IDL Dust detected in the mirror
adsecarb.R.00001362359611.log:adsecarb.R         |ERR|       108|2013-03-04 01:11:32.803698|              IDL > Command returns error : IDL Dust detected in the mirror
adsecarb.R         |ERR|       105|2013-03-04 01:17:27.990047|              IDL > Command returns error : IDL Dust detected in the mirror
adsecarb.R         |ERR|       115|2013-03-04 01:18:54.586603|              IDL > Command returns error : IDL Dust detected in the mirror
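
A minimal sketch of how log lines like the ones above could be split into fields and searched for a repeated error. The field names (process, level, counter, timestamp, message) are assumptions based only on the layout of these lines, and `parse_line` and `count_matching` are hypothetical helpers, not part of the AdSec software.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class LogEntry(NamedTuple):
    process: str         # e.g. "adsecarb.R"
    level: str           # e.g. "ERR"
    counter: int         # third field; its exact meaning is an assumption
    timestamp: datetime
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one pipe-separated log line; return None if it does not fit."""
    parts = line.split("|", 4)
    if len(parts) != 5:
        return None
    source, level, counter, stamp, message = (p.strip() for p in parts)
    # grep output prefixes "<logfile>:"; keep only the process name after it.
    process = source.rsplit(":", 1)[-1]
    try:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        return LogEntry(process, level, int(counter), when, message)
    except ValueError:
        return None


def count_matching(lines: Iterable[str], needle: str, level: str = "ERR") -> int:
    """Count parsed entries of the given level whose message contains needle."""
    count = 0
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level == level and needle in entry.message:
            count += 1
    return count
```

For example, `count_matching(open(path), "Dust detected")` would report how many times the dust error appears in one log file, where `path` is whichever log file is being inspected.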

Did not do anything more and created IT#4538, assigned to Armando.

Monday morning, as Armando suggested, added actuator #23 to the NO position list, restarted the AdSec and Set with no problems. IT #4538 closed.

2 Mar. 2013, 9:00 pm, 20 minutes, DX Adsec, Doug

Geno reported in the previous nightly log, and called me at 5:45 this evening, that the DX AdSec was reporting IDL License Not Available. I found that IDL was not responsive, so I restarted the IDL process, executed fsm_power_off(), adsc_stop, adsc_start, power on from the AOS GUI, and Set Shell. This first Set Shell failed with a communication error to one of the BCUs. After a Recover Failure, the next Set was successful.

27 Feb. 2013, 10:00 pm, 20 minutes, DX Adsec, Doug

Geno reported that the IDL License Missing error occurred. I checked at an IDL terminal and sure enough, the IDL process was not responsive. I stopped and started the IDL process from the process GUI, then IDL> print, fsm_load_program(/auto) and then set the shell.

27 Feb. 2013, 8:03 pm, 15 minutes, SX Adsec, Doug

Geno could not set the SX shell from the AOS GUI. I checked the engineering GUI and everything looked fine. I set the shell from the engineering GUI and the AOS GUI reflected the correct information

24 Feb. 2013, 7:30 pm, 5 Hours, ASMs, Juan Carlos

David called about 7:30pm due to a problem with the SX ASM. The ASM was in Panic state.

The mirror was recovered by cleaning up all the processes (adsc_stop/start). SX behaved unstably until LBTI corrected the way it closes the loop. With the improved LBTI AO loop calibration, the number of rips was reduced.

Both ASMs ripped mainly because of excessive forces during closed loop. The seeing was bad and variable. The ASMs recovered from the rips without problems, but the SX side was suffering more rips and its ASM began to have difficulty recovering. With the many rips on SX, the ASM dumped many files, many of them duplicated, which filled the disk; at that point the internal processes slowed down, making the system unstable.

The duplicated files were removed and, coinciding with a better alignment of the LBTI WFS on the SX side, the ASM started to work stably, with fewer rips and easy recovery from those that occurred.
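
A minimal sketch of one way to locate duplicated dump files before removing them. It assumes "duplicated" means byte-identical content, which the log does not confirm, and `find_duplicates` is a hypothetical helper, not a tool from the ASM software.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large dumps are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root) -> List[List[Path]]:
    """Group files under root that have identical content."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only groups with more than one file are duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists files with the same content; keeping the first file of each group and reviewing the rest before deletion is the conservative choice.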

I followed the observations until the telescope was closed due to humidity at 12:30 am.

23 Feb. 2013, 4:10 am, 5 Hours, ASMs, Juan Carlos

David called about 5:00 am due to a problem with the secondary. LBTI was the instrument in use on the telescope.

The mirror was in Panic, and an adsc_stop/_start was needed to restore ASM operations. The ASM recovered well, but during LBTI operations the shell ripped in most cases. For each rip the ASM software dumped many duplicated files, which eventually filled the disk and caused all the internal ASM processes to get stuck. The mirror was recovered, but whenever LBTI closed the loop the shell ripped and was hard to recover. Actuator #373 was put back in the control loop to see if there was something we don't understand (we have little information about the way LBTI closes the loop), but the shell was still ripping after the ASM closed the loop. Since actuator #373 had no effect on those rips, it was again included in the list of bad actuators in position and the software was restarted. At this point it was 6:30 am and the telescope was closed. When the telescope was back at zenith (dome closed), we restarted all the processes and the shell could be rested without further problems. Several rest and set tests were done and the ASM responded very well.

About 6:00 I sent Runa an email asking for information about an error code that I had never seen before: SWITCHSAFESKIPCOUNTER-0000 -1.45519e-11

Runa got back to me and we looked at the logs from this night and the configuration files; everything looks fine apart from all the duplicated files. We discussed this issue; he doesn't understand why the software is doing this and he will talk with Marco.

The ASM was left at rest and operational.

13 Feb. 2013, 10:10 pm, 40 minutes, Active Optics, Doug

Steve called because the guider image had small wings on it part of the time. At times it looked quite round. Steve said that it even occasionally looked comatic. Elevation was 68 degrees.

I looked at the WFSC image and it looked well collimated, although the pupil should have maybe been shifted up on subap.

They are taking spectra, so a small wing would only put maybe 1% of the light out of the slit. Thus, this problem is not affecting science observing, but does need to be understood.

At the next target, at 36 degrees elevation, the stars in the center of the MODS image looked very round, according to Olga.

Is the problem with the guider itself (dust on an optic such as the filter wheel), while the IQ in the focal plane is fine?

12 Feb. 2013, 10:10 pm, 15 minutes, Active Optics, Doug

Steve called me on Skype asking what the new FWHM Average and Sigma from the WFSC image, installed on the GCS GUI (IT# 4490), measure and how he should interpret them. He also mentioned that the Average was in the range of 1.0-1.4, but the Sigma was ranging from 0.1 to 0.6, without much change in the WFSC image.

I informed him that a large Average FWHM indicates that the seeing and/or primary mirror temperature gradient are sufficiently bad that the Active Optics was probably not improving the telescope collimation. Currently the GCS GUI will change the FWHM Average display to Yellow when the value is > 2.0". This initial value will possibly change.
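
A minimal sketch of the warning rule described above. It assumes the Average and Sigma are the mean and standard deviation of per-spot FWHM values in arcseconds, which this log does not confirm; `summarize_fwhm` is a hypothetical helper, not the GCS code. Only the 2.0" threshold comes from the entry.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence

# Threshold quoted in this entry; the log notes it may change.
FWHM_WARNING_ARCSEC = 2.0


def summarize_fwhm(spot_fwhm_arcsec: Sequence[float],
                   warning_threshold: float = FWHM_WARNING_ARCSEC) -> Dict[str, object]:
    """Return average, sigma and a warning flag for a set of spot FWHM values."""
    if not spot_fwhm_arcsec:
        raise ValueError("at least one FWHM measurement is required")
    average = mean(spot_fwhm_arcsec)
    sigma = pstdev(spot_fwhm_arcsec)  # population standard deviation (assumption)
    # The display turns Yellow only when the average is strictly above the threshold.
    return {"average": average, "sigma": sigma, "warn": average > warning_threshold}
```

With values in the 1.0-1.4 range Steve reported, the `warn` flag stays false; it becomes true only once the average exceeds 2.0".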

For the MODS WFSC image from this night, the FWHM Ave was reasonable and the Act Opt was collimating the telescope correctly.

Action Items:
  1. ALERT! Doug: Check the value of 2.0" as the correct value for warning that Act Opt may not have enough information to improve collimation
  2. ALERT! Doug: Check to see why the FWHM Sigma is changing so much with no obvious change in the WFSC image
  3. ALERT! Doug: Send a message to telecopework with a description of the new FWHM Ave and Sig displayed on the GCS GUI and an explanation of what the range of these values mean.

10 Feb. 2013, 8:04 pm, 30 minutes, Active Optics, Doug

Geno called me on the phone (switched to Skype) saying that the WFSC images looked strange and that the telescope was not collimating.

After looking at a WFSC image it was clear that either the natural plus dome seeing was really bad (>2.0") or, more likely, the temperature gradient across the primary was high and causing large amounts of aberration in the wavefront. Geno reported the FWHM of the guider was >> 2.0". On the WFSC image, each subap was completely filled with the Shack-Hartmann spot. There were only a few (~15) SH spots for which Source Extractor could find a centroid (about the same as my eye), so there was not enough information in the WFSC image to reconstruct the wavefront, and any Active Optics corrections would be meaningless.

Geno had already un-clicked the "Send zernikes to PSF" button. I had him clear Active Optics on both the Primary and Secondary. Neither the guider nor WFSC images changed.

My conclusion was that any Active Optics correction determined in the current conditions would be useless and the "Send zernikes to PSF" button should remain un-clicked until conditions (i.e. the temperature gradient across the primary) improved and more SH spots could be seen/found on the WFSC image. The astronomers could collect data (guiding was keeping the telescope mostly pointed at the target), but FWHM > 2.0" is the best they could expect in the current conditions.

I think not long after this the Chamber was closed due to high humidity.

This high temperature gradient spawned IT# 4490.

-- DougMiller - 13 Feb 2013

  • position difference between two frame before pie:
    diff_oneframe_before_pie_20140310.png
Topic attachments
Attachment | Size | Date | Who | Comment
20140310_Pie_DX.png | 19 K | 11 Mar 2014 - 02:22 | JuanCarlosGuerra | Pie shape
current_one_frame_before_Pie_20140310.png | 33 K | 11 Mar 2014 - 04:22 | JuanCarlosGuerra | Current one frame before
diff_oneframe_before_pie_20140310.png | 34 K | 11 Mar 2014 - 02:24 | JuanCarlosGuerra | position difference between two frame before pie
pie_shape_DX20140310.png | 28 K | 11 Mar 2014 - 04:32 | JuanCarlosGuerra | Position of the actuators
pie_shape_DX_20140323.png | 92 K | 23 Mar 2014 - 06:18 | JuanCarlosGuerra | Pie shape
Topic revision: r58 - 28 Nov 2014, JuanCarlosGuerra