Testing with the new BBB July/August 2016


Initial Configuration Setup

  • before coming up, we set the static network configuration for the mountain on the BBB, using the values Stephen suggested (static IP 192.168.59.150); when we booted it on the mountain, the configuration was correct
  • wanted to mount the CMU /images disk on the BBB; to do so, we had to run the NFS service on the CMU, add it to startup so it always starts, and add an /etc/exports file on the CMU
  • added ntp configuration to the BBB; it already had the service, so we just had to add our servers to ntp.conf (add this to the installation doc)
  • /media/images on the BBB is the SD card (16 GB)
  • at the end of all the tuning, captured all the source and the initialization files (from /lbc) in SVN
  • we had two BBBs: the initial prototype, with wire-wrapped connections, and the first real board, which is the one we used the most.
    On 20160803, we updated the software and init files on the prototype to match the real BBB that we had tested with. Interestingly, the Ubuntu versions are different (12.04.4 and 12.04.5), and the prototype does not power down like the other one does: every poweroff/shutdown/halt command reboots it!

Testing

  • mostly used the test program in /home/ubuntu/ccdctrl
  • on the last day on the mountain, and in follow-up work in Tucson, we tested with the testcamera or testtrackers programs using the ccdctrl client on the BBB.

Software Issues

Priority | Description | Responsibility
high | Not controlling the shutter. We tried from the CTR-S100 program as well. Update, 20161007: FP has given Alessandro a procedure to check out the shutter control wiring on the SPC card. | SkyTech
high | Cannot take an image longer than 30 s. The CTR code does not return from the HB method that ccdctrl calls until after it finishes integration; by then it has turned off because it thinks it is not communicating with the BBB (see the "Long image, no HB error" screenshot below). Update 20161017: SkyTech has updated the software and KS has verified the fix. | SkyTech, mutex problem
high | PRU timeout (prussdrv_pru_wait_event_timeout times out) in CCDCtrlPreset when using windowing (see the "PRU timeout in preset" screenshot below). AdP suggested this was a timeout on the number of pixels expected. KS tested resetting the expected pixels when windowing is requested, which worked. Is this a new API feature? If so, we need to make sure it's set correctly for all cases. AdP says: "This have been one of the latest problems with SkyTech. There is a problem with the new version that did not exist in the previous. Now the controller does not know how many pixels to wait for and the lMemAllocatedSize, instead of representing the full maximum size as in the Windows version, should represent the actual expected size to download. Maybe I forgot to change that somewhere..." | INAF
high | Once we started using the high-level testing, we saw RPC timeouts between the CMU and the BBB, on presets, stop, and query. Timeouts in use: preset is 40 s; open/telemetrycheck are 20 s; close/init are long, 120 s; resume/query are only 2 s; shutter/upload/stop are only 10 s; expose is set to exposure time + 24 + 6 (if flushing) + 1; save is set to 40+10 if not MT, otherwise 10 s. | LBTO
med | Saw blank FITS filenames at one point (20160729, 19:36, 19:43, etc.); this occurs when we have a telemetry error during exposure. | INAF/LBTO
high | Saw seg faults of the ccdctrl client on the BBB during the high-level error processing, when we had the RPC timeouts and it tried to re-init (20160803 18:28, 19:07). Also saw this when I had a query timeout and it did a destroy of the status RPC handle. | INAF/LBTO
 | We may need to make sure we can get out of the HB error more easily. Since it is a new feature, maybe it should be easier to recover from. We may run into it a lot due to the resource availability of the BBB. | 
med | The recover step in the camera code wants to do a stop/initialize, but that hangs in ccdctrl because stop doesn't release the enabled_control mutex. KS didn't want to just add the release without understanding why it worked without the release in the Windows version. AdP says: "Do not be reluctant too much. On Windows mutexes could be multiple locked if this happens in the same thread. Posix threads are not the same some maybe release is required." KS changed stop to release the mutex and made resume lock the mutex again, to enable. | INAF
 | There's a TOO FEW or TOO MANY CTR error now; we can change the code to not calculate and check for it. If it times out like the windowing case for too few, how do you ever get TOO FEW as an error? | 
 | Saw a PRU timeout and then a GetTemperature failure on a preset (20160810 20:36). Why? Can I reproduce this one? | 
 | Add a close of the log file to the lbckill script so that the BBB client doesn't crash when we kill the LBC software. | LBTO
 | Test mounted drive from archive to BBB, newdata. | LBTO
n/a | Single FITS files of the sci camera corrupted (in contrast to the MEF)? FP said they wouldn't open in ds9. We checked later and it was OK; it must have been related to some other weirdness. | 
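
The RPC timeout values listed in the issue table above could be kept in one place so they are easy to review and tune. The following is only a sketch: the constant and function names are hypothetical and are not taken from the LBC source, and the "MT" condition is passed through as a plain flag without assuming anything more about what it means.

```c
#include <assert.h>

/* RPC timeouts in seconds, copied from the notes above.
 * All names here are illustrative, not the names used in the LBC code. */
enum {
    TIMEOUT_PRESET_S  = 40,
    TIMEOUT_OPEN_S    = 20,   /* open, telemetrycheck */
    TIMEOUT_CLOSE_S   = 120,  /* close, init */
    TIMEOUT_QUERY_S   = 2,    /* resume, query */
    TIMEOUT_SHUTTER_S = 10    /* shutter, upload, stop */
};

/* expose: exposure time + 24 + 6 (only if flushing) + 1 */
static int expose_timeout_s(int exposure_s, int flushing)
{
    assert(exposure_s >= 0);
    return exposure_s + 24 + (flushing ? 6 : 0) + 1;
}

/* save: 40 + 10 if not MT, otherwise 10 */
static int save_timeout_s(int mt)
{
    return mt ? 10 : 40 + 10;
}
```

With these rules, a 60 s exposure with flushing gets a 91 s expose timeout, while resume/query stay at 2 s, which may be worth revisiting given the RPC timeouts seen on query.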
 


Long image, no HB error:
60secexposurenohb.PNG


PRU timeout in preset, using sub-window:
PRUTimeoutWithWindowing.PNG
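
AdP's explanation of this timeout is that the controller has to be told the actual size it should expect to download, rather than the full-frame maximum. A minimal sketch of that calculation follows. The struct, the function name, and the 2-bytes-per-pixel figure are assumptions made for illustration; they are not the real ccdctrl or CTR definitions, and binning and overscan are ignored.

```c
#include <stddef.h>

/* Hypothetical window description; not the real ccdctrl structure. */
struct window {
    unsigned cols;
    unsigned rows;
};

#define BYTES_PER_PIXEL 2u  /* assumption: 16-bit pixels */

/* Size of the download expected for the requested window.
 * Returns 0 if the window is empty or larger than the full frame,
 * so the caller can reject the request instead of waiting for
 * pixels that will never arrive. */
static size_t expected_download_bytes(struct window full, struct window win)
{
    if (win.cols == 0 || win.rows == 0)
        return 0;
    if (win.cols > full.cols || win.rows > full.rows)
        return 0;
    return (size_t)win.cols * win.rows * BYTES_PER_PIXEL;
}
```

The point of the sketch is that the value is recomputed for every request, windowed or not, which is the "set correctly for all cases" concern raised in the issue table.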


Checked in the BBB version to https://svn.lbto.org/repos/lbccontrol/trunk/src/bbb/ccdctrl

Notes

Initially, we used the old controllers to read all the voltages (bias and clock); FP recorded them for comparison with the BBB controller.

Using the test program in ccdctrl directory on BBB.

Lots of confusion about why there is a hard flush every time we take a sci image with the test program. The test program is really minimally modified, so I have to think it worked this way before. FP said it should flush after safety off and after it has been a long time since the last exposure.
A hard flush from the code is an ExecuteWipeTable.

The turn-off on LBC should do a poweroff on the BBB before turning off the CCD controller.

Thursday, we connected to the red tech detector. By 3pm, Fernando was happy with the results from the red tech.

Send Bellesi document comments - just a couple

Made a few test.c updates:
- fix to make "re-init" work (close first)
- deleted the other available modes; only NORMAL remains

More updates made to ccdctrl.c :
- logging up to date
- too-few/too-many save the image anyway
- use CTR error code translation
- delete the FAST/FASTBINNED/NORMALBINNED processing

5:49 on Thursday, connected to the red sci detector - saw bad clock voltages when we did a show telemetry. Powered off and disconnected.
Shouldn't we see clock voltages logged when we do exposure? not just biases, but clocks?
Checked this in the code: it says the clock check takes more than 5 seconds, so it doesn't check clock voltages on every exposure.

Finished with red cryostat about 9am on Friday

TelemetryCheck (check telemetry on test program) doesn't look at the new status structure - probably should, and print it out.
Doesn't it do a "telemetry check" during init? before flush? not clocks?
We can see the voltages come back on after the stop command if we do a "check telemetry". This seems wrong: the CHECKOFF macro enables the clock and bias lines via the API, which turns the voltages on.

Modified testtrackers to work with a single channel (like we need when testing CCD controller), or both (like we need for testing image analysis).

Modified trackers.c to pay attention to IA enable from config before doing any IDL stuff (open/close).

Modified testcamera to spawn a thread that queries status every 10 s; otherwise the hardware shuts down on us (bias values of 0) unless we take images constantly.

When do we see the _1, _2, _3, _4 fits files? How does the high level software set the multi-fits vs multi files? szPrimaryHeader ?

Camera ErrorCheckAndRecover is called for preset, expose, save. It tries one re-init cycle.

  • Status check during HB:
    StatusCheckDuringHBERror.PNG

Updates to test.c that FP would like
  • important inputs (like FITS format) should wait for the Return key, not just take the value from the first character typed
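
One way to do this in test.c would be to read a whole line and parse it, rather than acting on a single character. The sketch below assumes the input is a numeric menu choice; the function name and the range arguments are hypothetical and not part of the existing test program.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one full line (terminated by Return) and parse it as an
 * integer in [lo, hi]. Returns 0 on success and stores the value in
 * *out; returns -1 on EOF, an over-long line, non-numeric input,
 * trailing junk, or an out-of-range value. */
static int read_int_choice(FILE *in, int lo, int hi, int *out)
{
    char line[64];
    char *end;
    long v;

    if (fgets(line, sizeof line, in) == NULL)
        return -1;
    if (strchr(line, '\n') == NULL) {
        /* No newline seen: discard the rest of the line so it is
         * not mistaken for the answer to the next prompt. */
        int c;
        while ((c = fgetc(in)) != '\n' && c != EOF)
            ;
        return -1;
    }
    v = strtol(line, &end, 10);
    if (end == line)
        return -1;                          /* no digits at all */
    while (*end == ' ' || *end == '\t' || *end == '\r')
        end++;
    if (*end != '\n')
        return -1;                          /* e.g. "1x" */
    if (v < lo || v > hi)
        return -1;
    *out = (int)v;
    return 0;
}
```

With this approach an entry such as "12" is read as twelve instead of being taken as "1" from the first character, and the caller can re-prompt whenever -1 is returned.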

INAF/SkyTech Updates Sept-2016

Andrea sent an updated zip file on 22-Sep-2016:
Ciao Kellee,
the controller is now working in SkyTech for long exposure images: they changed DLL to get that result.
We had some problems with test program because of a new parameter (maybe you inserted it)
on the stop and resume functions to trigger safety stop. In my mind stop and safety are very different things and they should not be called in correlation, but of course you addressed a problem in this way and if it works it is fine for me.
So please rename the attached file to .zip and extract at least CTR_PROCEDURES and ccdctrl.c code and test.c code (in ccdctrl.c I changed the error returned to allow working without shutter connected, but Fernando will explain more).

Ciao
Andrea and Fernando

The only updates were to CTR_Procedures.cpp and CTR_Procedures.h
Andrea's ccdctrl.c and test.c were not different, and KS's copies were more up to date.
Tested using the test program on 17/18-Oct-2016. Verified the heartbeat timeout did not trigger on long exposures.



Attachment | Size | Date | Comment
60secexposurenohb.PNG | 153 K | 29 Jul 2016 - 18:33 | Long exposure timeout
PRUTimeoutWithWindowing.PNG | 85 K | 29 Jul 2016 - 18:34 | Windowing exposure timeout
StatusCheckDuringHBERror.PNG | 85 K | 29 Jul 2016 - 18:34 | Status check during HB
Topic revision: r15 - 18 Oct 2016, KelleeSummers