earlier tests ran without maxThreads raised from its default. After setting it to 200, the
container no longer crashes at a 1 Hz submission rate
local invocation now gives superior response
put in a request to vdt-support to have the increased maxThreads value set by default.
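A fixed-rate submission test like the 1 Hz run above could be paced roughly as in the sketch below. This is only an illustration, not the harness used in these tests: `submit_at_fixed_rate` and the `submit` callable are hypothetical names, and the actual job submission (for example a wrapper around the client command) is left to the caller.

```python
import time
from typing import Callable, List


def submit_at_fixed_rate(
    submit: Callable[[int], bool],
    n_jobs: int,
    rate_hz: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Call submit(i) for i in range(n_jobs), one call every 1/rate_hz seconds.

    `submit` is a hypothetical callable supplied by the caller; it should
    return True if the submission was accepted.
    """
    period = 1.0 / rate_hz
    start = clock()
    results = []
    for i in range(n_jobs):
        # Schedule each call against the start time rather than the previous
        # call, so slow submissions do not accumulate drift.
        target = start + i * period
        delay = target - clock()
        if delay > 0:
            sleep(delay)
        results.append(submit(i))
    return results
```

The clock and sleep functions are parameters so the pacing logic can be checked without waiting in real time.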
Jeff's client timeout tests
globusrun-ws option '-T #' did not seem to work, or I set it too low (~10 minutes)
adding a containerTimeout parameter to the server-config.wsdd file and setting it to 1000000 ms (~16.7 minutes) did work
370 out of 375 jobs went through successfully
5 jobs failed with
....
Current job state: StageOut
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Staging error for RSL element fileStageOut.
Message expired (outside window)
Charles says these results point to the server container (not the client-side container) as the one imposing the timeout. The small number of failures may come from a security catch-all whereby every message carries a time limit.
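As a quick check of the figures in Jeff's tests, the arithmetic can be worked through as below. This assumes the containerTimeout value is expressed in milliseconds, which is what the "~16 minutes" note implies. The constant and function names are illustrative only.

```python
# Values taken from the notes; the millisecond unit is an assumption.
CONTAINER_TIMEOUT_MS = 1_000_000
TOTAL_JOBS = 375
SUCCEEDED = 370


def ms_to_minutes(ms: int) -> float:
    """Convert milliseconds to minutes."""
    return ms / 1000 / 60


timeout_minutes = ms_to_minutes(CONTAINER_TIMEOUT_MS)
failed = TOTAL_JOBS - SUCCEEDED
success_rate = SUCCEEDED / TOTAL_JOBS

print(f"timeout: {timeout_minutes:.1f} min")
print(f"failed: {failed} of {TOTAL_JOBS} ({1 - success_rate:.1%})")
print(f"succeeded: {success_rate:.1%}")
```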
Terrence's Condor-g testing at UCSD
just now was able to bring the service into operation by running the SEG by hand.
still an open question why this doesn't work when the container is first started
will start running tests with Condor-g
OSG Integration activities (Charles)
How to address 3 issues (interop, accounting, log-rotation)
reporting: Stu notes that log-rotation support in Condor is either underway or has people identified to work on it
accounting: outstanding bug reports from John Wiegand spell out the issues; follow up from there
interop: Charles needs to restart the email thread based on the CE element from GlueSchema 1.3
Involvement of Big VOs
CMS involvement will follow Terrence's tests
Atlas contact made
To Do List
submit bug report for maxAttempts not working (Jeff)
Suchandra will be running through Condor-g tests Types I - VII
Charles to follow up on accounting & reporting
Jeff will be finishing SGE log-rotation and following up on SGE/Globus interactions