Minutes WSGram Testing Dec 18

Introduction

  • Attendees: Jeff, Stu, Suchandra, Terrence, Charles
  • Apologies: none
  • Coordinates: Tuesday, 3:30pm Central; 510-665-5437, #1212
  • Previous meetings, MeetingMinutes

Scalability testing

  • Suchandra's local invocations tests
    • earlier tests did not have maxThreads raised from its default. With it set to 200:
      • container no longer crashes at 1Hz submission rate
      • local invocation now gives superior response
    • put in a request to vdt-support to have the increased maxThreads level set by default.
  • Jeff's client timeout tests
    • globusrun-ws option '-T #' either did not work or was set too low (~10 minutes)
    • adding the containerTimeout parameter to the server-config.wsdd file and setting it to 1000000 (~16 minutes) did work
      • 370 out of 375 jobs went through successfully
      • 5 jobs failed with:
         ....
        Current job state: StageOut
        Current job state: Failed
        Destroying job...Done.
        Cleaning up any delegated credentials...Done.
        globusrun-ws: Job failed: Staging error for RSL element fileStageOut.
        Message expired (outside window)
        
  • Charles says these results point to the server container (not the client container) as the source of the timeout. The small number of failures may be due to a security catch-all whereby every message carries a time limit.
  • Terrence's Condor-g testing at UCSD
    • was only just able to push the service into operation by running the SEG by hand.
    • still an open question why this doesn't work when the container is first started
    • will start running tests with Condor-g
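
As a quick sanity check on the figures from Jeff's timeout tests above, here is a minimal Python sketch. It assumes the containerTimeout value is expressed in milliseconds, which is consistent with the ~16 minutes noted. The constant and function names are illustrative only and are not taken from any Globus tool.

```python
# Figures taken from the minutes above.
# Assumption: containerTimeout is given in milliseconds.
CONTAINER_TIMEOUT_MS = 1_000_000
JOBS_SUBMITTED = 375
JOBS_SUCCEEDED = 370


def ms_to_minutes(ms: int) -> float:
    """Convert a duration in milliseconds to minutes."""
    return ms / 1000 / 60


def failure_rate(submitted: int, succeeded: int) -> float:
    """Return the fraction of submitted jobs that did not succeed."""
    return (submitted - succeeded) / submitted


timeout_minutes = ms_to_minutes(CONTAINER_TIMEOUT_MS)
rate = failure_rate(JOBS_SUBMITTED, JOBS_SUCCEEDED)

print(f"containerTimeout: {timeout_minutes:.1f} minutes")
print(f"failed jobs: {JOBS_SUBMITTED - JOBS_SUCCEEDED} of {JOBS_SUBMITTED} ({rate:.1%})")
```

Under that assumption the timeout works out to just under 17 minutes, and the 5 failures are a little over 1% of the 375 submitted jobs.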

OSG Integration activities (Charles)

  • How to address 3 issues (interop, accounting, log-rotation)
    • reporting: Stu notes that support for log-rotation in Condor is either underway or has people identified to work on it
    • accounting: outstanding bug reports from John Wiegand spell out the issues; follow up from there
    • interop: Charles needs to re-ignite the email thread based on the CE element from GlueSchema 1.3
  • Involvement of Big VOs
    • CMS involvement will follow Terrence's tests
    • Atlas contact made

To Do List

  • submit bug report for maxAttempts not working (Jeff)
  • Suchandra will be running through Condor-g tests Types I - VII
  • Charles to follow up on accounting & reporting
  • Jeff will be finishing SGE log-rotation and following up on SGE/Globus interactions
  • All will have fun over the holidays

-- JeffPorter - 18 Dec 2007

Topic revision: r3 - 18 Dec 2007 - 22:42:55 - JeffPorter
 