MinutesWSGramTestingDec4

Introduction

  • Attendees: Suchandra, Stu, Dan, Jeff, Rob, Charles
  • Apologies: none
  • Coordinates: Tuesday, 3:30pm Central; 510-665-5437, #1212
  • Previous meetings, MeetingMinutes

WS Gram scalability tests (Suchandra)

  • Redid tests - see plots. Commands were of the form: globusrun-ws -submit -Ft Condor -F uct3-edge6.uchicago.edu:9443 -s -c /usr/bin/whoami, submitted at a rate of 1 Hz.
  • Finds big improvement in OSG 0.8 over "modified" OSG 0.6
  • Finds that the container stays up for 30 seconds w/ 100% success rate. Sees a connection reset error (output below). Would like to be able to manage job restarts after time-outs.
Delegating user credentials...Failed.
globusrun-ws: globus_i_delegate.c::1142:
Error trying to delegate
globus_i_delegate.c::673:
Error querying delegation factories
ManagedJobFactoryService_client.c::2209:
Failed sending request ManagedJobFactoryPortType_GetMultipleResourceProperties.
globus_xio_system_select.c:globus_l_xio_system_try_read:1134:
System error in read: Connection reset by peer
globus_xio: A system call failed: Connection reset by peer
  • There are improvements in the handling of delegation.
  • Question - how does this behavior compare to similar tests on the older architecture (gt2)? Note - are the Condor-G functions as optimized for WS Gram?
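
To make the 1 Hz submission test easier to reproduce, here is a minimal Python sketch of a fixed-rate submission loop. It is not the script used for the plots. The function names (submit_at_fixed_rate, success_rate) and the launch parameter are hypothetical, and the command list simply repeats the form recorded above, so the host and port would need adjusting per site. Each submission is started without waiting for the previous one, which is an assumption about how the original test was driven.

```python
import subprocess
import time

# Command in the form recorded in the minutes; host and port are site-specific.
SUBMIT_COMMAND = [
    "globusrun-ws", "-submit", "-Ft", "Condor",
    "-F", "uct3-edge6.uchicago.edu:9443",
    "-s", "-c", "/usr/bin/whoami",
]


def submit_at_fixed_rate(command, count, interval=1.0, launch=subprocess.Popen):
    """Start `count` submissions, one every `interval` seconds, then wait for all.

    `launch` is a hypothetical hook: it must accept the command list and
    return an object with a wait() method that returns the exit code.
    Returns the exit codes in submission order.
    """
    processes = []
    start = time.monotonic()
    for i in range(count):
        # Sleep until the scheduled start time of submission i, so that
        # launch overhead does not accumulate into drift.
        delay = start + i * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        processes.append(launch(command))
    # Collect exit codes only after every submission has been started.
    return [process.wait() for process in processes]


def success_rate(exit_codes):
    """Fraction of submissions that exited with status 0 (0.0 if none ran)."""
    if not exit_codes:
        return 0.0
    return sum(1 for code in exit_codes if code == 0) / len(exit_codes)
```

For example, success_rate(submit_at_fixed_rate(SUBMIT_COMMAND, 60)) would start 60 submissions at 1 Hz and report the fraction that exited cleanly. Treating exit status 0 as success is a simplification. A real test would probably also keep the captured output to tell delegation failures apart from other errors.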

WS Gram scalability tests (Jeff)

  • Has focused on Condor-G submitted jobs w/ data movement.
  • Had returned to doing things at the globus-job-run level. Is reproducing results - thinks it's a gridftp timeout.
  • Not enough detail coming from Condor-G - just see a Hold reason for stagein.
  • RFT interacting with a gridftp server - not getting a reply during the security handshake. Why does this take so long? Is the RFT service not getting enough time within the container (getting time sliced)? Gridftp is closing the socket after 30 seconds. Updated to 2 minutes - but not in a public release.
  • Consider also setting the number of retries > 0 for RFT.
  • Also set gridftp timeouts to 2 minutes.

WS Gram container problems (Charles)

  • Container crashing issues at UCSD - bump up debugging levels, look at VM usage. Nothing in container logs.
  • A simple script that ps's every few minutes to track memory usage. Suchandra will provide this.
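
Until that script is available, a memory tracker along these lines could serve as a placeholder. This is only a sketch, not the script Suchandra will provide. The function names (parse_ps_memory, sample_memory, track_memory) and the run_ps and log parameters are hypothetical. It assumes a ps that accepts the POSIX -o and -p options and an rss column reported in KiB, as on Linux. The 300-second default interval stands in for "every few minutes".

```python
import subprocess
import time


def parse_ps_memory(output):
    """Parse 'RSS VSZ' from the header-less output of ps -o rss= -o vsz=."""
    fields = output.split()
    if len(fields) < 2:
        raise ValueError("unexpected ps output: %r" % output)
    return int(fields[0]), int(fields[1])


def sample_memory(pid, run_ps=None):
    """Return (rss, vsz) for `pid`; run_ps is a hook for substituting ps."""
    if run_ps is None:
        def run_ps(target_pid):
            # An empty header text ("rss=") suppresses the header line.
            completed = subprocess.run(
                ["ps", "-o", "rss=", "-o", "vsz=", "-p", str(target_pid)],
                capture_output=True, text=True, check=True,
            )
            return completed.stdout
    return parse_ps_memory(run_ps(pid))


def track_memory(pid, samples, interval=300.0, run_ps=None, log=print):
    """Take `samples` readings of `pid`, `interval` seconds apart, logging each."""
    history = []
    for i in range(samples):
        rss, vsz = sample_memory(pid, run_ps)
        history.append((rss, vsz))
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log("%s pid=%d rss_kib=%d vsz_kib=%d" % (stamp, pid, rss, vsz))
        if i < samples - 1:
            time.sleep(interval)
    return history
```

Pointed at the container's process ID, track_memory would print one timestamped line per sample. Those lines could be redirected to a file and compared against the time of a crash, given that the container logs show nothing.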

WS Gram at the OSG Site Coordinators meeting

Discussion, next steps

  • Reproduce tests with optional configurations.
  • Meet again next week - Monday - 4 pm Central.

-- RobGardner - 03 Dec 2007

Topic revision: r2 - 04 Dec 2007 - 22:43:14 - RobGardner
 