MinutesOct23

Introduction

Changes to attributes / schema

  • Steven is seeing an interesting problem with Fermigrid:
    • Fermigrid has multiple heterogeneous subclusters
    • The number of job slots gets multiplied by the number of subclusters, so the number of free CPUs is overreported
    • Matching on an individual node in a specific subcluster is difficult (matching on both CEs and subclusters might help here)
  • Other issues (site-specific services, e.g. ws-gram, nfs lite):
    • Possibly need extensions to the GLUE schema
    • GlueService attributes might work
    • Need to investigate whether these will work or whether they will break the BDII (Steven will investigate this)
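
The overreporting described above can be illustrated with a minimal sketch. It assumes one plausible mechanism, namely that each subcluster record repeats the free-slot count of its whole CE, so a consumer that sums over subcluster records multiplies the real figure by the number of subclusters. The record layout, host name, and numbers below are hypothetical and are not taken from the GLUE schema or from Fermigrid.

```python
from dataclasses import dataclass


@dataclass
class SubclusterRecord:
    """Hypothetical record: one entry per (CE, subcluster) pair."""
    ce: str
    subcluster: str
    free_slots: int  # assumed to be the CE-wide count, repeated in every record


def naive_free_slots(records):
    # Sums every record, so a CE with N subclusters is counted N times.
    return sum(r.free_slots for r in records)


def per_ce_free_slots(records):
    # Counts each CE once, assuming all of its records carry the same value.
    per_ce = {}
    for r in records:
        per_ce[r.ce] = r.free_slots
    return sum(per_ce.values())


# Illustrative data: one CE with 4 subclusters and 100 free slots in total.
records = [SubclusterRecord("ce.example.org", f"sub{i}", 100) for i in range(4)]
naive_total = naive_free_slots(records)
per_ce_total = per_ce_free_slots(records)
print(naive_total, per_ce_total)
```

With this illustrative data the naive sum is four times the per-CE figure, which is the kind of inflation that matching on both CEs and subclusters would have to account for.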

Future plans for attributes subcommittee

  • Should reconvene about a month or so after the next release to discuss changes/new attributes
  • Somewhat of a requirements gathering process

BDII server issues (RobQ)

  • Four options for dealing with the issues will be discussed in today's facility meeting
  • No decision has been made yet
  • Running multiple services will fragment information onto different servers
  • Rob will see what EGEE is doing with schema updates

WSGRAM testing progress (Jeff, Suchandra)

  • Jeff's report
    • ran through the validation steps against the persistent ITB set
      • 2 sites had config issues - both now fixed
      • 2 sites with permission issues - unresolved
      • remainder were successful
    • began scaling tests from ITB (vdt-181) submit host w/ gsiftp installed from vdt:Globus-Base-Data-Server
      • simple Condor-g submissions against FNAL, TTU, BNL, & LBNL services
      • scaling up to 100 jobs per submit file (Condor version 6.8.6, configured with defaults)
    • results were generally successful
      • small fraction of jobs fail with "Staging error for RSL element" (FNAL & BNL)
      • large load on the LBNL service (reaching 20) caused a larger number of "Staging error for RSL element" failures
      • need to apply the performance recommendations and consult the rest of the ITB group
  • Suchandra will start testing this week
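
As a rough illustration of the scaling tests in Jeff's report, the sketch below generates a Condor-G submit description that queues 100 jobs against a single gt4 (WS-GRAM) service. It is not the submit file used in the tests: the service URL, scheduler type, executable, and file names are placeholders chosen for the example, and only the general `universe = grid` / `grid_resource = gt4 ...` / `queue N` layout follows the Condor-G submit syntax.

```python
def build_submit_description(service_url, scheduler, executable, n_jobs):
    """Return the text of a Condor-G submit description for a gt4 resource.

    All argument values are supplied by the caller; nothing here is
    specific to any ITB site.
    """
    lines = [
        "universe = grid",
        # gt4 grid resources take the service URL followed by the scheduler type
        f"grid_resource = gt4 {service_url} {scheduler}",
        f"executable = {executable}",
        # One output and error file per job, keyed by cluster and process id
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.$(Cluster).log",
        # Queue n_jobs identical jobs from this one submit file
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical endpoint and trivial test job, for illustration only.
submit_text = build_submit_description(
    "https://gram.example.org:8443/wsrf/services/ManagedJobFactoryService",
    "Fork",
    "/bin/hostname",
    100,
)
print(submit_text)
```

The generated text could be written to a file and passed to `condor_submit`; raising `n_jobs` in steps is one simple way to repeat a test at increasing scale.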

VO WS-GRAM validation (Jeff, Suchandra)

  • Jeff's report
    • Working with STAR admin (Wayne) and Grid developer (Leve)
    • Using SUMS (STAR job scheduler) built appropriate job files and condor-g submit file for gt2
    • modified submit file for gt4
    • modified job files for submission from Jeff's ITB client (STAR production clients currently do not support gt4)
    • ran jobs against FNAL, TTU, BNL; so far each site has had a distinct error
      • shared lib missing on FNAL - now fixed and rerunning
      • TTU - source $OSG_GRID/setup.csh produced an error
      • BNL - gridftp of results back to submit host failed
  • Suchandra's report
    • Britta (LIGO) has a ws-gram enabled workflow
    • Has submitted against UC_ITB but ran into problems
    • Resubmitting to UC_ITB and submitting to other permanent ITB sites (Fermigrid and BNL)
    • Need LIGO
  • VOs have concerns about the version of the osg-client package: using ws-gram depends on a newer version of the client than the one distributed with OSG 0.6.0
  • Uptake might have to wait until OSG 0.8.0 due to problems with Condor 6.8.2/6.8.3 and firewalls

Last minute changes

  • Repositories will be frozen today so any changes should be given to Suchandra ASAP

AOB

  • None
-- RobGardner - 17 Oct 2007
Topic revision: r7 - 16 Dec 2008 - 16:16:01 - KyleGross