-- DanFraser - 02 May 2012

Action/Significant Items:

Attendees:

  • (to be updated after the meeting) Xin, Armen, Suchandra, Marco, Rob Q., Scott T., Kevin, Chander

CMS (Tony)

  • Job statistics for last week
    • 7,718 jobs/day

  • Transfer statistics for last week
    • ~699 TB/day

Atlas (Armen & Xin)

  • General production status
    • The LHC took a relatively long time to return to stable operation after the technical stop. The past week was a good one, with significant data collected; the integrated luminosity collected by ATLAS is now ~1.7 fb-1.
    • US ATLAS production during the week was quite stable, averaging about 17-20K running jobs, mostly simulation. All computing resources, including those beyond the pledge, are being reviewed to squeeze out maximum efficiency and to keep low-priority work from wasting analysis resources, so as to maximize MC12 production and be ready for the summer conferences.
  • Job statistics for last week.
    • Gratia report: 1.3M pilot jobs were run on US ATLAS sites, with a CPU/walltime ratio of 85%
    • Real jobs processed by US sites last week, as reported by the PanDA monitor
      • 1M
  • Data Transfer statistics for last week
    • The data transfer rate at the BNL T1 was 200-400 TB/day last week.
  • Issues

Grid Operations Center (Rob Q.)

Operations Last Week

  • GOC Services Availability/Reliability
  • Current Status
  • Production Services Updates - Notification
    • Minor changes to the Operations Status Overview page
    • /usr/local on repo.grid.iu.edu to grow to 128 GB from 16 GB (time permitting)
    • www.grid.iu.edu to be retired
    • Physical move of an unused VM host from IUPUI to IUB
      • Will host GOC internal services (monitor, jump, etc.)
  • WMS Glidein Factory
    • On Thursday, May 10, we discovered that the UCSD factory had hit its file descriptor limit.
      • This prevented the factory from submitting new glideins for about 48 hours.
      • We increased the ulimit from the default of 1024 to 10240.
      • The factory came back up successfully, but it hit the limit again on Friday, so we raised it to 50240.
      • The GlideinWMS developers are aware of the issue and will work on a better alert system in the factory to give notice of this kind of problem.
    • Began testing Condor 7.6.7 on the ITB factory. We plan to upgrade the GOC factory in the next production window.
    • We will investigate an RSV probe to monitor the health of the UCSD factory.
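The file descriptor exhaustion described above is the kind of condition a health probe could flag before glidein submission stops. As a minimal sketch only, assuming a Linux host with /proc and hypothetical 80%/95% warning thresholds (this is not the actual RSV probe, nor any GlideinWMS interface), a check could compare a process's open descriptors against its soft "Max open files" limit. Reading another user's /proc/<pid>/fd typically requires running as that user or as root.

```python
import os
import sys

# Hypothetical thresholds, chosen for illustration; not taken from RSV or GlideinWMS.
WARN_FRACTION = 0.80
CRIT_FRACTION = 0.95


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None when the soft limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: limit name (3 words), soft limit, hard limit, units.
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' line found")


def classify(open_fds, soft_limit):
    """Map descriptor usage to an OK/WARNING/CRITICAL status string."""
    if soft_limit is None:
        return "OK"
    usage = open_fds / soft_limit
    if usage >= CRIT_FRACTION:
        return "CRITICAL"
    if usage >= WARN_FRACTION:
        return "WARNING"
    return "OK"


def check_process(pid):
    """Count a process's open descriptors via /proc and classify them (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft_limit = parse_max_open_files(f.read())
    return classify(open_fds, soft_limit), open_fds, soft_limit


if __name__ == "__main__":
    # Defaults to this script's own process; pass a factory PID to check that instead.
    pid = sys.argv[1] if len(sys.argv) > 1 else os.getpid()
    status, used, limit = check_process(pid)
    print(f"{status}: {used} of {limit} file descriptors in use")
```

With the limit at the default 1024, for example, 900 open descriptors would already report WARNING, while the same count is well within the raised 50240 limit.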

Operations This Week

  • ITB Release today - Change log notification will be sent upon release.
    • Rebuilding rsvprocess1/2 with larger disk space (8 GB to 128 GB) MYOSG-43
    • Adding /campus URL for blogs [Requested by Derek W.]
    • Setting up InCommon https access for gratiaweb (the service itself still needs to be updated) GRATIAWEB-14

Campus Infrastructures / HTPC (Dan, Brooklin)

  • The Bosco June 30 release date is in jeopardy because the proposed completion date for the Condor file transfer capability is June 22.
  • Bosco with Campus Factory testing is ongoing, with positive feedback coming in.
  • Publishing best practices and the deployed campus grid profiles. An update email still needs to be sent to the campus-grids list.
  • A full-featured partitionable slot configuration, including the defragmentation daemon, is now running in Madison. Early indications are positive.

Integration (Suchandra)

  • OSG release today
    • Fixes globus jobmanager bugs
      • Memory leak
      • Held jobs
  • Gathering information on bigger ticket items to work on later

Site Coordination (Marco)

Note that this report lists the currently active resources in OSG. If a site is down or not reporting, it will not be counted, so there may be fluctuations. Each line shows the current number, with the change from last week in parentheses. A table with current OSG and VDT versions is available at http://www.mwt2.org/~marco/myosgldr.php
  • Site update status (from MyOSG as of today):
    • Most recent production version is OSG 3.1.1 / 1.2.28
    • 27 (4) OSG 3.x resources (13 are 3.1.1)
    • 78 (6) OSG 1.2.X resources (18 are 1.2.28)
    • 2 (0) OSG 1.0.X resources (0 are 1.0.6)
    • 1 (0) OSG 1.0.0 resources

User Support (Chander, Mats)

Security (Kevin)

  • No new security incidents.
  • Monitoring the new OpenSSL denial-of-service vulnerability.
  • Security drill under way with selected Tier 3 sites.
  • Security controls assessment under way.

The full report with links is available at https://twiki.grid.iu.edu/bin/view/Production/WeeklyProductionMeetings

Topic revision: r11 - 15 May 2012 - 20:43:31 - ScottTeige