-- DanFraser - 08 Feb 2011

Action/Significant Items:

  • The RSV accounting problem reported last week at BNL is still open (Xin). Scott reported that the problems have been identified and the changes were resubmitted to the LCG; however, the GridView monitor is not yet displaying the correct results. Scott to follow up with the WLCG.
  • Brian noted that several sites reported Gratia probes being hung as a result of the ongoing networking issues at FNAL. Scott to follow up with Steve T. to identify and contact the affected sites.
  • An operations problem was discovered whereby kernel updates were being pulled from a Nebraska Yum repository. The main fix is to explicitly specify which YUM repositories are used during the upgrade process. (Scott)
  • This brings into focus a larger problem on the OSG: we currently do not have any "non hand built" YUM repositories that we can depend on for OSG software distribution. (Brian)
  • Dan spoke with John-Paul Robinson of SuraGrid. They are planning to become an OSG VO soon, perhaps announcing it at the AHM. Dan suggested that interested parties join the VO calls as an entry point to OSG.
  • No production call next week due to the All Hands Meeting
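
As an illustration of the fix noted above (naming the YUM repositories explicitly during an upgrade), here is a minimal sketch that builds a yum command line which first disables every configured repository and then enables only the ones listed. The `--disablerepo` and `--enablerepo` options are standard yum options; the helper name and the repository ids (`sl-base`, `sl-security`) are hypothetical placeholders, not the repositories actually used at any site.

```python
import shlex


def build_yum_update_command(enabled_repos, packages=()):
    """Return a yum argv list that updates using only the named repositories.

    Hypothetical helper for illustration; repository ids are supplied by the caller.
    """
    if not enabled_repos:
        raise ValueError("at least one repository id must be named explicitly")
    # Disable everything first so no unexpected repository can supply packages.
    argv = ["yum", "--disablerepo=*"]
    # Then re-enable only the repositories that were named explicitly.
    argv += [f"--enablerepo={repo}" for repo in enabled_repos]
    argv.append("update")
    argv += list(packages)
    return argv


if __name__ == "__main__":
    # "sl-base" and "sl-security" are placeholder repository ids.
    cmd = build_yum_update_command(["sl-base", "sl-security"], ["kernel"])
    # Print a shell-quoted form; actually running it is left to the operator.
    print(shlex.join(cmd))
```

The sketch only prints the command, so it can be reviewed before anything is run on a production node.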

Attendees:

  • Xin, Armen, Brian, Suchandra, Burt, Marco, Scott T., Chander, Dan

CMS (Burt)

  • Last week: 174 khour/day, 94% success

Atlas (Armen & Xin)

  • General production status
    • LHC beam commissioning is in progress. The collaboration is discussing the upcoming challenges of accommodating the new data, including a possible increase in event size and CPU usage due to a higher pile-up contribution.
    • ATLAS production was quite stable during the week at an average level of 10-12k running jobs, mainly simulation jobs. Heavy ion data reprocessing started at the end of the week.

  • Job statistics for last week.
    • Gratia report: USATLAS ran 2.1M jobs, with CPU/Walltime ratio of 86%.
    • Panda world-wide production report (real jobs):
      • completed 1.4M managed group, MC production, validation and reprocessing jobs
      • average 200K jobs per day
      • failed 113K jobs
      • average efficiency: jobs - 92%, walltime - 94%
  • Data Transfer statistics for last week
    • The BNL T1 data transfer rate was around 200-350 TB/day last week.
  • Issues
    • Recalculation of BNL site availability and reliability, with GOC and WLCG -- ongoing, no confirmation yet
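
The Panda job figures reported above can be cross-checked with simple arithmetic. The sketch below is only a consistency check under stated assumptions: the report does not say whether the 1.4M "completed" jobs include the 113K failed ones, so both readings are computed, and the inputs are the rounded values from the minutes rather than exact counts.

```python
# Rounded figures taken from the Panda report in these minutes.
COMPLETED_JOBS = 1_400_000
FAILED_JOBS = 113_000
DAYS_IN_WEEK = 7


def efficiency_excluding_failed(completed, failed):
    """Assumption A: 'completed' counts successful jobs only."""
    return completed / (completed + failed)


def efficiency_including_failed(completed, failed):
    """Assumption B: 'completed' counts all finished jobs, failed ones included."""
    return (completed - failed) / completed


if __name__ == "__main__":
    per_day = COMPLETED_JOBS / DAYS_IN_WEEK
    eff_a = efficiency_excluding_failed(COMPLETED_JOBS, FAILED_JOBS)
    eff_b = efficiency_including_failed(COMPLETED_JOBS, FAILED_JOBS)
    print(f"jobs per day: {per_day:.0f}")
    print(f"job efficiency (assumption A): {eff_a:.1%}")
    print(f"job efficiency (assumption B): {eff_b:.1%}")
```

Under either assumption the result lands close to the reported job efficiency of 92%, and the weekly total divided by seven matches the reported average of 200K jobs per day.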

LIGO (Britta, Robert E.)

Grid Operations Center (Rob Q.)

Operations last Week

  • Long (8 hour) maintenance window. OS updates required reboots and brief service interruptions.
  • Production release. Release notes are available.
    • OIM 2.31: corrected bug preventing registration ticket generation, improved logging
    • GOC Ticket 1.33: cosmetic and internal changes
    • TWiki: FNAL KCA certificates now accepted correctly
    • MyOSG 1.31: added user feedback form, cosmetic changes
    • OSG Display: improved SQL queries
  • Gratia kernel upgrades: rolling downtimes, should not affect users.
  • Ongoing network issues on the first floor of Feynman at FNAL.

Operations this week

  • Rob, Kyle and Scott at AHM next week.
  • ITB release today
    • Added an OIM search box to MyOSG that lets users search various OIM entities and returns links to the OSG entities related to each match. If no result is found, it provides a link to the OSG custom search provided by Google.
    • Various cosmetic / efficiency updates to MyOSG
  • Accumulating data on memory use monitors/alarms.

Engage (Mats, John)

Integration (Suchandra)

  • OSG 1.2.19 testing
    • Changes
    • In VTB testing, should move to ITB soon
  • HTPC
    • Getting sites configured, should have things set at OUHEP and UC by AHM
    • BNL will take longer

Site Coordination (Marco)

Note that this report lists the currently active resources in OSG. If a site is down or not reporting, it will not be counted, so there may be fluctuations. Each line has the current number and the variation from last week in parentheses. You can find a table with current OSG and VDT versions at http://www.mwt2.org/~marco/myosgldr.php
  • Site update status (from MyOSG as of today):
    • Most recent production version is OSG 1.2.18
    • 95 (1) OSG 1.2.X resources (8 are 1.2.18, 6 are 1.2.17)
    • 3 (0) OSG 1.0.X resources (0 are 1.0.6)
    • 2 (0) OSG 1.0.0 resources
    • 1 (0) OSG 0.8.0 resources
  • No OSG site coordination next week: "Talk with the experts" session at the OSG All Hands Meeting.
  • Survey about cluster expertise in OSG reopened (if you did not respond the first time, please respond this time; it will take a few minutes):

Virtual Organizations Group (Chander)

Security (Mine)

The full report with links is available at https://twiki.grid.iu.edu/bin/view/Production/WeeklyProductionMeetings

Topic revision: r7 - 01 Mar 2011 - 22:12:57 - DanFraser