Technical Roadmap Session Notes

Introduction

Session notes for developing the OSG technical roadmap - OSG Seattle Consortium, August 22, 2006. (Notes by Alain Roy)

Straightforward Services and Tools

These are services that either exist and have been validated, or are expected to be ready on the timescale of OSG 0.6.0. In some cases they will be optionally installed on OSG; all of these are very likely non-controversial, but let's discuss. Steven Timm expressed worry that the OSG worker node package is growing in size (it's now 500GB in OSG 0.5.0). Frank says, "Let's worry about it if it becomes a real issue."

Uberftp

Request from ATLAS for data management. To be included in osg-wn-client package. This is already in ITB 0.5.0 (VDT 1.3.11 and later).

wget

It's in osg-wn-client and is useful for jobs that need to fetch software.

Squid

Introduced first by CDF, Squid can be used to cache the periodic CRL updates fetched by compute nodes. Other VOs will likely want to use it to cache application tarballs. We recommend installing it on a separate node. It's optional.

CEMon

A resource selection service introduced and driven by D0 and SAMGrid. It is currently being tested on the ITB and at D0 sites. Turning it on is optional.

BDII

The Berkeley Database Information Index (BDII) will be required in OSG 0.6.0 as part of the general OSG information services proposal (see InfoServicesSessionAgenda). Question: should it run on top of MDS 2 or not? Frank suggests that we push the question to the information services group for a reasoned recommendation. Leigh suggests we skip MDS 2 (though she loves it) because it is deprecated and has security flaws. Gabriele promises to make a recommendation (with Shaowen) soon.

glexec

Introduced and driven by CDF, Fermilab and OSG Security, glexec provides authorization for glide-in jobs and other pilot-based production frameworks (e.g., Panda for ATLAS). This is optional. Fermilab is still developing a plugin that lets glexec talk to GUMS, and needs it by October 1st for auditing purposes.

WS GRAM

Tests were done for scalability and reliability. Globus has fixed errors in RFT (they think it's stable now) but has not yet made performance enhancements. They hope to add speed improvements for OSG 0.6.0, but they need feedback on the OSG 0.6.0 schedule to decide how much they can do.

VDS

LIGO wants VDS 1.4.7, once it's released.

VDS verification

Introduced by LIGO and the VDS team, vds-verify is a site verification script that should be installed on headnodes in OSG 0.6.0. LIGO contributed it to the ITB as a modified version of site-verify. Leigh confirms that it is in OSG 0.5.0.

GRATIA accounting service

This will be required, assuming it's ready, though some people are worried about its readiness. Alain points out that we don't yet have PBS and LSF probes; Steve reports that they might be close on both. Alain asks whether Gratia supports authentication and authorization yet (in the past it didn't); no one present could answer. Someone asked for the timing of its cron job to be configurable. It should work with managed fork.

Services and Discussions

Clarens

Clarens exists, but are there ways to use it more effectively? Clarens is not required. Michael Thomas recommends using it to push VOMS information to sites, so that local sites do not need to pull it manually themselves. (A script to do this is already in the VDT.) Rob Gardner thinks we might want to discover GUMS servers as well, though others politely disagreed because GUMS servers are usually not accessible from outside a site anyway. There was disagreement on whether it is useful to use Clarens for VOMS server information if it is populated only by the GOC (instead of by each site).

Someone (Rob?) will make a proposal for what information a VO would want from GUMS, if we could query GUMS servers. What are the security implications? The proposal will be discussed by the OSG ITB or the Executive Team.

MDS

GT4 information services: how will these be used, by which VOs?

MDS 4 is being tested by GROW (from Iowa), along with a new information provider. Slated for OSG 0.8.0?

ESF

Discuss how this will work, with the workspace manager coming from the VDT and Xen installed separately. It will be optional and should be in OSG 0.6.0.
  • Need a site deployment model and usage.
  • Edge Services twiki

gLite client tools

These were requested by ATLAS to make life simpler for them. Some would go on the worker node, some on the DQ2 node, or on both. Question: what is needed for general interoperability, and what is needed specifically for ATLAS? Some of this is going into $APP; is that good enough?

Storage

  • gPlazma authorization service: in time for OSG 0.6.0? Unclear whether that is feasible.
  • Registration of SRM-v1, SRM-v2 storage elements in OSG?
Should we have an OSG:SE package that provides DRM? The general conclusion was no, because no one is demanding it.

Ian Fisk commented on DPM (Disk Pool Manager). He says that CMS has deployed it widely, and it's "roughly functional". DPM uses RFIO, and it's not quite compatible, but that might be fixed by roughly September. A bigger problem is that it can't say no: it is easily overloaded if you ask it to do too much. It works okay at small scales, but not beyond 40-60 local accesses (using RFIO). There is some interest in using DPM because it's simpler than dCache, and its ACLs seem nice.

LCG-OSG interoperability

As a grid responsibility, rather than a VO responsibility.
  • 10 min presentation by Ian Fisk on how CMS submits jobs transparently across OSG & LCG, and what they had to do to an OSG 0.4.1 site to make this happen. 10 min Q & A for Ian.

Ian Fisk talked about the importance of interoperability. CMS uses 1/3 OSG and 2/3 LCG. CMS consolidated to a single VOMS instance, which was really useful. Although information services have gotten better, they still have problems: either they lack information CMS needs, or what the information means is ambiguous. CMS has asked some sites to install extra tools for LCG components, such as client tools for transferring data, though they agree that it's a good idea to deploy these on the fly. There are issues with gLite using Condor as a local batch queue, and US CMS sites care about Condor; Miron Livny offered help. CMS uses MonALISA heavily. They use Gratia output to manually fill out an Excel worksheet for WLCG accounting information, but that's okay for now.

  • 10 min presentation by Leigh on the current and 0.6.0 plans toward EGEE/WLCG interoperability

OSG advertises some sites as "interoperable". There is a new "Ops" VO that is accepted at CERN, and LCG folks will run site functional tests using it. We plan to deploy the site functional tests developed by EGEE as part of OSG 0.6.0. We can exchange tickets with GGUS (EGEE's trouble ticketing system), and we are coordinating with EGEE Operations. With every OSG release we need to test interoperability, and that's a lot of work. EGEE queries each site's BDII/MDS constantly (many resource brokers querying every two minutes), which puts a lot of load on the site that runs the BDII. Ian Fisk suggests using a more hierarchical model to avoid this load, and then reducing the frequency of queries. Miron Livny suggests aggregating information, perhaps with CEMon.
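The load argument behind the hierarchical/aggregation suggestion can be made concrete with a small simulation. This is a minimal sketch, not anything presented at the session: it assumes a hypothetical caching tier (the `CachingAggregator` class and `simulate` helper are invented for illustration) that answers every broker from a cache and contacts the site's information service at most once per refresh interval. The broker count, polling period and refresh interval are illustrative assumptions, not measured OSG or EGEE values, and neither BDII nor CEMon is modeled.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachingAggregator:
    """Hypothetical middle tier between resource brokers and a site's
    information service: every broker query is answered from a cache, and the
    site is contacted at most once per refresh interval."""

    def __init__(
        self,
        fetch_site_info: Callable[[], Dict[str, str]],
        refresh_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_site_info
        self._refresh = refresh_seconds
        self._clock = clock
        self._cached: Optional[Dict[str, str]] = None
        self._fetched_at = 0.0
        self.upstream_queries = 0  # how many times the site itself was queried

    def query(self) -> Dict[str, str]:
        now = self._clock()
        # Refresh only when there is no cached copy or it has gone stale.
        if self._cached is None or now - self._fetched_at >= self._refresh:
            self._cached = self._fetch()
            self._fetched_at = now
            self.upstream_queries += 1
        return self._cached


def simulate(num_brokers: int, poll_seconds: int, duration_seconds: int,
             refresh_seconds: float) -> Tuple[int, int]:
    """Return (queries brokers issue, queries that reach the site)."""
    now = [0.0]  # simulated clock, advanced manually below
    agg = CachingAggregator(lambda: {"site": "example"}, refresh_seconds,
                            clock=lambda: now[0])
    broker_queries = 0
    for t in range(0, duration_seconds, poll_seconds):
        now[0] = float(t)
        for _ in range(num_brokers):
            agg.query()
            broker_queries += 1
    return broker_queries, agg.upstream_queries


if __name__ == "__main__":
    # 50 brokers polling every 2 minutes for an hour, 10-minute refresh
    print(simulate(num_brokers=50, poll_seconds=120,
                   duration_seconds=3600, refresh_seconds=600))  # → (1500, 6)
```

Without an aggregating tier, all 1500 broker queries would reach the site; with it, only 6 do. The trade-off is staleness: brokers may see information up to one refresh interval old, which is why reducing query frequency and aggregating go together.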

One proposal: install the complete gLite UI on worker nodes. Counter-proposal: get someone at LCG to install it in $APP. We can get the gLite UI as a tarball; it doesn't have to be an RPM. Frank suggests that since the site functional tests require these tools, it should be up to the sites to install them if they want to be interoperable, perhaps via an "interop-client" package.

There was disagreement about whether or not OSG can submit jobs to LCG sites with Condor-G directly, or must use the Resource Broker. Action Item: Ruth should get a statement from Ian Bird that we can submit jobs with Condor-G directly.

Condor-C

What is the roadmap for deployment in EGEE?

Headnode load issues

Schedule for 0.6.0

  • What we know about the schedule: February 07?

Frank proposed December 1. Leigh proposed February 1. Ruth thinks February 1 is too late, and is willing to compromise on functionality.

What is the minimum set of functionality that we need to make it worthwhile to have OSG 0.6.0?

A December 1 deadline for 0.6.0 implies a VDT 1.3.12 release by mid-to-late September. People worry that OSG 0.6.0 would then contain too little for sites to bother updating, so perhaps it should have a February 1 deadline instead.

Miron Livny argued that this is not the right forum for this decision, and Ruth Pordes concurred. A smaller forum will discuss it.


-- RobGardner - 21 Aug 2006

Topic attachments
  • OSG-EGEE-interop.ppt (368.0 K, 22 Aug 2006 22:28, LeighGrund): talk
Topic revision: r10 - 16 Dec 2008 - 16:16:03 - KyleGross