Using the Worker Node Client


The Worker Node Client is a collection of useful software components that is guaranteed to be on every OSG worker node. In addition, a handful of environment variables are defined on the worker node that your job can use to locate useful resources.

This page describes how to initialize the environment of your job to correctly access the execution and data areas from the worker node.

The OSG provides no scientific software dependencies or software build tools on the worker node; you are expected to bring along all application-level dependencies yourself. Sites are not required to provide any specific tools (gcc, svn, lapack, blas, etc.) beyond the ones in the OSG worker node client.

The Environment

All OSG sites should be configured so that the Worker Node Client software is in the default environment of the job and some variables are already defined when your job starts running.

[user@client ~]$ globus-job-run <gatekeeper-contact> /usr/bin/printenv

Common software available on worker nodes.

The OSG worker node client (called the wn-client package) contains the following software:

  • The site's supported set of CA certificates (located in $X509_CERT_DIR after the environment is set up)
  • Proxy management tools:
    • Create proxies: voms-proxy-init and grid-proxy-init
    • Show proxy info: voms-proxy-info and grid-proxy-info
    • Destroy the current proxy: voms-proxy-destroy and grid-proxy-destroy
  • Data transfer tools:
    • HTTP/plain FTP protocol tools (via system dependencies):
      • wget and curl: standard tools for downloading files with HTTP and FTP
    • SRM clients
      • LCG SRM Client (lcg-cp and others)
      • LBNL SRM Client (srm-copy and others)
      • FNAL SRM Client (srmcp and others)
    • GridFTP client
      • Globus GridFTP client (globus-url-copy)
    • UberFTP, another command-line client for GridFTP; supports more of the GridFTP protocol than just file copying
    • Site-specific protocols
      • dCache client: dccp, a client specifically for sites running dCache
  • MyProxy client tools
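The proxy management tools above are typically used together as a create/inspect/destroy lifecycle. The sketch below shows that lifecycle, assuming the wn-client tools are on PATH; "gpn" is a stand-in for your VO's name:

```shell
# Sketch of a typical VOMS proxy lifecycle ("gpn" is a placeholder VO name).
# Guarded so the sketch is a no-op on machines without the VOMS tools.
proxy_tools=no
if command -v voms-proxy-init >/dev/null 2>&1; then
    proxy_tools=yes
    voms-proxy-init -voms gpn -valid 24:00   # create a 24-hour VOMS proxy
    voms-proxy-info -all                     # show subject, VO attributes, time left
    voms-proxy-destroy                       # remove the proxy when finished
fi
echo "voms tools present: $proxy_tools"
```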

Advanced users can list the contents of the RPM.
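One way to do that, assuming the RPM is named osg-wn-client (the package name may differ between OSG releases), is:

```shell
# List the files shipped in the worker node client RPM.
# "osg-wn-client" is an assumed package name; check your site's release.
if command -v rpm >/dev/null 2>&1; then
    rpm -ql osg-wn-client 2>/dev/null | head -n 20
fi
rpm_done=yes
```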

Directories in the Worker Node Environment

The following table outlines the various important directories for the worker node environment. A job running on an OSG worker node can refer to each directory using the corresponding environment variable.

$OSG_APP
  Purpose: Location for users to install software. VOs use this directory to install software that will be used when running on the cluster. For example, a VO may install the BLAST executable here and then submit jobs that run it.
  Notes: Access to this area varies from site to site. Most sites allow software installation only from the head node (jobmanager-fork), while others require you to access it via a special job description or VOMS role.

$OSG_DATA
  Purpose: Data files that are accessible read-write via NFS from all batch slots.
  Notes: Not all OSG sites have deployed this area. You can test for existence by comparing the environment variable $OSG_DATA to the string UNAVAILABLE; if they are equal, the directory is not available.

$OSG_WN_TMP
  Purpose: Temporary storage area in which your job(s) run.
  Notes: Local to each batch slot. Create a directory under this as your work area. See the NOTE below.
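The $OSG_DATA existence test can be written as a short shell check (a sketch; UNAVAILABLE is the sentinel value described above):

```shell
# Decide whether the shared data area is usable on this worker node.
# Sites that have not deployed $OSG_DATA set it to the string UNAVAILABLE
# (or may leave it unset entirely).
if [ -z "${OSG_DATA:-}" ] || [ "$OSG_DATA" = "UNAVAILABLE" ]; then
    have_osg_data=no    # stage data by hand with curl/globus-url-copy instead
else
    have_osg_data=yes   # safe to read and write under $OSG_DATA
fi
echo "OSG_DATA usable: $have_osg_data"
```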

Be careful when using $OSG_WN_TMP: this directory might be shared with other VOs. We recommend the following pattern (assuming gpn is your VO's name):

# Work in a private, uniquely named subdirectory of the shared scratch area.
mkdir -p "$OSG_WN_TMP"/gpn
mydir=$(mktemp -d "$OSG_WN_TMP"/gpn/job.XXXXXX)   # unique per-job work area
cd "$mydir"
# Run the rest of your application
cd /
rm -rf "$mydir"   # always clean up before the job exits

A significant number of sites use the batch system to create an independent directory for each job and point $OSG_WN_TMP at it on the fly.

There is no way to know in advance how much scratch disk space any given worker node has available, as OSG information systems do not advertise this. Most of the time, it is shared among a number of job slots.
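Since the available scratch space is not advertised, a job can only measure it at runtime. A minimal sketch (the 5 GB threshold is an arbitrary example, not an OSG requirement):

```shell
# Check free space in the scratch area before starting real work.
# Falls back to /tmp when $OSG_WN_TMP is not set.
scratch="${OSG_WN_TMP:-/tmp}"
free_kb=$(df -Pk "$scratch" | awk 'NR==2 {print $4}')
needed_kb=$((5 * 1024 * 1024))   # 5 GB, an assumed requirement
if [ "$free_kb" -lt "$needed_kb" ]; then
    echo "Only ${free_kb} kB free in $scratch; not enough scratch space" >&2
fi
echo "free scratch: ${free_kb} kB"
```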

The following query looks for storage directories at sites that support CMS (replace "VO:cms" with the name of your VO; supply your information-system collector host to -pool):
condor_status -pool <collector-host> -const 'stringListIMember("VO:cms", GlueCEAccessControlBaseRule)' -format 'CE: %s ' GlueCEUniqueID -format 'Scratch: %s ' GlueSubClusterWNTmpDir -format 'OSG_DATA: %s ' GlueCEInfoDataDir -format 'OSG_APP: %s\n' GlueCEInfoApplicationDir | sort | uniq
The query is long only because of the format strings; it prints the scratch, data, and app directories, one line per CE. Here are a few lines of output:
CE: Scratch: /state/partition1 OSG_DATA: /lustre/hep/osg OSG_APP: /lustre/home/antaeus/apps
CE: Scratch: /state/partition1 OSG_DATA: /lustre/hep/osg OSG_APP: /lustre/home/antaeus/apps
CE: Scratch: /condor/osg_wn_tmp OSG_DATA: /nfs/osg-data OSG_APP: /nfs/osg-app
CE: Scratch: /osg/tmp OSG_DATA: /osg/data OSG_APP: /osg/app
CE: Scratch: /osg/tmp OSG_DATA: /osg/data OSG_APP: /osg/app
CE: Scratch: /osg/tmp OSG_DATA: /osg/data OSG_APP: /osg/app
CE: Scratch: /osg/tmp OSG_DATA: /osg/data OSG_APP: /osg/app
CE: Scratch: /wntmp OSG_DATA: /raid2/osg-data OSG_APP: /raid1/osg-app
CE: Scratch: /wntmp OSG_DATA: /raid2/osg-data OSG_APP: /raid1/osg-app
CE: Scratch: /wntmp OSG_DATA: /raid2/osg-data OSG_APP: /raid1/osg-app
CE: Scratch: /wntmp OSG_DATA: /raid2/osg-data OSG_APP: /raid1/osg-app
CE: Scratch: /wntmp OSG_DATA: /raid2/osg-data OSG_APP: /raid1/osg-app
CE: Scratch: /wntmp OSG_DATA: /raid2/osg-data OSG_APP: /raid1/osg-app
CE: Scratch: /wntmp OSG_DATA: /raid2/osg-data OSG_APP: /raid1/osg-app
CE: Scratch: /wntmp OSG_DATA: /raid2/osg-data OSG_APP: /raid1/osg-app
CE: Scratch: /hadoop/tmp OSG_DATA: /data/se/osg OSG_APP: /sharesoft/osg/app


Topic revision: r10 - 07 Feb 2017 - 17:55:43 - BrianBockelman