Validating Compute Element

Please note: This documentation is for OSG 1.2. While we still provide critical security updates for OSG Software 1.2, we recommend OSG Software 3 for any new or updated installations. We are considering May 31, 2013 as a possible OSG 1.2 End of Life (EOL).

ReleaseDocumentation > ValidatingComputeElement

Owner: SuchandraThapa
Area: ComputeElement
Role: SysAdmin
Type: HowTo
Review status: Reviewer: Not Ready, Tester: Not Ready, Not Released

Introduction

The Compute Element validation procedure is carried out in the final stages of preparing a major OSG Software release. As part of this, a number of functionalities need to be validated. Typically, a validation table is made with a column for each of the tests below and a row for each Integration site being tested.

About this Document

Each section of this document provides instructions on how to test the functionality of one of the services that run on the Compute Element.

Compute Element Validation Script

The site_verify.pl script can be used by site administrators to validate the correct operation of a local or remote Compute Element. Much of its functionality has also been incorporated into RSV.

The script executes a number of tests by running a sequence of fork jobs on the Gatekeeper. Tests that depend on the successful execution of other tests are skipped when those tests fail, and the script returns with an error. The MonALISA and Ganglia tests are optional; a failure there does not indicate a problem.

Requirements

Instructions

Run the script with Perl as an ordinary user on the command line:

[user@client ~]$ perl ~/verify/site_verify.pl

HELP NOTE
The script prints a help page when given -help as an argument.

[user@client ~]$ perl ~/verify/site_verify.pl

===============================================================================
Info: Site verification initiated at Fri Feb 26 19:50:03 2010 GMT.
===============================================================================
-------------------------------------------------------------------------------
-------- Begin uct3-edge7.uchicago.edu at Fri Feb 26 19:50:03 2010 GMT --------
-------------------------------------------------------------------------------
Checking prerequisites needed for testing: PASS
Checking for a valid proxy for sthapa@uct3-edge7.uchicago.edu: PASS
Checking if remote host is reachable: PASS
Checking for a running gatekeeper: YES; port 2119
Checking authentication: PASS
Checking 'Hello, World' application: PASS
Checking remote host uptime: PASS
   13:50:06 up 37 days,  5:44,  1 user,  load average: 0.86, 0.59, 0.54
Checking remote Internet network services list: PASS
Checking remote Internet servers database configuration: PASS
Checking for GLOBUS_LOCATION: /opt/itb-1.1.18/globus
Checking expiration date of remote host certificate: Feb 22 22:48:45 2011 GMT
Checking for gatekeeper configuration file: YES
  /opt/itb-1.1.18/globus/etc/globus-gatekeeper.conf
Checking users in grid-mapfile, if none must be using Prima: alice,cdf,cigi,compbiogrid,des,dosar,engage,fermilab,geant4,glow,gluex,gpn,grase,gridunesp,grow,i2u2,icecube,ilc,jdem,ligo,mis,nanohub,nwicg,nysgrid,ops,osg,osgedu,samgrid,sbgrid,star,usatlas1,uscms01
Checking for remote globus-sh-tools-vars.sh: YES
Checking configured grid services: PASS
  jobmanager,jobmanager-fork,jobmanager-managedfork,jobmanager-pbs
Checking for OSG osg-attributes.conf: YES
Checking scheduler types associated with remote jobmanagers: PASS
  jobmanager is of type managedfork
  jobmanager-fork is of type managedfork
  jobmanager-managedfork is of type managedfork
  jobmanager-pbs is of type pbs
Checking for paths to binaries of remote schedulers: PASS
  Path to managedfork binaries is .
  Path to pbs binaries is /usr/local/bin
Checking remote scheduler status: PASS
  pbs : 5 jobs running, 2 jobs idle/pending
Checking if Globus is deployed from the VDT: YES; version 2.0.99p14
Checking for OSG version: NO
Checking for OSG grid3-user-vo-map.txt: YES
  ops users: ops
  i2u2 users: i2u2
  geant4 users: geant4
  osgedu users: osgedu
  nanohub users: nanohub
  gridex users: gridex
  cdf users: cdf
  DOSAR users: dosar
  nwicg users: nwicg
  osg users: osg
  usatlas users: usatlas1,usatlas2,usatlas3,usatlas4
  engage users: engage
  star users: star
  uscms users: uscms01
  grase users: grase
  glow users: glow
  fermilab users: fermilab
  dzero users: sam,samgrid
  compbiogrid users: compbiogrid
  mis users: mis
  des users: des
  sdss users: sdss
Checking for OSG site name: UC_ITB
Checking for OSG $GRID3 definition: /opt/itb-1.1.18
Checking for OSG $OSG_GRID definition: /opt/wn
Checking for OSG $APP definition: /share/osg/app
Checking for OSG $DATA definition: /share/osg/data
Checking for OSG $TMP definition: /share/osg/data
Checking for OSG $WNTMP definition: /scratch
Checking for OSG $OSG_GRID existence: FAIL
Checking for OSG $APP existence: PASS
Checking for OSG $DATA existence: PASS
Checking for OSG $TMP existence: PASS
Checking for OSG $APP writability: PASS
Checking for OSG $DATA writability: PASS
Checking for OSG $TMP writability: PASS
Checking for OSG $APP available space: 438.250 GB
Checking for OSG $DATA available space: 438.250 GB
Checking for OSG $TMP available space: 438.250 GB
Checking for OSG additional site-specific variable definitions: YES
  MountPoints
    ATLAS_APP prod /osg/app/atlas_app
    ATLAS_DATA prod /osg/data/atlas_data
    ATLAS_DQ2Cli prod /osg/data/atlas_app/dq2_cli/DQ2Cli
    ATLAS_LOC_11042 11.0.42 /osg/app/atlas_app/atlas_rel/11.0.42
    ATLAS_LOC_1105 11.0.5 /osg/app/atlas_app/atlas_rel/11.0.5
    ATLAS_LOC_1203 12.0.3 /osg/app/atlas_app/atlas_rel/12.0.3
    ATLAS_LOC_12031 12.0.31 /osg/app/atlas_app/atlas_rel/12.0.31
    ATLAS_LOC_1204 12.0.4 /osg/app/atlas_app/atlas_rel/12.0.4
    ATLAS_LOC_1205 12.0.5 /osg/app/atlas_app/atlas_rel/12.0.5
    ATLAS_LOC_1206 12.0.6 /osg/app/atlas_app/atlas_rel/12.0.6
    ATLAS_LOC_1207 12.0.7 /osg/app/atlas_app/atlas_rel/12.0.7
    ATLAS_LOC_1208 12.0.8 /osg/app/atlas_app/atlas_rel/12.0.8
    ATLAS_LOC_12095 12.0.95 /osg/app/atlas_app/atlas_rel/12.0.95
    ATLAS_LOC_13010 13.0.10 /osg/app/atlas_app/atlas_rel/13.0.10
    ATLAS_LOC_13020 13.0.20 /osg/app/atlas_app/atlas_rel/13.0.20
    ATLAS_LOC_13030 13.0.30 /osg/app/atlas_app/atlas_rel/13.0.30
    ATLAS_LOC_13035 13.0.35 /osg/app/atlas_app/atlas_rel/13.0.35
    ATLAS_LOC_13040 13.0.40 /osg/app/atlas_app/atlas_rel/13.0.40
    ATLAS_LOC_1400 14.0.0 /osg/app/atlas_app/atlas_rel/14.0.0
    ATLAS_LOC_14010 14.0.10 /osg/app/atlas_app/atlas_rel/14.0.10
    ATLAS_LOC_1410 14.1.0 /osg/app/atlas_app/atlas_rel/14.1.0
    ATLAS_LOC_1420 14.2.0 /osg/app/atlas_app/atlas_rel/14.2.0
    ATLAS_LOC_14210 14.2.10 /osg/app/atlas_app/atlas_rel/14.2.10
    ATLAS_LOC_14211 14.2.11 /osg/app/atlas_app/atlas_rel/14.2.11
    ATLAS_LOC_14220 14.2.20 /osg/app/atlas_app/atlas_rel/14.2.20
    ATLAS_LOC_14221 14.2.21 /osg/app/atlas_app/atlas_rel/14.2.21
    ATLAS_LOC_14222 14.2.22 /osg/app/atlas_app/atlas_rel/14.2.22
    ATLAS_LOC_14223 14.2.23 /osg/app/atlas_app/atlas_rel/14.2.23
    ATLAS_LOC_14224 14.2.24 /osg/app/atlas_app/atlas_rel/14.2.24
    ATLAS_LOC_14225 14.2.25 /osg/app/atlas_app/atlas_rel/14.2.25
    ATLAS_LOC_1440 14.4.0 /osg/app/atlas_app/atlas_rel/14.4.0
    ATLAS_LOC_1450 14.5.0 /osg/app/atlas_app/atlas_rel/14.5.0
    ATLAS_LOC_1451 14.5.1 /osg/app/atlas_app/atlas_rel/14.5.1
    ATLAS_LOC_1452 14.5.2 /osg/app/atlas_app/atlas_rel/14.5.2
    ATLAS_LOC_GCC 3.2 /osg/app/atlas_app/gcc32
    ATLAS_LOC_GCE prod /osg/app/atlas_app/GCE-Server/gce-server
    ATLAS_LOC_KitVal prod /osg/app/atlas_app/atlas_rel/kitval/KitValidation
    ATLAS_LOC_Trfs prod /osg/app/atlas_app/Atlas-Trfs/atlas-trfs
    ATLAS_PYTHONHOME prod /osg/app/atlas_app/python
    ATLAS_STAGE prod /osg/data/atlas_data
    ATLAS_WN_Client prod /osg/app/atlas_app/atlaswn
    ATLAS_WN_Client_ prod /osg/app/atlas_app/atlaswn--2010-02-25
    SAMPLE_LOCATION default /SAMPLE-path
    SAMPLE_SCRATCH devel /SAMPLE-path
    VO-atlas-AtlasPoint1-14.1.0.13-i686-slc4-gcc34-opt AtlasPoint1 /osg/app/atlas_app/atlas_rel/14.1.0
    VO-atlas-AtlasProduction-12.0.3.1 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.3
    VO-atlas-AtlasProduction-12.0.3.2 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.3
    VO-atlas-AtlasProduction-12.0.31.1 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.31
    VO-atlas-AtlasProduction-12.0.31.2 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.31
    VO-atlas-AtlasProduction-12.0.31.3 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.31
    VO-atlas-AtlasProduction-12.0.31.4 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.31
    VO-atlas-AtlasProduction-12.0.31.5 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.31
    VO-atlas-AtlasProduction-12.0.31.6 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.31
    VO-atlas-AtlasProduction-12.0.31.7 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.31
    VO-atlas-AtlasProduction-12.0.31.8 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.31
    VO-atlas-AtlasProduction-12.0.4.1 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.4
    VO-atlas-AtlasProduction-12.0.4.2 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.4
    VO-atlas-AtlasProduction-12.0.5.1 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.5
    VO-atlas-AtlasProduction-12.0.5.2 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.5
    VO-atlas-AtlasProduction-12.0.5.3 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.5
    VO-atlas-AtlasProduction-12.0.6.1 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.6
    VO-atlas-AtlasProduction-12.0.6.2 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.6
    VO-atlas-AtlasProduction-12.0.6.3 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.6
    VO-atlas-AtlasProduction-12.0.6.4 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.6
    VO-atlas-AtlasProduction-12.0.6.5 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.6
    VO-atlas-AtlasProduction-12.0.7.1 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.7
    VO-atlas-AtlasProduction-12.0.7.2 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.7
    VO-atlas-AtlasProduction-12.0.8.1 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.8
    VO-atlas-AtlasProduction-12.0.95.1 AtlasProduction /osg/app/atlas_app/atlas_rel/12.0.95
    VO-atlas-AtlasProduction-13.0.10.1 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.10
    VO-atlas-AtlasProduction-13.0.20.1 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.20
    VO-atlas-AtlasProduction-13.0.20.2 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.20
    VO-atlas-AtlasProduction-13.0.20.3 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.20
    VO-atlas-AtlasProduction-13.0.30.1 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.30
    VO-atlas-AtlasProduction-13.0.30.2 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.30
    VO-atlas-AtlasProduction-13.0.30.3 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.30
    VO-atlas-AtlasProduction-13.0.30.4 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.30
    VO-atlas-AtlasProduction-13.0.30.5 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.30
    VO-atlas-AtlasProduction-13.0.40.1 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.40
    VO-atlas-AtlasProduction-13.0.40.2 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.40
    VO-atlas-AtlasProduction-13.0.40.3 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.40
    VO-atlas-AtlasProduction-13.0.40.4 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.40
    VO-atlas-AtlasProduction-13.0.40.5 AtlasProduction /osg/app/atlas_app/atlas_rel/13.0.40
    VO-atlas-AtlasProduction-14.0.0-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.0.0
    VO-atlas-AtlasProduction-14.0.10-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.0.10
    VO-atlas-AtlasProduction-14.1.0-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.1.0
    VO-atlas-AtlasProduction-14.1.0.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.1.0
    VO-atlas-AtlasProduction-14.1.0.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.1.0
    VO-atlas-AtlasProduction-14.1.0.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.1.0
    VO-atlas-AtlasProduction-14.1.0.4-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.1.0
    VO-atlas-AtlasProduction-14.2.0-i686-slc4-gcc34 AtlasProduction /osg/app/installtest/atlas_app/atlas_rel/14.2.0
    VO-atlas-AtlasProduction-14.2.0-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.0
    VO-atlas-AtlasProduction-14.2.0.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.0
    VO-atlas-AtlasProduction-14.2.0.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.0
    VO-atlas-AtlasProduction-14.2.10-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.10
    VO-atlas-AtlasProduction-14.2.10.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.10
    VO-atlas-AtlasProduction-14.2.11-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.11
    VO-atlas-AtlasProduction-14.2.20-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.20
    VO-atlas-AtlasProduction-14.2.20.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.20
    VO-atlas-AtlasProduction-14.2.20.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.20
    VO-atlas-AtlasProduction-14.2.20.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.20
    VO-atlas-AtlasProduction-14.2.21-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.21
    VO-atlas-AtlasProduction-14.2.21.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.21
    VO-atlas-AtlasProduction-14.2.22-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.22
    VO-atlas-AtlasProduction-14.2.23-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.23
    VO-atlas-AtlasProduction-14.2.23.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.23
    VO-atlas-AtlasProduction-14.2.23.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.23
    VO-atlas-AtlasProduction-14.2.23.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.23
    VO-atlas-AtlasProduction-14.2.23.4-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.23
    VO-atlas-AtlasProduction-14.2.24-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.24
    VO-atlas-AtlasProduction-14.2.24.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.24
    VO-atlas-AtlasProduction-14.2.25-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.10-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.11-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.4-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.5-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.6-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.7-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.8-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.2.25.9-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.2.25
    VO-atlas-AtlasProduction-14.4.0-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.4.0
    VO-atlas-AtlasProduction-14.4.0.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.4.0
    VO-atlas-AtlasProduction-14.5.0-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.0
    VO-atlas-AtlasProduction-14.5.0.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.0
    VO-atlas-AtlasProduction-14.5.0.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.0
    VO-atlas-AtlasProduction-14.5.0.4-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.0
    VO-atlas-AtlasProduction-14.5.0.5-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.0
    VO-atlas-AtlasProduction-14.5.0.6-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.0
    VO-atlas-AtlasProduction-14.5.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.1
    VO-atlas-AtlasProduction-14.5.1.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.1
    VO-atlas-AtlasProduction-14.5.1.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.1
    VO-atlas-AtlasProduction-14.5.1.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.1
    VO-atlas-AtlasProduction-14.5.1.4-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.1
    VO-atlas-AtlasProduction-14.5.1.6-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.1
    VO-atlas-AtlasProduction-14.5.2-i686-slc4-gcc34 AtlasProduction /osg/app//atlas_app/atlas_rel/14.5.2
    VO-atlas-AtlasProduction-14.5.2.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.2
    VO-atlas-AtlasProduction-14.5.2.10-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.2
    VO-atlas-AtlasProduction-14.5.2.11-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.2
    VO-atlas-AtlasProduction-14.5.2.12-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.2
    VO-atlas-AtlasProduction-14.5.2.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.2
    VO-atlas-AtlasProduction-14.5.2.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.2
    VO-atlas-AtlasProduction-14.5.2.4-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.2
    VO-atlas-AtlasProduction-14.5.2.5-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.2
    VO-atlas-AtlasProduction-14.5.2.6-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/14.5.2
    VO-atlas-AtlasProduction-15.0.0-i686-slc4-gcc34 AtlasProduction /osg/app//atlas_app/atlas_rel/15.0.0
    VO-atlas-AtlasProduction-15.0.0-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.0.0
    VO-atlas-AtlasProduction-15.0.0.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.0.0
    VO-atlas-AtlasProduction-15.0.0.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.0.0
    VO-atlas-AtlasProduction-15.0.0.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.0.0
    VO-atlas-AtlasProduction-15.0.0.4-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.0.0
    VO-atlas-AtlasProduction-15.1.0-i686-slc4-gcc34 AtlasProduction /osg/app//atlas_app/atlas_rel/15.1.0
    VO-atlas-AtlasProduction-15.1.0.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.1.0
    VO-atlas-AtlasProduction-15.1.0.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.1.0
    VO-atlas-AtlasProduction-15.1.0.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.1.0
    VO-atlas-AtlasProduction-15.1.0.4-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.1.0
    VO-atlas-AtlasProduction-15.1.0.5-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.1.0
    VO-atlas-AtlasProduction-15.1.0.6-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.1.0
    VO-atlas-AtlasProduction-15.1.0.7-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.1.0
    VO-atlas-AtlasProduction-15.1.0.8-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.1.0
    VO-atlas-AtlasProduction-15.2.0-i686-slc4-gcc34 AtlasProduction /osg/app//atlas_app/atlas_rel/15.2.0
    VO-atlas-AtlasProduction-15.2.0.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.2.0
    VO-atlas-AtlasProduction-15.3.0-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.0
    VO-atlas-AtlasProduction-15.3.0.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.0
    VO-atlas-AtlasProduction-15.3.0.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.0
    VO-atlas-AtlasProduction-15.3.1-i686-slc4-gcc34 AtlasProduction /osg/app//atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.1-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.10-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.11-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.12-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.13-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.14-i686-slc4-gcc34-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.20-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.21-i686-slc4-gcc34-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.4-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.5-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.6-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.7-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.8-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.3.1.9-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.3.1
    VO-atlas-AtlasProduction-15.4.0-i686-slc4-gcc34 AtlasProduction /osg/app/atlas_app/atlas_rel/15.4.0
    VO-atlas-AtlasProduction-15.4.1-i686-slc4-gcc34 AtlasProduction /osg/app//atlas_app/atlas_rel/15.4.1
    VO-atlas-AtlasProduction-15.5.0-i686-slc4-gcc34 AtlasProduction /osg/app//atlas_app/atlas_rel/15.5.0
    VO-atlas-AtlasProduction-15.5.1-i686-slc4-gcc34 AtlasProduction /osg/app//atlas_app/atlas_rel/15.5.1
    VO-atlas-AtlasProduction-15.5.2-i686-slc4-gcc34 AtlasProduction /osg/app/atlas_app/atlas_rel/15.5.2
    VO-atlas-AtlasProduction-15.5.3-i686-slc4-gcc34 AtlasProduction /osg/app/atlas_app/atlas_rel/15.5.3
    VO-atlas-AtlasProduction-15.5.4-i686-slc4-gcc34 AtlasProduction /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasProduction-15.5.5-i686-slc4-gcc34-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.5.5
    VO-atlas-AtlasProduction-15.6.0-i686-slc4-gcc34 AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.0
    VO-atlas-AtlasProduction-15.6.0.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.6.0
    VO-atlas-AtlasProduction-15.6.1-i686-slc4-gcc34 AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.1
    VO-atlas-AtlasProduction-15.6.1.2-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.6.1
    VO-atlas-AtlasProduction-15.6.1.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.6.1
    VO-atlas-AtlasProduction-15.6.1.4-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.6.1
    VO-atlas-AtlasProduction-15.6.1.5-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.6.1
    VO-atlas-AtlasProduction-15.6.1.6-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.6.1
    VO-atlas-AtlasProduction-15.6.1.7-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.6.1
    VO-atlas-AtlasProduction-15.6.3-i686-slc5-gcc43-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-AtlasProduction-15.6.3.1-i686-slc5-gcc43-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-AtlasProduction-15.6.3.2-i686-slc5-gcc43-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-AtlasProduction-15.6.3.3-i686-slc4-gcc34-opt AtlasProduction /osg/app/atlas_app/atlas_rel/15.6.3
    VO-atlas-AtlasProduction-15.6.3.4-i686-slc5-gcc43-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-AtlasProduction-15.6.3.5-i686-slc5-gcc43-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-AtlasProduction-15.6.3.6-i686-slc5-gcc43-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-AtlasProduction-15.6.3.7-i686-slc5-gcc43-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-AtlasProduction-15.6.3.8-i686-slc5-gcc43-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-AtlasProduction-15.6.3.9-i686-slc5-gcc43-opt AtlasProduction /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-AtlasTier0-14.2.23.2-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/14.2.23
    VO-atlas-AtlasTier0-14.2.23.3-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/14.2.23
    VO-atlas-AtlasTier0-14.2.24.2-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/14.2.24
    VO-atlas-AtlasTier0-14.2.24.3-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/14.2.24
    VO-atlas-AtlasTier0-14.2.24.4-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/14.2.24
    VO-atlas-AtlasTier0-14.4.0.1-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/14.4.0
    VO-atlas-AtlasTier0-14.4.0.2-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/14.4.0
    VO-atlas-AtlasTier0-15.0.0.1-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.0.0
    VO-atlas-AtlasTier0-15.0.0.2-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.0.0
    VO-atlas-AtlasTier0-15.0.0.3-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.0.0
    VO-atlas-AtlasTier0-15.2.0.13 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.2.0
    VO-atlas-AtlasTier0-15.2.0.16 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.2.0
    VO-atlas-AtlasTier0-15.4.0.1 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.4.0
    VO-atlas-AtlasTier0-15.4.0.4 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.4.0
    VO-atlas-AtlasTier0-15.4.1.1 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.4.1
    VO-atlas-AtlasTier0-15.5.0.1 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.0
    VO-atlas-AtlasTier0-15.5.0.2 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.0
    VO-atlas-AtlasTier0-15.5.1.1 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.1
    VO-atlas-AtlasTier0-15.5.1.2 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.1
    VO-atlas-AtlasTier0-15.5.1.3 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.1
    VO-atlas-AtlasTier0-15.5.1.4 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.1
    VO-atlas-AtlasTier0-15.5.1.5 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.1
    VO-atlas-AtlasTier0-15.5.1.6 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.1
    VO-atlas-AtlasTier0-15.5.1.7 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.1
    VO-atlas-AtlasTier0-15.5.1.8 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.1
    VO-atlas-AtlasTier0-15.5.2.2 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.2
    VO-atlas-AtlasTier0-15.5.2.4 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.2
    VO-atlas-AtlasTier0-15.5.2.5 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.2
    VO-atlas-AtlasTier0-15.5.3.10 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.3
    VO-atlas-AtlasTier0-15.5.3.11-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.3
    VO-atlas-AtlasTier0-15.5.3.3 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.3
    VO-atlas-AtlasTier0-15.5.3.4 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.3
    VO-atlas-AtlasTier0-15.5.3.5 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.3
    VO-atlas-AtlasTier0-15.5.3.6 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.3
    VO-atlas-AtlasTier0-15.5.3.8 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.3
    VO-atlas-AtlasTier0-15.5.3.9 AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.3
    VO-atlas-AtlasTier0-15.5.4.1-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.10-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.11-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.12-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.2-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.20-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.21-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.3-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.4-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.6-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.7-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.8-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-AtlasTier0-15.5.4.9-i686-slc4-gcc34-opt AtlasTier0 /osg/app/atlas_app/atlas_rel/15.5.4
    VO-atlas-TopPhys-15.6.3.5.2-i686-slc5-gcc43-opt TopPhys /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-TopPhys-15.6.3.6.1-i686-slc5-gcc43-opt TopPhys /osg/app//atlas_app/atlas_rel/15.6.3
    VO-atlas-WZBenchmarks-15.5.4.1.1-i686-slc4-gcc34-opt WZBenchmarks /osg/app//atlas_app/atlas_rel/15.5.4
    VO-atlas-gcc-x86_64-slc5 gcc /osg/app//atlas_app/atlas_rel/atlas-gcc/432/x86_64
Checking for OSG execution jobmanager(s): uct3-edge7.uchicago.edu/jobmanager-pbs
Checking for OSG utility jobmanager(s): uct3-edge7.uchicago.edu/jobmanager-pbs
Checking for OSG sponsoring VO: osg:100
Checking for OSG policy expression: NONE
Checking for OSG setup.sh: YES
Checking for OSG $Monalisa_HOME definition: FAIL
Checking for MonALISA configuration: UNTESTED
Checking for a running MonALISA: UNTESTED
Checking for a running GANGLIA gmond daemon: PASS (pid 3543 ...)
  /usr/sbin/gmond
  name = "part_max_used"
  owner = "unspecified"
  url = "unspecified"
Checking for a running GANGLIA gmetad daemon: NO
  gmetad does not appear to be running
Checking for a running gsiftp server: YES; port 2811
Checking gsiftp (local client, local host -> remote host): PASS
Checking gsiftp (local client, remote host -> local host): PASS
Checking that no differences exist between gsiftp'd files: PASS
-------------------------------------------------------------------------------
--------- End uct3-edge7.uchicago.edu at Fri Feb 26 19:52:52 2010 GMT ---------
-------------------------------------------------------------------------------
===============================================================================
Info: Site verification completed at Fri Feb 26 19:52:52 2010 GMT.


Validate GUMS


Review comments:
  • Why is this a stand-alone document and not part of the installation instructions? -- ST: there used to be a whole Validation section in the Integration web, which was moved into the ReleaseDocumentation web as part of an earlier documentation release (the OSG 1.0 release, I think). The Validate documents used to be worked through one by one by the ITB sites, which had to check off a table to confirm they had completed the validation.
  • What documents include this document?
  • Strike all references to CompatibilityAuthPreconfig. -- ST

GUMS server

Note: This only applies if you are using the Full Privilege Authorization or the Compatibility Authorization mode.

On your GUMS server, run

$VDT_LOCATION/tomcat/v55/webapps/gums/WEB-INF/scripts/gums-add-mysql-admin "{Your user DN}"

With your user certificate loaded in your browser, use the Administration interface at

https://hostname:port/gums
  • Check the Summary link on the navigation panel and review the results
  • Select Update VO Members and check for errors and updates in
    $VDT_LOCATION/tomcat/v55/logs/gums-service-admin.log

CE GUMS-client

Note: This only applies if you are using the Full Privilege Authorization mode.

  • Check that the server is pointed to in $VDT_LOCATION/gums/config/gums-client.properties
  • To use GUMS to maintain the grid-mapfile:
    • execute
       gums-host-cron --gumsdebug
      Look for error messages or premature termination.
    • Review those files created and documented in OSG_VOs on CE, namely:
             /etc/grid-security/grid-mapfile
             $VDT_LOCATION/osg/etc/osg-user-vo-map.txt
             $VDT_LOCATION/osg/etc/osg-undefined-accounts.txt
             $VDT_LOCATION/osg/etc/osg-supported-vos-list.txt 
      
  • GT2 prima callout
    • get a voms-proxy
    • submit a job using globus-job-run
    • review the $VDT_LOCATION/globus/var/globus-gatekeeper.log file for PRIMA INFO records associated with the job. You should find records identifying the GUMS server contacted, the user DN, and the local account assigned
    • you can cross-check the GUMS server response in $VDT_LOCATION/tomcat/v55/logs/gums-service-admin.log using timestamps
  • GT4 prima callout
    • get a voms-proxy
    • submit a job using globusrun-ws
    • review the $VDT_LOCATION/globus/var/container-real.log file for PRIMA GT4 AUTHORIZATION records associated with the job. You should find records identifying the GUMS server, the user DN, and the local account assigned
    • you can cross-check the GUMS server response in $VDT_LOCATION/tomcat/v55/logs/gums-service-admin.log using timestamps
  • If these tests succeed, make DONE entry in the validation table for your site.
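The grid-mapfile review above can be partly automated. A minimal sketch, assuming the standard grid-mapfile layout of quoted DN followed by account name (the helper name check_mapfile is ours):

```shell
# Hypothetical helper: list accounts that appear in a grid-mapfile but do
# not exist as local UNIX users. Mapfile lines look like:
#   "/DC=org/DC=doegrids/OU=People/CN=Some User" uscms01
check_mapfile() {
    awk '{ print $NF }' "$1" | sort -u | while read -r acct; do
        id "$acct" >/dev/null 2>&1 || echo "missing UNIX account: $acct"
    done
}

# Usage on a CE:
#   check_mapfile /etc/grid-security/grid-mapfile
```

This checks only that the mapped accounts exist; it does not replace reviewing osg-undefined-accounts.txt as described above.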


Included topic: Validate Gram Web Services


Quick server-side checks

  • Check that container is running: ps auxw | grep org.globus.wsrf.container.ServiceContainer
  • Check that permissions are configured: grep $VDT_LOCATION /etc/sudoers. This file should have permissions 440 so that only root can read it; however, this may vary depending on your OS and distribution.
  • Check that the log file records connections, e.g. tail -f $VDT_LOCATION/globus/var/container-real.log

Client-side test of WS-GRAM services

Documentation from Globus on how to execute jobs via WS-GRAM is available on the Globus website.

Command Line tests

  • Test container functions: globusrun-ws -submit -F hostname:9443 -c /bin/true
    ruly % globusrun-ws -submit -F hostname:9443 -c /bin/true
    Submitting job...Done.
    Job ID: uuid:cc1b94f0-6c61-11dc-8201-000476f3dd75
    Termination time: 09/27/2007 18:53 GMT
    Current job state: Done
    Destroying job...Done.
    ruly % echo $?
    0
You can also submit a /bin/false job to verify that the value of $? is 1 after the command returns. This test will verify that the container on host hostname is up and running, and that the Fork jobmanager can run your jobs. If it fails, verify that the container is up on the server using the ps auxw command above or lsof on port 9443, and double-check that the container is not blocked by a firewall.
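The exit-status check described above can be wrapped in a small helper (the name check_exit is ours; globusrun-ws and hostname are as in the example above):

```shell
# check_exit CMD [ARGS...] -- run CMD and report PASS/FAIL from its exit
# status, mirroring the `echo $?` checks in the transcripts above.
check_exit() {
    if "$@"; then
        echo PASS
    else
        echo "FAIL (exit $?)"
    fi
}

# e.g.: check_exit globusrun-ws -submit -F hostname:9443 -c /bin/true
#       check_exit globusrun-ws -submit -F hostname:9443 -c /bin/false
```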
  • add batch jobmanager: globusrun-ws -submit -F hostname:9443 -Ft [Condor|PBS|SGE] -c /bin/true
    ruly % globusrun-ws -submit -F hostname:9443 -Ft PBS -c /bin/true
    Submitting job...Done.
    Job ID: uuid:6523ba32-6c63-11dc-a6b7-000476f3dd75
    Termination time: 09/27/2007 19:05 GMT
    Current job state: Pending
    Current job state: Active
    Current job state: Done
    Destroying job...Done.
    ruly % echo $?
    0
This tests that the container is capable of routing jobs to the batch system of your choice. If you do not see the Pending and Active states, it could be that there is a firewall between your client and the server, and that your client is not receiving notifications back from the server. In that case, the client will fall back to polling for status, and may only catch the job once it is done. The next test will help double-check that.
  • add delegation (output returns to screen): globusrun-ws -submit -F hostname:9443 -s -c /bin/hostname
    ruly % globusrun-ws -submit -F hostname:9443 -s -c /bin/hostname 
    Delegating user credentials...Done.
    Submitting job...Done.
    Job ID: uuid:4618e4f4-6c64-11dc-9d97-0007e9d81215
    Termination time: 09/27/2007 19:11 GMT
    Current job state: Active
    Current job state: CleanUp-Hold
    hostname
    Current job state: CleanUp
    Current job state: Done
    Destroying job...Done.
    Cleaning up any delegated credentials...Done.
    
This test adds a delegation step where your client delegates a proxy to the container. When the job is finished running, the job enters a state of "CleanUp-Hold", which means that the output is being saved until the client retrieves it. The output is sent back using GridFTP to a port opened by the client. Once the client is finished getting the output, the job proceeds to a cleanup step where the saved output is deleted, then the job finishes and the delegated credentials are destroyed.

If your client is behind a firewall and your port ranges are not set up, you will instead get an error during CleanUp-Hold like:

globusrun-ws: ignoring error while streaming gsiftp://hostname:2811/home/user/5056668a-6c64-11dc-828f-000476f3dd75.0.stdout:
globus_ftp_client_state.c:globus_i_ftp_client_response_callback:3260:
the server responded with an error
500 500-Command failed. : globus_xio: Unable to connect to client.ip.address:59459
500-globus_xio: System error in connect: No route to host
500-globus_xio: A system call failed: No route to host
500 End.
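One common remedy is to pin the client's Globus tools to a firewall-friendly TCP port range before re-running the streaming test. The range below is only an example; use whatever range your firewall actually opens:

```shell
# Restrict Globus client tools to an externally reachable TCP port range.
# 40000-41000 is a placeholder; substitute your site's open range.
GLOBUS_TCP_PORT_RANGE=40000,41000
GLOBUS_TCP_SOURCE_RANGE=40000,41000
export GLOBUS_TCP_PORT_RANGE GLOBUS_TCP_SOURCE_RANGE
```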

  • add delegation + batch (output returns to screen): globusrun-ws -submit -F hostname:9443 -Ft [Condor|PBS|SGE] -s -c /bin/hostname
    Delegating user credentials...Done.
    Submitting job...Done.
    Job ID: uuid:54ab01a4-6c65-11dc-8e39-0007e9d81215
    Termination time: 09/27/2007 19:18 GMT
    Current job state: Pending
    Current job state: Active
    Current job state: CleanUp-Hold
    compute-node-hostname
    Current job state: CleanUp
    Current job state: Done
    Destroying job...Done.
    Cleaning up any delegated credentials...Done.
    
This test is the same as the last, except the job is routed through the batch system of your choice.
  • add simple RSL job file: globusrun-ws -submit -F hostname:9443 -Ft [Condor|PBS|SGE] -s -f hellogrid.xml
    • hellogrid.xml
      <?xml version="1.0"?>
      <!-- Simple Job Request With Arguments -->
      <job>
          <executable>/bin/echo</executable>
          <argument>Hello,</argument>
          <argument>Grid</argument>
      </job>
      
    • Delegating user credentials...Done.
      Submitting job...Done.
      Job ID: uuid:c8078226-6c65-11dc-93bd-0007e9d81215
      Termination time: 09/27/2007 19:22 GMT
      Current job state: Pending
      Current job state: Active
      Current job state: CleanUp-Hold
      Hello, Grid
      Current job state: CleanUp
      Current job state: Done
      Destroying job...Done.
      Cleaning up any delegated credentials...Done.
      This test doesn't stress any new elements of the system, but serves as an introduction to the GRAM4 RSL syntax.
  • add file-staging to remote host: globusrun-ws -submit -F hostname:9443 -Ft [Condor|PBS|SGE] -S -f hostname_file_stage.xml
    • hostname_file_stage.xml (Change YourDestinationHostName below to a machine that is running a GridFTP server)
      <job>
           <executable>/bin/hostname</executable>
           <directory>${GLOBUS_USER_HOME}</directory>
           <stdout>${GLOBUS_USER_HOME}/hostname_stdout</stdout>
           <fileStageOut>
               <transfer>
                <sourceUrl>file:///${GLOBUS_USER_HOME}/hostname_stdout</sourceUrl>
                <destinationUrl>gsiftp://YourDestinationHostName:2811/tmp/hostname_stdout</destinationUrl>
              </transfer>
           </fileStageOut>
           <fileCleanUp>
               <deletion>
                   <file>file:///${GLOBUS_USER_HOME}/hostname_stdout</file>
               </deletion>
           </fileCleanUp>
      </job>
      
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:77df3432-6c66-11dc-a2f3-0007e9d81215
Termination time: 09/27/2007 19:27 GMT
Current job state: Active
Current job state: StageOut
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.

YourDestinationHostName $ cat /tmp/hostname_stdout 
hostname
Note that you used -S in the client command, not -s. In this example we don't use streaming back to the client at all. Instead, we stage out the results to YourDestinationHostName. However, we still need to delegate a staging credential, and the -S flag makes that happen. Because there is no streaming, there is no CleanUp-Hold state. Instead, there is a StageOut state where the file staging directives make the requested transfer.

Client-side test using Condor-G with WS-GRAM

  • The Condor version must be greater than 6.8.3 to use Condor-G with the WS-GRAM of GT 4.0.5 on clients with local firewalls restricting ephemeral port access
    • To check your version: condor_version
    • OSG 0.6 clients (VDT 1.6.1) ship Condor version 6.8.3
  • Create and submit condor submit file using grid universe and gt4 host: condor_submit hostname_submit_file.cmd
    • hostname_submit_file.cmd (Specify hostname and batch resource)
      #specify resource destination
      Universe            = grid
      grid_resource       = gt4 https://hostname:9443 [Condor|PBS|SGE]
      
      # specify executable
      Executable          = /bin/hostname
      Transfer_Executable = false
       
      # copy stdout stderr to local files, referenced by job (Process) and submission id (Cluster)
      output  =  mytest.out.$(Cluster).$(Process)
      error   =    mytest.err.$(Cluster).$(Process)
       
      #a single local log file for tracking Condor-g submission
      log    = mytest.log
       
      # do not send email notification
      notification=Never
       
      # submit 2 identical jobs: Process=0 and Process=1
      Queue 2
      
    • monitor job progress with condor_q

Known Problems & Solutions using Condor-G with WS-GRAM

  • As mentioned above, the Condor installation must be above version 6.8.3 to work with the WS-GRAM of Globus 4.0.5
  • Condor-G will start a GridFTP server on the client machine. If, however, the client machine is already configured with a GridFTP server, Condor-G will use that server, which may resolve user authentication differently. This is fixed by inserting the following lines into the client's condor/libexec/gridftp_wrapper.sh:
    • GSI_AUTHZ_CONF=/doesnotexist
      export GSI_AUTHZ_CONF
      


Included topic: Validate RSVProbes


Validating RSV configuration

  • Condor_cron jobs: You can check that your RSV monitoring jobs are running in the Condor-Cron infrastructure with the following command:

    condor_cron_q
    You should see a handful of probe_wrapper.pl jobs, and a couple of other jobs (html-consumer, gratia-script-consumer, rotate_html_files), with output similar to this:
    Submitter: foo.bar.edu : <156.56.m.n:nnn> : foo.bar.edu
    ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD               
    21.0   rsvuser    5/9  16:36   0+00:00:00 I  0   0.0  probe_wrapper.pl /
       . . .
    36.0   rsvuser    5/9  16:36   0+00:00:00 I  0   0.0  probe_wrapper.pl /
    37.0   rsvuser    5/9  16:36   0+00:00:04 R 0   0.0  html-consumer     
    38.0   rsvuser    5/9  16:36   0+00:00:00 I  0   0.0  probe_wrapper.pl /
    39.0   rsvuser    5/9  16:36   0+00:00:04 R 0   0.0  gratia-script-cons
    40.0   rsvuser    5/9  16:36   0+00:00:00 I  0   0.0  rotate_html_files.

    If you do not see these jobs in Condor-Cron, then RSV jobs are not running on your system.
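As a quick sanity check, the queue listing can be summarized. The helper below is hypothetical; it only parses condor_cron_q output of the shape shown above:

```shell
# rsv_job_counts: read condor_cron_q output on stdin and count the RSV
# probe wrappers and consumer jobs.
rsv_job_counts() {
    awk '/probe_wrapper\.pl/ { p++ }
         /html-consumer|gratia-script-cons/ { c++ }
         END { printf "probes=%d consumers=%d\n", p, c }'
}

# Usage: condor_cron_q | rsv_job_counts
```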

  • Service cert proxy (if applicable): If you set up RSV to use a service certificate (the use_service_cert setting in your config.ini file), then you should see its proxy in /tmp/rsvproxy (or whatever path you specified in rsv_proxy_out_file in your config.ini file). See the RSV documentation for how to authorize the certificate to allow RSV to run Globus jobs.

  • Local RSV status page: After the first probe has run (usually ~15 minutes), you can see the results on the local status web page, created at $VDT_LOCATION/osg-rsv/output/html/index.html. You can either view the local status pages directly on the monitoring host, or view them on the web at https://<FQDN_of_monitoring_host>:8443/rsv, as long as the setup_for_apache value in your config.ini file is enabled.

    Note: In order to view this page, you'll need to have a certificate loaded in your web browser that is accepted by one of the Certificate Authorities on your CE.

  • RSV status on MyOSG: After the first probe has run (usually ~15 minutes), check MyOSG's RSV resource status page - metric results for your resource, as well as resource status, should be displayed, assuming your resource is registered with the appropriate FQDN, and is active in OIM.
    • RSV output record upload to central collector (using Gratia transport): if the MyOSG URL above does not display metrics then you can start debugging by first looking at the Gratia consumer log file: $VDT_LOCATION/osg-rsv/logs/consumers/gratia-script-consumer.out to see if probe results were uploaded successfully to the GOC maintained RSV database. You should see entries similar to this:

         08-20-2007 16:35:26 - Executing script '/usr/local/grid/ . . . record.py'
         OK 

Included topic: Validate VO Access

VO Access Validation

About This Document

This document is designed for the systems administrators who install and maintain Compute Elements. The purpose of this section is to provide the information necessary to determine if your gatekeeper is configured to support all VOs and users that you intend to support. This verification/validation procedure is not a one-time event; you should perform it whenever any changes are made to your authorization services or site configuration.

HELP NOTE
This is not an automated or scripted process.

To make sure that all OSG VO members can access your site, you need to verify that:

  • all UNIX accounts that can be assigned using your authorization mode have been created.
  • the OSG VOs are supported (the ones you have agreed to support).
  • all the local storage areas defined for your CE node are accessible by the VO members' UNIX accounts.

The following sections will describe in a little more detail how to go about this verification. There are several files in the $VDT_LOCATION/osg/etc directory that can aid you in this process. It is suggested you read the %SUPPORTED_VOS% to familiarize yourself with these files.

When you are satisfied that you have met the 3 criteria above, you can be reasonably confident that all VO members you intend to support will be able to access the services provided by your CE node.

When to perform this validation

This validation should be performed after:
  • you have fully installed your OSG CE node
  • configured it using the configure-osg script
  • verified that your authorization service is up-to-date
  • run edg-mkgridmap or gums-host-cron scripts (depending on your authorization mode)

Verifying that all UNIX accounts are created and accessible

  • The $VDT_LOCATION/osg/etc/osg-undefined-accounts.txt file shows any UNIX accounts that are mapped by your authorization method, but do not exist on your system. This file is created by either the edg-mkgridmap or gums-host-cron script so be sure configure-osg was run prior to checking. Refer to the %SUPPORTED_VOS% for details on how this is done. The creation of UNIX accounts and user home areas was one of the OSG CE pre-installation items that should have been completed as described in the %CE_PREINSTALL_CHECKLIST%. If all UNIX accounts exist, there should be no UNIX accounts listed and this file should be empty. If so, then you are assured that all OSG VO members that are authorized and assigned a UNIX account will be able to at least submit a job to your batch queue manager.

  • You need to verify that the $HOME directories for all the UNIX accounts are read/write accessible. They should have permissions 755.
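A quick way to audit those permissions (check_dir_mode is a hypothetical helper; stat -c uses GNU coreutils syntax):

```shell
# check_dir_mode DIR EXPECTED -- warn when DIR's octal mode differs from
# EXPECTED (e.g. 755 for user home areas, per the text above).
check_dir_mode() {
    mode=$(stat -c '%a' "$1") || return 1
    if [ "$mode" = "$2" ]; then
        echo "ok: $1 is $2"
    else
        echo "warn: $1 is $mode, expected $2"
    fi
}

# e.g.: for d in /home/*; do check_dir_mode "$d" 755; done
```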

Verifying that the appropriate VOs are supported

The $VDT_LOCATION/osg/etc/osg-supported-vo-list.txt file shows the list of OSG VOs to which your site will allow access. Refer to the %SUPPORTED_VOS% for details on how this is done.

Since this file is created by either the edg-mkgridmap or gums-host-cron scripts (depending on your authorization mode), be sure you have run the appropriate script before looking here.

If the $VDT_LOCATION/osg/etc/osg-supported-vo-list.txt file on your CE node matches the output from the command shown below, then you are assured that at least one member of every OSG VO will be able to run a job on your CE node.

  • wget -q -O - 'http://oim.grid.iu.edu/pub/vo/show.php?format=plain-text' | cut -f1 -d ','
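The comparison of the two lists can be scripted; a sketch (compare_vo_lists is our name, and the URL and file path are the ones given above):

```shell
# compare_vo_lists FILE1 FILE2 -- order-insensitive comparison of two VO
# name lists; prints "match" or "differ".
compare_vo_lists() {
    if [ "$(sort "$1")" = "$(sort "$2")" ]; then
        echo match
    else
        echo differ
    fi
}

# e.g.:
#   wget -q -O - 'http://oim.grid.iu.edu/pub/vo/show.php?format=plain-text' \
#       | cut -f1 -d',' > /tmp/oim-vos.txt
#   compare_vo_lists /tmp/oim-vos.txt $VDT_LOCATION/osg/etc/osg-supported-vo-list.txt
```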

If the VO names vary (or are missing), you may not have properly configured your authorization configuration files:

  • for local grid-mapfile: this is the $VDT_LOCATION/edg/etc/edg-mkgridmap.conf file on your CE node.
  • for Compatibility and Full Privilege mode: this is the $VDT_LOCATION/vdt-app-data/gums/config/gums.config file on your GUMS server.

If you are unfamiliar with the authorization mode terms used above, see About OSG CE Authorization to get a better understanding.

If you make any changes to the edg-mkgridmap.conf or gums.config configuration files, you need to do the following to effect the changes:

  1. edg-mkgridmap.conf:
    • re-run the edg-mkgridmap script
  2. gums.config:
    • have your GUMS administrator perform the Update VO members action on the GUMS server GUI.
    • then rerun the gums-host-cron script

When this is complete, you must re-verify the osg-undefined-accounts.txt and osg-supported-vo-list.txt files.

Verifying OSG local storage access

The other factor that will affect a VO member's ability to run successfully on your site is the accessibility of the OSG-defined local storage.

You answered several questions related to these storage areas when you ran the configure_osg.sh script during the installation process, as described in the Configuring OSG Attributes document of the CE Installation Guide.

You should review the Local Storage Configuration document again to verify that you are configured correctly with specific emphasis on the directory permissions for these areas.


Comments


Included topic: Validate BDII

Validating BDII

  • Check the GIP validator
  • Next, check that your system is reporting, using the following command. If your resource is a production resource, query is.grid.iu.edu instead of is-itb.grid.iu.edu
[user@client ~]$ ldapsearch -x -LLL -p 2170 -h is-itb.grid.iu.edu -b mds-vo-name=YOUR_SITE_NAME_HERE,mds-vo-name=local,o=grid
  • If the information is not being reported correctly, run the following on your CE
[user@client ~]$ $VDT_LOCATION/gip/bin/gip_info
If this information is correct, then your local GIP is configured correctly.
  • If your information is still not reported to BDII, you might have a problem with the CEMon publishing service. To troubleshoot your CEMon installation see the CEMon troubleshooting guide.


Included topic: Validate Clients


OSG Client

The OSG Client package provides a set of tools useful for user access to Grid services. Below are a series of tests you can perform, after installing the client software via the IntegrationClientInstallationGuide and sourcing the setup.(c)sh script, to verify the operability of the software.

Obtain a Grid Proxy: grid-proxy-init and voms-proxy-init

In the following examples it is assumed that your usercert.pem and userkey.pem files are in $HOME/.globus/. If not, you will need to specify them explicitly via "-cert" and "-key" arguments to the xxxx-proxy-init commands.

  • Obtain a grid-proxy via grid-proxy-init
    $ grid-proxy-init
    
    Your identity: /DC=org/DC=doegrids/OU=People/CN=XXX XXXXX #####
    Enter GRID pass phrase for this identity:
    Creating proxy ....................................... Done
    Your proxy is valid until: Wed Aug 22 04:57:38 2007
    
    

  • Obtain a voms-proxy via voms-proxy-init:
    $ voms-proxy-init -voms osg
    
    Cannot find file or dir: /home/condor/execute/dir_14135/userdir/glite/etc/vomses
    Enter GRID pass phrase:
    Your identity: /DC=org/DC=doegrids/OU=People/CN=XXX XXXXX #####
    Cannot find file or dir: /home/condor/execute/dir_14135/userdir/glite/etc/vomses
    Creating temporary proxy ................................ Done
    Contacting  voms.opensciencegrid.org:15027 [/DC=org/DC=doegrids/OU=Services/CN=host/voms.opensciencegrid.org] "osg" Done
    Creating proxy .................................... Done
    Your proxy is valid until Tue Oct 30 23:32:54 2007
    
    

Note: the apparent error message Cannot find file or dir: /home/condor/..... is not an error but an artifact of the software build used in the VDT. You should ignore that message.

Querying Resources

HELP NOTE
This section refers to the ldapsearch command. The ldapsearch command is not part of the standard OSG installation, but it is commonly found on many standard Linux installations. If you do not have it, it is available from the VDT. Install the OpenLDAP package using pacman with the following command:
pacman -get http://vdt.cs.wisc.edu/vdt_200_cache/:OpenLDAP

The client tools include utilities to query Grid resources. The examples used here to test the client are directed at the OSG-ITB repositories.

  • Query ReSS
    condor_status -pool osg-ress-4.fnal.gov -format '%s\n' GlueSiteName | uniq
    CIT_ITB_1
    ITB_INSTALL_TEST_2
    ITB_INSTALL_TEST_3
    CMS-BURT-ITB
    ...
    
  • Query BDII (see ValidateBDII documentation)
    ldapsearch -x -LLL -p 2170 -h is-itb.grid.iu.edu -b mds-vo-name=LBNL_VTB,mds-vo-name=local,o=grid
    dn: mds-vo-name=LBNL_VTB,mds-vo-name=local,o=grid
    objectClass: GlueTop
     
    dn: GlueSEUniqueID=osp1.lbl.gov,mds-vo-name=LBNL_VTB,mds-vo-name=local,o=grid
    ...
    
  • Query via the LCG tools (NOTE: lcg-info and lcg-infosites are not in one's path by default):
     $VDT_LOCATION/lcg/bin/lcg-info --list-ce --bdii is-itb.grid.iu.edu:2170 --vo osg
    - CE: cithep201.ultralight.org:2119/jobmanager-condor-osg
    - CE: cms-xen1.fnal.gov:2119/jobmanager-condor-osg
    ....
    
    $VDT_LOCATION/lcg/bin/lcg-info --list-se --bdii is-itb.grid.iu.edu:2170 --vo osg
    - SE: cit-se.ultralight.org
    - SE: cms-xen1.fnal.gov
    ... 
     
    $VDT_LOCATION/lcg/bin/lcg-infosites --vo osg -f ce  --is is-itb.grid.iu.edu 
    valor del bdii: is-itb.grid.iu.edu:2170
    #CPU    Free    Total Jobs      Running Waiting ComputingElement
    ----------------------------------------------------------
      40      40       0              0        0    osp1.lbl.gov:2119/jobmanager-pbs-batch
       2       0       0              0        0    tb10.grid.iu.edu:2119/jobmanager-condor-osg
    1042       2       0              0        0    itb.rcac.purdue.edu:2119/jobmanager-condor-osg
    ...
    
    $VDT_LOCATION/lcg/bin/lcg-infosites --vo osg -f se  --is is-itb.grid.iu.edu 
    Avail Space(Kb) Used Space(Kb)  Type    SEs
    ----------------------------------------------------------
    125             2               n.a     osp1.lbl.gov
    28824440        10638628        n.a     tb10.grid.iu.edu
    37              242             n.a     itb.rcac.purdue.edu
    ...
    

Simple Client Jobs

Some of the examples here are also documented in IntegrationCESimpleTest with an emphasis on observing the CE response via the log files. Here we simply note the response you should observe from the client. It is assumed that you have sourced the $VDT_LOCATION/setup.(c)sh script, have a valid grid proxy, and are aware of resources available to you.

Submit a remote job: globus-job-run

To the default (fork) queue:

$globus-job-run osp1.lbl.gov/jobmanager /bin/hostname
osp1.lbl.gov

To the local batch system queue: (specific syntax depends on what batch system is deployed [Condor|PBS|LSF|SGE])

$globus-job-run osp1.lbl.gov/jobmanager-pbs /bin/hostname
osp3.lbl.gov

File transfer: globus-url-copy, srmcp, uberftp

Copy a file using globus-url-copy from a local system to a remote gsiftp server.

$ls -l /tmp/testdata_10.dat
-rw-r--r--  1 qwerty qwerty 2121000 Nov 15 11:25 /tmp/testdata_10.dat

$globus-url-copy file:///tmp/testdata_10.dat gsiftp://osp4.lbl.gov:2811/tmp/testdata_destination.dat

Now copy it back and compare:

$globus-url-copy gsiftp://osp4.lbl.gov:2811/tmp/testdata_destination.dat  file:///tmp/testdata_return.dat
$ls -l /tmp/testdata_return.dat
-rw-r--r--  1 qwerty qwerty 2121000 Nov 15 11:27 /tmp/testdata_return.dat

$diff /tmp/testdata_10.dat /tmp/testdata_return.dat
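The round trip above can be reduced to a single pass/fail check (verify_roundtrip is a hypothetical wrapper around cmp):

```shell
# verify_roundtrip ORIGINAL RETURNED -- compare the original file with the
# copy fetched back from the remote gsiftp server.
verify_roundtrip() {
    if cmp -s "$1" "$2"; then
        echo "transfer OK"
    else
        echo "files differ"
    fi
}

# e.g.: verify_roundtrip /tmp/testdata_10.dat /tmp/testdata_return.dat
```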

You can repeat the above tests between two remote gsiftp servers using the syntax:

$globus-url-copy  gsiftp://server1:2811/filesystem/filename gsiftp://server2:2811/filesystem2/filename2

You can repeat the above tests using srmcp instead of globus-url-copy. In addition you can access a remote SRM service explicitly. For example:

$srmcp gsiftp://osp4.lbl.gov:2811/tmp/testdata.dat srm://osg-itb.ligo.caltech.edu:8643/pnfs/ligo.caltech.edu/data/star/testdata_dest.dat

You may also use the "-debug" flag, which will generate diagnostic messages. The srm-v2-client package is also included in the OSG Client installation, but its software is not added to one's path by default.

To test uberftp, simply point to a known resource:

$ uberftp osg-itb.ligo.caltech.edu
220 osg-itb.ligo.caltech.edu GridFTP Server 2.5 (gcc32dbg, 1182369948-63) ready.
230 User qwerty logged in.

Submit a remote job to WS GRAM: globusrun-ws

Typically, WS-GRAM is unused on the OSG as a client or server; we maintain the below test instructions for posterity's sake.

Basic submission using WS GRAM is done with globusrun-ws, analogous to globus-job-run with GRAM. A wide range of tests is shown in the ValidateGramWebServices documentation. Here is a basic test to verify that the client tools are in order.


$globusrun-ws -submit -F osp1.lbl.gov:9443 -Ft Fork -s -c /bin/hostname

Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:7bfd4324-93b7-11dc-8f33-00304889ddce
Termination time: 11/16/2007 20:15 GMT
Current job state: Active
Current job state: CleanUp-Hold
osp1.lbl.gov
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.

Condor-G submission

OSG Client includes a condor package for use as a Condor-G submit host. To test this service:

  • verify service is running:
     $condor_q
     
    -- Submitter: qwerty@osp3.lbl.gov : <128.3.30.238:59969> : osp3.lbl.gov
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
     
    0 jobs; 0 idle, 0 running, 0 held
    

  • prepare submit file for either GRAM or WS-GRAM
     $cat test.submit
    #specify resource destination
    Universe            = grid
    grid_resource       = gt4 https://hostname:9443 [Condor|PBS|SGE]
    
    # specify executable
    Executable          = /bin/hostname
    Transfer_Executable = false
     
    # copy stdout stderr to local files, referenced by job (Process) and submission id (Cluster)
    output  =  mytest.out.$(Cluster).$(Process)
    error   =    mytest.err.$(Cluster).$(Process)
    #a single local log file for tracking Condor-g submission
    log    = mytest.log
     
    # do not send email notification
    notification=Never
    
    # submit 2 identical jobs: Process=0 and Process=1
    Queue 2
    
  • Submit job
    $condor_submit test.submit 
    Submitting job(s)..
    Logging submit event(s)..
    2 job(s) submitted to cluster 51.
    
  • monitor jobs
    $condor_q
      
    -- Submitter: qwerty@osp3.lbl.gov : <128.3.30.238:59969> : osp3.lbl.gov
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
      51.0   qwerty         11/15 12:29   0+00:00:00 I  0   9.8  hostname          
      51.1   qwerty         11/15 12:29   0+00:00:00 I  0   9.8  hostname          
      52.0   qwerty         11/15 12:29   0+00:00:15 R 0   0.0  gridftp_wrapper.sh
     
    3 jobs; 2 idle, 1 running, 0 held
    
  • verify jobs completed
    $cat mytest.out.51.0
    osp3.lbl.gov
    

Worker node client

The worker node client is a subset of client tools assumed to be needed not by an interactive user, but by a batch job that lands on a worker node. These tools can be checked via the methods listed above where applicable. To validate, test the job-submission tools (globus-job-run and globusrun-ws) and the data transfer tools (globus-url-copy, srmcp, and uberftp) as described for the OSG Client package.


Included topic: Validate Squid


About this Document

This document is for System Administrators. It describes how to make sure that a site's squid server is working properly.

Introduction

A squid server can locally cache input data that user jobs download at a site, making subsequent downloads of the same data much faster. The installation instructions can be found on the SquidInstallation page.

Requirements

Prerequisite: Make sure squid is installed and the OSG_SQUID_LOCATION attribute is set in the osg-attributes.conf file. Otherwise run configure-osg to update the $VDT_LOCATION/osg/bin/osg-attributes.conf file with the OSG_SQUID_LOCATION attribute.

Validation Steps

To make sure the server is working:

  • Set http_proxy environment variable to proxyhost:3128 (where proxyhost is the name of your squid server) and export it
  • Use wget with debugging to get a cachable web url, for example
[user@client ~]$ wget -d -O/dev/null http://www.fnal.gov
  • The first time you run it you should see not far from the end of the output a line beginning "X-Cache: MISS"
  • Run the same command again and you should instead see a line beginning "X-Cache: HIT"
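Extracting the header from the debug output can be scripted (cache_status is our name; the proxy host is yours):

```shell
# cache_status: read `wget -d` output on stdin and print the squid result
# (MISS or HIT) from the X-Cache response header.
cache_status() {
    sed -n 's/.*X-Cache: \([A-Z]*\).*/\1/p' | head -n 1
}

# Usage:
#   export http_proxy=http://proxyhost:3128
#   wget -d -O /dev/null http://www.fnal.gov 2>&1 | cache_status
```

Expect MISS on the first run and HIT on the second, as described above.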

To verify that it is working properly from a worker node:

  • run a grid job on the worker node (such as with globus-job-run or condor_submit) with the following commands in it:
    • uname -a
    • export http_proxy=$OSG_SQUID_LOCATION
    • run the above wget command twice
  • Verify that the job ran on a worker node and that "X-Cache: HIT" is present in the output from the second wget


Included topic: Validate Gratia


Gratia Probe

  • Ensure the batch system for which you have installed a probe has completed jobs and at least 10 minutes has elapsed.
  • Then, visit http://gratia.opensciencegrid.org:%GratiaCollectorPort%/gratia-administration/monitor-status.html?sitename=my-site to check that your probe is reporting (e.g., working example: http://%GratiaHost%/gratia-administration/monitor-status.html?sitename=FNAL_FERMIGRID_ITB).
  • Note that this status page may show multiple probes, some of which may be obsolete. One way this can arise is that old installations may identify a probe as pbs-lsf:myhost.mydomain, whereas new installations identify a probe as pbs:myhost.mydomain or pbs-lsf:myhost.mydomain as appropriate. Please notify the Gratia team and we will correct the record.
  • If you get no entry at all, double-check by using the probe name (MeterName in the ProbeConfig file: usually "probe-type:dns-name"), viz:
    http://gratia.opensciencegrid.org:%GratiaCollectorPort%/gratia-administration/monitor-status.html?probename=probe-type:myhost.mydomain.
    The reason for the fallback is: if Gratia has seen a probe by this name before (previous installation, for instance) it will use the old site name for accounting purposes until we are notified otherwise. Working example link: http://gratia.opensciencegrid.org:%GratiaCollectorPort%/gratia-administration/monitor-status.html?probename=condor:fngp-osg.fnal.gov.
  • As a last resort, peruse the whole list of reporting probes.
  • If you are still unable to find evidence that your probe has contacted the collector, download gratia-site-diag and run it to diagnose possible problems. Report any problems immediately; see the Gratia troubleshooting instructions for more details.
  • Assuming you find evidence of recent contact using one of the methods above, you may wish to further verify the receipt of real records by obtaining a report from the Gratia reporting site, for example "Daily jobs by VO for site" from the main Gratia reporting service. Also, run the special SQL query "Probe / Local UID combinations corresponding to Unknown VO in the last week" and check for a large number of OSG-like user names being mapped to the Unknown VO. This indicates a problem either with the ProbeConfig knowing the location of the osg-user-vo-map.txt or with the reverse map file itself. Please investigate and report to the Gratia team.
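The status-page check above can be done from the command line. This is a sketch only: PORT and MY_SITE are hypothetical placeholders for your collector port (%GratiaCollectorPort% above) and registered site name, and the curl call is shown commented out because it requires network access:

```shell
# Hypothetical values: replace PORT with your Gratia collector port and
# MY_SITE with your registered site name.
port="PORT"
site="MY_SITE"
url="http://gratia.opensciencegrid.org:${port}/gratia-administration/monitor-status.html?sitename=${site}"
echo "$url"
# With network access, fetch the page and look for the site entry:
#   curl -s "$url" | grep -i "$site"
```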

Gratia Collector

  • Check that the reporting is operational: http://mygratiahost.example.com:port/gratia-reporting/.
  • Check that the administration page is operational: http://mygratiahost.example.com:port/gratia-administration/.
  • Configure a probe to send to the collector (set SOAPHost appropriately) and check the above pages for entries.
  • If there is no evidence of received data, check $VDT_LOCATION/tomcat/v55/logs/gratia-0.log and $VDT_LOCATION/tomcat/v55/logs/catalina.out. An "empty data set" exception and/or blank report page is OK; everything else should be reported to the Gratia team.

Validate Ress And Cemon


ReSS Classad Validation Mechanism

Introduction

In the Resource Selection Service (ReSS) model, sites advertise their characteristics to a central ReSS information collector in the form of old-format classads. These classads are produced at each site by the Computing Element Monitor (CEMon) service, which gathers site information from Generic Information Provider (GIP) scripts. This information is expressed as attributes conforming to the Glue Schema v1.2 or v1.3.

The Open Science Grid (OSG) has agreed on a minimal set of critical attributes that sites must correctly advertise in order for the OSG to consider the information "valid". Also, because the semantics of Glue Schema attributes are well defined, only certain ranges of values are acceptable for some attributes.

Following these considerations, the ReSS project, in collaboration with OSG, has developed a mechanism to validate site classads. This documentation explains how this is achieved and how the validation mechanism can be maintained.

Architecture

The ReSS central collector of site information is based on the condor collector server. The ReSS classad validation mechanism uses the ability of condor to evaluate logical expressions carried by the classad itself.

In more detail, the validation process is decomposed into a series of expressions, each of which tests a different characteristic of the classad. These expressions are implemented as classad attributes and are written using the condor classad expression grammar. In summary, expressions refer to other attributes in the classad and use typical classad operators such as '==', '<', '&&', '=!=', regexp(...), etc. We refer the reader to the condor manual for the details of this grammar.

The validation expressions are common to all classads i.e. to all sites. They are added to each classad by the ReSS Information Gatherer, an interface adaptor service that receives classads from the CEMon at each site and forwards them to the information collector. The maintenance of such expressions is therefore done centrally by changing the configuration of the information gatherer.

Validation expressions test different characteristics of the classad. Some test for the presence of attributes, for example the OSG critical attributes, with expressions such as

isClassadValidIsCriticalAttributeXPresent =  
    (ResourceSelection.AttributeX =!= UNDEFINED)

Others test that attribute values gathered by the GIP scripts are consistent with the semantics of the attributes. For example, the total number of CPUs in a cluster must be a positive number; this can be expressed with an expression like this

isClassadValidAreTotalCPUPositive = 
    ( ResourceSelection.GlueCEInfoTotalCPUs =!= UNDEFINED && 
      ResourceSelection.GlueCEInfoTotalCPUs > 0 )

All the validation expressions are combined with logical AND into a summary-level validation attribute. Ultimately, the evaluation of this attribute determines the validity of the classad.

It should be noted that typically each site is described by more than one classad. The classad multiplicity for each site is the product of the number of Clusters x CEs x SubClusters x Supported VOs. In principle, only some classads from a site may fail the validation test, for example if only part of the information system configuration is wrong. As of Nov 2007, however, we have never observed such an occurrence.

We refer to Appendix A for a complete list of validation expressions.

Evaluation of validation expressions

The condor system automatically evaluates a validation expression when the information collector is queried for the value of the corresponding validation attribute. For example, using the command line interface of the condor system, this can be achieved with the following commands:

% source /opt/vdt/setup.sh
% condor_status -pool osg-ress-1.fnal.gov \
     -format '%d\n' isClassadValid \
     -constraint 'GlueSiteName == "FNAL_FERMIGRID"' | uniq
1

This command queries the production OSG ReSS information collector (osg-ress-1.fnal.gov) and prints the value of the attribute "isClassadValid" for the site "FNAL_FERMIGRID". The output is piped to the unix "uniq" command so that a single value is displayed if all classads pass (value 1) or all fail (value 0) the test.

There are two products that display the results of these evaluations for all sites. One runs as an hourly cron job and displays the results as web pages ( Integration | Production ). The other is a validation script run via the RSV framework. As of OSG 0.8.0, the latter is part of the suite of site validation tests.

Maintenance of validation expressions

The Information Gatherer (IG) is the ReSS central service that receives classads from the CEMon at each site and forwards them to the information collector. The IG runs on the same machine where the information collector (condor_collector) also runs: for OSG integration osg-ress-4.fnal.gov; for OSG production osg-ress-1.fnal.gov.

Validation expressions can be added by editing the staticCondorClassadAttributes.data configuration file. Please refer to IG Customization of Global Site Parameters for details.

Appendix A: Validation expressions

As of Nov 2007, these are the expressions used to validate a ReSS classad

isClassadValidAreCrtiticalAttributesPresent = 
   ( ResourceSelection.GlueSiteName =!= UNDEFINED && 
     ResourceSelection.GlueHostApplicationSoftwareRunTimeEnvironment =!= UNDEFINED &&
     ResourceSelection.GlueHostNetworkAdapterInboundIP =!= UNDEFINED && 
     ResourceSelection.GlueHostNetworkAdapterOutboundIP =!= UNDEFINED && 
     ResourceSelection.GlueSubClusterTmpDir =!= UNDEFINED && 
     ResourceSelection.GlueSubClusterWNTmpDir =!= UNDEFINED )


isClassadValidAreImportantAttributesPresent = 
    ( ResourceSelection.GlueSubClusterPhysicalCPUs =!= UNDEFINED && 
      ResourceSelection.GlueSubClusterLogicalCPUs =!= UNDEFINED && 
      ResourceSelection.GlueCEStateStatus =!= UNDEFINED && 
      ResourceSelection.GlueCEInfoContactString =!= UNDEFINED )


isClassadValidAreStateSlotsAndCPUNonNegative = 
    ( ResourceSelection.GlueCEStateFreeCPUs =!= UNDEFINED && 
      ResourceSelection.GlueCEStateFreeCPUs >= 0 && 
      ResourceSelection.GlueCEStateFreeJobSlots =!= UNDEFINED && 
      ResourceSelection.GlueCEStateFreeJobSlots >= 0 && 
      ResourceSelection.GlueCEStateTotalJobs =!= UNDEFINED && 
      ResourceSelection.GlueCEStateTotalJobs >= 0 && 
      ResourceSelection.GlueCEStateWaitingJobs =!= UNDEFINED && 
      ResourceSelection.GlueCEStateWaitingJobs >= 0 && 
      ResourceSelection.GlueCEStateRunningJobs =!= UNDEFINED && 
      ResourceSelection.GlueCEStateRunningJobs >= 0 )


isClassadValidAreTotalSlotsAndCPUPositive = 
   ( ResourceSelection.GlueCEInfoTotalCPUs =!= UNDEFINED && 
     ResourceSelection.GlueCEInfoTotalCPUs > 0 && 
     ResourceSelection.GlueCEPolicyAssignedJobSlots =!= UNDEFINED && 
     ResourceSelection.GlueCEPolicyAssignedJobSlots > 0 )


isClassadValidIsCEHostNetAvailable = 
    ( ResourceSelection.GlueCEInfoHostName =!= UNDEFINED && 
      regexp("\.lan$", ResourceSelection.GlueCEInfoHostName) != 1 && 
      regexp("\.localhost$", ResourceSelection.GlueCEInfoHostName) != 1 && 
      regexp("\.localdomain$", ResourceSelection.GlueCEInfoHostName) != 1 && 
      regexp("\.local$", ResourceSelection.GlueCEInfoHostName) != 1 && 
      regexp("\.internal$", ResourceSelection.GlueCEInfoHostName) != 1 )


isClassadValid = 
    ( isClassadValidAreCrtiticalAttributesPresent && 
      isClassadValidAreImportantAttributesPresent && 
      isClassadValidAreTotalSlotsAndCPUPositive &&
      isClassadValidAreStateSlotsAndCPUNonNegative && 
      isClassadValidIsCEHostNetAvailable )

-- GabrieleGarzoglio - 10 Jun 2008


Included topic: Validate Wlcg Interoperability


The definition of a validated, interoperable Storage Element (SE) will not be finalized until the OSG 1.0 cycle. The following steps are the bare minimum required to validate a Compute Element (CE).

  • Interoperability staff can verify:
    1. Does the output from CEMon match what is served through the BDII?

  • If the above steps are validated, an OPS VO member with access to a gLite WN can verify the following:
    1. Does glite-job-list-match return the site if the sole requirement is
      other.GlueCEInfoHostName == "foo.bar.edu"
    2. Does a simple "hello world" job submitted through the WMS succeed?

Further tests to determine validation are currently being developed.
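The match test above can be driven by a minimal JDL file. This is a sketch only, assuming a gLite UI with glite-job-list-match available; foo.bar.edu is the example CE host from the requirement above, and the final command is shown as a comment since it needs a working gLite installation:

```shell
# Sketch: write a minimal JDL whose sole requirement is the CE hostname.
cat > match.jdl <<'EOF'
Executable   = "/bin/hostname";
Requirements = other.GlueCEInfoHostName == "foo.bar.edu";
EOF
cat match.jdl
# Then, on the gLite UI, list the resources matching the requirement:
#   glite-job-list-match match.jdl
```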



Comments

PM2RPM?_TASK = CE RobertEngel 28 Aug 2011 - 06:21

Topic revision: r26 - 15 Feb 2012 - 21:00:27 - KyleGross