Please note: This documentation is for OSG 1.2. While we still provide critical security updates for OSG Software 1.2, we recommend you use OSG Software 3 for any new or updated installations. We are considering May 31, 2013 as a possible OSG 1.2 End of Life (EOL) date.

Compute Element Installation Guide

About this Document

This document is for System Administrators. It covers the installation of a Compute Element to be integrated into the Open Science Grid. This document applies to the latest release, OSG-1.2.32. In general, all procedures presented in this document require root privileges.

Conventions used in this document:

A User Command Line is illustrated by a green box that displays a prompt:

  [user@ce /opt/osg-1.2.32]$

A Root Command Line is illustrated by a red box that displays the root prompt:

  [root@ce /opt/osg-1.2.32]$

Lines in a file are illustrated by a yellow box that displays the desired lines in a file:

priorities=1

Engineering Considerations

A Compute Element is a selection of software provided by the Virtual Data Toolkit, bundled for use on the Open Science Grid. A Compute Element provides the services that allow grid users to run jobs on a grid resource. The Site Planning Guide will help you avoid many common problems when installing a new site.

After successfully completing this procedure, you should proceed to install the Worker Node Client.

Requirements

The CE enables a cluster to receive grid jobs, but this document assumes that you have a working cluster and are familiar with its management. If this is not the case, you may check the OSG Tier 3 documentation, a simpler guide for system administrators new to clusters and High Throughput Computing.

  1. new System Administrators should read the Site Planning Guide
  2. execute all steps to prepare your Compute Element
  3. verify that your Operating System is supported
  4. a Pacman installation of version 3.28 or later is required
  5. a valid grid host certificate is required
  6. to test and use the installation a valid grid user certificate is required

How to get Help?

To get assistance please use this page.

Installation Procedure

The installation procedure consists of the following steps:

  1. deactivate an existing installation
  2. create an installation directory
  3. use Pacman to install the Compute Element
  4. install the CA Certificates and the Certificate Revocation List
  5. connect GRAM to managedfork and your batch system back-end
  6. execute the post installation script

Deactivate an existing Installation (Optional)

Before you proceed you must deactivate any existing installation to avoid leftover processes. Make sure to start the installation with a clean environment: if $VDT_LOCATION is set in your environment, you probably need to log in again and make sure that no setup file is sourced by your login script.

Reuse an existing Configuration (Optional)

This guide describes how to re-use the configuration from an existing installation.

Create the Installation Directory

Create an installation directory and change into it. Make sure the directory is world readable if the installation is to be shared by grid users:

[root@ce ~]$ mkdir -p /opt/osg-1.2.32
[root@ce ~]$ cd /opt/osg-1.2.32

ALERT! WARNING!
Please do not use a system directory like /opt or /usr for the installation directory. The installation routine will create many sub-directories in the main directory.

Use Pacman to Install the Compute Element Software

In the next step we will use the pacman command line tool to install the package from the OSG software cache. Pacman will ask whether you want to "trust the caches and accept the license"; answer yall and y to install the Compute Element.

[root@ce /opt/osg-1.2.32]$ pacman -get http://software.grid.iu.edu/osg-1.2:ce
Do you want to add [http://software.grid.iu.edu/osg-1.2] to [trusted.caches]? (y/n/yall): yall
Beginning VDT prerequisite checking script vdt-common/vdt-prereq-check...       

All prerequisite checks are satisfied.
                                                          


========== IMPORTANT ==========
Most of the software installed by the VDT *will not work* until you install
certificates.  To complete your CA certificate installation, see the notes
in the post-install/README file.

                                                                              
Pacman Installation of OSG-1.2.32 Complete
 

HELP NOTE
Depending on your network connection and hardware capabilities, the installation process will take between 5 and 30 minutes to complete. Meanwhile you can tail the main installation logfile /opt/osg-1.2.32/vdt-install.log.
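
For example, to follow the installation progress from a second terminal:

[root@ce /opt/osg-1.2.32]$ tail -f /opt/osg-1.2.32/vdt-install.log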

ALERT! WARNING!
Do not continue if the pacman command indicated an error! In this case review the main installation log file and consult the help page.

Update the Environment

Depending on your shell update your environment by sourcing /opt/osg-1.2.32/setup.sh or /opt/osg-1.2.32/setup.csh:

[root@ce /opt/osg-1.2.32]$ . /opt/osg-1.2.32/setup.sh

Depending on your preference, you may optionally include the setup script in your system or user profile.
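
For example, for the Bourne shell you could append the setup line to root's shell profile (this is just one way to do it; adjust the target file to your local conventions):

[root@ce /opt/osg-1.2.32]$ echo '. /opt/osg-1.2.32/setup.sh' >> /root/.bashrc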

Install a Certificate Authority Package

Before you proceed to install a Certificate Authority Package you should decide which of the available packages to install. Choose according to the resource group you selected earlier during the resource registration process with the OSG Information Management System.

HELP NOTE
If in doubt, please consult the policies of your home institution and get in contact with the Security Team. You may inspect and remove CA certificates after the installation process has completed!

Next decide at what location to install the Certificate Authority Package:

  1. on the local file system beneath the current installation directory /opt/osg-1.2.32
  2. on the root file system in a system directory /etc/grid-security/certificates
  3. in a custom directory that can also be shared

The instructions below illustrate each method using the Certificate Authority Package used on the Open Science Grid. Choose a package by changing the --url argument provided to vdt-ca-manage.

Local Installation of the CA Certificates

This local installation of the Certificate Authority Package is preferably used by grid users without root privileges, or if the CA certificates will not be shared by other VDT installations on the same host.

[root@ce /opt/osg-1.2.32]$ vdt-ca-manage setupca --location local --url osg
Setting CA Certificates for VDT installation at '/opt/osg-1.2.32'

Setup completed successfully.

After a successful installation the certificates will be installed in $VDT_LOCATION/globus/share/certificates (/opt/osg-1.2.32/globus/share/certificates in this example).
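
As a quick sanity check you can count the installed CA files, which carry hashed names ending in .0:

[root@ce /opt/osg-1.2.32]$ ls /opt/osg-1.2.32/globus/share/certificates/*.0 | wc -l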

Root Installation

This root installation of the Certificate Authority Package is preferably used if the CA certificates will be shared by several VDT installations on the same host. This installation always requires root privileges because the CA package is installed in a system directory.

[root@ce /opt/osg-1.2.32]$ vdt-ca-manage setupca --location root --url osg
Setting CA Certificates for VDT installation at '/opt/osg-1.2.32'

Setup completed successfully.

After a successful installation the certificates will be installed in /etc/grid-security/certificates.

Custom Installation

This custom installation of the Certificate Authority Package is preferably used if the CA certificates will be shared by several VDT installations on different hosts.

[root@ce /opt/osg-1.2.32]$ vdt-ca-manage setupca --location /mnt/nfs --url osg
Setting CA Certificates for VDT installation at '/opt/osg-1.2.32'

Setup completed successfully.

After a successful installation the certificates will be installed in /mnt/nfs/certificates.

Provide and Install a custom CA Package

If a custom Certificate Authority certificates package has been made available on a web server, it can be installed by passing its URL to the --url option of vdt-ca-manage:

[root@ce /opt/osg-1.2.32]$ vdt-ca-manage setupca --location /mnt/nfs --url <url to custom CA Package>
Setting CA Certificates for VDT installation at '/opt/osg-1.2.32'

Setup completed successfully.

After a successful installation the certificates provided at the given --url will be installed in /mnt/nfs/certificates.

Enable Updates of the CA Certificates

CA certificates have a limited lifetime and will expire. To keep the installed certificates current it is necessary to update them automatically using the vdt-update-certs service provided by the Virtual Data Toolkit.

To enable the service use:

[root@ce /opt/osg-1.2.32]$ vdt-control --enable vdt-update-certs
running 'vdt-register-service --name vdt-update-certs --enable'... ok

Enable Updates of the Certificate Revocation List

The Certificate Revocation List lists certificates that have been temporarily or permanently revoked. To keep the CRL current it is necessary to update it automatically using fetch-crl provided by the Virtual Data Toolkit:

[root@ce /opt/osg-1.2.32]$ vdt-control --enable fetch-crl
running 'vdt-register-service --name fetch-crl --enable'... ok
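
To confirm that both update services are now registered and enabled, you can list them (vdt-control --list is described later in this guide):

[root@ce /opt/osg-1.2.32]$ vdt-control --list | grep -E 'vdt-update-certs|fetch-crl'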

Connect GRAM to Managed Fork and your Batch System Backend

The Grid Resource Allocation Manager connects to the batch system using a Job Manager. By default only the fork job manager is supported, which allows grid users to execute jobs on the Gatekeeper. To connect GRAM to an existing batch system, you need to install the corresponding job manager.

Define Globus Port Ranges

GRAM and GridFTP use TCP ports to handle connection requests. The range of TCP ports that will be used by the Compute Element is defined by two environment variables: $GLOBUS_TCP_PORT_RANGE for inbound and $GLOBUS_TCP_SOURCE_RANGE for outbound connections, respectively.

The range of ports required depends on the number of job slots your Compute Element provides to grid users. Each job slot in use requires 2 TCP ports within the port range; for example, a Compute Element offering 500 job slots needs a range of at least 1000 ports.

The two variables are defined in /opt/osg-1.2.32/vdt/etc/vdt-local-setup.sh for the Bourne Shell:

# This file is sourced by setup.sh.  Use it for any custom setup for this site.
# This file will be preserved across VDT installations if OLD_VDT_LOCATION is set.

# Set GLOBUS_TCP_PORT_RANGE to define communication ports for inbound connections.
export GLOBUS_TCP_PORT_RANGE=<begin port>,<end port>

# Set GLOBUS_TCP_SOURCE_RANGE to define communication ports for outbound connections.
export GLOBUS_TCP_SOURCE_RANGE=<begin port>,<end port>

# Set GLOBUS_TCP_PORT_RANGE_STATE_FILE to the location where Globus should record
# TCP port usage for inbound connections in case of a stateful firewall.
export GLOBUS_TCP_PORT_RANGE_STATE_FILE=<location on file system>

# Set GLOBUS_TCP_SOURCE_RANGE_STATE_FILE to the location where Globus should record
# TCP port usage for outbound connections in case of a stateful firewall.
export GLOBUS_TCP_SOURCE_RANGE_STATE_FILE=<location on file system>

and for the TC Shell in /opt/osg-1.2.32/vdt/etc/vdt-local-setup.csh :

# This file is sourced by setup.csh.  Use it for any custom setup for this site.
# This file will be preserved across VDT installations if OLD_VDT_LOCATION is set.

# Set GLOBUS_TCP_PORT_RANGE to define communication ports for inbound connections.
setenv GLOBUS_TCP_PORT_RANGE <begin port>,<end port>

# Set GLOBUS_TCP_SOURCE_RANGE to define communication ports for outbound connections.
setenv GLOBUS_TCP_SOURCE_RANGE <begin port>,<end port>

# Set GLOBUS_TCP_PORT_RANGE_STATE_FILE to the location where Globus should record
# TCP port usage for inbound connections in case of a stateful firewall.
setenv GLOBUS_TCP_PORT_RANGE_STATE_FILE <location on file system>

# Set GLOBUS_TCP_SOURCE_RANGE_STATE_FILE to the location where Globus should record
# TCP port usage for outbound connections in case of a stateful firewall.
setenv GLOBUS_TCP_SOURCE_RANGE_STATE_FILE <location on file system>

It may be necessary to limit the Linux ephemeral port range so that it does not overlap the Globus ports defined above. Please check the /etc/sysctl.conf file for the following lines and insert them if needed:

# Limit ephemeral ports to avoid globus TCP port range
# See OSG CE install guide
net.ipv4.ip_local_port_range = 10240 <begin port minus 1>

Execute sysctl as the root user for these settings to take effect:

[root@ce /opt/osg-1.2.32]$ sysctl -p
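
As an illustration only, a site that reserves the hypothetical range 40000-41000 for Globus might end up with the following values; substitute a range that matches your firewall policy and the number of job slots:

export GLOBUS_TCP_PORT_RANGE=40000,41000
export GLOBUS_TCP_SOURCE_RANGE=40000,41000

# in /etc/sysctl.conf, keep ephemeral ports below the Globus range
net.ipv4.ip_local_port_range = 10240 39999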

ALERT! WARNING!
Reserving an insufficient range of TCP ports will make grid jobs hang randomly! Do not set the port ranges by editing the xinetd configuration files!

HELP NOTE
Please proceed to configure your firewall accordingly.

Instructions for Managed Fork

The default job manager fork allows grid users to run arbitrarily many jobs on the Gatekeeper. This may result in dangerous load values which may eventually render the Gatekeeper unusable. The managedfork job manager routes incoming fork requests through Condor instead, which can be used to restrict the number of jobs executing on the Gatekeeper.

Before you proceed to install the managedfork job manager, you may define two environment variables that point to an existing Condor installation and its configuration file. Otherwise skip this step:

[root@ce /opt/osg-1.2.32]$ export VDTSETUP_CONDOR_LOCATION=<Path to Condor>
[root@ce /opt/osg-1.2.32]$ export VDTSETUP_CONDOR_CONFIG=<Path to Condor configuration file>

Then continue to use pacman to install the managedfork job manager:

[root@ce /opt/osg-1.2.32]$ pacman -get http://software.grid.iu.edu/osg-1.2:ManagedFork

By default managedfork does not restrict the execution of jobs forked on the gatekeeper. Restrictions can be defined using the Condor configuration file $CONDOR_CONFIG:

START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 5 || GridMonitorJob =?= TRUE

HELP NOTE
Match TotalLocalJobsRunning to the capabilities of your gatekeeper. You must execute condor_reconfig as root for the changes to take effect.
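
For example:

[root@ce /opt/osg-1.2.32]$ condor_reconfig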

Instructions for Condor

Before you proceed to install the Condor job manager, you must define two environment variables that point to the existing Condor installation and its configuration file:

[root@ce /opt/osg-1.2.32]$ export VDTSETUP_CONDOR_LOCATION=<Path to Condor>
[root@ce /opt/osg-1.2.32]$ export VDTSETUP_CONDOR_CONFIG=<Path to Condor configuration file>

Then continue to use pacman to install the Condor job manager:

[root@ce /opt/osg-1.2.32]$ pacman -get http://software.grid.iu.edu/osg-1.2:Globus-Condor-Setup

By default the Condor job manager requires all jobs to be run on a Worker Node of the same architecture as the Gatekeeper. Disable this feature if some or all worker nodes use a different architecture by commenting out the following line in the file /opt/osg-1.2.32/globus/lib/perl/Globus/GRAM/JobManager/condor.pm:

#    $requirements .= " && Arch == \"" . $description->condor_arch() . "\" ";

Next, optimize Gratia probe performance by changing the directory used to record the job history. The location can be changed in the file $VDTSETUP_CONDOR_CONFIG using the variable PER_JOB_HISTORY_DIR:

PER_JOB_HISTORY_DIR = /opt/osg-1.2.32/gratia/var/data

Finalize the changes by running condor_reconfig:

[root@ce /opt/osg-1.2.32]$ condor_reconfig

Instructions for PBS

Before you proceed make sure that the $PATH environment variable contains the path to the binaries for the Portable Batch System. Then use pacman to install the PBS job manager:

[root@ce /opt/osg-1.2.32]$ pacman -get http://software.grid.iu.edu/osg-1.2:Globus-PBS-Setup

If the PBS binaries are not in $PATH, the installation will silently fail and the error will be reported only to /opt/osg-1.2.32/vdt-install.log. In this case you must remove the Globus-PBS-Setup package, correct the $PATH environment variable, and install the PBS job manager again. To remove the Globus-PBS-Setup package use:

[root@ce /opt/osg-1.2.32]$ pacman -remove http://software.grid.iu.edu/osg-1.2:Globus-PBS-Setup
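
Before re-installing, you can quickly confirm that the PBS client tools are now visible in $PATH (assuming the usual binary names):

[root@ce /opt/osg-1.2.32]$ which qsub qstat pbsnodes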

Instructions for SGE

Before you proceed make sure that the $PATH environment variable contains the path to the binaries for Sun Grid Engine. Then use pacman to install the SGE job manager:

[root@ce /opt/osg-1.2.32]$ pacman -get http://software.grid.iu.edu/osg-1.2:Globus-SGE-Setup

Instructions for LSF

Before you proceed make sure that the $PATH environment variable contains the path to the binaries for Platform LSF. Then use pacman to install the LSF job manager:

[root@ce /opt/osg-1.2.32]$ pacman -get http://software.grid.iu.edu/osg-1.2:Globus-LSF-Setup

Enable Log-file Rotation (Optional)

Optionally enable the rotation of all log-files using vdt-control:

[root@ce /opt/osg-1.2.32]$ vdt-control --enable vdt-rotate-logs

Post-Install Procedure

To finalize your installation you must run the vdt-post-install program on the command line:

[root@ce /opt/osg-1.2.32]$ vdt-post-install
Starting...
Configuring PRIMA... Done.
Configuring EDG-Make-Gridmap... Done.
Configuring PRIMA-GT4... Done.
Done.

Configuration Procedure

The procedure to configure an installation of the Virtual Data Toolkit consists of three steps:

  1. edit the configuration file /opt/osg-1.2.32/osg/etc/config.ini
  2. verify the configuration file /opt/osg-1.2.32/osg/etc/config.ini using /opt/osg-1.2.32/osg/bin/configure-osg
  3. run the configuration script /opt/osg-1.2.32/osg/bin/configure-osg

The final steps are:

  • set attributes in config.ini as described below
  • run post installation for standard services (GIP, Gratia, CEMon, Authentication, RSV, and WS-GRAM)
  • configure any optional services (Squid, MonALISA)
  • copy /opt/osg-1.2.32/osg/etc/locations/grid3-locations.txt to $OSG_APP/etc/grid3-locations.txt
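
For the last item a straightforward copy works, assuming $OSG_APP already points at your shared application area:

[root@ce /opt/osg-1.2.32]$ mkdir -p $OSG_APP/etc
[root@ce /opt/osg-1.2.32]$ cp /opt/osg-1.2.32/osg/etc/locations/grid3-locations.txt $OSG_APP/etc/grid3-locations.txt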

Edit the Configuration File

Syntax and formatting rules for the configuration file can be found here. An exhaustive discussion of configuration options can be found in the reference guide. Refer to the Quick Install Guide for a minimal number of changes required for config.ini.

Default Information

The first section in the configuration file /opt/osg-1.2.32/osg/etc/config.ini is used to specify default attributes to be used in the remainder of the file. Please fill in the fully qualified domain name of your resource and the administrator e-mail.

[DEFAULT]
; Use this section to define variables that will be used in other sections
; For example, if you define a variable called dcache_root here
; you can use it in the gip section as %(dcache_root)s  (e.g.
; my_vo_1_dir = %(dcache_root)s/my_vo_1
; my_vo_2_dir = %(dcache_root)s/my_vo_2

; Defaults, please don't modify these variables
unavailable = UNAVAILABLE
default = UNAVAILABLE

; Name these variables disable and enable rather than disabled and enabled
; to avoid infinite recursions
disable = False
enable = True

; You can modify the following and use them
localhost = ce.opensciencegrid.org
admin_email = admin@ce.opensciencegrid.org

Site Information

The attributes in the section labeled [Site Information] require that you have registered your resource with the OSG Information Management System. All attributes in this section are required and must be set.

The latitude and longitude for your resource location can be found using this external tool, which is not affiliated with the Open Science Grid.

[Site Information]
; The group option indicates the group that the OSG site should be listed in,
; for production sites this should be OSG, for vtb or itb testing it should be
; OSG-ITB
;
; YOU WILL NEED TO CHANGE THIS
group = OSG-ITB

; The host_name setting should give the host name of the CE  that is being
; configured, this setting must be a valid dns name that resolves
;
; YOU WILL NEED TO CHANGE THIS
host_name = %(localhost)s

; The resource setting should be set to the same value as used in the OIM
; registration at the goc
;
; YOU WILL NEED TO CHANGE THIS
resource = LIGO_CIT


; The resource_group setting should be set to the same value as used in the OIM
; registration at the goc
;
; YOU WILL NEED TO CHANGE THIS
resource_group = LIGO-CIT-ITB

; The sponsor setting should list the sponsors for your cluster, if your cluster
; has multiple sponsors, you can separate them using commas or specify the
; percentage using the following format 'osg, atlas, cms' or
; 'osg:10, atlas:45, cms:45'
;
; YOU WILL NEED TO CHANGE THIS
sponsor = LIGO

; The site_policy setting should give an url that lists your site's usage
; policy
site_policy = %(unavailable)s

; The contact setting should give the name of the admin/technical contact
; for the cluster
;
; YOU WILL NEED TO CHANGE THIS
contact = <Full Name of System Administrator>

; The email setting should give the email address for the technical contact
; for the cluster
;
; YOU WILL NEED TO CHANGE THIS
email = %(admin_email)s

; The city setting should give the city that the cluster is located in
;
; YOU WILL NEED TO CHANGE THIS
city = Pasadena

; The country setting should give the country that the cluster is located in
;
; YOU WILL NEED TO CHANGE THIS
country = USA

; The longitude setting should give the longitude for the cluster's location
; if you are in the US, this should be negative
; accepted values are between -180 and 180
;
; YOU WILL NEED TO CHANGE THIS
longitude = -118.123874

; The latitude setting should give the latitude for the cluster's location
; accepted values are between -90 and 90
;
; YOU WILL NEED TO CHANGE THIS
latitude = 34.13647

Batch System Information

The configuration file contains sections for each supported batch system. Please fill in the information for the batch system used on your resource.

ALERT! WARNING!
If no batch system is enabled, the current release of configure-osg will fail to configure the CE and not give you any warning or indication of this!

[PBS]
; This section has settings for configuring your CE for a PBS job manager

; The enabled setting indicates whether you want your CE to use a PBS job
; manager
; valid answers are True or False
enabled = %(disable)s

; The home setting should give the location of the pbs install directory
home = %(unavailable)s

; The pbs_location setting should give the location of pbs install directory
; This should be the same as the home setting above
pbs_location = %(home)s

; The job_contact setting should give the contact string for the jobmanager
; on this CE (e.g. host.name/jobmanager-pbs)
job_contact = %(localhost)s/jobmanager-pbs

; The util_contact should give the contact string for the default jobmanager
; on this CE (e.g. host.name/jobmanager)
util_contact = %(localhost)s/jobmanager

; The wsgram setting should be set to True or False depending on whether you
; wish to enable wsgram on this CE
wsgram = %(disable)s

[Condor]
; This section has settings for configuring your CE for a Condor job manager

; The enabled setting indicates whether you want your CE to use a Condor job
; manager
; valid answers are True or False
enabled = %(disable)s

; The condor_location setting should give the location of condor install directory
condor_location = %(unavailable)s

; The condor_config setting should give the location of the condor config file,
; This is typically  etc/condor_config within the condor install directory.
; If you leave this set to %(unavailable)s, configure-osg will attempt to
; determine the correct value.
condor_config = %(unavailable)s

; The job_contact setting should give the contact string for the jobmanager
; on this CE (e.g. host.name/jobmanager-condor)
job_contact = %(localhost)s/jobmanager-condor

; The util_contact should give the contact string for the default jobmanager
; on this CE (e.g. host.name/jobmanager)
util_contact = %(localhost)s/jobmanager

; The wsgram setting should be set to True or False depending on whether you
; wish to enable wsgram on this CE
wsgram = %(disable)s

[SGE]
; This section has settings for configuring your CE for a SGE job manager

; The enabled setting indicates whether you want your CE to use a SGE job
; manager
; valid answers are True or False
enabled = %(disable)s

; The sge_root setting should give the location of sge install directory
;
; The VDT will bootstrap your SGE environment by sourcing
;   $SGE_ROOT/$SGE_CELL/common/settings.sh
; where $SGE_ROOT and $SGE_CELL are the values given for sge_root and sge_cell.
sge_root = %(unavailable)s

; The sge_cell setting should be set to the value of $SGE_CELL for your SGE
; install.
sge_cell = %(unavailable)s


; The job_contact setting should give the contact string for the jobmanager
; on this CE (e.g. host.name/jobmanager-sge)
job_contact = %(localhost)s/jobmanager-sge

; The util_contact should give the contact string for the default jobmanager
; on this CE (e.g. host.name/jobmanager)
util_contact = %(localhost)s/jobmanager

; The wsgram setting should be set to True or False depending on whether you
; wish to enable wsgram on this CE
wsgram = %(disable)s

[LSF]
; This section has settings for configuring your CE for a LSF job manager

; The enabled setting indicates whether you want your CE to use a LSF job
; manager
; valid answers are True or False
enabled = %(disable)s

; The home setting should give the location of the lsf install directory
home = %(unavailable)s

; The lsf_location setting should give the location of lsf install directory
; This should be the same as the home setting above
lsf_location = %(home)s

; The job_contact setting should give the contact string for the jobmanager
; on this CE (e.g. host.name/jobmanager-lsf)
job_contact = %(localhost)s/jobmanager-lsf

; The util_contact should give the contact string for the default jobmanager
; on this CE (e.g. host.name/jobmanager)
util_contact = %(localhost)s/jobmanager

; The wsgram setting should be set to True or False depending on whether you
; wish to enable wsgram on this CE
wsgram = %(disable)s

[Managed Fork]
; The enabled setting indicates whether managed fork is in use on the system
; or not. You should set this to True or False
enabled = %(disable)s

Accounting and Information Services

The [Gratia] and [CEMon] sections configure accounting and information services, respectively. For each finished grid job on your Compute Element, Gratia sends a record to the central OSG server; this ensures that usage of your site is fairly accounted for. CEMon periodically runs a program called GIP, saving the resulting information about your site and uploading it to various servers.

CEMon

The [Cemon] section in the resource configuration file /opt/osg-1.2.32/osg/etc/config.ini defines the variables ress_servers and bdii_servers. The default values are chosen according to your type of installation and do not need to be changed.

Gratia

The [Gratia] section in the resource configuration file /opt/osg-1.2.32/osg/etc/config.ini contains the probes attribute, which can be used to define the list of probes to be run. The default values are chosen according to your type of installation and do not need to be changed.

Storage Information

Use the [Storage] section to define the mass storage provided by your site.

  • grid_dir: the top-level directory of the worker node client on the worker nodes.
  • app_dir: shared directory where VO applications will be installed; also known as $OSG_APP
  • data_dir: shared directory where VOs may save their data files; also known as $OSG_DATA
  • worker_node_tmp: location of a non-shared directory on the worker nodes where VOs can write their scratch files; also known as $OSG_WN_TMP

[Storage]
;
; Several of these values are constrained and need to be set in a way
; that is consistent with one of the OSG storage models
;
; Please refer to the OSG release documentation for an indepth explanation
; of the various storage models and the requirements for them

; If you have a SE available for your cluster and wish to make it available
; to incoming jobs, set se_available to True, otherwise set it to False
se_available = %(disable)s

; If you indicated that you have an se available at your cluster, set default_se to
; the hostname of this SE, otherwise set default_se to UNAVAILABLE
default_se = %(unavailable)s

; The grid_dir setting should point to the directory which holds the files
; from the OSG worker node package, it should be visible on all of the computer
; nodes (read access is required, worker nodes don't need to be able to write)
; 
; YOU WILL NEED TO CHANGE THIS
grid_dir = %(unavailable)s

; The app_dir setting should point to the directory which contains the VO
; specific applications, this should be visible on both the CE and worker nodes
; but only the CE needs to have write access to this directory
; 
; YOU WILL NEED TO CHANGE THIS
app_dir = /mnt/nfs/osg/app

; The data_dir setting should point to a directory that can be used to store
; and stage data in and out of the cluster.  This directory should be readable
; and writable on both the CE and worker nodes
; 
; YOU WILL NEED TO CHANGE THIS
data_dir = /mnt/nfs/osg/data

; The worker_node_temp directory should point to a directory that can be used
; as scratch space on compute nodes, it should allow read and write access on the
; worker nodes but can be local to each worker node
; 
; YOU WILL NEED TO CHANGE THIS
worker_node_temp = /tmp

; The site_read setting should be the location or url to a directory that can
; be read to stage in data, this is an url if you are using a SE
; 
; YOU WILL NEED TO CHANGE THIS
site_read = %(unavailable)s

; The site_write setting should be the location or url to a directory that can
; be write to stage out data, this is an url if you are using a SE
;
; YOU WILL NEED TO CHANGE THIS
site_write = %(unavailable)s

If you'll be advertising an SE associated with your resource, you should make sure to add the appropriate entries to your ini file as well.

Enable Full Privilege Authorization

The [Misc Services] section in the configuration file /opt/osg-1.2.32/osg/etc/config.ini defines the authorization method used on the site. To enable full privilege authorization, provide the fully qualified domain name of the GUMS server. To be clear, the PRIMA software has two different modes of authorization, which in the config.ini file are referred to as prima (SAML-based authorization messages) and xacml (XACML-based authorization messages).

The attribute authorization_method describes which method to use for authorization:

  • gridmap for local authorization
  • prima for GUMS version 1.3 or earlier
  • xacml for GUMS version >= 1.3 (note xacml is not tested at high rate yet and prima also works just fine with GUMS version 1.3).

[Misc Services]
; If you have glexec installed on your worker nodes, enter the location
; of the glexec binary in this setting
glexec_location = %(unavailable)s

; If you wish to use the ca certificate update service, set this setting to True,
; otherwise keep this at false
; Please note that as of OSG 1.0, you have to use the ca cert updater or the RPM
; updates, pacman can not update the ca certs
use_cert_updater = %(disable)s

; This setting should be set to the host used for gums host.
; If your site is not using a gums host, you can set this to %(unavailable)s
gums_host = ce.opensciencegrid.org

; This setting should be set to one of the following: gridmap, prima, xacml
; to indicate whether gridmap files, prima callouts, or prima callouts with xacml
; should be used
authorization_method = prima

; This setting indicates whether the osg index page generation will be run,
; by default this is not run
enable_webpage_creation = %(disable)s

Configuration of the Generic Information Provider (GIP)

One important aspect of a site being part of the Open Science Grid is the ability of the Compute Element to describe the site to external users. The Generic Information Provider generates site information and provides it in the GLUE schema.

GIP collects information about several aspects of your site, including its hardware composition, the batch system, and associated storage. It relies on the resource configuration file /opt/osg-1.2.32/osg/etc/config.ini to collect this information.

The following sections detail information about the options in the [GIP], [Subcluster], and [SE] sections of the configuration file.

For Sites with Multiple Compute Elements

If you would like to properly advertise multiple Compute Elements per resource, make sure to:

  • set the value of cluster_name in the [GIP] section to be the same for each CE
  • set the value of other_ces in the [GIP] section to be the hostnames of the other CEs at your site; this should be comma-separated. So, if you have two CEs, ce1.example.com and ce2.example.com, the value of other_ces on ce1.example.com should be "ce2.example.com". This assumes that the same queues are visible on each CE.
  • set the value of resource_group in the "Site Information" section to be the same for each CE.
  • the value of resource in the "Site Information" section should be unique to each CE.
  • use the exact same configuration values for the [GIP], SE*, and Subcluster* sections on each CE.

It is good practice to run "diff" between the /opt/osg-1.2.32/osg/etc/config.ini files of the different CEs. The only differences should be the value of localhost in the [DEFAULT] section and the value of other_ces in the [GIP] section.
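
One hypothetical way to do this, assuming you have copied the other CE's configuration file to /tmp/config.ini.ce2 on this host:

[root@ce /opt/osg-1.2.32]$ diff /opt/osg-1.2.32/osg/etc/config.ini /tmp/config.ini.ce2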

Subcluster Configuration

Another aspect of the Generic Information Provider is to advertise the physical hardware a job will encounter when submitted to your site. This information is provided to GIP in the [Subcluster <name>] section of the resource configuration file /opt/osg-1.2.32/osg/etc/config.ini.

A sub-cluster is defined to be a homogeneous set of worker node hardware. At least one Subcluster section is required. For WLCG sites, information filled in here will be advertised as part of your MoU commitment, so please strive to make sure it is correct.

For each sub-cluster constituting your site, fill in the information about the worker node hardware by creating a new section choosing a unique name using the following format: [Subcluster <name>] where <name> is the sub-cluster name.

ALERT! IMPORTANT
for OSG 1.2.4 and below, this sub-cluster name must be unique for the entire OSG. Do not pick something generic like [Subcluster Opterons]!

Examples can be found here. This page shows the mapping from attribute names in /opt/osg-1.2.32/osg/etc/config.ini to GLUE attribute names.

The values for this section are relatively well-documented and self-explanatory and are given below:

  • name (String): the same name that is in the section label. It should be globally unique!
  • node_count (Positive Integer): the number of worker nodes in the sub-cluster.
  • ram_mb (Positive Integer): megabytes of RAM per node.
  • cpu_model (String): CPU model, as taken from /proc/cpuinfo. Please, no abbreviations!
  • cpu_vendor (String): vendor's name: AMD, Intel, or any other.
  • cpu_speed_mhz (Positive Integer): approximate speed in MHz of the CPU, as taken from /proc/cpuinfo.
  • cpus_per_node (Positive Integer): number of CPUs (physical chips) per node.
  • cores_per_node (Positive Integer): number of cores per node.
  • inbound_network (True or False): set according to inbound connectivity, i.e. whether external hosts can contact the worker nodes in this sub-cluster by their hostname.
  • outbound_network (True or False): set according to outbound connectivity; set to True if the worker nodes in this sub-cluster can communicate with the internet.
  • cpu_platform (x86_64 or i686): NEW for OSG 1.2. Set according to your sub-cluster's processor architecture.
  • HEPSPEC (Positive Integer): optional; publish the HEPSPEC number for the sub-cluster.
  • SI00 (Positive Integer): optional; publish the SpecInt2000 number for the sub-cluster. Default: 2000.
  • SF00 (Positive Integer): optional; publish the SpecFloat2000 number for the sub-cluster. Default: 2000.
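
A purely illustrative sub-cluster section might look like the following; every value here is made up, so use the real characteristics of your worker nodes:

[Subcluster Example-Site Xeon E5430]
name = Example-Site Xeon E5430
node_count = 50
ram_mb = 16384
cpu_model = Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
cpu_vendor = Intel
cpu_speed_mhz = 2660
cpus_per_node = 2
cores_per_node = 8
inbound_network = False
outbound_network = True
cpu_platform = x86_64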

For Compute Elements

The Generic Information Provider queries the batch system specified and enabled in /opt/osg-1.2.32/osg/etc/config.ini. Alternatively you may manually specify which batch system to query by setting the batch attribute in the [GIP] section.

Information on Condor

GIP has several options that control how it interacts with the Condor batch system. You can determine which Condor daemon GIP queries, as well as set whitelists and blacklists that GIP will use to properly report which VOs are assigned to which Condor groups.

  • use_collector (True or False): ordinarily GIP queries the Negotiator. This option tells GIP to query the Collector instead of the Negotiator. It is overridden by query_only_local_condor.
  • query_only_local_condor (True or False): overrides use_collector; causes GIP to query the local Condor schedd rather than the Negotiator or Collector. If not explicitly set, defaults to False.
  • exclude_schedds (String): a comma-separated list of schedds that should be excluded during a condor_status -submitter query.
  • subtract_owner (True or False): by default, owner VMs are not counted in the total CPU numbers published by GIP, since according to Condor they are not available for user jobs. Set to False to tell GIP not to subtract CPUs provided by owner VMs.
  • (groupname)_blacklist (String): either "*", which denies all VOs access to the Condor group identified by (groupname), or a comma-separated list of VOs you wish to deny access.
  • (groupname)_whitelist (String): either "*", which allows all VOs access to the Condor group identified by (groupname), or a comma-separated list of VOs you wish to allow access.
  • jobs_constraint (String): adds a --constraint option to the condor_q command that GIP will execute. Note: for big sites this could cause rather large performance issues.
  • status_constraint (String): adds a --constraint option to the condor_status command that GIP will execute. Note: for big sites this could cause rather large performance issues.
  • max_wall (Integer): allows an administrator to publish the maximum wall time allowed on the Condor pool.

HELP NOTE
To generate a list of supported VOs for a queue, the blacklist is evaluated before the whitelist.
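
For instance, to advertise a hypothetical Condor group named analysis as available only to the cms and atlas VOs, the two options can be combined like this:

analysis_blacklist = *
analysis_whitelist = cms, atlas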

Information on PBS

By default any authorized grid user may submit to every PBS queue listed in $VDT_LOCATION/globus/share/globus_gram_job_manager/pbs.rvf. In this case GIP automatically advertises all listed queues.

GIP will also detect whether queues restrict access to certain users (acl_users) or certain groups (acl_groups). A reverse mapping from these users and groups to their associated Virtual Organizations determines which VOs have access to which queues, and GIP advertises this access information instead.

This process is not perfect and sometimes fails to generate the right information. In this case you may manually blacklist and whitelist PBS queues for listed VOs in the [PBS] section of the resource configuration file /opt/osg-1.2.32/osg/etc/config.ini.

GIP also allows you to completely exclude queues from the list of queues that GIP will advertise.

  • host (String): optional; the hostname that will be appended to PBS commands if set. Example: qstat -B -f host
  • preemption (Integer): optional; set to 1 to report that preemption is enabled on the cluster. Default is 0.
  • pbs_path (String): optional; allows an administrator to set a non-standard PBS location or a PBS location that is not in the system search path.
  • (queuename)_blacklist (String): either "*", which denies all VOs access to the queue identified by (queuename), or a comma-separated list of VOs you wish to deny access.
  • (queuename)_whitelist (String): either "*", which allows all VOs access to the queue identified by (queuename), or a comma-separated list of VOs you wish to allow access.
  • queue_exclude (String): comma-separated list of queue names that GIP should exclude from the list of queues to publish.

HELP NOTE
To generate a list of supported VOs for a queue, the blacklist is evaluated before the whitelist.

Information on SGE

The home attribute in the [SGE] section of the resource configuration file /opt/osg-1.2.32/osg/etc/config.ini has been replaced by sge_root and sge_cell, which should be set to the values of $SGE_ROOT and $SGE_CELL, respectively. GIP assumes that it can source $SGE_ROOT/$SGE_CELL/common/settings.sh to create a working SGE environment.

  • preemption (Integer): optional; set to 1 to report that preemption is enabled on the cluster. Default is 0.
  • sge_path (String): optional; allows an administrator to set a non-standard SGE location or an SGE location that is not in the system search path.
  • sge_root (String): should be set to $SGE_ROOT; GIP assumes that it can source $SGE_ROOT/$SGE_CELL/common/settings.sh to create a working SGE environment.
  • sge_cell (String): should be set to $SGE_CELL; GIP assumes that it can source $SGE_ROOT/$SGE_CELL/common/settings.sh to create a working SGE environment.
  • (queuename)_blacklist (String): either "*", which denies all VOs access to the queue identified by (queuename), or a comma-separated list of VOs you wish to deny access.
  • (queuename)_whitelist (String): either "*", which allows all VOs access to the queue identified by (queuename), or a comma-separated list of VOs you wish to allow access.
  • queue_exclude (String): comma-separated list of queue names that GIP should exclude from the list of queues to publish.

Information on LSF

  • host (String): optional; the hostname that will be appended to LSF commands if set. Example: lshosts -w host
  • preemption (Integer): optional; set to 1 to report that preemption is enabled on the cluster. Default is 0.
  • lsf_location (String): optional; allows an administrator to set a non-standard LSF location or an LSF location that is not in the system search path.
  • lsf_profile (String): optional; set this to specify the location of the LSF profile file. GIP will source this file to extract a working LSF environment. Defaults to /lsf/conf/profile.lsf.
  • (queuename)_blacklist (String): either "*", which denies all VOs access to the queue identified by (queuename), or a comma-separated list of VOs you wish to deny access.
  • (queuename)_whitelist (String): either "*", which allows all VOs access to the queue identified by (queuename), or a comma-separated list of VOs you wish to allow access.
  • queue_exclude (String): comma-separated list of queue names that GIP should exclude from the list of queues to publish.

Link the Compute Element to an external Storage Element

A Compute Element may provide mass storage to grid users on an external storage element. To advertise the available storage space on the external storage element, create the file /opt/osg-1.2.32/gip/etc/gip.conf with the following content:

[cesebind]
; Advertise the availability of an external storage element for mass storage use on the compute element.
simple = False
se_list = <comma-separated list of the SRM endpoints of the storage elements>

Advertise Available Services

The [GIP] section of the resource configuration file /opt/osg-1.2.32/osg/etc/config.ini makes it possible to advertise the services available at your site.

However, Multi-CE sites MUST edit both cluster_name and other_ces.

All options are given in the table below:

  • advertise_gums (True or False): defaults to False. If you want GIP to query and advertise your GUMS server, set this to True.
  • advertise_gsiftp (True or False): defaults to True. If you don't want GIP to advertise your GridFTP server, set this to False.
  • gsiftp_host (String): the name of the GridFTP server GIP will advertise when advertise_gsiftp is set to True.
  • cluster_name (String): set this only if you run multiple gatekeepers for the same cluster; if you do, set this value to the FQDN of the head node of the cluster.
  • other_ces (String): set this only if you run multiple gatekeepers for the same cluster; if you do, set this value to the comma-separated list of FQDNs of the other CEs at this site.
  • local_template_dirs (String): support for local template dirs (contributed by Sam Morrison @ ARCS).

If you want to advertise glexec support you must set glexec_location in the [Misc Services] section. The value should be the location of glexec on the worker nodes.

Resource and Service Validation (RSV) Configuration

To configure RSV you can either edit the [RSV] section in the resource configuration file /opt/osg-1.2.32/osg/etc/config.ini directly or use the rsv-control command line tool. rsv-control is documented here.

Verify the Configuration File

Here we assume that you have completed the previous step to edit the configuration file. Before you proceed you should verify the new configuration file using configure-osg:

[user@ce /opt/osg-1.2.32]$ configure-osg -v
Using /opt/osg-1.2.32/osg/etc/config.ini for configuration information
Configuration verified successfully

HELP NOTE
Inspect the terminal output and the log file /opt/osg-1.2.32/vdt-install.log in case of errors!

Run the Configuration Script

If the configuration file was successfully verified in the previous step, you can proceed to configure the installation:

[root@ce /opt/osg-1.2.32]$ configure-osg -c
Using /opt/osg-1.2.32/osg/etc/config.ini for configuration information
running 'vdt-register-service --name gums-host-cron --enable'... ok
running 'vdt-register-service --name edg-mkgridmap --disable'... ok
The following consumer subscription has been installed:
	HOST:    http://is-itb2.grid.iu.edu:14001
	TOPIC:   OSG_CE
	DIALECT: RAW

running 'vdt-register-service --name tomcat-55 --enable'... ok
running 'vdt-register-service --name apache --enable'... ok
The following consumer subscription has been installed:
	HOST:    http://is-itb1.grid.iu.edu:14001
	TOPIC:   OSG_CE
	DIALECT: RAW

running 'vdt-register-service --name tomcat-55 --enable'... ok
running 'vdt-register-service --name apache --enable'... ok
The following consumer subscription has been installed:
	HOST:    https://osg-ress-4.fnal.gov:8443/ig/services/CEInfoCollector
	TOPIC:   OSG_CE
	DIALECT: OLD_CLASSAD

running 'vdt-register-service --name tomcat-55 --enable'... ok
running 'vdt-register-service --name apache --enable'... ok
running 'vdt-register-service --name vdt-rotate-logs --enable'... ok
running 'vdt-register-service --name mysql5 --enable'... ok
running 'vdt-register-service --name gsiftp --enable'... ok
Configure-osg completed successfully

HELP NOTE
The script appends log messages to /opt/osg-1.2.32/vdt-install.log. Providing the -d command line option switches on debugging messages in the log file.

HELP NOTE
Each successful execution of the configure-osg script will update the following files:

  • /opt/osg-1.2.32/osg/etc/osg-attributes.conf
  • /opt/osg-1.2.32/osg/etc/osg-job-environment.conf
  • /opt/osg-1.2.32/osg/etc/osg-local-job-environment.conf

Activate and Deactivate Services

About VDT Services

The Virtual Data Toolkit provides three types of services:

  1. cron services that will be started periodically by the cron daemon
  2. init services that will be started and stopped by the init daemon
  3. xinet services that will be started upon incoming network connection attempts by the extended internet daemon (xinetd)

Only cron services may be set up by unprivileged users. All other services require root privileges.

List Registered Services

To see a list of services registered by the Virtual Data Toolkit use vdt-control:

[user@ce /opt/osg-1.2.32]$ vdt-control --list

Service Activation

Use vdt-control to activate registered services. This will:

  • add entries to crontab for cron services
  • add control scripts to /etc/init.d for init services
  • start new init services
  • configure the xinet daemon for xinet services

Unprivileged users must provide the --non-root argument to vdt-control to install cron services. All other services require root privileges.

[root@ce /opt/osg-1.2.32]$ vdt-control --on <Service Name>

vdt-control will fail to activate any service that is already provided by the operating system. In this case you may force the activation of the new service provided by the Virtual Data Toolkit:

[root@ce /opt/osg-1.2.32]$ vdt-control --force --on <Service Name>

Another reason for vdt-control to fail to activate a service may be that the service was previously installed by another installation of the Virtual Data Toolkit which has not been deactivated yet. In this case you must force the deactivation of the existing service before you continue to install the new service:

[root@ce /opt/osg-1.2.32]$ vdt-control --force --off <Service Name>
[root@ce /opt/osg-1.2.32]$ vdt-control --on <Service Name>

Service Deactivation

Use vdt-control to deactivate registered services. This will:

  • remove entries from crontab for cron services
  • stop init services
  • remove control scripts from /etc/init.d for init services
  • re-configure the xinet daemon for xinet services

Unprivileged users must provide the --non-root argument to vdt-control to uninstall cron services. All other services require root privileges.

[root@ce /opt/osg-1.2.32]$ vdt-control --off <Service Name>

vdt-control may fail to deactivate all services due to hanging processes. In this case inspect the process table and kill hanging processes manually.
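
For example, to look for leftover processes started from this installation:

[root@ce /opt/osg-1.2.32]$ ps -ef | grep /opt/osg-1.2.32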

Register the Compute Element with the Grid Operations Center

Every new Compute Element should be registered with the Grid Operations Center using these instructions.

Verify the Operation of the Compute Element

Follow these instructions to verify the correct operation of the Compute Element.

ManagedFork

To verify the correct operation of the managedfork jobmanager submit a simple grid job to your gatekeeper (ce.opensciencegrid.org). This operation requires a valid grid proxy:

[user@ce /opt/osg-1.2.32]$ globus-job-run ce.opensciencegrid.org:2119/jobmanager-fork /bin/sleep 900

Next log into your gatekeeper (ce.opensciencegrid.org) while the job is running and query Condor for a list of jobs using managedfork as their jobmanager:

[user@ce /opt/osg-1.2.32]$ condor_q -constraint 'JobUniverse==12'
2784794.0   fnalgrid       10/7  22:19   0+00:00:11 R  0   0.0  sleep 900 

If you don't see the test job in the list printed to the terminal, the managedfork jobmanager may not be working correctly.

References

Troubleshooting

Please see the Compute Element Troubleshooting Guide.

Comments

If the OLD_VDT_LOCATION is set there is no need to run the cert updater by hand, the certs will be handled same way as they were in the old install. I am not sure whether that happens always, but always for me.
Also seems to me that the Enabling Full Privilege Authentication is done automatically based on config.ini and the files are placed where they need to be - nothing to be done for it these days.
IwonaSakrejda 23 Jul 2009 - 20:54
If Condor has been installed via RPM what should VDTSETUP_CONDOR_LOCATION be? Or, better, how do I point the condor-jobmanager to it? MarcoMambelli 26 May 2010 - 23:44
Above, in the multi-ce section it says:
Set the value of site_name in the "Site Information" section to be the same for each CE.
Shouldn't that be resource_group setting to be the same now, and resource (only used by gratia) to be different for each (localhost);
StevenTimm 30 Aug 2010 - 17:57
Marco--if condor has been installed via the old-style RPM, then VDTSETUP_CONDOR_LOCATION should be set to the top level, i.e. /opt/condor-7.4.3 in the case of condor-7.4.3. This is for old-style condor rpms up to and including condor 7.4.x. You can also make a version-neutral symlink, i.e. ln -s /opt/condor-7.4.3 /opt/condor, and then set it to that; this allows you to upgrade condor rpms without changing your vdt install. Condor 7.5 and greater rpms will be put in /usr/bin and /usr/sbin. StevenTimm 08 Oct 2010 - 02:23
RE: latitude and longitude--myosg has a plugin by which you can modify the latitude and longitude of your cluster with an on-screen google map, docs should reflect that. StevenTimm 08 Oct 2010 - 02:33
re: PBS instructions--the pbs binaries qstat, qsub, and pbsnodes have to be available in the PATH all the time, not just at install time, because the Generic Information Provider routines for PBS assume that they are in the PATH. StevenTimm 08 Oct 2010 - 02:35
re: Condor Instructions, in the text 'The location can be changed in the file $VDTSETUP_CONDOR_LOCATION using the variable PER_JOB_HISTORY_DIR', I think the $VDTSETUP_CONDOR_CONFIG variable is meant. SarahWilliams 22 Mar 2011 - 17:37
PM2RPM?_TASK = CE RobertEngel 28 Aug 2011 - 06:03
