Compute Element Install

Owner: MarcoMambelli
Area: Tier3
Role: SysAdmin
Type: Installation
Status: Not Released

Introduction

This tutorial demonstrates how to install and set up a basic CE. It guides you through the installation step by step; at the end you should have a simple, working CE.

These instructions assume that the Compute Element software is being installed on a system following the guidelines for OSG Tier 3s: a RHEL-based OS and a Condor resource manager, installed as described in ModulesIntro. Other recommendations are at the beginning of that document. As usual, different configurations will likely work, but you may have to adapt the instructions.

A more complete set of instructions is in ComputeElementInstall from the release documentation.

Requirements

You'll need a server with the following:
  • RHEL 4 or 5 based Linux distribution (reference OS is Scientific Linux 5.4)
  • root access
  • ~5 GB of free space
  • Condor installed using the RPM distribution or the shared tar installation
  • internet access (at least outbound connectivity)
  • Host, http, and rsv certificates (you can copy the host certificate and use it for the http and rsv services)

To perform the tests you'll also need the following:

  • Personal grid certificate

Getting started

Installing pacman

Pacman is a package management program used to install OSG software. PacmanInstall describes how to install Pacman; this can be done by the cluster administrator or from a non-privileged account. For example, the cluster administrator might install it in /osg/app/pacman or /opt/pacman, as in the following:
cd /opt
wget http://atlas.bu.edu/~youssef/pacman/sample_cache/tarballs/pacman-3.29.tar.gz
tar --no-same-owner -xzvf pacman-3.29.tar.gz
cd pacman-3.29
source setup.sh
cd ..
ln -s pacman-3.29 pacman
Once installed, set up its environment, for example with source /opt/pacman/setup.sh.
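To check that the environment is set correctly, verify that the pacman executable is found in your PATH (the paths below assume the /opt/pacman location used above):
source /opt/pacman/setup.sh
which pacman   # should print a path under /opt/pacman-3.29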

Creating directories

Create the /etc/grid-security directory and its http subdirectory:
mkdir /etc/grid-security
mkdir /etc/grid-security/http

Next, the host and service certificates need to be placed in the proper locations under /etc/grid-security. These instructions assume that the following files are in the current directory. GetHostServiceCertificates provides instructions on how to obtain host and service certificates/keys. If you have the host certificate/key but no service certificates/keys and are unable to obtain them, a copy of the host certificate will work as a temporary solution (see the example after this list).

  • hostcert.pem - host certificate file
  • hostkey.pem - host key file
  • httpcert.pem - http service certificate file
  • httpkey.pem - http service key file
  • rsvcert.pem - RSV service certificate file
  • rsvkey.pem - RSV service key file
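
If you could not obtain separate service certificates and are falling back on copies of the host certificate (as noted above), a minimal sketch of that temporary workaround, assuming hostcert.pem and hostkey.pem are in the current directory:
cp hostcert.pem httpcert.pem
cp hostkey.pem httpkey.pem
cp hostcert.pem rsvcert.pem
cp hostkey.pem rsvkey.pem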

Copy the host, RSV and http certificates and keys to /etc/grid-security or the correct subdirectory:

cp hostcert.pem hostkey.pem /etc/grid-security/
cp rsvcert.pem rsvkey.pem /etc/grid-security/
cp httpcert.pem httpkey.pem /etc/grid-security/http   
Make sure that the files have the correct owner (root for the host and http certificates and keys, rsvuser for the RSV ones):
chown root:root /etc/grid-security/host*
chown rsvuser:rsvuser /etc/grid-security/rsv*
chown root:root /etc/grid-security/http/http*
Set the correct permissions for the certificate and key files:
chmod 400 /etc/grid-security/hostkey.pem
chmod 444 /etc/grid-security/hostcert.pem
chmod 400 /etc/grid-security/rsvkey.pem
chmod 444 /etc/grid-security/rsvcert.pem
chmod 400 /etc/grid-security/http/httpkey.pem
chmod 444 /etc/grid-security/http/httpcert.pem
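You can double-check ownership and permissions with ls; the key files should show mode -r-------- (400) and the certificates -r--r--r-- (444):
ls -l /etc/grid-security/ /etc/grid-security/http/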

Now create the OSG service directories with the correct permissions: OSG_APP, OSG_DATA, OSG_GRID. More information on all the possible service directories and on recommended configurations for OSG sites is in LocalStorageConfiguration. We assume a shared file system is used with the structure described in ClusterNFSSetup.

ln -s /nfs/osg /osg  # if you did not already
mkdir -p /osg/app
mkdir -p /osg/app/etc
chmod 1777 /osg/app
chmod 1777 /osg/app/etc
mkdir -p /osg/data
chmod 1777 /osg/data
mkdir -p /osg/wn
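A quick check that the symlink and the sticky-bit permissions are in place (this assumes the /nfs/osg export described in ClusterNFSSetup):
ls -ld /osg /osg/app /osg/app/etc /osg/data /osg/wn   # /osg is a symlink; app, app/etc and data show drwxrwxrwt (1777)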

RSV preparations

If you are running RSV using a service certificate, add the service certificate to your grid-mapfile and map it to the RSV user (rsvuser).
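For example, an entry in /etc/grid-security/grid-mapfile mapping the RSV service certificate to rsvuser looks like the line below. The DN is a made-up example; use the actual subject of your certificate (e.g. as printed by openssl x509 -subject -noout -in /etc/grid-security/rsvcert.pem):
"/DC=org/DC=doegrids/OU=Services/CN=rsv/gc1-ce.yourdomain.org" rsvuser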

Install worker node software

If not already installed, install the OSG Worker Node software stack as in the Worker Node section of ModulesIntro.
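If the worker node client still needs to be installed on this host, here is a sketch using Pacman in the /osg/wn directory created above (this assumes the wn-client package of the OSG 1.2 cache; ModulesIntro remains the reference for this step):
cd /osg/wn
pacman -allow trust-all-caches -get http://software.grid.iu.edu/osg-1.2:wn-client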

Install and configure the CE

Create an install directory and install the Compute Element:
mkdir /opt/osgce
cd /opt/osgce
pacman -allow trust-all-caches -get http://software.grid.iu.edu/osg-1.2:ce
Set the following variables to point to your Condor installation so that it is used by the CE. If your local resource manager is not Condor (e.g. PBS or LSF), check the job manager section in the release CE Install document instead. If you used the RPM distribution, set:
export VDTSETUP_CONDOR_LOCATION=/usr
export VDTSETUP_CONDOR_CONFIG=/etc/condor/condor_config
If instead you did the shared tar installation, set:
export VDTSETUP_CONDOR_LOCATION=/opt/condor
export VDTSETUP_CONDOR_CONFIG=/opt/condor/etc/condor_config
Now install the Condor jobmanager setup package:
pacman -allow trust-all-caches -get http://software.grid.iu.edu/osg-1.2:Globus-Condor-Setup
Install managed fork:
pacman -allow trust-all-caches -get http://software.grid.iu.edu/osg-1.2:ManagedFork

Configure CE

Run the post-install script:
source /opt/osgce/setup.sh 
vdt-post-install 
Setup CA certificates:
vdt-ca-manage setupca --location local --url osg
CaCertificatesInstall and the corresponding VDT page provide information on the different options for installing and configuring the CA certificates with vdt-ca-manage.

Edit osg/etc/config.ini (relative to the installation directory, i.e. /opt/osgce/osg/etc/config.ini here), changing the following entries; the comments in the file also serve as guidelines. A sample excerpt is shown after the list.

  • In the [Default] section:
    • Set localhost to the fully qualified domain name (FQDN) of your CE (e.g. localhost = gc1-ce.yourdomain.org)
    • Set admin_email to your email address
  • In the [Site Information] section
    • Set group to OSG for a production site (choose OSG-ITB if it is a test site or you are not interested in accounting)
    • Set resource_group to your facility's name (e.g. GC1).
    • Set resource to your site's name (e.g. GC1_CE). This has to be unique in OSG!
    • Set sponsor to the VO sponsoring your site; use osg if none.
    • Set site_policy to the URL of your site's usage policy, e.g. a document like http://www.mwt2.org/policy.html
    • Set contact to your name.
    • Set email if different from the one entered above.
    • city to the city that the cluster is located in.
    • country to the country that the cluster is located in.
    • latitude to the latitude of the cluster, based on the city given in the city option (see http://www.mapsofworld.com/lat_long/ for a list). You can set this to 0 if you do not know your latitude.
    • longitude to the longitude of the cluster.
  • In the [Condor] section set
    • enabled to %(enable)s
    • condor_location to /usr or /opt/condor depending on what you assigned to VDTSETUP_CONDOR_LOCATION above
    • condor_config to /etc/condor/condor_config or /opt/condor/etc/condor_config depending on what you assigned to VDTSETUP_CONDOR_CONFIG
    • wsgram to %(enable)s
  • In the [ManagedFork] section set
    • enabled to %(enable)s
  • In the [Misc Services] section, set :
    • use_cert_updater to %(enable)s
    • If the CE is using GUMS, set:
      • authorization_method to xacml
      • gums_host to your GUMS server, e.g. gc1-gums.yourdomain.org
    • If the CE is using a gridmap file, set:
      • authorization_method to gridmap
    • enable_webpage_creation to %(enable)s
  • In the [Storage] section, set:
    • grid_dir to /osg/wn
    • app_dir to /osg/app
    • data_dir to /osg/data
    • worker_node_temp to /tmp
  • In the [GIP] section, set:
    • batch to condor
  • In the [Subcluster CHANGEME] section, set the following options:
    • Change the name of the section to [Subcluster Main]
    • name to Main
    • node_count to the number of worker nodes your cluster has (at least 1)
    • ram_mb to the amount of RAM (in MB) that the worker nodes in your cluster have (check: cat /proc/meminfo); it has to be at least 500
    • cpu_model to the model of your cluster's CPUs. You can get this information by running cat /proc/cpuinfo
    • cpu_vendor to the vendor of your cluster's CPUs (e.g. Intel or AMD)
    • cpu_speed_mhz to the clock speed of your cluster's CPUs in MHz. For example, a 2.83GHz CPU runs at 2830 MHz.
    • cpu_platform to the CPU platform (e.g. i686 or x86_64)
    • cpus_per_node to the number of CPU chips (sockets) per node (e.g. 1)
    • cores_per_node to the total number of cores in the node (e.g. 8 for a dual socket, quad core node)
  • In the [SE CHANGEME] section, set the following options:
    • enabled to %(disable)s
  • In the [RSV] section, set the following options:
    • enabled to %(enable)s
    • use_service_cert to %(enable)s
    • rsv_cert_file to /etc/grid-security/rsvcert.pem (should be the default)
    • rsv_key_file to /etc/grid-security/rsvkey.pem (should be the default)
    • setup_for_apache to %(enable)s
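
As a reference, here is what some of these entries may look like after editing; the host name, email and site values are made-up examples and the excerpt is not a complete config.ini:
[Default]
localhost = gc1-ce.yourdomain.org
admin_email = admin@yourdomain.org

[Site Information]
group = OSG-ITB
resource_group = GC1
resource = GC1_CE
sponsor = osg
contact = Your Name

[Condor]
enabled = %(enable)s
condor_location = /opt/condor
condor_config = /opt/condor/etc/condor_config
wsgram = %(enable)s

[Storage]
grid_dir = /osg/wn
app_dir = /osg/app
data_dir = /osg/data
worker_node_temp = /tmp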

Verify the configuration with configure-osg -v, fix any errors or missing entries, then configure the system:

configure-osg -c

For more options on the CE configuration, check the Configuration section of the CE Install document in the release documentation.

Start/stop the CE

To start the CE (the services will remain enabled also after a reboot):
source /opt/osgce/setup.sh
vdt-control --on
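You can also list the installed services and their on/off state with:
vdt-control --list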

To stop a CE:

source /opt/osgce/setup.sh
vdt-control --off

Verification

  • Run site_verify. You can add your host certificate to /etc/grid-security/grid-mapfile to be able to run it as root. Even better, if you have a user whose certificate is already in your grid-mapfile or GUMS, log in as that user and, after sourcing the setup file, initialize the proxy with grid-proxy-init. Then run $VDT_LOCATION/verify/site_verify.pl.
[root@gc1-ce osgce]# source /opt/osgce/setup.sh
[root@gc1-ce osgce]# cd $VDT_LOCATION/verify
[root@gc1-ce verify]# ./site_verify.pl 
===============================================================================
Info: Site verification initiated at Thu Jun 24 18:25:13 2010 GMT.
===============================================================================
-------------------------------------------------------------------------------
---------- Begin gc1-ce.uchicago.edu at Thu Jun 24 18:25:13 2010 GMT ----------
-------------------------------------------------------------------------------
Checking prerequisites needed for testing: PASS
Checking for a valid proxy for root@gc1-ce.uchicago.edu: PASS
Checking if remote host is reachable: PASS
Checking for a running gatekeeper: YES; port 2119
Checking authentication: PASS
Checking 'Hello, World' application: PASS
Checking remote host uptime: PASS
   13:25:18 up 126 days, 18:24,  2 users,  load average: 0.32, 0.08, 0.02
Checking remote Internet network services list: PASS
Checking remote Internet servers database configuration: PASS
Checking for GLOBUS_LOCATION: /opt/osgce/globus
Checking expiration date of remote host certificate: Apr 23 19:29:04 2011 GMT
Checking for gatekeeper configuration file: YES
  /opt/osgce/globus/etc/globus-gatekeeper.conf
Checking users in grid-mapfile, if none must be using Prima: alice,cdf,cigi,compbiogrid,dayabay,des,dosar,engage,fermilab,geant4,glow,gluex,gpn,grase,gridunesp,grow,hcc,i2u2,icecube,ilc,jdem,ligo,mis,nanohub,nwicg,nysgrid,ops,osg,osgedu,samgrid,sbgrid,star,usatlas1,uscms01
Checking for remote globus-sh-tools-vars.sh: YES
Checking configured grid services: PASS
  jobmanager,jobmanager-condor,jobmanager-fork,jobmanager-managedfork
Checking for OSG osg-attributes.conf: YES
Checking scheduler types associated with remote jobmanagers: PASS
  jobmanager is of type managedfork
  jobmanager-condor is of type condor
  jobmanager-fork is of type managedfork
  jobmanager-managedfork is of type managedfork
Checking for paths to binaries of remote schedulers: PASS
  Path to condor binaries is /opt/condor/bin
  Path to managedfork binaries is .
Checking remote scheduler status: PASS
  condor : 1 jobs running, 0 jobs idle/pending
Checking if Globus is deployed from the VDT: YES; version 2.0.0p16
Checking for OSG version: NO
Checking for OSG grid3-user-vo-map.txt: YES
  osgedu users: osgedu
  atlas users: usatlas1
Checking for OSG site name: UC_GC1_CE
Checking for OSG $GRID3 definition: /opt/osgce
Checking for OSG $OSG_GRID definition: /osg/wn
Checking for OSG $APP definition: /osg/app
Checking for OSG $DATA definition: /osg/data
Checking for OSG $TMP definition: /osg/data
Checking for OSG $WNTMP definition: /tmp
Checking for OSG $OSG_GRID existence: PASS
Checking for OSG $APP existence: PASS
Checking for OSG $DATA existence: PASS
Checking for OSG $TMP existence: PASS
Checking for OSG $APP writability: FAIL
Checking for OSG $DATA writability: FAIL
Checking for OSG $TMP writability: FAIL
Checking for OSG $APP available space: 16.142 GB
Checking for OSG $DATA available space: 16.142 GB
Checking for OSG $TMP available space: 16.142 GB
Checking for OSG additional site-specific variable definitions: YES
  MountPoints
    SAMPLE_LOCATION default /SAMPLE-path
    SAMPLE_SCRATCH devel /SAMPLE-path
Checking for OSG execution jobmanager(s): gc1-ce.uchicago.edu/jobmanager-condor
Checking for OSG utility jobmanager(s): gc1-ce.uchicago.edu/jobmanager
Checking for OSG sponsoring VO: 'osg'
Checking for OSG policy expression: NONE
Checking for OSG setup.sh: YES
Checking for OSG $Monalisa_HOME definition: /opt/osgce/MonaLisa
Checking for MonALISA configuration: PASS
  key ml_env vars:
    FARM_NAME = gc1-ce.uchicago.edu
    FARM_HOME = /opt/osgce/MonaLisa/Service/VDTFarm
    FARM_CONF_FILE = /opt/osgce/MonaLisa/Service/VDTFarm/vdtFarm.conf
    SHOULD_UPDATE = false
    URL_LIST_UPDATE = http://monalisa.cacr.caltech.edu/FARM_ML,http://monalisa.cern.ch/MONALISA/FARM_ML
  key ml_properties vars:
    lia.Monitor.group = Test
    lia.Monitor.useIPaddress = undef
    MonaLisa.ContactEmail = root@gc1-ce.uchicago.edu
Checking for a running MonALISA: NO
  MonALISA does not appear to be running
Checking for a running GANGLIA gmond daemon: NO
  gmond does not appear to be running
Checking for a running GANGLIA gmetad daemon: NO
  gmetad does not appear to be running
Checking for a running gsiftp server: YES; port 2811
Checking gsiftp (local client, local host -> remote host): PASS
Checking gsiftp (local client, remote host -> local host): PASS
Checking that no differences exist between gsiftp'd files: PASS
-------------------------------------------------------------------------------
----------- End gc1-ce.uchicago.edu at Thu Jun 24 18:28:52 2010 GMT -----------
-------------------------------------------------------------------------------
===============================================================================
Info: Site verification completed at Thu Jun 24 18:28:52 2010 GMT.

If you have problems, look at TroubleshootingComputeElement to locate the log files and find more tests and troubleshooting suggestions.

For support check Tier3Help.

References

  1. ComputeElementInstall - Compute Element installation in the release documentation
  2. Release Documentation - Release documentation Web
  3. TroubleshootingComputeElement - CE troubleshooting


