
Installing Condor using the RPM distribution


Introduction

If you are using Rocks (commonly used by CMS), or you install using Kickstart files like the ones provided by ATLAS, then you may not need any of this: those tools may install and set up Condor for you. Check your VO documentation first.

We will use the latest stable release of Condor. As of May 27, 2011, this is 7.6.0.

This installation uses the Condor RPM distribution. It can be downloaded from the Condor site or installed from a yum repository. The Condor team has set up a yum repository that can be used for this installation.

Condor needs to be installed (using the procedure below) on all nodes of the batch cluster: the headnode, the worker nodes, and also the interactive nodes used to submit Condor jobs.

Certain operations, like the RPM installation and some system configuration, have to be repeated on each node. Other steps, like the configuration of the shared cluster file condor_config.cluster, are performed only once (e.g. on the head node), while the customization of the node-specific configuration file is different for each node.

Sharing at least the directory hosting the configuration files simplifies things a bit, because cluster-wide configuration changes can be made in one place. The installation is also possible without any shared directory, provided the customized condor_config files are replicated on all nodes of the cluster (see the section on installation without any shared directory below).



A note about the directory structure: this Condor installation is structured to make Condor upgrades possible with minimal effort. The Condor RPM follows the Filesystem Hierarchy Standard; for more information on the directory structure check the release notes. The /var/lib/condor directory contains the Condor spool and may be mounted from a different partition, as detailed in the section about the isolated spool directory. In addition to the files provided by the RPM, a shared directory (/nfs/condor/condor-etc) is used to simplify the configuration. Below you can find how to avoid any shared file.
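
As a rough guide, the main locations involved in this setup are the following (the paths outside /nfs come from the RPM layout; the exact contents may vary between Condor versions):
/etc/condor/condor_config        # main configuration file installed by the RPM
/var/lib/condor/                 # spool, execute and credential files (local disk)
/var/log/condor/                 # daemon log files (MasterLog, StartLog, ...)
/nfs/condor/condor-etc/          # shared cluster-wide configuration (created by hand, see below)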

Preparing the yum install

If you don't have it already, download the yum repository file provided by the Condor team at http://www.cs.wisc.edu/condor/yum/repo.d/, e.g. for RHEL5 (and derived distributions):
cd /etc/yum.repos.d
wget http://www.cs.wisc.edu/condor/yum/repo.d/condor-stable-rhel5.repo
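
You can then check that yum sees the repository and the Condor package (a quick sanity check; the version listed depends on the current stable release):
yum clean all
yum list condor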

Condor Installation and Configuration

On each node, start by installing Condor from the repository:
yum install condor
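
A quick way to confirm the installation (the version reported should match the current stable release, 7.6.x at the time of writing):
rpm -q condor
condor_version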

Shared configuration files

On the server exporting /nfs/condor/condor-etc (other nodes cannot write to it if you choose to export the directory with root squash), edit the following configuration files:
  • Create the cluster Condor configuration file /nfs/condor/condor-etc/condor_config.cluster with the following content:
    • Copy the following content (also attached as condor_config.cluster), changing the values to suit your cluster (yourdomain.org and gc1-ce.yourdomain.org). You may find some suggestions in the local configuration file /etc/condor/condor_config.local:
      ## Condor configuration for OSG T3
      ## For more detail please see
      ## http://www.cs.wisc.edu/condor/manual/v7.4/3_3Configuration.html
      LOCAL_CONFIG_FILE = /nfs/condor/condor-etc/condor_config.$(HOSTNAME)
      # The following should be your T3 domain
      UID_DOMAIN = yourdomain.org
      # Human readable name for your Condor pool
      COLLECTOR_NAME = "Tier 3 Condor at $(UID_DOMAIN)"
      # A shared file system (NFS), e.g. job dir, is assumed if the name is the same
      FILESYSTEM_DOMAIN = $(UID_DOMAIN)
      ALLOW_WRITE = *.$(UID_DOMAIN)
      CONDOR_ADMIN = root@$(FULL_HOSTNAME)
      # The following should be the full name of the head node
      CONDOR_HOST = gc1-ce.yourdomain.org
      # Port range should be opened in the firewall (can be different on different machines)
      # The range 9000-9999 is consistent with the iptables configuration in the T3 documentation
      IN_HIGHPORT = 9999
      IN_LOWPORT = 9000
      # This is to enforce password authentication
      SEC_DAEMON_AUTHENTICATION = required
      SEC_DAEMON_AUTHENTICATION_METHODS = password
      SEC_CLIENT_AUTHENTICATION_METHODS = password,fs,gsi
      SEC_PASSWORD_FILE = /var/lib/condor/condor_credential
      ALLOW_DAEMON = condor_pool@*
      ##  Sets how often the condor_negotiator starts a negotiation cycle.
      ##  It is defined in seconds and defaults to 60 (1 minute).
      NEGOTIATOR_INTERVAL = 20
      ##  Scheduling parameters for the startd
      TRUST_UID_DOMAIN = TRUE
      # start as available and do not suspend, preempt or kill
      START = TRUE
      SUSPEND = FALSE
      PREEMPT = FALSE
      KILL = FALSE
      
    • Make sure that you have the following important line in the file
      CONDOR_HOST = gc1-ce
      • Note: CONDOR_HOST can be set with or without the domain name: gc1-ce or gc1-ce.yourdomain.org
  • On the NFS server create the host-specific configuration files for the nodes, using the following content. We will create 3 base configuration files: one for the headnode, one for the worker nodes, and one for the interactive nodes (user interface).
    • For the headnode, /nfs/condor/condor-etc/condor_config.headnode:
      ## OSG T3 host configuration
      ## For more info: http://www.cs.wisc.edu/condor/manual/v7.4/3_3Configuration.html
      # List of daemons on the node (headnode requires collector and negotiator, 
      # schedd required to submit jobs, startd to run jobs)
      DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
      
    • For the worker nodes, /nfs/condor/condor-etc/condor_config.worker:
      ## OSG T3 host configuration
      ## For more info: http://www.cs.wisc.edu/condor/manual/v7.4/3_3Configuration.html
      # List of daemons on the node (headnode requires collector and negotiator, 
      # schedd required to submit jobs, startd to run jobs)
      DAEMON_LIST = MASTER, STARTD
      
    • For the interactive nodes, /nfs/condor/condor-etc/condor_config.interactive:
      ## OSG T3 host configuration
      ## For more info: http://www.cs.wisc.edu/condor/manual/v7.4/3_3Configuration.html
      # List of daemons on the node (headnode requires collector and negotiator, 
      # schedd required to submit jobs, startd to run jobs)
      DAEMON_LIST = MASTER, SCHEDD
      
  • Then, still on the NFS server, create for each node a link pointing to the appropriate template, e.g.:
    cd /nfs/condor/condor-etc/
    ln -s condor_config.headnode condor_config.gc1-ce
    ln -s condor_config.interactive condor_config.gc1-ui1
    ln -s condor_config.worker condor_config.gc1-c001
    ln -s condor_config.worker condor_config.gc1-c002
    ln -s condor_config.worker condor_config.gc1-c003
    
    Each node must have its own condor_config.<hostname> file. If some nodes require a special configuration you can copy the template (e.g. condor_config.worker) and customize it.

Remaining node configuration

On each node perform these remaining configuration steps.
  • Edit the file /etc/condor/condor_config. This is the default configuration file read when Condor starts. We will point it to the cluster-specific configuration used for the T3 setup (a verification sketch follows this list). Replace:
    ##  Where is the machine-specific local config file for each host?
    LOCAL_CONFIG_FILE      = $(RELEASE_DIR)/etc/$(HOSTNAME).local
    
    With
    ##  Next configuration to be read is for the T3 cluster setup
    LOCAL_CONFIG_FILE       = /nfs/condor/condor-etc/condor_config.cluster
    

  • Remove the default condor_config.local in the /etc/condor directory to avoid possible confusion.
    rm /etc/condor/condor_config.local
    
  • Set the password that will be used by the Condor system (at the prompt enter the same password for all nodes):
    condor_store_cred -c add
    
  • Enable automatic startup at boot:
    chkconfig --level 235 condor on
    

Start and test Condor

Condor starts automatically at boot. You can also start it manually by typing
/etc/init.d/condor start
(it should report OK)

You can check whether Condor is running correctly:

condor_config_val log   # (should be /var/log/condor/)
cd /var/log/condor/
#check master log file
less MasterLog
# verify the status of the negotiator
condor_status -negotiator

You can see the resources in your Condor cluster using condor_status and submit test jobs with condor_submit. Check CondorTest for more.
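
For a quick end-to-end test you can submit a trivial job. The submit file below is only an illustrative sketch (file names are arbitrary); it assumes the shared-filesystem setup described above:
# test.submit - minimal vanilla universe test job
universe   = vanilla
executable = /bin/sleep
arguments  = 60
output     = test.out
error      = test.err
log        = test.log
queue 1

Submit it from a node running a schedd (an interactive node in this setup) and watch it run:
condor_submit test.submit
condor_q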

Setup

Condor is installed in the default system paths, so no special setup is needed to use it: the Condor commands are automatically available in the environment of every user.
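
You can confirm this from any user account (a simple check; the exact path may differ):
which condor_q      # should resolve to a system path, e.g. /usr/bin/condor_q
condor_version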

Upgrades

Only one version of Condor at a time can be installed and used via RPM. To install a different version, remove the old RPM and install the new one following the instructions above. The configuration files in the shared directory persist, so you can skip that step during upgrades.
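
A typical upgrade sequence, sketched here assuming the new version is available from the same yum repository:
/etc/init.d/condor stop
yum remove condor
yum install condor          # or simply "yum update condor" to move to the newer version
/etc/init.d/condor start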

Special needs

The following sections present instructions and suggestions for less common configurations.

Changes to the Firewall (iptables)

If you are using a firewall (e.g. iptables) on all nodes you need to open the ports used by Condor (a quick verification sketch follows this list):
  • Edit the /etc/sysconfig/iptables file to add these lines ahead of the reject line:
    -A RH-Firewall-1-INPUT  -s <network_address> -m state --state ESTABLISHED,NEW -p tcp -m tcp --dport 9000:10000 -j ACCEPT  
    -A RH-Firewall-1-INPUT  -s <network_address> -m state --state ESTABLISHED,NEW -p udp -m udp --dport 9000:10000 -j ACCEPT 
    
    where network_address is the address of the intranet of the T3 cluster, e.g. 192.168.192.0/18 (or the extranet if your T3 does not have a separate intranet). Omit the -s option if some nodes of your Condor cluster (startd, schedd, ...) are outside of that network.
  • Restart the firewall:
    /etc/init.d/iptables restart
    
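You can check that the rules are active (the chain name RH-Firewall-1-INPUT matches the default RHEL5 setup used above):
iptables -L RH-Firewall-1-INPUT -n | grep 9000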

Installation without any shared directory

If you choose not to use NFS in your cluster and there is no shared /nfs/condor/condor-etc/, the section above about shared configuration files does not apply. All the configuration files should instead be placed in /etc/condor.
  • condor_config should be edited to have
    LOCAL_CONFIG_FILE = /etc/condor/condor_config.cluster
  • condor_config.cluster should be created as described above and replicated on each node in /etc/condor, and should contain the modified line:
    LOCAL_CONFIG_FILE = /etc/condor/condor_config.$(HOSTNAME)
  • condor_config.<hostname> should be created on each node in /etc/condor by copying the proper condor_config.headnode/worker/interactive described above and by customizing it for the needs of the node.

Changes to the cluster configuration or to the configuration of one of the node types require synchronization: replicate the modified files to all affected nodes after the change, as shown in the sketch below.
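
As an example, the replication could be done with scp from the node where the files were edited; this is only a sketch using the example hostnames from this page:
for host in gc1-c001 gc1-c002 gc1-c003; do
  scp /etc/condor/condor_config.cluster /etc/condor/condor_config.worker root@${host}:/etc/condor/
  ssh root@${host} "cp /etc/condor/condor_config.worker /etc/condor/condor_config.${host}"
done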

Mounting a separate partition for /var/lib/condor

/var/lib/condor is the directory used by Condor for status files and spooling (/scratch/condor in the shared installation). For performance reasons it should always be on a local disk. It is recommended to make it large, in order to accommodate jobs that use a lot of disk space (e.g. ATLAS recommends 20GB for each job slot on the worker nodes), and possibly to put it on a separate partition, so that a job that fills up the disk does not fill the system disk and bring down the system. The partition can be mounted on /var/lib/condor before installing Condor or at a later time, e.g.:
/etc/init.d/condor stop
cd /var/lib
mv condor condor_old
mkdir condor
mount -t ext3 /dev/<your partition> condor
chown condor:condor condor
mv condor_old/* condor/
rmdir condor_old
/etc/init.d/condor start
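
To make the mount persistent across reboots, add a matching entry to /etc/fstab (filesystem type and options are placeholders; adjust them to your partition):
/dev/<your partition>   /var/lib/condor   ext3   defaults   0 2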

Use the old RPMs from Condor

The new RPMs distributed by the Condor team are much better than the previous ones, so use of the previous ones is not supported. If you must use the old RPMs, check the old instructions for RPM installation to see the additional steps necessary to complete the installation.



