Installing HTCondor as a Batch System

1 About This Document

This document describes the process of installing and configuring HTCondor to use as the batch system for your compute cluster. It is intended to help site administrators who are responsible for setting up and maintaining the cluster. HTCondor is a very large and rich software system, so this document cannot cover all of its possible use cases and configuration settings; instead, it presents the most common use cases and basic configuration. For more information about HTCondor, please refer to the HTCondor Manual.

2 About the HTCondor Package

This document uses the HTCondor RPM from the OSG repository. Our RPM was inspired by the Fedora project: we started with their source RPM and modified it for use in the OSG environment. The RPM provides most of HTCondor’s functionality, except for Standard Universe and HTCondor-G support for CREAM and NorduGrid.

Other RPMs, including those from the Fedora or HTCondor repositories, may be similar, but the instructions in this document may need some adaptation. Installing HTCondor using some other packaging (e.g., a tarball) is very different. Some cluster management tools, like Rocks, may install and configure HTCondor for you.

See CondorInformation for an overview of the different options for installing HTCondor.

Until late 2012, the HTCondor software was known as “Condor”. You may still see the older name referring to the same software.

3 Engineering Considerations

HTCondor must be installed and configured on all nodes of the batch system, including:

  • Worker nodes (also known as compute nodes or execute nodes)
  • One or more submit nodes, including the Compute Element and any local interactive submission nodes
  • A head node, containing the HTCondor Central Manager

Installation and some configuration steps must be performed on each node, but the specific configuration for each node varies by the type of that node. For example, the head node and submit nodes may each have custom configuration, but all worker nodes may have the same configuration. For nodes that have similar or identical configuration, it may be helpful to use a cluster management system (e.g., cfengine, puppet, chef) to manage machine configuration centrally, or to use a shared filesystem for common configuration files. Deciding on and implementing a cluster configuration approach is beyond the scope of this document.

Installing HTCondor from the RPMs means that all files are installed in common, default paths; generally, the RPMs follow the Filesystem Hierarchy Standard. Therefore, tasks like running common commands and viewing man pages just work for all users after installation. Further, the installation procedure below is designed to simplify HTCondor software upgrades. Further notes on the layout of files in the HTCondor RPMs:

  • The /var/lib/condor directory contains the HTCondor spool and may be mounted from a different partition (see below)
  • The /etc/condor directory contains the HTCondor configuration files
  • It is possible to use a shared configuration directory (e.g., /nfs/condor/condor-etc); see below for options on sharing configuration

Finally, it is important to decide whether you plan to use GSI (x509 PKI) authentication with this HTCondor installation. The CA certificates are needed only if you use GSI authentication with HTCondor; if you are not using GSI authentication, you can skip the installation of the CA certificates below.
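If you do use GSI, the authentication methods are enabled in the HTCondor configuration. A minimal sketch follows; the method list and the certificate directory are site choices, not mandated by this document:

```
# Sketch: enable GSI (x509) authentication alongside filesystem authentication.
# GSI_DAEMON_DIRECTORY is an assumed location; adjust to where your host
# certificates and CA certificates are installed.
SEC_DEFAULT_AUTHENTICATION_METHODS = GSI, FS
GSI_DAEMON_DIRECTORY = /etc/grid-security
```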

4 Requirements

4.1 Host and OS

To install HTCondor, you will need:

  • One host for the HTCondor head node (Collector and Negotiator)
  • A Compute Element for an HTCondor submit node (Schedd), plus other local submit nodes as desired
  • Hosts for HTCondor to execute jobs (Startds)
  • OS: Red Hat Enterprise Linux 6 or 7, and variants (see details...)
  • Root access

4.2 Network

For more details on overall Firewall configuration, please see our Firewall documentation.

Service Name         Protocol  Port Number          Inbound  Outbound  Comment
HTCondor collector   tcp       9618                 Y                  HTCondor Collector (receives ClassAds from resources and jobs)
HTCondor port range  tcp       LOWPORT to HIGHPORT  Y                  Contiguous range of ports

  • To force network communication over TCP, set UPDATE_COLLECTOR_WITH_TCP=True in your HTCondor configuration on each node.
  • LOWPORT and HIGHPORT are two values set in the HTCondor configuration file, e.g.,
    LOWPORT = <low port>
    HIGHPORT = <high port>
  • The HTCondor collector port can be changed in the configuration file. For more information please check the Networking section of the HTCondor manual.
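For instance, to match the 9000-9999 range used in the firewall configuration later in this document, a site could set (values are illustrative):

```
# Illustrative port range; it must match the range opened in the firewall
LOWPORT = 9000
HIGHPORT = 9999
```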

5 Installation Procedure

The CA certificates are needed only if you use GSI (x509 PKI) authentication with HTCondor. If you are not using GSI authentication, you can skip the CA certificate installation.

5.1 Installing HTCondor

  1. Install HTCondor from the OSG yum repository. Choose one of the following, based on your architecture (most users will need the x86_64 version):
    [root@client ~]$ yum install condor.x86_64
    [root@client ~]$ yum install condor.i386

6 Configuring HTCondor

Most likely, your HTCondor system includes many nodes, especially worker nodes. Different nodes or types of nodes may require different configuration files, and yet the different configuration files may have many shared settings. There are several ways to manage your HTCondor configuration files, including:

  • Use a cluster management tool (e.g., cfengine, Puppet, Chef) to manage configuration centrally
  • Manually create and maintain configuration files centrally, and use a shared filesystem to distribute them (especially to worker nodes)
  • Manually create, maintain, and distribute configuration files to each machine

Select a method that works best for you.

6.1 Head Node

Configure your head node by editing the following file, depending on your configuration method:

Configuration method File
Shared filesystem /nfs/condor/condor-etc/condor_config.headnode
Cluster management tool /etc/condor/config.d/local.conf
Manual /etc/condor/config.d/local.conf

And add the following text:
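A minimal sketch for a dedicated head node follows; your actual configuration may include more than the daemon list:

```
# Head node: run the master, the collector, and the negotiator
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
```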


If your head node is also the gatekeeper of the cluster, then you need to set the daemon list to the union of the two, e.g. DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD.

6.2 Submit Node

Configure your submit node by editing the following file, depending on your configuration method:

Configuration method File
Shared filesystem /nfs/condor/condor-etc/condor_config.submit
Cluster management tool /etc/condor/config.d/local.conf
Manual /etc/condor/config.d/local.conf

And add the following text:
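A minimal sketch for a dedicated submit node follows; adjust it to your site:

```
# Submit node: run the master and the schedd
DAEMON_LIST = MASTER, SCHEDD
```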


6.3 Worker Node

Configure your worker node by editing the following file, depending on your configuration method:

Configuration method File
Shared filesystem /nfs/condor/condor-etc/condor_config.worker
Cluster management tool /etc/condor/config.d/local.conf
Manual /etc/condor/config.d/local.conf

And add the following text:
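A minimal sketch for a worker node follows; adjust it to your site:

```
# Worker node: run the master and the startd
DAEMON_LIST = MASTER, STARTD
```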


6.4 Common Configuration

On all nodes, add the appropriate file with the following content, changing the placeholder values in angle brackets to suit your cluster. You may find some suggestions in a previously installed local configuration file, /etc/condor/condor_config.local.

The table below specifies the parameters and values required by each configuration method; a line required only by a configuration method you are not using can be removed.

Configuration method     File                                           Parameter                  Value
Shared filesystem        /nfs/condor/condor-etc/condor_config.cluster   LOCAL_CONFIG_FILE          /nfs/condor/condor-etc/condor_config.$(HOSTNAME)
Cluster management tool  /etc/condor/config.d/cluster.conf              REQUIRE_LOCAL_CONFIG_FILE  FALSE
Manual                   /etc/condor/config.d/cluster.conf              REQUIRE_LOCAL_CONFIG_FILE  FALSE

## Condor configuration for OSG Clusters
## For more detail please see the HTCondor Manual
LOCAL_CONFIG_FILE = /nfs/condor/condor-etc/condor_config.$(HOSTNAME)
# The following should be your cluster domain. This is an arbitrary string used by Condor, not necessarily matching your IP domain
UID_DOMAIN = <your cluster domain>
# Human readable name for your Condor pool
COLLECTOR_NAME = "OSG Cluster Condor at $(UID_DOMAIN)"
# A shared file system (NFS), e.g. job dir, is assumed if the name is the same
FILESYSTEM_DOMAIN = <your cluster domain>
# Here you have to use your network domain, or any comma separated list of hostnames and
# IP addresses including all your condor hosts. * can be used as wildcard
ALLOW_WRITE = *.<your network domain>
# The following should be the full name of the head node (Condor central manager)
CONDOR_HOST = <head node full hostname>
# Port range should be opened in the firewall (can be different on different machines)
# This 9000-9999 is coherent with the iptables configuration in the Firewall documentation
LOWPORT = 9000
HIGHPORT = 9999
# This is to force communication over TCP
UPDATE_COLLECTOR_WITH_TCP = True
# This is to enforce password authentication
SEC_DAEMON_AUTHENTICATION = REQUIRED
SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD
SEC_PASSWORD_FILE = /var/lib/condor/condor_credential
ALLOW_DAEMON = condor_pool@*
## Sets how often the condor_negotiator starts a negotiation cycle
## (defined in seconds; HTCondor's default is 60, i.e. 1 minute)
NEGOTIATOR_INTERVAL = 300
## Scheduling parameters for the startd:
## start as available and do not suspend, preempt or kill
START = TRUE
SUSPEND = FALSE
PREEMPT = FALSE
KILL = FALSE
# In this setup we use the config directory instead of the local config file
REQUIRE_LOCAL_CONFIG_FILE = FALSE
CONDOR_HOST can be set with or without the domain name, e.g. gc-ce or its fully qualified equivalent.

Remaining node configuration

On each node perform these remaining configuration steps.
  1. If present, remove the previously installed /etc/condor/condor_config.local to avoid possible confusion.
    [root@client ~]$ rm /etc/condor/condor_config.local
  2. Set the password that will be used by the Condor system (at the prompt enter the same password for all nodes):
    [root@client ~]$ condor_store_cred -c add
  3. Shared filesystem configuration only: Edit the file /etc/condor/condor_config. This is the default configuration file read when Condor is started. We will direct Condor to continue from this file to the OSG-specific configuration. Set the local config file to:
    ##  Next configuration to be read is for the OSG cluster setup
    LOCAL_CONFIG_FILE       = /nfs/condor/condor-etc/condor_config.cluster
  4. Start Condor and enable automatic startup as illustrated below.

6.5 Special needs

The following sections present instructions or suggestions for uncommon configurations.

Changes to the Firewall (iptables)

If you are using a Firewall (e.g. iptables) on all nodes you need to open the ports used by Condor:
  • Edit the /etc/sysconfig/iptables file to add these lines ahead of the reject line:
    -A RH-Firewall-1-INPUT  -s <network_address> -m state --state ESTABLISHED,NEW -p tcp -m tcp --dport 9000:10000 -j ACCEPT  
    -A RH-Firewall-1-INPUT  -s <network_address> -m state --state ESTABLISHED,NEW -p udp -m udp --dport 9000:10000 -j ACCEPT 
    where <network_address> is the address of the intranet of the OSG cluster (or the extranet if your OSG cluster does not have a separate intranet). You can omit the -s option if you have nodes of your Condor cluster (startd, schedd, ...) outside of that network.
  • Restart the firewall:
    [root@client ~]$ /sbin/service iptables restart

Mounting a separate partition for /var/lib/condor

/var/lib/condor is the directory used by Condor for status files and spooling, sometimes referred to as the scratch space. For performance reasons it should always be on a local disk. It is recommended that it be big enough to accommodate jobs that use a lot of disk space (e.g. ATLAS recommends 20 GB for each job slot on the worker nodes), and possibly on a separate partition, so that a job that fills up the disk will not fill the system disk and bring down the system. The partition can be mounted on /var/lib/condor before installing Condor or at a later time, e.g.:
[root@client ~]$ /sbin/service condor stop
[root@client ~]$ cd /var/lib
[root@client ~]$ mv condor condor_old
[root@client ~]$ mkdir condor
[root@client ~]$ mount -t ext3 /dev/<your partition> condor
[root@client ~]$ chown condor:condor condor
[root@client ~]$ mv condor_old/* condor/
[root@client ~]$ rmdir condor_old
[root@client ~]$ /sbin/service condor start
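To have the partition mounted automatically at boot, an /etc/fstab entry along these lines can be added (the device name /dev/sdb1 is only an example; use your actual partition):

```
# Example /etc/fstab entry for the Condor scratch partition
/dev/sdb1   /var/lib/condor   ext3   defaults   0 2
```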

7 Services

The HTCondor master on each node takes care of starting and monitoring the correct services as selected in the configuration file.

7.1 Starting and Enabling Services

To start the services:

  1. Start Condor:
    [root@client ~]$ /sbin/service condor start

You should also enable the appropriate services so that they are automatically started when your system is powered on:

  • Enable Condor (start automatically on boot):
    [root@client ~]$ /sbin/chkconfig condor on

7.2 Stopping and Disabling Services

To stop the services:

  1. Stop Condor:
    [root@client ~]$ /sbin/service condor stop

In addition, you can disable services by running the following commands. However, you don't need to do this normally.

  • Optionally, to disable Condor:
    [root@client ~]$ /sbin/chkconfig condor off

8 Troubleshooting

8.1 Useful configuration and log files

Configuration Files

Service or Process  Configuration File                             Description
condor              /etc/condor/condor_config                      Main configuration file
condor              /nfs/condor/condor-etc/condor_config.cluster   Cluster-wide configuration file

Log files

Service or Process Log File Description
condor /var/log/condor/ All log files

8.2 Test Condor

After starting Condor you can check if it is running correctly:

[user@client ~]$ condor_config_val log   # (should be /var/log/condor/)
[user@client ~]$ cd /var/log/condor/
#check master log file
[user@client ~]$ less MasterLog
# verify the status of the negotiator
[user@client ~]$ condor_status -negotiator

You can see the resources in your Condor cluster using condor_status and submit test jobs with condor_submit. Check the CondorTest page for more.
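For a quick end-to-end test, a minimal submit description file can be used; the file names here are illustrative:

```
# test.submit -- minimal vanilla-universe sleep job (illustrative)
universe   = vanilla
executable = /bin/sleep
arguments  = 60
log        = test.log
output     = test.out
error      = test.err
queue
```

Submit it with condor_submit test.submit and monitor it with condor_q; it should run on one of your worker nodes and leave the queue after about a minute.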

9 How to Get Help

To get assistance please use this Help Procedure.

Topic revision: r32 - 06 Dec 2016 - 18:12:41 - KyleGross