If you are using Rocks, commonly used by CMS, or you install using Kickstart files like the one provided by ATLAS, then you may not need any of this: they may install and set up Condor for you. Check your VO documentation first.
We will use the latest stable release of Condor. As of May 27, 2011, this is 7.6.0.
This installation uses the Condor RPM distribution. It can be downloaded from the Condor site or installed using an RPM or yum repository.
The Condor team set up a yum repository that can be used for this installation.
Condor needs to be installed (using the procedure below) on all nodes of the batch queue, i.e. the head node and the worker nodes, and also on the interactive nodes used to submit Condor jobs.
Certain operations, like the RPM installation and some system configuration, have to be repeated on each node. Some steps, like the configuration of condor_config, are performed only once, e.g. on the head node; others, like the customization of the local Condor configuration file, differ for each node.
Sharing at least the directory hosting the configuration files simplifies the setup by making cluster-wide configuration changes easy. However, this installation is also possible without any shared directories, provided the customized condor_config file is replicated on all nodes of the queue.
A note about the directory structure:
This Condor installation is structured to facilitate upgrades to Condor with minimal effort. The Condor RPM follows the Filesystem Hierarchy Standard; for more information on the directory structure check the release notes. The /var/lib/condor directory contains the Condor spool and may be mounted from a different partition, as detailed in the section about the isolated spool directory below. In addition to the files provided by the RPM there is a shared directory (/nfs/condor/condor-etc) to simplify the configuration. Below you can also find how to avoid any shared files.
If you don't have it already, download the yum repository information provided by the Condor team into /etc/yum.repos.d/, e.g. for RHEL5 (and derived distributions):
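For example (the repository file name and URL below are assumptions based on the Condor yum repository layout of that time; check the Condor download page for the current location):

cd /etc/yum.repos.d
wget http://www.cs.wisc.edu/condor/yum/repo.d/condor-stable-rhel5.repo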
On each node
Start by installing Condor from the repository:
yum install condor
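You can verify what was installed and which version you got with:

rpm -q condor
condor_version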
Shared configuration files
On the server exporting /nfs/condor/condor-etc (other nodes cannot write to it if you choose to export the directory with root squash), edit the following configuration files:
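A minimal sketch of what the cluster-wide file /nfs/condor/condor-etc/condor_config.cluster could contain (all host names and values here are assumptions to be adapted to your site; the macros themselves are standard Condor configuration macros):

## Central manager of the pool (assumed hostname)
CONDOR_HOST = headnode.mydomain.edu
UID_DOMAIN = mydomain.edu
FILESYSTEM_DOMAIN = mydomain.edu
## Hosts allowed to join and write to the pool
ALLOW_WRITE = *.mydomain.edu
## Port range matching the firewall rules in the section below
IN_LOWPORT = 9000
IN_HIGHPORT = 10000
## Read a machine-specific file next, one per node
LOCAL_CONFIG_FILE = /nfs/condor/condor-etc/condor_config.$(HOSTNAME)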
Remaining node configuration
On each node, perform these remaining configuration steps.
- Edit the file /etc/condor/condor_config. This is the default configuration file read when Condor starts; we will point it to the additional configuration for the T3 setup. Replace:
## Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(HOSTNAME).local

with:

## Next configuration to be read is for the T3 cluster setup
LOCAL_CONFIG_FILE = /nfs/condor/condor-etc/condor_config.cluster
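After the edit you can verify that the cluster file will be picked up (condor_config_val is part of the standard Condor tools and works even before the daemons are started):

condor_config_val local_config_file   # should print /nfs/condor/condor-etc/condor_config.cluster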
Condor starts automatically during reboots. You can start it manually by typing:

service condor start   # (should say OK)
You can check whether Condor is running correctly:

condor_config_val log              # (should be /var/log/condor/)
less /var/log/condor/MasterLog     # check the master log file
condor_status -negotiator          # verify the status of the negotiator

You can see the resources in your Condor cluster using condor_status and submit test jobs with condor_submit.
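For a quick end-to-end test, a minimal submit description file can be used (the file and executable names here are arbitrary examples):

# test.sub - trivial test job
universe   = vanilla
executable = /bin/hostname
output     = test.out
error      = test.err
log        = test.log
queue

Submit it with condor_submit test.sub and follow its progress with condor_q; test.out should end up containing the name of the worker node that ran the job.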
Condor is installed in the default path, so no special setup is needed to use it; it is automatically in the environment of every user.
Only one version of Condor at a time can be installed via RPM and used.
To install a different version, just remove the old RPM and install the new one following the instructions above.
The configuration files in the shared directory will persist so you can skip that step during updates.
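For example (assuming the package was installed from the repository as above):

yum remove condor
yum install condor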
The following sections present instructions and suggestions for less common configurations.
Changes to the Firewall (iptables)
If you are using a firewall (e.g. iptables) on all nodes, you need to open the ports used by Condor:
- Edit the /etc/sysconfig/iptables file to add these lines ahead of the REJECT line:
-A RH-Firewall-1-INPUT -s <network_address> -m state --state ESTABLISHED,NEW -p tcp -m tcp --dport 9000:10000 -j ACCEPT
-A RH-Firewall-1-INPUT -s <network_address> -m state --state ESTABLISHED,NEW -p udp -m udp --dport 9000:10000 -j ACCEPT
where network_address is the address of the intranet of the T3 cluster, e.g. 192.168.192.0/18 (or the extranet if your T3 does not have a separate intranet). Omit the -s option if you have nodes of your Condor cluster (startd, schedd, ...) outside of that network.
- Restart the firewall:

service iptables restart
Installation without any shared directory
If you choose not to use NFS in your cluster and there is no shared directory, the section above about shared configuration files does not apply. All the configuration files should instead be kept locally on each node (with LOCAL_CONFIG_FILE in /etc/condor/condor_config adjusted accordingly). Changes to the cluster configuration or to the configuration of one of the node types then require synchronization, by replicating the proper files after the change.
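One possible way to propagate a change (host names and paths are examples; any file distribution mechanism, e.g. rsync or a configuration management tool, works equally well):

for node in headnode wn01 wn02; do
  scp /etc/condor/condor_config.cluster $node:/etc/condor/
done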
Mounting a separate partition for /var/lib/condor
/var/lib/condor is the directory used by Condor for status files and spooling (corresponding to the local directory in the shared installation).
For performance reasons it should always be on a local disk. It is recommended that it be large, in order to accommodate jobs that use a lot of disk space (e.g. ATLAS recommends 20 GB for each job slot on the worker nodes), and possibly on a separate partition, so that a job filling up the disk will not fill the system disk and bring down the system.
The partition can be mounted on /var/lib/condor before installing Condor or at a later time, e.g.:
service condor stop        # stop Condor while its files are moved
cd /var/lib
mv condor condor_old
mkdir condor               # recreate the mount point
mount -t ext3 /dev/<your partition> condor
chown condor:condor condor
mv condor_old/* condor/
service condor start
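To make the mount persistent across reboots, add a corresponding line to /etc/fstab (the device name here is an example):

/dev/sdb1   /var/lib/condor   ext3   defaults   0 2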
Use the old RPMs from Condor
The new RPMs distributed by the Condor team are much better than the previous ones, so the use of the old ones is not supported. However, if you must use the old RPMs, you can check the old instructions for RPM installation to see the additional steps necessary to complete the installation.