Installing and Maintaining HTCondor-CE

About This Guide

The HTCondor-CE software is a job gateway for an OSG Compute Element (CE). As such, HTCondor-CE is the entry point for jobs coming from the OSG — it handles authorization and delegation of jobs to your local batch system. In OSG today, most CEs accept pilot jobs from a factory, which in turn are able to accept and run end-user jobs.

Use this page to learn how to install, configure, run, test, and troubleshoot HTCondor-CE from the OSG software repositories.

Before Starting

Before starting the installation process, consider the following points (consulting the Reference section below as needed):

  • User IDs: If they do not exist already, the installation will create the Linux users condor (UID 4716) and gratia (UID 42401)
  • Service certificate: The HTCondor-CE service uses a host certificate at /etc/grid-security/hostcert.pem and an accompanying key at /etc/grid-security/hostkey.pem
  • Network ports: The pilot factories must be able to contact your HTCondor-CE service on TCP ports 9619 and 9620; port 9620 is only needed for HTCondor versions earlier than 8.3.2 (see the firewall sketch after this list)
  • Host choice: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster
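
A minimal sketch of opening these ports with firewalld on an EL 7 host (adapt to iptables or your site's firewall tooling as needed; the default firewalld zone is assumed):

    [root@client ~]$ firewall-cmd --permanent --add-port=9619/tcp --add-port=9620/tcp
    [root@client ~]$ firewall-cmd --reload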

As with all OSG software installations, there are some one-time (per host) preparation steps to complete in advance, such as installing the OSG yum repositories and obtaining a host certificate.

Installing HTCondor-CE

An HTCondor-CE installation consists of the job gateway (i.e., the HTCondor-CE job router) and other support software (e.g., GridFTP, a Gratia probe, authorization software). To simplify installation, OSG provides convenience RPMs that install all required software with a single command.

  1. Clean yum cache:

    [root@client ~]$ yum clean all --enablerepo=*
  2. Update software:

    [root@client ~]$ yum update

    This command will update all packages on your system.

  3. If your batch system is already installed via non-RPM means and is in the following list, install the appropriate 'empty' RPM. Otherwise, skip to the next step.

    If your batch system is…  Then run the following command…
    HTCondor                  yum install empty-condor --enablerepo=osg-empty
    PBS                       yum install empty-torque --enablerepo=osg-empty
    SGE                       yum install empty-gridengine --enablerepo=osg-empty
    SLURM                     yum install empty-slurm --enablerepo=osg-empty
  4. If your HTCondor batch system is already installed via non-OSG RPM means, add the line below to /etc/yum.repos.d/osg.repo. Otherwise, skip to the next step.

    exclude=condor empty-condor
  5. Select the appropriate convenience RPM(s):

    If your batch system is…  Then use the following package(s)…
    HTCondor                  osg-ce-condor
    LSF                       osg-ce-lsf
    PBS                       osg-ce-pbs
    SGE                       osg-ce-sge
    SLURM                     osg-ce-slurm
  6. Install the CE software:

    [root@client ~]$ yum install PACKAGE(S)
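
    For example, on a site whose local batch system is HTCondor, this would be:

    [root@client ~]$ yum install osg-ce-condor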

HELP NOTE
To ease the transition from GRAM to HTCondor-CEs, the convenience RPMs install both types of job gateway software. By default, the HTCondor gateway is enabled and the GRAM gateway is disabled, which is the correct configuration for most HTCondor-CE-based sites (but see the gateway configuration section below for more options).

HELP NOTE
HTCondor-CE version 1.6 or later is required to send site resource information to OSG for matching jobs to resources.

Configuring HTCondor-CE

There are a few required configuration steps to connect HTCondor-CE with your batch system and authorization method. For more advanced configuration, see the section on optional configurations.

Enabling HTCondor-CE

If you are installing HTCondor-CE on a new host, the default configuration is correct and you can skip this step and continue on to Configuring the batch system. However, if you are updating a host that used a Globus GRAM job gateway (also known as the Globus gatekeeper), you must enable the HTCondor job gateway.

  1. Decide whether to disable GRAM (the preferred option) or run both HTCondor and GRAM CEs

  2. Edit the gateway configuration file /etc/osg/config.d/10-gateway.ini to reflect your choice

    To enable HTCondor-CE and disable GRAM CE:

    gram_gateway_enabled = False
    htcondor_gateway_enabled = True

    To enable both HTCondor and GRAM CEs:

    gram_gateway_enabled = True
    htcondor_gateway_enabled = True

More information about the Globus GRAM CE can be found here.

Configuring the batch system

Enable your batch system by editing the enabled field in the /etc/osg/config.d/20-YOUR BATCH SYSTEM.ini file:

enabled = True

Batch systems other than HTCondor

If you are using HTCondor as your local batch system (i.e., in addition to your HTCondor-CE), skip to the configuring authorization section. For other batch systems (e.g., PBS, LSF, SGE, SLURM), keep reading.

Sharing the spool directory

To transfer files between the CE and the batch system, HTCondor-CE requires a shared file system. The current recommendation is to run a dedicated NFS server (whose installation is beyond the scope of this document) on the CE host. In this setup, HTCondor-CE writes to the local spool directory, the NFS server exports it, and all of the worker nodes mount it.

HELP NOTE
If you choose not to host the NFS server on your CE, you will need to turn off root squash so that the HTCondor-CE daemons can write to the spool directory.

By default, the spool directory is /var/lib/condor-ce but you can control this by setting SPOOL in /etc/condor-ce/config.d/99-local.conf (create this file if it doesn't exist). For example, the following sets the SPOOL directory to /home/condor:

SPOOL=/home/condor

HELP NOTE
The shared spool directory must be readable and writeable by the condor user for HTCondor-CE to function correctly.
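
A minimal sketch of the corresponding /etc/exports entry on the CE host, assuming the default spool location and worker nodes on the 192.168.1.0/24 network (adjust the path, network, and export options for your site; if your NFS server is a separate host, export from that host instead and turn off root squash as noted above):

# Export the CE spool directory to the worker nodes (network is illustrative)
/var/lib/condor-ce  192.168.1.0/24(rw,sync)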

Disable worker node proxy renewal

Worker node proxy renewal is not used by HTCondor-CE and leaving it on will cause some jobs to be held. Edit /etc/blah.config on the HTCondor-CE host and set the following values:

blah_disable_wn_proxy_renewal=yes
blah_delegate_renewed_proxies=no
blah_disable_limited_proxy=yes

HELP NOTE
There should be no whitespace around the =.
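
To confirm that the settings took effect, a quick check like the following may help (the grep invocation is just one way to do it):

[root@client ~]$ grep -E '^blah_(disable_wn_proxy_renewal|delegate_renewed_proxies|disable_limited_proxy)' /etc/blah.config
blah_disable_wn_proxy_renewal=yes
blah_delegate_renewed_proxies=no
blah_disable_limited_proxy=yes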

Configuring authorization

There are two methods to manage authorization for incoming jobs, edg-mkgridmap and GUMS. edg-mkgridmap is easy to set up and maintain, and GUMS has more features and capabilities. We recommend using edg-mkgridmap unless you have specific needs that require the use of GUMS. Some examples of these specific requirements are:

  • You want to map users based on rules
  • You need to support multiple VO roles
  • You need to support gLExec for pilot jobs

Authorization with edg-mkgridmap

To configure your CE to use edg-mkgridmap:

  1. Follow the configuration instructions in the edg-mkgridmap document to define the VOs that your site accepts

  2. Set some critical gridmap attributes by editing the /etc/osg/config.d/10-misc.ini file on the HTCondor-CE host:

    authorization_method = gridmap
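
    When edg-mkgridmap runs, it typically writes the authorized mappings to /etc/grid-security/grid-mapfile. A sketch of what a resulting entry looks like (the DN and local username below are hypothetical):

    "/DC=org/DC=opensciencegrid/O=Open Science Grid/OU=People/CN=Example User" osguser01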
    

Authorization with GUMS

  1. Follow the instructions in the GUMS installation and configuration document to prepare GUMS

  2. Set some critical GUMS attributes by editing the /etc/osg/config.d/10-misc.ini file on the HTCondor-CE host:

    authorization_method = xacml
    gums_host = YOUR GUMS HOSTNAME
    

HELP NOTE
Once gsi-authz.conf is in place, your local HTCondor will attempt to utilize the LCMAPS callouts if enabled in the condor_mapfile. If this is not the desired behavior, set GSI_AUTHZ_CONF=/dev/null in the local HTCondor configuration.
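
For example, to disable the callout in your local HTCondor, you could add the line below to a local HTCondor configuration file; the filename /etc/condor/config.d/99-local.conf is only a suggestion:

GSI_AUTHZ_CONF = /dev/null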

Configuring CE collector advertising

To split jobs between the various sites of the OSG, information about each site's capabilities is uploaded to a central collector. The job factories then query the central collector for idle resources and submit pilot jobs to the available sites. To advertise your site, you will need to enter some information about the worker nodes of your clusters.

Please see the Subcluster / Resource Entry configuration document about configuring the data that will be uploaded to the central collector.

Applying configuration settings

Making changes to the OSG configuration files in the /etc/osg/config.d directory does not automatically apply those settings to the software. Settings made outside of that directory take effect immediately, or when the relevant service is restarted; for the OSG settings, use the osg-configure tool to validate them (to a limited extent) and apply them to the relevant software components. The osg-configure software is included automatically in an HTCondor-CE installation.

  1. Make all changes to .ini files in the /etc/osg/config.d directory

    Note: This document describes the critical settings for HTCondor-CE and related software. You may need to configure other software that is installed on your HTCondor-CE host, too.

  2. Validate the configuration settings

    [root@client ~]$ osg-configure -v

    Fix (at least) any errors that osg-configure reports.

  3. Once the validation command succeeds without errors, apply the configuration settings:

    [root@client ~]$ osg-configure -c
  4. Generate a user-vo-map file with your authorization set up:

    1. If you're using edg-mkgridmap, run the following:

      [root@client ~]$ edg-mkgridmap
    2. If you're using GUMS, run the following:

      [root@client ~]$ gums-host-cron
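
Either command populates the user-vo-map file. To spot-check the result, list its contents (the path below is the usual OSG location but may differ on your installation):

    [root@client ~]$ cat /var/lib/osg/user-vo-map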

Optional configuration

The following configuration steps are optional and will likely not be required for setting up a small site. If you do not need any of the following special configurations, skip to the section on using HTCondor-CE.

Transforming and filtering jobs

If you need to modify or filter jobs, more information can be found in the Job Router Recipes document.

HELP NOTE
If you need to assign jobs to HTCondor accounting groups, refer to this section.

Configuring for multiple network interfaces

If you have multiple network interfaces with different hostnames, the HTCondor-CE daemons need to know which hostname and interface to use when communicating with each other. Set NETWORK_HOSTNAME and NETWORK_INTERFACE to the hostname and IP address of your public interface, respectively, in /etc/condor-ce/config.d/99-local.conf with the lines:

NETWORK_HOSTNAME=condorce.example.com
NETWORK_INTERFACE=127.0.0.1

Replace condorce.example.com with your public interface's hostname and 127.0.0.1 with your public interface's IP address.

Limiting or disabling locally run jobs on the CE

If you want to limit or disable jobs running locally on your CE, you will need to configure HTCondor-CE's local and scheduler universes. Local and scheduler universes are HTCondor-CE’s analogue to GRAM’s managed fork: they allow jobs to be run on the CE itself, mainly for remote troubleshooting. Pilot jobs will not run as local/scheduler universe jobs so leaving them enabled does NOT turn your CE into another worker node.

The two universes are effectively the same (scheduler universe launches a starter process for each job), so we will be configuring them in unison.

  • To change the default limit on the number of locally run jobs (the current default is 20), add the following to /etc/condor-ce/config.d/99-local.conf:
    START_LOCAL_UNIVERSE = TotalLocalJobsRunning + TotalSchedulerJobsRunning < <job limit>
    START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE)
  • To only allow a specific user to start locally run jobs, add the following to /etc/condor-ce/config.d/99-local.conf:
    START_LOCAL_UNIVERSE = target.Owner =?= "<username>"
    START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE)
  • To disable locally run jobs, add the following to /etc/condor-ce/config.d/99-local.conf:
    START_LOCAL_UNIVERSE = False
    START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE)

HELP NOTE
RSV requires the ability to start local universe jobs, so if you are using RSV, you need to allow local universe jobs from the rsv user.
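
For example, to keep locally run jobs restricted while still satisfying RSV, you could allow only the rsv user (the same pattern as above with the username filled in):

START_LOCAL_UNIVERSE = target.Owner =?= "rsv"
START_SCHEDULER_UNIVERSE = $(START_LOCAL_UNIVERSE)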

Accounting with multiple CEs or local user jobs

HELP NOTE
For non-HTCondor batch systems only

If your site has multiple CEs or you have non-grid users submitting to the same local batch system, the OSG accounting software needs to be configured so that it does not over-report the number of jobs. Use the following table to determine which file requires editing:

If your batch system is…  Then edit the following file on your CE(s)…
LSF                       /etc/gratia/pbs-lsf/ProbeConfig
PBS                       /etc/gratia/pbs-lsf/ProbeConfig
SGE                       /etc/gratia/sge/ProbeConfig
SLURM                     /etc/gratia/slurm/ProbeConfig

Then edit the value of SuppressNoDNRecords so that it reads:

SuppressNoDNRecords="1"

HTCondor accounting groups

HELP NOTE
For HTCondor batch systems only

If you want to provide fairshare on a group basis, as opposed to a Unix user basis, you can use HTCondor accounting groups. They are independent of the Unix groups the user may already be in and are documented in the HTCondor manual. If you are using HTCondor accounting groups, you can map jobs from the CE into HTCondor accounting groups based on their UID, their DN, or their VOMS attributes.

  • To map DNs or VOMS attributes to an accounting group, add lines to /etc/osg/extattr_table.txt with the following form:
    SubjectOrAttribute GroupName
    The SubjectOrAttribute can be a Perl regular expression. For example:
    cmsprio cms.other.prio
    cms\/Role=production cms.prod
    \/DC=com\/DC=DigiCert-Grid\/O=Open\ Science\ Grid\/OU=People\/CN=Brian\ Lin\ 1047 osg.test
    .* other
    

HELP NOTE
Entries in /etc/osg/uid_table.txt are honored over /etc/osg/extattr_table.txt if a job would match to lines in both files.
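
For reference, /etc/osg/uid_table.txt uses a similar two-column layout that maps a local Unix username to an accounting group. The entry below is a hypothetical sketch; check the osg-configure documentation for the exact syntax:

cmsuser01 cms.other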

Install and run the HTCondor-CE View

The HTCondor-CE View is an optional web interface to the status of your CE. To run the View:

  1. Begin by installing the package htcondor-ce-view:
    [root@client ~]$ yum install htcondor-ce-view
  2. Next, uncomment the DAEMON_LIST configuration in /etc/condor-ce/config.d/05-ce-view.conf:
    DAEMON_LIST = $(DAEMON_LIST), CEVIEW, GANGLIAD
  3. Restart the CE service:
    [root@client ~]$ service condor-ce restart
  4. Verify the service by entering your CE's hostname into your web browser

The website is served on port 80 by default. To change this default, edit the value of HTCONDORCE_VIEW_PORT in /etc/condor-ce/config.d/05-ce-view.conf.
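
For example, to serve the View on port 8080 instead (the port number and hostname below are purely illustrative), set the following, restart the service, and check it from the command line:

HTCONDORCE_VIEW_PORT = 8080

[root@client ~]$ service condor-ce restart
[root@client ~]$ curl http://condorce.example.com:8080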

Using HTCondor-CE

As a site administrator, there are a few ways in which you might use the HTCondor-CE:

  • Managing the HTCondor-CE and associated services
  • Using HTCondor-CE administrative tools to monitor and maintain the job gateway
  • Using HTCondor-CE user tools to test gateway operations

Managing HTCondor-CE and associated services

In addition to the HTCondor-CE job gateway service itself, there are a number of supporting services in your installation. The specific services are:

Software           Service name                       Notes
Fetch CRL          fetch-crl-boot and fetch-crl-cron  See CA documentation for more info
Gratia             gratia-probes-cron                 Accounting software
Your batch system  condor or pbs_server or …
HTCondor-CE        condor-ce

Start the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root):

To …                                         On EL 6, run the command…   On EL 7, run the command…
Start a service                              service SERVICE-NAME start  systemctl start SERVICE-NAME
Stop a service                               service SERVICE-NAME stop   systemctl stop SERVICE-NAME
Enable a service to start during boot        chkconfig SERVICE-NAME on   systemctl enable SERVICE-NAME
Disable a service from starting during boot  chkconfig SERVICE-NAME off  systemctl disable SERVICE-NAME
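
As a sketch, on an EL 7 host whose local batch system is HTCondor, starting everything in the listed order would look like this (exact unit names may vary between releases):

[root@client ~]$ systemctl start fetch-crl-boot fetch-crl-cron
[root@client ~]$ systemctl start gratia-probes-cron
[root@client ~]$ systemctl start condor
[root@client ~]$ systemctl start condor-ce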

Using HTCondor-CE tools

Some of the HTCondor-CE administrative and user tools are documented in the HTCondor-CE troubleshooting guide.

Validating HTCondor-CE

There are different ways to make sure that your HTCondor-CE host is working well; for example, the HTCondor-CE user tools described in the troubleshooting guide below can be used to run test jobs against your CE.

Troubleshooting HTCondor-CE

For information on how to troubleshoot your HTCondor-CE, please refer to the HTCondor-CE troubleshooting guide.

Registering the CE

To be part of the OSG Production Grid, your CE must be registered in the OSG Information Management System (OIM). To register your resource:

  1. Obtain, install, and verify your user certificate (which you may have done already)
  2. Register your site and CE in OIM

Getting Help

To get assistance, please use this page.

Reference

The following reference material covers the configuration locations, users, certificates, and network ports associated with an HTCondor-CE installation.

Configuration

The following directories contain the configuration for HTCondor-CE. The directories are parsed in the order shown, so configuration in the second directory overrides configuration in the first.

Location Comment
/usr/share/condor-ce/config.d/ Configuration defaults (overwritten on package updates)
/etc/condor-ce/config.d/ Files in this directory are parsed in alphanumeric order (i.e., 99-local.conf will override values in 01-ce-auth.conf)

For a detailed order of the way configuration files are parsed, run the following command:

[user@client ~]$ condor_ce_config_val -config
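
You can also query an individual setting by name, for example the spool directory:

[user@client ~]$ condor_ce_config_val SPOOL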

Users

The following users are needed by HTCondor-CE at all sites:

User Comment
condor The HTCondor-CE runs as root, but performs most of its operations as the condor user.
gratia Runs the Gratia probes to collect accounting data

Certificates

Certificate       User that owns certificate  Path to certificate
Host certificate  root                        /etc/grid-security/hostcert.pem
                                              /etc/grid-security/hostkey.pem

Find instructions to request a host certificate here.

Networking

For more details on overall Firewall configuration, please see our Firewall documentation.

Service Name Protocol Port Number Inbound Outbound Comment
HTCondor-CE port tcp 9619 Y   Used to locate HTCondor-CE daemons
HTCondor-CE shared_port daemon port tcp 9620 Y   Only needed for HTCondor versions earlier than 8.3.2: Used for aggregating ephemeral ports used by HTCondor into a single network port

Allow inbound and outbound network connections to all internal site servers, such as GUMS and the batch system head node; beyond that, only ephemeral outgoing ports are necessary.
