Submitting Jobs to an HTCondor-CE

About This Guide

This document outlines methods for manually submitting jobs to an HTCondor-CE. It is intended for site administrators who wish to verify the functionality of their HTCondor-CE installation, and for developers writing software that submits jobs to an HTCondor-CE (e.g., pilot jobs).

HELP NOTE
Most incoming jobs are pilots submitted by factories, so manual submission does not reflect the standard way that jobs are submitted to OSG CEs.

Submitting Jobs

There are two main methods for submitting jobs to an HTCondor-CE: using the tools bundled with the htcondor-ce-client package, or using the condor_submit command with a submit file. Both methods test end-to-end job submission; the former is simpler, while the latter walks you through writing your own submit file.

Before attempting to submit jobs, you will need to generate a proxy from your user certificate. To do so, run the following command on the host you plan to submit from:

[user@client ~]$ voms-proxy-init

Using HTCondor-CE tools

There are two HTCondor-CE tools that allow users to test the functionality of their HTCondor-CE: condor_ce_trace and condor_ce_run. The former is the preferred tool because it provides useful feedback if a failure occurs, while the latter is simply an automated submission tool. These commands may be run from any host that has htcondor-ce-client installed, which is useful if you want to test the availability of your CE from an external source.

condor_ce_trace

condor_ce_trace is a Python script that uses HTCondor's Python bindings to run diagnostics, including job submission, against your HTCondor-CE. To submit a job with condor_ce_trace, run the following command:

[user@client ~]$ condor_ce_trace --debug condorce.example.com

Replace condorce.example.com with the hostname of your CE. On success, you will see Job status: Completed and the environment of the job on the worker node it landed on. If you do not get the expected output, refer to the troubleshooting guide.
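For scripted or periodic checks, the success condition described above can be tested automatically. The following is a minimal sketch, assuming that condor_ce_trace prints the Job status: Completed line on success as described here; the helper name trace_succeeded is hypothetical and not part of htcondor-ce-client.

```shell
#!/bin/bash
# Hypothetical helper: read condor_ce_trace output on stdin and return 0
# only if it contains the success line described in this guide.
trace_succeeded() {
    grep -q 'Job status: Completed'
}

# Example usage (requires htcondor-ce-client and a valid proxy):
#   if condor_ce_trace --debug condorce.example.com 2>&1 | trace_succeeded; then
#       echo "CE test passed"
#   else
#       echo "CE test failed; see the troubleshooting guide" >&2
#   fi
```

Merging stderr into the pipe (2>&1) is a precaution, since it is not stated here which stream carries the status line.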

Requesting resources

condor_ce_trace doesn't make any specific resource requests so its jobs are only given the default resources by the CE. To request specific resources (or other job attributes), you can specify the --attribute option on the command line:

[user@client ~]$ condor_ce_trace --debug --attribute='+resource1=value1'...--attribute='+resourceN=valueN' condorce.example.com

To submit a job that requests 4 cores, 4 GB of RAM, a wall clock time of 2 hours, and the 'osg' queue, run the following command:

[user@client ~]$ condor_ce_trace --debug --attribute='+xcount=4' --attribute='+maxMemory=4000' --attribute='+maxWallTime=120' --attribute='+remote_queue="osg"' condorce.example.com
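When the list of requested resources varies, it can be convenient to build the option list in a script. The following is a minimal bash sketch that collects the command in an array, which keeps the double quotes around string values such as the queue name intact; build_trace_cmd and TRACE_CMD are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper: fill the TRACE_CMD array with a condor_ce_trace
# invocation that passes one --attribute option per NAME=VALUE argument.
# The '+' prefix required for custom attributes is added here.
build_trace_cmd() {
    local ce_host=$1
    shift
    TRACE_CMD=(condor_ce_trace --debug)
    local attr
    for attr in "$@"; do
        TRACE_CMD+=("--attribute=+${attr}")
    done
    TRACE_CMD+=("$ce_host")
}

# Request 4 cores, 4 GB of RAM, 2 hours of wall time and the "osg" queue.
# String values keep their double quotes.
build_trace_cmd condorce.example.com \
    xcount=4 maxMemory=4000 maxWallTime=120 'remote_queue="osg"'

# Run it (requires htcondor-ce-client and a valid proxy):
#   "${TRACE_CMD[@]}"
```

Using an array rather than a single string avoids a second round of shell word splitting, so each --attribute value reaches condor_ce_trace exactly as written.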

For a list of other attributes that can be set with the --attribute option, consult the Job attributes section of the Reference below.

condor_ce_run

condor_ce_run is a Python script that calls condor_submit on a generated submit file and tracks its progress with condor_q. To submit a job with condor_ce_run, run the following command:

[user@client ~]$ condor_ce_run -r condorce.example.com:9619 /bin/env

Replace condorce.example.com with the hostname of your CE. The command will not return any output until the job completes; when it does, you will see the environment of the job on the worker node it landed on. If you do not get the expected output, refer to the troubleshooting guide.
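Because condor_ce_run blocks until the job finishes, a job that never leaves the idle state can keep the command waiting indefinitely. The following sketch bounds the wait with the timeout utility from GNU coreutils; run_with_deadline is a hypothetical helper, and the 1800-second limit in the usage comment is an arbitrary example.

```shell
#!/bin/bash
# Hypothetical helper: run a command but give up after a number of seconds.
# Relies on GNU coreutils timeout, which exits with status 124 when the
# time limit is reached.
run_with_deadline() {
    local seconds=$1
    shift
    timeout "$seconds" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${seconds}s: $*" >&2
    fi
    return "$status"
}

# Example usage (requires htcondor-ce-client and a valid proxy):
#   run_with_deadline 1800 condor_ce_run -r condorce.example.com:9619 /bin/env
```

If the deadline is reached, the job that condor_ce_run submitted may still be queued on the CE and may need to be removed manually.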

Using a submit file

If you are familiar with HTCondor, submitting a job to an HTCondor-CE using a submit file follows the same procedure as submitting a job to an HTCondor batch system: write a submit file and use condor_submit (or, in one of the cases below, condor_ce_submit) to submit the job. This is because HTCondor-CE is a special configuration of HTCondor. The major differences lie in the submit file attributes outlined below.

From the CE host

This method uses condor_ce_submit to submit directly to an HTCondor-CE. The only reason we use condor_ce_submit in this case is to take advantage of the already running daemons on the CE host.

  1. Write a submit file, ce_test.sub:
    # Required for local HTCondor-CE submission
    universe = vanilla
    use_x509userproxy = true
    +Owner = undefined
    
    # Files
    executable = ce_test.sh
    output = ce_test.out
    error = ce_test.err
    log = ce_test.log
    
    # File transfer behavior
    ShouldTransferFiles = YES
    WhenToTransferOutput = ON_EXIT
    
    # Optional resource requests
    #+xcount = 4            # Request 4 cores
    #+maxMemory = 4000      # Request 4GB of RAM
    #+maxWallTime = 120     # Request 2 hrs of wall clock time
    #+remote_queue = "osg"  # Request the OSG queue
    
    # Run job once
    queue

    Replace ce_test.sh with the path to the executable you wish to run.

    1. You can use any executable you choose for the executable field. If you don't have one in mind, you may use the following example test script:
      #!/bin/bash
      
      date
      hostname
      env 
    2. Mark the test script as executable:
      [user@client ~]$ chmod +x ce_test.sh
  2. Submit the job:
    [user@client ~]$ condor_ce_submit ce_test.sub
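If you plan to repeat this test, the two files above can be generated by a small script. The following is a sketch that writes the example test script and the local submit file (without the optional resource requests) into the current directory; the function name write_local_ce_test is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: create ce_test.sh and ce_test.sub (the local,
# vanilla universe variant) in the current directory.
write_local_ce_test() {
    # Example test script: print the date, hostname and job environment.
    cat > ce_test.sh <<'EOF'
#!/bin/bash

date
hostname
env
EOF
    chmod +x ce_test.sh

    # Submit file for condor_ce_submit on the CE host.
    cat > ce_test.sub <<'EOF'
# Required for local HTCondor-CE submission
universe = vanilla
use_x509userproxy = true
+Owner = undefined

# Files
executable = ce_test.sh
output = ce_test.out
error = ce_test.err
log = ce_test.log

# File transfer behavior
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT

# Run job once
queue
EOF
}

write_local_ce_test
# Then, on the CE host:
#   condor_ce_submit ce_test.sub
```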

From another host

For this method, you will need a functional HTCondor submit node. If you do not have one readily available, you can install the condor package from the OSG repository to get a simple submit node:

  1. Follow these instructions to install HTCondor
  2. Start the condor service:
    [root@client ~]$ service condor start

  1. Write a submit file, ce_test.sub:
    # Required for remote HTCondor-CE submission
    universe = grid 
    use_x509userproxy = true
    grid_resource = condor condorce.example.com condorce.example.com:9619
    
    # Files
    executable = ce_test.sh
    output = ce_test.out
    error = ce_test.err
    log = ce_test.log
    
    # File transfer behavior
    ShouldTransferFiles = YES
    WhenToTransferOutput = ON_EXIT
    
    # Optional resource requests
    #+xcount = 4            # Request 4 cores
    #+maxMemory = 4000      # Request 4GB of RAM
    #+maxWallTime = 120     # Request 2 hrs of wall clock time
    #+remote_queue = "osg"  # Request the OSG queue
    
    # Run job once
    queue

    Replace ce_test.sh with the path to the executable you wish to run, and both occurrences of condorce.example.com with the hostname of the CE you wish to test.

    NOTE: the grid_resource line should always start with condor, regardless of which batch system your CE uses.

    1. You can use any executable you choose for the executable field. If you don't have one in mind, you may use the following example test script:
      #!/bin/bash
      
      date
      hostname
      env 
    2. Mark the test script as executable:
      [user@client ~]$ chmod +x ce_test.sh
  2. Submit the job:
    [user@client ~]$ condor_submit ce_test.sub
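When testing several CEs from the same submit node, the hostname can be passed in rather than edited by hand. The following sketch writes the remote submit file for a given CE; write_remote_submit is a hypothetical helper, and it assumes the CE listens on port 9619 as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: write ce_test.sub for remote submission to the CE
# named in the first argument. Both grid_resource fields use that hostname;
# the second adds port 9619, as in the submit file above.
write_remote_submit() {
    local ce_host=${1:?usage: write_remote_submit CE_HOSTNAME}
    # Unquoted EOF so that ${ce_host} is expanded inside the here-document.
    cat > ce_test.sub <<EOF
# Required for remote HTCondor-CE submission
universe = grid
use_x509userproxy = true
grid_resource = condor ${ce_host} ${ce_host}:9619

# Files
executable = ce_test.sh
output = ce_test.out
error = ce_test.err
log = ce_test.log

# File transfer behavior
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT

# Run job once
queue
EOF
}

write_remote_submit condorce.example.com
# Then, on the submit host:
#   condor_submit ce_test.sub
```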

Tracking job progress

When the job completes, stdout will be placed in ce_test.out, stderr in ce_test.err, and HTCondor logging information in ce_test.log. You can track the job's progress by querying the CE's job queue with the following command on the CE host:

[user@client ~]$ condor_ce_q

Use the following table to determine the job status:

ST value   The job is...
I          idle
R          running
C          complete
X          being removed
H          held
<          transferring input
>          transferring output
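When checking jobs from a script, it can help to translate the ST codes into words. The following sketch covers the codes listed in this section, including R (running); st_to_text is a hypothetical helper, and any other code is reported as unknown.

```shell
#!/bin/bash
# Hypothetical helper: translate a condor_ce_q ST code into a description.
st_to_text() {
    case "$1" in
        I)   echo "idle" ;;
        R)   echo "running" ;;
        C)   echo "complete" ;;
        X)   echo "being removed" ;;
        H)   echo "held" ;;
        '<') echo "transferring input" ;;
        '>') echo "transferring output" ;;
        *)   echo "unknown status: $1" ;;
    esac
}

st_to_text H   # prints held
```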

How Job Routes Affect Your Job

Upon successful submission of your job, the Job Router takes control of your job by matching it to routes and submitting a transformed job to your batch system.

Matching

First, the Job Router checks whether your job matches any routes by evaluating each route's Requirements expression against the job. If your job does not match any routes, it will be put on hold and eventually removed from the CE queue without completing.

HELP NOTE
The Job Router matches jobs to routes in a round-robin fashion. This means that if a job can match multiple routes, it may be routed by any of them. When writing job routes, make sure that they are mutually exclusive so that each job can match only a single route.

Examples

The following three routes only perform filtering and submission of routed jobs to an HTCondor batch system. The only differences are in the types of jobs that they match:

  • Route 1: Matches jobs whose attribute foo is equal to bar.
  • Route 2: Matches jobs whose attribute foo is equal to baz.
  • Route 3: Matches jobs whose attribute foo is neither equal to bar nor baz.

HELP NOTE
Setting a custom attribute in a submit file requires the + prefix, but the prefix is omitted when referring to that attribute in job routes.

JOB_ROUTER_ENTRIES = [ \
     TargetUniverse = 5; \
     name = "Route 1"; \
     Requirements = (TARGET.foo =?= "bar"); \
] \
[ \
     TargetUniverse = 5; \
     name = "Route 2"; \
     Requirements = (TARGET.foo =?= "baz"); \
] \
[ \
     TargetUniverse = 5; \
     name = "Route 3"; \
     Requirements = (TARGET.foo =!= "bar") && (TARGET.foo =!= "baz"); \
]

If a user submitted their job with +foo = "bar", the job would match Route 1.
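Because =?= and =!= never evaluate to undefined, the three Requirements expressions above cover every job exactly once, including jobs that do not set foo at all. The following bash function is only a sketch of that case analysis, assuming foo is either undefined or a plain string; it is not how the Job Router evaluates ClassAd expressions, and match_route is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical model of the three example routes: given the value of the
# job's foo attribute, print the single route whose Requirements would be
# true. An empty argument stands in for a job that does not define foo.
match_route() {
    local foo=${1-}
    case "$foo" in
        bar) echo "Route 1" ;;   # TARGET.foo =?= "bar"
        baz) echo "Route 2" ;;   # TARGET.foo =?= "baz"
        *)   echo "Route 3" ;;   # =!= "bar" && =!= "baz", true even if foo is undefined
    esac
}

match_route bar   # prints Route 1
match_route       # prints Route 3
```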

Route defaults

Route defaults can be set for the batch system queue, maximum memory, number of cores to request, and maximum walltime. The submitting user can override any of these by setting the corresponding attribute in their job.

Examples

The following route takes all incoming jobs and submits them to an HTCondor batch system requesting 1GB of memory.

JOB_ROUTER_ENTRIES = [ \
     TargetUniverse = 5; \
     name = "Route 1"; \
     set_default_maxMemory = 1000; \
] 

A user could submit their job with the attribute +maxMemory=2000 and that job would be submitted requesting 2GB memory instead of the default of 1GB.
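The override rule amounts to "use the job's value if it is set, otherwise use the route default". The following sketch expresses that rule with bash's ${var:-default} expansion for the example route above; effective_max_memory is a hypothetical name, and the Job Router applies set_default_maxMemory itself, so this only models the outcome.

```shell
#!/bin/bash
# Hypothetical model of the route default above: a maxMemory value set in
# the job wins; otherwise the route's set_default_maxMemory (1000) applies.
effective_max_memory() {
    local job_max_memory=${1:-}   # +maxMemory from the job, if any
    local route_default=1000      # set_default_maxMemory in "Route 1"
    echo "${job_max_memory:-$route_default}"
}

effective_max_memory        # prints 1000
effective_max_memory 2000   # prints 2000
```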

Reference


Job attributes

The following table is a reference of job attributes that can be included in HTCondor submit files, along with their Globus RSL equivalents. A more comprehensive list of submit file attributes specific to HTCondor-CE can be found in the HTCondor manual.

HTCondor attribute      Globus RSL        Summary
arguments               arguments         Arguments that will be provided to the executable for the job.
error                   stderr            Path to the file on the client machine that stores stderr from the job.
executable              executable        Path to the file on the client machine that the job will execute.
input                   stdin             Path to the file on the client machine that stores input to be piped into the stdin of the job.
+maxMemory              maxMemory         The amount of memory in MB that you wish to allocate to the job.
+maxWallTime            maxWallTime       The maximum walltime (in minutes) the job is allowed to run before it is removed.
output                  stdout            Path to the file on the client machine that stores stdout from the job.
+remote_queue           queue             Assign the job to the target queue in the scheduler. Note that the queue name should be in quotes.
transfer_input_files    file_stage_in     A comma-delimited list of all the files and directories to be transferred into the working directory for the job, before the job is started.
transfer_output_files   file_stage_out    A comma-delimited list of all the files and directories to be transferred back to the client, after the job completes.
+xcount                 xcount            The number of cores to allocate for the job.

If you are setting an attribute to a string value, make sure to enclose the string in double quotes ("); otherwise, HTCondor-CE will try to find an attribute by that name.
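The quoting rule above is easy to get wrong when generating submit files programmatically, for example in pilot submission software. The following sketch writes a custom attribute line and double-quotes the value unless it is a plain integer; submit_attr is a hypothetical helper, and treating every non-integer value as a string is an assumption that does not hold for booleans, floating-point numbers, or expressions.

```shell
#!/bin/bash
# Hypothetical helper: print a "+NAME = VALUE" submit file line, quoting
# VALUE unless it looks like an integer (assumption: everything else is a string).
submit_attr() {
    local name=$1 value=$2
    if [[ $value =~ ^-?[0-9]+$ ]]; then
        printf '+%s = %s\n' "$name" "$value"
    else
        printf '+%s = "%s"\n' "$name" "$value"
    fi
}

submit_attr xcount 4          # prints +xcount = 4
submit_attr remote_queue osg  # prints +remote_queue = "osg"
```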

Topic revision: r16 - 07 Dec 2016 - 22:54:27 - BrianLin