Troubleshooting Guide

About this Document

This document is for System Administrators and Grid Users. It helps diagnose errors encountered during job submission and offers possible solutions to resolve them.

Before you Begin

  • Make sure your Grid Proxy or VOMS Proxy is valid.
  • Make sure that you have an active Internet connection and can ping the site you are attempting to access.
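
As a rough illustration of the first check, the sketch below compares a proxy's remaining lifetime with a required minimum. The helper name proxy_has_time_left and the one-hour default are assumptions made for this example and are not part of any grid toolkit. The remaining seconds would typically come from voms-proxy-info -timeleft or grid-proxy-info -timeleft.

```shell
# Hypothetical helper (not part of any grid toolkit).
#   $1 = seconds left on the proxy
#   $2 = minimum seconds required (assumed default: 3600)
# Returns 0 if enough time is left, 1 if not, 2 if $1 is not a non-negative integer.
proxy_has_time_left() {
    remaining=$1
    minimum=${2:-3600}
    case $remaining in
        ''|*[!0-9]*) return 2 ;;   # empty or non-numeric, e.g. the tool printed an error
    esac
    [ "$remaining" -ge "$minimum" ]
}

# Example use (requires the proxy tools and a reachable site):
#   proxy_has_time_left "$(voms-proxy-info -timeleft)" 3600 || echo "Proxy expired or expiring soon"
#   ping -c 3 SITE_HOSTNAME    # SITE_HOSTNAME is a placeholder for the site you want to reach
```

Passing the remaining time in as an argument keeps the helper independent of which proxy tool is installed.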

HTCondor

HTCondor Issues

HTCondor Errors

HTCondor Hold Codes

  1. The error code is written to the log file named in the HTCondor submit file. An error entry looks something like this:
            018 (035.000.000) 04/10 12:01:04 Globus job submission failed!
                Reason: 5 the executable does not exist
    
            012 (035.000.000) 04/10 12:01:04 Job was held.
                Globus error 5: the executable does not exist
                Code 2 Subcode 5
    
  2. To list all jobs that HTCondor is holding, use the condor_q command:
    [user@host ~]$ condor_q -hold
    -- Submitter: host.opensciencegrid.org : <IP:PORT> : host.opensciencegrid.org
     ID      OWNER           HELD_SINCE HOLD_REASON                   
    4989741.0   user          5/23 17:08 Globus error 47: the gatekeeper failed to r
    ...
    
    Note the job ID (4989741.0 in this example), which can be used to retrieve more details about the problem:
    [user@host ~]$ condor_q -l 4989741.0 | grep HoldReason
    LastHoldReason = "Globus error 47: the gatekeeper failed to run the job manager"
    LastHoldReasonCode = 2
    LastHoldReasonSubCode = 47
    HoldReason = "Globus error 47: the gatekeeper failed to run the job manager"
    HoldReasonCode = 2
    HoldReasonSubCode = 47
    
    In the example above, the major error code (HoldReasonCode) 2 refers to a Globus problem on the remote site, and the minor error code (HoldReasonSubCode) 47 means that the gatekeeper failed to run the job manager. For a list of Globus errors, see Globus Errors. For a list of Condor hold codes by reason, see Condor Errors.
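
The two lookups above can be scripted. The following is a minimal sketch, assuming the log and condor_q -l output look like the samples shown in the steps above. The function names log_hold_codes and hold_codes are hypothetical helpers introduced for this example, not HTCondor commands.

```shell
# Hypothetical helper: scan an HTCondor job log (file arguments or stdin) and print
# "<job id> <code> <subcode>" for each "Job was held" event (event number 012).
log_hold_codes() {
    awk '
        /^[ \t]*012 \(/ { job = $2 }     # remember the "(cluster.proc.subproc)" field
        /^[ \t]*Code [0-9]+ Subcode [0-9]+/ { print job, $2, $4 }
    ' "$@"
}

# Hypothetical helper: read `condor_q -l <ID>` output on stdin and print
# "<HoldReasonCode> <HoldReasonSubCode>". The LastHold* attributes are ignored.
hold_codes() {
    awk '
        /^[ \t]*HoldReasonCode = /    { code = $NF }
        /^[ \t]*HoldReasonSubCode = / { subcode = $NF }
        END { if (code != "") print code, subcode }
    '
}

# Example use (requires HTCondor; job.log is a placeholder file name):
#   log_hold_codes job.log
#   condor_q -l 4989741.0 | hold_codes
```

Both helpers only filter text, so they can be tried on saved output before running them against a live queue.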

HTCondor Log Files

Further troubleshooting can be done by inspecting the HTCondor log files of the various daemons (use condor_config_val LOG to find their location). To increase the logging level, set the debug level of the specific daemon in the configuration file, e.g. SCHEDD_DEBUG = D_FULLDEBUG.
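
As a sketch of how a single daemon's log file might be located, the helper below asks condor_config_val for the daemon's own <DAEMON>_LOG setting (for example SCHEDD_LOG). The function name daemon_log_file is an assumption made for this example.

```shell
# Hypothetical helper: print the log file path configured for one daemon.
#   $1 = daemon name in any case, e.g. schedd, master, startd
daemon_log_file() {
    daemon=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    condor_config_val "${daemon}_LOG"
}

# Example use (requires HTCondor):
#   tail -n 50 "$(daemon_log_file schedd)"
#
# To raise the schedd logging level, add this line to the HTCondor configuration file:
#   SCHEDD_DEBUG = D_FULLDEBUG
# and then ask the daemons to re-read their configuration:
#   condor_reconfig
```

Full debug output can grow quickly, so it is usually worth reverting the setting once the problem has been found.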

Globus

Globus Issues

Globus Errors

Globus Error Codes

Other Globus Issues:

References

Topic revision: r18 - 06 Dec 2016 - 18:12:45 - KyleGross