Managed Fork




Introduction to Managed Fork

Managed Fork is an optional service that replaces the default fork jobmanager with Condor to manage incoming fork requests. Commands such as condor_q and condor_history show the actual command lines of fork jobs during and after execution, which provides a useful logging capability. More importantly, a configurable policy can limit the number of concurrent fork jobs so that the CE is not overwhelmed. The standard fork jobmanager has no such limit, which lets a user accidentally or maliciously "fork bomb" a CE. For this reason, Managed Fork is highly recommended.

ALERT! IMPORTANT
The Managed Fork jobmanager does not schedule fork jobs onto compute nodes in the execution pool. Jobs run in the Condor local universe, so they still execute on the CE headnode. Local universe jobs should start quickly and without delay unless a scheduling limit for the CE has been reached.
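
As a minimal sketch of the logging use described above, the helper functions below list the owner and command line of local universe jobs, both in the queue and in the history. They assume that Managed Fork jobs carry JobUniverse == 12 (Condor's numeric code for the local universe) and that condor_q and condor_history are on the PATH after sourcing setup.sh. The function names are hypothetical and not part of the VDT.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not part of the VDT.
# Assumption: local universe jobs have JobUniverse == 12 in their ClassAd.
LOCAL_UNIVERSE_CONSTRAINT='JobUniverse == 12'

# Keep only the ClassAd lines that describe who ran what.
# Older jobs may use Args, newer ones Arguments, so both are matched.
filter_command_lines() {
    grep -E '^(Owner|Cmd|Args|Arguments) = '
}

# Queued and running local universe jobs.
list_fork_jobs() {
    condor_q -long -constraint "$LOCAL_UNIVERSE_CONSTRAINT" | filter_command_lines
}

# Local universe jobs that have already left the queue.
list_finished_fork_jobs() {
    condor_history -long -constraint "$LOCAL_UNIVERSE_CONSTRAINT" | filter_command_lines
}
```

After sourcing this file, running list_fork_jobs or list_finished_fork_jobs on the CE prints one attribute per line for each matching job.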

Install using Condor from the VDT

If you want to use Condor from the VDT, installation is simple:
# cd $VDT_LOCATION
# source $VDT_LOCATION/setup.sh
# pacman -get  ITB:ManagedFork
During the installation you may be asked whether you would like to run Condor. Answer y, because Managed Fork uses Condor to handle fork jobs on the CE.

Install using an existing Condor

Many site administrators already have a Condor installation (for example, as the cluster's batch system) and wish to use it for the Managed Fork jobmanager rather than installing another copy of Condor. The installation process is similar, but make sure that VDTSETUP_CONDOR_LOCATION and VDTSETUP_CONDOR_CONFIG are defined as discussed in PreparingComputeElement, so that the installer can find your Condor installation. The installation is done in the same directory as the OSG CE software. Further reading on this is available here. Set up these variables, then run:
# cd $VDT_LOCATION
# source $VDT_LOCATION/setup.sh
# pacman -get  ITB:ManagedFork
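
A small pre-flight check can catch a missing variable before pacman runs. The sketch below assumes that VDTSETUP_CONDOR_LOCATION names the Condor installation directory and that VDTSETUP_CONDOR_CONFIG names the Condor configuration file; consult PreparingComputeElement for the authoritative meaning. The function name and the example paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the VDT.
# Assumption: LOCATION is a directory and CONFIG is a regular file.
check_condor_env() {
    if [ -z "${VDTSETUP_CONDOR_LOCATION:-}" ] || [ ! -d "$VDTSETUP_CONDOR_LOCATION" ]; then
        echo "VDTSETUP_CONDOR_LOCATION is unset or not a directory" >&2
        return 1
    fi
    if [ -z "${VDTSETUP_CONDOR_CONFIG:-}" ] || [ ! -f "$VDTSETUP_CONDOR_CONFIG" ]; then
        echo "VDTSETUP_CONDOR_CONFIG is unset or not a file" >&2
        return 1
    fi
    echo "Using Condor at $VDTSETUP_CONDOR_LOCATION ($VDTSETUP_CONDOR_CONFIG)"
}

# Example use (paths are placeholders for your site):
#   export VDTSETUP_CONDOR_LOCATION=/opt/condor
#   export VDTSETUP_CONDOR_CONFIG=/opt/condor/etc/condor_config
#   check_condor_env && pacman -get ITB:ManagedFork
```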

Enabling Managed Fork

To make the Managed Fork jobmanager the default jobmanager, run the following commands:
# source $VDT_LOCATION/setup.sh
# $VDT_LOCATION/vdt/setup/configure_globus_gatekeeper --managed-fork y --server y

By default, the Managed Fork jobmanager behaves just like the fork jobmanager and places no limit on the number of fork jobs. To restrict it, modify your local Condor configuration. If you are using Condor from the VDT, edit $VDT_LOCATION/condor/local.<hostname>/condor_config.local.

Here are some configuration suggestions:

  • Allow at most 20 local universe jobs to execute concurrently:
   START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 20
  • Set a hard limit on most jobs, but always let grid monitor jobs run (strongly recommended):
   START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 20 || GridMonitorJob =?= TRUE
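
One way to apply the second suggestion is sketched below. It appends the policy to a local configuration file and then asks Condor to reread its configuration. This is an illustration under stated assumptions, not a VDT-provided tool: the function names are hypothetical, condor_reconfig and condor_config_val are assumed to be on the PATH, and appending relies on Condor honoring the last definition of a setting when it appears more than once. Editing the file by hand works just as well.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the VDT.

# Append a START_LOCAL_UNIVERSE policy that always admits grid monitor jobs.
# $1: path to condor_config.local
# $2: maximum number of concurrent local universe jobs
set_local_universe_limit() {
    config_file=$1
    limit=$2
    case $limit in
        ''|*[!0-9]*)
            echo "limit must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf 'START_LOCAL_UNIVERSE = TotalLocalJobsRunning < %s || GridMonitorJob =?= TRUE\n' \
        "$limit" >> "$config_file"
}

# Ask the running daemons to reread their configuration, then print the
# value of START_LOCAL_UNIVERSE as read from the configuration files.
reload_and_show() {
    condor_reconfig && condor_config_val START_LOCAL_UNIVERSE
}

# Example use (run as the Condor administrator):
#   set_local_universe_limit "$VDT_LOCATION/condor/local.$(hostname)/condor_config.local" 20
#   reload_and_show
```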

Disabling Managed Fork

To restore the default fork jobmanager (that is, to disable Managed Fork), run the following commands:
# source $VDT_LOCATION/setup.sh
# $VDT_LOCATION/vdt/setup/configure_globus_gatekeeper --managed-fork n --server y

Further Details on Managed Fork

For more details on setup and configuration, refer to the VDT Managed Fork Jobmanager Release Notes.


Complete: 2
Responsible: StevenTimm - 25 Oct 2007
Reviewer - date: RobGardner - 03 Nov 2007

Topic revision: r30 - 16 May 2008 - 15:46:15 - StevenTimm
 