Job Submission Comparison

A Comparison of Job Submission Methods on the Open Science Grid

Intended Audience

This document is aimed at Virtual Organizations (VOs) and advanced users interested in understanding the most widely used and available mechanisms for submitting jobs to the OSG. It is meant to be used in conjunction with advice from OSG experts for help in guiding deployment decisions.

Introduction

Grid computing on the OSG utilizes computational resources that are distributed over many independent sites with only a thin layer of Grid middleware shared between them. This deployment model enables the resource providers to be autonomous, especially with regards to operations. Resource providers operate their local distributed resources according to local preferences and expertise. These resources are often integrated with non-Grid resources at the local institution.

However, the Grid deployment model introduces several problems for the users of the system, three major problems being:

  • The complexity of job scheduling in the Grid environment
  • The non-uniformity of compute resources
  • The lack of an easy way to accommodate end-to-end job monitoring

When providing job submission services to its customers and constituents, a VO has a variety of technology choices it can make to solve these issues, depending on the types and quantities of the jobs managed. To help the OSG better understand the tradeoffs in technology, and therefore be able to better advise its VOs, this paper is designed to compare and contrast the three main technologies that are used for job routing and distribution on the OSG: the Engage OSG Matchmaker, Glide-in WMS, and PanDA. (Additional job submission tools that have been used to a lesser degree on the OSG include GridWay, Pegasus and Swift.) Since each of these services relies on an underlying information architecture, we include a brief section on the information services that OSG offers.

A Brief Overview of OSG Information Sources

A reliable information layer including resource names, availabilities, capabilities, loads, and access mechanisms is an essential component of any Grid infrastructure. This information is normally used by application developers in order to find resources for their jobs. To create and manage this information in a systematic way, a variety of tools have evolved for use on the OSG. Tools that produce information include the OSG Information Management system (OIM) and the Generic Information Providers (GIP). To view the information, one can use MyOSG, the Berkeley Database Information Index (BDII) systems, and the Resource Selection Service (ReSS).

Every site and resource that is considered part of the OSG is registered in the OIM system (https://oim.grid.iu.edu/oim/home). This database is maintained at the Grid Operations Center (GOC) and stores basic (mostly static) information about sites and resources such as names, descriptions, and contact information.

Dynamic information, such as availabilities, capabilities, and loads, is provided by localized GIPs that run at each site and maintain a local information repository. There is one GIP running on each Compute Element (CE) at the site. Data from the GIPs is dynamically aggregated via CEMon clients at each CE, which then push information to a centralized CEMon collector run by OSG Operations. This data is pushed out simultaneously to several different places: a BDII running at the GOC, an international BDII running at CERN, and a ReSS instance running at FNAL. The GIPs maintain the site information according to the GLUE schema, a standard for describing grid sites that grew out of a collaborative effort between European and US Grid projects.

The GOC BDII provides a centralized point of access that job schedulers can query in order to intelligently distribute jobs across the OSG. It is deemed a “critical” service and is maintained in a tightly controlled environment by the GOC. The international top-level BDII aggregates data from EGEE together with data on US LHC sites for international and intra-grid submissions. This service is managed centrally for the LHC and operated at CERN.

In addition to the BDII, a highly available ReSS instance is hosted at FNAL. It stores largely the same information as the BDII but presents the data in the form of Condor ClassAds. This provides a Condor-friendly mechanism for matching jobs with resources across the OSG.

Whether accessing this data via ReSS or the BDII, grid users can easily retrieve the resources' published information and use it manually or in their automated job management systems.
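
For example, because ReSS publishes this information as Condor ClassAds and the BDII exposes it over LDAP, a submit host can query either service directly. The commands below are only a sketch; the host names, port, base DN, and GLUE attribute names are assumptions that should be checked against the current OSG information services documentation.

    # query-osg-info.sh -- illustrative queries of the OSG information services.
    # Host names, port, base DN, and GLUE attribute names are assumptions.

    # ReSS publishes one Condor ClassAd per site/queue; query it with condor_status.
    condor_status -any -pool osg-ress-1.fnal.gov \
        -constraint 'GlueCEInfoContactString =!= undefined' \
        -format '%s ' GlueSiteName -format '%s\n' GlueCEInfoContactString

    # The BDII exposes the equivalent GLUE information over LDAP.
    ldapsearch -x -LLL -h is.grid.iu.edu -p 2170 -b 'mds-vo-name=local,o=grid' \
        '(objectClass=GlueCE)' GlueCEName GlueCEInfoTotalCPUs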

Job Submission Methods

This section provides a high-level architecture and design philosophy for each of the Engage OSG Matchmaker (OSGMM), glideinWMS and PanDA job submission and management technologies.

Engage OSG Matchmaker

High Level Architecture and Design Consideration

The OSG MatchMaker (OSGMM) is a tool that was created to give small to medium sized VOs a straightforward yet powerful submit interface to the OSG. It is designed to retrieve VO-specific site information from ReSS and to verify and maintain the sites by regularly submitting verification/maintenance jobs that make sure a site can continue receiving jobs. Site information from OSGMM is inserted into a local Condor install so that Condor can match jobs with the resources. OSGMM then monitors jobs in the system to maintain pending/running counts and a moving window of job success/failure rates for each site. This is used to back off from sites if they break or become busy with the resource owners' jobs. Site status information can be queried and imported by other systems such as Pegasus and Swift.

OSGMM-Overview.png

Usage Description

OSG MatchMaker provides and maintains site information in a Condor-G system. In order to run jobs, the user must write a Condor job description. If users are not familiar with Condor submission systems, as is normally the case, the Engagement staff will work one-on-one with them to help. Often these jobs are based on previously written examples that incorporate best practices. The Condor job description contains a requirements line that enables the user to be very specific about where the jobs should end up. Possible requirements include, but are not limited to, OS, architecture, network setup, software, and the availability of certain datasets. Also, job submission scripts need to be tweaked for each site due to the different ways that sites are set up. In so doing, users gradually begin to understand and work with the complexities of the grid.
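
For illustration, a minimal job description of the kind used with OSGMM might look like the sketch below; the executable, file names, and the attribute names referenced in it are placeholders, since the actual attributes come from the site ClassAds that OSGMM inserts into the local Condor pool.

    # osgmm-job.sub -- minimal, illustrative Condor-G job description for an OSGMM-fed pool.
    # Submit with:  condor_submit osgmm-job.sub
    universe      = grid
    # The gatekeeper contact is taken from the matched site ClassAd via $$() substitution;
    # the attribute name shown here is illustrative.
    grid_resource = gt2 $$(GlueCEInfoContactString)
    executable    = my_analysis.sh
    arguments     = input.dat
    output        = job.out
    error         = job.err
    log           = job.log
    # The requirements line restricts where the job may run (attribute names illustrative).
    requirements  = (GlueHostOperatingSystemName == "Linux") && (GlueHostMainMemoryRAMSize >= 2048)
    queue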

OSGMM does not specify a mechanism for data transfers. Data transfer logic, for example, is often written into a job wrapper script. Most of the time this script is composed of simple globus-url-copy commands to retrieve inputs and push outputs back to the submit host. This has the important advantage of integrating easily with whatever transfer mechanisms users are most comfortable with at their site, including GridFTP, SRM or even plain FTP.
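
Such a wrapper is typically a short shell script along the lines of the sketch below; the GridFTP endpoint, file names, and application are placeholders.

    #!/bin/bash
    # wrapper.sh -- illustrative job wrapper that handles its own data movement.
    # The GridFTP endpoint and file names are placeholders.
    set -e

    SUBMIT_HOST=gsiftp://submit.example.edu/home/user/run42

    # Stage in the input from the submit host
    globus-url-copy "$SUBMIT_HOST/input.dat" "file://$PWD/input.dat"

    # Run the actual application
    ./my_analysis.sh input.dat > result.dat

    # Push the output back to the submit host
    globus-url-copy "file://$PWD/result.dat" "$SUBMIT_HOST/result.dat"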

The job submission side is also flexible, and most users end up using Condor DAGMan to create a workflow of their jobs. DAGMan also provides some important mechanisms such as job retries and pre/post scripts for the jobs. Because the OSGMM system ends up using Condor-G to submit and manage the jobs, there is a certain overhead for each job (a combination of Condor-G, GRAM and LRM overheads), so short jobs (< 1 hour wall time) are not recommended.
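
As a sketch of the DAGMan features mentioned above (dependencies, retries, and post scripts), a small workflow description might look like the following; the node names, submit files, and check script are placeholders.

    # workflow.dag -- illustrative DAGMan workflow; node names, submit files and the
    # post script are placeholders.  Submit with:  condor_submit_dag workflow.dag
    JOB  stage_in   stage_in.sub
    JOB  analyze    analyze.sub
    JOB  stage_out  stage_out.sub

    PARENT stage_in CHILD analyze
    PARENT analyze  CHILD stage_out

    # Retry the analysis node up to three times if it fails
    RETRY analyze 3

    # Run a check on the analysis output after the node completes
    SCRIPT POST analyze check_output.sh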

Implementation, Deployment, and Management Description

OSGMM was designed to be easy to deploy and support at the VO/university/lab submit level. It is shipped with the VDT and can be installed with the OSG client software stack with just one additional pacman command. OSGMM should be installed using the VDT method for a shared install, which means root has to perform the installation. This method takes care of setting up startup scripts for services and privilege separation, as Condor and OSGMM will be configured to run as their own non-privileged users. OSGMM requires about 1 GB of RAM (used to track jobs in the system). One instance will serve all users on that submit host, using Condor's fair-share algorithm to handle jobs from multiple users.
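
For reference, a shared installation of this kind follows the general pattern sketched below; the pacman cache URL and package names shown are placeholders, and the authoritative commands should be taken from the current OSG/VDT release documentation.

    # Illustrative shared (root) install of the OSG client plus OSGMM via pacman.
    # The cache URL and package names are placeholders.
    cd /opt/osg-client
    pacman -get http://software.grid.iu.edu/osg-1.2:client
    pacman -get http://software.grid.iu.edu/osg-1.2:OSGMM
    . ./setup.sh          # set up the environment
    vdt-control --on      # enable and start the configured services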

Each VO can have a central OSGMM instance (polling ReSS for information) to run verification/maintenance jobs and then have other instances of OSGMM (polling the VO OSGMM instance for information, including site status) running on the submit hosts. General verification and maintenance jobs ship with OSGMM, but each VO can augment them with its own tests and, for example, software installations. This provides a way for the VO to push software out to sites and then advertise the availability of that software back to the system. OSGMM scales up to managing about 5,000 Condor-G jobs per submit host. For VOs that prefer not to host their own job submission system, the Engage team hosts a fully functional submit host that VOs can utilize.

Some OSGMM Considerations

The OSGMM system can readily submit jobs to all available sites on the OSG, since in general few site requirements are needed to use it (e.g. worker nodes are not required to have external network connections). Jobs are sent to the distributed systems and then managed by the remote batch queuing mechanisms in place at each site.

To the user of the OSGMM system, however, the heterogeneity of site configurations (e.g. gateway configurations, environment configurations, ...) can cause job failures until certain site-specific configurations are incorporated into the job submission scripts. Hence users of this system sometimes need to know more about the internal workings of the sites than they would prefer. By the time jobs actually reach the remote site, conditions may have changed, and one can experience, for example, a longer than expected wait for the remote queue to schedule the job. Also, if a site is not configured correctly, even correctly configured jobs can fail with unexpected results that can be difficult to debug. Nevertheless, OSGMM has proven to be a powerful tool for job submission on the OSG and has contributed to the success of many users, especially those that have utilized the Engagement VO.

"Pilot" based submission methods described below offers an alternative approach to job submission. In general, they take more effort to set up and run but add some important capabilities as will be described below.

Documentation and Further Reading

Glide-in WMS

High Level Architecture and Design Consideration

The Glidein Workload Management System (glideinWMS) utilizes a centralized pilot-based submission system called a pilot or Glidein Factory (GF). A "pilot" job is a special type of job that lands on a distributed resource and then pulls in the user's job and starts executing it. This approach, often called the PULL model, has a few advantages over the standard, or PUSH, model implemented by the OSG MatchMaker:

  • The matchmaking of jobs to sites is much easier. Instead of trying to predict where the job will start first, pilots are sent to all the Grid sites that are supposed to be able to run the job. The first pilot that starts gets the job. The remaining glideins will either start another job that can run there (even if they were not submitted for the purpose of serving that job), or terminate within a short period of time. As long as there is a significant number of jobs in the queue, only a small fraction of pilots will terminate without performing any useful work.
  • The pilot can validate the node before pulling a user job; as a consequence, correctly configured user jobs are less likely to fail. Pilot jobs can of course fail to start with the same frequency as OSG MatchMaker submitted jobs, but these failures are seen only by the Glidein Factory that submitted the pilot job, and the user does not "experience" this type of failure.
  • Due to the centralized nature of the system, a pilot system can handle user priorities uniformly across the Grid. In the OSG MatchMaker model, each Grid site maintains the priorities of the users, based on the policies and historical usage data specific to that site.
  • Since the pilot runs alongside the user job, it can provide additional services, like pseudo-interactive monitoring.

Of course, this approach also has disadvantages:

  • It absolutely requires networking between the worker nodes and the pilot infrastructure. The OSG MatchMaker by itself does not have this requirement.
  • It requires a much heavier investment in hardware, since it must manage all the job information in greater detail.
  • In order to get proper security, the pilot must be able to switch identity once the user job is pulled. This is not part of the original Grid security model; recently a Grid tool, namely gLExec, has been created to provide this capability. Currently only a subset of Grid sites have deployed gLExec. If gLExec is not deployed at a site, glideinWMS can still run jobs there; however, there will be no identity switching and user jobs will run under the same identity as the pilot.

The glideinWMS system was developed on top of the Condor system. Most of its functionality comes from Condor itself, with just a thin layer on top. This approach was chosen to minimize the development and maintenance cost of the product, since Condor already provided a very powerful platform on which to build. Moreover, this allows users with previous Condor know-how an easy path to the Grid world. Users with existing Condor-ready jobs can almost transparently submit them to the Grid via glideinWMS.

The glideinWMS-specific services are composed of a frontend and a glidein factory (GF). The glidein factory is responsible for knowing which Grid sites are available and what their attributes are, and for advertising this information as "entry-point ClassAds" to a dedicated collector (known as the WMS pool). The frontend instead plays the role of a matchmaker, internally matching user jobs to the entry-point ClassAds and then requesting the needed number of glideins from the factory. Finally, the factory submits the pilots, also known as glideins, which start the Condor daemons responsible for resource handling.

The picture below shows the architecture from a schematic point of view:
glideinWMS at a glance

Finally, GSI, utilizing X509 proxies, was chosen as the security mechanism. All communication between the various processes is authenticated via GSI and can be encrypted if desired.

Usage Description

From the end user point of view, glideinWMS is just a distributed Condor system. As with the OSG MatchMaker, the user must write a Condor job description in order to run jobs, although in this case for the vanilla universe. Most of the VOs currently using glideinWMS have portals to insulate the final users from the details of the underlying batch system, thereby simplifying the user submission model even further.

Glideins advertise a set of attributes that enable users to match jobs to resources. The matching algorithm can be as complex or as simple as desired; for example, CMS matches just on the site name.

Finally, Condor DAGMan can be used to handle workflows just like with the OSG MatchMaker. Jobs of a few minutes to several hours are supported; a single Condor schedd can easily handle a 5-10 Hz job turnaround rate. Very long jobs may be a problem due to preemption, as with the OSG MatchMaker, but Condor will restart the failed jobs, so users usually don't see any failures.

glideinWMS does not specify a mechanism for data transfers; i.e. it supports only what Condor supports. For modest input and output sizes, the standard Condor file transfer works fine (and recent versions of Condor support HTTP transfer, possibly cached, as well), but for larger file transfers users are left on their own, although many VOs have their own data handling solutions.
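
To make this concrete, a vanilla universe submit description for a glideinWMS pool might look like the sketch below; the executable and file names are placeholders, and the GLIDEIN_Site attribute is shown only as an example of the kind of glidein attribute a VO might match on.

    # glidein-job.sub -- illustrative vanilla-universe job for a glideinWMS pool.
    # Submit with:  condor_submit glidein-job.sub
    universe                = vanilla
    executable              = my_analysis.sh
    arguments               = input.dat
    # Let Condor move modest input/output files
    transfer_input_files    = input.dat
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output = job.out
    error  = job.err
    log    = job.log
    # Match on an attribute advertised by the glideins; the attribute name is a common
    # convention, but the set available for matching is VO/factory specific.
    requirements = (GLIDEIN_Site == "SomeSite")
    queue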

From the VO point of view, glideinWMS requires the installation of a Condor central manager and a submit node, and the installation of the glideinWMS-specific daemons.

The Condor central manager and submit node installation is very close to a standard Condor installation, with just small configuration changes; the glideinWMS installer can fully automate this, if desired.

The glideinWMS-specific services are composed of two parts: a factory and a frontend. The VO must install its own frontend, while the factory can be shared among several VOs. Although larger VOs are encouraged to host their own factory to attain maximum flexibility, UCSD currently hosts a factory that is open (upon request) to smaller OSG VOs. Factory installation is therefore not discussed here, as it is intended for advanced VO administrators.

The frontend configuration entails installing a Web server, creating the proper glidein matching configuration, and starting the frontend daemons. The main elements to be configured are the list of Condor daemon DNs, the credential(s) used for glidein submission, the VO-specific validation and data gathering scripts, and finally the matchmaking expression. The last one is needed because, as described above, the frontend is the glideinWMS matchmaker, and it currently does not have matching logic powerful enough to do true matchmaking, so help from the VO administrators is needed.

The attributes published for matchmaking at the Condor level are gathered from a mix of static attributes provided by the factory (usually extracted from the information system), a set of scripts provided by the factory, and a set of scripts provided by the frontend. The reasoning behind this is that the factory can tailor the scripts toward the type of resources served, while the frontend extracts, and publishes, VO-specific information. There is no prescribed naming schema for the attributes, although nothing prevents the factory and/or VO administrators from adopting one (such as GLUE).

The factory and the frontend can also provide validation scripts, so user jobs never land on failing glideins; instead, only the glideins fail.

Implementation, Deployment, and Management Description

The VO frontend periodically queries the user pool to check the status of the user job queues. Based on the job queues, it instructs the glidein factory (GF) to submit glideins to run on Grid resources. A glidein is a properly configured Condor startd submitted as a Grid job. glideinWMS extensively uses the Condor ClassAd mechanism as a means of communication between its services. Once a glidein starts on a worker node, it joins the user Condor pool, presenting the obtained Grid batch slot as a new slot in the Condor pool. At this point, a regular Condor job can start there and everything "appears" as if it were a dedicated resource. Condor itself handles the interaction with gLExec; it is only a matter of proper configuration.

glideinWMS does not provide a GUI for users to submit their jobs; instead, users use Condor client tools like condor_submit. Because the user interfaces with the Condor batch system, he/she is shielded from the problems mentioned above that are introduced by the Grid deployment. Based on the time slot allocated for the glidein to run, the glidein-created batch slot can run multiple user jobs before relinquishing its claim over the grid resource. In this sense glideinWMS can also be viewed as supporting resource provisioning. This mechanism is quite effective for running short jobs: since a single glidein can run multiple short jobs, the effective wait time/overhead of running several short jobs this way is equivalent to that of running a single grid job. The resource provisioning feature can also be used to provide a guaranteed time slot for long running jobs.

glideinWMS also provides users with the capability of selecting resources based on resource characteristics, such as hardware type, OS, packages installed on the worker node, and any custom information that is required for the user jobs to run. glideinWMS can be configured to run pluggable scripts as the glidein bootstraps. This is useful for advertising desired resource information. Thus the combination of the glidein's bootstrapping scripts and pluggable scripts can identify potential problems with the resource before the user job starts on it. This increases glideinWMS resiliency to bad CEs/resources and the environments on the resources.

glideinWMS does provide a set of Web-based monitoring tools for both the glidein factory and the VO frontend. This allows the factory and VO frontend administrators to easily monitor the system and discover potential problems.

Monitoring of the Condor pool can be achieved through standard Condor monitoring tools, ranging from command line tools (like condor_q and condor_status) and CondorView up to commercial tools like the ones developed and sold by Cycle Computing.
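
For example, the standard Condor command line tools can be used on the submit host to watch both the jobs and the glidein-provided slots; the site attribute used in the last command is an assumption about the kind of attribute glideins typically advertise.

    # Monitor jobs and glidein-provided slots from the submit host.
    condor_q                      # jobs queued/running on the local schedd
    condor_status                 # slots currently provided by joined glideins

    # Count slots per site using an attribute glideins typically advertise
    # (attribute name illustrative).
    condor_status -format '%s\n' GLIDEIN_Site | sort | uniq -c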

Some Glide-in WMS Considerations

glideinWMS is used in the OSG and LCG by several VOs such as CMS, CDF, DZero, IceCube and GPN, and by experimental High Energy Physics groups such as MINOS, MINERvA and NOvA. glideinWMS supports the range of workflows supported by Condor, from simple job submission to complex DAGs. During the factory configuration step, glideinWMS can query information systems like ReSS and BDII to get the list of available grid resources. A system administrator can either accept all the sites or be more selective about the sites listed. glideinWMS will populate its site list based on the selection made by the administrator and the information about the sites listed in BDII or ReSS. The current version of glideinWMS does not have any support for data management by the glideinWMS system itself. However, most of the VOs that use this technology have their own data management systems, which can easily be used by the user jobs run via glideinWMS.

Although all the glideinWMS components can be collocated on a single host for smaller use cases, for larger use cases it is recommended that the services be distributed across hosts, with one host for each of the following services:

  • glidein factory collocated with WMS Condor Pool
  • User Condor Pool Collector
  • User Schedd
  • VO Frontend

The table below shows the scalability figures achieved so far for the glideinWMS system.

| Criteria | Design goal | Achieved so far |
| Total number of user jobs in the queue at a given time | 100k | 200k |
| Number of glideins in the system at any given time | 10k | ~26k |
| Number of running jobs per schedd at any given time | 10k | ~23k |
| Grid sites handled | ~100 | ~100 |

Most of the glideinWMS services do not need to be run with root privileges. The HTTPD service on the VO frontend and glidein factory nodes is typically installed and run as the root user. Also, for security reasons, the user schedd(s) for non-portal installations should run as root. Since there are several services to be installed and configured, the deployment model needs to be thought out based on the use case. Also, an administrator installing the services needs to be familiar with the Condor batch system and basic GSI concepts. If a glidein factory is being installed, knowledge of the Grid (OSG, EGEE, NorduGrid) is also needed.

It should be noted that UCSD hosts a glidein factory that smaller VOs can use to start using glideinWMS without needing detailed knowledge of the Grid.

Although currently glideinWMS can be challenging to install, the glideinWMS team is actively working to both improve the documentation and to make the installation process as easy as possible.

Documentation and Further Reading

PanDA

High Level Architecture and Design Considerations

The Production and Distributed Analysis system (PanDA) is a pilot-based job submission system. PanDA uses its pilot generator to submit and manage pilot jobs, which communicate with a central service whose function is to perform brokerage and direct workload to sites. Unlike glideinWMS, PanDA pilots are not coupled to the Condor software and are in principle arbitrary code, meant to communicate with the server via HTTPS. The principal features of its design were initially driven by the operational requirements of the ATLAS experiment at the LHC; however, most of these are directly applicable to general OSG use. Some of the key design criteria were as follows:

  • The ability to use pilot jobs for the acquisition of processing resources and all the attendant advantages that pilots provide. Workload jobs are assigned to successfully activated and validated pilots based on PanDA-managed brokerage criteria. This 'late binding' of workload jobs to processing slots prevents latencies and failure modes in slot acquisition from impacting the jobs, and maximizes the flexibility of job allocation to resources based on the dynamic status of processing facilities and job priorities. The pilot is also a principal 'insulation layer' for PanDA, encapsulating the complex heterogeneous environments and interfaces of the grids and facilities on which PanDA operates.
  • The client interface must allow easy integration with diverse front ends for job submission to PanDA.
  • A system-wide site/queue information database for recording static and dynamic information used throughout PanDA to configure and control system behavior from the 'cloud' (region) level down to the individual queue level. It is used by pilots to configure themselves appropriately for the queue they land on; by PanDA brokerage for decisions based on cloud and site attributes and status; and by the pilot scheduler to configure pilot job submission appropriately for the target queue.
  • Provide a coherent and comprehensible system view for users, and for PanDA's own job brokerage system, through a system-wide job database that records comprehensive static and dynamic information on all jobs in the system. To users and to PanDA itself, the job database appears essentially as a single attribute-rich queue feeding a worldwide processing resource.
  • Easy integration of local resources. The minimum site requirements are a grid computing element or local batch queue to receive pilots, outbound http support, and remote data copy support using grid data movement tools.

In contrast to glideinWMS, which uses a Condor-based WMS, PanDA created its own centralized relational database and brokerage system. The centralized database keeps detailed information about every pilot and every job status (with logs and error messages) and can be queried using standard relational database techniques. PanDA also has a user interface to that database (the monitor) that facilitates workload management, debugging and day-to-day operations.
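
As a purely hypothetical illustration of such a query, an operator with read access to the backend might run something like the sketch below; the database host, table and column names are invented for this example and do not reflect the actual PanDA schema.

    # Purely hypothetical query against the PanDA backend; the database, table and
    # column names are invented for this sketch.
    mysql -h panda-db.example.org -u reader -p panda_db -e "
        SELECT computing_site, job_status, COUNT(*)
        FROM active_jobs
        GROUP BY computing_site, job_status;"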

Additional considerations are as follows:

  • A single workload management system to handle both ongoing managed production and individual users activity (such as analysis), so as to benefit from a common infrastructure and to allow all participants to leverage common operations support.
  • A coherent, homogeneous processing system layered over diverse and heterogeneous processing resources. This helps insulate production operators and analysis users from the complexity of the underlying processing infrastructure. It also maximizes the amount of code in the system that is independent of the underlying middleware and facilities.
  • Security based on standard grid security mechanisms. Authentication and authorization is based on X.509 grid certificates, with the job submitter required to hold a grid proxy and VOMS role that is authorized for PanDA usage. User identity (DN) is recorded at the job level and used to track and control usage in PanDA monitoring, accounting and brokerage systems. The user proxy itself can optionally be recorded in MyProxy for use by pilots processing the user job, when pilot identity switching/logging (via gLExec) is in use.
  • Support for usage regulation at user and group levels based on quota allocations, job priorities, usage history, and user-level rights and restrictions.
  • A comprehensive monitoring system
    • detailed drill-down into job, site and data management information for problem diagnostics
    • Web-based interface for end-users and operators
    • usage and quota accounting
    • performance monitoring of PanDA subsystems and the computing facilities being utilized.

Documentation

Usage Description

Jobs are submitted to PanDA via a command-line client interface by which users define job sets and their associated data. Job descriptions are transmitted to the PanDA server via secure http (authenticated via a grid certificate proxy), with submission information returned to the client. There are several versions of that interface tailored towards the needs of different groups of PanDA users (e.g. ATLAS users and generic users). The PanDA server receives job descriptions from these clients and places them into a global job queue, upon which a brokerage module operates to prioritize and assign work on the basis of job type, priority, input data and its locality, available CPU resources, and other brokerage criteria.
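
Purely as a schematic illustration, a submission might look like the sketch below; the client name and options are hypothetical placeholders, since the actual commands and arguments differ between the ATLAS and generic client interfaces.

    # Hypothetical PanDA submission from the command line; the client name ("panda-submit")
    # and its options are placeholders for illustration only.
    export X509_USER_PROXY=/tmp/x509up_u$(id -u)     # grid proxy used to authenticate over HTTPS
    panda-submit --site ANY \
                 --payload-url http://payloads.example.org/run_analysis.sh \
                 --out-dataset user.jdoe.test.001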

An independent subsystem (called the pilot generator or pilot scheduler) manages the delivery of pilot jobs to worker nodes via a number of scheduling systems. (Note: this is very different from glideinWMS, where the Condor client itself actively manages the jobs.) A pilot, once launched on a worker node, contacts the dispatcher (residing on the central server) and receives an available job appropriate to the site. If no appropriate job is available, the pilot may immediately exit or may pause and ask again later, depending on its configuration (the standard behavior is for it to exit). If, however, a job is available, the pilot obtains a payload job description, whose principal component is the URL from which the payload script is to be downloaded. In general, payload scripts are hosted on a web server that is separate from the PanDA system proper. This allows users to exercise the necessary level of control over the payloads and reduces the load on the PanDA server, while still keeping standard security mechanisms in place.

Importantly, the pilot-based mechanism helps to minimize both latencies and errors between job submission and launch. The pilot dispatch mechanism bypasses latencies in the scheduling system for submitting and launching the pilot itself. The pilot job mechanism in turn isolates workload jobs from grid and batch system failure modes (a workload job is assigned only after the pilot successfully launches on a worker node). The pilot also isolates the PanDA system proper from grid heterogeneities, which are encapsulated in the pilot, so that at the PanDA level the grids utilized by PanDA appear homogeneous. Pilots generally carry a generic 'production' grid proxy, with an additional VOMS attribute 'pilot' indicating a pilot job. Optionally, pilots may use gLExec to switch their identity on the worker node to that of the job submitter.

The overall PanDA architecture is shown below.

panda-arch.jpg

To better illustrate how PanDA is used, let us enumerate its principal components with which the end-user or agent interacts most often:

  • PanDA server
  • PanDA monitor
  • Pilot generator (aka pilot scheduler)

The first two components are centrally installed, managed and maintained. The pilot scheduler can be located anywhere and run either centrally or by VOs (as, for example, in the case of the CHARMM project).

Assuming that these components are in place and running, a typical usage scenario unfolds as follows:

  • The user submits a job using a command line client provided by PanDA
  • The job is registered on the server and becomes visible to the user in the monitor (the Web portal)
  • Pilots are submitted to the sites defined by the user and run without user intervention
  • The PanDA brokerage mechanism picks a pilot according to pre-defined criteria and communicates to the pilot the information sufficient to obtain the actual workload (an example of a brokerage criterion is the location of particular data at a site)
  • The pilot downloads and executes the payload; throughout the process the status of the pilot and the job is reflected in the monitor
  • Upon job completion, the standard output, standard error and log files become available to the user to inspect on Web pages served by the monitor

Implementation, Deployment, and Management Description

In its core parts (server and monitor), PanDA uses industry-standard, well understood and tested components such as the Apache server software and SSL-based encryption. The Apache Web server is equipped with mod_python and mod_gridsite. PanDA uses an RDBMS as its backend, and has in fact been successfully deployed using both MySQL and Oracle, as dictated by deployment requirements. As noted above, the server and monitor components, both implemented as Web services, are centrally managed. As such, they are subject to the security and access control protocols of the site where they are deployed (the same applies to the RDBMS). As a result, the amount of software that needs to be downloaded and installed by users not employing specific project frameworks such as ATLAS is small (two Python scripts).

The pilot scheduler can be deployed on virtually any system capable of job submission via Condor-G (which in PanDA serves as a way to generalize submission to a variety of local schedulers, such as Condor, PBS, SGE, etc.). Whether it resides on one of the centrally managed PanDA computers or on a system managed by the VO depends on the agreement between the PanDA support team and the VO; there are examples of either arrangement. Typically, when a VO is starting out with job submission to PanDA, the support team assists them in setting up pilot submission and testing out new queues and site setups. However, once the VO is satisfied that these components operate properly, it takes over the responsibility for pilot submission and hosting of the pilot scheduler. To take full advantage of the PanDA monitoring mechanisms, the pilot submission host must also have an instance of a Web server running, in order to serve the content of log files. Configuration of this server is extremely straightforward since it only calls for serving static content from the local disk.
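
Because the pilot scheduler delivers pilots through Condor-G, the underlying submission resembles the sketch below; the gatekeeper contact string, proxy path, pilot wrapper name, and pilot count are placeholders.

    # pilot.sub -- illustrative Condor-G description of the kind a pilot scheduler might
    # use to deliver pilots; gatekeeper, proxy path and wrapper name are placeholders.
    # Submit with:  condor_submit pilot.sub
    universe      = grid
    grid_resource = gt2 gatekeeper.example.edu/jobmanager-condor
    executable    = pilot_wrapper.sh
    x509userproxy = /home/pandapilot/production.proxy
    output        = pilot.$(Cluster).$(Process).out
    error         = pilot.$(Cluster).$(Process).err
    log           = pilot.log
    # Queue a batch of pilots for this site
    queue 10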

High Level Functionality Comparison Table

| Submission Capabilities and Requirements | Engage OSGMM | Glide-in WMS | PanDA |
| Job Distribution Mechanism | Condor-G | Pilot based | Pilot based |
| Is the service centrally hosted? | Each VO must install at least one instance of OSGMM | The Pilot Factory is centrally hosted; VOs must each install their own job submission infrastructure to submit jobs to the Pilot Factory. VOs can also install their own Pilot Factory if desired | The PanDA server, monitor and their databases are centrally hosted. Pilot submission is a lightweight process that is hosted by the VO. VOs must each install their own instance of the job submission scripts |
| Difficulty level for new VOs to install and set up the required job submission infrastructure | Easy* -- components are available through the VDT | Moderate/Difficult*, active work on making it easier; new VOs currently require handholding to get started | Moderate*, VOs have a minimal number of frontend services to set up and maintain but handholding from developers is still needed |
| Difficulty level for new users to get their applications running | Moderate*, users must embed their jobs in Condor wrappers. Data movement can be tricky to manage | Condor provides some built-in data handling capabilities | Easy*, submission is based on a command line infrastructure. Data management can be tricky since it requires management of GridFTP staging |
| Is the setup and operation of the system well documented in terms of "thoroughness" and "ease-of-use" for end users (1 to 5, 5 being best)? | Thoroughness (3), Ease-of-use (5) | Thoroughness (4), Ease-of-use (3) | Thoroughness (4), Ease-of-use (3) |
| Support for complex workflows | Yes, based on Condor DAGMan | Yes, based on Condor DAGMan | Complex workflows must be implemented manually |
| Support for job prioritization | Based on Condor. Can only prioritize jobs per submit node, not for the whole VO | Full featured, based on Condor infrastructure | Basic prioritization in the brokerage module, gradual throttling of user job priorities depending on submission volume |
| Error handling of ill-configured sites, CEs, or environments | Regular test/maintenance jobs are sent to make sure sites are up and running. Retries are automatically enabled to minimize end user error messages | User jobs don't start unless a pilot job is already running at the site; errors are stored in logfiles for manual inspection | |
| Information systems used to schedule jobs | ReSS (queried every few minutes) | One-time use of BDII/OIM to install sites; localized infrastructures keep track of dynamic data based on pilot submissions | |
| User information systems | Command line tools | Command line tools; limited monitoring capabilities for end users | Portal Interface Viewer provides a good interactive user view of the entire system |
| VOs that have used or are using this service in production | Engagement, SBGrid, GPN | CMS, DZero, CDF, IceCube, GPN, MINOS, MINERvA, NOvA | ATLAS, CHARMM |
| Site requirements for job submission (other than a CE and basic correct configuration) | None | Outbound network connections on each worker node if Condor CCB is installed; otherwise requires bi-directional networking | Outgoing connectivity on each worker node |
| Typical number of systems that a VO needs to set up in order to manage 5000 simultaneous jobs | One dual core, 8GB system | 1-2 systems, 8GB memory total | One system, very lightweight job submission requirements |
| Is root typically required for VOs to set up the submit hosts? | Yes | Yes | No |
*In each case assistance is available for VOs to set up the required services.

Summary and Conclusions

The OSG MatchMaker implements a proven approach for submitting jobs to the Grid. It is a direct extension of the familiar model of submitting jobs to a local batch scheduler and is straightforward for many users to comprehend. Further, it is relatively easy to set up, as the package can be deployed from the OSG VDT. In addition to basic job management capabilities, it builds in sophisticated tests of remote sites and can run "preparation" jobs that set up sites with needed software packages (increasing the probability of success for user jobs). The OSG MatchMaker offers many benefits, especially for new and small VOs, including the rerouting of jobs to different sites when they fail to start after a specified amount of time. VOs can also choose to bypass setting up and operating their own OSGMM and submit jobs via the Engagement VO OSGMM system that is managed by the Engagement team.

Many of the larger VOs in the OSG use an alternative job submission model: pilot jobs. This model increases the probability of user job success as well as offering a more "real time" framework for submitting jobs. Pilot-based systems hide many grid-related failures from the users. However, each system still has components that require an expert grid operator to run and debug: glideinWMS has the pilot factory, and PanDA has the server and pilot factory. The key point is that grid-related failures are shifted from the user to the operator. For glideinWMS, OSG VOs have the opportunity to use a centralized pilot factory run by OSG Operations. For PanDA, OSG VOs can work with ATLAS to have the PanDA server operated on their behalf. Hence, it is possible to get started with a pilot system by installing only the client side job submission frameworks. Currently, the client side systems are non-trivial and require assistance from the software developers to get running and working properly. The software developers are more than happy to help new VOs get started with these technologies.

The glideinWMS and PanDA models are quite different in approach although both are pilot based. glideinWMS was built on top of the Condor infrastructure, which has been rigorously time-tested for both scalability and reliability on remote systems. Once a pilot is running and ready to accept a user job, it appears to the user just like any local Condor batch slot. Being Condor based also means that glideinWMS can readily accept non-trivial workflows created around Condor DAGs. Condor provides an inherent ability to manage and prioritize the jobs being submitted. Despite being moderately difficult to set up initially, glideinWMS is the most widely used job submission infrastructure on the OSG because of these capabilities.

The PanDA system was designed using standard web service and database components. All job management is handled by the central server, so the client installation is minimal, although the VO must learn to submit and manage its own pilot jobs. The centralized database of all PanDA jobs running on the system can easily be queried using standard relational database techniques to see the status and location of all jobs on the server. The PanDA system has been thoroughly tested on ATLAS production workloads and is also running CHARMM VO production jobs. While not as widely used as glideinWMS, it has promising features, especially for VOs that use very simple workflows and do not need job submission priority management.



    -- DanFraser - 14 Jan 2010
