Campus Grid Technology
This document describes the technology being developed for a campus grid.
The campus grid prototype will use Condor flocking between on-campus resources. Authentication and authorization will be handled through Condor security mechanisms that do not require certificates. Using clusters that do not run Condor will require a glidein mechanism, which will be provided by a simple glidein factory.
The goals of the technology are:
- Transparent - Execution on different clusters will be transparent to the user.
- Simple - Service operation will be easy to understand and well documented. The prototype should have a very simple base case but be able to grow with options.
- Maintainable - The service will have a simple configuration and will be well packaged.
Model 1: Campus Only
In this model, clusters on campus flock to each other. The campus grid can be thought of as a mini-OSG. Each cluster decides which other clusters and users it accepts flocked jobs from. Each cluster also determines the granularity of security, for example whether each user or only each department gets its own UID. We must be careful not to offer too many options, since they can easily become overwhelming.
In the example, the Chemistry and Engineering clusters flock to the Physics cluster, and the Physics cluster can flock back to the Chemistry and Engineering clusters. However, the Engineering and Chemistry clusters do not flock to each other. The Chemistry and Engineering clusters could have little or no administrative staff, while the Physics cluster could have paid staff. In this case, flocking could even be one way, from the Chemistry or Engineering clusters to the Physics cluster.
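The Model 1 example can be written down as a small directed flocking graph. The Python sketch below is illustrative only: it turns that graph into per-cluster values for Condor's FLOCK_TO (the pools a cluster's submit host may flock to) and FLOCK_FROM (the submit hosts allowed to flock in) settings. The host names, the assumption that each cluster has a single host acting as both central manager and submit host, and the helper flock_config are hypothetical. A real deployment would also need matching Condor authorization settings, which the sketch ignores.

```python
"""Hypothetical sketch: derive Condor flocking settings from a campus flocking map.

Assumes each cluster has a single host acting as both its central manager and
its submit host; the host names below are placeholders, not real machines.
"""

# Hypothetical host name for each campus cluster.
HOSTS = {
    "physics": "cm.physics.example.edu",
    "chemistry": "cm.chemistry.example.edu",
    "engineering": "cm.engineering.example.edu",
}

# Directed flocking edges from the Model 1 example: (from_cluster, to_cluster).
# Chemistry and Engineering flock to Physics, Physics flocks back to both,
# and Chemistry and Engineering do not flock to each other.
FLOCKS = [
    ("chemistry", "physics"),
    ("engineering", "physics"),
    ("physics", "chemistry"),
    ("physics", "engineering"),
]


def flock_config(cluster, hosts, flocks):
    """Return the FLOCK_TO and FLOCK_FROM values for one cluster."""
    # Pools this cluster's jobs may flock to.
    flock_to = sorted(hosts[dst] for src, dst in flocks if src == cluster)
    # Submit hosts whose jobs may flock into this cluster.
    flock_from = sorted(hosts[src] for src, dst in flocks if dst == cluster)
    return {
        "FLOCK_TO": ", ".join(flock_to),
        "FLOCK_FROM": ", ".join(flock_from),
    }


if __name__ == "__main__":
    for name in sorted(HOSTS):
        print(f"# {name}")
        for key, value in flock_config(name, HOSTS, FLOCKS).items():
            print(f"{key} = {value}")
```

Making flocking one way, as suggested for clusters without administrative staff, amounts to removing the two edges that start at "physics" from FLOCKS.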
Model 2: Campus with GlideinWMS
This model is an extension of Model 1. In this situation, the clusters have the option to flock to a GlideinWMS frontend. This requires a knowledgeable administrator to run the GlideinWMS frontend.
Running on the OSG requires a certificate. Certificate handling is already built into GlideinWMS.
Model 3: Campus with GlideinWMS and OSG Gatekeeper
In this model, the campus allows grid jobs from the OSG to run on the campus grid. This would be the natural progression from a campus grid to the Open Science Grid. Only a subset of the campus is expected to allow grid jobs.
Model 4: Multi Campus with GlideinWMS
This model combines multiple campus grids. It also retains the option of reaching the grid through GlideinWMS. This model is currently implemented at Nebraska.
The Campus Factory is hosted on SourceForge.
Pilot-based submission has been selected primarily because of the ease of use and flexibility it offers the submitter. Once a user's job has been set up to run in a pilot environment, the user no longer needs to understand the complexities of submitting to different clusters and can also readily transition to submitting jobs to the OSG. Pilots create an overlay network on which grid jobs run, giving the user a consistent interface across multiple resources. For this reason, the OSG is encouraging all of its VOs to use this model. The Campus Factory will draw on lessons learned from the GlideinWMS factory, but will implement only a small subset of its features.
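At its core, a pilot factory repeatedly decides how many glideins to queue at each cluster. The following Python function is a minimal sketch of one such policy, assuming the factory can count idle user jobs and the idle and running glideins at a cluster. The function name, its parameters, and the default caps are hypothetical and are not taken from the Campus Factory or GlideinWMS.

```python
"""Hypothetical sketch of a pilot (glidein) submission policy.

The policy and its parameter names are illustrative assumptions, not the
Campus Factory's actual algorithm.
"""


def glideins_to_submit(idle_jobs, idle_glideins, running_glideins,
                       max_idle_glideins=10, max_total_glideins=100):
    """Return how many new glideins to submit to a non-Condor cluster.

    idle_jobs: user jobs waiting for a slot in the overlay pool.
    idle_glideins: glideins queued at the remote cluster but not yet running.
    running_glideins: glideins currently running and advertising slots.
    """
    # Pilots already waiting in the remote queue will absorb some demand.
    unmet_demand = max(idle_jobs - idle_glideins, 0)
    # Never keep more than max_idle_glideins pilots queued at once.
    idle_room = max(max_idle_glideins - idle_glideins, 0)
    # Never exceed the overall cap on pilots at this cluster.
    total_room = max(max_total_glideins - idle_glideins - running_glideins, 0)
    return min(unmet_demand, idle_room, total_room)
```

For example, with 25 idle jobs, 4 queued glideins and 10 running ones, glideins_to_submit(25, 4, 10) returns 6, because the cap of 10 queued glideins is the binding limit. Capping queued pilots keeps the factory from flooding a remote batch system when user demand spikes.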
The Campus Factory should be:
- Well Documented
- Simple to maintain
The factory should be on the order of 100 lines of code, with a simple key = value configuration consistent with the configuration files used by system services. To run within Condor, I will implement the factory as a persistent local job managed by Condor's scheduler.
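A key = value configuration needs only a few lines of Python to read. This sketch assumes one setting per line with "#" comments; the option names in SAMPLE and the helpers parse_config and load_int are hypothetical, chosen for illustration rather than taken from the Campus Factory.

```python
"""Minimal sketch of a key = value configuration reader for the factory.

The file format (one "key = value" per line, "#" comments) and the option
names in SAMPLE are hypothetical examples, not the Campus Factory's schema.
"""

SAMPLE = """
# Hypothetical campus factory settings
max_idle_glideins = 10
max_total_glideins = 100
iteration_seconds = 60
"""


def parse_config(text):
    """Parse 'key = value' lines into a dict of strings.

    Blank lines and lines starting with '#' are ignored; a line without '='
    raises ValueError so configuration mistakes are not silently skipped.
    """
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key = value', got {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def load_int(settings, key, default):
    """Fetch an integer option, falling back to a default when unset."""
    return int(settings.get(key, default))


if __name__ == "__main__":
    cfg = parse_config(SAMPLE)
    print(load_int(cfg, "max_idle_glideins", 5))
```

Because the file is this simple, a persistent factory loop could reread it on every iteration, so configuration changes take effect without restarting the job.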
- 07 Sep 2010