Grids are making it possible for any group of users to run hundreds of thousands of jobs in a matter of days. However, the batch slots are not organized in a single common pool; they are grouped into independent pools at hundreds of Grid sites spread across five continents. That’s where a higher-level Workload Management System (WMS) comes in.
True to its name, glideinWMS is a WMS that uses Condor glideins (Condor worker daemons submitted to Grid sites as ordinary Grid jobs) to provision resources. A glideinWMS instance operated by a group dynamically aggregates a fraction of Grid resources into a seemingly private Condor pool for that group. Although glideinWMS uses Condor, it does not require Condor to be pre-installed on the Grid resources; any Grid resource may be used.
Any user in the group can submit jobs to the local Condor scheduler. Through standard Condor matchmaking, each job eventually starts on one of the nodes in that pool. Users are not exposed to the Grid environment. The one possible exception is that they may need a Grid proxy, which provides enhanced security and remote file access.
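Condor matchmaking pairs each job's requirements with the attributes that each slot advertises, and each slot's policy with the job's attributes. The Python sketch below illustrates that two-sided idea. Plain dicts and Python predicates stand in for ClassAds and their `Requirements`/`Rank` expressions. It is a deliberately simplified greedy matcher, not Condor's actual negotiator. The names `Job`, `Slot` and `match` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Simplified stand-in for a Condor ClassAd: a flat dict of attributes.
Ad = Dict[str, object]


@dataclass
class Slot:
    """A glidein-provided execution slot advertising its attributes."""
    name: str
    ad: Ad
    # The slot's own policy on which jobs it accepts (hypothetical; accepts all by default).
    requirements: Callable[[Ad], bool] = lambda job_ad: True
    claimed_by: Optional[str] = None


@dataclass
class Job:
    """A user job with requirements on slots and a preference (rank) among them."""
    name: str
    ad: Ad
    requirements: Callable[[Ad], bool]
    rank: Callable[[Ad], float] = lambda slot_ad: 0.0


def match(jobs: List[Job], slots: List[Slot]) -> Dict[str, str]:
    """Greedy two-sided matchmaking: in submission order, give each job the
    highest-ranked unclaimed slot where both sides' requirements hold."""
    assignments: Dict[str, str] = {}
    for job in jobs:
        candidates = [
            s for s in slots
            if s.claimed_by is None
            and job.requirements(s.ad)
            and s.requirements(job.ad)
        ]
        if not candidates:
            continue  # job stays idle until a suitable slot joins the pool
        best = max(candidates, key=lambda s: job.rank(s.ad))
        best.claimed_by = job.name
        assignments[job.name] = best.name
    return assignments


slots = [
    Slot("glidein-1", {"OpSys": "LINUX", "Memory": 2048}),
    Slot("glidein-2", {"OpSys": "LINUX", "Memory": 8192}),
]
jobs = [
    Job("analysis", {"Owner": "alice"},
        requirements=lambda s: s["OpSys"] == "LINUX" and s["Memory"] >= 4096),
    Job("short", {"Owner": "bob"},
        requirements=lambda s: s["OpSys"] == "LINUX"),
]
print(match(jobs, slots))  # {'analysis': 'glidein-2', 'short': 'glidein-1'}
```

A job whose requirements no current slot satisfies simply stays unmatched, which mirrors a job remaining idle in the queue until a suitable glidein joins the pool.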
Installing and configuring a glideinWMS instance can be daunting, but the advantages are significant. Once it is configured, Grid resources become far easier to use, and the group can better prioritize job execution. Glideins also remove several Grid limitations; for example, they make it efficient to execute short jobs. They also dramatically reduce the error rate for user jobs, because malfunctioning Grid resources are caught at the provisioning stage (see animation). See the glideinWMS project page for more information or to download the current release.
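The two benefits described above can be sketched together: broken nodes are filtered out during provisioning, and validated glideins then run many short jobs each. For illustration only, this assumes that each glidein runs a few validation checks before joining the pool. The `scratch_space` and `network` checks, the 1024 MB threshold and the round-robin scheduling are all hypothetical. Real glideinWMS validation is configurable, and real Condor scheduling is more involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class WorkerNode:
    """What a glidein might observe about the Grid worker node it landed on."""
    name: str
    free_scratch_mb: int
    has_outbound_network: bool


# Hypothetical validation checks; real glideins run configurable scripts.
CHECKS: Dict[str, Callable[[WorkerNode], bool]] = {
    "scratch_space": lambda n: n.free_scratch_mb >= 1024,
    "network": lambda n: n.has_outbound_network,
}


def start_glideins(nodes: List[WorkerNode]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Validate each node; only nodes passing every check join the pool.
    Returns (joined node names, {rejected node name: failed check names})."""
    joined: List[str] = []
    rejected: Dict[str, List[str]] = {}
    for node in nodes:
        failed = [name for name, check in CHECKS.items() if not check(node)]
        if failed:
            rejected[node.name] = failed  # a broken node never sees a user job
        else:
            joined.append(node.name)
    return joined, rejected


def schedule_jobs(joined: List[str], jobs: List[str]) -> Dict[str, List[str]]:
    """Spread jobs round-robin over validated glideins. Each glidein runs
    several jobs in turn, so the Grid submission overhead is paid once per
    glidein rather than once per short job."""
    if not joined:
        return {}  # no validated slots yet: jobs wait in the queue
    plan: Dict[str, List[str]] = {name: [] for name in joined}
    for i, job in enumerate(jobs):
        plan[joined[i % len(joined)]].append(job)
    return plan


nodes = [
    WorkerNode("wn01", 4096, True),
    WorkerNode("wn02", 100, True),    # too little scratch space
    WorkerNode("wn03", 4096, False),  # no outbound network
    WorkerNode("wn04", 8192, True),
]
joined, rejected = start_glideins(nodes)
print(joined)    # ['wn01', 'wn04']
print(rejected)  # {'wn02': ['scratch_space'], 'wn03': ['network']}
print(schedule_jobs(joined, [f"job{i}" for i in range(5)]))
# {'wn01': ['job0', 'job2', 'job4'], 'wn04': ['job1', 'job3']}
```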
Many OSG sites use BeStMan to provide the SRM interface on top of their storage system.
BeStMan works with disk-based file systems (such as NFS, PVFS, AFS, GFS, GPFS, PNFS, HFS+, HDFS, Ibrix, Lustre and XrootdFS), mass storage systems such as HPSS, and file servers such as Xrootd. It also supports several transfer protocols, including GSIFTP, FTP, BBFTP, HTTP and HTTPS. End users may run their own personal BeStMan to manage their local disks or storage systems and expose them through an SRM interface.
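BeStMan exposes one SRM interface over many storage back ends and transfer protocols. As a result, the client and server must agree on a protocol for each file. The sketch below shows that idea. It assumes that the client lists protocols in order of preference and that the server picks the first one it supports, then returns a transfer URL (TURL). `negotiate_protocol`, `transfer_url` and the host `se.example.org` are hypothetical, not BeStMan's API.

```python
from typing import AbstractSet, FrozenSet, Iterable, Optional

# Transfer protocols named in the text; which ones a given deployment
# actually enables is site-specific (assumption).
SERVER_PROTOCOLS: FrozenSet[str] = frozenset({"gsiftp", "ftp", "bbftp", "http", "https"})


def negotiate_protocol(client_prefs: Iterable[str],
                       server_protocols: AbstractSet[str] = SERVER_PROTOCOLS) -> Optional[str]:
    """Return the first protocol, in the client's order of preference,
    that the server also supports; None if there is no common protocol."""
    for proto in client_prefs:
        proto = proto.lower()
        if proto in server_protocols:
            return proto
    return None


def transfer_url(protocol: str, host: str, path: str) -> str:
    """Build a transfer URL (TURL) for one file, e.g. gsiftp://host/path."""
    return f"{protocol}://{host}/{path.lstrip('/')}"


chosen = negotiate_protocol(["dcap", "GSIFTP", "http"])
print(chosen)  # gsiftp
print(transfer_url(chosen, "se.example.org", "/store/user/data.root"))
# gsiftp://se.example.org/store/user/data.root
```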
BeStMan in Full mode provides all the functions related to space reservations, dynamic space allocation, directory management, and pinning of files in space for a specified lifetime. It manages queues of requests to get or put files into the spaces it manages, and each request can cover multiple files or entire directories.
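Below is a toy model of the Full-mode behavior just described. It has a space reservation with a fixed capacity, files that stay pinned for a requested lifetime and are released once it expires, and a queue of put requests that each carry several files. The names `SpaceReservation`, `PutRequest` and `process_queue` are hypothetical, and the first-come, first-served policy is an assumption. BeStMan's actual scheduling and accounting are not described in the text.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class SpaceReservation:
    """A reserved space; each file in it stays pinned until its lifetime ends."""
    token: str
    capacity: int  # bytes
    # path -> (size in bytes, pin expiry time in seconds)
    files: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    @property
    def used(self) -> int:
        return sum(size for size, _ in self.files.values())

    def release_expired(self, now: float) -> List[str]:
        """Unpin and drop every file whose lifetime has ended."""
        expired = [p for p, (_, expiry) in self.files.items() if expiry <= now]
        for p in expired:
            del self.files[p]
        return expired

    def put(self, path: str, size: int, lifetime: float, now: float) -> bool:
        """Place and pin a file if it fits after expired pins are released."""
        self.release_expired(now)
        if self.used + size > self.capacity:
            return False
        self.files[path] = (size, now + lifetime)
        return True


@dataclass
class PutRequest:
    request_id: str
    files: List[Tuple[str, int]]  # one request may carry many (path, size) pairs
    lifetime: float               # requested pin lifetime in seconds


def process_queue(queue: Deque[PutRequest], space: SpaceReservation,
                  now: float) -> Dict[str, List[str]]:
    """Serve queued requests first-come, first-served; report the files
    of each request that did not fit in the reservation."""
    rejected: Dict[str, List[str]] = {}
    while queue:
        request = queue.popleft()
        rejected[request.request_id] = []
        for path, size in request.files:
            if not space.put(path, size, request.lifetime, now):
                rejected[request.request_id].append(path)
    return rejected


space = SpaceReservation("token-1", capacity=100)
queue = deque([
    PutRequest("r1", [("a.dat", 40), ("b.dat", 40)], lifetime=10.0),
    PutRequest("r2", [("c.dat", 30)], lifetime=10.0),
])
print(process_queue(queue, space, now=0.0))    # {'r1': [], 'r2': ['c.dat']}
print(space.put("c.dat", 30, 10.0, now=10.0))  # True: a.dat and b.dat have expired
print(space.used)                              # 30
```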
When managing multiple files, BeStMan can make use of the available network bandwidth by scheduling multiple concurrent file transfers. In Gateway mode, BeStMan provides the same SRM interface, with high performance, on top of any existing file system, without queuing or space management.
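The sketch below shows concurrent transfers with a cap on how many run at once, using Python's standard `concurrent.futures.ThreadPoolExecutor`. `SimulatedTransfer` is a hypothetical stand-in that sleeps instead of moving data and records the peak number of simultaneous transfers. The `max_concurrent=4` value is an arbitrary example, not a BeStMan setting.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def transfer_all(paths: List[str], transfer: Callable[[str], int],
                 max_concurrent: int = 4) -> Dict[str, int]:
    """Run transfer(path) for every path with at most max_concurrent
    transfers in flight; returns {path: bytes moved}. An exception raised
    by any transfer is re-raised here."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return dict(zip(paths, pool.map(transfer, paths)))


class SimulatedTransfer:
    """Hypothetical stand-in for a network transfer: sleeps instead of
    moving data and records the peak number of simultaneous calls."""

    def __init__(self, delay_s: float = 0.05) -> None:
        self.delay_s = delay_s
        self.active = 0
        self.peak = 0
        self._lock = threading.Lock()

    def __call__(self, path: str) -> int:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        try:
            time.sleep(self.delay_s)  # simulated network I/O
        finally:
            with self._lock:
                self.active -= 1
        return len(path)  # pretend len(path) bytes were moved


paths = [f"/store/file{i}.root" for i in range(10)]
sim = SimulatedTransfer()
moved = transfer_all(paths, sim, max_concurrent=4)
print(len(moved), sim.peak <= 4)  # 10 True
```

Threads suit this case because transfers are I/O-bound. The pool size bounds the load placed on the network and on the remote servers.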
Hadoop is an open-source data processing framework that includes a scalable, fault-tolerant distributed file system, HDFS. HDFS was designed to work with Hadoop's job scheduler. However, the US Compact Muon Solenoid (CMS) experiment has repurposed it as a Grid storage element by adding GridFTP and SRM servers. A full review of its capabilities and operational stability was performed for the collaboration. HDFS is now installed at six OSG sites: five use it as a data storage solution for CMS at the LHC, and the sixth, at Caltech, is a test site for LIGO. Hadoop is being prepared for inclusion in the Virtual Data Toolkit (VDT). Apache Hadoop, 2009.
Generic Information Provider
The Generic Information Provider (GIP) provides information services for the OSG. It collects information about your system, such as the number of available CPUs and the amount of remaining disk space. It passes this information to CEMon and publishes it to the OSG Grid Operations Center in a standard format. GIP is grid-agnostic; it can be integrated into the configuration and installation of middleware.
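The collect-and-publish pattern can be sketched with the Python standard library alone. The code reads the CPU count and disk space and renders them as `key: value` lines loosely modeled on LDIF. The attribute names (`cpus`, `disk_free_gb`, `disk_total_gb`) and the `dn` value are hypothetical. GIP's real output uses a standardized schema, which this sketch does not reproduce, and hands its data off to CEMon.

```python
import os
import shutil
from typing import Dict


def collect_site_info(storage_path: str = "/") -> Dict[str, int]:
    """Gather two of the facts mentioned in the text: CPU count and disk space."""
    usage = shutil.disk_usage(storage_path)
    return {
        "cpus": os.cpu_count() or 0,  # os.cpu_count() may return None
        "disk_total_gb": usage.total // 1024**3,
        "disk_free_gb": usage.free // 1024**3,
    }


def to_ldif_like(dn: str, info: Dict[str, int]) -> str:
    """Render a record as a 'dn:' line followed by sorted 'key: value'
    lines, loosely modeled on LDIF (attribute names are hypothetical)."""
    lines = [f"dn: {dn}"] + [f"{key}: {value}" for key, value in sorted(info.items())]
    return "\n".join(lines) + "\n"


print(to_ldif_like("site=example-site", collect_site_info()))
```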
Past Technology Highlights
- 01 Feb 2010