
Storage Infrastructure Software

The Storage Architecture

Storage software comprises multiple components that work together to provide the storage service. The list below defines the major components as categories; specific implementations are described in the remainder of this topic.

  • Distributed Storage. A cluster deployed with software to provide a unified storage area.
  • Namespace. A database connecting logical filenames to physical locations.
  • Data transfer. A service to move data in the form of files.
  • Replication. The practice of making multiple copies of data to protect against hardware failure.
  • Resource Management. A software layer that controls storage access.
  • Archiving. The practice of making permanent copies of the data.

Some or all of these six are typically bundled together to make up what is known as a "Storage Element". The selection can range from only one, for example data transfer in the form of gridftp on an ordinary filesystem, to all six, as in an LHC Tier 1 site with a tape-archive backend.

A further set of services relating to information are realized as independent components. Some of these also pertain to the job execution stack and allow coordination of compute and storage resources.

  • Information Services
    • Catalogs. A database containing metadata for files, especially their locations and paths.
    • Monitoring. A system to determine and display the health and activity of storage resources.
    • Discovery. A searchable database containing site configuration information, including service endpoints, capabilities, and authorization.
    • Accounting. A database viewed with preconfigured reports that show historical grid activity from various aspects such as site, virtual organization, or access type.

In addition, experiments with very intensive data transfer requirements use dedicated software to manage the movement of files.

  • File Transfer Services. Automates the data movement process and provides an organized view of the status of file transfers.

The latter two architectural components, Information Services and File Transfer Services, are discussed in their own topics; see the links to them in Other Storage Infrastructure Topics, below.

Distributed Storage

In grid computing, just as computational power in the form of CPUs may be distributed over many computers, or "worker nodes", storage in the form of hard disks may be distributed over many computers in order to provide a large, unified storage area. Often, the same computers that serve as worker nodes for a Compute Element also hold storage for a Storage Element. There are many implementations of distributed storage, several of which may be found on the Open Science Grid. In general, the emphasis of storage on the OSG is on high throughput achieved through scalability, rather than low latency achieved through high-performance hardware. For a table comparing the features of the software described below, please see the Storage Implementations Table.

  • Hadoop. HDFS, or the Hadoop Distributed File System, is a distributed storage system developed within the Apache Hadoop project, which also provides an implementation of Google's map-reduce framework. A key feature of HDFS is robust support for Replication, allowing the use of low-cost hard drives while maintaining reliability. HDFS has a highly scalable architecture in which the basic unit of storage is a block. Hadoop HDFS is provided in Release 3 of the OSG, and community support for operational issues is available through the osg-hadoop mailing list. For more information, please see the Bestman-hadoop section of the storage site administrator page. (A brief client-access sketch follows this list.)
  • xrootd was designed to provide storage for physics analysis programs based on the software package named "root", and includes a "Cluster Management System" daemon (cmsd) by which distributed storage clusters may be composed. Developed at the Stanford Linear Accelerator Center, xrootd is written in C++ with highly optimized algorithms that provide fast and deterministically bounded processing times, resulting in low latencies even when a large number of files is present. xrootd is provided by the OSG through VDT packaging; please see the Bestman-xrootd section of the storage site administrator page for details.
  • dCache. A major part of the dCache Storage Element implementation is its distributed storage system, based on components known as "pools". Storage is file-based, and replication is supported, allowing the use of commodity hardware. Access to pools is controlled through a "Pool Manager", which allows logical storage areas to be created and access to be granted based on user identity or role, client IP address, operation (read or write), or transfer protocol. OSG provides packaging and support for dCache; please see the dCache section of the storage site administrator page.
  • DPM. Of interest to OSG users because of its deployment on the European grid EGEE, the Disk Pool Manager is a lightweight solution for managing disk storage. It can be accessed via SRM versions 1 and 2, and also provides data access through the GridFTP and rfio transfer protocols.
  • Other distributed file systems include Lustre, ZFS, ReDDNet, L-Store, and NFS 4.1. Of these, only Lustre and ZFS may be found on the Open Science Grid, though their use may increase in the future. There are plans to support NFS 4.1 in dCache and DPM.
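As a minimal illustration of how files are typically moved into these distributed stores from a worker node, the sketch below shells out to the standard HDFS and xrootd command-line clients (hadoop fs and xrdcp). It assumes both clients are installed and configured on the node; the host name and paths are hypothetical and should be replaced with your site's values.

    import subprocess

    # Hypothetical local file, HDFS path, and xrootd URL; substitute your site's values.
    LOCAL_FILE = "results.root"
    HDFS_PATH = "/store/user/alice/results.root"
    XROOTD_URL = "root://xrootd.example.edu//store/user/alice/results.root"

    # Copy the local file into HDFS using the standard Hadoop client.
    subprocess.run(["hadoop", "fs", "-put", LOCAL_FILE, HDFS_PATH], check=True)

    # List the destination directory to confirm the copy.
    subprocess.run(["hadoop", "fs", "-ls", "/store/user/alice"], check=True)

    # Copy the same file into an xrootd cluster using the xrdcp client.
    subprocess.run(["xrdcp", LOCAL_FILE, XROOTD_URL], check=True)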

Note that it is not required that a Storage Element use a distributed file system. Storage appliances can provide tens of terabytes of storage in a single unit. The globus implementation of the gridftp data transfer mechanism can serve files from any mounted file system, and may be used in combination with the Bestman Storage Element for SRM access. For further information, see Storage for Site Administrators.

Namespace

All distributed file systems rely on a namespace component, which allows the logical name and path of a file to be separated from its physical location. Typically, a database is maintained with the needed "metadata" for each file. Since there is usually just one instance of this component in a distributed file system, it can represent a single point of failure; in dCache, for example, frequent backups of the namespace database limit the impact of such a failure. In addition, for large systems, a performance bottleneck may occur at the namespace node.
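To make the separation between logical name and physical location concrete, here is a toy namespace sketch; production systems such as dCache's pnfs/Chimera or the HDFS NameNode keep this mapping in a dedicated database or in-memory image, but the idea is the same. All names and values below are invented for illustration.

    # Toy namespace: maps a logical file name (what the user sees) to the
    # metadata and physical replica locations the storage system tracks.
    namespace = {
        "/store/user/alice/results.root": {
            "size": 104857600,
            "checksum": "adler32:1a2b3c4d",
            "replicas": [
                "pool-node03.example.edu:/data/pool1/0000A1F3",
                "pool-node11.example.edu:/data/pool2/0000A1F3",
            ],
        },
    }

    def locate(logical_name):
        """Return the physical replica locations for a logical file name."""
        entry = namespace.get(logical_name)
        if entry is None:
            raise FileNotFoundError(logical_name)
        return entry["replicas"]

    print(locate("/store/user/alice/results.root"))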

Data transfer

While most distributed storage systems have their own access protocols, they do allow for other file server mechanisms. These are of particular use for interoperability, when a client at a remote site may not be using the native protocol of the storage service. The most commonly-used data transfer software for this purpose is gridftp.

  • gridftp is used for serving files over the wide area network. Security options include the Grid Security Infrastructure, the authentication framework adopted by the OSG. A major feature of gridftp is the ability to transfer files over multiple data channels, which can increase throughput by as much as a factor of ten compared to a single channel; a short example follows this list. There are two implementations of gridftp: one by Globus and one by dCache. The dCache implementation is bundled with SRM-dCache and is not installable as a separate component. For an introduction to gridftp, please see Overview of GridFTP in the OSG.
  • Other file servers may be categorized by the protocols they support. Data transfer protocols include dcap, gsidcap, xroot, http, https, bbftp and ftp. The protocol for gridftp is gsiftp.
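The sketch below illustrates a wide-area gridftp transfer using the Globus client globus-url-copy, with several parallel data channels requested. The endpoints are hypothetical, and a valid grid proxy (for example from voms-proxy-init) is assumed to be in place before running it.

    import subprocess

    # Hypothetical source and destination URLs; substitute real endpoints.
    SRC = "file:///home/alice/results.root"
    DST = "gsiftp://gridftp.example.edu/store/user/alice/results.root"

    # -p 8 requests eight parallel data channels; -vb prints transfer performance.
    subprocess.run(["globus-url-copy", "-vb", "-p", "8", SRC, DST], check=True)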

Replication

Replication of files is used to mitigate data loss in the case of hard disk failure. In implementations that support replication (see the Storage Implementations Table), replication occurs automatically, with the number of copies maintained by the replication service. When a disk is lost, the system automatically creates additional replicas of the affected files and, in the interim, serves the existing replicas so that access is uninterrupted. In dCache, the Replica Manager creates replicas of whole files among a specified subset of pools. The Hadoop HDFS storage service replicates at the block level and allows specifying that block replicas not all be placed within one set of storage nodes, such as those situated in a single rack.
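As one concrete illustration, HDFS exposes the replication factor of each file through its standard command-line client. The sketch below, with a hypothetical path, raises the replication factor of a file to three and then reports where the block replicas reside; in dCache the equivalent behavior is configured through the Replica Manager rather than per file by the user.

    import subprocess

    # Hypothetical HDFS path; substitute your site's value.
    HDFS_PATH = "/store/user/alice/results.root"

    # Ask HDFS to keep three replicas of each block of this file;
    # -w waits until re-replication has completed.
    subprocess.run(["hadoop", "fs", "-setrep", "-w", "3", HDFS_PATH], check=True)

    # Report the blocks that make up the file and the nodes holding their replicas.
    subprocess.run(["hadoop", "fsck", HDFS_PATH, "-files", "-blocks", "-locations"],
                   check=True)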

An alternative to file or block replication is the use of RAID arrays, typically RAID-5, in which data redundancy is provided at the hardware level. Upon loss of a disk, the vendor-supplied rebuilding process restores the redundancy.

Resource Management

SRM is a software specification for access to mass storage systems. The specification allows for interoperability among clients and servers of various storage implementations. Any client which satisfies the specification can operate with any server which also does so. The specification supports commonly-used storage operations such as get, put, copy (for moving files from one SRM storage element to another), bring-online (to cause a file to be moved from a tape archive to the disk cache for later transfer), and space reservation. SRM also supports protocol negotiation, so the client may request a data transfer protocol or state which protocols it supports, allowing the SRM service to connect it to a suitable file server endpoint.
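As a hedged sketch of what an SRM interaction looks like from the client side, the example below uses srmcp (from the dCache/Fermi SRM client tools) to copy a file out of a Storage Element. The endpoint and paths are hypothetical, exact URL forms vary between client versions, and a valid grid proxy is assumed to be in place.

    import subprocess

    # Hypothetical SRM source and local destination; substitute your site's values.
    SRM_URL = ("srm://se.example.edu:8443/srm/managerv2"
               "?SFN=/store/user/alice/results.root")
    LOCAL_URL = "file:///home/alice/results.root"

    # srmcp contacts the SRM server, which negotiates a transfer protocol and
    # redirects the copy to a suitable gridftp door; -2 selects the SRM v2.2 interface.
    subprocess.run(["srmcp", "-2", SRM_URL, LOCAL_URL], check=True)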

This diagram shows how a gridftp client and an SRM client each access storage. In the case of gridftp, the client contacts the file server directly. In the case of SRM, the client contacts the SRM server, which tells the client which file server to use, based on availability and the requested protocol. In each case, the file server uses the namespace component of the storage system to determine the pool or pools involved in the transfer.
storage-architecture.GIF

Bestman

For more information, please see Documentation.BestmanStorageElement.

dCache

dCache is no longer supported by the Open Science Grid. For more information, please see the dCache home page.

Other SRM Implementations

There are other implementations of the Storage Resource Manager specification. While these implementations are not supplied or directly supported by the OSG, there are interactions with these storage systems when data is moved from one grid to another.

  • CASTOR. The CERN Advanced STORage manager is a tape-backed hierarchical storage management (HSM) system developed at CERN and used to store physics production files and user files.
  • DPM. The Disk Pool Manager is a lightweight storage system that supports GSI and SRM.
  • StoRM is an SRM implementation from EGRID, INFN, and GRID.IT that can run on top of any posix filesystem.

Archiving

Some storage systems have magnetic tape drive components which allow files to be stored for long periods. Storage on tape at a large site is on the order of 10 petabytes. Files are staged to and from the tape drives via a hard-disk caching system. In the SRM specification, files that are on tape but not in the disk cache are said to have an "access latency" of NEARLINE, and files that are in the cache have an access latency of ONLINE. SRM clients have the option of specifying the final access latency of a file. For more information on storage clients, please see Storage for the End User.

Among the storage software provided and supported by the OSG, only dCache includes the option of having a tape backend. Sites on the OSG that have tape archival capability are Brookhaven ATLAS Tier 1, Fermilab CMS Tier 1, Fermilab CDF, and Fermilab public dCache.

Table of Storage implementations used in the Open Science Grid

The following table summarizes the capabilities of various storage software implementations.

Software        | Distributed Storage | Resource Management | Data Transfer Protocols              | Replication       | Archiving | Namespace
gridftp         | any mounted         |                     | gsiftp                               |                   |           |
xrootd          | XrdOss              | cmsd                | xroot, posix+                        |                   | XrdOss    | XrdSfs
Bestman         | any mounted         | Bestman SRM         | gsiftp, posix                        |                   |           |
Bestman-xrootd  | xrootd              | Bestman SRM Gateway | gsiftp, xroot, posix+                |                   |           |
Hadoop SE       | HDFS                | Bestman SRM Gateway | gsiftp                               | Block Replication |           | NameNode/fuse
SRM-dCache      | dCache              | Fermi SRM           | gsiftp, dcap, gsidcap, xroot, posix+ | Replica Manager   | HSM       | pnfs or Chimera

+with preloaded libraries

Other Storage Infrastructure Topics

For discussions of the other two subtopics on the OSG Storage Infrastructure, please click on the links below.

-- DouglasStrain - 11 Oct 2011

Topic attachments
Attachment                            | Size     | Date                | Who           | Comment
InfoStorage.bmp                       | 3379.9 K | 06 Apr 2010 - 18:27 | TedHesselroth | Using information services in a file transfer
InfoStorage.gif                       | 264.3 K  | 06 Apr 2010 - 18:17 | TedHesselroth | Using information services in a file transfer
InfoStorage.jpg                       | 94.7 K   | 06 Apr 2010 - 18:19 | TedHesselroth | Using information services in a file transfer
InfoStorage.tif                       | 264.3 K  | 06 Apr 2010 - 18:18 | TedHesselroth | Using information services in a file transfer
InfoStorageXfer.jpg                   | 96.2 K   | 06 Apr 2010 - 18:31 | TedHesselroth | Using information services in a file transfer
bestman-gateway-howitworks.jpg        | 49.7 K   | 16 Feb 2010 - 19:11 | TedHesselroth | Bestman - How it works
bestman-gateway-xrootd-howitworks.jpg | 50.9 K   | 16 Feb 2010 - 19:45 | TedHesselroth | BeStMan Gateway with Xrootd - How it works
bestman_gateway_arch.jpeg             | 9.7 K    | 16 Feb 2010 - 19:11 | TedHesselroth | BeStMan-gateway architecture
betsman_gateway_xrootd.jpeg           | 34.6 K   | 16 Feb 2010 - 19:48 | TedHesselroth | BeStMan-gateway/Xrootd architecture
storage-architecture.GIF              | 108.2 K  | 24 Mar 2010 - 17:23 | TedHesselroth | Diagram of Components of a Storage Element