
Hadoop Distributed File System

WARNING! This page is for an older version of Hadoop. For newer versions, please visit Hadoop Overview

The Hadoop Distributed File System (HDFS) is a highly scalable, reliable distributed file system developed by the Apache project as part of the Hadoop data processing system. The primary contributor (and largest user) is Yahoo. HDFS is based on the design of the Google File System. HDFS's strength lies in its ability to use commodity hard drives in worker nodes: it can turn a large pool of semi-reliable hardware into a highly reliable system.
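The reliability gain comes from block replication, and can be illustrated with a toy calculation (a sketch only: it assumes independent node failures and a hypothetical per-node failure probability; HDFS's default replication factor is 3):

```python
# Toy illustration (not part of HDFS itself): a block is lost only if
# ALL of its replicas fail. The 5% per-node failure probability below
# is a hypothetical example value.

def block_loss_probability(p_node_failure: float, replicas: int) -> float:
    """Probability that every replica of a block is lost,
    assuming independent node failures."""
    return p_node_failure ** replicas

# Even with fairly unreliable nodes, the default replication
# factor of 3 makes losing a block very unlikely:
p = block_loss_probability(0.05, 3)
print(f"{p:.6f}")  # 0.000125
```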

To find out more information about HDFS, visit its home page. If you are thinking about installing Hadoop, we also recommend reading the HDFS architecture page.

This page covers the OSG's usage of Hadoop, and includes instructions for installing a grid-enabled HDFS system.

Information for Site Admins

Preparation

WARNING! This page is for an older version of Hadoop. For newer versions, please visit Hadoop Release 3 Installation

If you plan on installing a Hadoop SE on the OSG, we recommend starting off with the planning document.

Just curious about HDFS? The planning document includes a section on a minimal install. The HDFS core components alone require at least 3 nodes (1 namenode and 2 datanodes), but this will not give you full functionality; enabling all the components takes 5 nodes.
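As a sketch of what a minimal layout implies for configuration (file names are for the Hadoop 0.20-era split into core-site.xml and hdfs-site.xml; the hostname and port below are hypothetical examples), note that with only 2 datanodes the replication factor cannot usefully exceed 2:

```xml
<!-- core-site.xml: point clients and datanodes at the namenode.
     namenode.example.com:9000 is a hypothetical address. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: cap replication at the number of datanodes. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```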

Installation

Once you have read the planning document and feel you understand the general architecture, follow these guides (in order).

WARNING! These guides are for an older version of Hadoop. For newer versions, please visit Hadoop Release 3 Installation

Major components:

  • Hadoop and FUSE. This guide covers installation of the core HDFS components, including the FUSE-based mounts. Once completed, you will be able to store files and interact with the file system locally.
    • Hadoop and FUSE. This guide covers installation of the next version of Hadoop, 0.20.
  • GridFTP. This guide covers installation of the HDFS-aware GridFTP server. Once completed, you should be able to copy files in and out of HDFS through the grid-standard WAN protocol, GridFTP.
    • GridFTP. This guide covers installation of the GridFTP server compatible with the next version of Hadoop, 0.20.
  • SRM. This guide covers installation of a BeStMan SRM server on top of HDFS. Once completed, you should be able to interact with HDFS via SRM, a grid-standard web-services protocol for performing metadata operations remotely.
    • SRM. This guide covers installation of the BeStMan2 SRM server compatible with the next version of Hadoop, 0.20.
  • Xrootd server. Using Xrootd to export your data over the WAN and allow quick and secure ROOT access to files.
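The layering of these components can be sketched as follows: the same HDFS file is reachable as a local POSIX path via FUSE, as a WAN transfer URL via GridFTP, and as a management URL via SRM. The hostnames, ports, mount point, and file name below are hypothetical examples, not OSG-mandated values:

```python
# Sketch of how one HDFS file is addressed through each access layer.
# /mnt/hadoop is an assumed FUSE mount point, chosen for illustration.

FUSE_MOUNT = "/mnt/hadoop"

def fuse_path(hdfs_path: str) -> str:
    """Local POSIX path once HDFS is FUSE-mounted."""
    return FUSE_MOUNT + hdfs_path

def gridftp_url(host: str, hdfs_path: str) -> str:
    """WAN transfer URL served by the HDFS-aware GridFTP server."""
    return f"gsiftp://{host}{fuse_path(hdfs_path)}"

def srm_url(host: str, hdfs_path: str) -> str:
    """Management URL served by a BeStMan SRM endpoint
    (port and service path are typical but site-configurable)."""
    return f"srm://{host}:8443/srm/v2/server?SFN={fuse_path(hdfs_path)}"

f = "/user/uscms01/data.root"
print(fuse_path(f))  # /mnt/hadoop/user/uscms01/data.root
print(gridftp_url("gridftp.example.edu", f))
print(srm_url("srm.example.edu", f))
```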

Minor components:

  • Gratia Probe. The Gratia probe instruments the GridFTP servers running on HDFS; it uses their log files to send records of all completed transfers to a central server. Once completed, you should see transfers at your site show up in the central OSG accounting.
    • Gratia Probe. This guide covers installation of the Gratia probe compatible with the next version of Hadoop, 0.20.
  • Hadoop Chronicle Storage Reports. These are gratia-based storage reports that give you a daily and historical view of the status of your Hadoop cluster.
  • Apache integration. Configuring the Apache web server to serve files from HDFS.
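Because the FUSE mount makes HDFS look like an ordinary directory tree, Apache integration can be as simple as aliasing that tree into the web server's URL space. A minimal sketch (assuming a hypothetical /mnt/hadoop mount point and Apache 2.2-style access control, which matches the era of these guides):

```apache
# Hypothetical httpd.conf fragment: serve the FUSE-mounted HDFS tree
# at http://yourhost/hdfs/ . /mnt/hadoop is an assumed mount point.
Alias /hdfs /mnt/hadoop
<Directory /mnt/hadoop>
    Options Indexes
    Order allow,deny
    Allow from all
</Directory>
```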

Validation

These guides provide simple tests you can perform to see if your install is functioning.

Operations and Troubleshooting

HDFS, while relatively easy to administer, is not completely headache-free! The pages below offer tips and tricks for operating and maintaining HDFS.

Get Involved! Contact Us!

Mailing list
osg-hadoop@fnal.gov
Chat
uscms-t2@conference.fnal.gov (Jabber Multi-User Chat)

Information for developers

Topic revision: r34 - 10 Jan 2012 - 21:21:41 - DouglasStrain