
Hadoop Gratia

WARNING! This page is for an older version of Hadoop. For newer versions, please visit Hadoop Release 3 Installation

The grid-enabled Hadoop DFS utilizes normal Globus GridFTP servers with a specialized plugin in order to do transfers. Because it uses this server software, we are able to take advantage of the OSG's accounting system. The system, Gratia, has a plugin which monitors the Globus GridFTP logs and reports any transfer data to a central collector.

The instructions on this page must be followed on every GridFTP server in your Hadoop installation.

Assumptions and Prerequisites

This page assumes:

Yum-based Installation

We assume that you are using the Caltech Hadoop repository from the caltech-hadoop RPM, as documented here. Run the following command:

yum install gratia-probe-gridftp-transfer gums-client

When an update is released, you may run the following to upgrade:

yum upgrade gratia-probe-gridftp-transfer gums-client

Files and Directories Used

This RPM does not use Linux-standard file locations. Here are the most relevant file and directory locations:

| Purpose | Needs Editing? | Location |
| Probe Configuration | Yes | /opt/vdt/gratia/probe/gridftp-transfer/ProbeConfig |
| Probe Executables | No | /opt/vdt/gratia/probe/gridftp-transfer |
| Log files | No | /opt/vdt/gratia/var/logs |
| Temporary files | No | /opt/vdt/gratia/var/tmp |
| Gums configuration | Yes | /etc/gums/gums-client.properties |

Configuration

The RPM installs the Gratia probe into the system crontab, but does not configure it. The configuration of the probe is controlled by the file

/opt/vdt/gratia/probe/gridftp-transfer/ProbeConfig

This file is usually a single XML node whose attributes are spread over multiple lines. Because it is XML, prefixing a line with # does not comment it out. You will need to edit the following attributes:

| Attribute | Needs Editing? | Value |
| ProbeName | Maybe | This should be set to "gridftp-transfer:<hostname>", where <hostname> is the fully-qualified domain name of your gridftp host. |
| CollectorHost | Maybe | Set to the hostname and port of the central collector. By default it sends to the OSG collector. See below. |
| SiteName | Yes | Set to the resource group name of your site as registered in OIM. |
| GridftpLogDir | Yes | Set to /var/log, or wherever your current gridftp logs are located. |
| Grid | Maybe | Set to "ITB" if this is a test resource; otherwise, leave as OSG. |
| UserVOMapFile | Maybe | Set to the location of your osg-user-vo-map.txt; see below for information about this file. |
| SuppressUnknownVORecords | Maybe | Set to 1 to suppress any records that can't be matched to a VO; 0 is strongly recommended. |
| SuppressNoDNRecords | Maybe | Set to 1 to suppress records that can't be matched to a DN; 0 is strongly recommended. |
| EnableProbe | Yes | Set to 1 to enable the probe. |
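After editing ProbeConfig, a quick sanity check of the attributes marked "Yes" above can catch an unconfigured probe before the cron job runs. The following is a minimal sketch, not part of the Gratia RPM: the function names probe_attr and check_probe_config are hypothetical, and it assumes each attribute sits on its own line in the form Name="value".

```shell
# Hypothetical helpers, not shipped with the Gratia probe.
# Assumption: every attribute is on its own line, written as Name="value".

# Print the value of one attribute from a ProbeConfig-style file.
probe_attr() {
    file=$1
    name=$2
    # Anchor at line start so CollectorHost does not match a longer name.
    sed -n "s/^[[:space:]]*${name}=\"\([^\"]*\)\".*/\1/p" "$file" | head -n 1
}

# Return non-zero if a must-edit attribute is empty or the probe is disabled.
check_probe_config() {
    file=$1
    status=0
    for name in SiteName GridftpLogDir EnableProbe; do
        value=$(probe_attr "$file" "$name")
        if [ -z "$value" ]; then
            echo "missing or empty: $name" >&2
            status=1
        fi
    done
    if [ "$(probe_attr "$file" EnableProbe)" != "1" ]; then
        echo "EnableProbe is not 1; the probe will not report" >&2
        status=1
    fi
    return $status
}
```

For example, running check_probe_config /opt/vdt/gratia/probe/gridftp-transfer/ProbeConfig should print nothing and return 0 once the file has been edited.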

The primary configuration file for the gums-client utilities is located in /etc/gums/gums-client.properties. The two properties that you must change are:

| Attribute | Needs Editing? | Value |
| gums.location | Yes | This should be set to the admin URL for your gums server, usually of the form gums.location=https://GUMS_HOSTNAME:8443/gums/services/GUMSAdmin |
| gums.authz | Yes | This should be set to the authorization interface URL for your gums server, usually of the form gums.authz=https://GUMS_HOSTNAME:8443/gums/services/GUMSXACMLAuthorizationServicePort |
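If you configure several GridFTP servers, the two properties above can be set by a small script instead of by hand. This is only a sketch under stated assumptions: set_property and configure_gums_client are hypothetical helper names, the file is assumed to use plain key=value lines, and the URLs follow the "usually of the form" patterns given above, which your GUMS server may not match.

```shell
# Hypothetical helpers, not part of gums-client.
# Assumption: the properties file uses plain key=value lines.

# Replace (or append) one key=value line in a properties file.
set_property() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Keep every line that does not start with "key=", then append the new one.
    awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"    # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}

# Set both required properties from a GUMS hostname.
configure_gums_client() {
    props=$1
    gums_host=$2
    set_property "$props" gums.location \
        "https://${gums_host}:8443/gums/services/GUMSAdmin"
    set_property "$props" gums.authz \
        "https://${gums_host}:8443/gums/services/GUMSXACMLAuthorizationServicePort"
}
```

A call such as configure_gums_client /etc/gums/gums-client.properties gums.example.com (with a placeholder hostname) leaves all other lines in the file untouched.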

Selecting a collector host

The collector is the central server which logs the GridFTP transfers into a database. There are usually three options:

  1. OSG Transfer Collector: This is the primary collector for transfers in the OSG. Use CollectorHost="gratia-osg-transfer.opensciencegrid.org:80".
  2. OSG-ITB Transfer Collector: This is the test collector for transfers in the OSG. Use CollectorHost="gratia-osg-transfer.opensciencegrid.org:8881".
  3. Site local collector: If your site has set up its own collector, then your admin will be able to give you an endpoint to use. Typically, this is along the lines of CollectorHost="collector.example.com:8880".
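The choice between the two OSG endpoints listed above can be scripted. The sketch below is illustrative only: collector_host and set_collector_host are hypothetical names, the endpoints are copied from the list above, and the ProbeConfig is assumed to hold CollectorHost="..." on its own line.

```shell
# Hypothetical helpers, not part of the Gratia probe.

# Map a grid name to the collector endpoint listed on this page.
collector_host() {
    case $1 in
        OSG) echo "gratia-osg-transfer.opensciencegrid.org:80" ;;
        ITB) echo "gratia-osg-transfer.opensciencegrid.org:8881" ;;
        *)   echo "unknown grid: $1" >&2; return 1 ;;
    esac
}

# Rewrite the CollectorHost attribute in a ProbeConfig-style file.
# Assumption: the attribute is on its own line as CollectorHost="value".
set_collector_host() {
    file=$1
    endpoint=$2
    tmp=$(mktemp) || return 1
    if sed "s|^\([[:space:]]*CollectorHost=\)\"[^\"]*\"|\1\"${endpoint}\"|" \
            "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

For a site-local collector, pass the endpoint your admin gives you directly as the second argument of set_collector_host.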

Generating osg-user-vo-map.txt

The osg-user-vo-map.txt file uses a simple, space-separated format with two columns: the first is a unix username and the second is the VO to which that username corresponds. It can be created with the gums-client tools. It is also auto-generated by OSG utilities on the OSG CE.
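To illustrate the two-column format, here is a rough sketch of looking up the VO for a given username. The function name vo_for_user is hypothetical, and skipping lines that start with # is an assumption about the file, not something this page states.

```shell
# Hypothetical helper for inspecting a map file by hand.
# Print the VO mapped to a unix username; return 1 if there is no mapping.
vo_for_user() {
    map=$1
    user=$2
    awk -v u="$user" '
        /^[[:space:]]*#/ { next }                 # assumed comment lines
        $1 == u { print $2; found = 1; exit }     # column 1: user, column 2: VO
        END { exit found ? 0 : 1 }
    ' "$map"
}
```

A username that is absent from the file produces no output and a non-zero return code, which corresponds to transfers that would be reported under the VO "Unknown".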

To create the osg-user-vo-map.txt, you can run the gums client:

gums --host generateOsgUserVoMap -f /etc/grid-security/osg-user-vo-map.txt

Alternatively, you can use a cron job to copy this file automatically from your CE (where it can be found at $VDT_LOCATION/osg/etc/osg-user-vo-map.txt) into a directory on your gridftp server, and then update the UserVOMapFile attribute in ProbeConfig to point at it. This may be preferable if you have many gridftp servers, because it keeps them from all querying the gums server at the same time.
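When copying the file by cron, it is worth guarding against a failed or empty transfer replacing a good map. The sketch below covers only the install step and assumes the copy from the CE has already been fetched to a staging path by some other means (scp, for instance); install_vo_map is a hypothetical name.

```shell
# Hypothetical helper for the cron-copy approach.
# Install a freshly fetched map only if it is non-empty and has changed.
install_vo_map() {
    src=$1
    dst=$2
    if [ ! -s "$src" ]; then
        echo "refusing to install empty or missing map: $src" >&2
        return 1
    fi
    if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
        return 0    # unchanged; leave the existing file alone
    fi
    # Copy next to the target, then rename, so readers never see a partial file.
    cp "$src" "$dst.new" && mv "$dst.new" "$dst"
}
```

The second argument should be the same path that UserVOMapFile points at in ProbeConfig.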

Without this file, all gridftp transfers will show up as belonging to the VO "Unknown".

Validation

Run the Gratia probe once by hand to check for functionality:

/opt/vdt/gratia/probe/gridftp-transfer/gridftp-transfer_meter.cron.sh

Look for any abnormal termination, and report it unless it turns out to be a trivial site-specific issue. Then check the log file /opt/vdt/gratia/var/logs/<date>.log and make sure it contains no error messages.
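The log check can be done with a short helper like the one below. It is a sketch only: check_gratia_log is a hypothetical name, and it assumes that error lines contain the word "error" in some capitalization, which this page does not confirm about the Gratia log format.

```shell
# Hypothetical helper, not part of the Gratia probe.
# Print any lines mentioning "error" (case-insensitive, an assumed convention)
# and return 1 if there are any; return 2 if the log cannot be read.
check_gratia_log() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "cannot read log: $log" >&2
        return 2
    fi
    if grep -i -n 'error' "$log"; then
        return 1
    fi
    return 0
}
```

Call it with the path of the log file under /opt/vdt/gratia/var/logs for the day of your test run.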

Topic revision: r9 - 25 Oct 2011 - 21:13:25 - DouglasStrain