Hadoop 2.0.0 (CDH4)

Purpose: This document gives administrators of Hadoop-based storage elements (SEs) the information needed to prepare, install, and validate the SE.


If you are installing Hadoop/BeStMan from OSG 3.1, you will need to use the guide for that release series instead. This guide details installing Hadoop 2.0 from the OSG 3.2 repositories.



Hadoop Distributed File System (HDFS) is a scalable, reliable distributed file system developed in the Apache project. It is based on the map-reduce framework and the design of the Google file system. The VDT distribution of Hadoop includes all components needed to operate a multi-terabyte storage site. Included are:

The OSG packaging and distribution of Hadoop is based on YUM. All components are packaged as RPMs and are available from the OSG repositories. It is also recommended that you enable EPEL repos.

Note on upgrading from Hadoop 0.20

If upgrading, make sure to follow these instructions BEFORE any other instructions in this document.

  1. First, you must upgrade to the newest version of Hadoop-0.20. Older versions may have dependency and upgrade problems. Make sure that your version is at least hadoop-0.20-0.20.2+737-26 on all nodes. (The important number is the release number, 26; older releases may have upgrade problems.) You may need to name this version explicitly to ensure that the correct one is installed, i.e. yum upgrade hadoop-0.20-0.20.2+737-26.
  2. Next, make sure all configuration and important files are backed up in case of catastrophic failure. In particular, back up a copy of hdfs-site.xml, core-site.xml and important namenode files.
  3. Now, upgrade to hadoop-2.0.0 using yum upgrade hadoop
  4. Also, make sure to bring in any new packages using the relevant meta-package, such as yum install osg-se-hadoop-namenode, yum install osg-se-hadoop-datanode or yum install osg-se-hadoop-srm.
  5. On the namenode, run hadoop namenode -upgrade to upgrade the meta-data catalog.
  6. Follow the configuration instructions below for each node. In particular, restore or modify hdfs-site.xml and core-site.xml then copy to all nodes. For any nodes using fuse mounts, note that "hdfs#" should be changed to "hadoop-fuse-dfs#" in /etc/fstab.
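The /etc/fstab change in step 6 can be sketched as a small filter. This is only a sketch and not part of the OSG packages: the function name migrate_fuse_fstab is hypothetical, and it assumes that the old FUSE entries begin with hdfs# at the start of the line. It reads fstab content on standard input and writes the converted content to standard output, so nothing is modified in place.

```shell
# Hypothetical helper: rewrite old-style HDFS FUSE entries for Hadoop 2.0.0.
# Assumes the device field starts the line, as in a typical /etc/fstab.
migrate_fuse_fstab() {
    sed 's/^hdfs#/hadoop-fuse-dfs#/'
}

# Example: preview the change before touching the real file.
# migrate_fuse_fstab < /etc/fstab | diff /etc/fstab -
```

Reviewing the diff first and only then replacing the file keeps a bad substitution from leaving a node without its mounts at the next boot.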



There are several important components to a storage element installation. Throughout this document, it will be stated which node the relevant installation instructions apply to. It can apply to one of the following:
  • Namenode: You will have at least one namenode. The name node functions as the directory server and coordinator of the hadoop cluster. It houses all the meta-data for the hadoop cluster. The namenode and secondary namenode need to have a directory that they can both access on a shared filesystem so that they can exchange filesystem checkpoints.
  • Secondary Namenode: This is a secondary machine that periodically merges updates to the HDFS file system back into the fsimage. This dramatically improves startup and restart times.
  • Datanode: You will have many datanodes. Each data node stores large blocks of files to be stored on the hadoop cluster.
  • Client: This is a documentation shorthand that refers to any machine with the hadoop client commands and FUSE mount. Any machine that needs a FUSE mount to access data in a POSIX-like fashion will need this.
  • GridFTP node: This is a node with Globus GridFTP. The GridFTP server for Hadoop can be very memory-hungry, up to 500MB/transfer in the default configuration. You should plan accordingly to provision enough GridFTP servers to handle the bandwidth that your site can support.
  • SRM node: This node will contain the BeStMan2 SRM frontend for accessing the Hadoop cluster via the SRM protocol.

Note that these components are not necessarily mutually exclusive. For instance, you may consider co-locating your GridFTP server on the SRM node. Alternatively, you can co-locate a client (or even a GridFTP node) on each data node. That way, each data node also acts as an access point to the Hadoop cluster.

Please read the planning document to understand different components of the system.

Total installation time should, on average, not exceed 8 to 24 man-hours. If your site needs further assistance to expedite the installation, please email osg-storage@opensciencegrid.org and osg-hadoop@opensciencegrid.org.

Host and OS

Hadoop will run anywhere that Java is supported (including Solaris). However, these instructions are for Red Hat derivatives (including Scientific Linux) because the installation is RPM-based. The operating systems currently supported by the OSG are Red Hat Enterprise Linux 6, 7, and variants (see details...).

The HDFS prerequisites are:

  • Minimum of 1 headnode (the namenode)
  • At least one node which will hold data, preferably at least 2. Most sites will have 20 to 200 datanodes.
  • Working Yum and RPM installation on every system.
  • fuse kernel module and fuse-libs.
  • Java RPM. If java isn't already installed we supply the Oracle jdk 1.6.0 rpm and it will come in as a dependency. Oracle jdk is currently the only jdk supported by OSG so we highly recommend you use the version supplied.

Compatibility note: Versions of OpenAFS later than 1.4.1 and earlier than 1.4.7 create nameless groups on Linux; these groups confuse Hadoop and prevent its components from starting up successfully. If you plan to install Hadoop on a Linux OpenAFS client, make sure you are running at least OpenAFS 1.4.7.


This installation will create the following users unless they already exist.

User | Comment
bestman | Used by the BeStMan SRM server (needs sudo access).
hdfs | Used by Hadoop to store data blocks and meta-data.

For this package to function correctly, you will have to create the users needed for grid operation. Any user that can be authenticated should be created.

For grid-mapfile users, each line of the grid-mapfile is a certificate/user pair. Each user in this file should be created on the server.

For gums users, this means that each user that can be authenticated by gums should be created on the server.

Note that these users must be kept in sync with the authentication method. For instance, if new users or rules are added in gums, then new users should also be added here.


Certificate | User that owns certificate | Path to certificate
Host certificate | root | /etc/grid-security/hostcert.pem
Bestman service certificate | bestman | /etc/grid-security/bestman/bestmancert.pem

Instructions to request a service certificate.

You will also need a copy of CA certificates (see below). Note that the osg-se-hadoop-srm and osg-se-hadoop-gridftp package will automatically install a certificate package but will not necessarily pick the cert package you expect. For instance, certain installs will prefer the osg-ca-scripts package to fulfill this requirement, which installs a set of scripts to automatically update the certificates, but does not initialize the CA certs by default (you have to run it first). For this reason, you may want to specifically install the cert package of your choice first, before installing Hadoop.


For more details on overall Firewall configuration, please see our Firewall documentation.

Service Name | Protocol | Port Number | Inbound | Outbound | Comment
GRAM callback | tcp | GLOBUS_TCP_PORT_RANGE | Y | | contiguous range of ports
GRAM callback | tcp | GLOBUS_TCP_SOURCE_RANGE | | Y | contiguous range of ports
GridFTP | tcp | 2811 and GLOBUS_TCP_SOURCE_RANGE | Y | | contiguous range of ports
Storage Resource Manager | tcp | 8080 | Y | |
Storage Resource Manager | tcp | 8443 | Y | |

NOTE: The versions of Hadoop in OSG series 3.1 and 3.2 (i.e., Hadoop 0.20 and 2.0.0) do not interoperate. In order to use Hadoop 2.0.0, all nodes in the Hadoop system (namenode, secondary namenode, datanodes, SRM/GridFTP nodes and all client nodes) must update to OSG 3.2 and Hadoop 2.0.0.

Initializing Certificate Authority

This is needed by GridFTP and SRM nodes, but it is recommended for all nodes in the cluster.

Enable and Start fetch-crl

To enable fetch-crl (fetch Certificate Revocation Lists) services by default on the node:
# For RHEL 5, CentOS 5, and SL5 
[root@client ~]$ /sbin/chkconfig fetch-crl3-boot on
[root@client ~]$ /sbin/chkconfig fetch-crl3-cron on
# For RHEL 6, CentOS 6, and SL6, or OSG 3 _older_ than 3.1.15 
[root@client ~]$ /sbin/chkconfig fetch-crl-boot on
[root@client ~]$ /sbin/chkconfig fetch-crl-cron on
# For RHEL 7, CentOS 7, and SL7 
[root@client ~]$ systemctl enable fetch-crl-boot
[root@client ~]$ systemctl enable fetch-crl-cron
To start fetch-crl:
# For RHEL 5, CentOS 5, and SL5 
[root@client ~]$ /sbin/service fetch-crl3-boot start
[root@client ~]$ /sbin/service fetch-crl3-cron start
# For RHEL 6, CentOS 6, and SL6, or OSG 3 _older_ than 3.1.15 
[root@client ~]$ /sbin/service fetch-crl-boot start
[root@client ~]$ /sbin/service fetch-crl-cron start
# For RHEL 7, CentOS 7, and SL7 
[root@client ~]$ systemctl start fetch-crl-boot
[root@client ~]$ systemctl start fetch-crl-cron
NOTE: while it is necessary to start fetch-crl-cron in order to have it active, fetch-crl-boot is started automatically at boot time if enabled. The start command will run fetch-crl-boot at the moment when it is invoked and it may take some time to complete.

Configure fetch-crl

To modify the times that fetch-crl-cron runs, edit /etc/cron.d/fetch-crl (or /etc/cron.d/fetch-crl3 depending on the version you have).

By default, fetch-crl connects directly to the remote CA; this is inefficient and potentially harmful if done simultaneously by many nodes (e.g. all the worker nodes of a big cluster). We recommend you provide a HTTP proxy (such as squid) the worker nodes can connect to. Here are instructions to install a squid proxy.

To configure fetch-crl to use an HTTP proxy server:

  • If using fetch-crl version 2 (the fetch-crl package on RHEL5 only), then create the file /etc/sysconfig/fetch-crl and add the following line:
    export http_proxy=http://your.squid.fqdn:port
    Adjust the URL appropriately for your proxy server.
  • If using fetch-crl version 3 on RHEL5 via the fetch-crl3 package or on RHEL6/RHEL7 via the fetch-crl package, then create or edit the file /etc/fetch-crl3.conf (RHEL5) or /etc/fetch-crl.conf (RHEL6/RHEL7) and add the following line:
    http_proxy=http://your.squid.fqdn:port
    Again, adjust the URL appropriately for your proxy server.

Note that the nosymlinks option in the configuration files refers to ignoring links within the certificates directory (e.g. two different names for the same file). It is perfectly fine if the path of the CA certificates directory itself (infodir) is a link to a directory.

Any modifications to the configuration file will be preserved during an RPM update.

For more details, please see our fetch-crl documentation.

Current versions of fetch-crl and fetch-crl3 produce more output. It is possible to send the output to syslog instead of the default email system. To do so:

  1. Change the configuration file to enable syslog:
    logmode = syslog
    syslogfacility = daemon
  2. Make sure the file /var/log/daemon exists, e.g. by touching the file.
  3. Change the files in /etc/logrotate.d so that this log is rotated.


Installation depends on the node you are installing:

Namenode Installation

[root@client ~]$ yum install osg-se-hadoop-namenode

Secondary Namenode Installation

[root@client ~]$ yum install osg-se-hadoop-secondarynamenode

Datanode Installation

[root@client ~]$ yum install osg-se-hadoop-datanode

Client/FUSE Installation

[root@client ~]$ yum install osg-se-hadoop-client

Standalone Gridftp Node Installation

[root@client ~]$ yum install osg-se-hadoop-gridftp

If you are using GUMS authorization, the following RPMs need to be installed as well:

[root@client ~]$ yum install lcmaps-plugins-gums-client
[root@client ~]$ yum install lcmaps-plugins-basic

SRM Node Installation

[root@client ~]$ yum install osg-se-hadoop-srm

If you are using a single system to host the SRM software and the GridFTP node, you will also need to install the osg-se-hadoop-gridftp RPM.


Hadoop Configuration

Needed by: Hadoop namenode, Hadoop datanodes, Hadoop client, GridFTP, SRM

Hadoop configuration is needed by every node in the hadoop cluster. However, in most cases, you can do the configuration once and copy it to all nodes in the cluster (possibly using your favorite configuration management tool). Special configuration for various special components is given in the below sections.

Hadoop configuration is stored in /etc/hadoop/conf. However, by default, these files are mostly blank. OSG provides a sample configuration in /etc/hadoop/conf.osg with most common values filled in. You will need to copy these into /etc/hadoop/conf before they become active. Please let us know if there are any common values that should be added/changed across the whole grid. You will likely need to modify hdfs-site.xml and core-site.xml. Review all the settings in these files, but listed below are common settings to modify:

File | Setting | Example | Comments
core-site.xml | fs.default.name | hdfs://namenode.domain.tld.:9000 | This is the address of the namenode
core-site.xml | hadoop.tmp.dir | /data/scratch | Scratch temp directory used by Hadoop
core-site.xml | hadoop.log.dir | /var/log/hadoop-hdfs | Log directory used by Hadoop
core-site.xml | dfs.umaskmode | 002 | umask for permissions used by default
hdfs-site.xml | dfs.block.size | 134217728 | Block size in bytes: 128MB by default
hdfs-site.xml | dfs.replication | 2 | Default replication factor. Generally the same as dfs.replication.min/max
hdfs-site.xml | dfs.datanode.du.reserved | 100000000 | How much free space, in bytes, Hadoop will reserve for non-Hadoop usage
hdfs-site.xml | dfs.datanode.handler.count | 20 | Number of server threads for datanodes. Increase if you have many more client connections
hdfs-site.xml | dfs.namenode.handler.count | 40 | Number of server threads for namenodes. Increase if you need more connections
hdfs-site.xml | dfs.http.address | namenode.domain.tld.:50070 | Web address for dfs health monitoring page

See http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml for more parameters to configure.
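As an illustration of how settings like these end up in the configuration files, the following sketch prints a minimal core-site.xml. The helper names hadoop_property and render_core_site are hypothetical, and only two example settings are rendered; a real site would start from the sample files in /etc/hadoop/conf.osg and review every value.

```shell
# Hypothetical helper: print one Hadoop <property> element.
# Values are inserted verbatim, so they must not contain XML special characters.
hadoop_property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Print a minimal core-site.xml for the namenode host given as the first argument.
# The port (9000) and scratch directory are the example values used in this guide.
render_core_site() {
    printf '<?xml version="1.0"?>\n<configuration>\n'
    hadoop_property fs.default.name "hdfs://$1:9000"
    hadoop_property hadoop.tmp.dir /data/scratch
    printf '</configuration>\n'
}

# Example: write a draft to a scratch location for review.
# render_core_site namenode.domain.tld > /tmp/core-site.xml
```

Generating the file from one place makes it easier to keep the copies on all nodes identical, which is what the configuration section asks for.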

Namenodes must have an /etc/hosts_exclude file present.

Special namenode instructions for brand new installs

If this is a new installation (and only if this is a brand new installation), you should run the following command as the hdfs user. (Otherwise, be sure to chown your storage directory to hdfs after running):

hadoop namenode -format

This will initialize the storage directory on your namenode

FUSE Client Configuration

Needed by: Hadoop client and SRM node. Recommended but not necessary for GridFTP nodes.

A FUSE mount is required on any node where you would like to use standard POSIX-like commands on the Hadoop filesystem. FUSE (or "file system in user space") is a way to access Hadoop using typical UNIX directory commands (i.e. POSIX-like access). Note that not all advanced functions of a full POSIX-compliant file system are necessarily available.

FUSE is typically installed as part of this installation, but, if you are running a customized or non-standard system, make sure that the fuse kernel module is installed and loaded with modprobe fuse.

You can have the FUSE mount set up at boot time by adding the following line to /etc/fstab:

hadoop-fuse-dfs# /mnt/hadoop fuse server=namenode.host,port=9000,rdbuffer=131072,allow_other 0 0
Be sure to change the /mnt/hadoop mount point and namenode.host to match your local configuration. To match the help documents, we recommend using /mnt/hadoop as your mountpoint.
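To avoid typing errors when adapting that line, the entry can be generated from the three site-specific values. This is a sketch only: fuse_fstab_line is a hypothetical helper, and the rdbuffer and allow_other options are simply copied from the example line above.

```shell
# Hypothetical helper: print an /etc/fstab entry for the HDFS FUSE mount.
# Arguments: namenode host, namenode port, mount point.
fuse_fstab_line() {
    printf 'hadoop-fuse-dfs# %s fuse server=%s,port=%s,rdbuffer=131072,allow_other 0 0\n' "$3" "$1" "$2"
}

# Example (as root): append the entry only if no HDFS FUSE entry exists yet.
# grep -q '^hadoop-fuse-dfs#' /etc/fstab || fuse_fstab_line namenode.host 9000 /mnt/hadoop >> /etc/fstab
```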

Once your /etc/fstab is updated, to mount FUSE run:

[root@client ~]$ mkdir /mnt/hadoop
[root@client ~]$ mount /mnt/hadoop

When mounting the HDFS FUSE mount, you will see the following harmless warnings printed to the screen:

# mount /mnt/hadoop
INFO fuse_options.c:162 Adding FUSE arg /mnt/hadoop
INFO fuse_options.c:110 Ignoring option allow_other

If you have trouble mounting FUSE, refer to Running FUSE in Debug Mode in the Troubleshooting section.

Creating VO and User Areas

Grid Users are needed by GridFTP and SRM nodes. VO areas are common to all nodes.

For this package to function correctly, you will have to create the users needed for grid operation. Any user that can be authenticated should be created.

For grid-mapfile users, each line of the grid-mapfile is a certificate/user pair. Each user in this file should be created on the server.

For gums users, this means that each user that can be authenticated by gums should be created on the server.

Note that these users must be kept in sync with the authentication method. For instance, if new users or rules are added in gums, then new users should also be added here.

Prior to starting basic day-to-day operations, it is important to create dedicated areas for each VO and/or user. This is similar to user management in simple UNIX filesystems. Create (and maintain) usernames and groups with UIDs and GIDs on all nodes. These are maintained in basic system files such as /etc/passwd and /etc/group.

The examples below assume that a FUSE mount is set up at /mnt/hadoop. As an alternative, hadoop fs commands could be used.

For clean HDFS operations and filesystem management:

(a) Create top-level VO subdirectories under /mnt/hadoop.


[root@client ~]$ mkdir /mnt/hadoop/cms
[root@client ~]$ mkdir /mnt/hadoop/dzero
[root@client ~]$ mkdir /mnt/hadoop/sbgrid
[root@client ~]$ mkdir /mnt/hadoop/fermigrid
[root@client ~]$ mkdir /mnt/hadoop/cmstest
[root@client ~]$ mkdir /mnt/hadoop/osg

(b) Create individual top-level user areas, under each VO area, as needed.

[root@client ~]$ mkdir -p /mnt/hadoop/cms/store/user/tanyalevshina
[root@client ~]$ mkdir -p /mnt/hadoop/cms/store/user/michaelthomas
[root@client ~]$ mkdir -p /mnt/hadoop/cms/store/user/brianbockelman
[root@client ~]$ mkdir -p /mnt/hadoop/cms/store/user/douglasstrain
[root@client ~]$ mkdir -p /mnt/hadoop/cms/store/user/abhisheksinghrana

(c) Adjust username:group ownership of each area.

[root@client ~]$ chown -R cms:cms /mnt/hadoop/cms
[root@client ~]$ chown -R sam:sam /mnt/hadoop/dzero

[root@client ~]$ chown -R michaelthomas:cms /mnt/hadoop/cms/store/user/michaelthomas
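When a site hosts many VOs, step (a) can be scripted. The sketch below uses a hypothetical helper, create_vo_areas, that creates one directory per VO under a given mount point; it deliberately leaves ownership alone, so the chown commands from step (c) still have to be run afterwards.

```shell
# Hypothetical helper: create one top-level directory per VO under a mount point.
# Arguments: mount point, then one or more VO names.
create_vo_areas() {
    mount_point=$1
    shift
    for vo in "$@"; do
        mkdir -p "$mount_point/$vo" || return 1
    done
}

# Example (as root, with the FUSE mount in place):
# create_vo_areas /mnt/hadoop cms dzero sbgrid fermigrid cmstest osg
```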

GridFTP Configuration

gridftp-hdfs reads the Hadoop configuration file to learn how to talk to Hadoop. By now, you should have followed the instructions for installing Hadoop detailed in the previous section and created the proper users and directories.

The default settings in /etc/gridftp.conf along with /etc/gridftp.d/gridftp-hdfs.conf are used by the init.d script and should be ok for most installations. The file /etc/gridftp-hdfs/gridftp-debug.conf is used by /usr/bin/gridftp-hdfs-standalone for starting up the GridFTP server in a testing mode. Any additional config files under /etc/gridftp.d will be used for both the init.d and standalone GridFTP server. /etc/sysconfig/gridftp-hdfs contains additional site-specific environment variables that are used by the gridftp-hdfs dsi module in both the init.d and standalone GridFTP server. Some of the environment variables that can be used in /etc/sysconfig/gridftp-hdfs include:

Option Name | Needs Editing? | Description
GRIDFTP_HDFS_REPLICA_MAP | No | File containing a list of paths and replica values for setting the default number of replicas for specific file paths
GRIDFTP_BUFFER_COUNT | No | The number of 1MB memory buffers used to reorder data streams before writing them to Hadoop
GRIDFTP_FILE_BUFFER_COUNT | No | The number of 1MB file-based buffers used to reorder data streams before writing them to Hadoop
GRIDFTP_SYSLOG | No | Set this to 1 if you want to send transfer activity data to syslog (only used for the HadoopViz application)
GRIDFTP_HDFS_MOUNT_POINT | Maybe | The location of the FUSE mount point used during the Hadoop installation. Defaults to /mnt/hadoop. This is needed so that gridftp-hdfs can convert FUSE paths on the incoming URL to native Hadoop paths. Note: this does not imply you need FUSE mounted on GridFTP nodes!
GRIDFTP_LOAD_LIMIT | No | GridFTP will refuse to start new transfers if the load on the GridFTP host is higher than this number; defaults to 20.
TMPDIR | Maybe | The temp directory where the file-based buffers are stored. Defaults to /tmp.

/etc/sysconfig/gridftp-hdfs is also a good place to increase per-process resource limits. For example, many installations will require more than the default number of open files (ulimit -n).
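A possible /etc/sysconfig/gridftp-hdfs along these lines is sketched below. The three variables are set to the defaults named in the table above; the use of export and the open-file limit of 4096 are assumptions made for illustration, not OSG recommendations, so choose values that suit your own site.

```shell
# Sketch of /etc/sysconfig/gridftp-hdfs; all values are illustrative.
export GRIDFTP_HDFS_MOUNT_POINT=/mnt/hadoop   # default FUSE mount point
export GRIDFTP_LOAD_LIMIT=20                  # default load limit
export TMPDIR=/tmp                            # default location of file-based buffers

# Raise the per-process open-file limit (assumed value); warn instead of failing
# if the hard limit does not allow it.
ulimit -n 4096 2>/dev/null || echo "warning: could not set open-file limit" >&2
```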

Lastly, you will need to configure an authentication mechanism for GridFTP.

Configuring authentication

For information on how to configure authentication for your GridFTP installation, please refer to the configuring authentication section of the GridFTP guide.

GridFTP Gratia Transfer Probe Configuration

Needed by GridFTP node only.

The Gratia probe requires the file user-vo-map to exist and be up to date. This file is created and updated by the gums-client package that comes in as a dependency of osg-se-hadoop-gridftp or osg-gridftp-hdfs. Assuming you installed GridFTP using the osg-se-hadoop-gridftp rpm, the Gratia Transfer Probe will already be installed.

Here are the most relevant file and directory locations:

Purpose | Needs Editing? | Location
Probe Configuration | Yes | /etc/gratia/gridftp-transfer/ProbeConfig
Probe Executables | No | /usr/share/gratia/gridftp-transfer
Log files | No | /var/log/gratia
Temporary files | No | /var/lib/gratia/tmp
Gums configuration | Yes | /etc/gums/gums-client.properties

The RPM installs the Gratia probe into the system crontab, but does not configure it. The configuration of the probe is controlled by the file /etc/gratia/gridftp-transfer/ProbeConfig.


This is usually one XML node spread over multiple lines. Note that comments (#) have no effect on this file. You will need to edit the following:

Attribute | Needs Editing | Value
ProbeName | Maybe | This should be set to "gridftp-transfer:<hostname>", where <hostname> is the fully-qualified domain name of your gridftp host.
CollectorHost | Maybe | Set to the hostname and port of the central collector. By default it sends to the OSG collector. See below.
SiteName | Yes | Set to the resource group name of your site as registered in OIM.
GridftpLogDir | Yes | Set to /var/log, or wherever your current gridftp logs are located
Grid | Maybe | Set to "ITB" if this is a test resource; otherwise, leave as OSG.
UserVOMapFile | No | This should be set to /var/lib/osg/user-vo-map; see below for information about this file.
SuppressUnknownVORecords | Maybe | Set to 1 to suppress any records that can't be matched to a VO; 0 is strongly recommended.
SuppressNoDNRecords | Maybe | Set to 1 to suppress records that can't be matched to a DN; 0 is strongly recommended.
EnableProbe | Yes | Set to 1 to enable the probe.

Selecting a collector host

The collector is the central server which logs the GridFTP transfers into a database. There are usually three options:

  1. OSG Transfer Collector: This is the primary collector for transfers in the OSG. Use CollectorHost="gratia-osg-prod.opensciencegrid.org:80".
  2. OSG-ITB Transfer Collector: This is the test collector for transfers in the OSG. Use CollectorHost="gratia-osg-itb.opensciencegrid.org:80".
  3. Site local collector: If your site has set up its own collector, then your admin will be able to give you an endpoint to use. Typically, this is along the lines of CollectorHost="collector.example.com:8880".

Note: if you are installing on an ITB site, use gratia-osg-itb.opensciencegrid.org instead of the production collector host listed in option 1.
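Editing ProbeConfig attributes such as CollectorHost or EnableProbe can be scripted. The sketch below is not part of the Gratia packages: set_probe_attr is a hypothetical helper, and it assumes that each attribute sits on its own line in the form Name="value" and that the new value contains no |, & or double-quote characters.

```shell
# Hypothetical helper: set one attribute in a Gratia ProbeConfig file.
# Arguments: file, attribute name, new value.
# Assumes one Name="value" attribute per line (see the lead-in for value limits).
set_probe_attr() {
    file=$1
    attr=$2
    value=$3
    # Rewrite into a scratch copy, then copy back so the original file keeps
    # its ownership and permissions.
    sed "s|^\([[:space:]]*$attr=\)\"[^\"]*\"|\1\"$value\"|" "$file" > "$file.new" &&
        cat "$file.new" > "$file" &&
        rm -f "$file.new"
}

# Example (as root):
# set_probe_attr /etc/gratia/gridftp-transfer/ProbeConfig CollectorHost gratia-osg-prod.opensciencegrid.org:80
# set_probe_attr /etc/gratia/gridftp-transfer/ProbeConfig EnableProbe 1
```

It is worth keeping a backup of ProbeConfig before running anything like this, since a value that breaks the assumptions above would produce a malformed file.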

Using GUMS authorization mode

The user-vo-map file uses a simple, space-separated format with 2 columns; the first is a unix username and the second is the VO to which that username corresponds. In order to create it, you need to configure the gums client.

The primary configuration file for the gums-client utilities is located in /etc/gums/gums-client.properties. The two properties that you must change are:

Attribute | Needs Editing | Value
gums.location | Yes | This should be set to the admin URL for your gums server, usually of the form gums.location=https://GUMS_HOSTNAME:8443/gums/services/GUMSAdmin
gums.authz | Yes | This should be set to the authorization interface URL for your gums server, usually of the form gums.authz=https://GUMS_HOSTNAME:8443/gums/services/GUMSXACMLAuthorizationServicePort

After the gums client is configured, generate the file by running the following once by hand:

[root@client ~]$ gums-host-cron

user-vo-map should be created in /var/lib/osg/user-vo-map, the location given for the UserVOMapFile attribute above.


To have cron regularly update this file start the following service:

[root@client ~]$ service gums-client-cron start

Make sure the UserVOMapFile field is set to this location in /etc/gratia/gridftp-transfer/ProbeConfig.


Without user-vo-map, all gridftp transfers will show up as belonging to the VO "Unknown".
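To check what the map contains for a given account, the two-column format can be queried directly. The sketch below is a hypothetical helper, vo_for_user, and not part of the gums-client package; it assumes that blank lines and lines starting with # can be ignored, and it prints "Unknown" when no entry matches, mirroring the behaviour described above.

```shell
# Hypothetical helper: print the VO mapped to a Unix user in a user-vo-map file.
# Arguments: map file, Unix username.
vo_for_user() {
    awk -v user="$2" '
        /^[ \t]*(#|$)/ { next }                     # skip comments and blank lines
        $1 == user { print $2; found = 1; exit }    # first match wins
        END { if (!found) print "Unknown" }
    ' "$1"
}

# Example:
# vo_for_user /var/lib/osg/user-vo-map cms
```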

Using Gridmap based authorization mode

Note: If you are using this mode for authorization, make sure the files /etc/grid-security/gsi-authz.conf and /etc/grid-security/prima-authz.conf do not exist.

To enable generation of grid-mapfile and osg-user-vo-map.txt by using the edg-mkgridmap cron process to get information from VOMS servers, do the following:

If you have not installed this package, you will need to run yum install edg-mkgridmap first.


Run the Gratia probe once by hand to check for functionality:

[root@client ~]$ /usr/share/gratia/gridftp-transfer/GridftpTransferProbeDriver

Look for any abnormal termination and report it if it is a non-trivial site issue. Look in the log files in /var/log/gratia/<date>.log and make sure there are no error messages printed.

BeStMan Configuration

BeStMan Hadoop-specific configuration

BeStMan2 SRM uses the Hadoop FUSE mount to perform namespace operations, such as mkdir, rm, and ls. As per the Hadoop install instructions, edit /etc/sysconfig/hadoop and run service hadoop-firstboot start. It is not necessary (or even recommended) to start any hadoop services with service hadoop start.

Make sure that you modify localPathListAllowed to use the Hadoop mount in /etc/bestman2/conf/bestman2.rc.

Modify /etc/sudoers

Copy certificates to bestman location

BeStMan2 is preconfigured to look for the host certificate and key in /etc/grid-security/bestman/bestman*.pem. Either these files must exist and be owned by the bestman user, or you must change the settings in bestman2.rc. Note that you must use host certificates here or lcg-utils may experience issues.

Hadoop Storage Probe Configuration

This is only needed by the Hadoop Namenode

Here are the most relevant file and directory locations:

Purpose | Needs Editing? | Location
Probe Configuration | Yes | /etc/gratia/hadoop-storage/ProbeConfig
Probe Executable | No | /usr/share/gratia/hadoop-storage/hadoop_storage_probe
Log files | No | /var/log/gratia
Temporary files | No | /var/lib/gratia/tmp

The RPM installs the Gratia probe into the system crontab, but does not configure it. The configuration of the probe is controlled by two files: /etc/gratia/hadoop-storage/ProbeConfig and storage.cfg.



ProbeConfig is usually one XML node spread over multiple lines. Note that comments (#) have no effect on this file. You will need to edit the following:

Attribute | Needs Editing | Value
CollectorHost | Maybe | Set to the hostname and port of the central collector. By default it sends to the OSG collector. You probably do not want to change it.
SiteName | Yes | Set to the resource group name of your SE as registered in OIM.
Grid | Maybe | Set to "ITB" if this is a test resource; otherwise, leave as OSG.
EnableProbe | Yes | Set to 1 to enable the probe.


The storage.cfg file controls which paths in HDFS should be monitored. It is in the Windows INI format.

Note: the current version of storage.cfg contains an error; you may need to delete the "probe/" subdirectory from the ProbeConfig location in the following line:

ProbeConfig = /etc/gratia/probe/hadoop-storage/ProbeConfig

For each logical "area" (arbitrarily defined by you), specify both a given name and a list of paths that belong to that area. Unix globs are accepted.

To configure an area named "CMS /store" that monitors the space usage in the paths /user/cms/store/*, one would add the following to the storage.cfg file.

[Area CMS /store]
Name = CMS /store
Path = /user/cms/store/*
Trim = /user/cms

For each such area, add a section to your configuration file.
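When many areas are needed, the sections can be generated rather than typed. The sketch below uses a hypothetical helper, area_section, which reuses the area name for both the section header and the Name field, as in the "CMS /store" example above; the Trim line is written only when a third argument is given.

```shell
# Hypothetical helper: print one [Area ...] section for storage.cfg.
# Arguments: area name, HDFS path glob, optional Trim prefix.
area_section() {
    printf '[Area %s]\nName = %s\nPath = %s\n' "$1" "$1" "$2"
    if [ -n "${3:-}" ]; then
        printf 'Trim = %s\n' "$3"
    fi
    printf '\n'
}

# Example: quote the glob so the shell does not expand it.
# area_section "CMS /store" "/user/cms/store/*" /user/cms >> storage.cfg
```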

Example file

Below is a configuration file that includes three distinct areas. Note that you shouldn't have to touch the [Gratia] section if you edited the ProbeConfig above:

[Gratia]
gratia_location = /opt/vdt/gratia
ProbeConfig = %(gratia_location)s/probe/hadoop-storage/ProbeConfig

[Area /store]
Name = CMS /store
Path = /store/*

[Area /store/user]
Name = CMS /store/user
Path = /store/user/*

[Area /user]
Name = Hadoop /user
Path = /user/*

NOTE: The gratia_location and ProbeConfig lines in the [Gratia] section are wrong and, until the RPM is updated, need to be changed by hand to the following:

gratia_location = /etc/gratia
ProbeConfig = %(gratia_location)s/hadoop-storage/ProbeConfig

Running Services


Namenode:

#Starting namenode
service hadoop-hdfs-namenode start
#Stopping namenode
service hadoop-hdfs-namenode stop

Secondary Namenode:

#Starting secondary namenode
service hadoop-hdfs-secondarynamenode start
#Stopping secondary namenode
service hadoop-hdfs-secondarynamenode stop


Datanode:

#Starting datanode
service hadoop-hdfs-datanode start
#Stopping datanode
service hadoop-hdfs-datanode stop


GridFTP:

Starting GridFTP:

[root@client ~]$ service globus-gridftp-server start

To start Gridftp automatically at boot time

[root@client ~]$ chkconfig globus-gridftp-server on

Stopping GridFTP:

[root@client ~]$ service globus-gridftp-server stop

SRM (BeStMan):


The first thing you may want to do after installing and starting your Namenode is to verify that the web interface works. In your web browser, go to the address configured in dfs.http.address, for example http://namenode.domain.tld:50070.


Get familiar with Hadoop commands. Run hadoop with no arguments to see the list of commands.

[user@client ~]$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  fetchdt              fetch a delegation token from the NameNode
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar                  run a jar file
  distcp               copy file or directories recursively
  archive -archiveName NAME -p * create a hadoop archive
  oiv                  apply the offline fsimage viewer to an fsimage
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

For a list of supported filesystem commands:

[user@client ~]$ hadoop fs
Usage: java FsShell
           [-ls ]
           [-lsr ]
           [-df []]
           [-du ]
           [-dus ]
           [-count[-q] ]
           [-mv ]
           [-cp ]
           [-rm [-skipTrash] ]
           [-rmr [-skipTrash] ]
           [-expunge]
           [-put ... ]
           [-copyFromLocal ... ]
           [-moveFromLocal ... ]
           [-get [-ignoreCrc] [-crc] ]
           [-getmerge [addnl]]
           [-cat ]
           [-text ]
           [-copyToLocal [-ignoreCrc] [-crc] ]
           [-moveToLocal [-crc] ]
           [-mkdir ]
           [-setrep [-R] [-w] ]
           [-touchz ]
           [-test -[ezd] ]
           [-stat [format] ]
           [-tail [-f] ]
           [-chmod [-R] PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]
           [-help [cmd]]

Generic options supported are
-conf      specify an application configuration file
-D         use value for given property
-fs        specify a namenode
-jt        specify a job tracker
-files     specify comma separated files to be copied to the map reduce cluster
-libjars   specify comma separated jar files to include in the classpath.
-archives  specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

An online guide is also available at Apache Hadoop commands manual. You can use Hadoop commands to perform filesystem operations with more consistency.

For example, to list the top level of the internal Hadoop namespace:

[user@client ~]$ hadoop fs -ls /
Found 1 items
drwxrwxr-x   - engage engage          0 2011-07-25 06:32 /engage

For example, to adjust ownership of filesystem areas (Hadoop commands take paths inside the HDFS namespace, so there is usually no need to include the /mnt/hadoop mount point):

[root@client ~]$ hadoop fs -chown -R engage:engage /engage

For example, to compare the output of the hadoop fs command with a listing through the FUSE mount:

[user@client ~]$ hadoop fs -ls /engage
Found 3 items
-rw-rw-r--   2 engage engage  733669376 2011-06-15 16:55 /engage/CentOS-5.6-x86_64-LiveCD.iso
-rw-rw-r--   2 engage engage  215387183 2011-06-15 16:28 /engage/condor-7.6.1-x86_rhap_5-stripped.tar.gz
-rw-rw-r--   2 engage engage    9259360 2011-06-15 16:32 /engage/glideinWMS_v2_5_1.tgz

[user@client ~]$ ls -l /mnt/hadoop/engage
total 935855
-rw-rw-r-- 1 engage engage 733669376 Jun 15 16:55 CentOS-5.6-x86_64-LiveCD.iso
-rw-rw-r-- 1 engage engage 215387183 Jun 15 16:28 condor-7.6.1-x86_rhap_5-stripped.tar.gz
-rw-rw-r-- 1 engage engage   9259360 Jun 15 16:32 glideinWMS_v2_5_1.tgz
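
The two listings above should agree on file names and sizes. The following is a minimal sketch of how that comparison might be scripted. It assumes the column layouts shown above (8 columns for hadoop fs -ls, 9 columns for ls -l, file names without spaces) and the /mnt/hadoop mount point used in this guide. The function names are illustrative only and are not part of any Hadoop or OSG tool.

```shell
#!/bin/sh
# Print "name size" for each file line of `hadoop fs -ls` output read on stdin.
# Assumes the 8-column layout shown above; "Found N items" lines are skipped.
hdfs_ls_sizes() {
    awk 'NF == 8 && $1 ~ /^-/ { n = split($8, p, "/"); print p[n], $5 }' | LC_ALL=C sort
}

# Print "name size" for each regular-file line of `ls -l` output read on stdin.
# Assumes the 9-column layout shown above; the "total" line is skipped.
fuse_ls_sizes() {
    awk 'NF == 9 && $1 ~ /^-/ { print $9, $5 }' | LC_ALL=C sort
}

# Compare one directory as seen by Hadoop and through the FUSE mount.
# Usage: compare_listing /engage /mnt/hadoop
compare_listing() {
    hdfs_dir=$1
    mount_point=$2
    a=$(hadoop fs -ls "$hdfs_dir" | hdfs_ls_sizes)
    b=$(LC_ALL=C ls -l "$mount_point$hdfs_dir" | fuse_ls_sizes)
    if [ "$a" = "$b" ]; then
        echo "MATCH"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

Running compare_listing /engage /mnt/hadoop on a client with both the Hadoop tools and the FUSE mount would print MATCH when the two views agree. Only regular files are compared; subdirectories are ignored.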

GridFTP Validation

The commands used to verify GridFTP below assume you have access to a node where you can first generate a valid proxy using voms-proxy-init or grid-proxy-init. Obtaining grid credentials is beyond the scope of this document.

[user@client ~]$ globus-url-copy file:///home/users/jdost/test.txt gsiftp://devg-7.t2.ucsd.edu:2811/mnt/hadoop/engage/test.txt

If you have trouble running GridFTP, refer to Starting GridFTP in Standalone Mode in the Troubleshooting section.
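
A single upload only shows that the server accepted the transfer. A round trip (upload, download, compare) is a slightly stronger check. The sketch below assumes a valid proxy already exists and that globus-url-copy is on the PATH. The helper names gsiftp_url and gridftp_roundtrip are hypothetical, and the host, port and path are the ones from the example above.

```shell
#!/bin/sh
# Build a gsiftp URL from a host, a port and an absolute path.
gsiftp_url() {
    printf 'gsiftp://%s:%s%s\n' "$1" "$2" "$3"
}

# Copy a local file to the SE and back, then compare the two copies.
# $1 must be an absolute local path; $2 is the remote gsiftp URL.
# The remote test file is left in place and should be removed afterwards.
gridftp_roundtrip() {
    local_file=$1
    remote_url=$2
    back=$(mktemp) || return 1
    globus-url-copy "file://$local_file" "$remote_url" &&
    globus-url-copy "$remote_url" "file://$back" &&
    cmp -s "$local_file" "$back"
    rc=$?
    rm -f "$back"
    return $rc
}
```

For example, gridftp_roundtrip /home/users/jdost/test.txt "$(gsiftp_url devg-7.t2.ucsd.edu 2811 /mnt/hadoop/engage/test.txt)" returns 0 only if both transfers succeed and the downloaded copy is byte-identical to the original.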

BeStMan Validation

There are three ways of validating BeStMan:

  * SrmTester: BeStMan testing application
  * InstallRSV: RSV monitoring tools
  * BeStMan client tools

See the relevant pages for the first two options. This section details some basic client commands that can be used for validation. You will need grid credentials in order to test using the client tools.

srm-ping srm://BeStMan_host:secured_http_port/srm/v2/server
srm-copy file:////tmp/test1  srm://BeStMan_host:secured_http_port/srm/v2/server\?SFN=/mnt/hadoop/VONAME/test_1

The srm-ping tool should return a valid mapping: the gumsIDMapped value in its output must not be null.
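
The gumsIDMapped check can be scripted. The sketch below is an assumption about the output format: it expects srm-ping to print Key=Value lines, one of which has the key gumsIDMapped. If your client version prints the mapping differently, the pattern needs adjusting. The function name gums_mapping_ok is hypothetical.

```shell
#!/bin/sh
# Read srm-ping output on stdin. Succeed only if a gumsIDMapped line is
# present and its value is neither empty nor "null".
# Assumes Key=Value lines; this format is an assumption, not taken from
# the BeStMan documentation.
gums_mapping_ok() {
    awk -F= '
        $1 ~ /gumsIDMapped/ {
            v = $2
            gsub(/[ \t\r]/, "", v)          # ignore surrounding whitespace
            if (v != "" && v != "null") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Usage, with the same placeholders as above: srm-ping srm://BeStMan_host:secured_http_port/srm/v2/server | gums_mapping_ok && echo "mapping OK"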

Installing Hadoop Storage Reports (Optional)

NOTE: The GratiaReporting RPM has not yet been migrated to the new OSG repositories, and this section is subject to change. Please skip this section until this warning is removed, or ask Nebraska to host your reports.

The Hadoop Storage Reports may be installed on any node that has access to a local Gratia Collector.

The Hadoop storage reports provide a daily summary of the status and usage of your SE. They are a handy tool for both site administrators and site executives. An example report is copied at the end of this guide.


  1. A working HDFS installation
  2. A local Gratia Collector installed
  3. A Hadoop Storage Probe installed and configured to point to the local Gratia Collector


[root@client ~]$ yum install GratiaReporting

Updates can be installed with:

[root@client ~]$ yum upgrade GratiaReporting


This RPM uses Linux-standard file locations. Here are the most relevant file and directory locations:

| Purpose               | Needs Editing? | Location                                                                               |
| Report Configuration  | Yes            | /etc/gratia_reporting                                                                  |
| Cron template         | Yes            | /etc/gratia_reporting/gratia_reporting/gratia_reporting.cron (move to /etc/cron.d)     |
| Logging Configuration | No             | /etc/gratia_reporting/logging.cfg                                                      |
| Log files             | No             | /var/log/gratia_reporting.log                                                          |

Configuration file

Copy the file /etc/gratia_reporting/reporting.cfg to a new filename in /etc/gratia_reporting (for example, /etc/gratia_reporting/reporting_cms.cfg). You will do this once for every report you want to send out.

| Attribute | Needs Editing | Value |
| SiteName  | Yes           | Set to the resource group name of your SE as registered in OIM. |
| database  | Maybe         | Set to the database section containing the login details for your Gratia Collector (a few non-functioning example sections are included). Installing a Gratia Collector is covered here, but ask around on osg-hadoop: Nebraska will usually run these reports for you if requested. |
| toNames   | Yes           | Python list of the "to names" for the report email. |
| toEmails  | Yes           | Python list of the "to emails" for the report email. |
| smtphost  | Maybe         | Hostname of an SMTP server that accepts email from this host. |
| fromName  | Maybe         | Set to the "from name" for the report email. |
| fromEmail | Maybe         | Set to the "from email" for the report email. |


Copy the file /etc/gratia_reporting/gratia_reporting.cron to /etc/cron.d. There is one line per report; comment out all except the hadoop report. It is the line containing -n hadoop. Update the line to point at your new configuration file.
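
The "comment out all except the hadoop report" step can be done by hand or with a small filter. The sketch below makes assumptions about the cron template: that each report is a single line, that the hadoop report is the only line containing -n hadoop, and that any other lines are blank, comments, or VAR=value settings. The function name keep_only_hadoop_report is hypothetical. Review the result before installing it.

```shell
#!/bin/sh
# Read a cron template on stdin and write it to stdout with every entry
# commented out except the one(s) containing "-n hadoop".
# Blank lines, existing comments and VAR=value lines pass through unchanged.
keep_only_hadoop_report() {
    sed -e '/^[[:space:]]*$/b' \
        -e '/^[[:space:]]*#/b' \
        -e '/^[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=/b' \
        -e '/-n hadoop/b' \
        -e 's/^/#/'
}
```

For example: keep_only_hadoop_report < /etc/gratia_reporting/gratia_reporting.cron > /etc/cron.d/gratia_reporting.cron. The remaining hadoop line still has to be edited by hand to point at your new configuration file.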

This is a sample report from the Nebraska HDFS instance.
  The Hadoop Chronicle | 85 % | 2009-09-25

| Global Storage   |
|                  |  Today  | Yesterday | One Week |
| Total Space (GB) | 311,470 |   357,818 |  368,711 |
| Free Space (GB)  |  47,304 |    93,719 |  128,391 |
| Used Space (GB)  | 264,166 |   264,100 |  240,320 |
| Used Percentage  |     85% |       74% |      65% |
| CMS /store |
|           Path           | Size(GB) | 1 Day Change | 7 Day Change | Remaining | # Files | 1 Day Change | 7 Day Change | Remaining |
| /store/user              |      771 |            0 | UNKNOWN      | NO QUOTA  |   4,859 |            0 | UNKNOWN      | NO QUOTA  |
| /store/mc                |   95,865 |         -353 | UNKNOWN      | NO QUOTA  |  86,830 |         -171 | UNKNOWN      | NO QUOTA  |
| /store/test              |        0 |            0 | UNKNOWN      | NO QUOTA  |     569 |           25 | UNKNOWN      | NO QUOTA  |
| /store/results           |      237 |            0 | UNKNOWN      | NO QUOTA  |     198 |            0 | UNKNOWN      | NO QUOTA  |
| /store/phedex_monarctest |      729 |            0 | UNKNOWN      | NO QUOTA  |     257 |            0 | UNKNOWN      | NO QUOTA  |
| /store/unmerged          |    3,681 |            3 | UNKNOWN      | NO QUOTA  |  35,687 |           23 | UNKNOWN      | NO QUOTA  |
| /store/CSA07             |        0 |            0 | UNKNOWN      | NO QUOTA  |       0 |            0 | UNKNOWN      | NO QUOTA  |
| /store/data              |        0 |            0 | UNKNOWN      | NO QUOTA  |       0 |            0 | UNKNOWN      | NO QUOTA  |
| /store/PhEDEx_LoadTest07 |        0 |          -21 | UNKNOWN      | NO QUOTA  |       1 |          -22 | UNKNOWN      | NO QUOTA  |

| CMS /store/user |
|          Path         | Size(GB) | 1 Day Change | 7 Day Change | Remaining | # Files | 1 Day Change | 7 Day Change | Remaining |
| /store/user/hpi       |        0 |            0 | UNKNOWN      |     1,099 |      15 |            0 | UNKNOWN      |     9,985 |
| /store/user/gattebury |        0 |            0 | UNKNOWN      |     1,100 |       1 |            0 | UNKNOWN      |     9,999 |
| /store/user/mkirn     |        0 |            0 | UNKNOWN      |     1,100 |       3 |            0 | UNKNOWN      |     9,997 |
| /store/user/spadhi    |       12 |            0 | UNKNOWN      |     1,062 |   1,114 |            0 | UNKNOWN      |     8,886 |
| /store/user/creed     |        0 |            0 | UNKNOWN      |     1,100 |       0 |            0 | UNKNOWN      |    10,000 |
| /store/user/rossman   |        0 |            0 | UNKNOWN      |     1,099 |       5 |            0 | UNKNOWN      |     9,995 |
| /store/user/eluiggi   |        0 |            0 | UNKNOWN      |     1,099 |       6 |            0 | UNKNOWN      |     9,994 |
| /store/user/ewv       |        7 |            0 | UNKNOWN      |     1,081 |     284 |            0 | UNKNOWN      |     9,716 |
| /store/user/test      |        0 |            0 | UNKNOWN      | NO QUOTA  |     167 |            0 | UNKNOWN      |     9,833 |
| /store/user/schiefer  |      751 |            0 | UNKNOWN      |     1,044 |   3,264 |            0 | UNKNOWN      |     6,736 |

| Hadoop /user |
|       Path      | Size(GB) | 1 Day Change | 7 Day Change | Remaining | # Files | 1 Day Change | 7 Day Change |    Remaining    |
| /user/djbender  |        0 |            0 | UNKNOWN      | NO QUOTA  |       1 |            0 | UNKNOWN      | NO QUOTA        |
| /user/lhcb      |        0 |            0 | UNKNOWN      |        54 |       0 |            0 | UNKNOWN      | NO QUOTA        |
| /user/dzero     |      897 |            0 | UNKNOWN      |       347 |  89,376 |            0 | UNKNOWN      |         410,624 |
| /user/bloom     |      454 |            0 | UNKNOWN      | NO QUOTA  |   1,410 |            0 | UNKNOWN      | NO QUOTA        |
| /user/uscms01   |  101,384 |         -362 | UNKNOWN      | NO QUOTA  | 129,739 |         -141 | UNKNOWN      | NO QUOTA        |
| /user/cdf       |        0 |            0 | UNKNOWN      | NO QUOTA  |       6 |            0 | UNKNOWN      | 536,870,911,994 |
| /user/osg       |        1 |            0 | UNKNOWN      | NO QUOTA  |       3 |            0 | UNKNOWN      |   5,368,709,117 |
| /user/dweitzel  |       20 |            0 | UNKNOWN      | NO QUOTA  |   2,282 |            0 | UNKNOWN      | NO QUOTA        |
| /user/gattebury |        5 |            0 | UNKNOWN      | NO QUOTA  |  10,002 |            0 | UNKNOWN      | NO QUOTA        |
| /user/brian     |       72 |            0 | UNKNOWN      | NO QUOTA  |   2,697 |            0 | UNKNOWN      | NO QUOTA        |
| /user/usatlas   |        0 |            0 | UNKNOWN      | NO QUOTA  |       0 |            0 | UNKNOWN      | NO QUOTA        |
| /user/powers    |        1 |            1 | UNKNOWN      | NO QUOTA  |     211 |          211 | UNKNOWN      | NO QUOTA        |
| /user/ifisk     |        0 |            0 | UNKNOWN      | NO QUOTA  |       1 |            0 | UNKNOWN      | NO QUOTA        |
| /user/gpn       |      261 |           -5 | UNKNOWN      |     1,360 |   3,805 |            1 | UNKNOWN      |         996,195 |
| /user/engage    |      461 |          367 | UNKNOWN      | NO QUOTA  |      16 |           13 | UNKNOWN      |         999,984 |
| /user/clundst   |        0 |            0 | UNKNOWN      | NO QUOTA  |       6 |            0 | UNKNOWN      | NO QUOTA        |
| /user/che       |        0 |            0 | UNKNOWN      | NO QUOTA  |      13 |            0 | UNKNOWN      | NO QUOTA        |
| /user/store     |        0 |            0 | UNKNOWN      | NO QUOTA  |       0 |            0 | UNKNOWN      | NO QUOTA        |
| /user/dteam     |        0 |            0 | UNKNOWN      |        53 |      18 |            0 | UNKNOWN      | NO QUOTA        |
| /user/root      |        0 |            0 | UNKNOWN      | NO QUOTA  |       1 |            0 | UNKNOWN      | NO QUOTA        |

| FSCK Data |
 Total size:	114592906796932 B (Total open files size: 38923141120 B)
 Total dirs:	41293
 Total files:	295431 (Files currently being written: 38)
 Total blocks (validated):	1356788 (avg. block size 84458962 B) (Total open file blocks (not validated): 297)
 Minimally replicated blocks:	1356788 (100.0 %)
 Over-replicated blocks:	1 (7.370348E-5 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	2.2943976
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		101
 Number of racks:		1
The filesystem under path '/' is HEALTHY
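
The FSCK block that closes the report has the same shape as the summary printed by the fsck command listed in the hadoop usage output, so the same health check can be run between reports. The following is a minimal sketch, assuming the "Corrupt blocks:" and "is HEALTHY" lines appear as in the sample above. The function name fsck_ok is hypothetical.

```shell
#!/bin/sh
# Read fsck output on stdin. Succeed only if the filesystem is reported
# HEALTHY and the corrupt block count (if printed) is zero.
fsck_ok() {
    awk '
        /Corrupt blocks:/ { corrupt = $NF }   # last field is the count
        /is HEALTHY/      { healthy = 1 }
        END { exit((healthy && corrupt + 0 == 0) ? 0 : 1) }
    '
}
```

Usage: hadoop fsck / | fsck_ok || echo "HDFS needs attention". Note that a full fsck walks the whole namespace and can take a while on a large SE.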



To view all of the currently configured Hadoop settings from the web interface, enter the following URL in your browser:


You will see the entire configuration in XML format, for example:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><configuration>
<property><!--Loaded from core-default.xml--><name>fs.s3n.impl</name><value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.task.cache.levels</name><value>2</value></property>
<property><!--Loaded from mapred-default.xml--><name>map.sort.class</name><value>org.apache.hadoop.util.QuickSort</value></property>
<property><!--Loaded from core-site.xml--><name>hadoop.tmp.dir</name><value>/data1/hadoop//scratch</value></property>
<property><!--Loaded from core-default.xml--><name>hadoop.native.lib</name><value>true</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.namenode.decommission.nodes.per.interval</name><value>5</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.https.need.client.auth</name><value>false</value></property>
<property><!--Loaded from core-default.xml--><name>ipc.client.idlethreshold</name><value>4000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.system.dir</name><value>${hadoop.tmp.dir}/mapred/system</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.datanode.data.dir.perm</name><value>755</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.tracker.persist.jobstatus.hours</name><value>0</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.namenode.logging.level</name><value>all</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.datanode.address</name><value></value></property>
<property><!--Loaded from core-default.xml--><name>io.skip.checksum.errors</name><value>false</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.block.access.token.enable</name><value>false</value></property>
<property><!--Loaded from Unknown--><name>fs.default.name</name><value>hdfs://nagios.t2.ucsd.edu:9000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.child.tmp</name><value>./tmp</value></property>
<property><!--Loaded from core-default.xml--><name>fs.har.impl.disable.cache</name><value>true</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.skip.reduce.max.skip.groups</name><value>0</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.safemode.threshold.pct</name><value>0.999f</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.heartbeats.in.second</name><value>100</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.namenode.handler.count</name><value>40</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.blockreport.initialDelay</name><value>0</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.jobtracker.instrumentation</name><value>org.apache.hadoop.mapred.JobTrackerMetricsInst</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.tasktracker.dns.nameserver</name><value>default</value></property>
<property><!--Loaded from mapred-default.xml--><name>io.sort.factor</name><value>10</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.task.timeout</name><value>600000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.max.tracker.failures</name><value>4</value></property>
<property><!--Loaded from core-default.xml--><name>hadoop.rpc.socket.factory.class.default</name><value>org.apache.hadoop.net.StandardSocketFactory</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.tracker.jobhistory.lru.cache.size</name><value>5</value></property>
<property><!--Loaded from core-default.xml--><name>fs.hdfs.impl</name><value>org.apache.hadoop.hdfs.DistributedFileSystem</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.skip.map.auto.incr.proc.count</name><value>true</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.block.access.key.update.interval</name><value>600</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapreduce.job.complete.cancel.delegation.tokens</name><value>true</value></property>
<property><!--Loaded from core-default.xml--><name>io.mapfile.bloom.size</name><value>1048576</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapreduce.reduce.shuffle.connect.timeout</name><value>180000</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.safemode.extension</name><value>30000</value></property>
<property><!--Loaded from mapred-site.xml--><name>tasktracker.http.threads</name><value>50</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.shuffle.merge.percent</name><value>0.66</value></property>
<property><!--Loaded from core-default.xml--><name>fs.ftp.impl</name><value>org.apache.hadoop.fs.ftp.FTPFileSystem</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.output.compress</name><value>false</value></property>
<property><!--Loaded from core-site.xml--><name>io.bytes.per.checksum</name><value>4096</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.healthChecker.script.timeout</name><value>600000</value></property>
<property><!--Loaded from core-default.xml--><name>topology.node.switch.mapping.impl</name><value>org.apache.hadoop.net.ScriptBasedMapping</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.https.server.keystore.resource</name><value>ssl-server.xml</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.reduce.slowstart.completed.maps</name><value>0.05</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.reduce.max.attempts</name><value>4</value></property>
<property><!--Loaded from core-default.xml--><name>fs.ramfs.impl</name><value>org.apache.hadoop.fs.InMemoryFileSystem</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.block.access.token.lifetime</name><value>600</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.skip.map.max.skip.records</name><value>0</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.name.edits.dir</name><value>${dfs.name.dir}</value></property>
<property><!--Loaded from core-default.xml--><name>hadoop.security.group.mapping</name><value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.tracker.persist.jobstatus.dir</name><value>/jobtracker/jobsInfo</value></property>
<property><!--Loaded from core-site.xml--><name>hadoop.log.dir</name><value>/var/log/hadoop</value></property>
<property><!--Loaded from core-default.xml--><name>fs.s3.buffer.dir</name><value>${hadoop.tmp.dir}/s3</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.block.size</name><value>134217728</value></property>
<property><!--Loaded from mapred-default.xml--><name>job.end.retry.attempts</name><value>0</value></property>
<property><!--Loaded from core-default.xml--><name>fs.file.impl</name><value>org.apache.hadoop.fs.LocalFileSystem</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.output.compression.type</name><value>RECORD</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.local.dir.minspacestart</name><value>0</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.datanode.ipc.address</name><value></value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.permissions</name><value>true</value></property>
<property><!--Loaded from core-default.xml--><name>topology.script.number.args</name><value>100</value></property>
<property><!--Loaded from core-default.xml--><name>io.mapfile.bloom.error.rate</name><value>0.005</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.max.tracker.blacklists</name><value>4</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.task.profile.maps</name><value>0-2</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.datanode.https.address</name><value></value></property>
<property><!--Loaded from core-site.xml--><name>dfs.umaskmode</name><value>002</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.userlog.retain.hours</name><value>24</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.secondary.http.address</name><value>gratia-1:50090</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.replication.max</name><value>32</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.tracker.persist.jobstatus.active</name><value>false</value></property>
<property><!--Loaded from core-default.xml--><name>hadoop.security.authorization</name><value>false</value></property>
<property><!--Loaded from core-default.xml--><name>local.cache.size</name><value>10737418240</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.min.split.size</name><value>0</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.namenode.delegation.token.renew-interval</name><value>86400000</value></property>
<property><!--Loaded from mapred-site.xml--><name>mapred.map.tasks</name><value>7919</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.child.java.opts</name><value>-Xmx200m</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.https.client.keystore.resource</name><value>ssl-client.xml</value></property>
<property><!--Loaded from Unknown--><name>dfs.namenode.startup</name><value>REGULAR</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.queue.name</name><value>default</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.tracker.retiredjobs.cache.size</name><value>1000</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.https.address</name><value></value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.balance.bandwidthPerSec</name><value>2000000000</value></property>
<property><!--Loaded from core-default.xml--><name>ipc.server.listen.queue.size</name><value>128</value></property>
<property><!--Loaded from mapred-default.xml--><name>job.end.retry.interval</name><value>30000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.inmem.merge.threshold</name><value>1000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.skip.attempts.to.start.skipping</name><value>2</value></property>
<property><!--Loaded from hdfs-site.xml--><name>fs.checkpoint.dir</name><value>/var/hadoop/checkpoint-a</value></property>
<property><!--Loaded from mapred-site.xml--><name>mapred.reduce.tasks</name><value>1543</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.merge.recordsBeforeProgress</name><value>10000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.userlog.limit.kb</name><value>0</value></property>
<property><!--Loaded from core-default.xml--><name>webinterface.private.actions</name><value>false</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.max.objects</name><value>0</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.shuffle.input.buffer.percent</name><value>0.70</value></property>
<property><!--Loaded from mapred-default.xml--><name>io.sort.spill.percent</name><value>0.80</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.map.tasks.speculative.execution</name><value>true</value></property>
<property><!--Loaded from core-default.xml--><name>hadoop.util.hash.type</name><value>murmur</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.datanode.dns.nameserver</name><value>default</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.blockreport.intervalMsec</name><value>3600000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.map.max.attempts</name><value>4</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapreduce.job.acl-view-job</name><value> </value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.tracker.handler.count</name><value>10</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.client.block.write.retries</name><value>3</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.max.reduces.per.node</name><value>-1</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapreduce.reduce.shuffle.read.timeout</name><value>180000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.tasktracker.expiry.interval</name><value>600000</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.https.enable</name><value>false</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.jobtracker.maxtasks.per.job</name><value>-1</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.jobtracker.job.history.block.size</name><value>3145728</value></property>
<property><!--Loaded from mapred-default.xml--><name>keep.failed.task.files</name><value>false</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.datanode.failed.volumes.tolerated</name><value>0</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.task.profile.reduces</name><value>0-2</value></property>
<property><!--Loaded from core-default.xml--><name>ipc.client.tcpnodelay</name><value>false</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.output.compression.codec</name><value>org.apache.hadoop.io.compress.DefaultCodec</value></property>
<property><!--Loaded from mapred-default.xml--><name>io.map.index.skip</name><value>0</value></property>
<property><!--Loaded from core-default.xml--><name>ipc.server.tcpnodelay</name><value>false</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.namenode.delegation.key.update-interval</name><value>86400000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.running.map.limit</name><value>-1</value></property>
<property><!--Loaded from mapred-default.xml--><name>jobclient.progress.monitor.poll.interval</name><value>1000</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.default.chunk.view.size</name><value>32768</value></property>
<property><!--Loaded from core-default.xml--><name>hadoop.logfile.size</name><value>10000000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.reduce.tasks.speculative.execution</name><value>true</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapreduce.tasktracker.outofband.heartbeat</name><value>false</value></property>
<property><!--Loaded from core-default.xml--><name>fs.s3n.block.size</name><value>67108864</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.datanode.du.reserved</name><value>10000000000</value></property>
<property><!--Loaded from core-default.xml--><name>hadoop.security.authentication</name><value>simple</value></property>
<property><!--Loaded from hdfs-site.xml--><name>fs.checkpoint.period</name><value>3600</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.running.reduce.limit</name><value>-1</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.reuse.jvm.num.tasks</name><value>1</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.web.ugi</name><value>webuser,webgroup</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.jobtracker.completeuserjobs.maximum</name><value>100</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.df.interval</name><value>60000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.task.tracker.task-controller</name><value>org.apache.hadoop.mapred.DefaultTaskController</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.data.dir</name><value>/data1/hadoop//data</value></property>
<property><!--Loaded from core-default.xml--><name>fs.s3.maxRetries</name><value>4</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.datanode.dns.interface</name><value>default</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.support.append</name><value>true</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapreduce.job.acl-modify-job</name><value> </value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.local.dir</name><value>${hadoop.tmp.dir}/mapred/local</value></property>
<property><!--Loaded from core-default.xml--><name>fs.hftp.impl</name><value>org.apache.hadoop.hdfs.HftpFileSystem</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.permissions.supergroup</name><value>root</value></property>
<property><!--Loaded from core-default.xml--><name>fs.trash.interval</name><value>0</value></property>
<property><!--Loaded from core-default.xml--><name>fs.s3.sleepTimeSeconds</name><value>10</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.submit.replication</name><value>10</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.replication.min</name><value>1</value></property>
<property><!--Loaded from core-default.xml--><name>fs.har.impl</name><value>org.apache.hadoop.fs.HarFileSystem</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.map.output.compression.codec</name><value>org.apache.hadoop.io.compress.DefaultCodec</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.tasktracker.dns.interface</name><value>default</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.namenode.decommission.interval</name><value>30</value></property>
<property><!--Loaded from Unknown--><name>dfs.http.address</name><value>nagios:50070</value></property>
<property><!--Loaded from mapred-site.xml--><name>mapred.job.tracker</name><value>nagios:9000</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.heartbeat.interval</name><value>3</value></property>
<property><!--Loaded from core-default.xml--><name>io.seqfile.sorter.recordlimit</name><value>1000000</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.name.dir</name><value>${hadoop.tmp.dir}/dfs/name</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.line.input.format.linespermap</name><value>1</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.jobtracker.taskScheduler</name><value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.tasktracker.instrumentation</name><value>org.apache.hadoop.mapred.TaskTrackerMetricsInst</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.datanode.http.address</name><value></value></property>
<property><!--Loaded from mapred-default.xml--><name>jobclient.completion.poll.interval</name><value>5000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.max.maps.per.node</name><value>-1</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.local.dir.minspacekill</name><value>0</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.replication.interval</name><value>3</value></property>
<property><!--Loaded from mapred-default.xml--><name>io.sort.record.percent</name><value>0.05</value></property>
<property><!--Loaded from core-default.xml--><name>fs.kfs.impl</name><value>org.apache.hadoop.fs.kfs.KosmosFileSystem</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.temp.dir</name><value>${hadoop.tmp.dir}/mapred/temp</value></property>
<property><!--Loaded from mapred-site.xml--><name>mapred.tasktracker.reduce.tasks.maximum</name><value>4</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.replication</name><value>2</value></property>
<property><!--Loaded from core-default.xml--><name>fs.checkpoint.edits.dir</name><value>${fs.checkpoint.dir}</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.tasktracker.tasks.sleeptime-before-sigkill</name><value>5000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.reduce.input.buffer.percent</name><value>0.0</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.tasktracker.indexcache.mb</name><value>10</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapreduce.job.split.metainfo.maxsize</name><value>10000000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.skip.reduce.auto.incr.proc.count</name><value>true</value></property>
<property><!--Loaded from core-default.xml--><name>hadoop.logfile.count</name><value>10</value></property>
<property><!--Loaded from core-default.xml--><name>fs.automatic.close</name><value>true</value></property>
<property><!--Loaded from core-default.xml--><name>io.seqfile.compress.blocksize</name><value>1000000</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.hosts.exclude</name><value>/etc/hadoop-0.20/conf/hosts_exclude</value></property>
<property><!--Loaded from core-default.xml--><name>fs.s3.block.size</name><value>67108864</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.tasktracker.taskmemorymanager.monitoring-interval</name><value>5000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.acls.enabled</name><value>false</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapreduce.jobtracker.staging.root.dir</name><value>${hadoop.tmp.dir}/mapred/staging</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.queue.names</name><value>default</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.access.time.precision</name><value>3600000</value></property>
<property><!--Loaded from core-default.xml--><name>fs.hsftp.impl</name><value>org.apache.hadoop.hdfs.HsftpFileSystem</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.task.tracker.http.address</name><value></value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.reduce.parallel.copies</name><value>5</value></property>
<property><!--Loaded from core-default.xml--><name>io.seqfile.lazydecompress</name><value>true</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.safemode.min.datanodes</name><value>0</value></property>
<property><!--Loaded from mapred-default.xml--><name>io.sort.mb</name><value>100</value></property>
<property><!--Loaded from core-default.xml--><name>ipc.client.connection.maxidletime</name><value>10000</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.compress.map.output</name><value>false</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.task.tracker.report.address</name><value></value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.healthChecker.interval</name><value>60000</value></property>
<property><!--Loaded from core-default.xml--><name>ipc.client.kill.max</name><value>10</value></property>
<property><!--Loaded from core-default.xml--><name>ipc.client.connect.max.retries</name><value>10</value></property>
<property><!--Loaded from core-default.xml--><name>fs.s3.impl</name><value>org.apache.hadoop.fs.s3.S3FileSystem</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.job.tracker.http.address</name><value></value></property>
<property><!--Loaded from core-default.xml--><name>io.file.buffer.size</name><value>4096</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.jobtracker.restart.recover</name><value>false</value></property>
<property><!--Loaded from core-default.xml--><name>io.serializations</name><value>org.apache.hadoop.io.serializer.WritableSerialization</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.task.profile</name><value>false</value></property>
<property><!--Loaded from hdfs-site.xml--><name>dfs.datanode.handler.count</name><value>10</value></property>
<property><!--Loaded from mapred-default.xml--><name>mapred.reduce.copy.backoff</name><value>300</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.replication.considerLoad</name><value>true</value></property>
<property><!--Loaded from mapred-default.xml--><name>jobclient.output.filter</name><value>FAILED</value></property>
<property><!--Loaded from hdfs-default.xml--><name>dfs.namenode.delegation.token.max-lifetime</name><value>604800000</value></property>
<property><!--Loaded from mapred-site.xml--><name>mapred.tasktracker.map.tasks.maximum</name><value>4</value></property>
<property><!--Loaded from core-default.xml--><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value></property>
<property><!--Loaded from core-default.xml--><name>fs.checkpoint.size</name><value>67108864</value></property>
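
Each property in the dump above carries a comment naming the file it was loaded from. As a minimal sketch (not part of the OSG or Hadoop tooling), the following Python snippet picks out the values that come from site files such as hdfs-site.xml or mapred-site.xml, which are the settings that override the shipped defaults. The function name `site_overrides` and the regular expression are illustrative assumptions that rely on the one-property-per-line layout shown here.

```python
import re

# Matches one line of the dump, for example:
# <property><!--Loaded from hdfs-site.xml--><name>dfs.replication</name><value>2</value></property>
PROPERTY_RE = re.compile(
    r"<property><!--Loaded from (?P<source>[^>]+?)-->"
    r"<name>(?P<name>[^<]+)</name><value>(?P<value>[^<]*)</value></property>"
)


def site_overrides(dump_text):
    """Return {name: (value, source)} for properties loaded from *-site.xml files."""
    overrides = {}
    for match in PROPERTY_RE.finditer(dump_text):
        source = match.group("source")
        # Defaults come from *-default.xml; local settings come from *-site.xml.
        if source.endswith("-site.xml"):
            overrides[match.group("name")] = (match.group("value"), source)
    return overrides
```

Passing the saved dump text to `site_overrides` gives a quick list of locally configured values to compare across nodes, which can help spot a node that is still running with defaults.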

Please refer to the OSG Hadoop debug web page and the Apache Hadoop FAQ web page for answers to common questions and concerns.


Notes on Building a FUSE Module

If you are running a custom kernel, then be sure to enable the fuse module with CONFIG_FUSE_FS=m in your kernel config. Building and installing a fuse kernel module for your custom kernel is beyond the scope of this document.
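
As a rough sketch of how to confirm the setting mentioned above, the following Python snippet looks for CONFIG_FUSE_FS in a kernel config file. The function names are illustrative, and the `/boot/config-<release>` location is an assumption based on a common distribution convention; a custom kernel may keep its config elsewhere.

```python
import platform
import re


def fuse_config_value(config_text):
    """Return the CONFIG_FUSE_FS setting (e.g. 'm' or 'y'), or None if it is not set."""
    match = re.search(r"^CONFIG_FUSE_FS=(\w+)\s*$", config_text, re.MULTILINE)
    return match.group(1) if match else None


def read_running_kernel_config():
    """Read the config of the running kernel from an assumed conventional path."""
    path = f"/boot/config-{platform.release()}"  # assumption: not guaranteed to exist
    with open(path) as handle:
        return handle.read()
```

A result of `m` means FUSE was built as a module, which is what the text above asks for; `None` means the option is absent or commented out in that file.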

Note: If you cannot find a fuse kernel module to match your kernel, ATRPMs has a guide for using their RPM spec files to generate a module. That page mostly works, although some sections are a bit outdated. Contact the osg-hadoop@opensciencegrid.org list if you need help.

Running FUSE in Debug Mode

To start the FUSE mount in debug mode, you can run the FUSE mount command by hand:

[root@client ~]$  /usr/bin/hadoop-fuse-dfs  /mnt/hadoop -o rw,server=namenode.host,port=9000,rdbuffer=131072,allow_other -d

Debug output is printed to stderr, which you will probably want to redirect to a file. Most FUSE-related problems can be diagnosed by reading through that output and looking for error messages.
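
If the stderr output has been redirected to a file, a small filter can narrow down where to start reading. The Python sketch below is only an illustration: the keyword list is a guess, since the exact wording of the debug output is not specified here, and the function name `error_lines` is hypothetical.

```python
def error_lines(log_text, keywords=("error", "exception", "fail")):
    """Return (line_number, line) pairs whose text contains any keyword (case-insensitive)."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line))
    return hits
```

Reading the saved file and passing its contents to `error_lines` yields line numbers that can then be inspected in context; the surrounding lines usually matter as much as the matching line itself.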


Starting GridFTP in Standalone Mode

If you would like to test the gridftp-hdfs server in standalone debug mode, you can run the command:

[root@client ~]$ gridftp-hdfs-standalone

The standalone server runs on port 5002, handles a single GridFTP request, and will log output to stdout/stderr.

File Locations

| Component | File Type | Location | Needs editing? |
| Hadoop | Log files | /var/log/hadoop/* | No |
| Hadoop | PID files | /var/run/hadoop/*.pid | No |
| Hadoop | init scripts | /etc/init.d/hadoop | No |
| Hadoop | init script config file | /etc/sysconfig/hadoop | Yes |
| Hadoop | runtime config files | /etc/hadoop/conf/* | Maybe |
| Hadoop | System binaries | /usr/bin/hadoop | No |
| Hadoop | JARs | /usr/lib/hadoop/* | No |
| Hadoop | runtime config files | /etc/hosts_exclude | Yes, must be present on namenodes |
| GridFTP | Log files | /var/log/gridftp-auth.log, /var/log/gridftp.log | No |
| GridFTP | init.d script | /etc/init.d/globus-gridftp-server | No |
| GridFTP | runtime config files | /etc/gridftp-hdfs/*, /etc/sysconfig/gridftp-hdfs | Maybe |
| GridFTP | System binaries | /usr/bin/gridftp-hdfs-standalone, /usr/sbin/globus-gridftp-server | No |
| GridFTP | System libraries | /usr/lib64/libglobus_gridftp_server_hdfs.so* | No |
| GridFTP | GUMS client (called by LCMAPS) configuration | /etc/lcmaps.db | Yes |
| GridFTP | CA certificates | /etc/grid-security/certificates/* | No |
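
When validating a node, it can help to confirm that the files marked "Yes" in the "Needs editing?" column of the table above actually exist. The following Python sketch does only that; the names `PATHS_TO_EDIT` and `missing_paths` are illustrative, and which paths are relevant depends on the role of the node (for example, /etc/hosts_exclude is only required on namenodes).

```python
import glob

# Paths marked "Yes" in the "Needs editing?" column of the table above.
PATHS_TO_EDIT = [
    "/etc/sysconfig/hadoop",
    "/etc/hosts_exclude",
    "/etc/lcmaps.db",
]


def missing_paths(patterns):
    """Return the paths or glob patterns that match nothing on this machine."""
    return [pattern for pattern in patterns if not glob.glob(pattern)]


if __name__ == "__main__":
    for path in missing_paths(PATHS_TO_EDIT):
        print(f"missing: {path}")
```

Because the check uses glob matching, wildcard entries from the table such as /etc/hadoop/conf/* can be passed in the same way.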

Known Issues


You may need to change the following line in /usr/share/gridftp-hdfs/gridftp-hdfs-environment:


copyFromLocal java IOException

When trying to copy a local file into Hadoop you may come across the following java exception:

11/06/24 11:10:50 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
11/06/24 11:10:50 WARN hdfs.DFSClient: Could not get block locations. Source file "/osg/ddd" - Aborting...
copyFromLocal: java.io.IOException: File /osg/ddd could only be replicated to 0 nodes, instead of 1
11/06/24 11:10:50 ERROR hdfs.DFSClient: Exception closing file /osg/ddd : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /osg/ddd could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1415)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:588)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:528)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1319)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1315)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1313)

This can occur if you try to install a Datanode on a machine with less than 10 GB of disk space available. To work around it, lower the value of the following property in /usr/lib/hadoop-0.20/conf/hdfs-site.xml:


Hadoop always requires this amount of disk space to be available for non-HDFS use on the machine.
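
Before changing any configuration, it is worth checking how much space the datanode's disk actually has free. The Python sketch below compares free space against the 10 GB figure mentioned above; reading that figure as 10 GiB is an assumption, the function names are illustrative, and the path to check is whatever directory your datanode stores its blocks in.

```python
import shutil

# Threshold taken from the text above; interpreting "10GB" as 10 GiB is an assumption.
TEN_GB = 10 * 1024**3


def free_bytes(path):
    """Return the number of free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def below_threshold(free, threshold=TEN_GB):
    """Return True if the free space is smaller than the threshold."""
    return free < threshold
```

For example, `below_threshold(free_bytes("/path/to/datanode/data"))` (a placeholder path) returning `True` suggests the node is affected by the condition described above.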

How to get Help?

If you cannot resolve the problem, there are several ways to receive help:

For a full set of help options, see Help Procedure.




Topic revision: r27 - 05 Jun 2017 - 20:14:25 - BrianLin