NEES Requirements for Data Archiving

Introduction

The Network for Earthquake Engineering Simulation (NEES) provides tools for researchers to learn how earthquakes and tsunamis affect buildings, bridges, utility systems, and other critical infrastructure. NEES is a shared national network of 15 experimental facilities, collaborative tools, a centralized data repository, and earthquake simulation software. All NEESit services are centered at Purdue University. Purdue currently stores 6 terabytes of data, and this is expected to grow to ~30 terabytes over the next 2 years. It is essential to provide backup and disaster recovery for this data.

This will be an OSG activity supported by Alain Roy and Tanya Levshina in collaboration with Rudi Eigenmann and Thomas Hacker (NEES). We will provide a proof-of-concept procedure for moving data from Purdue to Fermilab (tape) and Wisconsin (disk), and for copying files back to Purdue, in order to meet the needs of the NEES sites.

The deadline for the proof-of-concept is April 1, 2010. After that, a discussion will be arranged among all parties to cover lessons learned and possible next steps.

The code developed will be maintained in the VDT svn/cvs repository.

Should we be asking about compression?

Planning steps

Archiving NEES data in the Fermilab Enstore HSM involves many admin groups and requires several steps from users, including obtaining a Fermilab principal and requesting Enstore tapes. The list below describes the steps that were completed before the test tape was allocated in the Fermilab Enstore HSM and was ready for archiving.

  • The actions done by User (Tom)
    • Request Fermilab account
    • Request tape allocation in Enstore
    • Installed vdt client on the node at Purdue (Tanya)
    • Tested the file transfer and upload
  • The actions done by Representative (Ruth)
    • Approve user's request
  • The actions done by GUMS/VOMRS/VOMS admin (FermiGrid?)
    • Create a special group nees at Fermilab VO
    • Assign Tom and Tanya to the group nees
    • Create uid, gid "nees"
    • Map in GUMS group /fermilab/nees to user "nees"
  • The actions done by dCache/Enstore admin
    • Configure a special storage group "nees"
    • Configure dCache to map users with proxy attribute "/fermilab/nees" to a particular pnfs path (/pnfs/fnal.gov/usr/nees)
    • Allocate 10 GB on tape for this storage group

The next steps could be divided into independent tasks that could be done in parallel.

  1. Negotiation with Enstore management to provide and configure tapes for production archival (Responsible: Ruth)
  2. Creation of the archiving tool (Responsible: Tanya, Tom)
  3. Installation of condor DAG at Purdue (Responsible: Alain, Tom)
  4. Creation of the file packaging tool that splits the archiving area into a set of files of manageable size (10 GB to 20 GB) ready for archiving (Responsible: Alain, Tom)
  5. Create DAG to execute the file packaging and archiving tools (Responsible: Alain, Tom)
  6. Create archive verification tool that verifies the presence of each archived file on tape and checks its checksum against the local record (Responsible: Tanya, Tom)
  7. Create DAG to execute archive verification tool (Responsible: Alain, Tom)
  8. Create retrieval tool that allows any archived file to be retrieved from tape (Responsible: Tanya, Tom)
  9. Provide bookkeeping framework that allows all the previous steps to be logged in machine-readable format (Responsible: Tanya, Tom)
  10. Provide tools to generate reports from the logs (Responsible: Tanya, Tom)
  11. Negotiation with Enstore management to provide a "yearly" report of tape condition and statistics (Responsible: Ruth)
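
As an illustration of task 5, the sketch below generates a Condor DAGMan input file with one archiving job per packaged file. It is only a sketch under stated assumptions: the function name `build_archive_dag`, the submit file name `archive.sub`, the macro name `file`, and the retry count are hypothetical and not part of the plan above. It uses only the standard DAGMan keywords JOB, VARS, and RETRY.

```python
def build_archive_dag(files, submit_file="archive.sub", retries=3):
    """Return DAGMan input text with one archiving job per file.

    submit_file is a hypothetical Condor submit description that is
    assumed to run the archiving tool on $(file).
    """
    lines = []
    for index, path in enumerate(files):
        job = "archive{}".format(index)
        # One node per file, all sharing the same submit description.
        lines.append("JOB {} {}".format(job, submit_file))
        # Pass the file name to the submit description as a macro.
        lines.append('VARS {} file="{}"'.format(job, path))
        # Let DAGMan rerun a failed transfer a few times.
        lines.append("RETRY {} {}".format(job, retries))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_archive_dag(["/data/vol-1.tar", "/data/vol-2.tar"]), end="")
```

The jobs have no PARENT/CHILD dependencies, so DAGMan is free to run them in any order. A verification DAG could be generated the same way with a different submit file.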

| Task Name | Responsible | Design Start | Design End | Design Effort | Development Start | Development End | Development Effort | Integration Start | Integration End | Integration Effort | Testing Start | Testing End | Testing Effort |
| Neg. Enstore Management | Ruth | | | | | | | | | | | | |
| Archiver | Tanya, Tom | 02/15/10 | 02/19/10 | 2 days | 02/19/10 | 02/25/10 | 3 days | 02/26/10 | | 3 days | | | |
| Retrieval Tool | | 02/15/10 | 02/19/10 | 2 days | 02/19/10 | 02/25/10 | 3 days | 02/26/10 | | 3 days | | | |
| Verificator | | 02/15/10 | 02/19/10 | 2 days | 02/19/10 | 02/25/10 | 3 days | 02/26/10 | | 3 days | | | |
| Report Generator | | 02/24/10 | 03/02/10 | 2 days | | | | | | | | | |
| FilePacker | Alain, Tanya, Tom | | | 2 days? | | | Ask after design. Multivolume tar script has been written by M. Crawford and is ready for integration. | | | | | | |
| Archiver DAG | | | | 1-2 days | | | < 1 week | | | | | | |
| Verificator DAG | | | | 1-2 days | | | < 1 week | | | | | | |
| Condor Installation | | | | 1 day | | | 1 day | | | | | | |

Negotiation with Enstore management

The allocation and usage of tapes for NEES can be viewed at http://www-stken.fnal.gov/enstore/tape_inventory/VOLUME_QUOTAS. This shows that as of 2/17/10, 4.5 gigabytes are in use and a quota of 10 tapes is available, to be allocated as tapes are written and filled. The Enstore administration has a process in place to ensure the integrity of the data on tape; see DataIntegrity.

As part of this proof of concept we will prototype the aspects of a TapeArchiveSLA.

Storage Tools

Various tools are needed to automate the process of storing, verifying, and retrieving data from storage. Brief descriptions of the proposed tools are given below. The design document for Storage Tools can be found here. The draft of the current installation procedure at Purdue can be found here. Alan's notes on his work are at NEESStorageToolsAlanNotes.

Archiving Tool

Purpose: The archiving tool transfers a local file to the Fermilab Enstore HSM.

The tool will be executed with a user proxy certificate. The user should be a member of the Fermilab VO and of the "/fermilab/nees" group. The tool will accept a file name as a command line argument. It will perform the following tasks:

  • verify the file existence and permission
  • calculate adler32 checksum for the local file (adler32 is the only checksum supported by Enstore)
  • log: date, file name, checksum, size
  • copy file to dCache
  • log: date, file name, status of transfer
  • verify that the file is in dCache and that its checksum matches
  • log: date, file name, locality of the file, size, checksum
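
The checksum step above can be sketched with Python's standard `zlib.adler32`. This is a minimal sketch, not the archiving tool itself: the function name `adler32_of_file`, the chunk size, and the zero-padded lowercase hex output are assumptions made for illustration. The file is read in chunks so that 10 GB to 20 GB archive files do not have to fit in memory.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit hex string."""
    checksum = 1  # Adler-32 is defined to start from 1
    with open(path, "rb") as handle:
        # Feed the running checksum back in for every chunk read.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    # Mask to 32 bits so the result is unsigned on every Python version.
    return format(checksum & 0xFFFFFFFF, "08x")
```

The output format may need to be adjusted to match how the storage system reports checksums, so comparing the values numerically is the safer choice.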

Archive Verification Tool

There is a delay between the time a file is stored in dCache (disk) and the time it is transferred to tape. This delay depends on the Enstore configuration. That is why an independent tool is needed to verify that a file has been archived to tape successfully.

Purpose: The archive verification tool verifies that a specific archived file is on tape and has the correct checksum.

The tool will be executed with a user proxy certificate. The user should be a member of the Fermilab VO (https://vomrs.fnal.gov:8443/vomrs/vo-fermilab/vomrs) and of the "/fermilab/nees" group. The tool will accept a file name and the file's local checksum as command line arguments. It will perform the following tasks:

  • verify the file exists in dCache and is on tape.
  • compare checksums (local vs dcache)
  • log: date, file, local checksum, remote checksum, status
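
The decision logic of the verification tasks above might look like the following sketch. It assumes the remote checksum and the file locality have already been obtained from the storage system, and that locality is reported using the SRM v2.2 file locality values (NEARLINE and ONLINE_AND_NEARLINE mean a tape copy exists). The function name `verify_archived` and the status strings are hypothetical.

```python
# SRM v2.2 file locality values that indicate a copy exists on tape.
ON_TAPE = {"NEARLINE", "ONLINE_AND_NEARLINE"}


def verify_archived(file_name, local_checksum, remote_checksum, locality):
    """Return a record describing whether the file is safely on tape.

    Checksums are hex strings; they are compared numerically so that
    case and leading zeros do not matter.
    """
    if int(local_checksum, 16) != int(remote_checksum, 16):
        status = "checksum_mismatch"
    elif locality.upper() not in ON_TAPE:
        status = "not_on_tape"
    else:
        status = "ok"
    return {
        "file": file_name,
        "local_checksum": local_checksum,
        "remote_checksum": remote_checksum,
        "status": status,
    }
```

A file reported as "not_on_tape" is not necessarily an error, because of the delay described above; it can simply be checked again later.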

Retrieval Tool

Purpose: The retrieval tool retrieves a file from tape.

The tool will be executed with a user proxy certificate. The user should be a member of the Fermilab VO and of the "/fermilab/nees" group. The tool will accept a file name as a command line argument. It will perform the following tasks:

  • retrieve file from tape
  • log: date, file, status

Bookkeeping Framework

The proposal is to follow the Grid Logging: Best Practices Guide and use the NetLogger API to perform all logging tasks. This will provide:
  • Consistently structured, typed, log events
  • A standard high-resolution timestamp
  • Use of logging levels and categories to separate logs by detail and purpose.
  • Consistent use of global and local identifiers.
  • Use of some regular, newline-delimited ASCII text format

This approach will meet some basic practical requirements:

  • human "readable" in a terminal or text editor
  • directly compatible with syslog and grep
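
To make the log format concrete, the sketch below writes one event per line as space-separated name=value pairs with `ts`, `event`, and `level` fields first, in the style of the Best Practices Guide. It is written in plain Python and deliberately does not use the NetLogger API; the function name `format_event`, the event name, and the extra field names are assumptions for illustration only.

```python
from datetime import datetime, timezone


def format_event(event, level="Info", when=None, **fields):
    """Format one log event as a single line of name=value pairs."""
    if when is None:
        when = datetime.now(timezone.utc)
    # UTC timestamp with microsecond resolution.
    ts = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    parts = ["ts=" + ts, "event=" + event, "level=" + level]
    for key in sorted(fields):
        value = str(fields[key])
        # Quote values containing spaces or quotes so lines stay parseable.
        if " " in value or '"' in value:
            value = '"' + value.replace('"', '\\"') + '"'
        parts.append(key + "=" + value)
    return " ".join(parts)


if __name__ == "__main__":
    print(format_event("nees.archive.end", file="/data/vol-1.tar", status=0))
```

Because every event is a single line of plain text, such a log can be searched with grep, for example for all lines containing a given file name.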

Reporting Tools

The following reports generated from logs seem to be useful:
  • list of files for which archiving was attempted during a particular period of time
  • list of failed attempts to archive files
  • list of files that are successfully archived
  • list of files currently on tapes
  • list of files that are in dCache but not on tape
  • list of corrupted files on tape
  • time interval needed for file archiving
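
A few of these reports can be sketched as a small log scan. The sketch assumes log lines made of space-separated name=value pairs, hypothetical event names `nees.archive.start` and `nees.archive.end`, a `file` field, and a `status` field where 0 means success. None of these names are fixed by this document.

```python
import shlex


def parse_event(line):
    """Split one log line into a dict of name -> value strings."""
    # shlex handles quoted values such as msg="two words".
    return dict(tok.split("=", 1) for tok in shlex.split(line) if "=" in tok)


def archive_report(lines):
    """Return lists of attempted, archived, and failed file names."""
    report = {"attempted": [], "archived": [], "failed": []}
    for line in lines:
        event = parse_event(line)
        name = event.get("event")
        if name == "nees.archive.start":
            report["attempted"].append(event.get("file"))
        elif name == "nees.archive.end":
            # Assumed convention: status=0 marks a successful transfer.
            key = "archived" if event.get("status") == "0" else "failed"
            report[key].append(event.get("file"))
    return report
```

Restricting a report to a particular period would only require comparing the `ts` field of each event against the chosen start and end times.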

One can also use the Enstore online monitoring tools at http://www-stken.fnal.gov/enstore/enstore_system.html for consistency checks.

Condor Installation at Purdue

File Packaging Tool

DAG for Packaging and Archival

GNU Tar V1.22 has an option to create a multi-volume (multi-file) archive with a given volume size limit. Tar invokes a callout script to change volumes. A script will be written to modify the file name so that tar creates a multi-volume archive whose volume size is best suited to the tape archiving system. Files can be retrieved from such multi-volume archives. The Fermilab storage group will write the callout script.
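
For illustration only, a callout script of this kind could look like the sketch below. It is not the script written by the Fermilab storage group. It relies on the environment variables that GNU tar documents for volume-change scripts (TAR_ARCHIVE, TAR_VOLUME, TAR_FD) and assumes a hypothetical naming scheme in which volumes are called name-1, name-2, and so on.

```python
import os
import re
import sys


def next_volume_name(archive, volume):
    """Replace a trailing "-<number>" suffix with the new volume number."""
    base = re.sub(r"-\d+$", "", archive)
    return "{}-{}".format(base, volume)


def main():
    # tar exports the current archive name and the number of the
    # volume it is about to start.
    name = next_volume_name(os.environ["TAR_ARCHIVE"], os.environ["TAR_VOLUME"])
    # tar reads the new archive name from the descriptor given in TAR_FD.
    os.write(int(os.environ["TAR_FD"]), (name + "\n").encode())
    return 0


# Only act when tar has actually invoked the script.
if __name__ == "__main__" and "TAR_FD" in os.environ:
    sys.exit(main())
```

Such a script would be passed to tar with the --new-volume-script option, together with --multi-volume and --tape-length to set the volume size, with the first volume named for example nees.tar-1.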

DAG for Archive Verification Tool

Integration and Deployment

The proposal is to hold an integration and deployment meeting at Fermilab on March 16 and 17.

Alain's notes from initial planning discussion

Archiving for NEES, From Purdue to Fermilab, initially. Goes to tape. May also do Purdue to UW later, but that would be disk.

Need test account (at Fermilab) so we can verify that we can do it.

COPY_FERMI(File, Credential)
   0. Begin Transaction
   1. Checksum File
   2. Copy to Fermilab(File, Credential)
   ((3. Checksum remote file))
   3. Verify file is at Fermi correctly (but may not be on tape)
   4. Bookkeeping at Purdue
   4a. Bookkeeping needs to be machine-readable, so we can verify file existence and checksum later.
   5. End Transaction
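
The COPY_FERMI outline above could be sketched as follows. The actual transfer, checksum, and bookkeeping mechanisms are passed in as functions, because these notes do not fix them. All parameter names are hypothetical, and the status convention (0 for success, -1 for failure) is an assumption.

```python
def copy_fermi(path, credential, checksum, copy, remote_checksum, log):
    """Copy one file and record the outcome; return True on success.

    checksum(path), copy(path, credential), and
    remote_checksum(path, credential) are supplied by the caller;
    log(event, **fields) does the machine-readable bookkeeping.
    """
    log("begin", file=path)                        # 0. begin transaction
    try:
        local = checksum(path)                     # 1. checksum file
        copy(path, credential)                     # 2. copy to Fermilab
        remote = remote_checksum(path, credential)  # 3. verify remote copy
    except Exception as error:
        # Any failure closes the transaction with an error record.
        log("end", file=path, status=-1, msg=str(error))
        return False
    ok = local == remote
    # 4./5. bookkeeping and end of transaction
    log("end", file=path, status=0 if ok else -1,
        local_checksum=local, remote_checksum=remote)
    return ok
```

Every call produces exactly one "begin" and one "end" record, so a later scan of the bookkeeping can spot transfers that started but never finished.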

Tanya & Alain are contacts for this work. Miron will give some UW effort to run this as a DAG.

Ruth has ownership of document and will get back to NEES.

Need full backup by end of March.

Good to know earlier that the file is transferred, before the checksum is created.

On top of COPY_FERMI, might need to pass in directory, and "tar" it up.

Tanya will be gone March 5-12, March 29-April 2.

Tanya's obligations:

  • Steps 1-4 above
  • Provide tool that takes file name, verifies it is at Fermilab and uncorrupted
Alain/Condor team's obligations:
  • Tool to create DAG to copy one or more files with Tanya's tools
  • Tool to decide how to copy set of files/directories.
Tom/Alain/Tanya's obligations:

  • Install scripts at Purdue
  • Integrate and check the end-to-end archiving proof-of-concept
  • Run and validate retrieval.

Protected BackgroundDocuments: ask Alain, Tanya, or Ruth to be added in order to see these.

Topic revision: r14 - 19 Mar 2010 - 21:41:39 - AlanDeSmet