Gratia Service Operation Guide

List of Services Operated by FermiGrid.

List of services operated by FermiGrid.

Operational Architecture Overview.

As seen in the production machine deployment description, the OSG and FNAL-local Gratia services each consist of a pair of machines each with 6 VMs, the "collector" machine and the "reporter" machine. The collector VMs use the collector DB and the reporter VMs the reporter DB, which is slaved from the collector DB using MySQL replication. The Gratia services on the reporting VMs consist of the reporting services only: no data collection services are installed on these nodes. Similarly, under normal conditions the collector VMs do not have any reporting services. Any requests for reports directed at the collector services are redirected by tomcat to the correct reporting service.

HA operation.

The IP address through which the reporting services access the reporting DB is managed by heartbeat and may be failed over to the collector DB if necessary; the converse is not true. Similarly, heartbeat can be used to fail an individual reporting service over to be managed by the corresponding collector service: heartbeat restarts the collector service in a mode that serves the reports directly rather than redirecting them to the reporting VM. Again, the converse is not true.

Installation / upgrade guide.

Installation Guide (FNAL Specific).

Log file description.

All the Collector and Reporting service log files are located in $CATALINA_HOME/logs. The main log files are as follows:

  • catalina.out: contains tomcat messages emitted outside the auspices of log4j (before its initialization for instance) or messages sent directly to stdout. This file is not rotated.
  • messages.log: contains all messages not otherwise directed by the log4j configuration file or Gratia's logging system (controlled by log4j config file).
  • hibernate.log: contains all messages emitted by hibernate (controlled by log4j config file).
  • gratia.log: messages from the record processor code, including details of records saved, errors found while processing records, and any information about clean-up or replication.
  • glite-security-trustmanager.log: messages from the authorization infrastructure (controlled by the log4j config file).
  • gratia-administration.log: messages from the administration service.
  • gratia-registration.log: messages from registration activities.
  • gratia-reporting.log: All messages emitted by the Gratia reporting infrastructure (as opposed to BIRT messages, which appear in catalina.out).
  • gratia-rmi-servlet.log: messages from the servlets receiving information from remote probes.
  • access/access.log: tomcat access log (controlled by tomcat valve configuration in conf/server.xml).

Unless otherwise specified, logs are rotated daily and are controlled by the relevant clause in service-configuration.properties, the main Gratia configuration file.
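
As a quick sanity check of the layout described above, the following sketch reports which of the listed log files are absent from a log directory. The function name list_missing_logs is hypothetical and not part of Gratia; the file names are taken from the list above. It assumes that a missing file is merely worth a look: a given service (collector or reporting) may legitimately not write every file.

```shell
# Hypothetical helper (not part of Gratia): print the name of each main log
# file from the list above that is absent from the given directory.
# Defaults to $CATALINA_HOME/logs when no directory is given.
# The body runs in a subshell so its variables do not leak into the caller.
list_missing_logs() (
  log_dir="${1:-${CATALINA_HOME:-.}/logs}"
  for name in catalina.out messages.log hibernate.log gratia.log \
      glite-security-trustmanager.log gratia-administration.log \
      gratia-registration.log gratia-reporting.log \
      gratia-rmi-servlet.log access/access.log; do
    [ -f "$log_dir/$name" ] || echo "$name"
  done
)
```

Calling list_missing_logs with no argument checks $CATALINA_HOME/logs; empty output means every listed file is present.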

Monitoring Gratia.

Guide to monitoring a collector.

Incident Reporting

If a problem needs to be investigated in detail by the developers, the following information can be of use:

  • Content of $CATALINA_HOME/logs
  • Content of $CATALINA_HOME/gratia: the sub-directory $CATALINA_HOME/gratia/data contains information about the incoming data
    • $CATALINA_HOME/gratia/data/thread? contains the incoming messages that have yet to be processed
    • $CATALINA_HOME/gratia/data/old-* contains a copy of all the already processed messages, including duplicates and error cases
    • $CATALINA_HOME/gratia/data/history-* contains a marked up copy of all the successfully processed records.
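
The two directories listed above can be gathered into a single archive before being handed to the developers. The following is a minimal sketch, not an official Gratia tool: the function name collect_incident_bundle and its arguments are hypothetical. It assumes tar with gzip support is available and that there is enough free space for the archive; the old-* and history-* areas can be large.

```shell
# Hypothetical helper (not part of Gratia): archive the logs/ and gratia/
# sub-directories of a Tomcat installation into one gzipped tarball.
#   $1  the installation directory (the value of $CATALINA_HOME)
#   $2  path of the tarball to create (keep it outside $1)
# The body runs in a subshell so its variables do not leak into the caller.
collect_incident_bundle() (
  catalina_home="$1"
  out_file="$2"
  parts=""
  for sub in logs gratia; do
    if [ -d "$catalina_home/$sub" ]; then
      parts="$parts $sub"
    fi
  done
  if [ -z "$parts" ]; then
    echo "nothing to collect under $catalina_home" >&2
    return 1
  fi
  # $parts is deliberately unquoted: it only ever holds the fixed words above.
  tar -czf "$out_file" -C "$catalina_home" $parts
)
```

For example, collect_incident_bundle "$CATALINA_HOME" /tmp/gratia-incident.tar.gz would write the bundle to /tmp; the output path here is only an illustration.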

Some service-related activities (common or otherwise).

Information on backups.

Backups are made of several different aspects of the Gratia system:

  • Daily backups of XML files from gratia/data/ via rsync (~gratia/gratia-ops/backup/ddback/ddback.cron). This is required on collector machines only.
    • Installation is from RPM (~gratia/gratia-ops/backup/ddback/RPMS): version >=1.0.2 is required.
    • ddback project home page is on SourceForge.
    • To configure, run ~gratia/gratia-ops/backup/update-ddback-xml on the target collector machine.
  • Monthly backups of XML files to Enstore via encp (~gratia/gratia-ops/backup/gratia-archive.sh). Again, required on collector machines only. The machine should have access to a UPS installation of encp and have Enstore write permissions. The gratia-archive.sh invocation should be in root's crontab.
  • Daily backups of the DB schemata via ZRM (~gratia/gratia-ops/backup/gratia_backup.cron.sh).
  • XML files are sent to splunk hourly (~gratia/gratia-ops/splunk/*). See Neha for installation details.
  • The Gratia SourceForge code base is backed up nightly to gr6x3:/data/svn-backup (~gratia/gratia-ops/backup/svn-backup); the job is installed in the gratia user's crontab on gr6x3.

Note that the ~gratia/gratia-ops area is a checked-out copy of :ext:cvsuser@cdcvs.fnal.gov:/cvs/cd/fermigrid/gratia-ops.

Schema optimization.

Each database should be optimized regularly to maintain query and insert efficiency in the face of serious churn due to housekeeping activities. The configuration file for each DB instance is located at ~gratia/gratia-ops/gratia-optimize-db/XXXXXX_optimize_commands. Create a new one as necessary, using an existing one as a template.

An example crontab line:

07 11 * 1-12/3 6 ~gratia/gratia-ops/gratia-optimize-db -v ~gratia/gratia-ops/XXXXXX_optimize_commands > /var/log/gratia_optimize.log 2>&1

Note that with the above line, the script is run at 11:07 every Saturday of the first month of each quarter (January, April, July and October). Whether the optimization actually occurs on any given invocation may be further limited by shell code in the configuration file, for example:

MAILTO="gratia-operation@fnal.gov"
(( day_of_week = `date +%w` ))
(( month = `date +%_m` ))
(( day_of_month = `date +%_d` ))
# Only optimize on the 3rd Saturday of the month (which falls on days 15-21).
if (( month % 3 == 1 )) && \
   (( day_of_week == 6 )) && \
   (( (day_of_month - 1) / 7 == 2 )); then
  run_db_optimize gratia gratia-osg-prod.opensciencegrid.org 443
  run_db_optimize gratia_osg_daily gratia-osg-daily.opensciencegrid.org 443
  run_db_optimize gratia_osg_transfer gratia-osg-transfer.opensciencegrid.org 443
  run_db_optimize gratia_itb gratia-osg-itb.opensciencegrid.org 443
fi
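
The calendar guard in a configuration file like the one above can also be factored into a small reusable test. The following is a sketch only: the function name is_nth_weekday is hypothetical and does not exist in the gratia-ops scripts. It relies on the fact that the n-th occurrence of a weekday in a month always falls on days 7(n-1)+1 through 7n, so the occurrence number is (day_of_month - 1) / 7 + 1 in integer arithmetic.

```shell
# Hypothetical helper (not part of gratia-ops): succeed when the given day is
# the n-th occurrence of the wanted weekday within its month.
#   $1  day of week, 0 (Sunday) to 6 (Saturday), as printed by `date +%w`
#   $2  day of month, 1 to 31, unpadded or space-padded (a zero-padded
#       "08" would be read as invalid octal by shell arithmetic)
#   $3  wanted day of week, 0 to 6
#   $4  wanted occurrence (1 = first, 2 = second, 3 = third, ...)
# The body runs in a subshell so its variables do not leak into the caller.
is_nth_weekday() (
  day_of_week=$(( $1 ))
  day_of_month=$(( $2 ))
  wanted_day=$(( $3 ))
  wanted_occurrence=$(( $4 ))
  # Days 1-7 are the first occurrence, days 8-14 the second, and so on.
  occurrence=$(( (day_of_month - 1) / 7 + 1 ))
  [ "$day_of_week" -eq "$wanted_day" ] && [ "$occurrence" -eq "$wanted_occurrence" ]
)
```

A guard for the third Saturday would then read: is_nth_weekday "$(date +%w)" "$(date +%_d)" 6 3, followed by the run_db_optimize calls when it succeeds.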

Additional Information.

See also the general installation notes.

-- ChrisGreen - 11 Jul 2008

Topic revision: r8 - 04 Aug 2010 - 15:39:48 - ChrisGreen