Gratia Service Operation Guide
List of services operated by FermiGrid
As seen in the production machine deployment description, the OSG and FNAL-local Gratia services each consist of a pair of machines, the "collector" machine and the "reporter" machine, each hosting 6 VMs. The collector VMs use the collector DB, and the reporter VMs use the reporter DB, which is slaved from the collector DB using MySQL replication. The Gratia services on the reporting VMs consist of the reporting services only: no data collection services are installed on these nodes. Similarly, under normal conditions the collector VMs do not run any reporting services. Any requests for reports directed at the collector services are redirected by tomcat to the correct reporting service.
The IP by which the reporting services access the reporting DB is handled by heartbeat and may be failed over to the collector DB if necessary; the converse is not true. Similarly, heartbeat can be used to fail an individual reporting service over to be managed by the corresponding collector service: the collector service is restarted by heartbeat in a mode that serves the reports directly rather than redirecting them to the reporting VM. Again, the converse is not true.
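For concreteness, a heartbeat (v1-style) haresources entry of the kind that could float the reporting DB's service IP between nodes might look like the following. The hostname, address, netmask, and interface here are placeholders, not the production values:

```
# /etc/ha.d/haresources (illustrative only): the named node normally owns
# the service IP; heartbeat moves it to the peer on failover.
gr-reporter-db.example.com  IPaddr::192.168.10.50/24/eth0
```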
Installation Guide (FNAL Specific)
All the Collector and Reporting service log files are located in
. The main log files are as follows:
catalina.out: contains tomcat messages emitted outside the auspices of log4j (before its initialization, for instance) or messages sent directly to stdout. This file is not rotated.
messages.log: contains all messages not otherwise directed by the log4j configuration file or Gratia's logging system (controlled by log4j config file).
hibernate.log: contains all messages emitted by hibernate (controlled by log4j config file).
gratia.log: messages from the record processor code, including details of records saved, errors found while processing records, and information about clean-up or replication.
glite-security-trustmanager.log: messages from the authorization infrastructure (controlled by the log4j config file).
gratia-administration.log: messages from the administration service.
gratia-registration.log: messages from registration activities.
gratia-reporting.log: All messages emitted by the Gratia reporting infrastructure (as opposed to BIRT messages, which appear in
gratia-rmi-servlet.log: messages from the servlets receiving information from remote probes.
access/access.log: tomcat access log (controlled by tomcat valve configuration in
Unless otherwise specified, logs are rotated daily and are controlled by the relevant clause in
, the main Gratia configuration file.
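As an illustration of the kind of clause involved, a log4j 1.x daily-rolling appender for gratia.log might look like the fragment below. The appender name, file path, and patterns are hypothetical, not the shipped configuration:

```properties
# Hypothetical daily-rolling appender; real names live in the Gratia
# log4j configuration file.
log4j.appender.GRATIA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.GRATIA.File=${catalina.home}/logs/gratia.log
log4j.appender.GRATIA.DatePattern='.'yyyy-MM-dd
log4j.appender.GRATIA.layout=org.apache.log4j.PatternLayout
log4j.appender.GRATIA.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
```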
Guide to monitoring a collector
In case a problem needs to be investigated in detail by the developer, the following information can be of use:
- Content of
- Content of
$CATALINA_HOME/gratia: the sub-directory $CATALINA_HOME/gratia/data contains information about the incoming data
$CATALINA_HOME/gratia/data/thread? contains the incoming messages that have yet to be processed
$CATALINA_HOME/gratia/data/old-* contains a copy of all the already-processed messages, including duplicates and error cases
$CATALINA_HOME/gratia/data/history-* contains a marked-up copy of all the successfully processed records.
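When investigating, a quick way to gauge the collector's backlog is to count the files still queued under the thread directories. The sketch below builds a throwaway directory layout so it can run anywhere; on a real collector, CATALINA_HOME would point at the tomcat installation instead.

```shell
# Demo stand-in for $CATALINA_HOME/gratia/data on a real collector.
CATALINA_HOME=$(mktemp -d)
mkdir -p "$CATALINA_HOME"/gratia/data/thread0 "$CATALINA_HOME"/gratia/data/thread1
touch "$CATALINA_HOME"/gratia/data/thread0/r1.xml \
      "$CATALINA_HOME"/gratia/data/thread1/r2.xml \
      "$CATALINA_HOME"/gratia/data/thread1/r3.xml
# Count records not yet processed across all thread? directories.
backlog=$(find "$CATALINA_HOME"/gratia/data/thread* -type f | wc -l)
echo "pending records: $backlog"    # → pending records: 3
```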
Backups are made of several different aspects of the Gratia system:
- Daily backups of XML files from gratia/data/ via rsync (~gratia/gratia-ops/backup/ddback/ddback.cron). This is required on collector machines only.
- Installation is from RPM (~gratia/gratia-ops/backup/ddback/RPMS): version >=1.0.2 is required.
- ddback project home page is on SourceForge.
- To configure, run ~gratia/gratia-ops/backup/update-ddback-xml on the target collector machine.
- Monthly backups of XML files to Enstore via encp (~gratia/gratia-ops/backup/gratia-archive.sh). Again, required on collector machines only. The machine should have access to a UPS installation of encp and have Enstore write permissions. Invocation of the gratia-archive.sh file should be in root's
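A root crontab entry invoking the archive script might look like the following; the schedule shown (02:00 on the first of each month) is illustrative, not the production setting:

```
# Hypothetical schedule; only the script path comes from this guide.
00 02 1 * * ~gratia/gratia-ops/backup/gratia-archive.sh
```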
- Daily backups of the DB schemata via ZRM (
- XML files are sent to splunk hourly (~gratia/gratia-ops/splunk/*). See Neha for installation details.
- Gratia Sourceforge code base is backed up nightly (~gratia/gratia-ops/backup/svn-backup), installed in gratia user's
Note that the
area is a checked-out copy of
Each database should be optimized regularly to maintain query and insert efficiency in the face of serious churn due to housekeeping activities. The configuration file for each DB instance is to be found as
. Create a new one as necessary using an existing one as template.
An example line:
07 11 * 1-12/3 6 ~gratia/gratia-ops/gratia-optimize-db -v ~gratia/gratia-ops/XXXXXX_optimize_commands > /var/log/gratia_optimize.log 2>&1
Note that with the above line, the script is run every Saturday of the first month of each quarter. Whether the optimization actually occurs on any given invocation may be further limited by shell code in the configuration file, for example:
(( day_of_week = `date +%w` ))
(( month = `date +%_m` ))
(( day_of_month = `date +%_d` ))
# Only optimize on the 3rd Saturday of the month: days 15-21, and
# exactly those days satisfy (day_of_month - 1) / 7 == 2.
if (( month % 3 == 1 )) && \
   (( day_of_week == 6 )) && \
   (( (day_of_month - 1) / 7 == 2 )); then
  run_db_optimize gratia gratia-osg-prod.opensciencegrid.org 443
  run_db_optimize gratia_osg_daily gratia-osg-daily.opensciencegrid.org 443
  run_db_optimize gratia_osg_transfer gratia-osg-transfer.opensciencegrid.org 443
  run_db_optimize gratia_itb gratia-osg-itb.opensciencegrid.org 443
fi
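The date arithmetic can be checked against a known date. The sketch below evaluates the guard for 2008-07-19, which was the 3rd Saturday of the first month of a quarter (GNU date's -d option is assumed; the (day_of_month - 1) / 7 form maps days 15-21, and only those, to 2):

```shell
check_date=2008-07-19                               # known 3rd Saturday of July 2008
(( day_of_week = $(date -d "$check_date" +%w) ))    # 6 = Saturday
(( month = $(date -d "$check_date" +%_m) ))         # 7 = July
(( day_of_month = $(date -d "$check_date" +%_d) ))  # 19
if (( month % 3 == 1 )) && (( day_of_week == 6 )) && \
   (( (day_of_month - 1) / 7 == 2 )); then
  echo optimize    # → optimize
else
  echo skip
fi
```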
See also the general installation notes
- 11 Jul 2008