Gratia RSV Monitoring Probe

This RSV probe examines the Gratia probes on the system, ascertains whether they are reporting properly, and provides some diagnostics if problems are found.

Probe Operation

Under normal circumstances this probe is packaged with the RSV system and is installed on a CE along with all the other RSV probes. It will be executed regularly and its results posted along with those of other probes. It may also be invoked manually and can give better diagnostics if run as root.

| Test description | Probe | Possible results | Root required? |
| Check crontab entry. | ALL | CRITICAL; UNKNOWN (if non-root) | Y |
| Check Enabled attribute in ProbeConfig. | ALL | CRITICAL | N |
| Check for generic MeterName in ProbeConfig. | ALL | CRITICAL | N |
| Check permissions on DataFolder (should be 04177 for batch probes). | PBS, LSF, Condor, SGE | CRITICAL | N |
| Check for missing or multiple `:' in MeterName. | ALL | WARNING | N |
| Check for generic SiteName in ProbeConfig. | ALL | WARNING | N |
| Verify condor_config has PER_JOB_HISTORY_DIR set correctly. | Condor | WARNING; UNKNOWN (can't run condor_config_val) | N |
| Can urCollector config module be loaded? | PBS, LSF | CRITICAL | N |
| Can urCollector configuration file be loaded? | PBS, LSF | CRITICAL | N |
| Check LRMS type setting in urCollector.conf is recognized. | PBS, LSF | CRITICAL | N |
| Check LRMS type setting matches probe type. | PBS, LSF | CRITICAL | N |
| Check PBS log directory exists. | PBS | CRITICAL; WARNING (if no entries for current day) | Y |
| Check LSF log directory and lsb.events file exist. | LSF | CRITICAL; WARNING (if no lsb.events file) | Y |
| Check files waiting to be sent from <workdir>/urCollector. | PBS, LSF | WARNING (if > 2000 files) | Y |
| Check files waiting to be re-sent from <workdir>/gratiafiles after communication problem with collector. | ALL | WARNING (if > 1000 records, including those archived in files; or if at least 1 file is >7d old) | N |
| Check files in top level <datadir>. | ALL | WARNING (if > 10000 files) | N |
| Check probe has contacted configured collector. | ALL | CRITICAL (never seen, not seen for >24h); UNKNOWN (can't contact collector); WARNING (not seen for >1h) | N |
| Check collector's probe -> site name translation matches local configuration. | ALL | WARNING | N |
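
The threshold-based results in the table can be read as simple rules. The following is a minimal Python sketch of two of them (collector contact and the re-send backlog), not the probe's actual code. The function names, the parameters, and the "OK" result for the passing case are assumptions made for illustration; only the thresholds come from the table.

```python
from datetime import datetime, timedelta
from typing import Optional


def contact_status(last_seen: Optional[datetime],
                   now: datetime,
                   collector_reachable: bool = True) -> str:
    """Map the time since the collector last saw the probe to a result.

    Hypothetical helper: only the 1h and 24h thresholds are from the table.
    """
    if not collector_reachable:
        return "UNKNOWN"       # can't contact collector
    if last_seen is None:
        return "CRITICAL"      # never seen
    age = now - last_seen
    if age > timedelta(hours=24):
        return "CRITICAL"      # not seen for >24h
    if age > timedelta(hours=1):
        return "WARNING"       # not seen for >1h
    return "OK"                # assumed name for the passing result


def resend_backlog_status(record_count: int,
                          oldest_file_age: Optional[timedelta]) -> str:
    """Apply the gratiafiles rule: > 1000 records, or any file > 7 days old."""
    if record_count > 1000:
        return "WARNING"
    if oldest_file_age is not None and oldest_file_age > timedelta(days=7):
        return "WARNING"
    return "OK"                # assumed name for the passing result
```

Note that the 24-hour check has to come before the 1-hour check, since any probe unseen for more than a day also satisfies the weaker condition.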

Manual invocation

To invoke manually:

$VDT_LOCATION/osg-rsv/bin/probes/worker-scripts/gratia-config-probe-helper -p <probe> -X

The -X option produces human-readable output instead of the default XML. Invoke as root for more diagnostics.
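
When the helper is driven from a script, it can be convenient to assemble the command line programmatically. The sketch below builds the argument list for the invocation shown above. The function name `build_helper_command`, its parameters, and the example probe name "condor" are hypothetical; the exact values accepted by -p are not listed here, so check them against your installation.

```python
import os
from typing import List, Optional


def build_helper_command(probe: str,
                         human_readable: bool = True,
                         vdt_location: Optional[str] = None) -> List[str]:
    """Return the argument list for gratia-config-probe-helper.

    Hypothetical wrapper: only the helper path, -p and -X come from this page.
    """
    base = vdt_location or os.environ.get("VDT_LOCATION")
    if not base:
        raise ValueError("VDT_LOCATION is not set")
    helper = os.path.join(base, "osg-rsv", "bin", "probes",
                          "worker-scripts", "gratia-config-probe-helper")
    cmd = [helper, "-p", probe]
    if human_readable:
        cmd.append("-X")  # human-readable output instead of the default XML
    return cmd


# Example use (not executed here); "condor" is an assumed probe name:
#   import subprocess
#   subprocess.run(build_helper_command("condor"), check=False)
```

Passing the result to `subprocess.run` as a list avoids shell quoting issues; run the script as root if the extra diagnostics are wanted.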

-- ChrisGreen - 14 Dec 2009

Topic revision: r6 - 14 Dec 2009 - 19:51:32 - ChrisGreen