This document is in Draft Status

Post Upgrade Functionality Checks

This document lists checks that test the functionality of the BDII services after an upgrade.

Checks Immediately after service is brought up

  • Log in to is1.grid.iu.edu or is2.grid.iu.edu and become root.
    • Execute /opt/service-monitor/is/test.sh. If this test finds a problem, you will receive e-mail.
  • Check the web page display. (This is not a vital part of the BDII service, but it allows a manual check of the incoming data.)

Checks ~5 Minutes after the service is brought up

  • From the status page at http://is1.grid.iu.edu/cgi-bin/status.cgi or http://is2.grid.iu.edu/cgi-bin/status.cgi:
    • Check the freshness of Raw Incoming Data; most resources should be < 5 minutes old.
    • Check the freshness of the Data Feeds to OSG and WLCG; these should also be < 5 minutes old.
  • Check LDAP server functionality from the command line. (Do not use Mac OS X 10.6.*; the test will fail even with a functional BDII.)
    • Run ldapsearch -h is1.grid.iu.edu -p 2170 -x -b mds-vo-name=local,o=grid - this will return several thousand lines of GLUE Schema information
    • Run ldapsearch -h is1.grid.iu.edu -p 2180 -x -b o=grid - this will also return several thousand lines of GLUE Schema information
    • Examples of possible errors are given in the LDAP Errors section of this page.
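
The two ldapsearch checks above can be wrapped in a small helper so that the size of the result is easy to judge. This is only a sketch: the function names bdii_query and count_ldif_entries are made up for illustration and are not part of the BDII service or its monitoring scripts. Newer OpenLDAP releases prefer -H ldap://host:port over the -h and -p flags used on this page.

```shell
# Run one of the ldapsearch checks listed on this page.
# Arguments: host, port, search base.
bdii_query() {
    ldapsearch -h "$1" -p "$2" -x -b "$3"
}

# Count LDIF entries on stdin by counting lines that start with "dn:".
# Prints 0 when the input contains no entries.
count_ldif_entries() {
    awk '/^dn:/ { n++ } END { print n + 0 }'
}

# Example calls (these need network access to the BDII, so they are commented out):
#   bdii_query is1.grid.iu.edu 2170 mds-vo-name=local,o=grid | count_ldif_entries
#   bdii_query is1.grid.iu.edu 2180 o=grid | count_ldif_entries
```

A functional BDII should report a large number of entries. What counts as "too few" is left to the operator, since this page only states that several thousand lines are expected.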

Ongoing Monitoring Checks

  • Several scripted checks run on these services including:
    • Check of BDII freshness at the CERN Top Level BDII and SAM BDII, available at http://tinyurl.com/2u3xl8q. Only WLCG resources are listed there, as they are the only resources publishing to the CERN BDIIs. These tests run each hour on the 30 minute mark, so the results may lag by up to an hour after the upgrade has completed.
  • Email alerts are sent to the GOC-ALERTS mailing list for the following conditions:
    • More than 10% of resources are not updating BDII information (either WLCG or OSG)
    • FNAL or BNL is not available from the CERN Top Level BDIIs
    • RSV probes check the timestamps of information in the BDII; failures are reported via mail.
  • The BDII service also reports many system-level metrics via Munin; these should be checked continuously for anomalies after an upgrade.
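
The "more than 10% of resources are not updating" alert condition can be expressed as a short integer comparison. The sketch below is an illustration only; the function name exceeds_stale_threshold and its two arguments are assumptions, and the actual alerting scripts are not shown on this page.

```shell
# Succeed (exit 0) when more than 10% of resources are not updating.
# Arguments: total number of resources, number of resources not updating.
# Integer arithmetic is used: stale/total > 10%  <=>  stale*100 > total*10.
exceeds_stale_threshold() {
    total=$1
    stale=$2
    [ "$total" -gt 0 ] && [ $((stale * 100)) -gt $((total * 10)) ]
}

# Example: 11 stale resources out of 100 crosses the threshold, 10 does not.
#   exceeds_stale_threshold 100 11 && echo "alert"
```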

LDAP Errors

LDAP Server Not Running

This error occurs if the LDAP server is not responding.

ldap_bind: Can't contact LDAP server (-1)

Data not found

This type of error occurs if no data is found matching your query. First check the ldapsearch syntax; if it is correct, the data is missing.

# extended LDIF
#
# LDAPv3
# base  with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# local, grid
dn: Mds-Vo-name=local,o=grid
objectClass: GlueTop
objectClass: Mds
Mds-Vo-name: local

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
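
The two error cases above can be told apart mechanically from the ldapsearch output. The following is a minimal sketch, not an existing tool: the function name classify_ldapsearch_output is hypothetical, and treating one entry or fewer as "no data" is an assumption based on the sample output above, where only the base entry is returned.

```shell
# Classify ldapsearch output read from stdin.
# Prints one of: server-down, no-data, ok.
classify_ldapsearch_output() {
    awk '
        /Can.t contact LDAP server/ { down = 1 }
        /^# numEntries:/            { entries = $3 }
        END {
            if (down)              print "server-down"
            else if (entries <= 1) print "no-data"   # assumed threshold
            else                   print "ok"
        }
    '
}

# Example call (needs network access, so it is commented out).
# ldapsearch writes errors to stderr, hence the 2>&1:
#   ldapsearch -h is1.grid.iu.edu -p 2170 -x -b mds-vo-name=local,o=grid 2>&1 | classify_ldapsearch_output
```
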
-- RobQ - 11 Aug 2010

-- RobQ - 24 May 2011

Topic revision: r5 - 14 Feb 2012 - 15:01:12 - ScottTeige