Installing, Configuring, Using, and Troubleshooting RSV

About This Guide

The Resource and Service Validation (RSV) software helps a site administrator verify that certain site resources and services are working as expected. OSG recommends that sites install and run RSV, but it is optional; further, each site selects which specific tests (called probes) to run.

Use this page to learn more about RSV in general, and how to install, configure, run, test, and troubleshoot RSV from the OSG software repositories. For documentation on specific probes or on how to write your own probes, please check the Reference section.

Introduction to RSV

The Resource and Service Validation (RSV) software provides OSG site administrators a scalable and easy-to-maintain resource and service monitoring infrastructure. The components of RSV are:

  • RSV Client. The client tools allow a site administrator to run tests against their site by providing a set of tests (which can run on the same or other hosts within a site), HTCondor-Cron for scheduling, and tools for collecting and storing the results (using Gratia). The client package is not installed by default and may be installed on a CE or other host. Generally, you configure the RSV client to run tests at scheduled time intervals and then it makes results available on a local website. Also, the client can upload test results to a central collector (see next item).
  • RSV Collector/Server. The central OSG RSV Collector accepts and stores results from RSV clients throughout OSG, which can be viewed in MyOSG, on the “Current RSV Status” page and under the “Resource Group” menu.
  • Periodic Availability Reports. The availability of all active registered OSG resources and the services running on each of those resources is calculated using the results received for critical metrics. Once a day, these availability numbers are published online and via email as explained here (More information: Outline of reports, Installation guide for GOC staff).
  • RSV-SAM Transport. The WLCG RSV-SAM Transport infrastructure pushes out RSV results, for resources that are flagged to be part of the WLCG Interoperability agreement, from the GOC collector to WLCG's Service Availability Monitoring (SAM) system. More information on viewing these results is available here.
  • MyOSG and OIM Links. RSV picks up resource information, WLCG interoperability information, etc., from a MyOSG resource group summary listing, which is in turn based on the OSG Information Management (OIM) (topology) system (Requires registration). Resource maintenance scheduled on OIM, are forwarded to WLCG SAM, if applicable.

Before Starting

Before starting the installation process, consider the following points (consulting the Reference section below as needed):

  • User IDs: If they do not exist already, the installation will create the Linux user IDs rsv and cndrcron
  • Service certificate: The RSV service requires a service certificate (/etc/grid-security/rsv/rsvcert.pem) and matching key (/etc/grid-security/rsv/rsvkey.pem)
  • Network ports: To view results, port 80 must accept incoming requests; outbound connectivity to tested services must work, too
  • Host choice: Install RSV on your site CE unless you have specific reasons (e.g., performance) for installing on a separate host

As with all OSG software installations, there are some one-time (per host) steps to prepare in advance:

Installing RSV

An installation of RSV at a site consists of the RSV client software, the Apache web server, parts of HTCondor (for its cron-like scheduling capabilities), and various other small tools. To simplify installation, OSG provides a convenience RPM that installs all required software with a single command.

  1. Consider updating your local cache of Yum repository data and your existing RPM packages:
    [root@rsv ~]$ yum clean all --enablerepo=*
    [root@rsv ~]$ yum update

    Note: The update command will update all packages on your system.

  2. If you have installed HTCondor already but not by RPM, install a special empty RPM to make RSV happy:
    [root@rsv ~]$ yum install empty-condor --enablerepo=osg-empty
  3. Install RSV and related software:
    [root@rsv ~]$ yum install rsv

(Optional) Special one-time clean-up instructions for RSV perfSONAR 1.1.2 or later

If you run rsv-perfsonar and have upgraded from version 1.1.1 or earlier to version 1.1.2 or later, there is a clean-up step you should take to fix an unnecessary symlink. This optional, one-time procedure is recommended if it applies to your installation.

  1. Check to see if you need to perform this step:
    [root@rsv ~]$ ls -l /usr/share/rsv/www

    If www is a symlink to /var/www/html/rsv, then continue with the procedure; if not, then you are done!

  2. Stop (only) the RSV service using the instructions below
  3. Verify that RSV is not running:
    [root@rsv ~]$ condor_cron_q
       0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
  4. Remove the symbolic link:
    [root@rsv ~]$ unlink /usr/share/rsv/www
  5. Move the formerly linked directory into place:
    [root@rsv ~]$ mv -f /var/www/html/rsv /usr/share/rsv/www
  6. Make sure that all RSV files are owned by RSV:
    [root@rsv ~]$ chown -R rsv:rsv /usr/share/rsv/www
  7. Restart the RSV service using the instructions below

This procedure can be done before or after upgrading the rsv-perfsonar package to version 1.1.2 or later.

Configuring RSV

After installation, there are some one-time configuration steps to tell RSV how to operate at your site.

  1. Edit /etc/osg/config.d/30-rsv.ini and follow the instructions in the file. There are detailed comments for each setting. In the simplest case — to monitor only your CE — set the htcondor_ce_hosts variable (or gram_ce_hosts for a GRAM CE) to the fully qualified hostname of your CE. For a sample rsv.ini file, see the complete installation output below.

  2. If you have installed HTCondor already but not by RPM, specify the location of the Condor installation in 30-rsv.ini in the condor_location setting. If an HTCondor RPM is installed, you do not need to set condor_location.

  3. Complete the configuration using the osg-configure tool:

    [root@rsv ~]$ osg-configure -v
    [root@rsv ~]$ osg-configure -c

    Note: The osg-configure tool produces a lot of output; see below for an example.

Optional configuration

The following configuration steps are optional and will likely not be required for setting up a small or typical site. If you do not need any of the following special configurations, skip to the section on using RSV.

Generally speaking, read the ConfigureRsv page for more advanced configuration options. Or see below for specific advanced configuration scenarios.

Configuring RSV to run probes using a remote server

RSV monitors systems by running probes, which can run on the RSV host itself (the default case), via a separate batch system like HTCondor, or via a remote batch system using a Globus gatekeeper and its job manager. The last two options both can count those jobs and report them to, for example, Gratia.

In this case, remember to:

  • Add the RSV user rsv on all the systems where the probes may run, and
  • Map the RSV service certificate to the user you intend to use for RSV. This should be a local user used exclusively for RSV and not belonging to an institutional VO to avoid for the RSV probes to be accounted as regular VO jobs in Gratia. This can be done in GUMS or using a grid-mapfile-local (if you use a grid-mapfile). MapServiceCertToRsvUser explains how to configure GUMS or the grid-mapfile. Also see the CE installation document for more information.

Configuring the RSV web server to use HTTPS instead of HTTP

If you would like your local RSV web server to use HTTPS instead of the default HTTP (for compatibility or security reasons), complete the steps below. This procedure assumes that you already have an HTTP service certificate (or a copy of the host certificate) in /etc/grid-security/http/. If not, omit the SSLCertificate* modifications below, and your web server will start with its own, self-signed certificate.

  1. Install mod_ssl:
    [root@rsv ~]$ yum install mod_ssl
  2. Make an alternate set of HTTP service certificate files:
    [root@rsv ~]$ cp -p /etc/grid-security/http/httpcert.pem /etc/grid-security/http/httpcert2.pem
    [root@rsv ~]$ cp -p /etc/grid-security/http/httpkey.pem /etc/grid-security/http/httpkey2.pem
    [root@rsv ~]$ chown apache:apache /etc/grid-security/http/http*2.pem
  3. Back up existing Apache configuration files:
    [root@rsv ~]$ cp -p /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.orig
    [root@rsv ~]$ cp -p /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.orig
  4. Change the default port for HTTP connections to 8000 by editing /etc/httpd/conf/httpd.conf:
    Listen 8000
  5. Set up HTTPS access by editing /etc/httpd/conf.d/ssl.conf:
    Listen 8443
    <VirtualHost _default_:8443>
    SSLCertificateFile /etc/grid-security/http/httpcert2.pem
    SSLCertificateKeyFile /etc/grid-security/http/httpkey2.pem

After these changes, when you start the Apache service, it will listening on ports 8000 (for HTTP) and 8443 (for HTTPS), rather than the default port 80 (for HTTP only).

ALERT! WARNING!
If you make the changes above, you must restart the Apache server after each CA certificate update to pick up the changes.

Using RSV

Managing RSV and associated services

In addition to the RSV service itself, there are a number of supporting services in your installation. The specific services are:

Software Service name Notes
Fetch CRL On EL 6: fetch-crl-boot and fetch-crl-cron
On EL 5: fetch-crl3-boot and fetch-crl3-cron
See CA documentation for more info
Apache httpd  
HTCondor-Cron condor-cron  
RSV rsv  

Start the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root):

To … Run the command …
Start a service service SERVICE-NAME start
Stop a service service SERVICE-NAME stop
Enable a service to start during boot chkconfig SERVICE-NAME on
Disable a service from starting during boot chkconfig SERVICE-NAME off

Running RSV manually

Normally, the HTCondor-Cron scheduler runs RSV periodically. However, you can run RSV probes manually at any time:

[root@rsv ~]$ rsv-control --run --all-enabled

If successful, results will be available from your local RSV web server (e.g., http://localhost/rsv) and, if enabled (which is the default) on MyOSG.

You can also run the metrics individually or pass special parameters as explained in the rsv-control document.

Troubleshooting RSV

You can find more information on troubleshooting RSV in the rsv-control documentation and in TroubleshootRSV.

Important file locations

Logs and configuration:

File Description Location Comment
Metric log files /var/log/rsv/metrics  
Consumer log files /var/log/rsv/consumers  
HTML files /usr/share/rsv/www/ Available at http://your.host.example.com/rsv

File Description Location Comment
Initial configuration /etc/osg/config.d/30-rsv.ini Read by osg-configure
RSV configuration /etc/rsv Generally files in this directory should not be edited directly. Use osg-configure instead.
Metric configuration /etc/rsv/metrics/HOSTNAME/METRICNAME.conf To change arguments and environment

To find the metrics and the other files in RSV you can use also the RPM commands: rpm -ql rsv-metrics and rpm -ql rsv.

Getting more information from rsv-control

The first step to getting more information is to run rsv-control with more verbosity. Use the --verbose (-v) flag. This flag can be used with any of rsv-control's abilities (run, enable, list, etc). The verbosity levels are:
  • 0 = print nothing
  • 1 = print warnings and errors along with usual output of command being run (1 is the default level)
  • 2 = adds informational messages
  • 3 = full debugging output
For example, here is the output when running a metric with -v2.
[root@fermicloud016 condor]# rsv-control -r org.osg.general.osg-version -v 2 -u osg-edu.cs.wisc.edu
INFO: Reading configuration file /etc/rsv/rsv.conf
INFO: Reading configuration file /etc/rsv/consumers.conf
INFO: Validating configuration:
INFO: Validating user:
INFO:     Invoked as root.  Switching to 'rsv' user (uid: 100 - gid: 102)
INFO: Registered consumers: html-consumer, gratia-consumer
INFO: Loading config file '/etc/rsv/meta/metrics/org.osg.general.osg-version.meta'
INFO: Loading config file '/etc/rsv/metrics/org.osg.general.osg-version.conf'
INFO: Optional config file '/etc/rsv/metrics/osg-edu.cs.wisc.edu/org.osg.general.osg-version.conf' does not exist
INFO: Checking proxy:
INFO:     Using service certificate proxy
INFO: Running command with timeout (1200 seconds):
        /usr/bin/openssl x509 -in /tmp/rsvproxy -noout -enddate -checkend 21600
INFO: Exit code of job: 0
INFO:     Service certificate valid for at least 6 hours.
INFO: Pinging host osg-edu.cs.wisc.edu:
INFO: Running command with timeout (1200 seconds):
        /bin/ping -W 3 -c 1 osg-edu.cs.wisc.edu
INFO: Exit code of job: 0
INFO:     Ping successful

Running metric org.osg.general.osg-version:

INFO: Executing job remotely using Condor-G
INFO: Setting up job environment:
INFO:     No environment setup declared
INFO: Condor-G working directory: /var/tmp/rsv/condor_g-JiQthF
INFO: Forming arguments:
INFO:     Arguments: ''
INFO: List of files to transfer: /usr/libexec/rsv/probes/RSVMetric.pm
INFO: Condor submission: Submitting job(s).
1 job(s) submitted to cluster 2.
INFO: Trimming data to 10000 bytes because details-data-trim-length is set
INFO: Creating record for html-consumer consumer at '/var/spool/rsv/html-consumer/org.osg.general.osg-version.7rgLfn'
INFO: Creating record for gratia-consumer consumer at '/var/spool/rsv/gratia-consumer/org.osg.general.osg-version.-qelnL'
INFO: Result:

metricName: org.osg.general.osg-version
metricType: status
timestamp: 2012-01-25 16:12:40 CST
metricStatus: OK
serviceType: OSG-CE
serviceURI: osg-edu.cs.wisc.edu
gatheredAt: fermicloud016.fnal.gov
summaryData: OK
detailsData: OSG 1.2.26

EOT

Using the RSV verify tool

The --verify flag will run some basic checks for your RSV installation.
$> rsv-control --verify
Testing if Condor-Cron is running...
OK

Testing if metrics are running...
OK (98 running metrics)

Testing if consumers are running...
OK (1 running consumers)

Checking which consumers are configured...
The following consumers are enabled: html-consumer
WARNING: The gratia-consumer is not enabled.  This indicates that your
         resource is not reporting to OSG.

This tool is still under development and it does only basic checks, but it is a good first step when debugging issues.

Getting Help

To get assistance, please use this page.

RSV has a tool to collect information useful for troubleshooting. For most problems the RSV support team will ask you to generate a profile tarball to share information (including log and configuration files). You can save some time by doing so with your original request:

$> rsv-control --profile
Running the rsv-profiler...
OSG-RSV Profiler
Analyzing...
Making tarball (rsv-profiler.tar.gz)

Reference

Here are some other RSV documents that might be helpful:

Users

The RSV installation will create two users unless they are already created. The users are created when the rsv and condor-cron packages are installed.

User Comment
rsv Runs the RSV tests; the RSV certificate (below) will need to be owned by this user
cndrcron Runs the Condor Cron processes to schedule the running of the tests

Note that if you pre-create the RSV user, it should have a working shell. That is, it shouldn't have a default shell of /sbin/nologin.

WARNING: If you manage your /etc/passwd file with configuration management software such as Puppet, CFEngine or 411, make sure the UID and GID in /etc/condor-cron/config.d/condor_ids matches the UID and GID of the cndrcron user and group in /etc/passwd. If it does not, create a file named /etc/condor-cron/config.d/condor_ids_override with the contents:

CONDOR_IDS=UID.GID
where UID and GID are the UID and GID of the cndrcron user and group.

Certificates

Certificate User that owns certificate Path to certificate
RSV service certificate rsv /etc/grid-security/rsv/rsvcert.pem
/etc/grid-security/rsv/rsvkey.pem

Ensure an RSV service certificate is installed in /etc/grid-security/rsv/ and the certificate files are owned by the rsv user. Adjust the permissions if necessary (cert needs to be readable by all, key needs to be readable by nobody but owner).

You may need another certificate owned by apache if you'd like an authenticated web server; see Configuring the RSV web server to use HTTPS instead of HTTP above.

See instructions to request a service certificate.

Networking

For more details on overall Firewall configuration, please see our Firewall documentation.

Service Name Protocol Port Number Inbound Outbound Comment
HTTP tcp 80 Y   RSV runs an HTTP server (Apache) that publishes a page with the RSV testing results
HTTP tcp 80   Y RSV pushes testing results to the OSG Gratia Collectors at opensciencegrid.org
various various various   Y Allow outbound network connection to all services that you want to test

Or, if you'd rather have your RSV web page appear as https://...:8443/rsv/ like it used to in OSG 1.2, the first column above would be HTTPS / tcp / 8443. See above for how to configure this.

Topic revision: r63 - 18 Jul 2017 - 22:11:27 - MatyasSelmeci
Hello, TWikiGuest!
Register

 
TWIKI.NET

TWiki | Report Bugs | Privacy Policy

This site is powered by the TWiki collaboration platformCopyright by the contributing authors. All material on this collaboration platform is the property of the contributing authors..