Gratia Interfaces - APEL/WLCG - Notes
Description
This topic is a working document to keep track of notes, to-dos, etc. There is not likely to be much organization to it.
The purpose of this little project is to tighten up the procedures for CMS/ATLAS Tier 1/2 reporting to WLCG. The sections in the TOC address these areas.
Although this topic mainly references Tier 2 sites, it is intended to cover Tier 1 sites as well.
There is a little logic (fuzzy in nature) to the font colors used:
- normal - is kind of a general statement of what exists today (sort of)
- green - items that are already in place in some manner
- blue - to-do type items
- red - areas in which I have no clue what I am saying (although that could overlap with any other area).
Description of change process
The following is the text of an email from Sue Foffano of the LCG Office (Lcg.Office@cern.ch) describing the process for changing/migrating the sites reported to APEL for viewing on the EGEE portal.
The recommended route is an email to lcg.office@cern.ch with a
meaningful title e.g. Change to the USA Purdue CMS T2 Federation.
This will then trigger the change to be made and the relevant services
(APEL, GRIDVIEW) to be alerted. There is no ticketing system for this as
although changes do and will continue to occur, there are not enough to
warrant one.
The monthly accounting and availability reports which are sent from
lcg.office give people the chance to cross-check and signal any
anomalies which can either be in the data which is corrected at the
source, or the Federation/site mapping or pledge information which we
correct before the reports are published. The earlier reply from
Cristina hopefully confirms that work is on-going to correct the Purdue
sites in the portal.
The other groups/people involved in this at this time are:
Determining when WLCG data is "frozen"
Currently, the interface runs daily and updates the APEL database for the previous month (e.g., March) up through the 15th of the current month (e.g., April 15). The 15th was chosen arbitrarily to accommodate sites that may be down for maintenance (or some other reason) and to allow them to 'catch up' when they come back online.
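The catch-up window described above amounts to a small date rule. This is a minimal sketch, not the logic in the actual interface: it assumes each daily run always re-sends the current month and re-sends the previous month only through the freeze day (today the 15th). The function name months_to_update is hypothetical.

```python
from datetime import date


def months_to_update(today: date, freeze_day: int = 15) -> list[tuple[int, int]]:
    """Return the (year, month) pairs a daily run would (re)send to APEL.

    Hypothetical helper: the current month is always included; the previous
    month is included only while today's day-of-month is <= freeze_day.
    """
    current = (today.year, today.month)
    if today.day > freeze_day:
        return [current]
    # Step back one month, rolling over from January to the previous December.
    if today.month == 1:
        previous = (today.year - 1, 12)
    else:
        previous = (today.year, today.month - 1)
    return [previous, current]


print(months_to_update(date(2008, 4, 15)))  # [(2008, 3), (2008, 4)]
print(months_to_update(date(2008, 4, 16)))  # [(2008, 4)]
```

If the cutoff implied by the draft accounting report holds, passing freeze_day=14 would stop updates to the prior month after the 14th.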
It appears that there is a cutoff (freeze) date based on the accounting report that follows.
Tier 2 Accounting report
A draft report is issued via email (???? days before the final report). The copy below came from Ruth on 4/16. It looks like the draft is sent 3 days prior to the cutoff.
Subject: March 2008 - Tier2 Accounting Report
Date: Fri, 11 Apr 2008 16:20:08 +0200
From: Lcg Office (Lcg.Office@cern.ch)
To: project-wlcg-tier2 (project-wlcg-tier2@cern.ch)
CC: worldwide-lcg-management-board (LCG Management Board)
Dear WLCG Collaboration Board Member and associated collaborators,
Attached is the draft Tier2 Accounting report for March 2008, which will be published
on the LCG Planning Page at the end of this week.
Please therefore signal any anomalies that you find to LCG office (lcg.office@cern.ch),
before Friday 14 April 2008 pm
Regards,
Fabienne Baud-Lavigne
**********************
Fabienne Baud-Lavigne
LHC Computing Grid Project
Assistant to Les Robertson
IT Department
CERN
European Organization for Nuclear Research
Tier-2 Accounting Report March 2008.pdf
To do:
Based on this, it appears that an afternoon execution of the interface cron script (lcg.sh) might be necessary.
- we are currently running at 01:00 daily
- need to account for the time difference between here and CERN, since the email stated a deadline of 4/14 pm (CERN time). So it may need to be a mid-afternoon run.
- this appears to be the date on which those numbers are frozen. After it, we should NOT update prior months anymore.
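The time difference can be checked with a few lines of Python. A minimal sketch: it hard-codes the UTC offsets in effect in mid-April 2008 (CEST, UTC+2, at CERN; CDT, UTC-5, at Fermilab) and assumes, purely for illustration, that "pm" means a 17:00 CERN deadline; the email does not state an hour. The function name fnal_deadline is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# UTC offsets in effect on 14 April 2008: both sites were on daylight time.
CEST = timezone(timedelta(hours=2), "CEST")   # CERN (Geneva)
CDT = timezone(timedelta(hours=-5), "CDT")    # Fermilab (Batavia, IL)


def fnal_deadline(cern_cutoff: datetime) -> datetime:
    """Convert an aware CERN-local cut-off time to Fermilab local time."""
    return cern_cutoff.astimezone(CDT)


# Hypothetical: treat "14 April 2008 pm" as 17:00 CERN time.
cutoff = datetime(2008, 4, 14, 17, 0, tzinfo=CEST)
print(fnal_deadline(cutoff).strftime("%Y-%m-%d %H:%M %Z"))  # 2008-04-14 10:00 CDT
```

Under that assumption, the final run on the 14th would have to complete before about 10:00 local time, and the current 01:00 daily run corresponds to 08:00 at CERN. For dates outside daylight time, the standard library's zoneinfo module (Europe/Zurich, America/Chicago) would apply the correct offsets automatically.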
Providing visibility to Gratia-APEL interface data
The only visibility anyone has had into the Gratia-APEL interface data was via either:
or
- The Monthly Accounting Report mentioned previously.
On April 15th, changes were made to the Gratia-APEL interface (LCG.py) so that it generates 2 files (xml and html formats) containing the Gratia data in the APEL database. The data is extracted immediately after the daily updates are made. This data is currently (and temporarily) viewable on Brian Bockelman's Nebraska web site:
To do:
The following items need to be addressed to incorporate this capability into Gratia or a module of Gratia:
- The xml file used by Brian's site is temporarily being made available on http://home.fnal.gov/~weigand/apel-wlcg/ . This was done to expedite and test the viability of the process.
This could be made available on the Gratia tomcat server in a directory similar to where the static (pdf) reports are: http://gratia.opensciencegrid.org:8880/gratia-reports/apel-wlcg
This may require moving the interface's cron process to the gratia09 host, which in turn would require changing the authorized user in the APEL database (currently only the gratia06 host is authorized). A very nice feature of Brian's web site is that it reads the xml file directly at the source, so no data transfer is required. The web page could therefore be on another machine, as long as the data is accessible from outside the host it resides on.
- Brian's code needs to be moved to the FNAL CVS repository, either as its own module or as a module of Gratia.
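The export step described above (one extract, written in two formats) can be sketched with the standard library alone. This is illustrative only: the column list, element names, and file names below are hypothetical, not the ones LCG.py actually writes. A flat XML layout keeps it easy for a remote page to read the file in place, which is the property noted above for Brian's site.

```python
import html
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical column names; the real APEL table layout may differ.
COLUMNS = ["Site", "VO", "Month", "Year", "Njobs", "NormSumCPU", "NormFactor"]


def rows_to_xml(rows: list[dict]) -> str:
    """Serialize interface rows as a flat XML document."""
    root = ET.Element("apel_data")  # hypothetical root element name
    for row in rows:
        record = ET.SubElement(root, "record")
        for col in COLUMNS:
            ET.SubElement(record, col).text = str(row[col])
    return ET.tostring(root, encoding="unicode")


def rows_to_html(rows: list[dict]) -> str:
    """Render the same rows as a bare HTML table, escaping every value."""
    header = "".join(f"<th>{html.escape(c)}</th>" for c in COLUMNS)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(r[c]))}</td>" for c in COLUMNS) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


def publish(rows: list[dict], outdir: str) -> None:
    """Write both files into a web-served directory chosen by the caller."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "apel_data.xml").write_text(rows_to_xml(rows), encoding="utf-8")
    (out / "apel_data.html").write_text(rows_to_html(rows), encoding="utf-8")


def read_xml(text: str) -> list[dict]:
    """What a remote consumer (such as a reporting page) could do with the XML."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec} for rec in root.findall("record")]
```

Because read_xml only needs the file's text, the consuming page can live on any host that can fetch the published URL.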
Visibility / Maintenance of Normalization factors
The current method of updating normalization factors is completely manual, time-consuming, and not very timely. It is usually not until the Monthly Accounting Reports are issued that anyone realizes the factors are outdated, and then it is a mad rush to get them updated. Additionally, there is no visibility into them other than the documentation here:
https://twiki.grid.iu.edu/twiki/bin/view/Accounting/GratiaInterfacesApelLcg#Normalization_Factor . The EGEE portal does not display this information even though it is included in the Gratia-APEL interface data.
Note: This will not address the value/validity of using SI2K or any other SPECint-type number in determining the Normalization Factor. That is entirely outside the scope of this interface.
Since the normalization factor is a part of the Gratia-APEL interface data, it is now (as of 4/15) visible at the Nebraska site with the other metrics:
Additionally, at the above url, if you scroll down the page, you will see 2 additional tables related to normalization factors:
- GIP Subcluster Information
- Site Normalization Calculation Comparison
In place of the current method of updating the normalization factors, the site admins will advertise the make-up of each subcluster for their site using the GIP. The normalization factor for each subcluster will be displayed in the GIP Subcluster Information table.
Brian, I need you to elaborate on where the SI2K value will come from and on the use of the alter-attributes.conf file, whatever it is. The only documentation I can find on it is here (and it is the 0.8.0 documentation): https://twiki.grid.iu.edu/twiki/bin/view/Integration/ITB090/GenericInformationProviders
The Site Normalization Calculation Comparison table is temporary in nature, to show the differences between the manually maintained factor and the GIP-calculated one. If agreement can be obtained from all sites to use the GIP-generated value, then the need for this table should go away.
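One plausible way to turn per-subcluster GIP data into a single site factor is a core-weighted average. This is a sketch under stated assumptions, not the GIP's or the interface's actual formula: it assumes each subcluster advertises a core count and a per-core SI2K rating, and that factors are expressed relative to a reference of 1000 SI2K (1 kSI2K). The Subcluster class and the function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: normalization factors are quoted relative to 1 kSI2K.
REFERENCE_SI2K = 1000.0


@dataclass
class Subcluster:
    name: str
    cores: int    # number of cores advertised for the subcluster
    si2k: float   # SpecInt2000 rating per core


def subcluster_factor(sc: Subcluster) -> float:
    """Normalization factor for a single subcluster."""
    return sc.si2k / REFERENCE_SI2K


def site_factor(subclusters: list[Subcluster]) -> float:
    """Core-weighted average factor over all subclusters of a site."""
    total_cores = sum(sc.cores for sc in subclusters)
    if total_cores == 0:
        raise ValueError("site advertises no cores")
    weighted = sum(sc.cores * sc.si2k for sc in subclusters)
    return weighted / total_cores / REFERENCE_SI2K


# Made-up example: 100 older cores at 1500 SI2K plus 300 newer cores at 2000 SI2K.
site = [Subcluster("old", 100, 1500.0), Subcluster("new", 300, 2000.0)]
print(site_factor(site))  # 1.875
```

Weighting by cores rather than by subcluster keeps a small, fast subcluster from dominating the site's number; whether the GIP actually weights this way is not stated here.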
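The comparison table boils down to a per-site diff of two lists of factors. A minimal sketch, assuming both the manual and the GIP factors are available as site-to-value mappings; the 5% tolerance and the sample values are made up for illustration, and compare_factors is a hypothetical name.

```python
def compare_factors(manual: dict[str, float], gip: dict[str, float],
                    tolerance: float = 0.05) -> list[tuple]:
    """Rows of (site, manual, gip, relative_diff, flagged), sorted by site.

    A site is flagged when either value is missing, the manual value is zero,
    or the GIP value differs from the manual one by more than `tolerance`
    (a hypothetical 5% default).
    """
    rows = []
    for site in sorted(set(manual) | set(gip)):
        m, g = manual.get(site), gip.get(site)
        if m is None or g is None or m == 0:
            rows.append((site, m, g, None, True))
            continue
        rel = abs(g - m) / m
        rows.append((site, m, g, rel, rel > tolerance))
    return rows


# Made-up factors for illustration only.
for row in compare_factors({"GLOW": 1.2, "Nebraska": 1.0},
                           {"GLOW": 1.5, "Nebraska": 1.02, "MIT_CMS": 1.1}):
    print(row)
```

Sites missing from either side are flagged rather than silently dropped, since those are exactly the cases that would need follow-up before the manual factors can be retired.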
To do:
- Get agreement from all sites/VOs to use the GIP-generated number
- Define the process/procedure for determining the SI2K number when new processors come online.
- Develop the process to mechanically use the GIP-generated number in the interface
- Develop the capability to display the details of the subcluster calculations (already in progress by Brian).
- Determine what data needs to be maintained in the Gratia database or if a separate database is required.
- Determine whether the Gratia probes need to be enhanced to capture better data for the HostDescription field, which is intended to contain information on the processor used when a job was run. This field is very incomplete in the production Gratia database.
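Mechanically using the GIP-generated number in the interface could be as simple as a lookup with a fallback to the manual factor. A sketch only: the record fields (Site, SumCPU, SumWCT, NormSumCPU, NormSumWCT, NormFactor) and the FactorSource tag are hypothetical stand-ins for whatever LCG.py actually writes, and normalize_record is a hypothetical name.

```python
def normalize_record(record: dict, gip: dict[str, float],
                     manual: dict[str, float]) -> dict:
    """Return a copy of `record` with normalized CPU and wall-clock totals.

    Prefers the GIP-generated factor for the site and falls back to the
    manually maintained one; raises KeyError if neither is known.
    """
    site = record["Site"]
    if site in gip:
        factor, source = gip[site], "gip"
    elif site in manual:
        factor, source = manual[site], "manual"
    else:
        raise KeyError(f"no normalization factor for {site}")
    return {
        **record,
        "NormFactor": factor,
        "NormSumCPU": record["SumCPU"] * factor,
        "NormSumWCT": record["SumWCT"] * factor,
        "FactorSource": source,  # hypothetical tag showing where the factor came from
    }
```

Recording which source supplied each factor would make it easy to see which sites still depend on manually maintained values during a transition.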
Ensuring all Tier 1/2 Sites are included in the interface.
It became apparent during the past few days (4/11 - 4/15) that the sites being selected for the Gratia-APEL interface were out-of-date. After reviewing the EGEE Tier1 and Tier2 views manually by drilling down on each "Site", I saw the following discrepancies:
- For CMS sites,
- UFlorida-HPC and UFlorida-IHEPA were not being reported by the interface.
- HEPGRID_UERJ and SPRACE are being reported, but do not appear in the Tier2 drill down and, hence, should not be reported.
- For ATLAS sites,
- UC_Teraport and OU_OSCER_ATLAS are being reported, but do not appear in the Tier2 drill down and, hence, should not be reported.
The 2 Florida sites have been added to the interface (4/15).
The 4 sites that we are reporting, but that do not show in the EGEE view, are still in the interface. I left them in case my manual evaluation was in error.
- The problem with leaving them is that the view of the interface data shown at http://t2.unl.edu/gratia/wlcg_reporting will be misleading.
- Additionally, if we provide the capability of tracking actual vs pledged data (next section), it will likely be even more misleading unless we take this into account.
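The manual drill-down comparison can be automated as two set differences once both site lists are in hand. This sketch assumes the lists have already been fetched (for example, the EGEE view from the org_Tier1/org_Tier2 mapping tables); reconcile is a hypothetical name. Names are compared case-insensitively because spellings drift: the org_Tier2 mapping table lists Uflorida-IHEPA, while these notes write UFlorida-IHEPA.

```python
def reconcile(interface_sites, egee_sites):
    """Compare two site lists case-insensitively.

    Returns (missing_from_interface, not_in_egee_view), each sorted and
    spelled as in the list it came from.
    """
    iface = {s.lower(): s for s in interface_sites}
    egee = {s.lower(): s for s in egee_sites}
    missing = sorted(egee[k] for k in egee.keys() - iface.keys())
    extra = sorted(iface[k] for k in iface.keys() - egee.keys())
    return missing, extra


# Illustrative input based on the discrepancies listed above.
interface = ["Nebraska", "HEPGRID_UERJ", "SPRACE"]
egee_view = ["Nebraska", "UFlorida-HPC", "Uflorida-IHEPA"]
missing, extra = reconcile(interface, egee_view)
print("add to interface:", missing)
print("not in EGEE view:", extra)
```

Run daily against the mapping tables, a check like this would have surfaced the Florida and Brazilian/ATLAS discrepancies without a manual drill-down.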
There are 2 mapping tables in the APEL accounting database (goc-accounting.grid-support.ac.uk) that identify the Tier 1 and Tier 2 groups/sites: org_Tier1 and org_Tier2.
The tables are identical in terms of columns:
| Field | Description |
| Name | Name of the Group/Site that appears in the View on the EGEE portal |
| Path | Hierarchy used when displaying on the EGEE portal |
| RefID | In the org_Tier2 table, indicates a site as opposed to a grouping. In the org_Tier1 table, a reference id into another database table. |
| MapID | Not applicable |
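The Path column encodes the hierarchy as dotted prefixes (1.31 is the USA grouping, 1.31.7 the Florida federation, 1.31.7.2 one of its sites). A small sketch of helpers for walking it, with hypothetical names. Note that a plain prefix test, like the SQL LIKE '1.31%' in the queries that follow, would also match any sibling path that merely starts with the same digits (such as 1.310, if one existed); anchoring the match on the dot avoids that, which in SQL would read Path = '1.31' OR Path LIKE '1.31.%'.

```python
from typing import Optional


def parent_path(path: str) -> Optional[str]:
    """'1.31.7.2' -> '1.31.7'; a top-level path such as '1' has no parent."""
    head, sep, _ = path.rpartition(".")
    return head if sep else None


def depth(path: str) -> int:
    """Number of levels: '1' -> 1, '1.31' -> 2, '1.31.7.2' -> 4."""
    return path.count(".") + 1


def is_under(path: str, root: str) -> bool:
    """True if `path` is `root` itself or any descendant of it (dot-anchored)."""
    return path == root or path.startswith(root + ".")
```

The same dot-anchored rule applies to the Tier 1 query, where Path like '1.4%' would also pick up any 1.40 through 1.49 entries.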
The data relevant to the OSG sites for each table is:
org_Tier1:
select * from org_Tier1 where Path like '1.4%' or Path like '1.10%' order by Path;
+-------------------+--------+-------+-------+
| Name              | Path   | RefID | MapID |
+-------------------+--------+-------+-------+
| US-FNAL-CMS       | 1.10   |     0 |     0 |
| USCMS-FNAL-WC1-CE | 1.10.1 |    16 |     0 |
| US-T1-BNL         | 1.4    |     0 |     0 |
| BNL               | 1.4.1  |     9 |     0 |
| BNL_ATLAS_1       | 1.4.2  |  9998 |     0 |
+-------------------+--------+-------+-------+
org_Tier2:
select * from org_Tier2 where Path like '1.31%' order by Path;
+-----------------+-----------+-------+-------+
| Name            | Path      | RefID | MapID |
+-----------------+-----------+-------+-------+
| USA             | 1.31      |     0 |     0 |
| US-AGLT2        | 1.31.1    |     0 |     0 |
| AGLT2           | 1.31.1.1  |     1 |     0 |
| T2_US_Purdue    | 1.31.10   |     0 |     0 |
| Purdue-Lear     | 1.31.10.1 |     1 |     0 |
| Purdue-RCAC     | 1.31.10.2 |     1 |     0 |
| T2_US_UCSD      | 1.31.11   |     0 |     0 |
| UCSDT2          | 1.31.11.1 |     1 |     0 |
| T2_US_Wisconsin | 1.31.12   |     0 |     0 |
| GLOW            | 1.31.12.1 |     1 |     0 |
| US-NET2         | 1.31.2    |     0 |     0 |
| BU_ATLAS_Tier2  | 1.31.2.1  |     1 |     0 |
| US-MWT2         | 1.31.3    |     0 |     0 |
| MWT2_UC         | 1.31.3.1  |     1 |     0 |
| MWT2_IU         | 1.31.3.2  |     1 |     0 |
| UC_ATLAS_MWT2   | 1.31.3.3  |     1 |     0 |
| IU_OSG          | 1.31.3.4  |     1 |     0 |
| US-SWT2         | 1.31.4    |     0 |     0 |
| OU_OCHEP_SWT2   | 1.31.4.1  |     1 |     0 |
| UTA_SWT2        | 1.31.4.2  |     1 |     0 |
| SWT2_CPB        | 1.31.4.3  |     1 |     0 |
| US-WT2          | 1.31.5    |     0 |     0 |
| PROD_SLAC       | 1.31.5.1  |     1 |     0 |
| T2_US_Caltech   | 1.31.6    |     0 |     0 |
| CIT_CMS_T2      | 1.31.6.1  |     1 |     0 |
| T2_US_Florida   | 1.31.7    |     0 |     0 |
| Uflorida-PG     | 1.31.7.1  |     1 |     0 |
| Uflorida-IHEPA  | 1.31.7.2  |     1 |     0 |
| UFlorida-HPC    | 1.31.7.3  |     1 |     0 |
| T2_US_MIT       | 1.31.8    |     0 |     0 |
| MIT_CMS         | 1.31.8.1  |     1 |     0 |
| T2_US_Nebraska  | 1.31.9    |     0 |     0 |
| Nebraska        | 1.31.9.1  |     1 |     0 |
+-----------------+-----------+-------+-------+
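Turning the org_Tier2 rows into the site list the interface needs is mostly a matter of keeping the rows whose RefID marks a site and grouping them under their parent federation. A sketch under those assumptions: the rows are copied from the Florida and Purdue portions of the listing above rather than queried live (a real version would run the SELECT against the APEL database through a MySQL client library), and sites_by_federation is a hypothetical name.

```python
# (Name, Path, RefID) rows copied from part of the org_Tier2 listing above.
ROWS = [
    ("USA", "1.31", 0),
    ("T2_US_Purdue", "1.31.10", 0),
    ("Purdue-Lear", "1.31.10.1", 1),
    ("Purdue-RCAC", "1.31.10.2", 1),
    ("T2_US_Florida", "1.31.7", 0),
    ("Uflorida-PG", "1.31.7.1", 1),
    ("Uflorida-IHEPA", "1.31.7.2", 1),
    ("UFlorida-HPC", "1.31.7.3", 1),
]


def sites_by_federation(rows, root="1.31"):
    """Map each federation name to its site names (org_Tier2 semantics)."""
    name_of = {path: name for name, path, _ in rows}
    federations: dict[str, list[str]] = {}
    for name, path, ref_id in rows:
        if ref_id == 0:
            continue  # RefID 0 marks a grouping, not a site
        if not (path == root or path.startswith(root + ".")):
            continue  # outside the requested subtree (dot-anchored match)
        parent = path.rpartition(".")[0]
        federations.setdefault(name_of.get(parent, parent), []).append(name)
    return federations


for federation, sites in sorted(sites_by_federation(ROWS).items()):
    print(federation, sites)
```

Generating the interface's site list this way, instead of maintaining it by hand, would keep it in step with the EGEE views whenever the LCG Office updates the mapping tables.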
Tracking actual vs pledged usage
Currently, there is no means to track actual vs pledged usage for the current month. I believe (but do not know for sure) that the emailed Tier 2 Accounting Report pdf file, mentioned in a previous section, is the first view of this metric.
Pledge information (from Ruth, 4/16):
The location of the Tier-2 pledge information we should use is:
To Do:
Brian is currently working on providing this capability for his web site.
Brian, can you give me some details here?
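Once pledge figures are available, the basic actual-vs-pledged metric is delivered normalized CPU divided by what the pledged capacity could deliver over the month. This is a sketch of that arithmetic only, assuming usage is expressed in kSI2K-hours and pledges in kSI2K of capacity; the WLCG report may use other conventions, and percent_of_pledge is a hypothetical name.

```python
import calendar


def hours_in_month(year: int, month: int) -> int:
    """Wall-clock hours in a calendar month (handles leap years)."""
    return calendar.monthrange(year, month)[1] * 24


def percent_of_pledge(delivered_ksi2k_hours: float, pledged_ksi2k: float,
                      year: int, month: int) -> float:
    """Delivered normalized CPU as a percentage of pledged capacity x hours."""
    available = pledged_ksi2k * hours_in_month(year, month)
    if available <= 0:
        raise ValueError("pledge must be positive")
    return 100.0 * delivered_ksi2k_hours / available


# Made-up numbers: a 100 kSI2K pledge in March 2008 (744 hours).
print(percent_of_pledge(37200.0, 100.0, 2008, 3))  # 50.0
```

Computing this daily for the month to date (using the hours elapsed so far instead of the full month) would give the running view that the monthly pdf currently provides only after the fact.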
Contact information on Tier1/2 Site administrators
To Do:
Need some form of contact list for the Tier1/2 site administrators.
Usage of non-LCG VOs at Tier 1/2 sites
To Do:
Need some help here.
Include Tier 3 sites
To Do:
Need some help here.
Other Issues
Major updates
--
JohnWeigand - 17 Apr 2008