Network Performance Toolkit

About this Document

This document is for System Administrators and advanced Grid Users. It describes the usage of tools provided by the Virtual Data Toolkit to evaluate the network performance between resources.

Introduction

The Network Performance Toolkit is a collection of applications provided by the perfSONAR project and distributed by the Open Science Grid. The server components of the Network Performance Toolkit have been installed on dedicated resources of the Open Science Grid. The following client tools are described in this document:

  • Network Diagnostic Tool (NDT)
  • One Way Active Measurement Protocol (OWAMP)
  • Bandwidth Control tool (BWCTL)
  • Network Path and Application Diagnosis (NPAD)

Installation

Client Site Installation

The Network Performance Toolkit is installed with the OSG Client. Specifically, the tools included are BWCTL, NDT, and OWAMP (bwctl-client, bwctl-server, bwctl, ndt, owamp-client). NPAD is currently not included in the OSG Client.

If you just want to install the OSG command-line clients, you can do the following:

yum install bwctl-client
yum install owamp-client
yum install ndt-client
yum install npad-client
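
To confirm which of these packages ended up on the host, you can query the RPM database (rpm prints a "not installed" message for any package that is missing):

[user@client /opt/npt]$ rpm -q bwctl-client owamp-client ndt-client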

You may install these utilities separately as RPMs using yum by following the perfSONAR instructions. The packages are available in the OSG repository for the OSG-supported platforms; some of them have separate client and server versions:

  • NDT: ndt
  • OWAMP: owamp, owamp-client, owamp-server
  • BWCTL: bwctl, bwctl-client, bwctl-server
  • NPAD: npad
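
For example, to set up the server-side daemons using the package names above (a minimal sketch; it assumes the OSG yum repository is already configured on the host):

yum install owamp-server
yum install bwctl-server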

Server Site Installation

The perfSONAR-based tools and services support the following tasks for OSG VOs:

  1. monitor site-to-site network paths and ensure that these paths remain operational
  2. troubleshoot performance problems quickly and efficiently

The server site components can be brought up on demand using the netinstall image provided by the perfSONAR project downloads. Source packages are provided on the perfSONAR home page.

Once the Toolkit server has booted you may begin on-demand testing. The server tools will use a generic set of configuration files. The intent is to make it easy to stand up a temporary server when and where it is needed. However, it is expected that a permanently installed server will be customized/configured, allowing it to support both on-demand testing and regularly scheduled monitoring. See the perfSONAR home page for step-by-step instructions on how to complete this customization process.

Finding Target Servers

Finding servers against which to run on-demand tests can be a major impediment to effectively using these tools. The perfSONAR project tackles this problem by running a registration service for participating tools. The Performance Node ISO automatically uses this Lookup Service to advertise the tools' existence. You can also create custom views by making web-service calls to retrieve the data of interest. See the perfSONAR service page for more details.
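
As an illustration, here is a hypothetical lookup query; the host lookup.example.org is a placeholder, and the records endpoint and service-type filter are assumptions based on the perfSONAR Simple Lookup Service REST API rather than values from this document:

[user@client /opt/npt]$ curl "http://lookup.example.org:8090/lookup/records?type=service&service-type=bwctl"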

We have also requested that ALL OSG sites register their perfSONAR Toolkit installations in OIM (see https://www.opensciencegrid.org/bin/view/Documentation/RegisterPSinOIM). You can use the MyOSG -> Resource Group -> Resource Group Summary page at http://tinyurl.com/mxfmutg to see a list of installed perfSONAR Toolkit hosts. Using this list you can select the "closest" relevant instance to use for running on-demand tests. Alternatively, if you have a perfSONAR Toolkit installation, its web interface has a "Global Services" link you can visit to see ALL perfSONAR instances that have updated the perfSONAR Lookup Service.

Using the Client Tools

The client site components are installed with the OSG Client (see above). These tools support delay measurements (OWAMP), throughput measurements (BWCTL), and advanced diagnostics (NDT and NPAD). The command syntax for each tool is described in the following sub-sections. Each of the client tools listed above communicates with a companion server process to perform a measurement/test.

Network Diagnostic Tool (NDT)

The Network Diagnostic Tool (NDT) runs a series of short tests to determine what the current performance is and what, if anything, is limiting that performance. It can distinguish between host configuration and network infrastructure problems. To diagnose the CE/SE configuration and network connection run the web100clt command:

[user@client /opt/npt]$ web100clt -n <Target Server for Measurement>

[user@client /opt/npt]$  web100clt -n uct2-net1.uchicago.edu
Testing network path for configuration and performance problems  --  Using IPv4 address
Checking for Middleboxes . . . . . . . . . . . . . . . . . .  Done
checking for firewalls . . . . . . . . . . . . . . . . . . .  Done
running 10s outbound test (client to server) . . . . .  939.43 Mb/s
running 10s inbound test (server to client) . . . . . . 940.29 Mb/s
The slowest link in the end-to-end path is a 10 Gbps 10 Gigabit Ethernet/OC-192 subnet
Information: Other network traffic is congesting the link
Information: The receive buffer should be 11780 kbytes to maximize throughput
Server 'uct2-net1.uchicago.edu' is not behind a firewall. [Connection to the ephemeral port was successful]
Client is not behind a firewall. [Connection to the ephemeral port was successful]
Packet size is preserved End-to-End
Server IP addresses are preserved End-to-End
Client IP addresses are preserved End-to-End

More details can be obtained by using the -l command line option to web100clt:

[user@client /opt/npt]$ web100clt -n <Target Server for Measurement> -l

[user@client /opt/npt]$ web100clt -l -n uct2-net1.uchicago.edu
Testing network path for configuration and performance problems  --  Using IPv4 address
Checking for Middleboxes . . . . . . . . . . . . . . . . . .  Done
checking for firewalls . . . . . . . . . . . . . . . . . . .  Done
running 10s outbound test (client to server) . . . . .  940.90 Mb/s
running 10s inbound test (server to client) . . . . . . 940.33 Mb/s
The slowest link in the end-to-end path is a 10 Gbps 10 Gigabit Ethernet/OC-192 subnet
Information: Other network traffic is congesting the link
Information: The receive buffer should be 11743 kbytes to maximize throughput
Server 'uct2-net1.uchicago.edu' is not behind a firewall. [Connection to the ephemeral port was successful]
Client is not behind a firewall. [Connection to the ephemeral port was successful]

	------  Web100 Detailed Analysis  ------

Web100 reports the Round trip time = 9.62 msec;the Packet size = 1448 Bytes; and 
No packet loss was observed.
This connection is network limited 99.27% of the time.

    Web100 reports TCP negotiated the optional Performance Settings to: 
RFC 2018 Selective Acknowledgment: ON
RFC 896 Nagle Algorithm: ON
RFC 3168 Explicit Congestion Notification: OFF
RFC 1323 Time Stamping: ON
RFC 1323 Window Scaling: ON; Scaling Factors - Server=13, Client=7
The theoretical network limit is 597.89 Mbps
The NDT server has a 32768 KByte buffer which limits the throughput to 26616.76 Mbps
Your PC/Workstation has a 3060 KByte buffer which limits the throughput to 2485.67 Mbps
The network based flow control limits the throughput to 1302.53 Mbps

Client Data reports link is '  9', Client Acks report link is '  9'
Server Data reports link is '  9', Server Acks report link is '  8'
Packet size is preserved End-to-End
Server IP addresses are preserved End-to-End
Client IP addresses are preserved End-to-End

To increase the verbosity of the output further, use the -ll option:

[user@client /opt/npt]$ web100clt -n <Target Server for Measurement> -ll

[user@client /opt/npt]$ web100clt -ll -n uct2-net1.uchicago.edu
Testing network path for configuration and performance problems  --  Using IPv4 address
Checking for Middleboxes . . . . . . . . . . . . . . . . . .  Done
checking for firewalls . . . . . . . . . . . . . . . . . . .  Done
running 10s outbound test (client to server) . . . . .  935.88 Mb/s
running 10s inbound test (server to client) . . . . . . 910.21 Mb/s
The slowest link in the end-to-end path is a 10 Gbps 10 Gigabit Ethernet/OC-192 subnet
Information: Other network traffic is congesting the link
Server 'uct2-net1.uchicago.edu' is not behind a firewall. [Connection to the ephemeral port was successful]
Client is not behind a firewall. [Connection to the ephemeral port was successful]

	------  Web100 Detailed Analysis  ------

Web100 reports the Round trip time = 3.72 msec;the Packet size = 1448 Bytes; and 
No packet loss was observed.
This connection is sender limited 72.74% of the time.
This connection is network limited 27.26% of the time.

    Web100 reports TCP negotiated the optional Performance Settings to: 
RFC 2018 Selective Acknowledgment: ON
RFC 896 Nagle Algorithm: ON
RFC 3168 Explicit Congestion Notification: OFF
RFC 1323 Time Stamping: ON
RFC 1323 Window Scaling: ON; Scaling Factors - Server=13, Client=7
The theoretical network limit is 2638.96 Mbps
The NDT server has a 32768 KByte buffer which limits the throughput to 68872.75 Mbps
Your PC/Workstation has a 2717 KByte buffer which limits the throughput to 5710.68 Mbps
The network based flow control limits the throughput to 3349.58 Mbps

Client Data reports link is '  9', Client Acks report link is '  9'
Server Data reports link is '  9', Server Acks report link is '  8'
Packet size is preserved End-to-End
Server IP addresses are preserved End-to-End
Client IP addresses are preserved End-to-End
CurMSS: 1448
X_Rcvbuf: 33554432
X_Sndbuf: 33554432
AckPktsIn: 79187
AckPktsOut: 0
BytesRetrans: 0
CongAvoid: 0
CongestionOverCount: 0
CongestionSignals: 1
CountRTT: 79188
CurCwnd: 1227904
CurRTO: 202
CurRwinRcvd: 2782208
CurRwinSent: 24576
CurSsthresh: 815224
DSACKDups: 0
DataBytesIn: 0
DataBytesOut: 1163175456
DataPktsIn: 0
DataPktsOut: 788529
DupAcksIn: 0
ECNEnabled: 0
FastRetran: 0
MaxCwnd: 1631896
MaxMSS: 1448
MaxRTO: 251
MaxRTT: 82
MaxRwinRcvd: 2782208
MaxRwinSent: 24576
MaxSsthresh: 815224
MinMSS: 1448
MinRTO: 201
MinRTT: 0
MinRwinRcvd: 14720
MinRwinSent: 17896
NagleEnabled: 1
OtherReductions: 0
PktsIn: 79187
PktsOut: 788529
PktsRetrans: 0
RcvWinScale: 13
SACKEnabled: 3
SACKsRcvd: 0
SendStall: 1
SlowStart: 0
SampleRTT: 2
SmoothedRTT: 2
SndWinScale: 7
SndLimTimeRwin: 0
SndLimTimeCwnd: 2746623
SndLimTimeSender: 7328191
SndLimTransRwin: 0
SndLimTransCwnd: 2
SndLimTransSender: 3
SndLimBytesRwin: 0
SndLimBytesCwnd: 330034080
SndLimBytesSender: 833141376
SubsequentTimeouts: 0
SumRTT: 294370
Timeouts: 0
TimestampsEnabled: 1
WinScaleRcvd: 7
WinScaleSent: 13
DupAcksOut: 0
StartTimeUsec: 806483
Duration: 10076637
c2sData: 9
c2sAck: 9
s2cData: 9
s2cAck: 8
half_duplex: 0
link: 100
congestion: 1
bad_cable: 0
mismatch: 0
spd: 923.63
bw: 2638.96
loss: 0.000001268
avgrtt: 3.72
waitsec: 0.00
timesec: 10.00
order: 0.0000
rwintime: 0.0000
sendtime: 0.7274
cwndtime: 0.2726
rwin: 21.2266
swin: 256.0000
cwin: 12.4504
rttsec: 0.003717
Sndbuf: 33554432
aspd: 0.00000
CWND-Limited: 70431.00
minCWNDpeak: -1
maxCWNDpeak: -1
CWNDpeaks: -1

One Way Active Measurement Protocol (OWAMP)

The One Way Active Measurement Protocol (OWAMP) is an advanced version of the common ping program. The OWAMP client owping communicates with an OWAMP server and measures the delay in each direction using NTP-based time stamps. OWAMP can be used to identify delay, loss, and packet reordering problems inside the network. To measure the delay between the CE/SE and the remote server, use the owping command:

[user@client /opt/npt]$ owping <Target Server for Measurement>

[user@client /opt/npt]$ owping uct2-net1.uchicago.edu
owping: FILE=time.c, LINE=112, NTP: Status UNSYNC (clock offset issues likely)
Approximately 13.1 seconds until results available

--- owping statistics from [128.135.158.175]:52420 to [uct2-net1.uchicago.edu]:58713 ---
SID:	80879ed8d1c711bc04f850dfe3c2c601
first:	2011-07-12T13:32:29.123
last:	2011-07-12T13:32:40.510
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 0.287/0.5/1.65 ms, (unsync)
one-way jitter = 0 ms (P95-P50)
TTL not reported
no reordering


--- owping statistics from [uct2-net1.uchicago.edu]:58908 to [128.135.158.175]:41285 ---
SID:	80879eafd1c711bc05be3c10ca8f8d68
first:	2011-07-12T13:32:29.103
last:	2011-07-12T13:32:39.637
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = -0.0234/0.2/0.177 ms, (unsync)
one-way jitter = 0 ms (P95-P50)
TTL not reported
no reordering
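
The default test sends 100 packets, which can miss intermittent loss. A longer run is a reasonable next step; the sketch below assumes the standard owping options -c (packet count) and -i (mean time between packets, in seconds), so check owping -h on your build:

[user@client /opt/npt]$ owping -c 1000 -i 0.1 uct2-net1.uchicago.edu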

Bandwidth Control tool (BWCTL)

The Bandwidth Control tool (BWCTL) consists of a command-line client and a scheduling and policy daemon that wrap the iperf command. BWCTL improves the usability of iperf by avoiding the following problems:

  1. need for remote access to the target host used for measurement
  2. security concerns about leaving an iperf daemon running on the target host

BWCTL supports testing in either direction, or between two remote BWCTL servers from a third location. To measure the current throughput from your SE/CE to the remote server, use the bwctl command:

[user@client /opt/npt]$ bwctl -s <Target Server for Measurement>

[user@client /opt/npt]$ bwctl -a 4 -s uct2-net2.uchicago.edu
bwctl: NTP: Status UNSYNC (clock offset problems likely)
bwctl: Unable to contact a local bwctld: Spawning local tool controller
bwctl: NuttcpAvailable(): We were unable to verify that nuttcp is working. Likely you do not have it installed. exit status: 1: output: exec(nuttcp): No such file or directory
bwctl: Couldn't initialize tool "nuttcp". Disabling it.
bwctl: Using tool: iperf
bwctl: 54 seconds until test results available

RECEIVER START
bwctl: exec_line: iperf -B 128.135.158.227 -s -f b -m -p 5001 -t 10
bwctl: start_tool: 3519488679.844807
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 128.135.158.227
TCP window size: 87380 Byte (default)
------------------------------------------------------------
[ 12] local 128.135.158.227 port 5001 connected with 128.135.158.219 port 5001
[ ID] Interval       Transfer     Bandwidth
[ 12]  0.0-10.0 sec  1181483008 Bytes  941311591 bits/sec
[ 12] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
bwctl: stop_exec: 3519488702.089699

RECEIVER END

Third-party tests, between two remote BWCTL servers, let you measure individual sections of the end-to-end path:

[user@client /opt/npt]$ bwctl -c <1st Server for Measurement> -s <2nd Server for Measurement>

[user@client /opt/npt]$ bwctl -a 4 -s uct2-net2.uchicago.edu -c iut2-net2.iu.edu
bwctl: NTP: Status UNSYNC (clock offset problems likely)
bwctl: Using tool: iperf
bwctl: 16 seconds until test results available

RECEIVER START
bwctl: exec_line: iperf -B 149.165.225.224 -s -f b -m -p 5008 -t 10
bwctl: start_tool: 3519488749.147866
------------------------------------------------------------
Server listening on TCP port 5008
Binding to local address 149.165.225.224
TCP window size: 87380 Byte (default)
------------------------------------------------------------
[ 14] local 149.165.225.224 port 5008 connected with 128.135.158.219 port 5008
[ ID] Interval       Transfer     Bandwidth
[ 14]  0.0-10.2 sec  1199046656 Bytes  936780541 bits/sec
[ 14] MSS size 1448 bytes (MTU 1500 bytes, ethernet)
bwctl: stop_exec: 3519488763.656271

RECEIVER END

Other useful options are -f (output format), -t (length of the test), and -i (reporting interval), as shown in the sketch below.
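
For example, a 30-second test reporting throughput in megabits at 5-second intervals (an illustrative invocation; consult bwctl -h for the exact option semantics in your version):

[user@client /opt/npt]$ bwctl -t 30 -i 5 -f m -s uct2-net2.uchicago.edu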

Network Path and Application Diagnosis (NPAD)

NOTE: Network Path and Application Diagnosis (NPAD) is deprecated and will not be included in future versions of the OSG distribution.

The Network Path and Application Diagnosis (NPAD) tool examines a host and its local network infrastructure to determine what problems, if any, would hinder wide area performance. Issues such as small TCP buffers in switches and routers are detected as well as common host configuration errors. To determine if the CE/SE will achieve maximum performance over a WAN path run the command diag-client:

[user@client /opt/npt]$ diag-client <Target Server for Measurement> 8001 10 50

[user@client /opt/npt]$ diag-client iut2-net2.iu.edu 8001 10 50
Using: rtt 10 ms and rate 50
Connected.
Control connection established.
Waiting for test to start.  Currently there are 1 tests ahead of yours.
port = 8003
Starting test.
Parameters based on 5 ms initial RTT
peakwin=64672 minpackets=3 maxpackets=1283 stepsize=128
Target run length is 3802 packets (or a loss rate of 0.02630195%)
Test 1a (11 seconds): Coarse Scan
Test 1b (11 seconds): ...
Test 1c (11 seconds): ...
Test 1d (11 seconds): ...
Test 1e (11 seconds): ...
Test 1f (11 seconds): ...
Test 1g (11 seconds): ...
Test 1h (11 seconds): ...
Test 1i (11 seconds): ...
Test 1j (11 seconds): ...
Test 1k (11 seconds): ...
Test 1l (11 seconds): ...
Test 1m (11 seconds): ...
Test 1n (11 seconds): ...
Test 1o (11 seconds): ...
Test 2a (9 seconds): Search for the knee
Test 2b (9 seconds): ...
Test 2c (17 seconds): ...
Test 3a (9 seconds): Measure static queue space
Test 3b (9 seconds): ...
Test 3c (17 seconds): ...
Accumulate loss statistics, no more than 20 seconds:
Test 4a (10 seconds): Accumulate loss statistics
Test 4b (10 seconds): ...
report url ServerData/Reports-2011-07/vtbv-ce.uchicago.edu:2011-07-12-18:50:07.html

HELP NOTE
The last two numeric parameters, after the host and port, are the RTT (in ms) and the speed/rate (in Mbps) you need to achieve.
The reason it works this way is that it is meant to 'test' local infrastructure. The idea is that if you were testing to an NPAD server that was 5 ms away on a 1G network, you would get close to that speed even with network flaws.
If you were to supply 80 ms and 1G to the server and there truly was a flaw, the NPAD test would tell you that performance was not achievable, thus enabling you to fix the problem.
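
For example, to check whether the local infrastructure could sustain 1 Gbps to a peer 80 ms away (illustrative values, following the note above):

[user@client /opt/npt]$ diag-client iut2-net2.iu.edu 8001 80 1000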

HELP NOTE
The diag-client command returns a partial URL, enabling easy sharing of results between users and site administrators. To view the results, prepend the Toolkit server's name and port to the returned string. The example above would result in this URL: http://server.this.osg.domain:8002/ServerData/Reports-2011-07/vtbv-ce.uchicago.edu:2011-07-12-18:50:07.html.

Advanced Topic: Scheduled Monitoring

(See http://docs.perfsonar.net/install_quick_start.html for more details.)

In addition to the above on-demand tests, the Performance Toolkit server can be configured to continuously monitor the throughput or delay between your site and peer sites of interest. To begin this monitoring, enter the GUI and ensure that your server is a member of the community or communities of interest. Once that is complete, continue on by selecting either the perfSONAR-BUOY throughput or delay configuration menu item.

pSB-throughput: This utility will run regularly scheduled BWCTL tests between your Toolkit server and the selected peer servers. Results are stored in a database and displayed on the server's web page. You may also use standard web-service calls to retrieve this data for display on remote web servers. This would allow monitoring of a common core infrastructure at a central site, while each site could keep local/customized views.

pSB-delay: This utility will run regularly scheduled OWAMP tests between your Toolkit server and the selected peers. Results are stored in a database and displayed on the server's web page. You may also use standard web-service calls to retrieve this data for display on remote web servers. This would allow monitoring of a common core infrastructure at a central site, while each site could keep local/customized views as required.

Known Issues

Currently none.

References

See also the OSG/WLCG pages on perfSONAR at https://twiki.opensciencegrid.org/bin/view/Documentation/DeployperfSONAR
