Portability Guidelines for PCP QA

Ken McDonell <kenj@kenj.id.au>

September 2018

Background

In this document the generic term "component" will be used to describe a piece of functionality in Performance Co-Pilot (PCP) that could be any of the following:

Obviously the functional goals and feature focus of PCP have been to build tools that help understand hard performance problems seen in some of the world's most complex systems. Over more than two decades, the delivery of components to address these feature requirements has been built on engineering processes and approaches with a technical focus on:

Robustness

Robustness simply means that every PCP application and service either works correctly, or detects that the environment will not support correct execution and is then either omitted from the build, omitted from the packaging, or reports a warning and exits in an orderly fashion.

Some examples may help illustrate this philosophy.

Source Code and Compile Time Portability

Mostly what's been done here is common and good engineering practice. For example, we use configure, conditionals in the GNUmakefiles and assorted sed/awk rewriting scripts to ensure the code will compile cleanly on all platforms. Compiler warnings are enabled and builds are expected to complete without warnings. And in the source code we demand thorough error checking and error handling on all system and library calls.

We've extended the normal concept of macros to include a set of globally defined names that are used for path and directory names, search paths, application names, and application options and arguments. These are defined in src/include/pcp.conf.in, bound to values at build time by configure and installed in /etc/pcp.conf. These can then be used in shell scripts, and applications in C, C++, Perl and Python have run-time access to them via pmGetConfig() or pmGetOptionalConfig(); see for example src/pmie/src/pmie.c.
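For shell scripts this is as simple as sourcing the environment file; a minimal sketch (PCP_BINADM_DIR and PCP_LOG_DIR are just two of the variables defined in /etc/pcp.conf):

# pull the PCP configuration variables into the environment
. /etc/pcp.env
# ... then use them in place of hard-coded paths
echo "PCP binaries below $PCP_BINADM_DIR, logs below $PCP_LOG_DIR"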

Even file pathname separators (/ for the sane world, \ elsewhere) have been abstracted out and pmPathSeparator() is used to construct pathnames from directory components.

At a higher level we don't even try to build code if it is not intended for the target platform.

Packaging Portability

At packaging time we use conditionals to include only those components that can be built and are expected to work for the target platform.

This extends to wrapping some of the prerequisites in conditionals if the prerequisite piece may not be present or may have a different name.

For Debian packaging this means debian/control is built from debian/control.main and the ?{foo} elements of the Build-Depends and Depends clauses are conditionally expanded by the debian/fixcontrol.main script during the build.

For RPM packaging this means using configure to perform macro substitution to create build/rpm/pcp.spec from build/rpm/pcp.spec.in and using %if within the spec file to selectively include packages and adjust the BuildRequires and Requires clauses.

QA Portability

There are purpose-designed QA applications in the qa/src directory and the source code for these applications should follow all of the same guidelines for portability and build resilience as those that apply for the applications and libraries that ship with the main part of PCP. The only possible exception is that error handling can be a little more relaxed for the QA applications as they are used in a more controlled manner.

Use /etc/pcp.conf

All QA test scripts will have the variables set in /etc/pcp.conf placed in the environment using $PCP_DIR/etc/pcp.env which is called from common.rc (with $PCP_DIR typically unset). And all QA tests source common.rc, although the sequence is a little convoluted (and irrelevant for this discussion).

The important thing is that the following environment variables become available to improve the portability of QA test scripts and the output filtering (see below) they must perform.
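For example, a test's filter can rewrite an installation-specific directory back to a fixed token instead of assuming a particular filesystem layout (the use of $PCP_LOG_DIR here is illustrative):

# map the platform's PCP log directory to a fixed token so the
# expected output does not depend on where the logs actually live
sed -e "s@$PCP_LOG_DIR@PCP_LOG_DIR@g"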

Use the predefined variables

PCP QA scripts follow a standard template when created with the qa/new script (which is the recommended way to create new QA test scripts) and then the following local shell variables are available:
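As a hedged sketch of how a few of these are typically used ($seq, $tmp and $sudo are assumed here from the template; qa/new and existing tests are the authoritative reference):

# $seq  - this test's number, e.g. for the $seq.full diagnostics file
# $tmp  - prefix for temporary files, removed when the test exits
# $sudo - run a command with root privileges where needed
pminfo -f sample.bin >$tmp.out 2>&1
$sudo cat $PCP_PMCDCONF_PATH >>$seq.full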

check-vm is your friend, not the enemy

There is a very large set of applications and packages outside of PCP that are required to build PCP from source and/or run PCP QA. The script qa/admin/check-vm tries to capture all of these requirements. check-vm should be used as follows:

Avoid exotic commands and non-standard command arguments

Our QA test scripts are run with sh not bash, and on some platforms these are really different!

Things like an == operator used with test (aka [) as in
$ if [ "$foo" == "bar" ] ...
will not work with the Bourne shell. Instead, use the = operator, e.g.
$ sh
$ x=1
$ [ "$x" = 1 ] && echo yippee
yippee
$ [ "$x" == 1 ] && echo boo
sh: 3: [: 1: unexpected operator

Also, any use of the bash [[ operator
$ ... [[ ... ]]
is going to blow up when presented to a real Bourne shell. In most cases test (aka [) can be used for a straightforward rewrite.
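Where the [[ operator is doing pattern matching, test has no direct equivalent, but case does the job portably; a sketch:

# bash only ...
#    if [[ $foo == bar* ]]; then echo match; fi
# portable Bourne shell equivalent ...
case "$foo"
in
    bar*)
        echo match
        ;;
esac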

Even less portable is any use of the bash $((...)) construct for in-built arithmetic. For these cases, you'll need equivalent logic using expr, e.g.
instead of
x=$(($x+1))
use
x=`expr $x + 1`

Another recurring one is the -i argument for sed; this is not standard and not supported everywhere so just do not use it. The alternative:
$ sed ... <somefile >$tmp.tmp; mv $tmp.tmp somefile
works everywhere for most of the cases in QA. If cross-filesystem linking and a lame mv are in play then the following is even more portable:
$ sed ... <somefile >$tmp.tmp; cp $tmp.tmp somefile; rm $tmp.tmp
If permissions are in play, then you may need:
$ sed ... <somefile >$tmp.tmp; $sudo cp $tmp.tmp somefile; rm $tmp.tmp

Do not use seq as it is not available everywhere. For example, this is not portable:
for i in $(seq 4); do ...; done
but it can be rewritten as follows and this will work everywhere:
i=1; while [ $i -le 4 ]; do ...; i=`expr $i + 1`; done

Output filtering

Each QA test script must produce a standard output stream that is deterministic. The qa/check script that is responsible for running each QA test script (say NNNN) captures the standard output from NNNN and if this matches the expected result in NNNN.out then the test is deemed to have passed, otherwise the standard output from NNNN is saved in NNNN.out.bad and the test is deemed to have failed.

So it is critical that the output of NNNN is deterministic across all platforms and timezones.

Applications used in a QA test script will potentially produce output that is irrelevant to the success or otherwise of the test. The most obvious example is the current date from an application run on the test system. Consequently, most QA test scripts include one or more _filter() functions that are responsible for translating the raw output from the applications run in the test into the deterministic output.

These filtering functions in turn often use a set of standard functions in qa/common.filter that may be applied to the output of common PCP commands and operations.
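A minimal sketch of such a _filter() function (the patterns are purely illustrative; real scripts use the common.filter helpers wherever one exists):

_filter()
{
    # map the obvious sources of non-determinism to fixed tokens:
    # timestamps and the local hostname
    sed \
        -e 's/[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/TIME/g' \
        -e "s/`hostname`/HOSTNAME/g"
}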

There are way too many filtering cases to describe them all here, but the list below is illustrative of the range of techniques that may be needed. And each QA test script may need to employ more than one of the techniques to produce deterministic output for "correct" execution across all platforms.

To the extent that filtering removes or rewrites what would otherwise be the output of a QA test script, there may be some difficulty in triaging test failures if you have only the expected NNNN.out and the observed NNNN.out.bad files. Most QA test scripts will emit unfiltered output and diagnostics to aid triage and these are written to a NNNN.full file which is retained for inspection after the test has been run.

Not run

If QA test script NNNN decides it cannot or should not be run on the current platform, it should create a NNNN.notrun file. Optionally NNNN.notrun contains the reason the test has not been run.

The convenience function _notrun() defined in qa/common.check may be used to create the file, write the reason to the file and exit.
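Typical usage guards on whatever the test needs before doing any real work; the checks below are illustrative only:

# skip this test if optional prerequisites are missing
which valgrind >/dev/null 2>&1 || _notrun "valgrind not installed"
[ -f $PCP_PMDAS_DIR/sample/pmdasample ] || _notrun "sample PMDA not installed"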

Some common reasons for a test to be "not run" are:

PMNS ordering

If a QA test script uses an application that traverses the PMNS for a non-leaf metric name (usually from the command line or a configuration file) then depending on the PMDA(s) involved, there is a chance that the names in the PMNS may be processed in different orders on different platforms.

The simplest solution is to enumerate the PMNS nodes of interest in the QA script using pminfo, sort the list and then have the application being tested operate on the leaf nodes of the PMNS one at a time. qa/647 illustrates an example of this approach.
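A sketch of that approach, using the sample metrics as a stand-in for the non-leaf name of interest:

# enumerate the leaf names below a non-leaf node, impose a fixed
# order, then exercise the application one metric at a time
pminfo sample | LC_COLLATE=POSIX sort \
| while read metric
do
    pmprobe -v $metric
done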

Instance ordering

Some PMDAs maintain their instance domains in a manner that may present instances in non-deterministic order, so while all instances may be present on all platforms, the sequence of the instances within an instance domain or in a pmResult is not the same everywhere.

The QA helper application $here/src/sortinst (source in qa/src/sortinst.c) may be used to re-order the reported instances so the sequence is deterministic. See qa/1180 for an example that uses $here/src/sortinst to re-order pminfo -f output.
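The typical pattern (this assumes, as in qa/1180, that sortinst reads pminfo output on stdin and writes the re-ordered version to stdout):

# report values for a metric with many instances, with the
# instances in a deterministic order
pminfo -f sample.bin | $here/src/sortinst | _filter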

Sort ordering

Unfortunately sort does not produce the same sorted order on all platforms by default. We need to take control of LC_COLLATE and the decision has been made to standardize on POSIX as the collating sequence for PCP QA.

$LC_COLLATE is set and exported into the environment in common.rc, but this is a relatively recent change so LC_COLLATE=POSIX is liberally sprinkled throughout the QA test suite ahead of running sort.
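So any pipeline that depends on sorted output should look like this, whether or not the environment already provides $LC_COLLATE:

... | LC_COLLATE=POSIX sort | ...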

Filesystem directory order

Some PMDAs use readdir() or similar routines to scan a directory's contents to find metrics and/or instances. This practice is certain to expose platform differences as the order of directory entries is unlikely to be deterministic.

Judicious use of sort will be required. See qa/496 for a typical example of how this should be done.

Alternate .out files

For some PMDAs and/or some QA test scripts, no amount of clever engineering will hide the fact that a "correct" test execution on one platform may produce different output to a "correct" test execution on another platform. In these cases the simplest choice is not to have a single NNNN.out file, but rather a set of them, and the QA test script chooses the most appropriate one and links it to NNNN.out when it starts.
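A common pattern is to link the appropriate expected output early in the test; a sketch (the uname test and file suffixes are illustrative):

# pick the expected output that matches this platform
case `uname -s`
in
    Linux)
        ln $seq.out.linux $seq.out || exit 1
        ;;
    *)
        ln $seq.out.other $seq.out || exit 1
        ;;
esac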

Alternate NNNN.out files may be required in the following types of cases: