Notes on using the PCP QA Suite
===============================
The PCP QA Suite is designed with the philosophy that it should
exercise the code in a context as close as possible to the one an
end-user would experience.  For this reason, the PCP software to be
tested should be installed in the "usual" places, with the "usual"
permissions, and operate on the "usual" ports.
In particular, the QA Suite does not execute PCP applications like
pmcd, pmlogger, pminfo, pmie, pmval, etc. from the source tree.
Rather, they need to have been built, packaged and installed on the
local system prior to starting any QA.  Refer to the ../Makepkgs
script for a recipe that may be used to build packages for a variety
of platforms.
Further, the PCP QA Suite exercises and tests aspects of the PCP
packaging, the use of certain local accounts, interaction with system
daemons and init systems, and a number of PCP-related system
administration functions, e.g. stopping and starting PCP services.
Refer to the notes
But this also means the QA Suite may alter existing system
configuration files, and this introduces some risk, so PCP QA should
not be run on production systems.  Historically we have used
developer systems and dedicated QA systems for running the full QA
Suite; VMs are particularly well-suited to this task.
In addition to the base PCP package installation, the sample and
simple PMDAs need to be installed (the QA infrastructure will take
care of this, e.g. by running ./check 0).
There is some local configuration needed ... check the file
"common.config" ... this script uses heuristics to set a number of
interesting variables, specifically:
    The $DISPLAY setting for an X server that is willing to accept
    connections from X clients running on the local machine.  This is
    optional, and if not set any QA tests dependent on this will be
    skipped.
    The hostname for a host running pmcd, preferably a long way away
    (over a WAN) for timing tests.  This is optional, and if not set
    any QA tests dependent on this will be skipped.
    The hostname for a host running pmcd, with a hyphen (-) in the
    hostname.  This is optional, and if not set any QA tests
    dependent on this will be skipped.
Next, mk.qa_hosts is a script that includes heuristics for selecting
and sorting the list of potential remote PCP QA hosts
(qa_hosts.master).  Refer to the comments in qa_hosts.master, and
make appropriate changes.
For each of the potential remote PCP QA hosts, the following must be
done:

(a) PCP installed from packages,
(c) a login for the user "pcpqa" needs to be created, and then set
    up in such a way that ssh/scp will work without the need for any
    password, i.e. these sorts of commands

        $ ssh pcpqa@pcp-qa-host some-command
        $ scp some-file pcpqa@pcp-qa-host:some-dir

    must work correctly when run from the local host.  The "pcpqa"
    user's environment must also be initialized so that their shell's
    path includes all of the PCP binary directories (identify these
    with $ grep BIN /etc/pcp.conf), so that all PCP commands are
    executable without full pathnames.  Of most concern is the
    auxiliary directory (usually /usr/pcp/bin, /usr/share/pcp/bin or
    /usr/libexec/pcp/bin) where commands like pmlogger(1),
    pmhostname(1), mkaf(1), etc. are installed.  And finally, the
    "pcpqa" user needs to be included in the group "pcp".
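One way to sketch the passwordless ssh setup above; the key path and
remote host name below are illustrative only, not a mandated layout:

```shell
# Sketch only: generate a dedicated ssh key for the pcpqa user.
# The key path and remote host name are illustrative.
keydir=$(mktemp -d)
ssh-keygen -q -t ed25519 -N "" -f "$keydir/id_pcpqa"

# Install the public key on each remote QA host, e.g.
#   ssh-copy-id -i "$keydir/id_pcpqa.pub" pcpqa@pcp-qa-host
# then confirm that no password prompt appears:
#   ssh pcpqa@pcp-qa-host true
ls "$keydir"
```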
Once you've modified common.config and qa_hosts.master, run
"chk.setup" to validate the settings.
For test 051 we need five local hostnames that are valid, although
PCP does not need to be installed on those hosts, nor pmcd(1)
running.  The five hosts listed in 051.hosts (the comments at the
start of that file explain what is required) should suffice for most
installations.
The PCP QA tests are designed to be run by a non-root user.  Where
root privileges are needed, e.g. to stop or start pmcd, or to
install or remove PMDAs, the "sudo" application is used.  When using
sudo for QA, your current or pcpqa user needs to be able to execute
commands as root without being prompted for a password.  This can be
achieved by adding the following line to the /etc/sudoers file (or,
in more recent versions of sudo, a /etc/sudoers.d/pcpqa file):
    pcpqa ALL=(ALL) NOPASSWD: ALL
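One way to stage and syntax-check such a rule before installing it;
the file names here are illustrative, and visudo may not be available
on every system:

```shell
# Stage the sudoers rule in a temporary file and check its syntax
# before copying it into place (paths here are illustrative).
f=$(mktemp)
echo 'pcpqa ALL=(ALL) NOPASSWD: ALL' > "$f"
if command -v visudo >/dev/null 2>&1
then
    visudo -cf "$f"       # parse-only check, reports syntax errors
else
    echo "visudo not found; skipping syntax check"
fi
# To install (as root):
#   cp "$f" /etc/sudoers.d/pcpqa && chmod 440 /etc/sudoers.d/pcpqa
```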
Some tests are graphical, and wish to make use of your display.  For
authentication to succeed, you may find you need to perform some
access list updates, e.g. "xhost +local:", for such tests to pass.
You can now verify your QA setup by running:
The first time you run "check" (see below) it will descend into the
src directory (see below) and make all of the QA test programs and
dynamic PCP archives, so some patience may be required.
If test 000 fails, it may be that you have locally developed PMDAs
or optional PMDAs installed.  Edit common.filter, and modify the
_filter_top_pmns() procedure to strip the top-level name components
for any new metric names (there are lots of examples already there)
... if these are distributed (shipped) PMDAs, please update the list.
Firewalls can get in the way.  In addition to the standard pmcd
port(s) (TCP ports 44321, 44322 and 44323), one needs to open ports
to allow incoming and outgoing connections on a range of ports for
pmdatrace, pmlogger connections via pmlc, and some QA tests.
Opening the TCP range 4320 to 4350 (inclusive) should suffice.
If the avahi services are to be tested, then the firewall also needs
to allow mDNS traffic (UDP, port 5353), for both external and
internal
The check script runs tests and verifies the output.  In general,
test NNN is expected to terminate with an exit status of 0, leave no
core file, and produce output that matches that in the file NNN.out
... failures leave the current output in NNN.out.bad, and may leave
a more verbose trace in NNN.full that is useful for diagnosing
failures.
The command line options to check are:

    NNN      run test NNN (leading zeros will be added as necessary
             to the test sequence number, so 00N and N are
             equivalent)

    NNN-     all tests >= NNN

    NNN-MMM  all tests in the range NNN ... MMM

    -l       diffs in line mode (the default is to use xdiff or
             similar)

    -n       show me, do not run any tests

    -q       quick mode, by-pass the initial setup integrity checks
             (recommended that you do not use this the first time,
             nor if the last test run failed)

    -g xxx   include tests from a named group (xxx) ... refer to the
             "group" file

    -x xxx   exclude tests from a named group (xxx) ... refer to the
             "group" file
If none of the NNN variants or -g is specified, then the default is
to run all of the tests.
Each of the NNN scripts that may be run by check follows the same
basic structure:
- include some optional shell procedures and set variables to
  define the local configuration options
- optionally, check the run-time environment to see if it makes
  sense to run the test at all, and if not echo the reason to the
  file NNN.notrun and exit ... check will notice the NNN.notrun
  file and skip any testing of the exit status or comparison of
  the output
- define $tmp as a prefix to be used for all temporary files, and
  install a trap handler to remove temporary files when the script
  exits
- optionally, check the run-time environment to choose one of
  a number of expected output formats, and link the selected
  file to NNN.out ... if the same output is expected in all
  environments, the NNN.out file will already exist as part of
  the PCP QA distribution
- optionally save all the output in the file NNN.full ... this
  is only useful for debugging test failures
- filter the output to produce deterministic output that will
  match NNN.out if the test has been successful
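A minimal sketch of that structure follows; the probe, messages and
file names here are invented for illustration, not taken from any
real NNN script:

```shell
#!/bin/sh
# Illustrative skeleton of a QA test script.
seq=demo                         # a real script would use its test number

tmp=/tmp/$$                      # prefix for all temporary files
trap "rm -f $tmp.*; exit" 0 1 2 3 15   # clean up on exit or interrupt

# "not run" protocol: record the reason and exit without failing
if ! command -v sh >/dev/null 2>&1
then
    echo "sh not installed" >$seq.notrun
    exit
fi

# do the real work, optionally keeping a verbose trace
echo "raw output" >$tmp.raw
cat $tmp.raw                     # a real test would filter this so it
                                 # matches the checked-in $seq.out file
```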
This script creates a new NNN.out file.  Since the NNN.out files are
precious, and reflect the state of the qualified and expected
output, they should typically not be changed unless some change has
been made to the NNN script or the filters it uses.
Make sure "group" is writable, then run "new" to create the skeletal
framework of a new test.
It is strongly suggested that you base your test on an existing test
... pay particular attention to making the output deterministic.
Make sure the test uses the "not run" protocols (see 009 and check
for examples) to avoid running the test (and hence failing) if an
optional application, feature or platform is not available, and uses
appropriate filters (see common.filter for lots of useful filters
already packaged as shell procedures).
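For example, a filter typically rewrites nondeterministic fields such
as process IDs and timestamps into fixed tokens; the sample input
line below is invented for this sketch:

```shell
# Rewrite nondeterministic fields (PID, date) into fixed tokens so
# the output can be compared against a checked-in NNN.out file.
# The input line is invented for this sketch.
filtered=$(echo "pmcd (pid 12345) started Thu Jan  1 10:23:45 2024" |
    sed \
        -e 's/pid [0-9][0-9]*/pid PID/' \
        -e 's/started .*/started DATE/')
echo "$filtered"
```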
Report differences between the NNN.out and NNN.out.bad files.  By
default, all of the NNN.out.bad files in the current directory are
used, but test numbers or ranges of test numbers may also be
specified on the command line.

Other options may be used to fetch good and bad output files from
various exotic remote locations (refer to the script).
Make in the src Directory
-------------------------
The src directory contains a number of test applications that are
designed to exercise some of the more exotic corners of the PCP
libraries.
In making these applications, you may see this ...

    Error: trace_dev.h and ../../src/include/trace_dev.h are different!
    make: [trace_dev.h] Error 1 (ignored)
This is caused by the source for the pcp_trace library being out of
sync with the src applications.  If this happens, please ...

2. diff -u trace_dev.h ../../src/include/trace_dev.h
   and mail the differences to pcp@oss.sgi.com so we can refine the
   Makefiles to avoid cosmetic differences
3. mv trace_dev.h trace_dev.h.orig
   cp ../../src/include/trace_dev.h trace_dev.h
Test 008 depends on the local disk configuration, so you need to
make your own 008.out file (or rather a variant that 008 will link
to 008.out when the test is run).  Refer to the 008 script, but here
is

    $ touch 008.out.`hostname`

    $ mv 008.out 008.out.`hostname`
Be aware that test 008 can be adversely influenced by temporary
disks like USB sticks, mobile phones, or other transient storage
that may come and go on your test systems.
If you find something that does not work, and fix it, or create
additional QA tests, please send the details to pcp@oss.sgi.com.