








                         Testing?  What testing?

                               Peter Miller
                           Platypus Technology


                                 ABSTRACT


            This  paper  presents  a  simplistic  yet powerful
            model of what a test is.  When you intend to  test
            your software, you have to design your software to
            be testable.  This paper will  examine  attributes
            of  software implied by this model.  Some examples
            of automated testing will be given.





       1.  What is a test?



       The core thesis of this paper is the  idea [1]  that  a  test
       consists  of  three  things:  a system in a defined state, a
       defined transaction, and  a  confirmation  that  the  system
       arrives in a defined state.
       [40m[0m
                          initial[40m-------destination[0m
                           state[40mtransactionstate[0m



       This is an overly simplistic statement, but remains
       remarkably useful.  The "system" under test could be a
       simple object, a collection of interrelated objects, a whole
       application, or a distributed multi-layer client-server
       system.  Equally, the transaction could be a single byte of
       input, a single edge of a state transition diagram, or a
       series of transactions lumped together and considered as a
       single event.

       Confirming that the system under test has arrived in a
       particular state can be done in many ways.  Some states are
       clearly visible, some are available but not useful, and some
       internal states are not for user consumption and are much
       harder to access and therefore harder to confirm.

       ____________________

       1. There is a growing body of knowledge called "Transaction
          Based Testing" or sometimes "Transaction Based
          Verification".

       Testing?  What testing? Peter Miller                  Page 1
       [40m[0m
                          [40minitial      destination[0m
                           [40mstate          state[0m



                                          [4m[40moops[0m



       Please note that this is a simplistic definition of a test.
       It does not cover all forms of testing (such as tests of
       usability, maintainability, portability, robustness and the
       rest of the zillion software sub-characteristics listed in
       ISO 9126) and it is no substitute for a well thought out
       test plan.  It does, however, provide some language for
       talking about functional testing.


       [1m[40m2.  Manual testing is no testing[0m



       [40mHumans  are really bad at boring, repetitive tasks.  If your[0m
       [40mtest plan  is  based  on  the  idea  that  your  staff  will[0m
       [40mfaithfully  execute  a long list of printed instructions, at[0m
       [40mleast once per release, then your testing  is  probably  not[0m
       [40meffective.[0m

       For example, many manual test plans contain long sequences
       of things the operator is required to do, often with
       information on the screen to be confirmed as correct.  This
       is all very well for successful tests, but what happens when
       one fails?  Usually, these test scripts cover large numbers
       of behaviors.  There is thus a motivation to complete the
       rest of the script, rather than stop and have to redo the
       start of the script once the software has been fixed.
       [40m[0m
                                          [40m+[0m
                                           [40m+[0m
                                           [40m+[0m
                                           [40m+[0m



       There are two themes here: (a) testers have to look
       "productive" or they might not get paid, and (b) redoing the
       first bit again and again is boring.

       Let's look at that definition again, rephrasing what our
       manual test scripts are doing.  "Usually, these test scripts
       start from a defined state, and define a transaction and a
       confirmation of the destination state, then the next
       transaction and confirmation, ad nauseam."  Now, what
       happens when one of those confirmations fails?  Well, we
       know the system is in the wrong state, so if we go on to
       execute the rest of the script, we are no longer fulfilling
       the initial portion of our three-part definition: we aren't
       in the defined state that the next transaction is to be
       applied to.  After the first failure, the rest of the
       results carry no information.
       [40m[0m
                                     [4m[40moops[0m










       For effective testing, then, you need something that is very
       good at accurately repeating the same script over and over
       again, and at reporting promptly when something goes wrong.
       Computers are very good at boring, repetitious tasks.  They
       don't complain when you ask them to run the same stupid
       scripts tens or even thousands of times.  And if the script
       breaks, they stop.  For effective testing, then, you need
       automated testing.  Let the humans write the tests, and let
       the computers run the tests.

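       The "if the script breaks, they stop" behaviour can be
       sketched in a few lines of shell.  This is only an
       illustration, not a real test harness: run_steps and the
       three step functions are hypothetical names invented for
       this example.

```shell
#!/bin/sh
# Hypothetical runner: apply each step in order and stop at the first
# failed confirmation, because later results would carry no information.
run_steps() {
    for step in "$@"; do
        if ! "$step"; then
            echo "FAILED at $step; remaining steps not run"
            return 1
        fi
        echo "ok $step"
    done
    return 0
}

# Three illustrative steps; the second one fails on purpose.
step_one()   { [ "$(echo abc | tr '[:lower:]' '[:upper:]')" = "ABC" ]; }
step_two()   { [ "$(echo abc | tr '[:lower:]' '[:upper:]')" = "abc" ]; }
step_three() { true; }

if report=$(run_steps step_one step_two step_three); then
    status=0
else
    status=$?
fi
echo "$report"
```

       The third step is never executed: once step_two has failed,
       the system is no longer in the state step_three expects.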

       [1m[40m3.  Software Attributes[0m



       [40mAutomated testing requires the ability to automatically  get[0m
       [40mthe  system  under test into a defined state, the ability to[0m
       [40mautomatically apply one or more transaction, and the ability[0m
       [40mto automatically confirm the current state (either read-and-[0m
       [40mcompare, or write-and-diff, usually).[0m
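
       Some things are easy to test, e.g.

        # initial state: input and sed script (contents on stdin)
        cat > test.in
        cat > test.sed
        # expected destination state (contents on stdin)
        cat > expected-output
        # transaction: run the program under test
        sed-clone -f test.sed test.in \
            > test.out
        # confirmation: any difference is a failure
        diff expected-output test.out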
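
       For a version that can be run as written, here is a sketch
       under two assumptions: the system sed stands in for
       sed-clone, and the file contents are invented for the
       example.  It uses the write-and-diff style of confirmation.

```shell
#!/bin/sh
# Assumptions: the system sed stands in for "sed-clone", and the file
# contents below are invented purely for illustration.
work=$(mktemp -d)

# 1. Defined initial state: the input text and the script to apply.
printf 'hello world\n' > "$work/test.in"
printf 's/hello/goodbye/\n' > "$work/test.sed"
printf 'goodbye world\n' > "$work/expected-output"

# 2. Defined transaction: run the program under test.
sed -f "$work/test.sed" "$work/test.in" > "$work/test.out"

# 3. Confirmation: any difference from the expected output is a failure.
if diff "$work/expected-output" "$work/test.out"; then
    result=pass
else
    result=fail
fi
echo "$result"
```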

       But some things require specific changes to get the three
       properties.  E.g. a virtual machine simulator needs the
       ability to set registers and stack, etc., and later to dump
       them so they can be confirmed.  This may be observable,
       e.g., as some interesting opcodes present only in the
       simulator and not in the real machine, maybe one to get the
       simulator to exit with a success/fail indicator.

       3.1  Initial State

       The system under test needs a way to be placed in a well
       defined initial state.  This is something that most programs
       are reasonably good at.  Word processors can load a file,
       image processing systems can load an image, databases can be
       created and populated with test sets, etc.

       It was mentioned above that a transaction can actually be a
       series of transactions.  Sometimes, getting the system under
       test into a defined state requires starting from the default
       state and applying a series of known-to-work transactions.
       Provided that you can get the system under test into a
       defined state automatically, it can be tested automatically.

       3.2  Transactions

       Automating transactions can often be the hardest part of
       automated testing.  Usually, this means automating the
       simulation of input.  This could be user input, or a network
       connection, or a hardware simulation for an embedded
       application.

       [4m[40m3.2.1[24m  [4mCommand[24m [4mLine[0m

       [40mThe design  of  UNIX  makes  the  testing  of  command  line[0m
       [40mprograms  relatively  simple, because you can redirect input[0m
       [40mfrom a file.  This means that you  don't  actually  need  to[0m
       [40mchange your software (or not much, anyway).[0m

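       As a small sketch of input redirection, the following uses
       tr as a stand-in for a command line program that normally
       reads what the user types.  The choice of tr and the file
       contents are assumptions made for the example.

```shell
#!/bin/sh
# `tr` stands in for a command-line program that reads standard input
# (an assumption for illustration; any such filter would do).
work=$(mktemp -d)

# The scripted "typing": what a user would otherwise enter by hand.
printf 'first line\nsecond line\n' > "$work/test.in"
printf 'FIRST LINE\nSECOND LINE\n' > "$work/expected-output"

# The same program a user would run at a terminal, driven from a file.
tr '[:lower:]' '[:upper:]' < "$work/test.in" > "$work/test.out"

if cmp -s "$work/expected-output" "$work/test.out"; then
    result=pass
else
    result=fail
fi
echo "$result"
```
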
       [4m[40m3.2.2[24m  [4mFull[24m [4mScreen[0m

       [40mFull-screen  programs  are  often  similar, with input again[0m
       [40mdirected from a file, although  you  may  need  to  make  it[0m
       [40mtolerant  of  non-tty  input possibly under the control of a[0m
       [40mcommand line option.  The trickier cases can be handled with[0m
       [4m[40mexpect[24m.[0m

       [4m[40m3.2.3[24m  [4mGUI[0m

       [40mOn the other hand GUI interfaces are harder.  There are some[0m
       [40mutilities, such as [4mTkReplay[24m which help.  But they lead us to[0m
       [40mlooking  at the problem differently: where can we inject the[0m
       [40minput?[0m
       [40mWe can inject it into the X server (or have a fake X  server[0m
       [40mwhich exists solely to provide test input).[0m
       [40mWe  can  proxy  the  X  server, and inject the input via the[0m
       [40mproxy.[0m
       [40mWe can inject it into the event  loop  of  our  application.[0m
       [40mThis, of course, requires changing the system under test.[0m

       
       Testing?  What testing? Peter Miller                  Page 4





                                   - 5 -



       [40mWe  can  have  alternate  input classes, a "real" one and an[0m
       [40m"automated" one.  This, of course,  means  that  the  "real"[0m
       [40minput  class  doesn't get tested, but the rest of the system[0m
       [40mdoes, and that may be enough.[0m

       3.2.4  Client Server

       Most of the techniques useful for X programs work for client
       server systems as well: fake clients, fake servers, proxies,
       alternative input classes, etc.

       [4m[40m3.2.5[24m  [4mObservation[0m

       [40mIn order to test the system, some aspect of it was  changed.[0m
       [40mAuxiliary  test support, more tolerant input, multiple input[0m
       [40msources.[0m

       3.3  Verify State

       Some programs, such as the sed example given above, are
       relatively easy to test.  Many programs store a significant
       amount of state when you save to a file, and this may be
       compared with diff(1) or cmp(1).  Other systems, however,
       are more challenging.

       [4m[40m3.3.1[24m  [4mFull[24m [4mScreen[0m

       [40mMany [4mcurses[24m(3) programs need a special command to  dump  the[0m
       [40mscreen into a text file for comparison using [4mdiff[24m(1).  It is[0m
       [40malso possible to use [4mexpect[24m in many cases.[0m

       3.3.2  GUI

       Many of the input solutions also work for output, but you
       will probably need special commands or options to get screen
       dumps at strategic moments, for comparison.

       Wholesale capture and comparison of the output stream is
       problematic, usually because of gratuitous differences not
       relevant to the test.

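       A common way around gratuitous differences is to normalize
       the captured output before comparing it.  The sketch below
       assumes an invented log format with a timestamp and a
       process id; the normalize function and the file names are
       hypothetical.

```shell
#!/bin/sh
# Hypothetical captured output containing a timestamp and a process id,
# which differ between runs but are irrelevant to the test.
work=$(mktemp -d)
printf '2024-01-05 10:22:31 [pid 4121] saved 3 records\n' > "$work/run1.out"
printf '2024-02-17 08:01:09 [pid 977] saved 3 records\n'  > "$work/run2.out"

# Replace the volatile fields with fixed placeholders before comparing.
normalize() {
    sed -e 's/^[0-9-]* [0-9:]*/DATE TIME/' \
        -e 's/pid [0-9]*/pid N/' "$1"
}
normalize "$work/run1.out" > "$work/run1.norm"
normalize "$work/run2.out" > "$work/run2.norm"

# The raw captures differ; the normalized ones do not.
if cmp -s "$work/run1.out" "$work/run2.out"; then
    raw=same
else
    raw=different
fi
if cmp -s "$work/run1.norm" "$work/run2.norm"; then
    filtered=same
else
    filtered=different
fi
echo "raw: $raw, filtered: $filtered"
```
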
       [4m[40m3.3.3[24m  [4mClient[24m [4mServer[0m

       [40mYou can use bogus clients, bogus servers, or clever proxies.[0m

       3.3.4  Observation

       Again, in order to test the system, some aspect of it was
       changed: auxiliary test support, captured output, multiple
       output destinations.


       [1m[40m4.  Discussion[0m


       
       Testing?  What testing? Peter Miller                  Page 5





                                   - 6 -





       [40mThere  are  some  things  which  arise from consideration of[0m
       [40mthese ideas.[0m

       4.1  No Result

       In coming up with a testing regime, it is necessary to
       remember that tests do not simply pass or fail.

       This is further complicated by the inverted sense of some
       tests.  For example, your development process may require
       that a bug fix be accompanied by a test which fails on the
       unfixed system, and passes on the fixed system.

       Consider the issues in achieving a necessary initial state
       by applying transactions to a default state.  What happens
       when one of these transactions, which are not the
       transaction under test, fails?  In such a case the test
       can't fail, because in the bug fix case (where failure on
       the unfixed system is the expected outcome) this would give
       a false positive, but equally it can't succeed, because that
       would render the test meaningless.

       The solution is to have a third result, often called "no
       result", which when negated still means "no result".

       Similar problems can occur with the transaction and
       verification stages of the test.
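
       The three-valued result can be sketched with exit statuses.
       The convention used here (0 for pass, 1 for fail, 2 for no
       result) is an assumption for the example, as are the
       function names run_test and negate.

```shell
#!/bin/sh
# Assumed exit-status convention: 0 = pass, 1 = fail, 2 = no result.
PASS=0 FAIL=1 NO_RESULT=2

# A test whose set-up is broken reports "no result", not "fail".
run_test() {
    setup_ok=$1 confirm_ok=$2
    [ "$setup_ok" = yes ] || return "$NO_RESULT"
    [ "$confirm_ok" = yes ] || return "$FAIL"
    return "$PASS"
}

# Inverted sense, for a bug-fix test run against the unfixed system:
# pass and fail swap, but "no result" stays "no result".
negate() {
    case $1 in
        0) return 1 ;;
        1) return 0 ;;
        *) return 2 ;;
    esac
}

# Set-up fine, confirmation fails: a fail, which negates to a pass.
run_test yes no && raw=0 || raw=$?
negate "$raw" && fails_on_old=0 || fails_on_old=$?

# Set-up broken: no result, and it is still no result after negation.
run_test no no && raw=0 || raw=$?
negate "$raw" && broken_setup=0 || broken_setup=$?

echo "$fails_on_old $broken_setup"
```

       Without the third value, the broken set-up would be reported
       as a failure, and the negation would turn it into a pass.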

       4.2  Negative Testing

       Some other examples of negative testing will be given (i.e.
       the system didn't arrive in the right state, or invalid
       transactions resulting in an invalid state change).

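       As one possible shape for such a test, the sketch below
       applies an invalid transaction and confirms both that it is
       rejected and that the state is unchanged.  The counter file
       and the apply function are a hypothetical system under test,
       invented for the example.

```shell
#!/bin/sh
# Hypothetical system under test: a one-line counter file and an `apply`
# function that accepts only the transactions "inc" and "dec".
work=$(mktemp -d)
echo 5 > "$work/state"

apply() {
    value=$(cat "$work/state")
    case $1 in
        inc) echo $((value + 1)) > "$work/state" ;;
        dec) echo $((value - 1)) > "$work/state" ;;
        *)   echo "unknown transaction: $1" >&2; return 1 ;;
    esac
}

# Negative test: the invalid transaction must be rejected, and the
# state must be exactly what it was before.
cp "$work/state" "$work/before"
if apply bogus 2> /dev/null; then
    result=fail            # an invalid transaction was accepted
elif cmp -s "$work/before" "$work/state"; then
    result=pass            # rejected, state untouched
else
    result=fail            # rejected, but the state changed anyway
fi
echo "$result"
```
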
       [1m[40m4.3  Watch Me[0m

       [40mA useful facility for creating tests is a "watch  me"  mode.[0m
       [40mThis  is a mode or tool or whatnot that allows the system to[0m
       [40mrecord  inputs  and  output  for  replay  and   confirmation[0m
       [40m(respectively)  at  a  later time.  While this is [4mnot[24m one of[0m
       [40mthe necessary attributes, it is often a useful side  effect.[0m
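
       For programs that read standard input and write standard
       output, a crude "watch me" can be improvised with tee(1).
       In this sketch sort stands in for the system under test and
       the file names are made up.

```shell
#!/bin/sh
# `sort` stands in for the system under test; file names are made up.
work=$(mktemp -d)

# Watch-me session: tee records the input as it is supplied and the
# output as it is produced.  The printf plays the part of the user.
printf 'pear\napple\nfig\n' |
    tee "$work/recorded.in" |
    sort |
    tee "$work/recorded.out" > /dev/null

# Later replay: feed the recorded input back in and compare the new
# output with the recorded output.
sort < "$work/recorded.in" > "$work/replay.out"
if cmp -s "$work/recorded.out" "$work/replay.out"; then
    result=pass
else
    result=fail
fi
echo "$result"
```

       A recording is only as good as the session it captured, so
       the recorded output still needs to be checked by a human
       once before it is trusted as the expected result.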

       4.4  Assert

       This simple model of testing gives a different spin on the
       humble assert statement.  The use of assert can be thought
       of as verifying that the system is in a particular state, or
       that the transaction (input) is valid.  This is not the kind
       of artifact you want to see in production code; it is
       usually compiled out of production code.




       
       [40m[1m4.5  Trace on Request[0m

       [40mAnother thing which is often compiled out of production code[0m
       [40mis  a  variety of tracing macros, which allow you to see the[0m
       [40mstate  of  various  portions  of  the  system  as  they  are[0m
       [40mexecuted.   You  sometimes  see  this in production systems,[0m
       [40mwhene there is little performance impact;  it  is  extremely[0m
       [40museful feature for tech support, as well as testing.[0m

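       The text has compiled-in tracing macros in mind, but the
       same idea can be shown with a run-time switch in shell.  The
       trace function, the TRACE variable and the double function
       are all hypothetical names for this sketch.

```shell
#!/bin/sh
# Hypothetical trace helper: silent unless the TRACE variable is set to 1.
trace() {
    if [ "${TRACE:-0}" = 1 ]; then
        echo "trace: $*" >&2
    fi
}

# An illustrative operation that reports its state as it runs.
double() {
    trace "double: input=$1"
    echo $(( $1 * 2 ))
}

# Same operation, with tracing off and then on; stderr is captured too.
quiet=$(TRACE=0; double 21 2>&1)
traced=$(TRACE=1; double 21 2>&1)
echo "$quiet"
echo "$traced"
```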

       5.  Testing?  What testing?



       I once worked on an image processing system for which the
       company had partial source, and the inner workings were
       supplied as a library from the vendor.  One of the
       transforms had some trouble, and I fixed it, but then I
       wondered how I should test it.  How many of us can confirm
       visually that a 2D Walsh-Hadamard transform has worked
       correctly?  While the destination state was visible on the
       screen, if you give humans two side-by-side pictures (a
       "does it look like this" manual test) you will almost
       certainly get a false positive.  Consider those "find 10
       differences" cartoon pictures in the funnies section of the
       newspaper.  If humans are so bad at spotting gross
       differences, how can we expect them to find one pixel
       different in a million?  So, I looked for the tool to
       compare two images and tell me how many pixels were
       different.  There wasn't one.  How did the vendor test their
       product?
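
       For the simplest image formats such a tool is only a few
       lines.  The sketch below assumes raw, headerless images of
       equal size with one byte per pixel, so that one differing
       byte is one differing pixel; the file names and pixel data
       are invented.  It says nothing about how any vendor tested
       anything.

```shell
#!/bin/sh
# Assumption: images are raw, headerless, 8 bits per pixel and the same
# size, so one differing byte is one differing pixel.
work=$(mktemp -d)
printf 'AAAAAAAAAAAAAAAA' > "$work/expected.raw"    # a 4x4 "image"
printf 'AAAAABAAAAAAAACA' > "$work/actual.raw"      # two pixels altered

# cmp -l prints one line per differing byte; count the lines.
# tr removes the padding that some versions of wc add.
different=$(cmp -l "$work/expected.raw" "$work/actual.raw" | wc -l | tr -d ' ')
echo "$different pixels differ"
```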

       If you have testability as a requirement of your software,
       you will write different software than if testability were
       not a requirement.

       Do all the tools we use every day have these three
       properties: Can their initial state be loaded automatically?
       Can their transactions be applied automatically?  Can their
       destination state be confirmed automatically?  If any one of
       these is missing (usually the last one), what gives us any
       confidence that they were tested at all?














       