cprover
/builddir/build/BUILD/cbmc-cbmc-5.8/doc/cprover-manual.md
Go to the documentation of this file.
1 \ingroup module_hidden
2 \page cprover-manual CProver User Manual
3 
4 \author Daniel Kroening
5 
6 This tutorial is intended for users of CProver (CBMC, SatAbs, and
7 associated tools).
8 
9 \tableofcontents
10 
11 \section man_introduction Introduction
12 
13 ## Motivation
14 
15 Numerous tools to hunt down functional design flaws in silicon have been
16 available for many years, mainly due to the enormous cost of hardware
17 bugs. The use of such tools is wide-spread. In contrast, the market for
18 tools that address the need for quality software is still in its
19 infancy.
20 
21 Research in software quality has an enormous breadth. We focus the
22 presentation using two criteria:
23 
24 1. We believe that any form of quality requires a specific *guarantee*,
25  in theory and practice.
26 2. The sheer size of software designs requires techniques that are
27  highly *automated*.
28 
29 In practice, quality guarantees usually do not refer to "total
30 correctness" of a design, as ensuring the absence of all bugs is too
31 expensive for most applications. In contrast, a guarantee of the absence
32 of specific flaws is achievable, and is a good metric of quality.
33 
34 We document two programs that try to achieve formal guarantees of the
35 absence of specific problems: CBMC and SATABS. The algorithms
36 implemented by CBMC and SATABS are complementary, and often, one tool is
37 able to solve a problem that the other cannot solve.
38 
39 Both CBMC and SATABS are verification tools for ANSI-C/C++ programs.
40 They verify array bounds (buffer overflows), pointer safety, exceptions
41 and user-specified assertions. Both tools model integer arithmetic
42 accurately, and are able to reason about machine-level artifacts such as
43 integer overflow. CBMC and SATABS are therefore able to detect a class
44 of bugs that has so far gone unnoticed by many other verification tools.
45 This manual also covers some variants of CBMC, which includes HW-CBMC
46 for
47 \ref man_hard-soft-introduction "hardware/software co-verification".
48 
49 ## Bounded Model Checking with CBMC
50 
51 CBMC implements a technique called *Bounded Model Checking* (BMC). In
52 BMC, the transition relation for a complex state machine and its
53 specification are jointly unwound to obtain a Boolean formula, which is
54 then checked for satisfiability by using an efficient SAT procedure. If
55 the formula is satisfiable, a counterexample is extracted from the
56 output of the SAT procedure. If the formula is not satisfiable, the
57 program can be unwound more to determine if a longer counterexample
58 exists.
59 
60 In many engineering domains, real-time guarantees are a strict
61 requirement. An example is software embedded in automotive controllers.
62 As a consequence, the loop constructs in these types of programs often
63 have a strict bound on the number of iterations. CBMC is able to
64 formally verify such bounds by means of *unwinding assertions*. Once
65 this bound is established, CBMC is able to prove the absence of errors.
66 
67 A more detailed description of how to apply CBMC to verify programs is
68 \ref man_cbmc-tutorial "here".
69 
70 ## Automatic Program Verification with SATABS
71 
72 In many cases, lightweight properties such as array bounds do not rely
73 on the entire program. A large fraction of the program is *irrelevant*
74 to the property. SATABS exploits this observation and computes an
75 *abstraction* of the program in order to handle large amounts of code.
76 
77 In order to use SATABS it is not necessary to understand the abstraction
78 refinement process. For the interested reader, a high-level introduction
79 to abstraction refinement is provided
80 \ref man_satabs-overview "here".
81 We also provide
82 \ref man_satabs-tutorials "tutorials on how to use SATABS".
83 
84 Just as CBMC, SATABS attempts to build counterexamples that refute the
85 property. If such a counterexample is found, it is presented to the
86 engineer to facilitate localization and repair of the program.
87 
88 ### Example: Buffer Overflows
89 
90 In order to give a brief overview of the capabilities of CBMC and SATABS
91 we start with a small example. The issue of *[buffer
92 overflows](http://en.wikipedia.org/wiki/Buffer_overflow)* has obtained
93 wide public attention. A buffer is a contiguously-allocated chunk of
94 memory, represented by an array or a pointer in C. Programs written in C
95 do not provide automatic bounds checking on the buffer, which means a
96 program can – accidentally or maliciously – write past a buffer. The
97 following example is a perfectly valid C program (in the sense that a
98 compiler compiles it without any errors):
99 
100 ```{.c}
101 int main()
102 {
103  int buffer[10];
104  buffer[20] = 10;
105 }
106 ```
107 
108 However, the write access to an address outside the allocated memory
109 region can lead to unexpected behavior. In particular, such bugs can be
110 exploited to overwrite the return address of a function, thus enabling
111 the execution of arbitrary user-induced code. CBMC and SATABS are able
112 to detect this problem and reports that the "upper bound property" of
113 the buffer is violated. CBMC and SATABS are capable of checking these
114 lower and upper bounds, even for arrays with dynamic size. A detailed
115 discussion of the properties that CBMC and SATABS can check
116 automatically is \ref man_instrumentation-properties "here".
117 
118 ## Hardware/Software Co-Verification
119 
120 Software programs often interact with hardware in a non-trivial manner,
121 and many properties of the overall design only arise from the interplay
122 of both components. CBMC and SATABS therefore support *Co-Verification*,
123 i.e., are able to reason about a C/C++ program together with a circuit
124 description given in Verilog.
125 
126 These co-verification capabilities can also be applied to perform
127 refinement proofs. Software programs are often used as high-level
128 descriptions of circuitry. While both describe the same functionality,
129 the hardware implementation usually contains more detail. It is highly
130 desirable to establish some form for equivalence between the two
131 descriptions. Hardware/Software co-verification and equivalence checking
132 with CBMC and SATABS are described
133 \ref man_hard-soft-introduction "here".
134 
135 
136 \section man_installation Installation
137 
138 \subsection man_install-cbmc Installing CBMC
139 
140 ### Requirements
141 
142 CBMC is available for Windows, i86 Linux, and MacOS X. CBMC requires a
143 code pre-processing environment comprising of a suitable preprocessor
144 and an a set of header files.
145 
146 1. **Linux:** the preprocessor and the header files typically come with
147  a package called *gcc*, which must be installed prior to the
148  installation of CBMC.
149 
150 2. **Windows:** The Windows version of CBMC requires the preprocessor
151  `cl.exe`, which is part of Microsoft Visual Studio. We recommend the
152  free [Visual Studio Community
153  2013](http://www.visualstudio.com/en-us/products/visual-studio-community-vs).
154 
155 3. **MacOS:** Install the [XCode Command Line
156  Utilities](http://developer.apple.com/technologies/xcode.html) prior
157  to installing CBMC. Just installing XCode alone is not enough.
158 
159 Important note for Windows users: Visual Studio's `cl.exe` relies on a
160 complex set of environment variables to identify the target architecture
161 and the directories that contain the header files. You must run CBMC
162 from within the *Visual Studio Command Prompt*.
163 
164 Note that the distribution files for the [Eclipse
165 plugin](installation-plugin.shtml) include the CBMC executable.
166 Therefore, if you intend to run CBMC exclusively within Eclipse, you can
167 skip the installation of the CBMC executable. However, you still have to
168 install the compiler environment as described above.
169 
170 ### Installing the CBMC Binaries
171 
172 1. Download CBMC for your operating system. The binaries are available
173  from http://www.cprover.org/cbmc/.
174 2. Unzip/untar the archive into a directory of your choice. We
175  recommend to add this directory to your `PATH` environment variable.
176 
177 You are now ready to \ref man_cbmc-tutorial "use CBMC"!
178 
179 ### Building CBMC from Source
180 
181 Alternatively, the CBMC source code is available [via
182 SVN](http://www.cprover.org/svn/cbmc/). To compile the source code,
183 follow [these
184 instructions](http://www.cprover.org/svn/cbmc/trunk/COMPILING).
185 
186 \subsection man_install-satabs Installing SATABS
187 
188 ### Requirements
189 
190 SATABS is available for Windows, i86 Linux, and MacOS X. SATABS requires
191 a code pre-processing environment comprising of a suitable preprocessor
192 and an a set of header files.
193 
194 1. **Linux:** the preprocessor and the header files typically come with
195  a package called *gcc*, which must be installed prior to the
196  installation of SATABS.
197 2. **Windows:** The Windows version of SATABS requires the preprocessor
198  `cl.exe`, which is part of Visual Studio (including the free [Visual
199  Studio
200  Express](http://msdn.microsoft.com/vstudio/express/visualc/)).
201 3. **MacOS:** Install
202  [XCode](http://developer.apple.com/technologies/xcode.html) prior to
203  installing SATABS.
204 
205 Important note for Windows users: Visual Studio's `cl.exe` relies on a
206 complex set of environment variables to identify the target architecture
207 and the directories that contain the header files. You must run SATABS
208 from within the *Visual Studio Command Prompt*.
209 
210 Note that the distribution files for the [Eclipse
211 plugin](installation-plugin.shtml) include the command-line tools.
212 Therefore, if you intend to run SATABS exclusively within Eclipse, you
213 can skip the installation of the command-line tools. However, you still
214 have to install the compiler environment as described above.
215 
216 ### Choosing and Installing a Model Checker
217 
218 You need to install a Model Checker in order to be able to run SATABS.
219 You can choose between following alternatives:
220 
221 - **Cadence SMV**. Available from http://www.kenmcmil.com/smv.html.
222  Cadence SMV is a commercial model checker. The free version that is
223  available on the homepage above must not be used for commercial
224  purposes (read the license agreement thoroughly before you download
225  the tool). The documentation for SMV can be found in the directory
226  where you unzip/untar SMV under ./smv/doc/smv/. Read the
227  installation instructions carefully. The Linux/MacOS versions
228  require setting environment variables. You must add add the
229  directory containing the `smv` binary (located in ./smv/bin/,
230  relative to the path where you unpacked it) to your `PATH`
231  environment variable. SATABS uses Cadence SMV by default.
232 
233 - **NuSMV**. Available from http://nusmv.irst.itc.it/. NuSMV is the
234  open source alternative to Cadence SMV. Installation instructions
235  and documentation can be found on the NuSMV homepage. The directory
236  containing the NuSMV binary should be added to your `PATH`
237  environment variable. Use the option
238 
239  --modelchecker nusmv
240 
241  to instruct SATABS to use NuSMV.
242 
243 - **BOPPO**. Available from http://www.cprover.org/boppo/. BOPPO is
244  a model checker that uses SAT-solving algorithms. BOPPO relies on a
245  built-in SAT solver and Quantor, a solver for quantified boolean
246  formulas that is currently bundled with BOPPO, but also available
247  separately from <http://fmv.jku.at/quantor/>. We recommend to add
248  the directories containing both tools to your `PATH` environment
249  variable. Use the option
250 
251  --modelchecker boppo
252 
253  when you call SATABS and want it to use BOPPO instead of SMV.
254 
255 - **BOOM**. Available from http://www.cprover.org/boom/. Boom has a
256  number of unique features, including the verification of programs
257  with unbounded thread creation.
258 
259 ### Installing SATABS
260 
261 1. Download SATABS for your operating system. The binaries are
262  available from <http://www.cprover.org/satabs/>.
263 2. Unzip/untar the archive into a directory of your choice. We
264  recommend to add this directory to your `PATH` environment variable.
265 
266 Now you can execute SATABS. Try running SATABS on the small examples
267 presented in the \ref man_satabs-tutorials "SATABS tutorial". If you use
268 the Cadence SMV model checker, the only command line arguments you have
269 to specify are the names of the files that contain your program.
270 
271 
272 \subsection man_install-eclipse Installing the Eclipse Plugin
273 
274 ### Requirements
275 
276 We provide a graphical user interface to CBMC and SATABS, which is
277 realized as a plugin to the Eclipse framework. Eclipse is available at
278 http://www.eclipse.org. We do not provide installation instructions for
279 Eclipse (basically, you only have to download the current version and
280 extract the files to your hard-disk) and assume that you have already
281 installed the current version.
282 
283 CBMC and SATABS have their own requirements. As an example, both CBMC and SATABS require a suitable preprocessor and a set of header files. As
284 first step, you should therefore follow the installation instructions
285 for \ref man_install-cbmc "CBMC" and \ref man_install-satabs "SATABS".
286 
287 Important note for Windows users: Visual Studio's `cl.exe` relies on a
288 complex set of environment variables to identify the target architecture
289 and the directories that contain the header files. You must run Eclipse
290 from within the *Visual Studio Command Prompt*.
291 
292 ### Installing the Eclipse Plugin
293 
294 The installation instructions for the Eclipse Plugin, including the link
295 to the download site, are available
296 [here](http://www.cprover.org/eclipse-plugin/). This includes a small
297 tutorial on how to use the Eclipse plugin.
298 
299 
300 \section man_cbmc CBMC: Bounded Model Checking for C, C++ and Java
301 
302 \subsection man_cbmc-tutorial A Short Tutorial
303 
304 ### First Steps
305 
306 We assume you have already installed CBMC and the necessary support
307 files on your system. If not so, please follow the instructions
308 \ref man_install-cbmc "here".
309 
310 Like a compiler, CBMC takes the names of .c files as command line
311 arguments. CBMC then translates the program and merges the function
312 definitions from the various .c files, just like a linker. But instead
313 of producing a binary for execution, CBMC performs symbolic simulation
314 on the program.
315 
316 As an example, consider the following simple program, named file1.c:
317 
318  int puts(const char *s) { }
319 
320  int main(int argc, char **argv) {
321  puts(argv[2]);
322  }
323 
324 Of course, this program is faulty, as the `argv` array might have fewer
325 than three elements, and then the array access `argv[2]` is out of
326 bounds. Now, run CBMC as follows:
327 
328  cbmc file1.c --show-properties --bounds-check --pointer-check
329 
330 The two options `--bounds-check` and `--pointer-check` instruct CBMC to
331 look for errors related to pointers and array bounds. CBMC will print
332 the list of properties it checks. Note that it lists, among others, a
333 property labelled with "object bounds in argv" together with the location
334 of the faulty array access. As you can see, CBMC largely determines the
335 property it needs to check itself. This is realized by means of a
336 preliminary static analysis, which relies on computing a fixed point on
337 various [abstract
338 domains](http://en.wikipedia.org/wiki/Abstract_interpretation). More
339 detail on automatically generated properties is provided
340 \ref man_instrumentation-properties "here".
341 
342 Note that these automatically generated properties need not necessarily
343 correspond to bugs – these are just *potential* flaws, as abstract
344 interpretation might be imprecise. Whether these properties hold or
345 correspond to actual bugs needs to be determined by further analysis.
346 
347 CBMC performs this analysis using *symbolic simulation*, which
348 corresponds to a translation of the program into a formula. The formula
349 is then combined with the property. Let's look at the formula that is
350 generated by CBMC's symbolic simulation:
351 
352  cbmc file1.c --show-vcc --bounds-check --pointer-check
353 
354 With this option, CBMC performs the symbolic simulation and prints the
355 verification conditions on the screen. A verification condition needs to
356 be proven to be valid by a [decision
357 procedure](http://en.wikipedia.org/wiki/Decision_problem) in order to
358 assert that the corresponding property holds. Let's run the decision
359 procedure:
360 
361  cbmc file1.c --bounds-check --pointer-check
362 
363 CBMC transforms the equation you have seen before into CNF and passes it
364 to a SAT solver (more background on this step is in the book on
365 [Decision Procedures](http://www.decision-procedures.org/)). It then
366 determines which of the properties that it has generated for the program
367 hold and which do not. Using the SAT solver, CBMC detects that the
368 property for the object bounds of `argv` does not hold, and will thus
369 print a line as follows:
370 
371  [main.pointer_dereference.6] dereference failure: object bounds in argv[(signed long int)2]: FAILURE
372 
373 ### Counterexample Traces
374 
375 Let us have a closer look at this property and why it fails. To aid the
376 understanding of the problem, CBMC can generate a *counterexample trace*
377 for failed properties. To obtain this trace, run
378 
379  cbmc file1.c --bounds-check --trace
380 
381 CBMC then prints a counterexample trace, i.e., a program trace that
382 begins with `main` and ends in a state which violates the property. In
383 our example, the program trace ends in the faulty array access. It also
384 gives the values the input variables must have for the bug to occur. In
385 this example, `argc` must be one to trigger the out-of-bounds array
386 access. If you add a branch to the example that requires that `argc>=3`,
387 the bug is fixed and CBMC will report that the proofs of all properties
388 have been successful.
389 
390 ### Verifying Modules
391 
392 In the example above, we used a program that starts with a `main`
393 function. However, CBMC is aimed at embedded software, and these kinds
394 of programs usually have different entry points. Furthermore, CBMC is
395 also useful for verifying program modules. Consider the following
396 example, called file2.c:
397 
398  int array[10];
399  int sum() {
400  unsigned i, sum;
401 
402  sum=0;
403  for(i=0; i<10; i++)
404  sum+=array[i];
405  }
406 
407 In order to set the entry point to the `sum` function, run
408 
409  cbmc file2.c --function sum --bounds-check
410 
411 It is often necessary to build a suitable *harness* for the function in
412 order to set up the environment appropriately.
413 
414 ### Loop Unwinding
415 
416 When running the previous example, you will have noted that CBMC unwinds
417 the `for` loop in the program. As CBMC performs Bounded Model Checking,
418 all loops have to have a finite upper run-time bound in order to
419 guarantee that all bugs are found. CBMC can optionally check that enough
420 unwinding is performed. As an example, consider the program binsearch.c:
421 
422  int binsearch(int x) {
423  int a[16];
424  signed low=0, high=16;
425 
426  while(low<high) {
427  signed middle=low+((high-low)>>1);
428 
429  if(a[middle]<x)
430  high=middle;
431  else if(a[middle]>x)
432  low=middle+1;
433  else // a[middle]==x
434  return middle;
435  }
436 
437  return -1;
438  }
439 
440 If you run CBMC on this function, you will notice that the unwinding
441 does not stop on its own. The built-in simplifier is not able to
442 determine a run time bound for this loop. The unwinding bound has to be
443 given as a command line argument:
444 
445  cbmc binsearch.c --function binsearch --unwind 6 --bounds-check --unwinding-assertions
446 
447 CBMC verifies that verifies the array accesses are within the bounds;
448 note that this actually depends on the result of the right shift. In
449 addition, as CBMC is given the option `--unwinding-assertions`, it also
450 checks that enough unwinding is done, i.e., it proves a run-time bound.
451 For any lower unwinding bound, there are traces that require more loop
452 iterations. Thus, CBMC will report that the unwinding assertion has
453 failed. As usual, a counterexample trace that documents this can be
454 obtained with the option `--property`.
455 
456 ### Unbounded Loops
457 
458 CBMC can also be used for programs with unbounded loops. In this case,
459 CBMC is used for bug hunting only; CBMC does not attempt to find all
460 bugs. The following program (lock-example.c) is an example of a program
461 with a user-specified property:
462 
463  _Bool nondet_bool();
464  _Bool LOCK = 0;
465 
466  _Bool lock() {
467  if(nondet_bool()) {
468  assert(!LOCK);
469  LOCK=1;
470  return 1; }
471 
472  return 0;
473  }
474 
475  void unlock() {
476  assert(LOCK);
477  LOCK=0;
478  }
479 
480  int main() {
481  unsigned got_lock = 0;
482  int times;
483 
484  while(times > 0) {
485  if(lock()) {
486  got_lock++;
487  /* critical section */
488  }
489 
490  if(got_lock!=0)
491  unlock();
492 
493  got_lock--;
494  times--;
495  }
496  }
497 
498 The `while` loop in the `main` function has no (useful) run-time bound.
499 Thus, a bound has to be set on the amount of unwinding that CBMC
500 performs. There are two ways to do so:
501 
502 1. The `--unwind` command-line parameter can to be used to limit the
503  number of times loops are unwound.
504 2. The `--depth` command-line parameter can be used to limit the number
505  of program steps to be processed.
506 
507 Given the option `--unwinding-assertions`, CBMC checks whether the
508 arugment to `--unwind` is large enough to cover all program paths. If
509 the argument is too small, CBMC will detect that not enough unwinding is
510 done reports that an unwinding assertion has failed.
511 
512 Reconsider the example. For a loop unwinding bound of one, no bug is
513 found. But already for a bound of two, CBMC detects a trace that
514 violates an assertion. Without unwinding assertions, or when using the
515 `--depth` command line switch, CBMC does not prove the program correct,
516 but it can be helpful to find program bugs. The various command line
517 options that CBMC offers for loop unwinding are described in the section
518 on \ref man_cbmc-loops "understanding loop unwinding".
519 
520 ### A Note About Compilers and the ANSI-C Library
521 
522 Most C programs make use of functions provided by a library; instances
523 are functions from the standard ANSI-C library such as `malloc` or
524 `printf`. The verification of programs that use such functions has two
525 requirements:
526 
527 1. Appropriate header files have to be provided. These header files
528  contain *declarations* of the functions that are to be used.
529 2. Appropriate *definitions* have to be provided.
530 
531 Most C compilers come with header files for the ANSI-C library
532 functions. We briefly discuss how to obtain/install these library files.
533 
534 #### Linux
535 
536 Linux systems that are able to compile software are usually equipped
537 with the appropriate header files. Consult the documentation of your
538 distribution on how to install the compiler and the header files. First
539 try to compile some significant program before attempting to verify it.
540 
541 #### Windows
542 
543 On Microsoft Windows, CBMC is pre-configured to use the compiler that is
544 part of Microsoft's Visual Studio. Microsoft's [Visual Studio
545 Community](http://www.visualstudio.com/en-us/products/visual-studio-community-vs)
546 is fully featured and available for download for free from the Microsoft
547 webpage. Visual Studio installs the usual set of header files together
548 with the compiler. However, the Visual Studio compiler requires a large
549 set of environment variables to function correctly. It is therefore
550 required to run CBMC from the *Visual Studio Command Prompt*, which can
551 be found in the menu *Visual Studio Tools*.
552 
553 Note that in both cases, only header files are available. CBMC only
554 comes with a small set of definitions, which includes functions such as
555 `malloc`. Detailed information about the built-in definitions is
556 \ref man_instrumentation-libraries "here".
557 
558 ### Command Line Interface
559 
560 This section describes the command line interface of CBMC. Like a C
561 compiler, CBMC takes the names of the .c source files as arguments.
562 Additional options allow to customize the behavior of CBMC. Use
563 `cbmc --help` to get a full list of the available options.
564 
565 Structured output can be obtained from CBMC using the option `--xml-ui`.
566 Any output from CBMC (e.g., counterexamples) will then use an XML
567 representation.
568 
569 ### Further Reading
570 
571 - \ref man_cbmc-loops "Understanding Loop Unwinding"
572 - [Hardware Verification using ANSI-C Programs as a
573  Reference](http://www-2.cs.cmu.edu/~svc/papers/view-publications-ck03.html)
574 - [Behavioral Consistency of C and Verilog Programs Using Bounded
575  Model Checking](http://www-2.cs.cmu.edu/~svc/papers/view-publications-cky03.html)
576 - [A Tool for Checking ANSI-C
577  Programs](http://www-2.cs.cmu.edu/~svc/papers/view-publications-ckl2004.html)
578 
579 We also have a [list of interesting applications of
580 CBMC](http://www.cprover.org/cbmc/applications/).
581 
582 
583 \subsection man_cbmc-loops Understanding Loop Unwinding
584 
585 ### Iteration-based Unwinding
586 
587 The basic idea of CBMC is to model the computation of the programs up to
588 a particular depth. Technically, this is achieved by a process that
589 essentially amounts to *unwinding loops*. This concept is best
590 illustrated with a generic example:
591 
592  int main(int argc, char **argv) {
593  while(cond) {
594  BODY CODE
595  }
596  }
597 
598 A BMC instance that will find bugs with up to five iterations of the
599 loop would contain five copies of the loop body, and essentially
600 corresponds to checking the following loop-free program:
601 
602  int main(int argc, char **argv) {
603  if(cond) {
604  BODY CODE COPY 1
605  if(cond) {
606  BODY CODE COPY 2
607  if(cond) {
608  BODY CODE COPY 3
609  if(cond) {
610  BODY CODE COPY 4
611  if(cond) {
612  BODY CODE COPY 5
613  }
614  }
615  }
616  }
617  }
618  }
619 
620 Note the use of the `if` statement to prevent the execution of the loop
621 body in the case that the loop ends before five iterations are executed.
622 The construction above is meant to produce a program that is trace
623 equivalent with the original programs for those traces that contain up
624 to five iterations of the loop.
625 
626 In many cases, CBMC is able to automatically determine an upper bound on
627 the number of loop iterations. This may even work when the number of
628 loop unwindings is not constant. Consider the following example:
629 
630  _Bool f();
631 
632  int main() {
633  for(int i=0; i<100; i++) {
634  if(f()) break;
635  }
636  assert(0);
637  }
638 
639 The loop in the program above has an obvious upper bound on the number
640 of iterations, but note that the loop may abort prematurely depending on
641 the value that is returned by `f()`. CBMC is nevertheless able to
642 automatically unwind the loop to completion.
643 
644 This automatic detection of the unwinding bound may fail if the number
645 of loop iterations is highly data-dependent. Furthermore, the number of
646 iterations that are executed by any given loop may be too large or may
647 simply be unbounded. For this case, CBMC offers the command-line option
648 `--unwind B`, where `B` denotes a number that corresponds to the maximal
649 number of loop unwindings CBMC performs on any loop.
650 
651 Note that the number of unwindings is measured by counting the number of
652 *backjumps*. In the example above, note that the condition `i<100` is in
653 fact evaluated 101 times before the loop terminates. Thus, the loop
654 requires a limit of 101, and not 100.
655 
656 ### Setting Separate Unwinding Limits
657 
658 The setting given with `--unwind` is used globally, that is, for all
659 loops in the program. In order to set individual limits for the loops,
660 first use
661 
662  --show-loops
663 
664 to obtain a list of all loops in the program. Then identify the loops
665 you need to set a separate bound for, and note their loop ID. Then use
666 
667  --unwindset L:B,L:B,...
668 
669 where `L` denotes a loop ID and `B` denotes the bound for that loop.
670 
671 As an example, consider a program with two loops in the function main:
672 
673  --unwindset c::main.0:10,c::main.1:20
674 
675 This sets a bound of 10 for the first loop, and a bound of 20 for the
676 second loop.
677 
678 What if the number of unwindings specified is too small? In this case,
679 bugs that require paths that are deeper may be missed. In order to
680 address this problem, CBMC can optionally insert checks that the given
681 unwinding bound is actually sufficiently large. These checks are called
682 *unwinding assertions*, and are enabled with the option
683 `--unwinding-assertions`. Continuing the generic example above, this
684 unwinding assertion for a bound of five corresponds to checking the
685 following loop-free program:
686 
687  int main(int argc, char **argv) {
688  if(cond) {
689  BODY CODE COPY 1
690  if(cond) {
691  BODY CODE COPY 2
692  if(cond) {
693  BODY CODE COPY 3
694  if(cond) {
695  BODY CODE COPY 4
696  if(cond) {
697  BODY CODE COPY 5
698  assert(!cond);
699  }
700  }
701  }
702  }
703  }
704  }
705 
706 The unwinding assertions can be verified just like any other generated
707 assertion. If all of them are proven to hold, the given loop bounds are
708 sufficient for the program. This establishes a [high-level worst-case
709 execution time](http://en.wikipedia.org/wiki/Worst-case_execution_time)
710 (WCET).
711 
712 In some cases, it is desirable to cut off very deep loops in favor of
713 code that follows the loop. As an example, consider the following
714 program:
715 
716  int main() {
717  for(int i=0; i<10000; i++) {
718  BODY CODE
719  }
720  assert(0);
721  }
722 
723 In the example above, small values of `--unwind` will prevent that the
724 assertion is reached. If the code in the loop is considered irrelevant
725 to the later assertion, use the option
726 
727  --partial-loops
728 
729 This option will allow paths that execute loops only partially, enabling
730 a counterexample for the assertion above even for small unwinding
731 bounds. The disadvantage of using this option is that the resulting path
732 may be spurious, i.e., may not exist in the original program.
733 
734 ### Depth-based Unwinding
735 
736 The loop-based unwinding bound is not always appropriate. In particular,
737 it is often difficult to control the size of the generated formula when
738 using the `--unwind` option. The option
739 
740  --depth nr
741 
742 specifies an unwinding bound in terms of the number of instructions that
743 are executed on a given path, irrespectively of the number of loop
744 iterations. Note that CBMC uses the number of instructions in the
745 control-flow graph as the criterion, not the number of instructions in
746 the source code.
747 
748 \subsection man_cbmc-cover COVER: Test Suite Generation with CBMC
749 
750 
751 ### A Small Tutorial with A Case Study
752 
753 We assume that CBMC is installed on your system. If not so, follow
754 \ref man_install-cbmc "these instructions".
755 
756 CBMC can be used to automatically generate test cases that satisfy a
757 certain [code coverage](https://en.wikipedia.org/wiki/Code_coverage)
758 criteria. Common coverage criteria include branch coverage, condition
759 coverage and [Modified Condition/Decision Coverage
760 (MC/DC)](https://en.wikipedia.org/wiki/Modified_condition/decision_coverage).
761 Among others, MC/DC is required by several avionics software development
762 guidelines to ensure adequate testing of safety critical software.
763 Briefly, in order to satisfy MC/DC, for every conditional statement
764 containing boolean decisions, each Boolean variable should be evaluated
765 one time to "true" and one time to "false", in a way that affects the
766 outcome of the decision.
767 
768 In the following, we are going to demonstrate how to apply the test
769 suite generation functionality in CBMC, by means of a case study. The
770 following program is an excerpt from a real-time embedded benchmark
771 [PapaBench](https://www.irit.fr/recherches/ARCHI/MARCH/rubrique.php3?id_rubrique=97),
772 and implements part of a fly-by-wire autopilot for an Unmanned Aerial
773 Vehicle (UAV). It is adjusted mildly for our purposes.
774 
775 The aim of function `climb_pid_run` is to control the vertical climb of
776 the UAV. Details on the theory behind this operation are documented in
777 the [wiki](https://wiki.paparazziuav.org/wiki/Theory_of_Operation) for
778 the Paparazzi UAV project. The behaviour of this simple controller,
779 supposing that the desired speed is 0.5 meters per second, is plotted in
780 the Figure below.
781 
782 \image html pid.png "The pid controller"
783 
784  01: // CONSTANTS:
785  02: #define MAX_CLIMB_SUM_ERR 10
786  03: #define MAX_CLIMB 1
787  04:
788  05: #define CLOCK 16
789  06: #define MAX_PPRZ (CLOCK*600)
790  07:
791  08: #define CLIMB_LEVEL_GAZ 0.31
792  09: #define CLIMB_GAZ_OF_CLIMB 0.75
793  10: #define CLIMB_PITCH_OF_VZ_PGAIN 0.05
794  11: #define CLIMB_PGAIN -0.03
795  12: #define CLIMB_IGAIN 0.1
796  13:
797  14: const float pitch_of_vz_pgain=CLIMB_PITCH_OF_VZ_PGAIN;
798  15: const float climb_pgain=CLIMB_PGAIN;
799  16: const float climb_igain=CLIMB_IGAIN;
800  17: const float nav_pitch=0;
801  18:
802  19: /** PID function INPUTS */
803  20: // The user input: target speed in vertical direction
804  21: float desired_climb;
805  22: // Vertical speed of the UAV detected by GPS sensor
806  23: float estimator_z_dot;
807  24:
808  25: /** PID function OUTPUTS */
809  26: float desired_gaz;
810  27: float desired_pitch;
811  28:
812  29: /** The state variable: accumulated error in the control */
813  30: float climb_sum_err=0;
814  31:
815  32: /** Computes desired_gaz and desired_pitch */
816  33: void climb_pid_run()
817  34: {
818  35:
819  36: float err=estimator_z_dot-desired_climb;
820  37:
821  38: float fgaz=climb_pgain*(err+climb_igain*climb_sum_err)+CLIMB_LEVEL_GAZ+CLIMB_GAZ_OF_CLIMB*desired_climb;
822  39:
823  40: float pprz=fgaz*MAX_PPRZ;
824  41: desired_gaz=((pprz>=0 && pprz<=MAX_PPRZ) ? pprz : (pprz>MAX_PPRZ ? MAX_PPRZ : 0));
825  42:
826  43: /** pitch offset for climb */
827  44: float pitch_of_vz=(desired_climb>0) ? desired_climb*pitch_of_vz_pgain : 0;
828  45: desired_pitch=nav_pitch+pitch_of_vz;
829  46:
830  47: climb_sum_err=err+climb_sum_err;
831  48: if (climb_sum_err>MAX_CLIMB_SUM_ERR) climb_sum_err=MAX_CLIMB_SUM_ERR;
832  49: if (climb_sum_err<-MAX_CLIMB_SUM_ERR) climb_sum_err=-MAX_CLIMB_SUM_ERR;
833  50:
834  51: }
835  52:
836  53: int main()
837  54: {
838  55:
839  56: while(1)
840  57: {
841  58: /** Non-deterministic input values */
842  59: desired_climb=nondet_float();
843  60: estimator_z_dot=nondet_float();
844  61:
845  62: /** Range of input values */
846  63: __CPROVER_assume(desired_climb>=-MAX_CLIMB && desired_climb<=MAX_CLIMB);
847  64: __CPROVER_assume(estimator_z_dot>=-MAX_CLIMB && estimator_z_dot<=MAX_CLIMB);
848  65:
849  66: __CPROVER_input("desired_climb", desired_climb);
850  67: __CPROVER_input("estimator_z_dot", estimator_z_dot);
851  68:
852  69: climb_pid_run();
853  70:
854  71: __CPROVER_output("desired_gaz", desired_gaz);
855  72: __CPROVER_output("desired_pitch", desired_pitch);
856  73:
857  74: }
858  75:
859  76: return 0;
860  77: }
861 
862 In order to test the PID controller, we construct a main control loop,
863 which repeatedly invokes the function `climb_pid_run` (line 69). This
864 PID function has two input variables: the desired speed `desired_climb`
865 and the estimated speed `estimated_z_dot`. In the beginning of each loop
866 iteration, values of the inputs are assigned non-deterministically.
867 Subsequently, the `__CPROVER_assume` statement in lines 63 and 64
868 guarantees that both values are bounded within a valid range. The
869 `__CPROVER_input` and `__CPROVER_output` will help clarify the inputs
870 and outputs of interest for generating test suites.
871 
872 To demonstrate the automatic test suite generation in CBMC, we call the
873 following command and we are going to explain the command line options
874 one by one.
875 
876  cbmc pid.c --cover mcdc --unwind 6 --xml-ui
877 
878 The option `--cover mcdc` specifies the code coverage criterion. There
879 are four conditional statements in the PID function: in line 41, line
880 44, line 48 and line 49. To satisfy MC/DC, the test suite has to meet
881 multiple requirements. For instance, each conditional statement needs to
882 evaluate to *true* and *false*. Consider the condition
883 `"pprz>=0 && pprz<=MAX_PPRZ"` in line 41. CBMC instruments three
884 coverage goals to control the respective evaluated results of
885 `"pprz>=0"` and `"pprz<=MAX_PPRZ"`. We list them in below and they
886 satisfy the MC/DC rules. Note that `MAX_PPRZ` is defined as 16 \* 600 in
887 line 06 of the program.
888 
889  !(pprz >= (float)0) && pprz <= (float)(16 * 600) id="climb_pid_run.coverage.1"
890  pprz >= (float)0 && !(pprz <= (float)(16 * 600)) id="climb_pid_run.coverage.2"
891  pprz >= (float)0 && pprz <= (float)(16 * 600) id="climb_pid_run.coverage.3"
892 
893 The "id" of each coverage goal is automatically assigned by CBMC. For
894 every coverage goal, a test suite (if there exists) that satisfies such
895 a goal is printed out in XML format, as the parameter `--xml-ui` is
896 given. Multiple coverage goals can share a test suite, when the
897 corresponding execution of the program satisfies all these goals at the
898 same time.
899 
900 In the end, the following test suites are automatically generated for
901 testing the PID controller. A test suite consists of a sequence of input
902 parameters that are passed to the PID function `climb_pid_run` at each
903 loop iteration. For example, Test 1 covers the MC/DC goal with
904 id="climb\_pid\_run.coverage.1". The complete output from CBMC is in
905 [pid\_test\_suites.xml](pid_test_suites.xml), where every test suite and
906 the coverage goals it is for are clearly described.
907 
908  Test suite:
909  Test 1.
910  (iteration 1) desired_climb=-1.000000f, estimator_z_dot=1.000000f
911 
912  Test 2.
913  (iteration 1) desired_climb=-1.000000f, estimator_z_dot=1.000000f
914  (iteration 2) desired_climb=1.000000f, estimator_z_dot=-1.000000f
915 
916  Test 3.
917  (iteration 1) desired_climb=0.000000f, estimator_z_dot=-1.000000f
918  (iteration 2) desired_climb=1.000000f, estimator_z_dot=-1.000000f
919 
920  Test 4.
921  (iteration 1) desired_climb=1.000000f, estimator_z_dot=-1.000000f
922  (iteration 2) desired_climb=1.000000f, estimator_z_dot=-1.000000f
923  (iteration 3) desired_climb=1.000000f, estimator_z_dot=-1.000000f
924  (iteration 4) desired_climb=1.000000f, estimator_z_dot=-1.000000f
925  (iteration 5) desired_climb=0.000000f, estimator_z_dot=-1.000000f
926  (iteration 6) desired_climb=1.000000f, estimator_z_dot=-1.000000f
927 
928  Test 5.
929  (iteration 1) desired_climb=-1.000000f, estimator_z_dot=1.000000f
930  (iteration 2) desired_climb=-1.000000f, estimator_z_dot=1.000000f
931  (iteration 3) desired_climb=-1.000000f, estimator_z_dot=1.000000f
932  (iteration 4) desired_climb=-1.000000f, estimator_z_dot=1.000000f
933  (iteration 5) desired_climb=-1.000000f, estimator_z_dot=1.000000f
934  (iteration 6) desired_climb=-1.000000f, estimator_z_dot=1.000000f
935 
936 The option `--unwind 6` unwinds the loop inside the main function body
937 six times. In order to achieve the complete coverage on all the
938 instrumented goals in the PID function `climb_pid_run`, the loop must be
939 unwound sufficient enough times. For example, `climb_pid_run` needs to
940 be called at least six times for evaluating the condition
941 `climb_sum_err>MAX_CLIMB_SUM_ERR` in line 48 to *true*. This corresponds
942 to the Test 5. An introduction to the use of loop unwinding can be found
943 in [Understanding Loop Unwinding](cbmc-loops.shtml).
944 
945 In this small tutorial, we present the automatic test suite generation
946 functionality of CBMC, by applying the MC/DC code coverage criterion to
947 a PID controller case study. In addition to `--cover mcdc`, other
948 coverage criteria like `branch`, `decision`, `path` etc. are also
949 available when calling CBMC.
950 
951 ### Coverage Criteria
952 
953 The table below summarizes the coverage criteria that CBMC supports.
954 
955 Criterion |Definition
956 ----------|----------
957 assertion |For every assertion, generate a test that reaches it
958 location |For every location, generate a test that reaches it
959 branch |Generate a test for every branch outcome
960 decision |Generate a test for both outcomes of every Boolean expression that is not an operand of a propositional connective
961 condition |Generate a test for both outcomes of every Boolean expression
962 mcdc |Modified Condition/Decision Coverage (MC/DC)
963 path |Bounded path coverage
964 cover |Generate a test for every `__CPROVER_cover` statement
965 
966 
967 \section man_satabs SATABS---Predicate Abstraction with SAT
968 
969 \subsection man_satabs-overview Overview
970 
971 This section describes SATABS from the point of view of the user. To
972 learn about the technology implemented in SATABS, read
973 \ref man_satabs-background "this".
974 
975 We assume you have already installed SATABS and the necessary support
976 files on your system. If not so, please follow
977 \ref man_install-satabs "these instructions".
978 
979 While users of SATABS almost never have to be concerned about the
980 underlying refinement abstraction algorithms, understanding the classes
981 of properties that can be verified is crucial. Predicate abstraction is
982 most effective when applied to *control-flow dominated properties*. As
983 an example, reconsider the following program (lock-example-fixed.c):
984 
985  _Bool nondet_bool();
986  _Bool LOCK = 0;
987 
988  _Bool lock() {
989  if(nondet_bool()) {
990  assert(!LOCK);
991  LOCK=1;
992  return 1; }
993 
994  return 0;
995  }
996 
997  void unlock() {
998  assert(LOCK);
999  LOCK=0;
1000  }
1001 
1002  int main() {
1003  unsigned got_lock = 0;
1004  int times;
1005 
1006  while(times > 0) {
1007  if(lock()) {
1008  got_lock++;
1009  /* critical section */
1010  }
1011 
1012  if(got_lock!=0) {
1013  unlock();
1014  got_lock--;
1015  }
1016 
1017  times--;
1018  }
1019  }
1020 
1021 The two assertions in the program model that the functions `lock()` and
1022 `unlock()` are called in the right order. Note that the value of `times`
1023 is chosen non-deterministically and is not bounded. The program has no
1024 run-time bound, and thus, unwinding the code with CBMC will never
1025 terminate.
1026 
1027 ### Working with Claims
1028 
1029 The two assertions will give rise to two *properties*. Each property is
1030 associated to a specific line of code, i.e., a property is violated when
1031 some condition can become false at the corresponding program location.
1032 SATABS will generate a list of all properties for the programs as
1033 follows:
1034 
1035  satabs lock-example-fixed.c --show-properties
1036 
1037 SATABS will list two properties; each property corresponds to one of the
1038 two assertions. We can use SATABS to verify both properties as follows:
1039 
1040  satabs lock-example-fixed.c
1041 
1042 SATABS will conclude the verification successfully, that is, both
1043 assertions hold for execution traces of any length.
1044 
1045 By default, SATABS attempts to verify all properties at once. A single
1046 property can be verified (or refuted) by using the `--property id`
1047 option of SATABS, where `id` denotes the identifier of the property in
1048 the list obtained by calling SATABS with the `--show-properties` flag.
1049 Whenever a property is violated, SATABS reports a feasible path that
1050 leads to a state in which the condition that corresponds to the violated
1051 property evaluates to false.
1052 
1053 \subsection man_satabs-libraries Programs that use Libraries
1054 
1055 SATABS cannot check programs that use functions that are only available
1056 in binary (compiled) form (this restriction is not imposed by the
1057 verification algorithms that are used by SATABS – they also work on
1058 assembly code). The reason is simply that so far no assembly language
1059 frontend is available for SATABS. At the moment, (library) functions for
1060 which no C source code is available have to be replaced by stubs. The
1061 usage of stubs and harnesses (as known from unit testing) also allows to
1062 check more complicated properties (like, for example, whether function
1063 `fopen` is always called before `fclose`). This technique is explained
1064 in detail in the \ref man_satabs-tutorials "SATABS tutorials".
1065 
1066 \subsection man_satabs-unit-test Unit Testing with SATABS
1067 
1068 The example presented \ref man_satabs-tutorial-driver "here" is
1069 obviously a toy example and can hardly be used to convince your project
1070 manager to use static verification in your next project. Even though we
1071 recommend to use formal verification and specification already in the
1072 early phases of your project, the sad truth is that in most projects
1073 verification (of any kind) is still pushed to the very end of the
1074 development cycle. Therefore, this section is dedicated to the
1075 verification of legacy code. However, the techniques presented here can
1076 also be used for *unit testing*.
1077 
1078 Unit testing is used in most software development projects, and static
1079 verification with SATABS can be very well combined with this technique.
1080 Unit testing relies on a number test cases that yield the desired code
1081 coverage. Such test cases are implemented by a software testing engineer
1082 in terms of a test harness (aka test driver) and a set of function
1083 stubs. Typically, a slight modification to the test harness allows it to
1084 be used with SATABS. Replacing the explicit input values with
1085 non-deterministic inputs (as explained in
1086 \ref man_satabs-tutorial-aeon "here" and
1087 \ref man_satabs-tutorial-driver "here")) guarantees that SATABS will try
1088 to achieve **full** path and state coverage (due to the fact that
1089 predicate abstraction implicitly detects equivalence classes). However,
1090 it is not guaranteed that SATABS terminates in all cases. Keep in mind
1091 that you must not make any assumptions about the validity of the
1092 properties if SATABS did not run to completion!
1093 
1094 ### Further Reading
1095 
1096 - [Model Checking Concurrent Linux Device
1097  Drivers](http://www.kroening.com/publications/view-publications-wbwk2007.html)
1098 - [SATABS: SAT-based Predicate Abstraction for
1099  ANSI-C](http://www-2.cs.cmu.edu/~svc/papers/view-publications-cksy2005.html)
1100 - [Predicate Abstraction of ANSI-C Programs using
1101  SAT](http://www-2.cs.cmu.edu/~svc/papers/view-publications-cksy2004.html)
1102 
1103 \subsection man_satabs-background Background
1104 
1105 ### Sound Abstractions
1106 
1107 This section provides background information on how SATABS operates.
1108 Even for very trivial C programs it is impossible to exhaustively
1109 examine their state space (which is potentially unbounded). However, not
1110 all details in a C program necessarily contribute to a bug, so it may be
1111 sufficient to only examine those parts of the program that are somehow
1112 related to a bug.
1113 
1114 In practice, many static verification tools (such as `lint`) try to
1115 achieve this goal by applying heuristics. This approach comes at a cost:
1116 bugs might be overlooked because the heuristics do not cover all
1117 relevant aspects of the program. Therefore, the conclusion that a
1118 program is correct whenever such a static verification tool is unable to
1119 find an error is invalid.
1120 
1121 \image html cegar-1.png "CEGAR Loop"
1122 
1123 A more sophisticated approach that has been very successful recently is
1124 to generate a *sound* abstraction of the original program. In this
1125 context, *soundness* refers to the fact that the abstract program
1126 contains (at least) all relevant behaviors (i.e., bugs) that are present
1127 in the original program. In the Figure above, the first component strips
1128 details from the original program. The number of possible behaviors
1129 increases as the number of details in the abstract program decreases.
1130 Intuitively, the reason is that whenever the model checking tool lacks
1131 the information that is necessary to make an accurate decision on
1132 whether a branch of an control flow statement can be taken or not, both
1133 branches have to be considered.
1134 
1135 In the resulting *abstract program*, a set of concrete states is
1136 subsumed by means of a single abstract state. Consider the following
1137 figure:
1138 
1139 ![](states.png)
1140 
1141 The concrete states *x*~1~ and *x*~2~ are mapped to an abstract state
1142 *X*, and similarly *Y* subsumes *y*~1~ and *y*~2~. However, all
1143 transitions that are possible in the concrete program are also possible
1144 in the abstract model. The abstract transition *X* → *Y* summarizes the
1145 concrete transitions *x*~1~ → *y*~1~ and *x*~1~ → *x*~1~, and *Y* → *X*
1146 corresponds to *x*~1~ → *x*~2~. The behavior *X* → *Y* → *X* is feasible
1147 in the original program, because it maps to *x*~1~ → *x*~1~ → *x*~2~.
1148 However, *Y* → *X* → *Y* is feasible only in the abstract model.
1149 
1150 ### Spurious Counterexamples
1151 
1152 The consequence is that the model checker (component number two in the
1153 figure above) possibly reports a *spurious* counterexample. We call a
1154 counterexample spurious whenever it is feasible in the current abstract
1155 model but not in the original program. However, whenever the model
1156 checker is unable to find an execution trace that violates the given
1157 property, we can conclude that there is no such trace in the original
1158 program, either.
1159 
1160 The feasibility of counterexamples is checked by *symbolic simulation*
1161 (performed by component three in the figure above). If the
1162 counterexample is indeed feasible, SATABS found a bug in the original
1163 program and reports it to the user.
1164 
1165 ### Automatic Refinement
1166 
1167 On the other hand, infeasible counterexamples (that originate from
1168 abstract behaviors that result from the omission of details and are not
1169 present in the original program) are never reported to the user.
1170 Instead, the information is used in order to refine the abstraction such
1171 that the spurious counterexample is not part of the refined model
1172 anymore. For instance, the reason for the infeasibility of *Y* → *X* →
1173 *Y* is that neither *y*~1~ nor *x*~1~ can be reached from *x*~2~.
1174 Therefore, the abstraction can be refined by partitioning *X*.
1175 
1176 The refinement steps can be illustrated as follows:
1177 
1178 ![Iterative refinement](refinement.png)
1179 
1180 The first step (1) is to generate a very coarse abstraction with a very
1181 small state space. This abstraction is then successively refined (2, 3,
1182 ...) until either a feasible counterexample is found or the abstract
1183 program is detailed enough to show that there is no path that leads to a
1184 violation of the given property. The problem is that this point is not
1185 necessarily reached for every input program, i.e., it is possible that
1186 the the abstraction refinement loop never terminates. Therefore, SATABS
1187 allows to specify an upper bound for the number of iterations.
1188 
1189 When this upper bound is reached and no counterexample was found, this
1190 does not necessarily mean that there is none. In this case, you cannot
1191 make any conclusions at all with respect to the correctness of the input
1192 program.
1193 
1194 \subsection man_satabs-tutorials Tutorials
1195 
1196 We provide an introduction to model checking "real" C programs with
1197 SATABS using two small examples.
1198 
1199 \subsubsection man_satabs-tutorial-driver Reference Counting in Linux Device Drivers
1200 
1201 Microsoft's [SLAM](http://research.microsoft.com/SLAM) toolkit has been
1202 successfully used to find bugs in Windows device drivers. SLAM
1203 automatically verifies device driver whether a device driver adheres to
1204 a set of specifications. SLAM provides a test harness for device drivers
1205 that calls the device driver dispatch routines in a non-deterministic
1206 order. Therefore, the Model Checker examines all combinations of calls.
1207 Motivated by the success this approach, we provide a toy example based
1208 on Linux device drivers. For a more complete approach to the
1209 verification of Linux device drivers, consider
1210 [DDVerify](http://www.cprover.org/ddverify/).
1211 
1212 Dynamically loadable modules enable the Linux Kernel to load device
1213 drivers on demand and to release them when they are not needed anymore.
1214 When a device driver is registered, the kernel provides a major number
1215 that is used to uniquely identify the device driver. The corresponding
1216 device can be accessed through special files in the filesystem; by
1217 convention, they are located in the `/dev` directory. If a process
1218 accesses a device file the kernel calls the corresponding `open`, `read`
1219 and `write` functions of the device driver. Since a driver must not be
1220 released by the kernel as long as it is used by at least one process,
1221 the device driver must maintain a usage counter (in more recent Linux
1222 kernels, this is done automatically, however, drivers that must maintain
1223 backward compatibility have to adjust this counter).
1224 
1225 We provide a skeleton of such a driver. Download the files
1226 assets/spec.c, assets/driver.c, assets/driver.h, assets/kdev_t.h, and
1227 assets/modules.h.
1228 
1229 The driver contains following functions:
1230 
1231 1. `register_chrdev`: (in assets/spec.c) Registers a character device.
1232  In our implementation, the function sets the variable `usecount` to
1233  zero and returns a major number for this device (a constant, if the
1234  user provides 0 as argument for the major number, and the value
1235  specified by the user otherwise).
1236 
1237  int usecount;
1238 
1239  int register_chrdev (unsigned int major, const char* name)
1240  {
1241  usecount = 0;
1242  if (major == 0)
1243  return MAJOR_NUMBER;
1244  return major;
1245  }
1246 
1247 2. `unregister_chrdev`: (in assets/spec.c) Unregisters a character
1248  device. This function asserts that the device is not used by any
1249  process anymore (we use the macro `MOD_IN_USE` to check this).
1250 
1251  int unregister_chrdev (unsigned int major, const char* name)
1252  {
1253  if (MOD_IN_USE)
1254  {
1255  ERROR: assert (0);
1256  }
1257  else
1258  return 0;
1259  }
1260 
1261 3. `dummy_open`: (in assets/driver.c) This function increases the
1262  `usecount`. If the device is locked by some other process
1263  `dummy_open` returns -1. Otherwise it locks the device for the
1264  caller.
1265 
1266 4. `dummy_read`: (in assets/driver.c) This function "simulates" a read
1267  access to the device. In fact it does nothing, since we are
1268  currently not interested in the potential buffer overflow that may
1269  result from a call to this function. Note the usage of the function
1270  `nondet_int`: This is an internal SATABS-function that
1271  non­determi­nistically returns an arbitrary integer value. The
1272  function `__CPROVER_assume` tells SATABS to ignore all traces that
1273  do not adhere to the given assumption. Therefore, whenever the lock
1274  is held, `dummy_read` will return a value between 0 and `max`. If
1275  the lock is not held, then `dummy_read` returns -1.
1276 
1277 5. `dummy_release`: (in assets/driver.c) If the lock is held, then
1278  `dummy_release` decreases the `usecount`, releases the lock, and
1279  returns 0. Otherwise, the function returns -1.
1280 
1281 We now want to check if any *valid* sequence of calls of the dispatch
1282 functions (in driver.c) can lead to the violation of the assertion (in
1283 assets/spec.c). Obviously, a call to `dummy_open` that is immediately
1284 followed by a call to `unregister_chrdev` violates the assertion.
1285 
1286 The function `main` in spec.c gives an example of how these functions
1287 are called. First, a character device "`dummy`" is registered. The major
1288 number is stored in the `inode` structure of the device. The values for
1289 the file structure are assigned non-deterministically. We rule out
1290 invalid sequences of calls by ensuring that no device is unregistered
1291 while it is still locked. We use the following model checking harness
1292 for calling the dispatching functions:
1293 
1294  random = nondet_uchar ();
1295  __CPROVER_assume (0 <= random && random <= 3);
1296 
1297  switch (random)
1298  {
1299  case 1:
1300  rval = dummy_open (&inode, &my_file);
1301  if (rval == 0)
1302  lock_held = TRUE;
1303  break;
1304  case 2:
1305  __CPROVER_assume (lock_held);
1306  count = dummy_read (&my_file, buffer, BUF_SIZE);
1307  break;
1308  case 3:
1309  dummy_release (&inode, &my_file);
1310  lock_held = FALSE;
1311  break;
1312  default:
1313  break;
1314  }
1315 
1316 The variable `random` is assigned non-deterministically. Subsequently,
1317 the value of `random` is restricted to be 0 &le `random` ≤ 3 by a call
1318 to `__CPROVER_assume`. Whenever the value of `random` is not in this
1319 interval, the corresponding execution trace is simply discarded by
1320 SATABS. Depending on the value of `random`, the harness calls either
1321 `dummy_open`, `dummy_read` or `dummy_close`. Therefore, if there is a
1322 sequence of calls to these three functions that leads to a violation of
1323 the assertion in `unregister_chrdev`, then SATABS will eventually
1324 consider it.
1325 
1326 If we ask SATABS to show us the properties it verifies with
1327 
1328  satabs driver.c spec.c --show-properties
1329 
1330 for our example, we obtain
1331 
1332 1. Claim unregister\_chrdev.1:\
1333      file spec.c line 18 function unregister\_chrdev\
1334      MOD\_IN\_USE in unregister\_chrdev\
1335      FALSE
1336 
1337 2. Claim dummy\_open.1:\
1338      file driver.c line 15 function dummy\_open\
1339      i\_rdev mismatch\
1340      (unsigned int)inode-&gt;i\_rdev &gt;&gt; 8 == (unsigned
1341  int)dummy\_major
1342 
1343 It seems obvious that the property dummy\_open.1 can never be violated.
1344 SATABS confirms this assumption: We call
1345 
1346  satabs driver.c spec.c --property dummy_open.1
1347 
1348 and SATABS reports `VERIFICATION SUCCESSFUL` after a few iterations.
1349 
1350 If we try to verify property unregister\_chrdev.1, SATABS reports that
1351 the property in line 18 in file spec.c is violated (i.e., the assertion
1352 does not hold, therefore the `VERIFICATION FAILED`). Furthermore, SATABS
1353 provides a detailed description of the problem in the form of a
1354 counterexample (i.e., an execution trace that violates the property). On
1355 this trace, `dummy_open` is called **twice**, leading to a `usecount` of 2.
1356 The second call of course fails with `rval=-1`, but the counter is
1357 increased nevertheless:
1358 
1359  int dummy_open (struct inode *inode, struct file *filp)
1360  {
1361  __CPROVER_assert(MAJOR (inode->i_rdev) == dummy_major,
1362  "i_rdev mismatch");
1363  MOD_INC_USE_COUNT;
1364 
1365  if (locked)
1366  return -1;
1367  locked = TRUE;
1368 
1369  return 0; /* success */
1370  }
1371 
1372 Then, `dummy_release` is called to release the lock on the device.
1373 Finally, the loop is left and the call to `unregister_chrdev` results in
1374 a violation of the assertion (since `usecount` is still 1, even though
1375 `locked=0`).
1376 
1377 \subsubsection man_satabs-tutorial-aeon Buffer Overflow in a Mail Transfer Agent
1378 
1379 We explain how to model check Aeon version 0.2a, a small mail transfer
1380 agent written by Piotr Benetkiewicz. The description advertises Aeon as
1381 a "*good choice for **hardened** or minimalistic boxes*". The sources
1382 are available
1383 [here](http://www.cprover.org/satabs/examples/loop_detection/aeon-0.2a.tar.gz).
1384 
1385 Our first naive attempt to verify Aeon using
1386 
1387  satabs *.c
1388 
1389 produces a positive result, but also warns us that the property holds
1390 trivially. It also reveals that a large number library functions are
1391 missing: SATABS is unable to find the source code for library functions
1392 like `send`, `write` and `close`.
1393 
1394 Now, do you have to provide a body for all missing library functions?
1395 There is no easy answer to this question, but a viable answer would be
1396 "most likely not". It is necessary to understand how SATABS handles
1397 functions without bodies: It simply assumes that such a function returns
1398 an arbitrary value, but that no other locations than the one on the left
1399 hand side of the assignment are changed. Obviously, there are cases in
1400 which this assumption is un­sound, since the function potentially
1401 modifies all memory locations that it can somehow address.
1402 
1403 We now use static analysis to generate array bounds checks for Aeon:
1404 
1405  satabs *.c --pointer-check --bounds-check --show-properties
1406 
1407 SATABS will show about 300 properties in various functions (read
1408 \ref man_instrumentation-properties "this" for more information on the
1409 property instrumentation). Now consider the first few lines of the
1410 `main` function of Aeon:
1411 
1412  int main(int argc, char **argv)
1413  {
1414  char settings[MAX_SETTINGS][MAX_LEN];
1415  ...
1416  numSet = getConfig(settings);
1417  if (numSet == -1) {
1418  logEntry("Missing config file!");
1419  exit(1);
1420  }
1421  ...
1422 
1423 and the function `getConfig` in `lib_aeon.c`:
1424 
1425  int getConfig(char settings[MAX_SETTINGS][MAX_LEN])
1426  {
1427  char home[MAX_LEN];
1428  FILE *fp; /* .rc file handler */
1429  int numSet = 0; /* number of settings */
1430 
1431  strcpy(home, getenv("HOME")); /* get home path */
1432  strcat(home, "/.aeonrc"); /* full path to rc file */
1433  fp = fopen(home, "r");
1434  if (fp == NULL) return -1; /* no cfg - ERROR */
1435  while (fgets(settings[numSet], MAX_LEN-1, fp)
1436  && (numSet < MAX_SETTINGS)) numSet++;
1437  fclose(fp);
1438 
1439  return numSet;
1440  }
1441 
1442 The function `getConfig` makes calls to `strcpy`, `strcat`, `getenv`,
1443 `fopen`, `fgets`, and `fclose`. It is very easy to provide an
1444 implementation for the functions from the string library (string.h), and
1445 SATABS comes with meaningful definitions for most of them. The
1446 definition of `getenv` is not so straight-forward. The man-page of
1447 `getenv` (which we obtain by entering `man 3 getenv` in a Unix or cygwin
1448 command prompt) tells us:
1449 
1450  `` `getenv' `` searches the list of en­vi­ron­ment variable names
1451  and values (using the global pointer `char **environ`) for a
1452  variable whose name matches the string at `NAME`. If a variable name
1453  matches, `getenv` returns a pointer to the associated value.
1454 
1455 SATABS has no information whatsoever about the content of `environ`.
1456 Even if SATABS could access the environment variables on your
1457 computer, a successful verification of `Aeon` would then only guarantee
1458 that the properties for this program hold on your computer with a
1459 specific set of en­vi­ron­ment variables. We have to assume that
1460 `environ` contains en­vi­ron­ment variables that have an arbitrary
1461 content of arbitrary length. The content of en­vi­ron­ment variables is
1462 not only arbitrary but could be malefic, since it can be modified by the
1463 user. The approximation of the behavior of `getenv` that is shipped with
1464 SATABS completely ignores the content of the string.
1465 
1466 Now let us have another look at the properties that SATABS generates for
1467 the models of the the string library and for `getenv`. Most of these
1468 properties require that we verify that the upper and lower bounds of
1469 buffers or arrays are not violated. Let us look at one of the properties
1470 that SATABS generates for the code in function `getConfig`:
1471 
1472  Claim getConfig.3:   file lib_aeon.c line 19 function getConfig   dereference failure: NULL plus offset pointer   !(SAME-OBJECT(src, NULL))`
1473 
1474 The model of the function `strcpy` dereferences the pointer returned by
1475 `getenv`, which may return a NULL pointer. This possibility is detected
1476 by the static analysis, and thus a corresponding property is generated.
1477 Let us check this specific property:
1478 
1479  satabs *.c --pointer-check --bounds-check --property getConfig.3
1480 
1481 SATABS immediately returns a counterexample path that demonstrates how
1482 `getenv` returns a NULL, which is subsequently dereferenced. We have
1483 identified the first bug in this program: it requires that the
1484 environment variable HOME is set, and crashes otherwise.
1485 
1486 Let us examine one more property in the same function:
1487 
1488  Claim getConfig.7:   file lib_aeon.c line 19 function getConfig   dereference failure: array `home' upper bound   !(POINTER_OFFSET(dst) + (int)i >= 512) || !(SAME-OBJECT(dst, &home[0]))
1489 
1490 This property asserts that the upper bound of the array `home` is not
1491 violated. The variable `home` looks familiar: We encountered it in the
1492 function `getConfig` given above. The function `getenv` in combination
1493 with functions `strcpy`, `strcat` or `sprintf` is indeed often the
1494 source for buffer overflows. Therefore, we try to use SATABS to check
1495 the upper bound of the array `home`:
1496 
1497  satabs *.c --pointer-check --bounds-check --property getConfig.7
1498 
1499 SATABS runs for quite a while and will eventually give up, telling us
1500 that its upper bound for abstraction refinement iterations has been
1501 exceeded. This is not exactly the result we were hoping for, and we
1502 could now increase the bound for iterations with help of the
1503 `--iterations` command line switch of SATABS.
1504 
1505 Before we do this, let us investigate why SATABS has failed to provide a
1506 useful result. The function `strcpy` contains a loop that counts from 1
1507 to the length of the input string. Predicate abstraction, the mechanism
1508 SATABS is based on, is unable to detect such loops and will therefore
1509 unroll the loop body as often as necessary. The array `home` has
1510 `MAX_LEN` elements, and `MAX_LEN` is defined to be 512 in `aeon.h`.
1511 Therefore, SATABS would have to run through at least 512 iterations,
1512 only to verify (or reject) one of the more than 300 properties! Does
1513 this fact defeat the purpose of static verification?
1514 
1515 We can make the job easier: after reducing the value of `MAX_LEN` in
1516 `aeon.h` to a small value, say to 10, SATABS provides a counterexample
1517 trace that demonstrates how the buffer overflow be reproduced. If you
1518 use the Eclipse plugin (as described \ref man_install-eclipse "here"),
1519 you can step through this counterexample. The trace contains the string
1520 that is returned by `getenv`.
1521 
1522 
1523 \section man_modelling Modelling
1524 
1525 \subsection man_modelling-nondet Nondeterminism
1526 
1527 ### Rationale
1528 
1529 Programs typically read inputs from an environment. These inputs can
1530 take the form of data read from a file, keyboard or network socket, or
1531 arguments passed on the command line. It is usually desirable to analyze
1532 the program for any choice of these inputs. In Model Checking, inputs
1533 are therefore modeled by means of *nondeterminism*, which means that the
1534 value of the input is not specified. The program may follow any
1535 computation that results from any choice of inputs.
1536 
1537 ### Sources of Nondeterminism
1538 
1539 The CPROVER tools support the following sources of nondeterminism:
1540 
1541 - functions that read inputs from the environments;
1542 - the thread schedule in concurrent programs;
1543 - initial values of local-scoped variables and memory allocated with
1544  `malloc`;
1545 - initial values of variables that are `extern` in all compilation
1546  units;
1547 - explicit functions for generating nondeterminism.
1548 
1549 The CPROVER tools are shipped with a number of stubs for the most
1550 commonly used library functions. When executing a statement such as
1551 `getchar()`, a nondeterministic value is chosen instead of reading a
1552 character from the keyboard.
1553 
1554 When desired, nondeterminism can be introduced explicitly into the
1555 program by means of functions that begin with the prefix `nondet_`. As
1556 an example, the following function returns a nondeterministically chosen
1557 unsigned short int:
1558 
1559  unsigned short int nondet_ushortint();
1560 
1561 Note that the body of the function is not defined. The name of the
1562 function itself is irrelevant (save for the prefix), but must be unique.
1563 Also note that a nondeterministic choice is not to be confused with a
1564 probabilistic (or random) choice.
1565 
1566 ### Uninterpreted Functions
1567 
1568 It may be necessary to check parts of a program independently.
1569 Nondeterminism can be used to over-approximate the behaviour of part of
1570 the system which is not being checked. Rather than calling a complex or
1571 unrelated function, a nondeterministic stub is used. However, separate
1572 calls to the function can return different results, even for the same
1573 inputs. If the function output only depends on its inputs then this can
1574 introduce spurious errors. To avoid this problem, functions whose names
1575 begin with the prefix `__CPROVER_uninterpreted_` are treated as
1576 uninterpreted functions. Their value is non-deterministic but different
1577 invocations will return the same value if their inputs are the same.
1578 Note that uninterpreted functions are not supported by all back-end
1579 solvers.
1580 
1581 \subsection man_modelling-assumptions Modeling with Assertions and Assumptions
1582 
1583 ### Assertions
1584 
1585 [Assertions](http://en.wikipedia.org/wiki/Assertion_%28computing%29) are
1586 statements within the program that attempt to capture the programmer's
1587 intent. The ANSI-C standard defines a header file
1588 [assert.h](http://en.wikipedia.org/wiki/Assert.h), which offers a macro
1589 `assert(cond)`. When executing a statement such as
1590 
1591  assert(p!=NULL);
1592 
1593 the execution is aborted with an error message if the condition
1594 evaluates to *false*, i.e., if `p` is NULL in the example above. The
1595 CPROVER tools can check the validity of the programmer-annotated
1596 assertions statically. Specifically, the CPROVER tools will check that
1597 the assertions hold for *any* nondeterministic choice that the program
1598 can make. The static assertion checks can be disabled using the
1599 `--no-assertions` command line option.
1600 
1601 In addition, there is a CPROVER-specific way to specify assertions,
1602 using the built-in function `__CPROVER_assert`:
1603 
1604  __CPROVER_assert(p!=NULL, "p is not NULL");
1605 
1606 The (mandatory) string that is passed as the second argument provides an
1607 informal description of the assertion. It is shown in the list of
1608 properties together with the condition.
1609 
1610 The assertion language of the CPROVER tools is identical to the language
1611 used for expressions. Note that \ref man_modelling-nondet
1612 "nondeterminism"
1613 can be exploited in order to check a range of choices. As an example,
1614 the following code fragment asserts that **all** elements of the array
1615 are zero:
1616 
1617  int a[100], i;
1618 
1619  ...
1620 
1621  i=nondet_uint();
1622  if(i>=0 && i<100)
1623  assert(a[i]==0);
1624 
1625 The nondeterministic choice will guess the element of the array that is
1626 nonzero. The code fragment above is therefore equivalent to
1627 
1628  int a[100], i;
1629 
1630  ...
1631 
1632  for(i=0; i<100; i++)
1633  assert(a[i]==0);
1634 
1635 Future CPROVER releases will support explicit quantifiers with a syntax
1636 that resembles Spec\#:
1637 
1638  __CPROVER_forall { *type identifier* ; *expression* }
1639  __CPROVER_exists { *type identifier* ; *expression* }
1640 
1641 ### Assumptions
1642 
1643 Assumptions are used to restrict nondeterministic choices made by the
1644 program. As an example, suppose we wish to model a nondeterministic
1645 choice that returns a number from 1 to 100. There is no integer type
1646 with this range. We therefore use \_\_CPROVER\_assume to restrict the
1647 range of a nondeterministically chosen integer:
1648 
1649  unsigned int nondet_uint();
1650 
1651  unsigned int one_to_hundred()
1652  {
1653  unsigned int result=nondet_uint();
1654  __CPROVER_assume(result>=1 && result<=100);
1655  return result;
1656  }
1657 
1658 The function above returns the desired integer from 1 to 100. You must
1659 ensure that the condition given as an assumption is actually satisfiable
1660 by some nondeterministic choice, or otherwise the model checking step
1661 will pass vacuously.
1662 
1663 Also note that assumptions are never retroactive: They only affect
1664 assertions (or other properties) that follow them in program order. This
1665 is best illustrated with an example. In the following fragment, the
1666 assumption has no effect on the assertion, which means that the
1667 assertion will fail:
1668 
1669  x=nondet_uint();
1670  assert(x==100);
1671  __CPROVER_assume(x==100);
1672 
1673 Assumptions do restrict the search space, but only for assertions that
1674 follow. As an example, the following program will pass:
1675 
1676  int main() {
1677  int x;
1678 
1679  __CPROVER_assume(x>=1 && x<=100000);
1680 
1681  x*=-1;
1682 
1683  __CPROVER_assert(x<0, "x is negative");
1684  }
1685 
1686 Beware that nondeterminism cannot be used to obtain the effect of
1687 universal quantification in assumptions. As an example,
1688 
1689  int main() {
1690  int a[10], x, y;
1691 
1692  x=nondet_int();
1693  y=nondet_int();
1694  __CPROVER_assume(x>=0 && x<10 && y>=0 && y<10);
1695 
1696  __CPROVER_assume(a[x]>=0);
1697 
1698  assert(a[y]>=0);
1699  }
1700 
1701 fails, as there is a choice of x and y which results in a counterexample
1702 (any choice in which x and y are different).
1703 
1704 \subsection man_modelling-pointers Pointer Model
1705 
1706 ### Pointers in C
1707 
1708 C programs (and sometimes C++ programs as well) make intensive use of
1709 pointers in order to decouple program code from specific data. A pointer
1710 variable does not store data such as numbers or letters, but instead
1711 points to a location in memory that hold the relevant data. This section
1712 describes the way the CPROVER tools model pointers.
1713 
1714 ### Objects and Offsets
1715 
1716 The CPROVER tools represent pointers as a pair. The first member of the
1717 pair is the *object* the pointer points to, and the second is the offset
1718 within the object.
1719 
1720 In C, objects are simply continuous fragments of memory (this definition
1721 of "object" is not to be confused with the use of the term in
1722 object-oriented programming). Variables of any type are guaranteed to be
1723 stored as one object, irrespectively of their type. As an example, all
1724 members of a struct or array belong to the same object. CPROVER simply
1725 assigns a number to each active object. The object number of a pointer
1726 `p` can be extracted using the expression `__CPROVER_POINTER_OBJECT(p)`.
1727 As a consequence, pointers to different objects are always different,
1728 which is not sound.
1729 
1730 The offset (the second member of the pair that forms a pointer) is
1731 relative to the beginning of the object; it uses byte granularity. As an
1732 example, the code fragment
1733 
1734  unsigned array[10];
1735  char *p;
1736 
1737  p=(char *)(array+1);
1738  p++;
1739 
1740 will result in a pointer with offset 5. The offset of a pointer `p` can
1741 be extracted using the expression `__CPROVER_POINTER_OFFSET(p)`.
1742 
1743 ### Dereferencing Pointers
1744 
1745 The CPROVER tools require that pointers that are dereferenced point to a
1746 valid object. Assertions that check this requirement can be generated
1747 using the option --pointer-check and, if desired, --bounds-check. These
1748 options will ensure that NULL pointers are not dereferenced, and that
1749 dynamically allocated objects have not yet been deallocated.
1750 
1751 Furthermore, the CPROVER tools check that dynamically allocated memory
1752 is not deallocated twice. The goto-instrument tool is also able to add
1753 checks for memory leaks, i.e., it detects dynamically allocated objects
1754 that are not deallocated once the program terminates.
1755 
1756 The CPROVER tools support pointer typecasts. Most casts are supported,
1757 with the following exceptions:
1758 
1759 1. One notable exception is that pointers can only be accessed using a
1760  pointer type. The conversion of a pointer into an integer-type using
1761  a pointer typecast is not supported.
1762 
1763 2. Casts from integers to pointers yield a pointer that is either NULL
1764  (if the integer is zero) or that point into a special array for
1765  modeling [memory-mapped
1766  I/O](http://en.wikipedia.org/wiki/Memory-mapped_I/O). Such pointers
1767  are assumed not to overlap with any other objects. This is, of
1768  course, only sound if a corresponding range check is instrumented.
1769 
1770 3. Accesses to arrays via pointers that have the array subtype need to
1771  be well-aligned.
1772 
1773 ### Pointers in Open Programs
1774 
1775 It is frequently desired to validate an open program, i.e., a fragment
1776 of a program. Some variables are left undefined. In case an undefined
1777 pointer is dereferenced, CBMC assumes that the pointer points to a
1778 separate object of appropriate type with unbounded size. The object is
1779 assumed not to alias with any other object. This assumption may
1780 obviously be wrong in specific extensions of the program.
1781 
1782 \subsection man_modelling-floating-point Floating Point
1783 
1784 The CPROVER tools support bit-accurate reasoning about IEEE-754
1785 floating-point and fixed-point arithmetic. The C standard contains a
1786 number of areas of implementation-defined behaviour with regard to
1787 floating-point arithmetic:
1788 
1789 - CPROVER supports C99 Appendix F, and thus, the `__STD_IEC_559__`
1790  macro is defined. This means that the C `float` data type maps to
1791  IEEE 754 `binary32` and `double` maps to `binary64` and operations
1792  on them are as specified in IEEE 754.
1793 
1794 - `long double` can be configured to be `binary64`, `binary128`
1795  (quad precision) or a 96 bit type with 15 exponent bits and 80
1796  significant bits. The last is an approximation of Intel's x87
1797  extended precision double data type. As the C standard allows a
1798  implementations a fairly wide set of options for `long double`, it
1799  is best avoided for both portable code and bit-precise analysis. The
1800  default is to match the build architecture as closely as possible.
1801 
1802 - In CPROVER, floating-point expressions are evaluated at the 'natural
1803  precision' (the greatest of the arguments) and not at a
1804  higher precision. This corresponds to `FLT_EVAL_METHOD` set to `0`.
1805  Note that this is a different policy to some platforms (see below).
1806 
1807 - Expression contraction (for example, converting `x * y + c` to
1808  `fma(x,y,c)`) is not performed. In effect, the `FP_CONTRACT` pragma
1809  is always off.
1810 
1811 - Constant expressions are evaluated at \`run' time wherever possible
1812  and so will respect changes in the rounding mode. In effect, the
1813  `FENV_ACCESS` pragma is always off. Note that floating point
1814  constants are treated as doubles (unless they are followed by `f`
1815  when they are float) as specified in the C standard. `goto-cc`
1816  supports `-fsingle-precision-constant`, which allows
1817  the (non-standard) treatment of constants as floats.
1818 
1819 - Casts from int to float and float to float make use of the current
1820  rounding mode. Note that the standard requires that casts from float
1821  to int use round-to-zero (i.e. truncation).
1822 
1823 ### x86 and Other Platform-specific Issues
1824 
1825 Not all platforms have the same implementation-defined behaviour as
1826 CPROVER. This can cause mismatches between the verification environment
1827 and the execution environment. If this occurs, check the compiler manual
1828 for the choices listed above. There are two common cases that can cause
1829 these problems: 32-bit x86 code and the use of unsafe optimisations.
1830 
1831 Many compilers that target 32-bit x86 platforms employ a different
1832 evaluation method. The extended precision format of the x87 unit is used
1833 for all computations regardless of their native precision. Most of the
1834 time, this results in more accurate results and avoids edge cases.
1835 However, it can result in some obscure and difficult to debug behaviour.
1836 Checking if the `FLT_EVAL_METHOD` macro is non-zero (for these platforms
1837 it will typically be 2), should warn of these problems. Changing the
1838 compiler flags to use the SSE registers will resolve many of them, give
1839 a more standards-compliant platform and will likely perform better. Thus
1840 it is *highly* recommended. Use `-msse2 -mfpmath=sse` to enable this
1841 option for GCC. Visual C++ does not have an option to force the
1842 exclusive use of SSE instructions, but `/arch:SSE2` will pick SSE
1843 instructions "when it \[the compiler\] determines that it is faster to
1844 use the SSE/SSE2 instructions" and is thus better than `/arch:IA32`,
1845 which exclusively uses the x87 unit.
1846 
1847 The other common cause of discrepancy between CPROVER results and the
1848 actual platform are the use of unsafe optimisations. Some higher
1849 optimisation levels enable transformations that are unsound with respect
1850 to the IEEE-754 standard. Consult the compiler manual and disable any
1851 optimisations that are described as unsafe (for example, the GCC options
1852 `-ffast-math`). The options `-ffp-contract=off` (which replaces
1853 `-mno-fused-madd`), `-frounding-math` and `-fsignaling-nans` are needed
1854 for GCC to be strictly compliant with IEEE-754.
1855 
1856 ### Rounding Mode
1857 
1858 CPROVER supports the four rounding modes given by IEEE-754 1985; round
1859 to nearest (ties to even), round up, round down and round towards zero.
1860 By default, round to nearest is used. However, command line options
1861 (`--round-to-zero`, etc.) can be used to over-ride this. If more control
1862 is needed, CPROVER has models of `fesetround` (for POSIX systems) and
1863 `_controlfp` (for Windows), which can be used to change the rounding
1864 mode during program execution. Furthermore, the inline assembly commands
1865 fstcw/fnstcw/fldcw (on x86) can be used.
1866 
1867 The rounding mode is stored in the (thread local) variable
1868 `__CPROVER_rounding_mode`, but users are strongly advised not to use
1869 this directly.
1870 
1871 ### Math Library
1872 
1873 CPROVER implements some of `math.h`, including `fabs`, `fpclassify` and
1874 `signbit`. It has very limited support for elementary functions. Care
1875 must be taken when verifying properties that are dependent on these
1876 functions as the accuracy of implementations can vary considerably. The
1877 C compilers can (and many do) say that the accuracy of these functions
1878 is unknown.
1879 
1880 ### Fixed-point Arithmetic
1881 
1882 CPROVER also has support for fixed-point types. The `--fixedbv` flag
1883 switches `float`, `double` and `long double` to fixed-point types. The
1884 length of these types is platform specific. The upper half of each type
1885 is the integer component and the lower half is the fractional part.
1886 
1887 
1888 \section man_hard-soft Hardware and Software Equivalence and Co-Verification
1889 
1890 \subsection man_hard-soft-introduction Introduction
1891 
1892 A common hardware design approach employed by many companies is to first
1893 write a quick prototype that behaves like the planned circuit in a
1894 language like ANSI-C. This program is then used for extensive testing
1895 and debugging, in particular of any embedded software that will later on
1896 be shipped with the circuit. An example is the hardware of a cell phone
1897 and its software. After testing and debugging of the program, the actual
1898 hardware design is written using hardware description languages like
1899 [VHDL](http://en.wikipedia.org/wiki/VHDL) or
1900 [Verilog](http://en.wikipedia.org/wiki/Verilog).
1901 
1902 Thus, there are two implementations of the same design: one written in
1903 ANSI-C, which is written for simulation, and one written in register
1904 transfer level HDL, which is the actual product. The ANSI-C
1905 implementation is usually thoroughly tested and debugged.
1906 
1907 Due to market constraints, companies aim to sell the chip as soon as
1908 possible, i.e., shortly after the HDL implementation is designed. There
1909 is usually little time for additional debugging and testing of the HDL
1910 implementation. Thus, an automated, or nearly automated way of
1911 establishing the consistency of the HDL implementation is highly
1912 desirable.
1913 
1914 This motivates the verification problem: we want to verify the
1915 consistency of the HDL implementation, i.e., the product, [using the
1916 ANSI-C implementation as a
1917 reference](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=936243&userType=inst).
1918 Es­ta­bli­shing the consistency does not re­quire a formal
1919 specification. However, formal methods to verify either the hardware or
1920 software design are still desirable.
1921 
1922 ### Related Work
1923 
1924 There have been several attempts in the past to tackle the problem.
1925 [Semeria et al.](http://portal.acm.org/citation.cfm?id=513951) describe
1926 a tool for verifying the combinational equivalence of RTL-C and an HDL.
1927 They translate the C code into HDL and use standard equivalence checkers
1928 to establish the equivalence. The C code has to be very close to a
1929 hardware description (RTL level), which implies that the source and
1930 target have to be implemented in a very similar way. There are also
1931 variants of C specifically for this purpose. The [SystemC
1932 standard](http://en.wikipedia.org/wiki/SystemC) defines a subset of C++
1933 that can be used for synthesis. Further variants of ANSI-C for
1934 specifying hardware are SpecC and Handel C, among others.
1935 
1936 The concept of verifying the equivalence of a software implementation
1937 and a synchronous transition system was introduced by [Pnueli, Siegel,
1938 and Shtrichman](http://www.springerlink.com/content/ah5lpxaagrjp8ax2/).
1939 The C program is re­quired to be in a very specific form, since a
1940 mechanical translation is assumed.
1941 
1942 In 2000, [Currie, Hu, and
1943 Rajan](http://doi.acm.org/10.1145/337292.337339) transform DSP assembly
1944 language into an equation for the Stanford Validity Checker. The
1945 symbolic execution of programs for comparison with RTL is now common
1946 practice.
1947 
1948 The previous work focuses on a small subset of ANSI-C that is
1949 particularly close to register transfer language. Thus, the designer is
1950 often re­quired to rewrite the C program manually in order to comply
1951 with these constraints. We extend the methodology to handle the full set
1952 of ANSI-C language features. This is a challenge in the presence of
1953 complex, dynamic data structures and pointers that may dynamically point
1954 to multiple objects. Furthermore, our methodology allows arbitrary loop
1955 constructs.
1956 
1957 ### Further Material
1958 
1959 We provide a small \ref man_hard-soft-tutorial "tutorial" and a
1960 description on
1961 \ref man_hard-soft-inputs "how to synchronize inputs between the C model and the Verilog model".
1962 There is also a collection of
1963 [benchmark problems](http://www.cprover.org/hardware/sequential-equivalence/)
1964 available.
1965 
1966 Further Reading
1967 
1968 - [Hardware Verification using ANSI-C Programs as a
1969  Reference](http://www-2.cs.cmu.edu/~svc/papers/view-publications-ck03.html)
1970 - [Behavioral Consistency of C and Verilog Programs Using Bounded
1971  Model Checking](http://www-2.cs.cmu.edu/~svc/papers/view-publications-cky03.html)
1972 - [A Tool for Checking ANSI-C
1973  Programs](http://www-2.cs.cmu.edu/~svc/papers/view-publications-ckl2004.html)
1974 - [Checking Consistency of C and Verilog using Predicate Abstraction
1975  and Induction](http://www.kroening.com/publications/view-publications-kc2004.html)
1976 
1977 
1978 \subsection man_hard-soft-tutorial Tutorial
1979 
1980 ### Verilog vs. ANSI-C
1981 
1982 We assume that CBMC is installed on your system. If not so, follow
1983 \ref man_install-cbmc "these instructions".
1984 
1985 The following Verilog module implements a 4-bit counter
1986 (counter.v):
1987 
1988  module top(input clk);
1989 
1990  reg [3:0] counter;
1991 
1992  initial counter=0;
1993 
1994  always @(posedge clk)
1995  counter=counter+1;
1996 
1997  endmodule
1998 
1999 HW-CBMC can take Verilog modules as the one above as additional input.
2000 Similar as in co-simulation, the data in the Verilog modules is
2001 available to the C program by means of global variables. For the example
2002 above, the following C fragment shows the definition of the variable
2003 that holds the value of the `counter` register:
2004 
2005  struct module_top {
2006  unsigned int counter;
2007  };
2008 
2009  extern struct module_top top;
2010 
2011 Using this definition, the value of the `counter` register in the
2012 Verilog fragment above can be accessed as `top.counter`. Please note
2013 that the name of the variable **must** match the name of the top module.
2014 The C program only has a view of one state of the Verilog model. The
2015 Verilog model makes a transition once the function `next_timeframe()` is
2016 called.
2017 
2018 As CBMC performs Bounded Model Checking, the number of timeframes
2019 available for analysis must be bounded (SATABS). As it is desirable to
2020 change the bound to adjust it to the available computing capacity, the
2021 bound is given on the command line and not as part of the C program.
2022 This makes it easy to use only one C program for arbitrary bounds. The
2023 actual bound is available in the C program using the following
2024 declaration:
2025 
2026  extern const unsigned int bound;
2027 
2028 Also note that the fragment above declares a constant variable of struct
2029 type. Thus, the C program can only read the trace values and is not able
2030 to modify them. We will later on describe how to drive inputs of the
2031 Verilog module from within the C program.
2032 
2033 As described in previous chapters, assertions can be used to verify
2034 properties of the Verilog trace. As an example, the following program
2035 checks two values of the trace of the counter module (counter.c):
2036 
2037  void next_timeframe();
2038 
2039  struct module_top {
2040  unsigned int counter;
2041  };
2042 
2043  extern struct module_top top;
2044 
2045  int main() {
2046  next_timeframe();
2047  next_timeframe();
2048  assert(top.counter==2);
2049  next_timeframe();
2050  assert(top.counter==3);
2051  }
2052 
2053 The following CBMC command line checks these assertions with a bound of
2054 20:
2055 
2056  hw-cbmc counter.c counter.v --module top --bound 20
2057 
2058 Note that a specific version of CBMC is used, called `hw-cbmc`. The
2059 module name given must match the name of the module in the Verilog file.
2060 Multiple Verilog files can be given on the command line.
2061 
2062 The `--bound` parameter is not to be confused with the `--unwind`
2063 parameter. While the `--unwind` parameter specifies the maximum
2064 unwinding depth for loops within the C program, the `--bound` parameter
2065 specifies the number of times the transition relation of the Verilog
2066 module is to be unwound.
2067 
2068 ### Counterexamples
2069 
2070 For the given example, the verification is successful. If the first
2071 assertion is changed to
2072 
2073  assert(top.counter==10);
2074 
2075 and the bound on the command line is changed to 6, CBMC will produce a
2076 counterexample. CBMC produces two traces: One for the C program, which
2077 matches the traces described earlier, and a separate trace for the
2078 Verilog module. The values of the registers in the Verilog module are
2079 also shown in the C trace as part of the initial state.
2080 
2081  Initial State
2082  ----------------------------------------------------
2083  bound=6 (00000000000000000000000000000110)
2084  counter={ 0, 1, 2, 3, 4, 5, 6 }
2085 
2086  Failed assertion: assertion line 6 function main
2087 
2088  Transition system state 0
2089  ----------------------------------------------------
2090  counter=0 (0000)
2091 
2092  Transition system state 1
2093  ----------------------------------------------------
2094  counter=1 (0001)
2095 
2096  Transition system state 2
2097  ----------------------------------------------------
2098  counter=2 (0010)
2099 
2100  Transition system state 3
2101  ----------------------------------------------------
2102  counter=3 (0011)
2103 
2104  Transition system state 4
2105  ----------------------------------------------------
2106  counter=4 (0100)
2107 
2108  Transition system state 5
2109  ----------------------------------------------------
2110  counter=5 (0101)
2111 
2112  Transition system state 6
2113  ----------------------------------------------------
2114  counter=6 (0110)
2115 
2116 ### Using the Bound
2117 
2118 The following program is using the bound variable to check the counter
2119 value in all cycles:
2120 
2121  void next_timeframe();
2122  extern const unsigned int bound;
2123 
2124  struct module_top {
2125  unsigned int counter;
2126  };
2127 
2128  extern struct module_top top;
2129 
2130  int main() {
2131  unsigned cycle;
2132 
2133  for(cycle=0; cycle<bound; cycle++) {
2134  assert(top.counter==(cycle & 15));
2135  next_timeframe();
2136  }
2137  }
2138 
2139 CBMC performs bounds checking, and restricts the number of times that
2140 `next_timeframe()` can be called. SATABS does not re­quire a bound, and
2141 thus, `next_timeframe()` can be called arbitrarily many times.
2142 
2143 
2144 \subsection man_hard-soft-mapping Mapping Variables
2145 
2146 ### Mapping Variables within the Module Hierarchy
2147 
2148 Verilog modules are hierarchical. The `extern` declarations shown above
2149 only allow reading the values of signals and registers that are in the
2150 top module. In order to read values from sub-modules, CBMC uses
2151 structures.
2152 
2153 As an example, consider the following Verilog file
2154 (heirarchy.v):
2155 
2156  module counter(input clk, input [7:0] increment);
2157 
2158  reg [7:0] counter;
2159 
2160  initial counter=0;
2161 
2162  always @(posedge clk)
2163  counter=counter+increment;
2164 
2165  endmodule
2166 
2167  module top(input clk);
2168 
2169  counter c1(clk, 1);
2170  counter c2(clk, 2);
2171 
2172  endmodule
2173 
2174 The file has two modules: a top module and a counter module. The counter
2175 module is instantiated twice within the top module. A reference to the
2176 register `counter` within the C program would be ambiguous, as the two
2177 module instances have separate instances of the register. CBMC and
2178 SATABS use the following data structures for this example:
2179 
2180  void next_timeframe();
2181  extern const unsigned int bound;
2182 
2183  struct counter {
2184  unsigned char increment;
2185  unsigned char counter;
2186  };
2187 
2188  struct module_top {
2189  struct module_counter c1, c2;
2190  };
2191 
2192  extern struct module_top top;
2193 
2194  int main() {
2195  next_timeframe();
2196  next_timeframe();
2197  next_timeframe();
2198  assert(top.c1.counter==3);
2199  assert(top.c2.counter==6);
2200  }
2201 
2202 The `main` function reads both counter values for cycle 3. A deeper
2203 hierarchy (modules in modules) is realized by using additional structure
2204 members. Writing these data structures for large Verilog designs is
2205 error prone, and thus, HW-CBMC can generate them automatically. The
2206 declarations above are generated using the command line
2207 
2208  hw-cbmc --gen-interface --module top hierarchy.v
2209 
2210 ### Mapping Verilog Vectors to Arrays or Scalars
2211 
2212 In Verilog, a definition such as
2213 
2214  wire [31:0] x;
2215 
2216 can be used for arithmetic (as in `x+10`) and as array of Booleans (as
2217 in `x[2]`). ANSI-C does not allow both, so when mapping variables from
2218 Verilog to C, the user has to choose one option for each such variable.
2219 As an example, the C declaration
2220 
2221  unsigned int x;
2222 
2223 will allow using `x` in arithmetic expressions, while the C declaration
2224 
2225  __CPROVER_bool x[32];
2226 
2227 will allow accessing the individual bits of `x` using the syntax
2228 `x[bit]`. The `--gen-interface` option of HW-CBMC will generate the
2229 first variant if the vector has the same size as one of the standard
2230 integer types, and will use the `__CPROVER_bitvector[]` type if not
2231 so. This choice can be changed by adjusting the declaration accordingly.
2232 Note that both SpecC and SystemC offer bit-extraction operators, which
2233 means that it unnecessary to use the declaration as array in order to
2234 access individual bits of a vector.
2235 
2236 \subsection man_hard-soft-inputs Synchronizing Inputs
2237 
2238 ### Driving Primary Inputs
2239 
2240 The examples in the \ref man_hard-soft-tutorial "tutorial" are trivial
2241 in the sense that the model has only one possible trace. The initial
2242 state is deterministic, and there is only one possible transition, so
2243 the verification problem can be solved by testing a single run. In
2244 contrast, consider the following Verilog module:
2245 
2246  module top(input clk, input i);
2247 
2248  reg [3:0] counter;
2249 
2250  initial counter=0;
2251 
2252  always @(posedge clk)
2253  if(i)
2254  counter=counter+1;
2255 
2256  endmodule
2257 
2258 The module above has an input named `i`. The top-level inputs of the
2259 Verilog design have to be generated by the C program. This is done by
2260 assigning the desired values to the corresponding struct member, and
2261 then calling the `set_inputs()` function before calling
2262 `next_timeframe()`. Consider the following example:
2263 
2264  void next_timeframe();
2265  void set_inputs();
2266  extern const unsigned int bound;
2267 
2268  struct module_top {
2269  unsigned int counter;
2270  _Bool i;
2271  };
2272 
2273  extern struct module_top top;
2274 
2275  int main() {
2276  assert(top.counter==0);
2277 
2278  top.i=1;
2279  set_inputs(); next_timeframe();
2280  assert(top.counter==1);
2281 
2282  top.i=1;
2283  set_inputs(); next_timeframe();
2284  assert(top.counter==2);
2285 
2286  top.i=0;
2287  set_inputs(); next_timeframe();
2288  assert(top.counter==2);
2289  }
2290 
2291 As an example, consider a Verilog module that has a signal `reset` as an
2292 input, which is active-low. The following C fragment drives this input
2293 to be active in the first cycle, and not active in any subsequent cycle:
2294 
2295  top.resetn=0;
2296  set_inputs(); next_timeframe();
2297 
2298  for(i=1; i<bound; i++) {
2299  top.resetn=1;
2300  set_inputs(); next_timeframe();
2301  }
2302 
2303 Note that the value of the input must be set *before* calling
2304 `next_timeframe()`. The effect of the input values on values derived in
2305 a combinatorial way is immediately visible. The effect on clocked values
2306 becomes visible in the next time frame.
2307 
2308 ### Using Nondeterminism
2309 
2310 The examples above use particular, constant values to drive the primary
2311 inputs. In order to check the behavior of the Verilog model for more
2312 than one specific input, use \ref man_modelling-nondet "nondeterminism".
2313 
2314 
2315 \section man_instrumentation Build Systems, Libraries, and Instrumentation
2316 
2317 \subsection man_instrumentation-libraries Build Systems and Libraries
2318 
2319 ### The Problem
2320 
2321 Similar to unit testing, the model checking approach requires the user
2322 to clearly define what parts of the program should be tested and what
2323 the behaviour of these parts is. This requirement has following reasons:
2324 
2325 - Despite recent advances, the size of the programs that model
2326  checkers can cope with is still restricted.
2327 
2328 - Typically, you want to verify *your* program and not the libraries
2329  or the operating that it uses (the correctness of these libraries
2330  and the OS us usually addressed separately).
2331 
2332 - CBMC and SATABS cannot verify binary libraries.
2333 
2334 - CBMC and SATABS does not provide a model for the hardware (e.g.,
2335  hard disk, input/output devices) the tested program runs on. Since
2336  CBMC and SATABS are supposed to examine the behavior of the tested
2337  program for **all** possible inputs and outputs, it is reasonable to
2338  model input and output by means of non-deterministic choice.
2339 
2340 ### Further Reading
2341 
2342 Existing software projects usually do not come in a single source file
2343 that may simply be passed to a model checker, but is a collection of
2344 files held together by a build system. The extraction of models from
2345 such a build system using goto-cc is described
2346 \ref man_instrumentation-goto-cc "here".
2347 
2348 \subsection man_instrumentation-goto-cc Integration into Build Systems with goto-cc
2349 
2350 Existing software projects usually do not come in a single source file
2351 that may simply be passed to a model checker. They rather come in a
2352 multitude of source files in different directories and refer to external
2353 libraries and system-wide options. A *build system* then collects the
2354 configuration options from the system and compiles the software
2355 according to build rules.
2356 
2357 The most prevalent build tool on Unix (-based) systems surely is the
2358 `make` utility. This tool uses build rules given in a *Makefile* that
2359 comes with the software sources. Running software verification tools on
2360 projects like these is greatly simplified by a compiler that first
2361 collects all the necessary models into a single model file.
2362 [goto-cc](http://www.cprover.org/goto-cc/) is such a model file
2363 extractor, which can seamlessly replace `gcc` and `cl.exe` in Makefiles.
2364 The normal build system for the project may be used to build the
2365 software, but the outcome will be a model file with suitable detail for
2366 verification, as opposed to a flat executable program. Note that goto-cc
2367 comes in different variants depending on the compilation environment.
2368 These variants are described [here](goto-cc-variants.shtml).
2369 
2370 ### Example: Building wu-ftpd
2371 
2372 This example assumes a Unix-like machine.
2373 
2374 1. Download the sources of wu-ftpd from
2375  [here](ftp://ftp.wu-ftpd.org/pub/wu-ftpd/wu-ftpd-current.tar.gz).
2376 
2377 2. Unpack the sources by running ` tar xfz wu-ftpd-current.tar.gz`
2378 
2379 3. Change to the source directory, by entering, e.g.,
2380  `cd wu-ftpd-2.6.2`
2381 
2382 4. Configure the project for verification by running
2383 
2384  `./configure YACC=byacc CC=goto-cc --host=none-none-none`
2385 
2386 5. Build the project by running `make`. This creates multiple model
2387  files in the `src` directory. Among them is a model for the main
2388  executable `ftpd`.
2389 
2390 6. Run a model-checker, e.g., CBMC, on the model file:
2391 
2392  `cbmc src/ftpd`
2393 
2394  CBMC automatically recognizes that the file is a goto binary.
2395 
2396 ### Important Notes
2397 
2398 More elaborate build or configuration scripts often make use of features
2399 of the compiler or the system library to detect configuration options
2400 automatically, e.g., in a `configure` script. Replacing `gcc` by goto-cc
2401 at this stage may confuse the script, or detect wrong options. For
2402 example, missing library functions do not cause goto-cc to throw an
2403 error (only to issue a warning). Because of this, configuration scripts
2404 sometimes falsely assume the availability of a system function or
2405 library.
2406 
2407 In the case of this or similar problems, it is more advisable to
2408 configure the project using the normal routine, and replacing the
2409 compiler setting manually in the generated Makefiles, e.g., by replacing
2410 lines like `CC=gcc` by `CC=goto-cc`.
2411 
2412 A helpful command that accomplishes this task successfully for many
2413 projects is the following:
2414 
2415  for i in `find . -name Makefile`; do   sed -e 's/^\(\s*CC[ \t]*=\)\(.*$\)/\1goto-cc/g' -i $i done
2416 
2417 Here are additional examples on how to use goto-cc:
2418 
2419 - \ref man_goto-cc-linux "Linux Kernel"
2420 - \ref man_goto-cc-apache "Apache HTTPD"
2421 
2422 A description of how to integrate goto-cc into Microsoft's Visual Studio
2423 is \ref man_instrumentation-vs "here".
2424 
2425 ### Linking Libraries
2426 
2427 Some software projects come with their own libraries; also, the goal may
2428 be to analyze a library by itself. For this purpose it is possible to
2429 use goto-cc to link multiple model files into a library of model files.
2430 An object file can then be linked against this model library. For this
2431 purpose, goto-cc also features a linker mode.
2432 
2433 To enable this linker mode, create a link to the goto-cc binary by the
2434 name of goto-ld (Linux and Mac) or copy the goto-cc binary to
2435 goto-link.exe (Windows). The goto-ld tool can now be used as a seamless
2436 replacement for the `ld` tool present on most Unix (-based) systems and
2437 for the `link` tool on Windows.
2438 
2439 The default linker may need to be replaced by `goto-ld` or
2440 `goto-link.exe` in the build script, which can be achieved in much the
2441 same way as replacing the compiler.
2442 
2443 
2444 \subsection man_instrumentation-vs Integration into Visual Studio 2008 to 2012
2445 
2446 Visual Studio version 2008 onwards comes with a new XML-based build
2447 system called
2448 [MSBuild](http://msdn.microsoft.com/en-us/library/ms171452.aspx).
2449 The MSBuild system is also activated when triggering a build from the
2450 Visual Studio GUI. The project files created by the Visual Studio GUI
2451 are used as input by the MSBuild tool.
2452 
2453 The MSBuild system can be used to generate goto-binaries from your
2454 Visual Studio project as follows:
2455 
2456 1. Install the `goto-cl.exe` and `goto-link.exe` binaries in some
2457  directory that is contained in the PATH environment variable.
2458 
2459 2. Add a configuration for the goto-cc build for your project in the
2460  configuration manager, named "goto-cc".
2461 
2462 3. Open the Visual Studio Command Prompt (in the Tools menu).
2463 
2464 4. Locate the directory that contains the project. Change into this
2465  directory using "CD".
2466 
2467 5. Type
2468 
2469  msbuild /p:CLToolExe=goto-cl.exe /p:LinkToolExe=goto-link.exe    /p:Flavor=goto-cc /p:Platform=x86
2470 
2471  The platform can be adjusted as required; the "Flavor" given should
2472  match the configuration that was created earlier.
2473 
2474 Note that the recent versions of goto-cc also support file names with
2475 non-ASCII (Unicode) characters on Windows platforms.
2476 
2477 \subsection man_instrumentation-goto-cc-variants Variants of goto-cc
2478 
2479 The goto-cc utility comes in several variants, summarised in the
2480 following table.
2481 
2482 Executable | Environment | Preprocessor
2483 --------------|-------------------------------------------------------------------------|-------
2484  goto-cc | [gcc](http://gcc.gnu.org/) (control-flow graph only) | gcc -E
2485  goto-gcc | [gcc](http://gcc.gnu.org/) ("hybrid" executable) | gcc -E
2486  goto-armcc | [ARM RVDS](http://arm.com/products/tools/software-tools/rvds/index.php) | armcc -E
2487  goto-cl | [Visual Studio](http://www.microsoft.com/visualstudio/) | cl /E
2488  goto-cw | [Freescale CodeWarrior](http://www.freescale.com/webapp/sps/site/homepage.jsp?code=CW_HOME) | mwcceppc
2489 
2490 The primary difference between the variants is the preprocessor called.
2491 Furthermore, the language recognized varies slightly. The variants can
2492 be obtained by simply renaming the goto-cc executable. On Linux/MacOS,
2493 the variants can be obtained by creating a symbolic link.
2494 
2495 The "hybrid" executables contain both the control-flow graph for
2496 verification purposes and the usual, executable machine code.
2497 
2498 \subsection man_instrumentation-architecture Architectural Settings
2499 
2500 The behavior of a C/C++ program depends on a number of parameters that
2501 are specific to the architecture the program was compiled for. The three
2502 most important architectural parameters are:
2503 
2504 - The width of the various scalar types; e.g., compare the value of
2505  `sizeof(long int)` on various machines.
2506 - The width of pointers; e.g., compare the value of `sizeof(int *)` on
2507  various machines.
2508 - The [endianness](http://en.wikipedia.org/wiki/Endianness) of
2509  the architecture.
2510 
2511 In general, the CPROVER tools attempt to adopt the settings of the
2512 particular architecture the tool itself was compiled for. For example,
2513 when running a 64 bit binary of CBMC on Linux, the program will be
2514 processed assuming that `sizeof(long int)==8`.
2515 
2516 As a consequence of these architectural parameters, you may observe
2517 different verification results for an identical program when running
2518 CBMC on different machines. In order to get consistent results, or when
2519 aiming at validating a program written for a different platform, the
2520 following command-line arguments can be passed to the CPROVER tools:
2521 
2522 - The word-width can be set with `--16`, `--32`, `--64`.
2523 - The endianness can be defined with `--little-endian` and
2524  `--big-endian`.
2525 
2526 When using a goto binary, CBMC and the other tools read the
2527 configuration from the binary, i.e., the setting when running goto-cc is
2528 the one that matters; the option given to the model checker is ignored
2529 in this case.
2530 
2531 In order to see the effect of the options `--16`, `--32` and `--64`,
2532 pass the following program to CBMC:
2533 
2534  #include <stdio.h>
2535  #include <assert.h>
2536 
2537  int main() {
2538  printf("sizeof(long int): %d\n", (int)sizeof(long int));
2539  printf("sizeof(int *): %d\n", (int)sizeof(int *));
2540  assert(0);
2541  }
2542 
2543 The counterexample trace contains the strings printed by the `printf`
2544 command.
2545 
2546 The effects of endianness are more subtle. Try the following program
2547 with `--big-endian` and `--little-endian`:
2548 
2549  #include <stdio.h>
2550  #include <assert.h>
2551 
2552  int main() {
2553  int i=0x01020304;
2554  char *p=(char *)&i;
2555  printf("Bytes of i: %d, %d, %d, %d\n", p[0], p[1], p[2], p[3]);
2556  assert(0);
2557  }
2558 
2559 
2560 \subsection man_instrumentation-properties Property Instrumentation
2561 
2562 ### Properties
2563 
2564 We have mentioned *properties* several times so far, but we never
2565 explained *what* kind of properties CBMC and SATABS can verify. We cover
2566 this topic in more detail in this section.
2567 
2568 Both CBMC and SATABS use
2569 [assertions](http://en.wikipedia.org/wiki/Assertion_(computing)) to
2570 specify program properties. Assertions are properties of the state of
2571 the program when the program reaches a particular program location.
2572 Assertions are often written by the programmer by means of the `assert`
2573 macro.
2574 
2575 In addition to the assertions written by the programmer, assertions for
2576 specific properties can also be generated automatically by CBMC and
2577 SATABS, often relieving the programmer from writing "obvious"
2578 assertions.
2579 
2580 CBMC and SATABS come with an assertion generator called
2581 `goto-instrument`, which performs a conservative [static
2582 analysis](http://en.wikipedia.org/wiki/Static_code_analysis) to
2583 determine program locations that potentially contain a bug. Due to the
2584 imprecision of the static analysis, it is important to emphasize that
2585 these generated assertions are only *potential* bugs, and that the Model
2586 Checker first needs to confirm that they are indeed genuine bugs.
2587 
2588 The assertion generator can generate assertions for the verification of
2589 the following properties:
2590 
2591 - **Buffer overflows.** For each array access, check whether the upper
2592  and lower bounds are violated.
2593 - **Pointer safety.** Search for `NULL`-pointer dereferences or
2594  dereferences of other invalid pointers.
2595 
2596 - **Division by zero.** Check whether there is a division by zero in
2597  the program.
2598 
2599 - **Not-a-Number.** Check whether floating-point computation may
2600  result in [NaNs](http://en.wikipedia.org/wiki/NaN).
2601 
2602 - **Unitialized local.** Check whether the program uses an
2603  uninitialized local variable.
2604 
2605 - **Data race.** Check whether a concurrent program accesses a shared
2606  variable at the same time in two threads.
2607 
2608 We refrain from explaining the properties above in detail. Most of them
2609 relate to behaviors that are left undefined by the respective language
2610 semantics. For a discussion on why these behaviors are usually very
2611 undesirable, read [this](http://blog.regehr.org/archives/213) blog post
2612 by John Regehr.
2613 
2614 All the properties described above are *reachability* properties. They
2615 are always of the form
2616 
2617 "*Is there a path through the program such that property ... is
2618 violated?*"
2619 
2620 The counterexamples to such properties are always program paths. Users
2621 of the Eclipse plugin can step through these counterexamples in a way
2622 that is similar to debugging programs. The installation of this plugin
2623 is explained \ref man_install-eclipse "here".
2624 
2625 ### Using goto-instrument
2626 
2627 The goto-instrument static analyzer operates on goto-binaries, which is
2628 a binary representation of control-flow graphs. The goto-binary is
2629 extracted from program source code using goto-cc, which is explained
2630 \ref man_instrumentation-goto-cc "here". Given a goto-program,
2631 goto-instrument operates as follows:
2632 
2633 1. A goto-binary is read in.
2634 2. The specified static analyses are performed.
2635 3. Any potential bugs found are transformed into corresponding
2636  assertions, and are added into the program.
2637 4. A new goto-binary (with assertions) is written to disc.
2638 
2639 As an example, we begin with small C program we call `expr.c` (taken
2640 from [here](http://www.spinroot.com/uno/)):
2641 
2642  int *ptr;
2643 
2644  int main(void) {
2645  if (ptr)
2646  *ptr = 0;
2647  if (!ptr)
2648  *ptr = 1;
2649  }
2650 
2651 The program contains an obvious NULL-pointer dereference. We first
2652 compile the example program with goto-cc and then instrument the
2653 resulting goto-binary with pointer checks.
2654 
2655  goto-cc expr.c -o in.gb   goto-instrument in.gb out.gb --pointer-check
2656 
2657 We can now get a list of the assertions that have been generated as
2658 follows:
2659 
2660  goto-instrument out.gb --show-properties
2661 
2662 Using either CBMC or SATABS on `out.gb`, we can obtain a counterexample
2663 trace for the NULL-pointer dereference:
2664 
2665  cbmc out.gb
2666 
2667 The goto-instrument program supports the following checks:
2668 
2669 Flag | Check
2670 -----------------------------|----------------------------------------------
2671 `--no-assertions` | ignore user assertions
2672 `--bounds-check` | add array bounds checks
2673 `--div-by-zero-check` | add division by zero checks
2674 `--pointer-check` | add pointer checks
2675 `--signed-overflow-check` | add arithmetic over- and underflow checks
2676 `--unsigned-overflow-check` | add arithmetic over- and underflow checks
2677 `--undefined-shift-check` | add range checks for shift distances
2678 `--nan-check` | add floating-point NaN checks
2679 `--uninitialized-check` | add checks for uninitialized locals (experimental)
2680 `--error-label label` | check that given label is unreachable
2681 
2682 \subsection man_instrumentation-api The CPROVER API Reference
2683 
2684 The following sections summarize the functions available to programs
2685 that are passed to the CPROVER tools.
2686 
2687 ### Functions
2688 
2689 #### \_\_CPROVER\_assume, \_\_CPROVER\_assert, assert
2690 
2691  void __CPROVER_assume(_Bool assumption);
2692  void __CPROVER_assert(_Bool assertion, const char *description);
2693  void assert(_Bool assertion);
2694 
2695 The function **\_\_CPROVER\_assume** adds an expression as a constraint
2696 to the program. If the expression evaluates to false, the execution
2697 aborts without failure. More detail on the use of assumptions is in the
2698 section on [Assumptions and Assertions](modeling-assertions.shtml).
2699 
2700 #### \_\_CPROVER\_same\_object, \_\_CPROVER\_POINTER\_OBJECT, \_\_CPROVER\_POINTER\_OFFSET, \_\_CPROVER\_DYNAMIC\_OBJECT
2701 
2702  _Bool __CPROVER_same_object(const void *, const void *);
2703  unsigned __CPROVER_POINTER_OBJECT(const void *p);
2704  signed __CPROVER_POINTER_OFFSET(const void *p);
2705  _Bool __CPROVER_DYNAMIC_OBJECT(const void *p);
2706 
2707 The function **\_\_CPROVER\_same\_object** returns true if the two
2708 pointers given as arguments point to the same object. The function
2709 **\_\_CPROVER\_POINTER\_OFFSET** returns the offset of the given pointer
2710 relative to the base address of the object. The function
2711 **\_\_CPROVER\_DYNAMIC\_OBJECT** returns true if the pointer passed as
2712 arguments points to a dynamically allocated object.
2713 
2714 #### \_\_CPROVER\_is\_zero\_string, \_\_CPROVER\_zero\_string\_length, \_\_CPROVER\_buffer\_size
2715 
2716  _Bool __CPROVER_is_zero_string(const void *);
2717  __CPROVER_size_t __CPROVER_zero_string_length(const void *);
2718  __CPROVER_size_t __CPROVER_buffer_size(const void *);
2719 
2720 #### \_\_CPROVER\_initialize
2721 
2722  void __CPROVER_initialize(void);
2723 
2724 The function **\_\_CPROVER\_initialize** computes the initial state of
2725 the program. It is called prior to calling the main procedure of the
2726 program.
2727 
2728 #### \_\_CPROVER\_input, \_\_CPROVER\_output
2729 
2730  void __CPROVER_input(const char *id, ...);
2731  void __CPROVER_output(const char *id, ...);
2732 
2733 The functions **\_\_CPROVER\_input** and **\_\_CPROVER\_output** are
2734 used to report an input or output value. Note that they do not generate
2735 input or output values. The first argument is a string constant to
2736 distinguish multiple inputs and outputs (inputs are typically generated
2737 using nondeterminism, as described [here](modeling-nondet.shtml)). The
2738 string constant is followed by an arbitrary number of values of
2739 arbitrary types.
2740 
2741 #### \_\_CPROVER\_cover
2742 
2743  void __CPROVER_cover(_Bool condition);
2744 
2745 This statement defines a custom coverage criterion, for usage with the
2746 [test suite generation feature](cover.shtml).
2747 
2748 #### \_\_CPROVER\_isnan, \_\_CPROVER\_isfinite, \_\_CPROVER\_isinf, \_\_CPROVER\_isnormal, \_\_CPROVER\_sign
2749 
2750  _Bool __CPROVER_isnan(double f);
2751  _Bool __CPROVER_isfinite(double f);
2752  _Bool __CPROVER_isinf(double f);
2753  _Bool __CPROVER_isnormal(double f);
2754  _Bool __CPROVER_sign(double f);
2755 
2756 The function **\_\_CPROVER\_isnan** returns true if the double-precision
2757 floating-point number passed as argument is a
2758 [NaN](http://en.wikipedia.org/wiki/NaN).
2759 
2760 The function **\_\_CPROVER\_isfinite** returns true if the
2761 double-precision floating-point number passed as argument is a finite
2762 number.
2763 
2764 This function **\_\_CPROVER\_isinf** returns true if the
2765 double-precision floating-point number passed as argument is plus or
2766 minus infinity.
2767 
2768 The function **\_\_CPROVER\_isnormal** returns true if the
2769 double-precision floating-point number passed as argument is a normal
2770 number.
2771 
2772 This function **\_\_CPROVER\_sign** returns true if the double-precision
2773 floating-point number passed as argument is negative.
2774 
2775 #### \_\_CPROVER\_abs, \_\_CPROVER\_labs, \_\_CPROVER\_fabs, \_\_CPROVER\_fabsl, \_\_CPROVER\_fabsf
2776 
2777  int __CPROVER_abs(int x);
2778  long int __CPROVER_labs(long int x);
2779  double __CPROVER_fabs(double x);
2780  long double __CPROVER_fabsl(long double x);
2781  float __CPROVER_fabsf(float x);
2782 
2783 These functions return the absolute value of the given argument.
2784 
2785 #### \_\_CPROVER\_array\_equal, \_\_CPROVER\_array\_copy, \_\_CPROVER\_array\_set
2786 
2787  _Bool __CPROVER_array_equal(const void array1[], const void array2[]);
2788  void __CPROVER_array_copy(const void dest[], const void src[]);
2789  void __CPROVER_array_set(const void dest[], value);
2790 
2791 The function **\_\_CPROVER\_array\_equal** returns true if the values
2792 stored in the given arrays are equal. The function
2793 **\_\_CPROVER\_array\_copy** copies the contents of the array **src** to
2794 the array **dest**. The function **\_\_CPROVER\_array\_set** initializes
2795 the array **dest** with the given value.
2796 
2797 #### Uninterpreted Functions
2798 
2799 Uninterpreted functions are documented \ref man_modelling-nondet "here".
2800 
2801 ### Predefined Types and Symbols
2802 
2803 #### \_\_CPROVER\_bitvector
2804 
2805  __CPROVER_bitvector [ expression ]
2806 
2807 This type is only available in the C frontend. It is used to specify a
2808 bit vector with arbitrary but fixed size. The usual integer type
2809 modifiers **signed** and **unsigned** can be applied. The usual
2810 arithmetic promotions will be applied to operands of this type.
2811 
2812 #### \_\_CPROVER\_floatbv
2813 
2814  __CPROVER_floatbv [ expression ] [ expression ]
2815 
2816 This type is only available in the C frontend. It is used to specify an
2817 IEEE-754 floating point number with arbitrary but fixed size. The first
2818 parameter is the total size (in bits) of the number, and the second is
2819 the size (in bits) of the mantissa, or significand (not including the
2820 hidden bit, thus for single precision this should be 23).
2821 
2822 #### \_\_CPROVER\_fixedbv
2823 
2824  __CPROVER_fixedbv [ expression ] [ expression ]
2825 
2826 This type is only available in the C frontend. It is used to specify a
2827 fixed-point bit vector with arbitrary but fixed size. The first
2828 parameter is the total size (in bits) of the type, and the second is the
2829 number of bits after the radix point.
2830 
2831 #### \_\_CPROVER\_size\_t
2832 
2833 The type of sizeof expressions.
2834 
2835 #### \_\_CPROVER\_rounding\_mode
2836 
2837  extern int __CPROVER_rounding_mode;
2838 
2839 This variable contains the IEEE floating-point arithmetic rounding mode.
2840 
2841 #### \_\_CPROVER\_constant\_infinity\_uint
2842 
2843 This is a constant that models a large unsigned integer.
2844 
2845 #### \_\_CPROVER\_integer, \_\_CPROVER\_rational
2846 
2847 **\_\_CPROVER\_integer** is an unbounded, signed integer type.
2848 **\_\_CPROVER\_rational** is an unbounded, signed rational number type.
2849 
2850 #### \_\_CPROVER\_memory
2851 
2852  extern unsigned char __CPROVER_memory[];
2853 
2854 This array models the contents of integer-addressed memory.
2855 
2856 #### \_\_CPROVER::unsignedbv&lt;N&gt; (C++ only)
2857 
2858 This type is the equivalent of **unsigned \_\_CPROVER\_bitvector\[N\]**
2859 in the C++ front-end.
2860 
2861 #### \_\_CPROVER::signedbv&lt;N&gt; (C++ only)
2862 
2863 This type is the equivalent of **signed \_\_CPROVER\_bitvector\[N\]** in
2864 the C++ front-end.
2865 
2866 #### \_\_CPROVER::fixedbv&lt;N&gt; (C++ only)
2867 
2868 This type is the equivalent of **\_\_CPROVER\_fixedbv\[N,m\]** in the
2869 C++ front-end.
2870 
2871 ### Concurrency
2872 
2873 Asynchronous threads are created by preceding an instruction with a
2874 label with the prefix **\_\_CPROVER\_ASYNC\_**.
2875 
2876 \subsection man_goto-cc-linux goto-cc: Extracting Models from the Linux Kernel
2877 
2878 The Linux kernel code consists of more than 11 million lines of
2879 low-level C and is frequently used to evaluate static analysis
2880 techniques. In the following, we show how to extract models from Linux
2881 2.6.39.
2882 
2883 1. First of all, you will need to make sure you have around 100 GB of
2884  free disc space available.
2885 
2886 2. Download the Kernel sources at
2887  <http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.39.tar.bz2>.
2888 
2889 3. Now do
2890 
2891    `bunzip2 linux-2.6.39.tar.bz2`\
2892    `tar xvf linux-2.6.39.tar`\
2893    `cd linux-2.6.39`
2894 
2895 4. Now ensure that you can actually compile a kernel by doing
2896 
2897    `make defconfig`\
2898    `make`
2899 
2900  These steps need to succeed before you can try to extract models
2901  from the kernel.
2902 
2903 5. Now compile [gcc-wrap.c](gcc-wrap.c) and put the resulting binary
2904  into a directory that is in your PATH variable:
2905 
2906    `lwp-download http://www.cprover.org/cprover-manual/gcc-wrap.c`\
2907    `gcc gcc-wrap.c -o gcc-wrap`\
2908    `cp gcc-wrap ~/bin/`\
2909 
2910  This assumes that the directory `~/bin` exists and is in your
2911  PATH variable.
2912 
2913 6. Now change the variable CC in the kernel Makefile as follows:
2914 
2915  CC = ~/bin/gcc-wrap
2916 
2917 7. Now do
2918 
2919    make clean
2920    make
2921 
2922  This will re-compile the kernel, but this time retaining the
2923  preprocessed source files.
2924 
2925 8. You can now compile the preprocessed source files with goto-cc as
2926  follows:
2927 
2928  find ./ -name .tmp_*.i > source-file-list
2929  for a in `cat source-file-list` ; do
2930    goto-cc -c $a -o $a.gb
2931  done
2932 
2933  Note that it is important that the word-size of the kernel
2934  configuration matches that of goto-cc. Otherwise, compile-time
2935  assertions will fail, generating the error message "bit field size
2936  is negative". For a kernel configured for a 64-bit word-width, pass
2937  the option --64 to goto-cc.
2938 
2939 The resulting `.gb` files can be passed to any of the CPROVER tools.
2940 
2941 \subsection man_goto-cc-apache goto-cc: Extracting Models from the Apache HTTPD
2942 
2943 The [Apache HTTPD](http://httpd.apache.org/) is still the most
2944 frequently used web server. Together with the relevant libraries, it
2945 consists of around 0.4 million lines of C code. In the following, we
2946 show how to extract models from Apache HTTPD 2.4.2.
2947 
2948 1. First of all, we download the sources of Apache HTTPD and two
2949  supporting libraries and uncompress them:
2950 
2951  lwp-download http://www.mirrorservice.org/sites/ftp.apache.org/apr/apr-1.4.6.tar.bz2
2952  lwp-download http://www.mirrorservice.org/sites/ftp.apache.org/apr/apr-util-1.4.1.tar.bz2
2953  lwp-download http://mirror.catn.com/pub/apache/httpd/httpd-2.4.2.tar.bz2
2954  bunzip2 < apr-1.4.6.tar.bz2 | tar x
2955  bunzip2 < apr-util-1.4.1.tar.bz2 | tar x
2956  bunzip2 < httpd-2.4.2.tar.bz2 | tar x
2957 
2958 2. Now compile [gcc-wrap.c](gcc-wrap.c) and put the resulting binary
2959  into a directory that is in your PATH variable:
2960 
2961  lwp-download http://www.cprover.org/cprover-manual/gcc-wrap.c
2962  gcc gcc-wrap.c -o gcc-wrap
2963  cp gcc-wrap ~/bin/
2964 
2965  This assumes that the directory `~/bin` exists and is in your
2966  PATH variable.
2967 
2968 3. We now build the sources with gcc:
2969 
2970  (cd apr-1.4.6; ./configure; make CC=gcc-wrap)
2971  (cd apr-util-1.4.1; ./configure --with-apr=../apr-1.4.6 ; make CC=gcc-wrap)
2972  (cd httpd-2.4.2; ./configure --with-apr=../apr-1.4.6 --with-apr-util=../apr-util-1.4.1 ; make CC=gcc-wrap)
2973 
2974 4. You can now compile the preprocessed source files with goto-cc as
2975  follows:
2976 
2977  find ./ -name *.i > source-file-list
2978  for a in `cat source-file-list` ; do
2979    goto-cc -c $a -o $a.gb
2980  done
2981 
2982 The resulting `.gb` files can be passed to any of the CPROVER tools.