Requirements

The plugin has the following requirements:

  • GCC: 4.6 or later (it uses APIs that weren’t exposed to plugins in 4.5)

  • Python: tested with 2.7 and 3.2; it may work with earlier versions

  • “six”: The libcpychecker code uses the “six” Python compatibility library to smooth over Python 2 vs Python 3 differences, both at build-time and run-time.

  • “pygments”: The libcpychecker code uses the “pygments” Python syntax-highlighting library when writing out error reports.

Basic usage of the plugin

To build the plugin, run:

make plugin

To build the plugin and run the selftests, run:

make

You can also use:

make demo

to demonstrate the new compiler errors.

There isn’t a well-defined process yet for installing the plugin (though the rpm specfile in the source tree contains some work-in-progress towards this).

Some notes on GCC plugins can be seen at http://gcc.gnu.org/wiki/plugins and http://gcc.gnu.org/onlinedocs/gccint/Plugins.html

Once you’ve built the plugin, you can invoke a Python script like this:

gcc -fplugin=./python.so -fplugin-arg-python-script=PATH_TO_SCRIPT.py OTHER_ARGS

and have it run your script as the plugin starts up.

Alternatively, you can run a one-shot Python command like this:

gcc -fplugin=./python.so -fplugin-arg-python-command="python code" OTHER_ARGS

such as:

gcc -fplugin=./python.so -fplugin-arg-python-command="import sys; print(sys.path)" OTHER_ARGS

The plugin automatically adds the absolute path to its own directory to the end of its sys.path, so that it can find support modules, such as gccutils.py and libcpychecker.

There is also a helper script, gcc-with-python, which expects a python script as its first argument, then regular gcc arguments:

./gcc-with-python PATH_TO_SCRIPT.py other args follow

For example, this command will use graphviz to draw how GCC “sees” the internals of each function in test.c (within its SSA representation):

./gcc-with-python examples/show-ssa.py test.c

Most of the rest of this document describes the Python API visible for scripting.

The plugin exposes GCC’s various types as Python objects, within a “gcc” module. You can see the API by running the following within a script:

import gcc
help(gcc)

To make this easier, there’s a script to do this for you:

./gcc-python-docs

from where you can review the built-in documentation strings (this document may be easier to follow though).

The exact API is still in flux and may well change (this is an early version of the code, and we may also have to change things as GCC changes in future releases).

Debugging your script

You can place a forced breakpoint in your script using this standard Python one-liner:

import pdb; pdb.set_trace()

If Python reaches this location it will interrupt the compile and put you within the pdb interactive debugger, from where you can investigate.

See http://docs.python.org/library/pdb.html#debugger-commands for more information.

If an exception occurs during Python code, and isn’t handled by a try/except before returning into the plugin, the plugin prints the traceback to stderr and treats it as an error:

/home/david/test.c: In function ‘main’:
/home/david/test.c:28:1: error: Unhandled Python exception raised within callback
Traceback (most recent call last):
  File "test.py", line 38, in my_pass_execution_callback
    dot = gccutils.tree_to_dot(fun)
NameError: global name 'gccutils' is not defined

(In this case, it was a missing import statement in the script.)

GCC reports errors at a particular location within the source code. For an unhandled exception such as the one above, by default, the plugin reports the error as occurring at the top of the current source function (or at the last location within the current source file, for passes and callbacks that aren’t associated with a function).

You can override this using gcc.set_location:

gcc.set_location(loc)

Temporarily overrides the error-reporting location, so that if an exception occurs, it will use this gcc.Location, rather than the default. This may be of use when debugging tracebacks from scripts. The location is reset each time after returning from Python back to the plugin, after printing any traceback.

Accessing parameters

gcc.argument_dict

Exposes the arguments passed to the plugin as a dictionary.

For example, running:

gcc -fplugin=python.so \
    -fplugin-arg-python-script=test.py \
    -fplugin-arg-python-foo=bar

with test.py containing:

import gcc
print(gcc.argument_dict)

has output:

{'script': 'test.py', 'foo': 'bar'}

gcc.argument_tuple

Exposes the arguments passed to the plugin as a tuple of (key, value) pairs, so you have ordering. (Probably worth removing, and replacing argument_dict with an OrderedDict instead; what about duplicate args though?)
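
Until that is settled, a script can recover both ordering and repeated keys from gcc.argument_tuple itself. The sketch below groups the (key, value) pairs into an OrderedDict of lists. The sample tuple is hypothetical, standing in for something like -fplugin-arg-python-foo=bar -fplugin-arg-python-foo=baz; whether GCC actually passes duplicate arguments through is not something this document confirms.

```python
from collections import OrderedDict

def group_plugin_args(pairs):
    """Group (key, value) pairs into an OrderedDict mapping key -> list of values.

    Keys keep the order of their first appearance; repeated keys keep
    every value, in the order given.
    """
    grouped = OrderedDict()
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped

# Hypothetical contents of gcc.argument_tuple, for illustration only
sample = (('script', 'test.py'), ('foo', 'bar'), ('foo', 'baz'))
print(group_plugin_args(sample))
```

Inside the plugin, the call would be group_plugin_args(gcc.argument_tuple).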

Adding new passes to the compiler

You can create new compiler passes by subclassing the appropriate gcc.Pass subclass. For example, here’s how to wire up a new pass that displays the control flow graph of each function:

# Show the GIMPLE form of each function, using GraphViz
import gcc
from gccutils import get_src_for_loc, cfg_to_dot, invoke_dot

# We'll implement this as a custom pass, to be called directly after the
# builtin "cfg" pass, which generates the CFG:

class ShowGimple(gcc.GimplePass):
    def execute(self, fun):
        # (the CFG should be set up by this point, and the GIMPLE is not yet
        # in SSA form)
        if fun and fun.cfg:
            dot = cfg_to_dot(fun.cfg, fun.decl.name)
            # print(dot)
            invoke_dot(dot)

ps = ShowGimple(name='show-gimple')
ps.register_after('cfg')

For more information, see Creating new optimization passes.

Wiring up callbacks

The other way to write scripts is to register callback functions to be called when various events happen during compilation, such as using gcc.PLUGIN_PASS_EXECUTION to piggyback off of an existing GCC pass.

gcc.register_callback(event_id, function[, extraargs], **kwargs)

Wire up a python function as a callback. It will be called when the given event occurs during compilation. For some events, the callback will be called just once; for other events, the callback is called once per function within the source code being compiled. In the latter case, the plugin passes a gcc.Function instance as a parameter to your callback, so that you can work on it:

import gcc

def my_pass_execution_callback(*args, **kwargs):
    print('my_pass_execution_callback was called: args=%r  kwargs=%r'
          % (args, kwargs))

gcc.register_callback(gcc.PLUGIN_PASS_EXECUTION,
                      my_pass_execution_callback)

You can pass additional arguments when registering the callback: they will be passed to the callback after any normal arguments. This is denoted in the descriptions of events below by *extraargs.

You can also supply keyword arguments: they will be passed on as keyword arguments to the callback. This is denoted in the description of events below by **kwargs.
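
A small sketch of both mechanisms together uses gcc.PLUGIN_FINISH_UNIT (described below). Its callback receives no normal arguments, so everything it gets comes from the registration call. The 'my-checker' label and the verbose keyword are arbitrary names chosen for illustration, and the ImportError guard only lets the callback be tried outside the compiler.

```python
try:
    import gcc  # only importable when running inside the plugin
except ImportError:
    gcc = None

def on_finish_unit(*extraargs, **kwargs):
    # extraargs holds the extra positional values given to register_callback;
    # kwargs holds its keyword arguments
    label = extraargs[0] if extraargs else 'unit'
    msg = '%s finished' % label
    if kwargs.get('verbose'):
        msg += ' (extraargs=%r, kwargs=%r)' % (extraargs, kwargs)
    print(msg)
    return msg

if gcc is not None:
    gcc.register_callback(gcc.PLUGIN_FINISH_UNIT, on_finish_unit,
                          'my-checker', verbose=True)
```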

The various events are exposed as constants within the gcc module and directly wrap GCC’s plugin mechanism. The exact arguments you get aren’t well-documented there, and may be subject to change. I’ve tried to document what I’ve seen in GCC 4.6 here, but it’s worth experimenting and printing args and kwargs as shown above.

Currently useful callback events

gcc.PLUGIN_PASS_EXECUTION

Called when GCC is about to run one of its passes.

Arguments passed to the callback are:

(ps, fun, *extraargs, **kwargs)

where ps is a gcc.Pass and fun is a gcc.Function. Your callback will typically be called many times: there are many passes, and each can be invoked zero or more times per function (in the code being compiled).

More precisely, some passes have a “gate check”: the pass first checks a condition, and only executes if the condition is true.

Any callback registered with gcc.PLUGIN_PASS_EXECUTION will get called if this condition succeeds.

The actual work of the pass is done after the callbacks return.

In pseudocode:

def run_pass(ps):
    if ps.has_gate_condition:
        if not ps.test_gate_condition():
            return
    invoke_all_callbacks()
    actually_do_the_pass()

For passes working on individual functions, all of the above is done per-function.

To connect to a specific pass, you can simply add a conditional based on the name of the pass:

import gcc

def my_callback(ps, fun):
    if ps.name != '*warn_function_return':
        # Not the pass we want
        return
    # Do something here
    print(fun.decl.name)

gcc.register_callback(gcc.PLUGIN_PASS_EXECUTION,
                      my_callback)
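
Pass names are not listed in this document, so one way to find the one you want is to record every pass name seen for each function and print the list when the unit finishes. The sketch below relies only on ps.name, fun.decl.name and the callback signatures given in this section, and treats fun as possibly None, as the ShowGimple example above does. The seen dictionary and the '<no function>' key are arbitrary choices for illustration.

```python
from collections import OrderedDict

try:
    import gcc  # only importable when running inside the plugin
except ImportError:
    gcc = None

# function name -> pass names, in the order the passes ran
seen = OrderedDict()

def record_pass(ps, fun):
    # Passes that are not tied to a function may pass fun=None
    key = fun.decl.name if fun is not None else '<no function>'
    seen.setdefault(key, []).append(ps.name)

def dump_passes(*extraargs, **kwargs):
    for name, passes in seen.items():
        print('%s: %s' % (name, ', '.join(passes)))

if gcc is not None:
    gcc.register_callback(gcc.PLUGIN_PASS_EXECUTION, record_pass)
    gcc.register_callback(gcc.PLUGIN_FINISH_UNIT, dump_passes)
```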

gcc.PLUGIN_PRE_GENERICIZE

Arguments passed to the callback are:

(fndecl, *extraargs, **kwargs)

where fndecl is a gcc.Tree representing a function declaration within the source code being compiled.

gcc.PLUGIN_FINISH_UNIT

Called when GCC has finished compiling a particular translation unit.

Arguments passed to the callback are:

(*extraargs, **kwargs)

Generating custom errors and warnings

gcc.warning(location, option, message)

Emits a compiler warning at the given gcc.Location.

The warning is controlled by the given gcc.Option.

For example, given this Python code:

gcc.warning(func.start, gcc.Option('-Wformat'), 'incorrect formatting')

if the given warning is enabled, a warning will be printed to stderr:

$ ./gcc-with-python script.py input.c
input.c:25:1: warning: incorrect formatting [-Wformat]

If the given warning is being treated as an error (through -Werror, or -Werror=format for this particular warning), then an error will be printed:

$ ./gcc-with-python script.py -Werror input.c
input.c:25:1: error: incorrect formatting [-Werror=format]
cc1: all warnings being treated as errors
$ ./gcc-with-python script.py -Werror=format input.c
input.c:25:1: error: incorrect formatting [-Werror=format]
cc1: some warnings being treated as errors

If the given warning is disabled, the warning will not be printed:

$ ./gcc-with-python script.py -Wno-format input.c

The function returns a boolean, indicating whether or not anything was actually printed.

gcc.error(location, message)

Emits a compiler error at the given gcc.Location.

For example:

gcc.error(func.start, 'something bad was detected')

would lead to this error being printed to stderr:

$ ./gcc-with-python script.py input.c
input.c:25:1: error: something bad was detected

gcc.permerror(loc, str)

This is a wrapper around GCC’s permerror function.

Expects an instance of gcc.Location (not None) and a string.

Emit a “permissive” error at that location, intended for things that really ought to be errors, but might be present in legacy code.

In theory it’s suppressible using “-fpermissive” at the GCC command line (which turns it into a warning), but this only seems to be legal for C++ source files.

Returns True if the warning was actually printed, False otherwise.

gcc.inform(loc, str)

This is a wrapper around GCC’s inform function.

Expects an instance of gcc.Location (not None) and a string.

Emit an informational message at that location.

For example:

gcc.inform(stmt.loc, 'this is where X was defined')

would lead to this informational message being printed to stderr:

$ ./gcc-with-python script.py input.c
input.c:23:3: note: this is where X was defined

Global data access

gcc.get_variables()

Get all variables in this compilation unit as a list of gcc.Variable.

gccutils.get_variables_as_dict()

Get a dictionary of all variables, where the keys are the variable names (as strings), and the values are instances of gcc.Variable.
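
The sketch below shows how such a dictionary could be built from gcc.get_variables(). It assumes that each gcc.Variable exposes its declaration as .decl, whose .name is the variable's name; this attribute is not described in this section. The list is passed in as a parameter so the function can be tried on stand-in objects, and variables_by_name is a hypothetical name rather than the gccutils implementation. Inside the plugin you would call it as variables_by_name(gcc.get_variables()).

```python
def variables_by_name(variables):
    """Map variable name (str) -> variable object.

    Assumes each item has .decl.name, as gcc.Variable is assumed to.
    If two variables share a name, the later one wins.
    """
    result = {}
    for var in variables:
        result[var.decl.name] = var
    return result
```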

gcc.maybe_get_identifier(str)

Get the gcc.IdentifierNode with this name, if it exists, otherwise None. (However, after the front-end has run, the identifier node may no longer point at anything useful to you; see gccutils.get_global_typedef() for an example of working around this.)

gcc.get_translation_units()

Get a list of all gcc.TranslationUnitDecl for the compilation units within this invocation of GCC (that’s “source code files” for the layperson).

class gcc.TranslationUnitDecl

Subclass of gcc.Tree representing a compilation unit.

block

The gcc.Block representing global scope within this source file.

language

The source language of this translation unit, as a string (e.g. “GNU C”)
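
For example, a script can report the source language of each translation unit once the unit has finished compiling. This sketch uses only gcc.get_translation_units(), the language attribute and gcc.PLUGIN_FINISH_UNIT. The describe_units helper is hypothetical, and it takes the units as a parameter so it can be tried on stand-in objects.

```python
from collections import Counter

try:
    import gcc  # only importable when running inside the plugin
except ImportError:
    gcc = None

def describe_units(units):
    """Return a summary line such as '2 translation unit(s): GNU C x2'."""
    counts = Counter(tu.language for tu in units)
    parts = ['%s x%d' % (lang, n) for lang, n in sorted(counts.items())]
    return '%d translation unit(s): %s' % (len(units), ', '.join(parts))

def on_finish_unit(*extraargs, **kwargs):
    print(describe_units(gcc.get_translation_units()))

if gcc is not None:
    gcc.register_callback(gcc.PLUGIN_FINISH_UNIT, on_finish_unit)
```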

gccutils.get_global_typedef(name)

Given a string name, look for a C/C++ typedef in global scope with that name, returning it as a gcc.TypeDecl, or None if it wasn’t found.

gccutils.get_global_vardecl_by_name(name)

Given a string name, look for a C/C++ variable in global scope with that name, returning it as a gcc.VarDecl, or None if it wasn’t found.

gccutils.get_field_by_name(decl, name)

Given one of a gcc.RecordType, gcc.UnionType, or gcc.QualUnionType, along with a string name, look for a field with that name within the given struct or union, returning it as a gcc.FieldDecl, or None if it wasn’t found.
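
Such a lookup can be done as a linear search over the type's fields. The sketch below shows equivalent logic under one assumption about the wider API that this section does not state: the struct or union type exposes its fields as a fields sequence of gcc.FieldDecl objects, each with a name attribute. find_field is a hypothetical name and not the gccutils implementation.

```python
def find_field(struct_type, name):
    """Return the first field of struct_type called name, or None.

    Assumes struct_type.fields is an iterable of objects with a .name
    attribute; an anonymous field whose name is None never matches a string.
    """
    for field in struct_type.fields:
        if field.name == name:
            return field
    return None
```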

Working with source code

gccutils.get_src_for_loc(loc)

Given a gcc.Location, get the source line as a string (without trailing whitespace or newlines)
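
If you only have a location's file and line attributes (described under gcc.Location below), you can get the same result with the standard linecache module. This is a sketch: src_for_loc is a hypothetical stand-in rather than the actual gccutils code, and it assumes the file is still readable at the recorded path, resolved relative to the current directory.

```python
import linecache

def src_for_loc(loc):
    """Return line loc.line of loc.file, without trailing whitespace.

    Lines are numbered from 1; linecache returns '' for a missing file or
    an out-of-range line, so this returns '' in those cases.
    """
    return linecache.getline(loc.file, loc.line).rstrip()
```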

class gcc.Location

Wrapper around GCC’s location_t, representing a location within the source code. Use gccutils.get_src_for_loc() to get at the line of actual source code.

The output from __repr__ looks like this:

gcc.Location(file='./src/test.c', line=42)

The output from __str__ looks like this:

./src/test.c:42

file

(string) Name of the source file (or header file)

line

(int) Line number within source file (starting at 1, not 0)

column

(int) Column number within source file (starting at 1, not 0)
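
These three attributes are enough to build GCC's usual file:line:column diagnostic prefix, as seen in the messages above. This is useful when you collect findings so you can sort or print them yourself. The sketch below uses hypothetical helpers. When trying it out, any stand-in objects with the same three attributes can replace gcc.Location.

```python
def loc_prefix(loc):
    """Format a location as 'file:line:column', like GCC's diagnostics."""
    return '%s:%d:%d' % (loc.file, loc.line, loc.column)

def sort_key(loc):
    """Order locations by file, then numerically by line, then column."""
    return (loc.file, loc.line, loc.column)
```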