Cosmic Ray: mutation testing for Python
"Four human beings -- changed by space-born cosmic rays into something more than merely human." — The Fantastic Four
Cosmic Ray is a tool for performing mutation testing on Python code.
N.B.! Cosmic Ray is still learning how to walk!
At this time Cosmic Ray is young and incomplete. It doesn't support all of the mutations it should, its output format is crude, it only supports some forms of test discovery, it may fall over on exotic modules...the list goes on and on. Still, for the adventurous it does work. Hopefully things will improve fairly rapidly.
And, of course, patches and ideas are welcome.
If you just want to get down to the business of finding and killing mutants, here's what you do:
Install Cosmic Ray
pip install cosmic_ray
Initialize a Cosmic Ray session
cosmic-ray init --baseline=10 <session name> <module name> -- <test directory>
Execute the session:
cosmic-ray exec <session name>
View the results:
cosmic-ray report <session name>
This will print out a bunch of information about what Cosmic Ray did, including what kinds of mutants were created, which were killed, and – chillingly – which survived.
A concrete example: running the adam tests
Cosmic Ray includes a number of unit tests which perform mutations against a
simple module called
adam. As a way of test driving Cosmic Ray, you can run
these tests, too, like this:
cd test_project
cosmic-ray init --baseline=10 example-session adam -- tests
cosmic-ray --verbose exec example-session
cosmic-ray report example-session
In this case we're passing the
--verbose flag to the
exec command so that
you can see what Cosmic Ray is doing. If everything goes as expected, the
report command will report a 0% survival rate.
You can install Cosmic Ray using pip:
pip install cosmic_ray
Or you can use the supplied setup.py:
python setup.py install
Both of these approaches will install the Cosmic Ray package and
create an executable called cosmic-ray.
You'll often want to install Cosmic Ray into a virtual environment. However, you generally don't want to give it a virtual environment of its own. Rather, install it into the virtual environment of the project you want to test. This ensures that the test runners have access to the modules they are supposed to test.
Cosmic Ray has a notion of sessions which encompass an entire mutation testing run. Essentially, a session is a database which records the work that needs to be done for a run. Then as results are available from workers that do the actual testing, the database is updated with results. By having a database like this, Cosmic Ray can safely stop in the middle of a (potentially very long) session and be restarted. Since the session knows which work is already completed, it can continue where it left off.
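The resume behaviour described above can be sketched with a small JSON-backed work database. This is only an illustration of the idea: the file layout and the names init_session, pending_jobs and record_result are hypothetical, not Cosmic Ray's actual session format or API.

```python
import json
import os
import tempfile


def init_session(path, jobs):
    """Create (or overwrite) a session file listing every job as pending."""
    db = {job: None for job in jobs}  # None means "no result yet"
    with open(path, "w") as f:
        json.dump(db, f)


def pending_jobs(path):
    """Return the jobs that still have no recorded result."""
    with open(path) as f:
        db = json.load(f)
    return [job for job, result in db.items() if result is None]


def record_result(path, job, result):
    """Store the result for one job so a restart can skip it."""
    with open(path) as f:
        db = json.load(f)
    db[job] = result
    with open(path, "w") as f:
        json.dump(db, f)


# Hypothetical run that is interrupted after the first job.
session = os.path.join(tempfile.mkdtemp(), "demo_session.json")
init_session(session, ["mutant-1", "mutant-2", "mutant-3"])
record_result(session, pending_jobs(session)[0], "killed")

# After a restart, only the unfinished work remains.
remaining = pending_jobs(session)
print(remaining)
```

Because every result is written back to the file as soon as it is known, stopping the process at any point loses at most the job that was in flight.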
Sessions also allow for arbitrary post-facto analysis and report generation.
Before you can do mutation testing with Cosmic Ray, you need to first initialize
a session. You can do this using the
init command. With this command you tell
Cosmic Ray a) the name of the session, b) which module(s) you wish to mutate and
c) the location of the test suite. For example, if you've a package named
allele and if the
unittest tests for the package are all under the directory
allele_tests, you would run
cosmic-ray init like this:
cosmic-ray init --baseline=2 test_session allele -- allele_tests
You'll notice that this creates a new file called "test_session.json". This is the database for your session.
There are a number of other options you can pass to the init command;
see the help message for more details.
An important note on separating tests and production code
Cosmic Ray has a relatively simple view of how to mutate modules. Fundamentally, it will attempt to mutate any and all code in a module. This means that if you have test code in the same module as your code under test, Cosmic Ray will happily mutate the test code along with the production code. This is probably not what you want.
The best way to avoid this problem is to keep your test code in separate modules from your production code. This way you can tell Cosmic Ray precisely what to mutate.
Ideally, your test code will be in a different package from your production
code. This way you can tell Cosmic Ray to mutate an entire package without
needing to filter anything out. However, if your test code is in the same
package as your production code (a common configuration), you can use the
--exclude-modules flag of
cosmic-ray init to prevent mutation of your tests.
Given the choice, though, we recommend keeping your tests outside of the package for your code under test.
Once a session has been initialized, you can start executing tests by using the
exec command. This command just needs the name of the session you provided to init:
cosmic-ray exec test_session
Normally this won't produce any output unless there are errors.
Viewing the results
Once your tests have completed, you can view the results using the report command:
cosmic-ray report test_session
This will give you detailed information about what work was done, followed by a summary of the entire session.
Originally Cosmic Ray didn't have a notion of sessions, and didn't distinguish
between initialization and execution of the tests. It did all of its work using a single command, run.
Recent versions of Cosmic Ray still support the
run command. All this command
does is first do an
init followed by an
exec. This can be convenient for
small test runs.
Be aware, however, that
init can destroy an existing session database! If
you've got a session database with results representing hours of execution, you
probably don't want to delete it! So be aware that using the init or run
command has the potential to delete data.
Cosmic Ray supports multiple test runners. A test runner is simply a
plugin that supports a particular way of running tests. For example,
there is a test runner for tests written with the standard unittest
module, and there's another for tests written using pytest.
To specify a particular test runner when running Cosmic Ray, pass the
--test-runner flag to the
init subcommand. For example, to use the
pytest runner you would use:
cosmic-ray init --test-runner=pytest test_session allele -- allele_tests
To get a list of the available test runners, use the
Test runners require information about which tests to run, flags controlling
their behavior, and so forth. Since each test runner implementation takes
different kinds of information, we allow users to pass arbitrary lists of
arguments to test runners. When running the
cosmic-ray init command,
everything after the lone
-- token is passed verbatim to the test runner.
For example, the command:
cosmic-ray init --test-runner=pytest sess allele -- -x -k test_foo allele_tests
would pass the list
['-x', '-k', 'test_foo', 'allele_tests'] to the pytest
runner initializer. This plugin passes this list directly to the pytest.main()
function, which treats its items as command line arguments; in this case, it means
"exit on first failure, only running tests under 'allele_tests' which match
'test_foo'". Each test runner will accept different arguments, so see their
documentation for details on how to use them.
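The splitting rule described above can be sketched in a few lines. The function name split_runner_args is hypothetical and this is not Cosmic Ray's own argument parser; it only shows how everything after the first lone -- can be handed over untouched.

```python
def split_runner_args(argv):
    """Split argv at the first lone '--' token.

    Returns (own_args, test_runner_args). If there is no '--',
    the test runner receives an empty list.
    """
    if "--" in argv:
        index = argv.index("--")
        return argv[:index], argv[index + 1:]
    return list(argv), []


own, runner = split_runner_args(
    ["init", "--test-runner=pytest", "sess", "allele",
     "--", "-x", "-k", "test_foo", "allele_tests"]
)
print(runner)  # ['-x', '-k', 'test_foo', 'allele_tests']
```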
Specifying test timeouts
One difficulty mutation testing tools have to face is how to deal with mutations that result in infinite loops (or other pathological runtime effects). Cosmic Ray takes the simple approach of using a timeout to determine when to kill a test and consider it incompetent. That is, if a test of a mutant takes longer than the timeout, the test is killed, and the mutant is marked incompetent.
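A minimal sketch of this timeout approach, using only the standard library. The helper name run_with_timeout is hypothetical, and the child's exit code stands in for a real test run; Cosmic Ray's own worker logic may differ.

```python
import subprocess
import sys


def run_with_timeout(code, timeout):
    """Run a snippet of Python in a child process, killing it on timeout."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "incompetent"
    # Simplification: exit code 0 stands in for "the tests passed".
    return "survived" if completed.returncode == 0 else "killed"


looping = run_with_timeout("while True: pass", timeout=1)
failing = run_with_timeout("raise SystemExit(1)", timeout=30)
passing = run_with_timeout("pass", timeout=30)
print(looping, failing, passing)
```

Running the mutant in a child process is what makes the timeout enforceable: an infinite loop in the same interpreter could not be interrupted this cleanly.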
There are two ways to specify timeout values to Cosmic Ray. The first
is through the
--timeout flag for the
init subcommand. This flag
specifies an absolute number of seconds that a test will be allowed to
run. After the timeout is up, the test is killed. For example, to
specify that tests should time out after 10 seconds, use:
cosmic-ray init --timeout=10 test_session allele -- allele/tests
The second way is by using a baseline timing. To use this technique, pass the
--baseline argument to the
init subcommand. When Cosmic
Ray sees this flag it will make an initial run of the tests on an
un-mutated version of the module under test. The amount of time this
takes is considered the baseline timing. Then, Cosmic Ray multiplies
this baseline timing by the value of
--baseline and this final value
is used as the timeout for tests. For example, to tell Cosmic Ray to
time out tests that take more than 3 times as long as the baseline run, use:
cosmic-ray init --baseline=3 test_session allele -- allele/tests
This baseline technique is particularly useful if your test suite's runtime is in flux.
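The baseline calculation amounts to timing one clean run and multiplying. A rough sketch, assuming the test suite can be invoked as a Python callable (the name baseline_timeout and the stand-in suite are hypothetical):

```python
import time


def baseline_timeout(run_tests, multiplier):
    """Time one un-mutated test run and scale it by the multiplier."""
    start = time.perf_counter()
    run_tests()
    baseline = time.perf_counter() - start
    return baseline * multiplier


# A stand-in "test suite" that takes about 0.1 seconds.
timeout = baseline_timeout(lambda: time.sleep(0.1), multiplier=3)
print(f"timeout: {timeout:.2f}s")
```

Since the baseline is measured on every init, the resulting timeout follows the suite as it gets faster or slower.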
Running with a config file
For many projects you'll probably be running the same cosmic-ray
command over and over. Instead of having to remember and retype
potentially complex commands each time, you can store cosmic-ray
commands in a config file. You can then execute these commands by
passing the load command to cosmic-ray.
Each line in the config file is treated as a separate command-line
argument to cosmic-ray. Empty lines in the file are skipped, and you
can have comments in config files that start with #.
So, for example, if you need to invoke this command for your project:
cosmic-ray init --verbose --timeout=30 --no-local-import --baseline=2 test_session allele -- allele/tests/unittests
you could instead create a config file,
cr-allele.conf, with these contents:
init
--verbose # this can be useful for debugging
--timeout=30 # this is plenty of time
--no-local-import
--baseline=2
test_session
allele
--
allele/tests/unittests
Then to run the command in that config file:
cosmic-ray load cr-allele.conf
and it will have the same effect as running the original command.
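The config-file rules above (one argument per line, blank lines skipped, # starting a comment) can be sketched as a small parser. The function name load_config_args is hypothetical, and the sketch assumes that # never appears inside an argument; Cosmic Ray's real loader may handle edge cases differently.

```python
def load_config_args(text):
    """Turn config-file text into a list of command-line arguments.

    Assumes '#' always starts a comment, so arguments themselves
    cannot contain '#'.
    """
    args = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comment and whitespace
        if line:  # skip empty and comment-only lines
            args.append(line)
    return args


config = """\
init
--verbose # this can be useful for debugging

--timeout=30 # this is plenty of time
test_session
allele
--
allele/tests/unittests
"""
args = load_config_args(config)
print(args)
```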
Distributed testing with Celery
One of the main practical challenges to mutation testing is that it can take a long time. Even on moderately sized projects, you might need millions of individual mutations and test runs. This can be prohibitive to run on a single system.
One way to cope with these long runtimes is to parallelize the mutation and testing procedures. Fortunately, mutation testing is embarrassingly parallel in nature, so we can apply some relatively simple techniques to scale the work up nicely. We've chosen to use the Celery distributed task queue to spread work across multiple nodes.
The basic idea is very simple. Celery lets you start multiple workers which
listen for commands from a task queue. A central process creates all of the
commands for a mutation testing run, and these commands are distributed to the
workers as they become available. When a worker receives a command, it starts a
new Python process (using the
worker subcommand to Cosmic Ray) which
performs a single mutation and runs the test suite.
Spawning a separate process for each test suite may seem expensive. However, it's the best way we have for ensuring that pathological mutants can't somehow corrupt the runtime of the worker processes. And ultimately the cost of starting the process is likely to be very small compared to the runtime of the test suite.
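The process-per-job idea can be illustrated locally without Celery. In this sketch a standard-library thread pool stands in for the Celery workers, and each job's child process simply exits with a made-up status instead of mutating code and running tests; the name run_job is hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id):
    """Run one job in a fresh Python process and return its exit code."""
    # The child here just exits with job_id % 2; a real worker would
    # apply one mutation and run the test suite instead.
    code = f"import sys; sys.exit({job_id} % 2)"
    completed = subprocess.run([sys.executable, "-c", code])
    return job_id, completed.returncode


# Threads stand in for Celery workers; each job still gets its own process.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(run_job, range(6)))

print(results)
```

Whatever a job's child process does to its own interpreter state, the dispatching process is unaffected, which is the isolation property the paragraph above relies on.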
By its nature, Celery lets you start workers on as many systems as you want, all connected to the same task queue. So you could potentially have thousands of workers performing mutation testing runs, giving nearly perfect scaling! While not everyone has thousands of machines on hand to do their testing work, it's conceivable that Cosmic Ray will one day be able to work with machines on commodity cloud providers, meaning that highly-scaled mutation testing for Python will be available to anyone who wants it.
Celery is a Python task queue that sits atop a message broker; Cosmic Ray's distributed mode uses RabbitMQ. As such, if you want to use Cosmic Ray in distributed mode you first need to install RabbitMQ and run the server. The steps for installing and running RabbitMQ are covered in detail at that project's site, so go there for more information. Make sure the RabbitMQ server is installed and running before going any further with distributed execution.
Starting distributed worker processes
Once RabbitMQ is running, you need to start some worker processes which will do the actual mutation testing. Start one or more worker processes like this:
celery -A cosmic_ray.tasks.worker worker
You should do this, of course, from the virtual environment into which you've installed Cosmic Ray. Similarly, you need to make sure that the worker is in an environment in which it can import the modules under test. Generally speaking, you can meet both of these criteria if you install Cosmic Ray into and run workers from a virtual environment into which you've installed the modules under test.
Running distributed mutation testing
After you've started your workers, the only difference between local and
distributed testing is that you need to pass
--dist to the exec
command to do distributed testing. So a full distributed testing run would look something like this:
cosmic-ray init --baseline=3 session-name my_module -- tests
cosmic-ray exec --dist session-name
cosmic-ray report session-name
Cosmic Ray has a number of test suites to help ensure that it works. The first suite is a pytest test suite that validates some of its internals. You can run that like this:
(Note that these unit tests don't require any workers to be running).
There is also a set of tests which verify the various mutation operators. These
tests comprise a specially prepared body of code,
adam.py, and a full-coverage
test suite. The idea here is that Cosmic Ray should be 100% lethal against the
mutants of adam.py or there's a problem.
These tests can be run via both the standard unittest runner and
py.test. In both
cases, first make sure a worker (or several) is running. Then go to the test_project directory.
Run the operator tests with
unittest like this:
cosmic-ray load cosmic-ray.unittest.conf
View the results of this test with
cosmic-ray report adam_tests.unittest
You should see a 0% survival rate at the end of the report.
Likewise you can run with
py.test like this:
cosmic-ray load cosmic-ray.pytest.conf
View the report for this run with
cosmic-ray report adam_tests.pytest
Mutation testing is conceptually simple and elegant. You make certain kinds of controlled changes (mutations) to your code, and then you run your test suite over this mutated code. If your test suite fails, then we say that your tests "killed" (i.e. detected) the mutant. If the changes cause your code to simply crash, then we say the mutant is "incompetent". If your test suite passes, however, we say that the mutant has "survived".
Needless to say, we want to kill all of the mutants.
The goal of mutation testing is to verify that your test suite is actually testing all of the parts of your code that it needs to, and that it is doing so in a meaningful way. If a mutant survives your test suite, this is an indication that your test suite is not adequately checking the code that was changed. This means that either a) you need more or better tests or b) you've got code which you don't need.
Cosmic Ray works by parsing the module under test (MUT) and its
submodules into abstract syntax trees using Python's standard
ast module. It then makes systematic mutations to the ASTs.
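As an illustration of what an AST mutation can look like, the sketch below uses the standard library's ast.NodeTransformer to turn one + into a -. The AddToSub class is a hypothetical operator written for this example, not one of Cosmic Ray's own operators, and Cosmic Ray's implementation may be organized differently.

```python
import ast


class AddToSub(ast.NodeTransformer):
    """Replace the target-th '+' operator in a module with '-'."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)  # mutate nested expressions too
        if isinstance(node.op, ast.Add):
            if self.seen == self.target:
                node.op = ast.Sub()
            self.seen += 1
        return node


source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)
mutant = ast.fix_missing_locations(AddToSub(target=0).visit(tree))

# Compile and load the mutated tree to see the changed behaviour.
namespace = {}
exec(compile(mutant, "<mutant>", "exec"), namespace)
result = namespace["add"](2, 3)
print(result)  # -1, because '+' became '-'
```

The target counter is what makes the mutation systematic: running the transformer once per index yields one mutant per + in the module.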
For each individual mutation, Cosmic Ray modifies the Python runtime
environment to replace the MUT with the mutated version. It then uses
unittest's "discovery" functionality
to discover your tests and run them against the mutant code.
In effect, the mutation testing algorithm is something like this:
for mod in modules_under_test:
    for op in mutation_operators:
        for site in mutation_sites(op, mod):
            mutant_ast = mutate_ast(op, mod, site)
            replace_module(mod.name, compile(mutant_ast))
            try:
                if discover_and_run_tests():
                    print('Oh no! The mutant survived!')
                else:
                    print('The mutant was killed.')
            except Exception:
                print('The mutant was incompetent.')
Obviously this can result in a lot of tests, and it can take some time if your test suite is large and/or slow.