Python Tests for Jupyter Notebooks

ifeelfree
4 min read · Jan 6, 2021


Table of Contents

· Part 1: Jupyter Notebooks Need Tests
· Part 2: What is nbval?
· Part 3: How to Use nbval?
· Part 4: Summary
· Part 5: Code Demonstration
· Reference

Part 1: Jupyter Notebooks Need Tests

Jupyter notebooks provide the ability to combine descriptive text (including LaTeX-formatted equations) together with code segments and their outputs in the same notebook document. The code segments can be executed interactively, and the output from the execution is inserted automatically into the document.

As the project grows, it may become infeasible to manually inspect and re-execute all notebooks to ensure they are still working and produce the same results as before.

What is needed in these situations is a tool that can alert the developer of any notebooks which contain broken code or produce different results when re-executed. Ideally, the alert is raised as soon as possible after the change has occurred.

The Jupyter notebooks which were produced during prototyping and subsequent explorations provide a natural suite of system/acceptance tests for the underlying code base.

Part 2: What is nbval?

The requirement addressed by nbval is the ability to automatically re-execute an existing notebook against a modified version of the underlying code base and verify that nothing is broken and that the outputs produced are still the same as before.

We desired a tool that helps keep Jupyter-notebook-based documents up to date, and that allows us to read the combination of a code cell and its stored output as a regression test: can that same output be recomputed from the input? This should happen automatically, so that for each notebook cell we obtain a PASS or FAIL outcome. This can then be integrated into existing unit-test frameworks, allowing us to (i) use existing notebooks as automatic tests, and (ii) check whether existing documentation notebooks are still up to date. The tool NoteBook VALidate (nbval) has been developed to fulfil these requirements.
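The per-cell PASS/FAIL idea can be sketched in a few lines of plain Python. This is a deliberately simplified, hypothetical check (the function name `check_cell` and the stdout-only comparison are my own simplifications, not nbval's actual implementation, which also handles rich outputs, errors, and sanitization):

```python
import io
from contextlib import redirect_stdout

def check_cell(source: str, stored_output: str) -> bool:
    """Re-execute a code cell's source and compare its stdout
    with the output stored in the notebook file."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        exec(source, {})  # run the cell in a fresh namespace
    return buf.getvalue() == stored_output

# A cell whose stored output can still be recomputed passes ...
assert check_cell("print(6 * 7)", "42\n")
# ... while a stale stored output fails.
assert not check_cell("print(6 * 7)", "41\n")
```

Running every cell of every notebook through such a check, and reporting each result through a test framework, is essentially what nbval automates.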

Part 3: How to Use nbval?

To re-use existing functionality as much as possible, we have developed the current version of nbval as a plugin to the pytest tool. Pytest can scan subdirectory trees for files that match particular patterns. With the nbval plugin activated, pytest will find files with the extension .ipynb, and validate each of these notebooks as the following shows:

$ py.test --nbval -v 03-data-types-structures.ipynb
============================= test session starts =========
platform darwin -- Python 3.8.0, pytest-5.3.1, ...
plugins: nbval-0.9.3
collected 137 items
03-data-types-structures.ipynb::Cell 0 PASSED       [  0%]
03-data-types-structures.ipynb::Cell 1 PASSED       [  1%]
03-data-types-structures.ipynb::Cell 2 PASSED       [  2%]
...
03-data-types-structures.ipynb::Cell 136 PASSED     [100%]
===================== 137 passed in 6.08s =================

We use nbval by running pytest with the --nbval or --nbval-lax flag (the difference is described below). This causes pytest to collect files with a .ipynb extension and pass them to nbval, in addition to finding and running more conventional Python tests.

  • --nbval-lax enables a relaxed mode. In this mode, nbval collects notebooks and runs their cells, failing only if a cell raises an error. It does not check the output of cells unless they are marked with a special #NBVAL_CHECK_OUTPUT comment.
  • --nbval enables a strict mode. The plugin collects and executes notebook cells, comparing their outputs with those saved in the file.
  • Cell markers enable or disable checking for a particular cell. Adding the tag nbval-check-output to a cell tells nbval to check its output even in relaxed mode, while nbval-ignore-output disables output checking in strict mode. nbval-skip skips a code cell entirely, and nbval-raises-exception or raises-exception accepts an error raised by a cell, which would otherwise be reported as a failure.
  • By default, only text output is compared, but nbval can integrate with nbdime to display rich comparisons when outputs differ. If the --nbdime option is used, image outputs (PNG and JPEG formats) are also compared.
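The tags mentioned above live in each cell's metadata inside the .ipynb file, which is plain JSON. The skeleton below is a minimal, hand-written notebook fragment (not generated by Jupyter) illustrating where an nbval tag goes:

```python
import json

# Minimal, hand-written notebook skeleton showing where nbval's
# cell tags live: in each cell's metadata, under the "tags" key.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["nbval-ignore-output"]},
            "source": ["import random\n", "print(random.random())\n"],
            "outputs": [],
            "execution_count": None,
        }
    ],
}

# An .ipynb file is just this structure serialized to JSON.
text = json.dumps(notebook, indent=1)
tags = json.loads(text)["cells"][0]["metadata"]["tags"]
print(tags)
```

In the Jupyter interface the same tag is added through the cell toolbar ("Tags"), so no hand-editing of the JSON is needed in practice.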

Part 4: Summary

nbval is a plugin for pytest which checks that the output saved in a notebook file in the past is consistent with the output computed today. Use cases include reproducible science and checking that deployed software behaves as its documentation suggests.

Deploying nbval in continuous integration automates this process, so notebooks can serve as additional system tests and provide additional test coverage. We note that unit, integration and system tests can all be written as Jupyter notebooks, which reduces the effort of formulating the test statement. Notebooks, in combination with nbval and continuous integration, can be used to develop documentation and tests as part of the design exploration and implementation phases in an agile manner.

Part 5: Code Demonstration

Key Cell Markers Summary

  • #NBVAL_IGNORE_OUTPUT
#NBVAL_IGNORE_OUTPUT
import numpy as np
print('This is not going to be tested')
print(np.random.randint(1, 20000))
  • # NBVAL_CHECK_OUTPUT
# NBVAL_CHECK_OUTPUT
print("This will be tested even in case of relaxed testing")
print(6 * 7)
  • #NBVAL_SKIP
# NBVAL_SKIP
print("Entering infinite loop...")
while True:
    pass
  • #NBVAL_RAISES_EXCEPTION
# NBVAL_RAISES_EXCEPTION
print("This exception will be tested")
raise RuntimeError("Foo")
  • --sanitize-with my_sanitize_file replaces transient strings in outputs (such as dates, times or memory addresses) before comparison, using regex substitutions defined in a sanitization file.
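A sanitization file is a small config file in which each section defines one regex substitution applied to cell outputs before comparison. A typical file (the section names are arbitrary) looks like this:

```ini
[regex1]
regex: \d{1,2}/\d{1,2}/\d{2,4}
replace: DATE-STAMP

[regex2]
regex: \d{2}:\d{2}:\d{2}
replace: TIME-STAMP
```

With this file passed via --sanitize-with, a cell that prints the current date or time still passes strict output checking, because both the stored and the recomputed outputs are normalized to the same placeholder before they are compared.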

