Tools and tips for documenting scientific Python projects
Tue 22 December 2015 by Emmanuelle Gouillart
A colleague recently asked me for tips on documenting his new in-house Python project. Dozens of webpages explain why and how to write documentation, including for Python packages. Nevertheless, scientific projects have their own specificities, so I thought it would be worth sharing here some of the points we discussed. Don't think that these tips only apply to "big projects": as long as you expect people other than yourself to use the code (or to develop it), good documentation can be a game-changer, no matter the scale of the project.
A first, counter-intuitive point is that code matters as much as "real documentation". A clear, concise and well-thought-out API will make your users love your project much more than pages of explanations attempting to make up for a cluttered or clumsy API. As for developers, short functions and variable names used consistently across the package are more effective than comments scattered through the code. As Pieter Hintjens puts it, "Documenting code is like writing 'Tasty!' on the side of a coffee cup. If the code isn't readable on a grey Monday morning before coffee, chuck it out and start again. What you document are APIs (...). That is fine. Explaining what this funky loop does is not fine."
A second useful tip is to look at what others are doing, which often means using standard tools. Most Python projects use Sphinx to generate their documentation from plain-text files. Just use it. Using a standard tool will save you a lot of time and will keep you from reinventing the square wheel. For example, Sphinx automatically generates pages documenting your API. For scientific Python packages, another gold standard is the NumPy docstring standard (see a typical example below). As a scientific Python user, do you often read docstrings of functions from the standard Python library and wish they followed NumPy's guidelines, such as listing parameters and providing examples? Your users, too, will appreciate clear and comprehensive docstrings if you use the NumPy standard. Moreover, they will already be familiar with it.
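To make the Sphinx part concrete, here is a minimal sketch of a Sphinx configuration file that builds API pages from NumPy-style docstrings. The project name `mypackage` and the `docs/` layout are assumptions made for illustration; `sphinx.ext.napoleon` is one built-in way to parse NumPy-style sections (the separate `numpydoc` extension is another).

```python
# docs/conf.py -- hypothetical layout: the package lives one level above docs/
import os
import sys

# Make the package importable so that autodoc can read its docstrings.
sys.path.insert(0, os.path.abspath(".."))

project = "mypackage"  # hypothetical project name

extensions = [
    "sphinx.ext.autodoc",      # pull documentation out of docstrings
    "sphinx.ext.autosummary",  # build summary tables and per-object stub pages
    "sphinx.ext.napoleon",     # understand NumPy-style docstring sections
]

autosummary_generate = True        # write the stub pages at build time
napoleon_numpy_docstring = True    # parse "Parameters", "Returns", ...
napoleon_google_docstring = False  # this sketch only uses the NumPy style
```

With such a file in place, an `automodule` or `autosummary` directive in a `.rst` page is enough to render the docstrings of a module.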
A typical NumPy docstring: all that you need... and nothing else!
    np.zeros(shape, dtype=float, order='C')

    Return a new array of given shape and type, filled with zeros.

    Parameters
    ----------
    shape : int or sequence of ints
        Shape of the new array, e.g., ``(2, 3)`` or ``2``.
    dtype : data-type, optional
        The desired data-type for the array, e.g., `numpy.int8`. Default is
        `numpy.float64`.
    order : {'C', 'F'}, optional
        Whether to store multidimensional data in C- or Fortran-contiguous
        (row- or column-wise) order in memory.

    Returns
    -------
    out : ndarray
        Array of zeros with the given shape, dtype, and order.

    See Also
    --------
    zeros_like : Return an array of zeros with shape and type of input.
    ones_like : Return an array of ones with shape and type of input.
    empty_like : Return an empty array with shape and type of input.
    ones : Return a new array setting values to one.
    empty : Return a new uninitialized array.

    Examples
    --------
    >>> np.zeros(5)
    array([ 0.,  0.,  0.,  0.,  0.])
    >>> np.zeros((5,), dtype=int)
    array([0, 0, 0, 0, 0])
    >>> np.zeros((2, 1))
    array([[ 0.],
           [ 0.]])
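The same layout works for your own functions. Below is a small sketch using a hypothetical helper, `moving_average`, which is not taken from any library and is written in plain Python so that the focus stays on the docstring. Its examples double as tests, since the standard `doctest` module can run them.

```python
def moving_average(values, window=3):
    """Return the moving average of a sequence.

    Parameters
    ----------
    values : sequence of float
        Input samples.
    window : int, optional
        Number of consecutive samples averaged together. Default is 3.

    Returns
    -------
    out : list of float
        Average of each full window, of length ``len(values) - window + 1``.

    Raises
    ------
    ValueError
        If `window` is smaller than 1 or larger than ``len(values)``.

    Examples
    --------
    >>> moving_average([1, 2, 3, 4, 5], window=3)
    [2.0, 3.0, 4.0]
    >>> moving_average([4, 8], window=1)
    [4.0, 8.0]
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    # One average per position where a full window fits.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Run the examples embedded in the docstring as tests.
    import doctest
    doctest.testmod()
```

Running the file as a script stays silent when the docstring examples match the actual output, which keeps the documentation from drifting away from the code.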
A picture being worth a thousand words, some scientific Python packages have a graphical gallery of examples, with thumbnails that link to the scripts generating them. Visualization packages such as Matplotlib and Mayavi were among the first to offer such a gallery, followed by other packages such as scikit-learn, scikit-image, or seaborn. Although some applications seem better suited than others to graphical visualizations, meaningful visualizations are important for all. And once again, users will love it: in the scikit-image online documentation, gallery examples represent 55% of page visits as of last month. The good news is that you can now include a gallery of examples in your project at minimal cost, thanks to the sphinx-gallery project: put together a couple of Python scripts generating matplotlib figures, and sphinx-gallery will transform these into a nice gallery for your project. No more excuses for not having a beautiful gallery showcasing what your package can do for users!
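As a rough sketch of what "minimal cost" means, enabling sphinx-gallery mostly comes down to a few lines in the Sphinx configuration file. The directory names below are assumptions about the project layout, not requirements.

```python
# docs/conf.py -- hypothetical layout: example scripts live in ../examples
extensions = [
    "sphinx_gallery.gen_gallery",  # builds the gallery during the Sphinx run
]

sphinx_gallery_conf = {
    "examples_dirs": "../examples",   # where the example scripts are read from
    "gallery_dirs": "auto_examples",  # where the generated pages are written
}
```

Each script in the examples directory starts with a module docstring containing a reStructuredText title and a short description, followed by ordinary matplotlib code. With the default settings, scripts whose file names start with `plot_` are executed and their figures become the gallery thumbnails.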
My final point could be summed up as "don't expect users to be smart". You don't want to have to be smart when you're looking for documentation. In particular, you might want redundancy in your documentation (cross-linking between gallery examples, docstrings, tutorials, etc.) to make sure that users find the information they are looking for.
If you have other recommendations for good documentation, please share them in the comments!