Balance Python productivity with package longevity using directed dependency graphs.

Python packages should generally keep their directed dependency graph small: fewer dependencies mean better package longevity with less maintenance effort. Some companies and institutions take this to excess, to their detriment, by refusing to use modules as common as Numpy. Their arguments include Numpy's lack of semantic versioning, cited as prima facie evidence of an unstable API that works against archival software versioning. However, a strictly minimized dependency graph implies minimal code reuse, which can drastically increase initial development effort.

In practice, the most effective approach, meaning the least programmer time to reach a publishable result, is to reuse code wherever appropriate, including Numpy and numerous user modules. The peril of user modules is instability, which raises maintenance effort and threatens longevity. Unstable APIs, particularly those without semantic versioning, present a real challenge to operations. Finding the optimum trade-off requires quantifying the problem with a directed dependency graph.

In general, Python packages have direct dependencies and indirect dependencies.

Direct dependencies
modules imported by the .py files comprising a user package.
Indirect dependencies
modules imported by the modules the user imported. Example: import pandas also imports numpy, and will fail if numpy is not installed.
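
As a rough illustration of the distinction, the direct dependencies of a source file can be read from its import statements without installing or running anything. The sketch below uses the standard-library ast module; the function name direct_imports and the sample source text are made up for this example, and imports performed dynamically (for instance via importlib) would not be detected.

```python
import ast


def direct_imports(source):
    """Return the set of top-level module names imported by Python source text."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" contributes the top-level name "a"
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import within the same package; skip it
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return found


# Hypothetical module source, used only for demonstration
example = (
    "import os\n"
    "import numpy as np\n"
    "from pandas import DataFrame\n"
    "from . import helpers\n"
)

print(sorted(direct_imports(example)))
```

Indirect dependencies (numpy behind pandas, for instance) never appear in such a scan, which is why a tool that inspects installed package metadata is still needed.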

We need to know the direct and indirect dependencies when creating long-term archival versions of software. pip freeze is perhaps the best way to record them, but its flat list gives no direct sense of module hierarchy. The techniques described below provide a detailed, zoomable hierarchical view of Python module dependencies.

Dependency analysis

Typically we wish to know which Python modules are imported (besides system/stdlib modules), divided into:

  • user modules (user/community created)
  • established modules (very widely used and supported, e.g. Numpy, Xarray, AstroPy)

The problem of finding these is compounded by existing tools requiring modules to be installed before you know what they depend on. That is, setup.py is recursively executed for each module and only after that do you know what modules are needed.

Another problem is that install_requires in setup.py may specify either too many packages (more than strictly required to pass the CI unit tests) or too few. Too few leads to self-test failure or, if self-test coverage is inadequate, to missing prerequisites that are first noticed by an end user trying to use basic functionality.

Currently, pipdeptree is the most practical solution for Python 2/3 packages. This method assumes that the self-test has adequate coverage to be meaningful for most users, and that the package lists modules used only by convenience methods for some users under extras_require in setup.py. The strictly necessary modules must be in install_requires. Best practice is also to specify python_requires in setup.py to advise users of the minimum required Python version.
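
One way to see how an installed package divides its requirements is to read its metadata. The sketch below assumes Python 3.8 or newer, where importlib.metadata.requires() returns requirement strings and marks optional ones with an extra == "name" marker. The helper names split_requirements and declared are invented here, and the sample strings are illustrative rather than taken from a real package.

```python
import re
from importlib import metadata

# Matches the marker attached to requirements that came from extras_require
EXTRA_MARKER = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def split_requirements(requirement_strings):
    """Separate always-required entries from those tied to a named extra."""
    required, extras = [], {}
    for req in requirement_strings:
        # Everything after ";" is an environment marker; it is dropped here
        name, _, marker = req.partition(";")
        match = EXTRA_MARKER.search(marker)
        if match:
            extras.setdefault(match.group(1), []).append(name.strip())
        else:
            required.append(name.strip())
    return required, extras


def declared(dist_name):
    """Requirement strings declared by an installed distribution, or []."""
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []


# Illustrative strings in the style importlib.metadata reports
sample = [
    "numpy",
    "xarray>=0.10",
    'nose; extra == "tests"',
    "coveralls ; extra == 'tests'",
]
required, extras = split_requirements(sample)
print(required)
print(extras)
```

Calling split_requirements(declared("somepackage")) on a package installed in the test environment gives a quick check of whether anything under extras looks like it should have been strictly required, or the reverse.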

The process below targets packages used in “development mode”, that is, not installed into site-packages except for a link back to the code directory.

pipdeptree

  1. For convenience, create for future reuse a file ~/pydeptree.py containing:
    #!/usr/bin/env python
    
    from pipdeptree import main
       
    main()
    

    We create this wrapper because pipdeptree is driven by command-line arguments.

  2. In the Python package directory, create a new Python virtual environment, since pipdeptree depends on having only the analyzed package and its dependencies installed.
    virtualenv testdep
    . testdep/bin/activate
    
    python -m pip install pipdeptree[graphviz]
    
  3. Install the package you wish to examine (and whatever dependencies it automatically installs)
    python -m pip install -e .
    
  4. Make a hierarchical dependency graph by running pydeptree.py
    python ~/pydeptree.py
    

This should be a very short tree (unless you are testing with a big package).
Try it with a simple package you’ve made, seeing if the dependency list matches what you expect from setup.py install_requires.

Directed Dependency Graph

Now you’re ready to create the directed dependency graph for the package.

python ~/pydeptree.py --graph-output svg > dep.svg

View the SVG in your web browser or image viewer software (e.g. IrfanView).
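
For intuition about what the graph encodes, the sketch below writes a directed graph in Graphviz DOT format from a plain dictionary and lists everything reachable from one package. This is not how pipdeptree works internally; the function names and the dependency mapping are assumptions chosen for illustration only.

```python
def reachable(graph, root):
    """All direct and indirect dependencies of root (depth-first walk)."""
    seen, stack = set(), list(graph.get(root, []))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(graph.get(name, []))
    return seen


def to_dot(graph):
    """Render {package: [dependencies]} as Graphviz DOT text."""
    lines = ["digraph dependencies {"]
    for package in sorted(graph):
        for dep in sorted(graph[package]):
            # An edge points from a package to something it requires
            lines.append('    "{}" -> "{}";'.format(package, dep))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical mapping; real edges depend on the installed versions
deps = {
    "mypackage": ["pandas", "xarray"],
    "pandas": ["numpy"],
    "xarray": ["numpy", "pandas"],
    "numpy": [],
}

print(sorted(reachable(deps, "mypackage")))
print(to_dot(deps))
```

The printed DOT text can be rendered with the Graphviz dot program, for example with dot -Tsvg. The size of the reachable set is one simple number for comparing how heavy two candidate dependencies are.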

One-click Python dependency graph

Wrap up the previous discussion and scripts in this Bash script pydeptree.sh:

#!/bin/bash

set -e

[[ -n "$1" ]] && cd "$1"  # optional argument: package directory

virtualenv testdep     # it's OK if it already exists
. testdep/bin/activate

python -m pip install pipdeptree[graphviz]

python -m pip install -e .[tests]  # extras_require={'tests':['nose','coveralls']}

python ~/code/pybashutils/pydeptree.py --graph-output svg > dep.svg

nosetests --exe

deactivate  # shell function defined by activate, not a file to source

eog dep.svg&  # whatever your favorite image viewing program is

This script is in my pybashutils GitHub repo.

Notes

Other dependency graph modules

In my opinion these modules are not yet ready to use, due to the deficiencies noted in each section; they are included for reference only. I would welcome corrections and reports of your experiences.

Modulegraph

Note: to make Modulegraph useful, the output must be post-processed, as almost all of the output is system stdlib modules.

Modulegraph is an established, maintained tool for creating a .dot dependency graph. Its output is extremely verbose, so the .dot output must be post-processed, for example with pydot, to be useful. An alternative worth exploring is to exclude known stdlib modules up front, which would remove perhaps 98% of modulegraph output from the start.

python -m pip install modulegraph

Examine a file’s requirements, creating a .dot graph.

python -m modulegraph file.py -q -g > graph.dot
dot -Tsvg graph.dot > graph.svg

Modulegraph command line options

Snakefood

Snakefood is in maintenance mode and is Python 2 only. A pull request adding Python 3 support exists, but it has not yet been incorporated.

python -m pip install hg+https://bitbucket.org/blais/snakefood
