Python directed dependency graphs

Related: use setup.cfg to completely specify Python package prereqs


Since 2016, setup.cfg can completely specify Python package prerequisites. Proper use of setup.cfg can reduce setup.py to a one-liner for many Python packages. This helps security by allowing extremely fast recursive machine-parsing of prerequisites without installing packages first. Everyone should be specifying Python package prerequisites in setup.cfg alone, not in setup.py. The approach described below also handles obsolete Python packages that do not completely describe their prerequisites in setup.cfg.

Enhance reliability with setup.cfg

Python packages should minimize the size of their directed dependency graph for best package longevity with minimum maintenance effort. However, the most effective use of programmer/scientist/engineer time generally comes from reusing code wherever appropriate. How do we evaluate quality of prereqs? We should expect modern Python code to include these factors:

Long-term archiving of Python software requires recording both direct and indirect dependencies. This is commonly done with pip freeze, which gives a flat list with no direct sense of module hierarchy. The techniques described below provide a detailed, zoomable hierarchical view of Python module dependencies.

Python dependency analysis

Obsolete Python packages that use setup.py to specify package prerequisites generally require modules to be installed to determine their dependencies. That is, setup.py is recursively executed for each module to determine what modules are needed overall. This is bad for automated security analysis, which is slowed greatly by needing to install packages that may themselves create inadvertent security risks through a poor-quality setup.py. Modern Python packages solve this problem by specifying the entire package configuration in setup.cfg, and setup.py remains as a one-liner for most cases:

from setuptools import setup; setup()

Even with setup.cfg, problems can arise where install_requires specifies too many packages (more than strictly required to pass the CI unit test), or not enough (test fails). Proper use of CI will usually resolve these issues before end users see them.
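As a rough illustration of the "too many or not enough" problem, the sketch below compares a declared prerequisite list against the modules a source file actually imports, using only the standard library `ast` module. The function names `top_level_imports` and `compare` are hypothetical, and the sketch assumes that import names equal distribution names, which is not always true (for example, `yaml` is provided by PyYAML). Standard library imports also show up as "undeclared", so the result is a hint for review rather than a verdict.

```python
import ast


def top_level_imports(source):
    """Return the set of top-level module names imported by Python source text."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" contributes "a"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # absolute "from a.b import c" contributes "a"; relative imports are skipped
            names.add(node.module.split(".")[0])
    return names


def compare(declared, source):
    """Return (declared but never imported, imported but not declared)."""
    used = top_level_imports(source)
    declared = set(declared)
    return declared - used, used - declared


# Hypothetical source text and install_requires list, for illustration only
example_source = "import numpy as np\nfrom scipy.signal import butter\n"
unused, undeclared = compare(["numpy", "pandas"], example_source)
print("declared but unused:", unused)
print("used but undeclared:", undeclared)
```

A passing CI self-test remains the real check; this kind of static scan only points at candidates to inspect.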

Solution

Currently, pipdeptree is the most practical solution working simultaneously for:

  • obsolete Python 2/3 packages that use setup.py to specify prereqs.
  • modern packages using only setup.cfg

This method assumes:

  • self-test has adequate coverage to be meaningful for most users
  • packages only used as convenience methods for some users are under [options.extras_require] in setup.cfg.
  • strictly necessary modules are specified
  • minimum Python version is specified
  • CI-only requirements are specified

A partial setup.cfg is:

[options]
python_requires = >= 3.6
install_requires =
  prereq1
  prereq2
  
[options.extras_require]
tests = 
  pytest
  pytest-cov
  coveralls
  flake8
  mypy

An example setup.cfg is complete for many Python packages.
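To show why setup.cfg is easy to machine-parse without installing anything, here is a minimal sketch that reads the prerequisite lists with the standard library `configparser`. The `EXAMPLE_CFG` text mirrors the partial setup.cfg above, and `read_prereqs` is a hypothetical helper name. It assumes one requirement per line, as in the example, and does not interpret version specifiers or environment markers.

```python
import configparser

# Hypothetical setup.cfg text, mirroring the partial example above
EXAMPLE_CFG = """
[options]
python_requires = >= 3.6
install_requires =
  prereq1
  prereq2

[options.extras_require]
tests =
  pytest
  pytest-cov
"""


def read_prereqs(cfg_text):
    """Return (install_requires, extras_require) parsed from setup.cfg text."""
    # interpolation=None so that a literal "%" in a value is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    def split_lines(value):
        # keep one requirement per non-empty line
        return [line.strip() for line in value.splitlines() if line.strip()]

    required = split_lines(parser.get("options", "install_requires", fallback=""))
    extras = {}
    if parser.has_section("options.extras_require"):
        extras = {
            name: split_lines(value)
            for name, value in parser.items("options.extras_require")
        }
    return required, extras


required, extras = read_prereqs(EXAMPLE_CFG)
print(required)
print(extras)
```

For a real package, the text could come from `pathlib.Path("setup.cfg").read_text()` instead of the inline string.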

The process below is targeted at packages used in “development mode”, that is, not installed into site-packages except for a link back to the code directory.

pipdeptree

  1. Install prereqs
    python -m pip install virtualenv
    
  2. In the Python package directory, create a new Python virtual environment, since pipdeptree depends on having only the analyzed package and its dependencies installed.
    virtualenv testdep
    . testdep/bin/activate
    
    pip install pipdeptree[graphviz]
    
  3. Install the package you wish to examine (and whatever dependencies it automatically installs)
    pip install -e .
    
  4. Make a hierarchical dependency graph
    pipdeptree
    

This should be a very short tree (unless you are testing with a big package).
Try it with a simple package you’ve made, seeing if the dependency list matches what you expect from setup.cfg.
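For intuition about what such a tree contains, the sketch below builds a similar indented view from installed package metadata with `importlib.metadata` (Python 3.8+). It is not how pipdeptree is implemented; the helper names are hypothetical, requirements tied to an extra are skipped with a simple substring check, and version specifiers and other environment markers are ignored.

```python
import re
from importlib import metadata  # Python 3.8+


def requirement_name(requirement):
    """Extract the bare distribution name from a string such as 'numpy>=1.20'."""
    return re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip()).group(0)


def installed_requires(name):
    """Direct requirements of an installed distribution, skipping optional extras."""
    try:
        reqs = metadata.requires(name) or []  # requires() returns None when there are none
    except metadata.PackageNotFoundError:
        return []
    # assumption: extras appear as a marker containing 'extra ==' in the requirement string
    return [requirement_name(r) for r in reqs if "extra ==" not in r]


def dependency_tree(name, requires=installed_requires, depth=0, path=()):
    """Return indented lines showing name and, recursively, what it requires."""
    lines = ["  " * depth + name]
    if name.lower() in path:  # stop if this package is already on the current branch
        return lines
    for child in requires(name):
        lines.extend(dependency_tree(child, requires, depth + 1, path + (name.lower(),)))
    return lines
```

Inside the virtual environment, `print("\n".join(dependency_tree("pipdeptree")))` would print a small tree for pipdeptree itself; the `requires` parameter lets a plain dictionary lookup stand in for installed metadata when experimenting.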

Directed Dependency Graph

Now you’re ready to create the directed dependency graph for the package. Install GraphViz by

  • Linux: apt install graphviz
  • Mac: brew install graphviz
  • Windows

and then:

pipdeptree --graph-output svg > dep.svg

View the SVG in your web browser or image viewer software (e.g. IrfanView).
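A directed dependency graph is simply a set of "package requires prerequisite" edges. As a minimal sketch of the kind of DOT text GraphViz renders, the hypothetical `to_dot` function below turns a plain dictionary into a digraph; the package names are made up, and the output format pipdeptree itself produces may differ in detail.

```python
def to_dot(dependencies, graph_name="deps"):
    """Render {package: [prerequisite, ...]} as GraphViz DOT text."""
    lines = [f"digraph {graph_name} {{"]
    for package in sorted(dependencies):
        lines.append(f'  "{package}";')  # declare the node even if it has no edges
        for prereq in sorted(dependencies[package]):
            lines.append(f'  "{package}" -> "{prereq}";')  # edge: package requires prereq
    lines.append("}")
    return "\n".join(lines)


# Hypothetical package with two prerequisites, one of which requires the other
print(to_dot({"mypkg": ["numpy", "scipy"], "scipy": ["numpy"], "numpy": []}))
```

Saving the printed text to a file such as graph.dot and running `dot -Tsvg graph.dot > graph.svg` renders it with GraphViz.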

One-click Python dependency graph

Wrap up the previous discussion and scripts in this Bash script pydeptree.sh:

#!/bin/bash

set -e

[[ -n "$1" ]] && cd "$1"   # optional argument: package directory

virtualenv testdep     # it's OK if it already exists

. testdep/bin/activate

python -m pip install pipdeptree[graphviz]

python -m pip install -e ".[tests]"

pipdeptree --graph-output svg > dep.svg

deactivate   # shell function defined by the activate script

eog dep.svg &  # whatever your favorite image viewing program is
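For platforms without Bash, the same steps can be sketched in Python. This is an assumption-laden variant, not a drop-in replacement: it swaps virtualenv for the standard library `venv` module, calls the environment's own interpreter instead of activating it, runs pipdeptree as `python -m pipdeptree`, and leaves opening the image viewer to the reader. The function names `build_commands` and `run` are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_commands(package_dir=".", env_name="testdep"):
    """Return the argv lists mirroring pydeptree.sh, without running them."""
    env_dir = Path(package_dir).resolve() / env_name
    bin_dir = "Scripts" if os.name == "nt" else "bin"  # venv layout differs on Windows
    env_python = str(env_dir / bin_dir / "python")
    return [
        [sys.executable, "-m", "venv", str(env_dir)],  # stdlib venv instead of virtualenv
        [env_python, "-m", "pip", "install", "pipdeptree[graphviz]"],
        [env_python, "-m", "pip", "install", "-e", ".[tests]"],
        [env_python, "-m", "pipdeptree", "--graph-output", "svg"],
    ]


def run(package_dir=".", output="dep.svg"):
    """Create the environment, install the package, and write the SVG graph."""
    *setup_steps, graph_step = build_commands(package_dir)
    for argv in setup_steps:
        subprocess.run(argv, cwd=package_dir, check=True)  # stop on first failure, like set -e
    with open(Path(package_dir) / output, "wb") as svg:
        subprocess.run(graph_step, cwd=package_dir, stdout=svg, check=True)
```

Calling `run("path/to/package")` performs the whole sequence; `build_commands` alone is handy for checking what would be executed before touching the network.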

Notes

Other dependency graph modules

In my opinion, these modules are not yet ready to use because of the deficiencies noted in each section. They are included only for reference.

Modulegraph

Note: to make Modulegraph useful, the output must be post-processed, as almost all of the output is system stdlib modules.

Modulegraph is an established, maintained tool for creating a .dot dependency graph. Its output is extremely verbose, so it is necessary to post-process the .dot output with pydot to make use of it. What if we instead preemptively excluded modules found in a list of known stdlib modules, removing say 98% of modulegraph output from the start?

python -m pip install modulegraph

Examine a file’s requirements, creating a .dot graph.

python -mmodulegraph file.py -q -g > graph.dot
dot -Tsvg graph.dot > graph.svg
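One possible way to approach the stdlib-exclusion idea is sketched below. It assumes the module names have already been extracted from modulegraph's output into a plain list (that extraction step is not shown, since the exact output format is not covered here). `sys.stdlib_module_names` exists only on Python 3.10 and newer; the fallback set for older versions is a tiny, deliberately incomplete placeholder, and `non_stdlib` is a hypothetical helper name.

```python
import sys


def non_stdlib(module_names):
    """Keep only names whose top-level package is not part of the standard library."""
    # Python 3.10+ ships the full list; the fallback below is an incomplete assumption
    stdlib = getattr(sys, "stdlib_module_names", frozenset({"os", "sys", "re", "json"}))
    return [name for name in module_names if name.split(".")[0] not in stdlib]


# Hypothetical module names, as they might appear in a dependency listing
print(non_stdlib(["os.path", "json", "numpy", "scipy.signal"]))
```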

Modulegraph command line options

Snakefood

Snakefood is in maintenance mode and is Python 2 only. There was a pull request adding Python 3 support, but it has not yet been incorporated.

python -m pip install hg+https://bitbucket.org/blais/snakefood