Check a website for broken links with Python


The LinkChecker Python 3 program is an effective way to recursively check websites, offline or online, for broken links from the command line.

Install

The PyPI releases are out of date, so instead of the usual

pip install linkchecker

we recommend installing the development LinkChecker code:

git clone --depth 1 https://github.com/linkchecker/linkchecker/

cd linkchecker

python -m pip install -e .
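
If the install succeeded, linkchecker should now be on your PATH; a quick sanity check (the exact version output will vary by install):

linkchecker --version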

Internal/external links are tested recursively. This example is for a Jekyll website running on my laptop:

linkchecker --check-extern http://localhost:4000

The check can take several minutes, perhaps even 20-30 minutes, depending on the size of your website (number of pages and links). Pipe the output to a file as shown below if you want to save the results (recommended).

Examples

  • list options for recursion depth, output format and much more (a combined example follows this list):
    linkchecker -h
    
  • save the output to a text file:
    linkchecker --check-extern http://localhost:4000 &> check.log
    

    monitor progress with:

    tail -f check.log
    
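For example, a shallower, quieter run saved as CSV might combine a few of those options. The flags below (--recursion-level, --no-warnings, -o csv) are listed by linkchecker -h on recent versions, but verify them against your installed copy:

linkchecker --check-extern --recursion-level=2 --no-warnings -o csv http://localhost:4000 > check.csv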

Notes

  • Only the LinkChecker packaged with Ubuntu 17.10 (from the system apt install linkchecker) is broken: --check-extern gives many errors of the form

    LinkChecker internal error, over and out

    which seem to stem from outdated references in its Python 2.7 code. This is fixed in Ubuntu 18.04 (or by using the install method recommended at the top of this article).
