Check websites for broken links with Python

The LinkChecker Python program has been an effective offline tool for recursively checking websites for broken links from the command line.

Install

Linux

LinkChecker is available from the Debian and Ubuntu 18.04/16.04 package repositories:

apt install linkchecker

Mac/Windows

  1. Get the LinkChecker master branch (release 9.3 is broken with current python-requests versions) and its prerequisites:
    git clone https://github.com/wummel/linkchecker
    cd linkchecker
    
  2. Install it (this requires Python 2.7; Python 3 is not yet supported):
    python -m pip install -e .
    

LinkChecker follows internal links recursively; `--check-extern` also checks external links, without recursing into them. This example is for a Jekyll website running on my laptop:

linkchecker --check-extern http://localhost:4000

The check can take 5-10 minutes, depending on the size of your website (number of pages and links). To save the result, redirect the output to a file, as shown under Examples.
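To illustrate what recursive checking means, here is a minimal Python 3 sketch of the idea. It is not how LinkChecker itself is implemented. It assumes pages are plain HTML with links in `<a href>` tags, crawls pages on the starting host breadth-first up to `max_depth`, and, when `check_extern` is true, requests external links without following them. The names `check_site`, `fetch_url`, and `LinkCollector` are hypothetical, and only the standard library is used.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url, timeout=10):
    """Return (status, html); status is None on network errors or timeouts."""
    try:
        req = Request(url, headers={"User-Agent": "link-check-sketch"})
        with urlopen(req, timeout=timeout) as resp:
            ctype = resp.headers.get("Content-Type", "")
            html = resp.read().decode("utf-8", "replace") if "html" in ctype else ""
            return resp.status, html
    except HTTPError as err:  # server answered with 4xx/5xx
        return err.code, ""
    except OSError:  # URLError, refused connections, timeouts
        return None, ""


def check_site(start_url, check_extern=False, max_depth=10, fetcher=fetch_url):
    """Crawl start_url breadth-first and return {broken_url: status}."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    broken = {}
    while queue:
        url, depth = queue.popleft()
        status, html = fetcher(url)
        if status is None or status >= 400:
            broken[url] = status
            continue
        # Follow links only from internal pages within the depth limit;
        # external URLs are requested but never parsed.
        if urlparse(url).netloc != host or depth >= max_depth:
            continue
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts
            link, _fragment = urldefrag(urljoin(url, href))
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, tel:, ...
            if parts.netloc != host and not check_extern:
                continue
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return broken
```

For example, `check_site("http://localhost:4000/", check_extern=True)` would return a dict mapping each broken URL to its HTTP status, or `None` for connection errors. Passing a fake `fetcher` lets you test the crawl logic without a running server.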

Examples

  • List options for recursion depth, output format, and more:
    linkchecker -h
    
  • Save the output to a text file:
    linkchecker --check-extern http://localhost:4000 &> check.log
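`&>` is bash syntax that sends both stdout and stderr to the file; it does not work in the Windows cmd.exe shell, where `> check.log 2>&1` is the equivalent. As a portable alternative, a small Python 3 wrapper can do the same redirection. This is a sketch: `run_to_log` is a hypothetical helper name, and it assumes the command you pass can be launched directly on your system.

```python
import subprocess


def run_to_log(cmd, log_path):
    """Run cmd and write both its stdout and stderr to log_path.

    Same effect as `cmd &> log_path` in bash. Returns the exit code.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        # stderr=STDOUT merges the error stream into the same file
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example: `run_to_log(["linkchecker", "--check-extern", "http://localhost:4000"], "check.log")`.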
    

Notes

  • LinkChecker is broken on Ubuntu 17.10 only: `--check-extern` gives many errors like

    LinkChecker internal error, over and out

    which appear to stem from outdated references in Python 2.7. This is fixed in Ubuntu 18.04.
