Speed of Matlab vs. Python Numpy Numba CUDA vs Julia vs IDL

Related: Anaconda Accelerate: GPU from Python/Numba


Unlike the Benchmarks Game, which uses deep expert optimizations to exploit every advantage of each language, the benchmarks I’ve adapted from the Julia micro-benchmarks are written the way a general scientist or engineer would write them: someone competent in the language, but not an advanced expert. The emphasis is on keeping the benchmark code simple and short, on the premise that programmer time is far more important than CPU time.

A prime purpose of the benchmark is to show, for a canonical task (say Mandelbrot) written with comparable ease of programming, which languages are much faster or slower than the others.

Key language benchmarking takeaways

Matrix Multiplication:

Fortran is comparable in speed to Python with MKL, Matlab, and Julia.

If you can use single-precision floats, Python CUDA can be 1000+ times faster than Python, Matlab, Julia, and Fortran.

Iteration:

It’s worthwhile to use Numba or Cython with Python to get Fortran-like speeds, about 5 times faster than Matlab on this test.

Harris IDL (used only by astronomers?) is ridiculously slow compared to other modern computing languages, including GDL, the free open-source IDL-compatible program.

Language Benchmarking Prereq

apt install libblas-dev gfortran julia

Then install Anaconda Python and run:

conda install mkl accelerate

Language Benchmark Systems tested

Intel Ivy Bridge desktop PC, Ubuntu 14.04

cat /proc/cpuinfo | grep 'model name' | uniq

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Lenovo W541 with Quadro K1100M GPU, Ubuntu 14.04

cat /proc/cpuinfo | grep 'model name' | uniq

model name    : Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz

Dell Optiplex Core 2 Duo, Ubuntu 14.04

cat /proc/cpuinfo | grep 'model name' | uniq

Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz

Matrix Operations Benchmark

This test multiplies two matrices that are too large to fit in CPU cache, so it is a test of system RAM bandwidth as well.

Task: Multiply a 5000 x 5000 matrix by another 5000 x 5000 matrix, each composed of random double-precision (64-bit) floating-point numbers.
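As a rough illustration of the task above, here is a minimal Python/NumPy sketch of a best-of-several timing. It is not the benchmark code itself: the function name best_matmul_time_ms, the repeat count and the seed are assumptions made for this example, and np.random.default_rng needs a NumPy version much newer than the Python 3.5 setup tested here.

```python
import time

import numpy as np


def best_matmul_time_ms(n=5000, repeats=3, dtype=np.float64, seed=0):
    """Return the best wall-clock time in milliseconds for an n x n matrix product.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Random matrices, cast to the requested precision (float64 by default).
    a = rng.random((n, n)).astype(dtype)
    b = rng.random((n, n)).astype(dtype)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # dispatches to the BLAS that NumPy was built with (e.g. MKL)
        best = min(best, time.perf_counter() - start)
    return best * 1e3
```

Calling best_matmul_time_ms(5000) should correspond roughly to the Python (MKL) column, assuming NumPy is linked against MKL. Passing dtype=np.float32 gives a single-precision comparison on the CPU. Three float64 matrices of this size need about 600 MB of RAM.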

Results: in milliseconds, best time to compute the result

CPU/GPU/GCC Gfortran (matmul) Gfortran (dgemm) Gfortran (sgemm) Ifort 14 (matmul) Ifort 14 (dgemm) Python 3.5 (MKL) Matlab R2015a (MKL) Julia 0.4.2 IDL 8.4 GDL 0.9.6 Python 3.5 (Cuda 7.5)
i7-3770  2147 2147   18967  2352 2420  2394  2161     0.261
W541/K1100M    1717 960     1160         0.161
E7500    2147             86211 3368    
  • Python CUDA via Anaconda Accelerate (formerly NumbaPro): Note that in these runs the Python CUDA time scaled as roughly O(N^0.5), while the Python MKL time scaled as roughly O(N^2.8).
  • Wow! GDL is so much faster than IDL at matrix multiplication.
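The scaling exponents quoted in the first note above can be estimated from timings at several matrix sizes by fitting a straight line to log(time) versus log(N). The post does not say how its exponents were obtained, so the following is only a sketch of one common approach. The function name scaling_exponent is an assumption, and the sizes and times in the usage note are synthetic, not measurements.

```python
import math


def scaling_exponent(sizes, times):
    """Estimate p in time ~ c * N**p by a least-squares fit in log-log space.

    Hypothetical helper for illustration; not taken from the benchmark code.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the least-squares line through the (log N, log time) points.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For example, feeding it synthetic times that grow exactly as N^3 (sizes [100, 200, 400, 800] with times 1e-9 * N**3) returns an exponent of 3 up to rounding. With real measurements, use at least three or four sizes, each timed as a best-of-several run.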

See the README.rst for procedure

Python/Numba vs. Matlab vs. Julia: iterative algorithm

Task: Iterate one million times, computing x = 0.5*x + mod(i,10).
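A minimal Python sketch of this loop, with an optional Numba-compiled variant, might look like the following. It is an illustration rather than the benchmark source: the names iterate, iterate_fast and best_time_ms are assumptions, as are the starting value x = 0 and the loop index starting at 0 (a Matlab or Fortran version would naturally start at 1).

```python
import time


def iterate(n=1000000):
    """Run x = 0.5*x + mod(i, 10) for i = 0 .. n-1 and return the final x."""
    x = 0.0
    for i in range(n):
        x = 0.5 * x + i % 10
    return x


try:
    # Numba is optional here; without it the plain Python loop is used.
    from numba import jit

    iterate_fast = jit(nopython=True)(iterate)
except ImportError:
    iterate_fast = iterate


def best_time_ms(func, n=1000000, repeats=5):
    """Best-of-repeats wall-clock time in milliseconds for func(n)."""
    func(10)  # warm-up call so JIT compilation is not included in the timing
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(n)
        best = min(best, time.perf_counter() - start)
    return best * 1e3
```

Comparing best_time_ms(iterate) with best_time_ms(iterate_fast) gives the plain Python versus Numba contrast shown in the results. The warm-up call matters because the first call to a Numba-compiled function includes compilation time.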

Results: in milliseconds, best time to compute the result

Implementation                  Time (ms)
Gfortran 5.3                    4.53
Ifort 14                        6.20
Python 3.5, MKL                 349
Python 3.5, Numba 0.23.0, MKL   8.13
Matlab R2015a, MKL              46.3
Julia 0.4.2                     34
