# Speed of Matlab vs. Python Numpy Numba CUDA vs Julia vs IDL

Related: Anaconda Accelerate: GPU from Python/Numba

In contrast to the Benchmarks Game, which uses deep expert optimizations to exploit every advantage of each language, the benchmarks I’ve adapted from the Julia micro-benchmarks are written the way a scientist or engineer competent in the language, but not an advanced expert, would write them. The emphasis is on simplicity and brevity, on the premise that programmer time is far more important than CPU time.

A prime purpose of the benchmark is to show, given the ease of programming a canonical task (say, Mandelbrot), which languages are substantially faster or slower than the others.

## Key language benchmarking takeaways

### Matrix Multiplication:

Fortran is **comparable** to Python with MKL, Matlab, and Julia.

If you can use single-precision floats, Python CUDA can be 1000+ times faster than Python, Matlab, Julia, and Fortran.

### Iteration:

It’s worthwhile to use Numba or Cython with Python to get Fortran-like speeds from Python: roughly 5 times faster than Matlab in this test.

### Harris IDL

IDL (used only by astronomers?) is ridiculously slow compared to other modern computing languages, including GDL, the free, open-source IDL-compatible program.

## Language Benchmarking Prereq

```
apt install libblas-dev gfortran julia
```

Then install Anaconda Python and run:

```
conda install mkl accelerate
```

## Language Benchmark Systems tested

### Intel Ivy Bridge desktop PC, Ubuntu 14.04

```
grep 'model name' /proc/cpuinfo | uniq
```

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

### Lenovo W541 with Quadro K1100M GPU, Ubuntu 14.04

```
grep 'model name' /proc/cpuinfo | uniq
```

Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz

### Dell Optiplex Core 2 Duo, Ubuntu 14.04

```
grep 'model name' /proc/cpuinfo | uniq
```

Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz

## Matrix Operations Benchmark

This test multiplies two matrices that are too large to fit in CPU cache, so it is a test of system RAM bandwidth as well.

Task: Matrix Multiply a 5000 x 5000 array by another 5000 x 5000 array each comprised of random double-precision 64-bit float numbers.
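As a sketch, the Python/NumPy side of this task amounts to timing a single BLAS-backed matrix multiply (which NumPy dispatches to the linked BLAS, e.g. MKL); the matrix size is taken from the task above:

```python
import time
import numpy as np

N = 5000  # matrix dimension from the task above
A = np.random.random((N, N))  # float64 (double precision) by default
B = np.random.random((N, N))

t = time.monotonic()
C = A @ B  # dispatches to the linked BLAS dgemm (MKL, OpenBLAS, ...)
elapsed_ms = (time.monotonic() - t) * 1000
print(f'matmul: {elapsed_ms:.0f} ms')
```

In practice the benchmark takes the best of several repetitions, since the first run can be penalized by memory allocation and cache effects.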

Results: in milliseconds, best time to compute the result

CPU/GPU | Gfortran (matmul) | Gfortran (dgemm) | Gfortran (sgemm) | Ifort 14 (matmul) | Ifort 14 (dgemm) | Python 3.5 (MKL) | Matlab R2015a (MKL) | Julia 0.4.2 | IDL 8.4 | GDL 0.9.6 | Python 3.5 (CUDA 7.5)
---|---|---|---|---|---|---|---|---|---|---|---
i7-3770 | 2147 | 2147 | 18967 | 2352 | 2420 | 2394 | 2161 | 0.261 | | |
W541/K1100M | 1717 | 960 | 1160 | 0.161 | | | | | | |
E7500 | 2147 | 86211 | 3368 | | | | | | | |

- Python CUDA via Anaconda Accelerate (formerly NumbaPro): note that Python CUDA scaled as roughly O(N^0.5), while Python MKL scaled as roughly O(N^2.8).
- Wow! GDL is so much faster than IDL at matrix multiplication.
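A rough way to check a scaling exponent like the O(N^2.8) quoted above is to fit log(time) against log(N) over a few matrix sizes. A minimal sketch (the sizes chosen here are illustrative, not the ones used in the benchmark):

```python
import time
import numpy as np

def matmul_seconds(n: int, reps: int = 3) -> float:
    """Best-of-reps wall time for an n x n float64 matrix multiply."""
    A = np.random.random((n, n))
    B = np.random.random((n, n))
    best = float('inf')
    for _ in range(reps):
        t = time.monotonic()
        A @ B
        best = min(best, time.monotonic() - t)
    return best

sizes = np.array([500, 1000, 2000, 4000])
times = np.array([matmul_seconds(n) for n in sizes])
# The slope of log(time) vs. log(N) approximates the scaling exponent.
exponent = np.polyfit(np.log(sizes), np.log(times), 1)[0]
print(f'measured scaling ~ O(N^{exponent:.1f})')
```

Small sizes are noisy (BLAS overhead dominates), so exponents estimated this way are only indicative.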

See README.rst for the benchmark procedure.

## Python/Numba vs. Matlab vs. Julia: iterative algorithm

Task: Iterate one million times, computing x = 0.5*x + mod(i,10).

Results: in milliseconds, best time to compute the result

Gfortran 5.3 | Ifort 14 | Python 3.5 MKL | Py35, Numba 0.23.0, MKL | Matlab R2015a MKL | Julia 0.4.2
---|---|---|---|---|---
4.53 | 6.20 | 349 | 8.13 | 46.3 | 34
