# Speed of Matlab vs. Python Numpy Numba CUDA vs Julia vs IDL

Related: Anaconda Accelerate: GPU from Python/Numba

The Benchmarks Game uses deep expert optimizations to exploit every advantage of each language. The benchmarks I’ve adapted from the Julia micro-benchmarks are instead written the way a general scientist or engineer who is competent in the language, but not an advanced expert, would write them. The emphasis is on keeping the benchmark code simple and short, since programmer time is far more important than CPU time. Jules Kouatchou runs benchmarks on massive clusters comparing Julia, Python, Fortran, etc. A prime purpose of these benchmarks is to show, given the ease of programming a canonical task (say Mandelbrot), which languages are much better or worse than the others.

## Julia does not have overwhelmingly compelling advantages vs. Python

What’s striking from my laptop/desktop benchmarks and Jules’ HPCC benchmarks,
as well as other benchmarks seen around the web is that for a wide range of problems,
Julia does *not* have a consistent significant advantage over Python.
The majority of analysts in astronomy/geospace/geoscience/aerospace/etc. are working in Python, with Matlab in second place.
Yes of course C and Fortran are right up there too in usage.

Python often is “close enough” in performance to compiled languages like Fortran and C, by virtue of numeric libraries such as Numpy and Numba.
For particular tasks, Tensorflow, OpenCV, and directly loading Fortran libraries with `f2py` or `ctypes` minimize Python’s performance penalty.
This was not the case when Julia was conceived in 2009 and first released in 2012.
Thanks to Anaconda, Intel MKL and PyCUDA, momentum and performance are solidly behind Python for scientific and engineering computing for the next several years at least.
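
As a rough illustration of the `ctypes` route, here is a minimal sketch that calls the C math library function `cos` directly from Python. It assumes a POSIX system where `libm` (or the running process itself) exports `cos`. The wrapper name `c_cos` is made up for this example, and the C call is only a stand-in for loading a Fortran library, not the code that was benchmarked.

```python
import ctypes
import ctypes.util
import math

# Assumption: POSIX system. find_library may return None on minimal installs,
# in which case CDLL(None) searches the symbols already loaded in the process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Call the compiled cos() from the C math library (hypothetical wrapper)."""
    return libm.cos(x)


# Compare the compiled call against the Python standard library
print(c_cos(0.5), math.cos(0.5))
```

Declaring `argtypes` and `restype` matters here: without them `ctypes` would treat the return value as a C `int` and the result would be garbage.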

### Julia’s downsides

Almost immediately, Julia’s unstable Core API (even after a decade of development) is strikingly noticeable.
This bodes ill for serious, comprehensive adoption by corporate users, particularly in the sciences and aerospace, who need to keep programs running on multi-decade scales.
Rapid deprecation makes Julia and its associated packages almost like a rolling release system, which is not suitable for many use cases.
Python can freeze versions using `requirements.txt` and similar means, and in the worst case a Docker image can be brought back far in the future to reproduce prior work output.
Perhaps such capability arises in Julia as well, but the window of change is so short that it’s hard to keep up.
Typically Python’s stable versions are maintained for a few years, not less than a year like Julia.
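
To make the version-freezing idea concrete, here is a small sketch that builds `requirements.txt`-style pins from whatever versions are currently installed. The helper name `pinned_requirements` and the package list are hypothetical; in practice `pip freeze` does a more complete job of this.

```python
from importlib import metadata


def pinned_requirements(names):
    """Return 'name==version' lines for installed packages (hypothetical helper)."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Leave a comment line so a missing package is visible in the file
            lines.append(f"# {name} is not installed")
    return lines


# Example package names are assumptions; substitute the ones your project uses
for line in pinned_requirements(["numpy", "numba"]):
    print(line)
```

Writing these lines to `requirements.txt` and committing that file alongside the analysis code records the exact versions that produced a result.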

### Julia’s strengths

Julia allows abstract expression of formulas, ideas, and arrays in ways not feasible in other major analysis applications. This gives advanced analysts unique, performant capabilities. Since Julia is readily called from Python, Julia work can be exploited from more popular packages, provided the Julia API is stable enough.

## Key language benchmarking takeaways

### Matrix Multiplication:

Fortran is **comparable** to Python with MKL, Matlab, and Julia.

If you can use single-precision float, Python Cuda can be 1000+ times faster than Python, Matlab, Julia, and Fortran.

### Iteration:

It’s worthwhile to use Numba or Cython with Python to get Fortran-like speeds, ~5 times faster than Matlab on the given test.

### Harris IDL

Harris IDL (used only by astronomers?) is very slow compared to other modern computing languages, including GDL, the free open-source IDL-compatible program.

## Language Benchmarking Prereq

- compilers/libraries: `apt install libblas-dev gfortran`
- install Miniconda Python and Julia
- install Python packages: `conda install mkl accelerate`

## Language Benchmark Systems tested

### HPC

A small supercomputing node.

### Intel Ivy Bridge desktop PC, Ubuntu 14.04

```
cat /proc/cpuinfo | grep 'model name' | uniq
```

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

### Lenovo W541 with Quadro K1100M GPU, Ubuntu 14.04

```
cat /proc/cpuinfo | grep 'model name' | uniq
```

model name : Intel(R) Core(TM) i7-4910MQ CPU @ 2.90GHz

### Dell Optiplex Core 2 Duo, Ubuntu 14.04

```
cat /proc/cpuinfo | grep 'model name' | uniq
```

Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz

## Matrix Operations Benchmark

This test multiplies two matrices that are too large to fit in CPU cache, so it is a test of system RAM bandwidth as well.

Task: multiply a 5000 x 5000 matrix by another 5000 x 5000 matrix, each composed of random double-precision (64-bit) floating-point numbers.

- Note: the HPC test was with 1000 x 1000 matrices
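
A best-of-several timing of this task could be sketched as follows. To keep the sketch runnable with only the standard library, it times a small pure-Python matrix multiply as a stand-in workload; the function names `best_time_ms` and `matmul` and the size `n = 50` are assumptions for illustration. With NumPy, the workload would instead be `a @ b` on 5000 x 5000 arrays.

```python
import random
import time


def best_time_ms(func, repeats=5):
    """Run func several times and return the fastest wall-clock time in ms."""
    best = float("inf")
    for _ in range(repeats):
        tic = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - tic)
    return best * 1e3


def matmul(a, b):
    """Naive pure-Python matrix product of two lists of rows (stand-in workload)."""
    b_cols = list(zip(*b))  # transpose b so its columns can be iterated directly
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


n = 50  # kept small on purpose; the benchmark itself uses 5000
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [[random.random() for _ in range(n)] for _ in range(n)]

print(f"best of 5: {best_time_ms(lambda: matmul(a, b)):.3f} ms")
```

Taking the minimum rather than the mean reduces the influence of background processes on the reported time.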

Results: in milliseconds, best time to compute the result

CPU/GPU | Gfortran (matmul) | Gfortran (dgemm) | Gfortran (sgemm) | Ifort 14 (matmul) | Ifort 14 (dgemm) | Python 3.6 (MKL) | Matlab R2018a (MKL) | Julia 0.6.2 | IDL 8.6 | GDL 0.9.6 | Python 3.5 (Cuda 7.5) |
---|---|---|---|---|---|---|---|---|---|---|---|
HPC | 723 | 47.3 | 54.8 | 64.4 | 62.2 | | | | | | |
i7-3770 | 2147 | 2147 | 18967 | 2352 | 2420 | 2394 | 2161 | | | | 0.261 |
W541/K1100M | 1717 | 960 | 1160 | | | | | | | | 0.161 |
E7500 | 2147 | 3368 | | | | | | | | | |

- Python CUDA via Anaconda Accelerate (formerly NumbaPro): Note that Python CUDA scaled O(N^0.5), while Python MKL scaled O(N^2.8) or so.
- Wow! GDL is so much faster than IDL at matrix multiplication.
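
A scaling exponent like the ones quoted above can be estimated from two timings at different matrix sizes, since for t ∝ N^p the exponent is p = log(t2/t1) / log(N2/N1). The sketch below uses made-up timings purely to show the arithmetic; `scaling_exponent` is a hypothetical helper, not part of the benchmark code.

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Estimate p in t ~ N**p from timings (n1, t1) and (n2, t2)."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Hypothetical timings in ms: doubling N multiplies the time by 8,
# which corresponds to cubic scaling
print(scaling_exponent(1000, 10.0, 2000, 80.0))
```

With more than two sizes, a least-squares fit of log(t) against log(N) gives a more robust estimate than a single pair of points.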

See the README.rst for procedure

## Iterative Algorithm

Task: compute PI: N=1e6

Results: in milliseconds, best time to compute the result

CPU | GCC 6.2 | Gfortran 6.2 | Ifort 18 | Python 3.6 MKL | Py36, Numba 0.36 MKL | Cython 0.27 | Matlab R2018a MKL | Julia 0.6.2 | IDL 8.6 |
---|---|---|---|---|---|---|---|---|---|
HPC | 46.9 | 49.1 | 46.2 | 544 | 46.5 | 28.2 | 31.4 | 58.3 | 272 |
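
For reference, an iterative Pi computation of this kind can be sketched in plain Python as below. This assumes the alternating Leibniz series, which may differ from the exact formula used in the benchmark code; the function name `leibniz_pi` is made up for this example.

```python
import math


def leibniz_pi(n):
    """Approximate Pi with n terms of the series 4 * (1 - 1/3 + 1/5 - ...)."""
    total = 0.0
    sign = 1.0
    for k in range(1, n + 1):
        total += sign / (2 * k - 1)
        sign = -sign  # alternate the sign of each term
    return 4.0 * total


approx = leibniz_pi(1_000_000)
print(approx, abs(approx - math.pi))
```

A plain loop like this is the kind of code where the interpreter overhead dominates, which is why decorating such a function with Numba’s `@njit` or compiling it with Cython tends to pay off.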

Task: “simple” N=1e6

CPU | GCC 6.2 | Gfortran 6.2 | Ifort 18 | Python 3.6 MKL | Py36, Numba 0.36 MKL | Matlab R2018a MKL | Julia 0.6.2 |
---|---|---|---|---|---|---|---|
HPC | 13.8 | 8.8 | 544 | 46.5 | 16.5 | 2.3 | |
