# NumPy uses the wrong BLAS libraries – solution

I’ve been transferring some code from Matlab to NumPy and my first tests on my Ubuntu 12.10 installation were very disappointing – slow as molasses. Specifically, numpy.dot was taking forever. I figured this might have something to do with BLAS libraries, a subject which is completely beyond mere mortals to understand.

### Is something wrong with BLAS config?

Following some comments on StackOverflow, I wrote a little script to compare the performance of numpy.dot to that of a similar dot product computed through the s/cgemm function in the BLAS library. Here’s the code:

```import ctypes
from ctypes import byref, c_char, c_int, c_float, c_double
import numpy as np
import cProfile

def lldot(A, B):
"""lldot(A,B) performs the matrix multiplication A*B using low-level (blas lib) functions"""
if A.shape != B.shape:
raise Exception("Incompatible array dimensions")

if A.flags.c_contiguous:
tA = c_char('t')
lda = c_int(A.shape)
else:
tA = c_char('n')
lda = c_int(A.shape)

if B.flags.c_contiguous:
tB = c_char('t')
ldb = c_int(B.shape)
else:
tB = c_char('n')
ldb = c_int(B.shape)

M = c_int(A.shape)
N = c_int(B.shape)
K = c_int(B.shape)

ldc = c_int(A.shape)

#upconvert what needs to be upconverted
if A.dtype != np.float32 and A.dtype != np.float64:
print "A1"
A = A.astype(np.float64)
if B.dtype != np.float32 and B.dtype != np.float64:
print "A2"
B = B.astype(np.float64)

#common stuff
if A.dtype != B.dtype:
print "A3"
A = A.astype(np.float64)
B = B.astype(np.float64)

C = np.zeros((A.shape,B.shape),dtype=A.dtype,order='F')

if A.dtype == np.float64:
thefun = _blaslib.dgemm_
one = c_double(1.0)
zero = c_double(0.0)
else:
thefun = _blaslib.sgemm_
one = c_float(1.0)
zero = c_float(0.0)

thefun(byref(tA), byref(tB), byref(M), byref(N), byref(K), byref(one),
A.ctypes.data_as(ctypes.c_void_p), byref(lda),
B.ctypes.data_as(ctypes.c_void_p), byref(ldb), byref(zero),
C.ctypes.data_as(ctypes.c_void_p), byref(ldc))

return C

n = 1000
A = np.asfortranarray(np.random.randn(n,n).astype(np.float64))
B = np.asfortranarray(np.random.randn(n,n).astype(np.float64))
C1 = np.dot(A,B)
C2 =  lldot(A,B)

cProfile.run('np.dot(A,B)')
cProfile.run('lldot(A,B)')
print 'Max discrepancy: %e'% abs(C1-C2).max()```

In my case, the second version using libblas directly was fully 10 times slower than np.dot! Clearly, there was something wrong with the BLAS configuration in NumPy.

### Which BLAS library is numpy using?

Running numpy.show_config() should give information about how NumPy is configured, for instance:

```blas_info:
libraries = ['blas']
library_dirs = ['/usr/lib']
language = f77
lapack_info:
libraries = ['lapack']
library_dirs = ['/usr/lib']
language = f77
NOT AVAILABLE
blas_opt_info:
libraries = ['blas']
library_dirs = ['/usr/lib']
language = f77
define_macros = [('NO_ATLAS_INFO', 1)]
NOT AVAILABLE
lapack_opt_info:
libraries = ['lapack', 'blas']
library_dirs = ['/usr/lib']
language = f77
define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
NOT AVAILABLE
lapack_mkl_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
atlas_blas_info:
NOT AVAILABLE
mkl_info:
NOT AVAILABLE
```

In my case, it referenced ATLAS rather than BLAS. In theory, ATLAS should be faster, but for whatever reason, there was something off about my ATLAS installation.

### Removing ATLAS and reinstall SciPy

The solution was pretty straightforward: uninstall ATLAS via:

```sudo apt-get remove libatlas*

```

Then reinstall SciPy via PIP:

```sudo pip install scipy --upgrade

```

Ran the test script again, and voilà! The timing is now similar for both calls.

### Alternatives

These might be useful (via twitter):