Theano: numerical computation in Python

Theano is a very interesting numeric library for Python that I covered briefly a few years ago. Coming from the machine learning group at Université de Montréal – i.e. Yoshua Bengio et al. – it is well adapted to the kinds of numerical tasks that frequently occur in machine learning problems, in particular deep neural nets. I recently tried it again and found that its debugging and error diagnostic features had been sufficiently improved to make it practical in real-world applications.

It combines several paradigms for numerical computations into a coherent whole, namely:

  • matrix algebra operations, in the style of Matlab and Numpy
  • symbolic variable and function definitions, in the style of Mathematica or Maple
  • optimizing, Just-In-Time compilation to CPU or GPU machine code

Mixing symbolic and numeric concepts in this way is a very powerful combination indeed. To give you a flavour of what Theano is like, here is an example of building a graph that computes the error (the negative log-likelihood) of a logistic regression:

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import sigmoid

# Define the symbolic variables: design matrix, labels and weights
X = T.matrix(name='X')
y = T.vector(name='y')
w = T.vector(name='w')

# Forward pass: predicted probabilities and negative log-likelihood
eta = X.dot(w)
mu = sigmoid(eta)
E = -(y * T.log(mu) + (1 - y) * T.log(1 - mu)).sum()

So far the code doesn’t look too dissimilar to something you would write in Matlab or NumPy. However, the variables X, y and w are symbolic, so everything defined in terms of them is symbolic as well. That means that E, for example, is not a scalar value: to get an actual result, you need to compile a function and fill in the values. For example:

# Compile the graph into a callable function; allow_input_downcast
# lets float64 inputs be cast down to Theano's configured float type
Efun = theano.function([X, w, y], E, allow_input_downcast=True)

# Evaluate on actual data
X_ = np.random.randn(1000, 100)
w_ = np.random.randn(100)
y_ = np.random.rand(1000) > .5

error_val = Efun(X_, w_, y_)
print(error_val)

That might seem like an unnecessary extra step, but therein lies the power of Theano: since it has a representation of the whole expression graph behind Efun, it can optimize that graph and run it on either the CPU or the GPU.
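You can inspect what the optimizer actually produced. As a minimal sketch, reusing the Efun compiled above, theano.printing.debugprint prints the optimized graph of a compiled function:

# Print the optimized computation graph behind the compiled function;
# fused and simplified ops appear here in place of the original expression
theano.printing.debugprint(Efun)

Running the same code on a GPU is then a matter of configuration (for example, setting THEANO_FLAGS=device=gpu,floatX=float32 in the environment) rather than a rewrite.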

Even more impressive is that it can derive new symbolic expressions from the graph, in particular gradients:

# Compute the gradient of E w.r.t. w the old-fashioned way:
# for the negative log-likelihood above, dE/dw = X^T (mu - y)
g1 = (mu - y).dot(X)

# Compute it with the power of symbolic differentiation
g2 = T.grad(E, w)

# Compile a function that returns both versions and check they agree
gfun = theano.function([X, w, y], [g1, g2], allow_input_downcast=True)
g1_, g2_ = gfun(X_, w_, y_)

print(g1_)
print(g2_)

Same thing! Of course, this is a trivial example where computing the gradient by hand is straightforward enough. In more general cases, however, gradient computation is a lot less trivial; symbolic differentiation means you can explore complex model architectures without worrying about whether you’ve derived the gradient correctly.
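This also makes gradient-based training loops very compact. Here is a minimal sketch (my own illustration, with a hypothetical learning rate) of gradient descent on the same logistic regression, reusing X, y and the data above; the weights live in a Theano shared variable, and the updates argument applies one descent step per call:

# Keep the weights in a shared variable so Theano can update them in place
w_shared = theano.shared(np.zeros(100), name='w_shared')

# Rebuild the negative log-likelihood in terms of the shared weights
mu_s = sigmoid(X.dot(w_shared))
E_s = -(y * T.log(mu_s) + (1 - y) * T.log(1 - mu_s)).sum()

lr = 0.001  # learning rate (hypothetical value)
# Each call performs one gradient-descent step and returns the current error
train = theano.function([X, y], E_s,
                        updates=[(w_shared, w_shared - lr * T.grad(E_s, w_shared))],
                        allow_input_downcast=True)

for i in range(100):
    print(train(X_, y_))

Swapping in a deeper model only changes how E_s is built; the update rule and the call to T.grad stay exactly the same.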

That’s just a flavour of what Theano can do. Have a look at the Theano tutorials, or at example applications in the context of deep neural nets.
