Off-the-shelf optimization functions in Matlab

by Sam Derbyshire via Wikipedia – CC-BY-SA 3.0

Estimating a statistical model via maximum likelihood or MAP involves minimizing an error function – the negative log-likelihood or log-posterior. Generic functions built in to Matlab like fminunc and fmincon will often do the trick. There are many other free solvers available, which are often faster, or more powerful:

  • Solvers by Mark Schmidt: there’s a huge collection of functions from Mark Schmidt to solve generic constrained and unconstrained problems as well as solvers for more specific problems, e.g. L1-regularized problems. minFunc and minConf are drop-in replacements for fminunc and fmincon, and they are often much faster. Each function has several variants, so you can fiddle with the parameters until you get something that has acceptable speed.
  • CVX: the toolbox for convex optimization is not always the fastest, but it’s exceedingly powerful for constrained optimization. You use a declarative syntax to specify the problem and CVX takes care of finding a way to solve the problem using a number of backends, including SDPT3 and SeDuMi. See also the excellent companion book. YALMIP is one alternative with similar design goals.
  • L1-penalized solvers ad-nauseam: there are so many solvers available for the L1-regularized least-squares problem – aka the LASSO – it’s getting out of hand. YALL1 is notable for solving many variants of the problem: constrained, unconstrained, positive, group-Lasso, etc. Mark Schmidt‘s L1General is notable for being able to solve the general L1-penalized problem – meaning the error function is something other than the sum of squares. L1-penalized GLMs can be solved directly by L1General without resorting to IRLS.
  • Non-negative least-squares: Matlab’s lsqnonneg is slow. It’s much faster to solve the normal equations – work with the sufficient statistics X’X and X’y of the underlying generative model – when there are more observations than variables. Here’s one function that does the trick. When the design matrix X is exceedingly wide or sparse, on the other hand, this toolbox is very useful. You can use NNLS as a building block for non-negative GLMs via IRLS.

Non-rigid deformation for calcium imaging frame alignment

Oh yeah!

It’s been a while since I last updated the blog – I graduated from McGill, went on a trip to Indonesia where I did a lot of diving (above), came back to Montreal for 16 hours to gather my coffee machine and about three shirts – all I need to survive, really – then moved to sunny California to do a postdoc in Dr. Dario Ringach’s lab at UCLA. Living the dream.

We’ve been doing calcium imaging – GCamp6 – in mice via a custom-built microscope – you can read more about the microscope over at the Scanbox blog.

If you’re used to working with single electrode or even MUAs, calcium imaging will blow your freaking mind – you can get more than a hundred shiny neurons in a given field of view, so you can realistically ask and answer questions about how a local circuit works knowing that you’re not massively undersampling the population.

Inter-frame motion is an issue in awake behaving subjects. There are slow drifts and faster motions due to, e.g., grooming, which you need to correct for. Dario has a post on correcting for translation motion, which is dominant in slow drifts. During grooming, however, the motion can be fast enough that the field of view moves significantly during frame acquisition. Because the field of view is scanned from top to bottom – not captured all at once – this causes non-rigid deformations.

Specifically, vertical motion causes expansion and contraction of frames along the vertical axis, while horizontal motion causes shear. Of course, rigid motion dominates, so rigid motion compensation will be good enough for many analyses. However, if you’re interested in looking at correlations between neurons, for example, you will need to get rid of as much wide-field motion as you can. Wide-field variation will cause some parts of neurons to drop in or out of ROIs during motion, and those decreases or increases in fluorescence will tend to be correlated across ROIs, which is obviously a nightmare for correlational analysis.

There are a few ways of correcting for non-rigid motion; I had success with the method of Greenberg & Kerr (2009), which can correct for non-rigid motion with subpixel accuracy. It works on rigid-motion-corrected frames. Here’s a before and after during grooming:

rigidcomp nonrigidcomp
No compensation Only rigid compensation Non-rigid compensation

Their method is based on the Lucas-Kanade method of optic flow estimation. You have a reference frame – T for template – and another image – I – you want to align it with. You do this by finding a displacement field D such that

I(x+D_x,y+D_y) \approx T(x,y)

By itself, this is a highly underconstrained problem. You use the usual trick of constraining the displacement fields to be a linear combination of basis functions, i.e.:

D_x = D_x^i w^i

In our case, the D_x^i can be chosen to look like tent-shaped basis functions – linear B-splines – in the vertical direction and constant in the horizontal direction. This accounts for the fact that the sample is approximately stable for a small group of scan lines.

Then you solve via least-squares – ML under an assumption of Gaussian noise – for the weights of the basis function. The effect of the deformation field is first approximated via a first-order Taylor expansion. Then you use vanilla Gauss-Newton iterations until convergence.

It turns that with the choice of linear B-splines in the vertical direction a lot of the computations are quite simple, so it’s reasonably fast – alignment of a frame might take a second, and most of that time is spent doing interpolation. You can speed things up a bit by using block-wise cross-correlation to get a guesstimate of the displacements.

A final question is what to use as a template. I use a 2-pass process. First, I use the mean fluorescence image for alignment. Then, I use a local temporal average of the fluorescence as the template. During grooming, the overall activity tends to increase, and some neurons which are usually not very active light up. These neurons are hard to align with the mean fluorescence image because they don’t really appear in there. Aligning them against where they were a few frames ago is easier.

Here’s some code to get you started:

function [Id,dpx,dpy] = doLucasKanade(T,I,dpx,dpy)
    warning('off','fastBSpline:nomex');
    Nbasis = 16;
    niter = 25;
    damping = 1;
    deltacorr = .0005;

    lambda = .0001*median(T(:))^2;
    %Use a non-iterative algo to get the initial displacement

    if nargin < 3
        [dpx,dpy] = doBlockAlignment(I,T,Nbasis);
        dpx = [dpx(1);(dpx(1:end-1)+dpx(2:end))/2;dpx(end)];
        dpy = [dpy(1);(dpy(1:end-1)+dpy(2:end))/2;dpy(end)];
    end
    %dpx = zeros(Nbasis+1,1);
    %dpy = zeros(Nbasis+1,1);

    %linear b-splines
    knots = linspace(1,size(T,1),Nbasis+1);
    knots = [knots(1)-(knots(2)-knots(1)),knots,knots(end)+(knots(end)-knots(end-1))];
    spl = fastBSpline(knots,knots(1:end-2));
    B = spl.getBasis((1:size(T,1))');

    %Find optimal image warp via Lucas Kanade

    Tnorm = T(:)-mean(T(:));
    Tnorm = Tnorm/sqrt(sum(Tnorm.^2));
    B = full(B);
    c0 = mycorr(I(:),Tnorm(:));

    %theI = gpuArray(eye(Nbasis+1)*lambda);
    theI = (eye(Nbasis+1)*lambda);

    Bi = B(:,1:end-1).*B(:,2:end);
    allBs = [B.^2,Bi];

    [xi,yi] = meshgrid(1:size(T,2),1:size(T,1));

    bl = quantile(I(:),.01);

    for ii = 1:niter

        %Displaced template
        Dx = repmat((B*dpx),1,size(T,2));
        Dy = repmat((B*dpy),1,size(T,2));

        Id = interp2(I,xi+Dx,yi+Dy,'linear');

        Id(isnan(Id)) = bl;

        c = mycorr(Id(:),Tnorm(:));

        if c - c0 < deltacorr && ii > 1
            break;
        end

        c0 = c;

        %gradient of template
        dTx = (Id(:,[1,3:end,end])-Id(:,[1,1:end-2,end]))/2;
        dTy = (Id([1,3:end,end],:)-Id([1,1:end-2,end],:))/2;

        del = T(:) - Id(:);

        %special trick for g (easy)
        gx = B'*sum(reshape(del,size(dTx)).*dTx,2);
        gy = B'*sum(reshape(del,size(dTx)).*dTy,2);

        %special trick for H - harder
        Hx = constructH(allBs'*sum(dTx.^2,2),size(B,2))+theI;
        Hy = constructH(allBs'*sum(dTy.^2,2),size(B,2))+theI;

        %{
        %Compare with fast method
        dTx_s = reshape(bsxfun(@times,reshape(B,size(B,1),1,size(B,2)),dTx),[numel(dTx),size(B,2)]);
        dTy_s = reshape(bsxfun(@times,reshape(B,size(B,1),1,size(B,2)),dTy),[numel(dTx),size(B,2)]);

        Hx = (doMult(dTx_s) + theI);
        Hy = (doMult(dTy_s) + theI);
        %}

        dpx_ = Hx\gx;
        dpy_ = Hy\gy;

        dpx = dpx + damping*dpx_;
        dpy = dpy + damping*dpy_;
    end
end

function thec = mycorr(A,B)
    A = A(:) - mean(A(:));
    A = A / sqrt(sum(A.^2));
    thec = A'*B;
end

function H2 = constructH(Hd,ns)
    H2d1 = Hd(1:ns)';
    H2d2 = [Hd(ns+1:end);0]';
    H2d3 = [0;Hd(ns+1:end)]';

    H2 = spdiags([H2d2;H2d1;H2d3]',-1:1,ns,ns);
end

function [dpx,dpy] = doBlockAlignment(T,I,nblocks)
    dpx = zeros(nblocks,1);
    dpy = zeros(nblocks,1);

    dr = 10;
    blocksize = size(T,1)/nblocks;

    [xi,yi] = meshgrid(1:size(T,2),1:blocksize);
    thecen = [size(T,2)/2+1,floor(blocksize/2+1)];
    mask = (xi-thecen(1)).^2+(yi-thecen(2)).^2< dr^2;

    for ii = 1:nblocks
        dy = (ii-1)*size(T,1)/nblocks;
        rg = (1:size(T,1)/nblocks) + dy;
        T_ = T(rg,:);
        I_ = I(rg,:);
        T_ = bsxfun(@minus,T_,mean(T_,1));
        I_ = bsxfun(@minus,I_,mean(I_,1));
        dx = fftshift(ifft2(fft2(T_).*conj(fft2(I_))));
        theM = dx.*mask;

        [yy,xx] = find(theM == max(theM(:)));

        dpx(ii) = (xx-thecen(1));
        dpy(ii) = (yy-thecen(2));
    end
end

PhD thesis – Parametric Models of Visual Cortex at Multiple Scales

*Updated Sun April 6th*

Well it’s done! I successfully defended my thesis on April 3rd. I now have what I’ve been longing for all these years – an obnoxious title that I can remind people of whenever I’m about to lose an argument. Unfortunately, this only works when erguing with non-PhDs.

Entitled Parametric Models of Visual Cortex at Multiple Scales, the thesis is available here. IMHO, it’s pretty damn good. Here are the slides for my PhD defense – with narration. I’m especially thrilled about the future directions – I think we’ve got some pretty sick papers in the pipeline.