Optimizing GLM hyperparameters through the evidence

I wrote earlier about a recent paper by the Pillow lab that uses priors optimized through the evidence (a.k.a. the marginal likelihood) to estimate spatially and frequency-localized receptive fields. Evidence optimization seems to be seeing something of a revival as a technique for estimating model hyperparameters. I just posted an update to my GLM-with-quadratic-penalties package that uses this technique.

Specifically, we assume that the prior on the GLM weights is Gaussian, N(0, Q^{-1}), where the precision matrix Q has the form:

Q = Q_0 + \sum_i \lambda_i Q_i
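For concreteness, here is what assembling that precision might look like in NumPy. This is just a sketch of the formula above; the function name is mine, not the package's:

```python
import numpy as np

def assemble_precision(Q0, Q_components, lambdas):
    """Q = Q0 + sum_i lambda_i * Q_i, with all inputs D x D arrays."""
    Q = Q0.copy()
    for lam, Qi in zip(lambdas, Q_components):
        Q = Q + lam * Qi
    return Q
```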

The hyperparameters \lambda_i are then selected by maximizing the evidence of the model via trust-region Newton, following a Laplace approximation of the posterior (Bishop, Chapters 3-5).

One application of this is in models where the parameters are organized along two or more dimensions (say, x and y, or space and time). It is then natural to add one penalty for the smoothness of the parameters along the first dimension and a second penalty for smoothness along the other. What you get is a form of Automatic Smoothness Determination (ASD). Here's an example of applying this in a logistic regression model where the filter is a 2D Gabor.
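To make the evidence computation concrete, here is a minimal sketch of the Laplace-approximated log evidence for a logistic-regression GLM, written in NumPy/SciPy. It is not the package's code, and a generic quasi-Newton solver stands in for the trust-region Newton steps mentioned above:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(w, X, y, Q):
    """Negative log joint for a Bernoulli GLM (logit link) with N(0, Q^-1) prior."""
    z = X @ w
    ll = y @ z - np.logaddexp(0, z).sum()               # Bernoulli log-likelihood
    grad = X.T @ (y - 1.0 / (1.0 + np.exp(-z))) - Q @ w  # gradient of the log joint
    return -(ll - 0.5 * w @ Q @ w), -grad

def laplace_log_evidence(X, y, Q):
    """Laplace approximation to log p(y | lambda); the (2*pi) terms cancel exactly."""
    D = X.shape[1]
    res = minimize(neg_log_posterior, np.zeros(D), args=(X, y, Q), jac=True)
    w_map = res.x
    p = 1.0 / (1.0 + np.exp(-(X @ w_map)))
    H = X.T @ (X * (p * (1 - p))[:, None]) + Q          # negative Hessian at the mode
    return -res.fun + 0.5 * np.linalg.slogdet(Q)[1] - 0.5 * np.linalg.slogdet(H)[1]
```

Selecting the hyperparameters then amounts to maximizing this quantity over the \lambda_i, re-finding the MAP weights at each step.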

The precision is flexible enough to accommodate interesting model structure. For example, in the poster that Theo and I presented at SFN, I used it to track functional connectivity as a function of time. Functional connectivity can be estimated by using the firing patterns of other neurons as inputs to a GLM whose output is the target neuron. Each functional connection is defined by 3 parameters, the coefficients of Laguerre basis functions. Only a handful of possible connections will turn out to be significant, so it's natural to impose a penalty on the magnitude of each group of 3 parameters. If, in addition, we let these parameters change over time, it's also possible to add one temporal smoothness penalty per group. In total, we had 28 cells, hence 27 inputs for a given cell; thus Q_1 through Q_{27} were used to impose group sparseness, and Q_{28} through Q_{54} were used for temporal smoothness.
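As a rough sketch of how those penalty components might be laid out, here is one possible construction in NumPy. The weight ordering (connection, time bin, basis function) and the helper names are my own assumptions, not necessarily how the package organizes things:

```python
import numpy as np

def group_sparsity_component(conn, n_conn, T, n_basis=3):
    """Q_i: identity on all parameters of connection `conn`, zero elsewhere."""
    D = n_conn * T * n_basis
    Q = np.zeros((D, D))
    for t in range(T):
        start = (conn * T + t) * n_basis
        Q[start:start + n_basis, start:start + n_basis] = np.eye(n_basis)
    return Q

def temporal_smoothness_component(conn, n_conn, T, n_basis=3):
    """Q_{n_conn + i}: squared first differences across time for connection `conn`."""
    Dmat = np.diff(np.eye(T), axis=0)   # first-difference operator over T time bins
    DtD = Dmat.T @ Dmat
    Q = np.zeros((n_conn * T * n_basis,) * 2)
    for b in range(n_basis):
        # indices of basis coefficient b across all time bins of this connection
        idx = conn * T * n_basis + np.arange(T) * n_basis + b
        Q[np.ix_(idx, idx)] = DtD
    return Q
```

With 27 inputs, the first helper would generate Q_1 through Q_{27} and the second Q_{28} through Q_{54}, matching the counts above.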

You can download the software here.
