Fresh off the press from NIPS 2012 is a paper by Vintch et al. on a convolutional model of V1. Unsurprisingly, the convolutional model (or as they call it, the subunit model) beats the crap out of an STC-based model, since it captures the same basic features of the RFs in a much lower-dimensional fashion than the STC model. Indeed, a model of this class by yours truly has been leading the Neural Prediction Challenge for a couple of years.

The main issue with convolutional models is that they’re a bitch to fit. Vintch et al. solve the problem with what they call convolutional spike-triggered covariance. The idea is to form an augmented stimulus matrix containing the values within every possible window underlying the convolution. The augmented stimulus matrix is then point-wise multiplied by a guessed spatial kernel corresponding to the receptive field envelope. STC is run on that, and the eigenvectors associated with the lowest and highest eigenvalues are used to initialize the filters of the convolutional model; they use only two kernels for the convolutional net.
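To make the trick concrete, here’s a minimal NumPy sketch of that initialization as I understand it, for a 1-D stimulus with “valid” windows. This is not the authors’ code: the function name, its arguments, and the uniform default envelope are all my own.

```python
import numpy as np

def conv_stc_init(stim, spikes, k, envelope=None):
    """Sketch of convolutional-STC initialization (hypothetical code,
    not the authors' implementation).

    stim     : (T, N) stimulus, one N-pixel frame per time bin
    spikes   : (T,) spike counts
    k        : subunit kernel width (k <= N)
    envelope : optional (N - k + 1,) guessed RF envelope, one weight
               per window position
    """
    T, N = stim.shape
    n_win = N - k + 1
    if envelope is None:
        envelope = np.ones(n_win)  # assumption: flat envelope by default

    # Augmented matrix: every length-k window of every frame becomes a
    # row, scaled by the envelope weight at that window's position.
    rows = np.stack([stim[:, i:i + k] * envelope[i] for i in range(n_win)],
                    axis=1)
    X = rows.reshape(T * n_win, k)
    # Each window inherits the spike count of its time bin.
    w = np.repeat(spikes, n_win).astype(float)

    # Spike-triggered covariance on the augmented matrix.
    mu = np.average(X, axis=0, weights=w)
    Xc = X - mu
    C = (Xc * w[:, None]).T @ Xc / w.sum()
    evals, evecs = np.linalg.eigh(C)

    # Lowest- and highest-eigenvalue eigenvectors initialize the two kernels.
    return evecs[:, 0], evecs[:, -1]
```

The payoff is that the covariance lives in the small kernel space (k × k) rather than the full stimulus space, which is what makes the eigenvectors usable as initial convolutional filters.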

It’s an interesting technique, and it seems like it’s the right idea, but it’s really ad hoc. I would have liked to see it fleshed out, or at least see some simulations to understand under what circumstances it picks decent initial filters. I’m sure the usual suspects will take a stab at it.

I think convolutional models are the way to go to explain early and intermediate visual areas, but there are a couple of unsolved issues:

- How do you regularize the solutions? Granted, the models are lower-dimensional than STC, but the authors state that they have “only” 1200 parameters. I’ve been experimenting with a modified version of boosting for the parameters of the kernels, but it’s very computationally intensive: it takes a day to fit 5 minutes of data. L1 seems like a pain as well.
- How do you speed up the fitting process so that you can evaluate a lot of different model variants? The augmented data matrices are huge so fitting the kernels is very expensive. I was wondering whether it was worth it to keep the convolution matrix implicit and implement the matrix-vector products via convn or fftn. At the very least it would take less memory so that you could parallelize the inference.
- What about a nonlinearity at the very start of the cascade? Is that important? With an appropriate choice of input and intermediate nonlinearities, you can get AND- or OR-like integration. I’ve been working on an NLNLN model of this type this summer, but have yet to analyse the results (*damn you NY/Montreal girls!*).
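On the implicit-convolution idea from the second bullet: instead of materializing the huge augmented matrix, the matrix-vector product with a kernel can be computed frame-by-frame with FFTs, which is exactly what fftn-style tricks buy you. A hypothetical 1-D “valid” sketch in NumPy (`conv_mv` is my own naming, not from the paper):

```python
import numpy as np

def conv_mv(stim, kernel):
    """Product of the (never materialized) augmented stimulus matrix
    with a kernel, via FFT.  Equivalent to correlating each frame with
    the kernel and keeping the 'valid' part.  Hypothetical sketch."""
    T, N = stim.shape
    k = kernel.size
    n = N + k - 1  # FFT length covering the full linear convolution
    # Correlation is convolution with the flipped kernel.
    F = np.fft.rfft(stim, n=n, axis=1)
    K = np.fft.rfft(kernel[::-1], n=n)
    full = np.fft.irfft(F * K, n=n, axis=1)
    # Keep only fully-overlapping windows: shape (T, N - k + 1).
    return full[:, k - 1:N]
```

Memory-wise this stores only the raw stimulus and one FFT buffer, so several fits could run in parallel where the explicit augmented matrix would not even fit in RAM.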

Anyways, happy fun times in RF estimation land.