I have a poster session on Sunday afternoon at SFN 2014 in DC. It’s on a spiffy new method I’ve been working on for estimating the nonlinear transformation performed by an ensemble of sensory neurons, and its application to understanding visual representation in the dorsal and ventral visual streams.
Some background: there’s a growing consensus that the point of having hierarchical, as opposed to flat, sensory systems is to permit the creation of “good representations” of sensory stimuli. The work of Nicole Rust and Jim DiCarlo, in particular, has pointed towards the idea that hierarchical computations can and do untangle high-dimensional manifolds corresponding to image identity, pose, etc. in such a way that in high-level visual cortex, decoding of behaviourally relevant variables is trivial.
The question that I tackle here is how these representations emerge from single neuron computations, i.e. the receptive fields of neurons. I introduce systems identification methods based on deep neural networks that can capture the complexity of transformations in high-level visual cortex.
A key idea here is that rather than fitting a separate multi-layer neural network for every neuron – which would require immense amounts of data – I fit a single multi-layer network common to a set of neurons, capturing the computations leading up to those neurons’ responses. This is a powerful idea: we directly infer the representation underlying the responses of multiple neurons, together with an explicit computational model of that representation.
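To make the sharing concrete, here is a minimal numpy sketch (all dimensions and names are made up for illustration, not taken from the actual model): one shared nonlinear stage feeds per-neuron linear readouts, so data from every neuron constrains the same shared parameters, and the shared model stays far smaller than fitting a full network per neuron.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 256-pixel stimuli, 16 shared features, 10 neurons.
n_pix, n_feat, n_neur = 256, 16, 10

# Shared nonlinear stage, common to the whole population.
W_shared = rng.standard_normal((n_feat, n_pix)) * 0.1

def shared_features(stim):
    """Nonlinear representation shared across all recorded neurons."""
    return np.maximum(W_shared @ stim, 0.0)  # ReLU nonlinearity

# Per-neuron readouts: each neuron adds only a small linear weight
# vector on top of the shared features.
W_read = rng.standard_normal((n_neur, n_feat)) * 0.1

def predict(stim):
    """Predicted response of every neuron to one stimulus."""
    return W_read @ shared_features(stim)

stim = rng.standard_normal(n_pix)
rates = predict(stim)        # shape (10,): one predicted rate per neuron

# Why sharing helps: parameter counts for shared vs. per-neuron networks.
shared_params = W_shared.size + W_read.size
separate_params = n_neur * (W_shared.size + n_feat)
print(rates.shape, shared_params, separate_params)
```

The point of the comparison at the end is that the shared-stage parameters are paid for once, while a network per neuron would pay for them `n_neur` times.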
In other words, whereas traditional systems identification methods reveal the computation, and decoding methods the representation, the proposed method reveals both the representation and its computational substrate.
Here I show that it is possible to capture the computations performed in two different visual areas – MT and V2. Interestingly, we find that MT and V2 use similar computational strategies to re-encode stimuli – a combination of pooling and tuned suppression. The learned representation not only accounts for responses in MT and V2; it also proves necessary to account for responses in V4 and MST.
Let me repeat that: we used V2 data from Jack Gallant’s lab, collected about 7 years ago, to form an in silico model of how V2 represents visual stimuli. Then we used the learned representation to account for our own V4 data from a completely unrelated task. It worked on the first try. That is pretty non-trivial. Then we did the same with MT and MST, and it works. So I think we’ve nailed an aspect of computation and representation in visual cortex that is real, non-trivial and robust.
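The transfer step can be sketched as follows (entirely synthetic data; `v2_features` is a stand-in for the fitted V2-stage model, not the real one): freeze the learned representation, and fit only a linear readout to the new area’s responses, e.g. by ridge regression.

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen "V2-stage" features (a random stand-in for the learned model).
n_pix, n_feat = 64, 8
W_v2 = rng.standard_normal((n_feat, n_pix)) * 0.1

def v2_features(S):
    """Apply the frozen nonlinear stage to a (n_stim, n_pix) stimulus matrix."""
    return np.maximum(S @ W_v2.T, 0.0)

# New experiment: different stimuli, different area's responses.
S = rng.standard_normal((200, n_pix))
F = v2_features(S)
true_w = rng.standard_normal(n_feat)
r = F @ true_w + 0.1 * rng.standard_normal(200)  # synthetic "V4" responses

# Fit ONLY a linear readout on the frozen features (ridge regression).
lam = 1e-3
w = np.linalg.solve(F.T @ F + lam * np.eye(n_feat), F.T @ r)

# Held-in fit quality as fraction of variance explained.
pred = F @ w
r2 = 1.0 - np.sum((r - pred) ** 2) / np.sum((r - r.mean()) ** 2)
```

In this toy setup the readout recovers the responses almost perfectly because the synthetic responses were generated from the frozen features; with real data, a high cross-area fit is the non-trivial finding.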
Interestingly, the learned representation in V2 is indeed such that image identity can be decoded in an invariant fashion. I speculate that a good representation must strike a balance between invariance at the single neuron level and high dimensionality at the population level – hence the need for tuned suppression to create novel features. Simulations show encouraging results in this direction, but the full story will have to wait for the paper (in progress).
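The cross-pose decoding test behind that claim can be sketched like so. Everything here is synthetic and the invariance is built in by construction (identity signal plus pose-driven noise), purely to illustrate the procedure: train a decoder on some poses, then see whether identity still decodes on held-out poses.

```python
import numpy as np

rng = np.random.default_rng(2)

n_id, n_pose, n_feat = 5, 20, 12

# Synthetic population features: identity component + pose variability.
# (A stand-in for model-derived V2 features; invariance is by construction.)
id_templates = rng.standard_normal((n_id, n_feat))
X, y, pose = [], [], []
for i in range(n_id):
    for p in range(n_pose):
        X.append(id_templates[i] + 0.3 * rng.standard_normal(n_feat))
        y.append(i)
        pose.append(p)
X, y, pose = np.array(X), np.array(y), np.array(pose)

# Train a nearest-centroid identity decoder on half the poses...
train = pose < n_pose // 2
cent = np.array([X[train & (y == i)].mean(axis=0) for i in range(n_id)])

# ...and test identity decoding on the held-out poses.
held = ~train
d = ((X[held, None, :] - cent[None, :, :]) ** 2).sum(axis=2)
acc = (d.argmin(axis=1) == y[held]).mean()
```

High accuracy on poses the decoder never saw is what “invariant decoding” means operationally; an entangled representation would fail this generalization step even if it decoded well within a pose.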
I’m pretty enthusiastic about this work overall, come and see.
Oh, and don’t forget, Tuesday night is Neurolabware party and Josh isn’t kidding about “drinks are on him”. Bring your friends.