Mu-Ming Poo delivered a talk at the MNI a few years back on his work on spike-timing dependent plasticity (STDP). He mentioned that to get good ideas for new experiments, you should start by reading old rather than current literature (in his case, the works of Donald Hebb). His reasoning was that the questions raised by new literature are often being tackled currently by the investigators of the original studies, whereas some of the questions in the old literature have gone out of fashion, often for no better reason than the lack of appropriate methods to tackle them in those days.
For that same reason, reading works on neuroscience written by non-neuroscientists is great food for thought. I have been avidly reading Action in Perception by Alva Noë, an associate professor of philosophy at UC Berkeley. A lot of the book is nonsense, but the author weaves legit neuroscience with deep questions about life, the universe, and everything in a way that makes the book truly thought-provoking. Here are some notes on the first two chapters of the book, followed by some thoughts on interesting vision research relevant to the author’s thesis.
Perception as action
The main idea of this book, as the author states on the first page, is that perceiving is a way of acting. The author frequently refers to the distinction between touch and vision. Touch is an active sense: to perceive an object through touch, a bottle for instance, one must actively move one’s hands, and thus tactile perception is entangled with a motor component to the point of being inseparable.
Vision, on the other hand, is often thought of as a passive sense (by laymen, at least). Of course, vision is in reality an active sense. In addition to the importance of eye movements in foveated animals, gaze control, bodily movements, and the active process of attention all make vision contingent on active volitional and non-volitional processes. Noë says that “Only through self-movement can one test and so learn the relevant patterns of sensorimotor dependence”.
This reframes vision from an open-loop inference process (in the scheme of David Marr, creating a 2.5D sketch from a 2D image, say) to a closed-loop inference process in which the animal is free to acquire extra information about the environment to resolve ambiguities or make more informed decisions. Or, to put this in machine learning language, vision is not a process of supervised or unsupervised learning but rather one of active or reinforcement learning. In the real world, as opposed to the controlled visual stimulation schemes of the lab, the animal always has the option to acquire more data by moving or waiting. An example of this is the resolution of ambiguities brought about by occlusion: “By tracing movements back, you can bring an occluded surface back into view. In perceptual activity the perceiver is thus able to differentiate mere occlusion from obliteration”. And later: “the information available to an active animal greatly outstrips the information available to a static retina”.
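To make the closed-loop framing concrete, here is a toy sketch of a perceiver that keeps sampling (the computational analogue of moving or waiting) until the ambiguity is resolved. This is entirely my own illustration; the two hypotheses, the noise level, and the confidence threshold are made-up numbers, not anything from the book.

```python
import random

def active_observer(true_scene, noise=0.3, threshold=0.9, max_glances=50, rng=None):
    """Closed-loop perception as sequential evidence accumulation:
    keep taking noisy glances until the posterior over two hypotheses
    ('bottle' vs. 'vase') crosses a confidence threshold."""
    rng = rng or random.Random(0)
    p = 0.5  # flat prior: P(scene is 'bottle')
    other = 'vase' if true_scene == 'bottle' else 'bottle'
    for glance in range(1, max_glances + 1):
        # Each glance reports the true scene with probability 1 - noise
        obs = true_scene if rng.random() > noise else other
        # Bayes update on this observation
        like_bottle = (1 - noise) if obs == 'bottle' else noise
        like_vase = (1 - noise) if obs == 'vase' else noise
        p = p * like_bottle / (p * like_bottle + (1 - p) * like_vase)
        if p > threshold or p < 1 - threshold:
            break  # confident enough: stop sampling
    return ('bottle' if p > 0.5 else 'vase', glance)
```

An open-loop perceiver has to commit after one fixed glance; here the number of glances adapts to how ambiguous the evidence happens to be.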
The author goes on to discuss some of the more radical aspects of Gibson’s ideas on the ecological approach to vision: “He [Gibson] argued that just as there is a fit between an animal and the environmental niche it occupies, thanks to the coevolution of animal and niche, so there is a tight perceptual attunement between animal and niche. Because of this attunement, animals are directly sensitive to the features of the world that afford the animal opportunities for action (What Gibson calls affordances).” Gibson’s next on my reading list.
An excellent example of this process (not discussed in the book) is the active perception of parallax via head-bobbing used by certain invertebrates (mentioned in this fascinating article on jumping spiders). Closer to us hat-wearing primates, the author discusses the implications of experiments with inverting glasses (those that exchange left for right and vice-versa via prisms placed over the eyes). Apparently these glasses cause a type of experiential blindness by disrupting the link between action and perception. You can imagine a corollary discharge or efference copy system modulating visual perception so that expected motions due to locomotion are cancelled. Something along those lines seems to happen in area MT, where fast motion due to saccades is partially compensated for if the saccade is actually made, rather than simulated through on-screen motion (Thiele et al. 2002).
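The corollary-discharge idea can be sketched in a couple of lines. This is a toy model under my own assumptions; the cancellation gain and the sign conventions are invented for illustration, not taken from Thiele et al.:

```python
def perceived_motion(retinal_motion, motor_command, gain=0.8):
    """Toy efference-copy model: subtract the retinal slip predicted
    from one's own motor command from the measured retinal motion.
    Cancellation is partial (gain < 1), as reported for MT."""
    predicted_slip = -motor_command  # a rightward saccade slips the image leftward
    return retinal_motion - gain * predicted_slip

# A real 10-degree saccade: the expected retinal slip is mostly cancelled
perceived_motion(-10.0, 10.0)  # -> -2.0 (largely stable percept)

# The same retinal motion simulated on-screen, with no motor command:
perceived_motion(-10.0, 0.0)   # -> -10.0 (full motion is perceived)
```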
The qualia of visual experience while living with inverting glasses is apparently a real mindfuck. Here’s a quotation from Kohler (1951), quoting his patient K:
During visual fixations, every movement of my head gives rise to the most unexpected and peculiar transformations of objects in the visual field. The most familiar figures seem to dissolve and reintegrate in ways never before seen. At times, parts of figures run together, the spaces between disappearing from view: at other times, they run apart, as if intent on deceiving the observer. Countless times I was fooled by these extreme distortions and taken by surprise when a wall, for instance, suddenly appeared to slant down the road, when a truck I was following with my eyes started to bend, when the road began to arch like a wave, when houses and trees seemed to topple down, and so forth. I felt as if I was living in a topsy-turvy world of houses crashing down on you, of heaving roads, and of jellylike people.
As you can easily imagine, I immediately ordered some inverting glasses online (unfortunately I could only find up-down inverting glasses rather than left-right). The cool thing about the glasses is that somehow, after a while, the brain adapts to them, and one can go about the inverted world more or less normally (though, for instance, the ability to read at a normal speed seems to be lost).
Noë makes an argument against a generative perspective on visual perception. Basically, the argument goes like this: you don’t need a detailed internal representation of the world around you when the world is a perfectly appropriate model for itself. Some consequences of this argument I buy, somewhat. Noë takes the view that perceptual filling in at the level of the blind spot is not an act of adding something to the representation (true in-filling) but a failure to notice what lies outside our capabilities of perception (much like we don’t notice whatever’s outside our field of view). I really need to brush up on the blind spot literature to figure out whether this is indeed physiologically correct; perhaps readers can add to this. The extreme version of his argument, however, implies a kind of visual solipsism that I think is probably (and perhaps provably) wrong; ya know, babies and object permanence and whatnot.
The view that internal models are less relevant in active vision than in passive vision does, however, make some amount of sense to me. For instance, figure-ground segregation, which is often thought of as very difficult and has been the subject of many a rambling 100-page paper in the ’90s, is probably not that hard in the real world, because it’s cheaper to move and let parallax solve the problem for you.
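As a back-of-the-envelope illustration (mine, not Noë’s): after a small sideways head movement, the image shift of each point is inversely proportional to its depth, so a near figure pops out of a far ground with nothing fancier than a threshold. The scene layout and numbers below are invented.

```python
def segment_by_parallax(depths, baseline=1.0):
    """Toy figure-ground segregation by motion parallax: points that
    shift more between the two head positions are nearer, hence 'figure'."""
    shifts = [baseline / d for d in depths]      # image shift ~ 1 / depth
    threshold = (max(shifts) + min(shifts)) / 2  # split near from far
    return ['figure' if s > threshold else 'ground' for s in shifts]

# A near object (depth 2) in front of a far wall (depth 10):
segment_by_parallax([10, 10, 2, 2, 10])
# -> ['ground', 'ground', 'figure', 'figure', 'ground']
```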
I can’t help but cite (totally out of context, by the way) Noë’s accidentally persuasive argument as to why we should do vision research rather than sitting at home speculating about these things: “we are deluded as to the character of our visual experience”. Actually, we’re deluded about a lot of things, eheh (Nisbett and Wilson, 1977).
Relevance to vision research
As a vision scientist, I can’t help but speculate that motor signals from broader sources than eye movements must be integrated and processed in the dorsal visual stream. We know that area MST processes visual ego-motion and that it receives vestibular input and corollary discharges from eye movements. Thus, it seems entirely plausible that MST (or perhaps V7A, LIP, VIP, and all those other areas of the dorsal visual stream where we have no idea what’s going on) receives input concerning upcoming gaze changes due to whole-body movements. Conversely, there’s the intriguing fact that M1 has cells that are apparently selective for visual patterns of expansion. Thus, dorsal and motor cortex are probably more entangled in their processing than current research suggests, and fittingly, the author uses the term “how pathway” rather than “where pathway” to designate the dorsal visual stream.
On a vaguely related note, it strikes me that the story of the ventral visual stream is, all in all, more coherent than that of the dorsal visual stream. In the ventral stream, there’s a consensus that object recognition is accomplished by the consistent application of selectivity and invariance operations on earlier levels in a hierarchically structured fashion (that is, the center-surround -> edges/lines -> combos of lines -> object parts -> OJ cell idea). By comparison, the dorsal visual stream is a real mess. It follows the basic plan of the ventral visual stream from V1 to MT to MST, but after that, somehow, in the parietal lobe, it stops being about motion in a strict sense and becomes about eye movements and attention and a bunch of other stuff.
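The alternating selectivity/invariance scheme can be caricatured in a few lines of code. This is my own HMAX-flavored toy (in the spirit of Riesenhuber and Poggio’s models), not anything from the book: a “simple” layer matches templates at every position, and a “complex” layer max-pools over position, buying translation invariance.

```python
def windows(signal, n):
    """All length-n sliding windows over a 1-D signal."""
    return [signal[i:i + n] for i in range(len(signal) - n + 1)]

def s_layer(signal, templates):
    """Selectivity: template match (dot product) at every position."""
    return [[sum(a * b for a, b in zip(w, t)) for w in windows(signal, len(t))]
            for t in templates]

def c_layer(s_responses):
    """Invariance: max-pool each template's responses over position."""
    return [max(r) for r in s_responses]

edge = [-1, 1]  # a crude 'edge detector' template
c_layer(s_layer([0, 0, 1, 1], [edge]))  # -> [1]
c_layer(s_layer([0, 1, 1, 1], [edge]))  # -> [1]  (same response for a shifted edge)
```

Stacking such pairs, with later templates defined over earlier C-layer outputs, is the cartoon version of the center-surround -> edges -> combos -> parts progression.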
Meanwhile, we know very little about how gait, locomotion, and gaze changes are processed in the dorsal stream, even though these things are clearly very important and informative. Giese and Poggio (2003) speculate that the dorsal visual stream is organized in a way similar to the ventral visual stream, with gait processing as the dorsal analogue of object and face recognition in the ventral stream. This idea appeals to me, but it really doesn’t solve the all-important question of why the dorsal stream participates in gaze control and attention. Perhaps the dorsal visual stream is hierarchical only to a certain degree, and at some point the signal diverges into parallel pathways? I honestly don’t have the flippest idea.