There’s a neat paper on the psychophysics of scene and object recognition at super-low resolutions in Visual Neuroscience by A. Torralba (2009). The author sought to answer a rather interesting question: what image resolution is needed to support scene and object recognition? He took images from image databases and created several versions of each, with resolutions ranging from 4×4 (!) to 128×128 pixels.
In the first experiments, observers were asked to identify a scene (“bedroom”, “beach”, “forest”, etc.) based on such images. Even at the lowest resolution (4×4), people were frequently above chance. At 16×16, accuracy reached 75% for outdoor scenes. To put things in perspective, 16×16 pixels is the size of a favicon, the tiny icon used to visually identify a website in a browser; it sits to the left of the address bar (for this site, it’s a big black X on a white background). Indoor environments required higher resolution to reach this accuracy, yet even at 16×16 many scenes were clearly recognizable. Here’s an example:
Did you figure it out yet? If you saw a bedroom, you are correct. To be clear, the image was created by downsampling the original to 16×16 and was then presented after upsampling through interpolation. It contains the same information as a 16×16 image, without the distracting appearance of square pixels. Here’s a tougher one:
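If you want to play with this yourself, here’s a minimal sketch of how such a stimulus might be prepared (grayscale, pure Python, no image libraries). The paper’s exact pipeline and interpolation kernel are assumptions here; this just box-averages down to 16×16 and bilinearly upsamples back up:

```python
def downsample(img, size=16):
    """Box-average an h x w grayscale image (list of lists) down to size x size."""
    h, w = len(img), len(img[0])
    bh, bw = h // size, w // size  # assumes h and w are multiples of size
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [img[i * bh + y][j * bw + x]
                     for y in range(bh) for x in range(bw)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def upsample(img, size=128):
    """Bilinear interpolation from an n x n image up to size x size."""
    n = len(img)
    out = []
    for i in range(size):
        # Map each output coordinate back into the small image.
        y = i * (n - 1) / (size - 1)
        y0, y1 = int(y), min(int(y) + 1, n - 1)
        fy = y - y0
        row = []
        for j in range(size):
            x = j * (n - 1) / (size - 1)
            x0, x1 = int(x), min(int(x) + 1, n - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```

Downsample a 128×128 photo, upsample the 16×16 result back to 128×128, and you have the smoothed stimulus: same information content as the tiny image, no blocky pixels.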
That’s a car in front of an office building (street scene). Now, as I mentioned, indoor images need higher resolution to be categorized. Part of the reason is that color is much more informative about scene identity outdoors: green is usually associated with forests, while blue and beige are associated with beaches. Outdoor scenes also tend to contain fewer surfaces.
Okay, so 16×16 or 32×32 seems like pretty low resolution for identifying a scene, but what about identifying objects? If a scene has been identified, it seems likely that at least a few objects within it have been recognized correctly. Yet at these resolutions the objects must be tiny! In a second set of experiments, observers were asked both to identify a scene and to tag the objects within it. Here’s an example of how a psychophysical observer tagged a bedroom at 16×16:
That’s 6 objects right there in a measly 256 pixels. The area of the headboard is probably not more than 30 pixels, and yet it can still be recognized! Presented alone, of course, such an object is unrecognizable; context furnishes the additional information. Here’s a dramatic example of this:
What is that? Context gives the answer:
It’s a sink, and it was correctly tagged by an observer despite being only about 8×4 pixels. Of course, if you already know that the scene is a bathroom, prior information tells you that there is probably a sink in there. But the observers here were asked both to classify the scene and to tag its objects. So the scene must be classified despite its objects being unrecognizable, and the objects somehow tagged despite the scene identity (and thus context) being initially unavailable. An explanation involving a hierarchical Bayesian scheme is almost too tempting. A few relevant papers: Yuille & Kersten (2006), Bengio & LeCun (2007), Rao & Ballard (1999).
In any case, pretty thought-provoking, and a zippy read.
Torralba, A. (2009). How many pixels make an image? Visual Neuroscience, 26(1), 123–131. PMID: 19216820