Gamers care a lot about framerate. You will find endless threads on the internet where people argue that 60 (or even 120!) frames per second is better than 30, and any game that performs below that threshold is sacrilege.
Why is 24 fps good enough for cinema when the Oculus Rift needs 90? What’s the maximum framerate that humans can perceive?
This is a surprisingly complex question, and the answer involves the interaction of the visual system with the properties of the medium that does the presentation. So what is the right refresh rate for a movie, a game, or a VR display? Let’s break this down into three questions:
- At what frequency should a display refresh so that it doesn’t appear to flicker?
- At what frequency should a movie or video game be displayed so that it doesn’t appear choppy?
- (Added June 2016) What framerate should a VR display use?
Let’s look at them in turn.
The first question is straightforward. It’s been known for more than a hundred years that light flickering at high frequency appears stable. It’s an important phenomenon in everyday life, as many display and lighting technologies work by displaying very brief flashes of light many times a second. This includes the incandescent light bulb, fluorescent lights and cathode ray tubes (CRTs, the bulky glass tubes in old TVs).
It’s easy enough to measure the critical flicker fusion frequency in human observers by asking them to report their sensations when viewing a stimulus flickering on a CRT display with fast phosphors.
The critical fusion frequency depends on the luminance of the stimulus and its size, as shown in the graph above (Hecht and Smith, 1936). For a large, high luminance stimulus covering the fovea, like a full screen white field on a CRT, flicker fusion occurs at about 60 Hz.
It’s interesting to note that there are cells in the LGN of primates (lateral geniculate nucleus, a relay between the eye and the brain through which the visual signal is forwarded) which respond to higher temporal frequencies than 60 Hz and are more sensitive than human observers to flicker (Spekreijse et al. 1971). That means that the signal is present early in the visual pathway, but somehow it isn’t available to consciousness. Cells in the visual cortex appear to discard the high-frequency information. It is possible, however, that such signals reach the brain through various other means (blindsight or miscellaneous projections to non-visual areas). Indeed, the 120 Hz flicker of older fluorescent lights has been found to cause cognitive deficits and headaches (Veitch and McColl 1995).
It’s also interesting to note that many insects have much faster visual systems than ours. They have poor visual acuity, however, because they have so few photoreceptors; hence their vision is much more focused on the analysis of motion than ours. Ruck (1961) finds a critical fusion frequency of more than 200 Hz in the housefly, and cites a figure of 300 Hz for the honeybee. Indeed, you need special hardware to do visual stimulation for insect vision research (oscilloscopes were used in the old days, although this might have changed).
Smoothness in video games and movies
Flicker fusion is not especially relevant when you watch a movie or play a video game. A typical movie is shot at 24 Hz, yet that doesn’t mean that it will feel flickery when played back. Flicker would result only if you flashed each image once with a fast display like a CRT. You can get around flicker in a movie by presenting it at a higher refresh rate, with each frame repeated several times, or simply by presenting it on a display with built-in persistence like an LCD screen.
Each frame in a movie is only slightly different from the last, which is what enables compression algorithms like MPEG-4. This is a very different situation from the flicker fusion setup, where frames alternate between black and white. The modulations in luminance are much smaller in a movie. Hence fusion occurs at much lower framerates than in the flicker fusion scenario, as is visible in the graph of flicker fusion frequency versus illuminance above.
Importantly, however, flicker can occur from a different source. If you look at the image of a DLP projector, and move your gaze outside the screen, you will notice fringes of color. This also occurs if you wave your hand in front of the projector, as shown above. In a single-chip DLP projector, the DLP chip produces black and white images, and the colors red, green and blue are created by a spinning color wheel. Wikipedia states that the wheel of common projectors can rotate at 4 or 5x the frame rate of the signal, so 240 or 300 Hz. This is a very high rate, yet clearly you’re able to detect such high temporal frequencies.
Between each flash of the projector’s image, your eyes can move up to a third of a degree of visual angle (assuming a peak saccadic velocity of 300 degrees per second, 300 Hz framerate, and 3 colors). Hence, slightly offset images in red, green and blue are captured on the retina, which you perceive as fringes of color.
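The arithmetic behind that figure is easy to check. Here is a minimal sketch; the function name and its parameters are my own, chosen for illustration, and the numbers are the ones assumed in the paragraph above.

```python
def eye_shift_per_flash(saccade_deg_per_s, frame_rate_hz, colors_per_frame):
    """Degrees of visual angle the eye travels between successive color flashes."""
    # Each frame is split into one flash per color, so the flash rate is
    # the frame rate multiplied by the number of colors.
    flashes_per_second = frame_rate_hz * colors_per_frame
    return saccade_deg_per_s / flashes_per_second


# Values from the text: 300 deg/s saccade, 300 Hz framerate, 3 colors.
shift = eye_shift_per_flash(300.0, 300.0, 3)
print(f"{shift:.3f} degrees per color flash")  # prints "0.333 degrees per color flash"
```

A slower eye movement or a faster color wheel shrinks the offset, which is why the fringes are most obvious during large saccades.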
What’s happening is that temporal frequencies are translated to spatial frequencies through motion. So even if you can’t discriminate flicker through its temporal aspect, adding motion will allow it to be discovered through its spatial signature. In fact, the visual system appears to exploit this effect to enhance its sensitivity to high spatial frequencies: fixational eye movements shift the spatio-temporal frequency spectrum of natural scenes towards frequencies neurons are more sensitive to (Rucci et al., SFN 2011).
Temporal aliasing is the main reason why videogames can look choppy at frame rates of 24 or 30 Hz. Let’s say a line moves from the left of the screen to the right. Its path can be represented as a two-dimensional image where one dimension represents its x position and the other represents time. In the graph shown below at the left, the line moves slowly, and its retinal image, which integrates over several frames, appears smooth with a bit of motion blur.
On the right, the same line is now moving faster. Now the retinal image shows a faded trail of the path of the line; this will be interpreted as choppy when viewed by a human observer. The problem is that a video game rendering pipeline generates instantaneous snapshots of the game world at a set sampling frequency (say 30 Hz). But the Nyquist-Shannon sampling theorem implies that if stuff happens in the virtual world at temporal frequencies higher than half of the sampling rate (the Nyquist frequency), the result will be temporal aliasing: artifacts created by the interaction of high temporal frequencies with a slow sampler.
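To make the sampling argument concrete, here is a small sketch of temporal aliasing. It assumes a hypothetical light in the game world whose brightness oscillates at 25 Hz, sampled by a 30 Hz renderer; the helper names are mine, not from any game engine.

```python
import math


def aliased_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a sinusoid after sampling (folded around multiples of the sample rate)."""
    nearest_multiple = round(signal_hz / sample_rate_hz) * sample_rate_hz
    return abs(signal_hz - nearest_multiple)


def sample_cosine(freq_hz, sample_rate_hz, n_samples):
    """Instantaneous snapshots of a cosine, one per rendered frame."""
    return [math.cos(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


FRAME_RATE_HZ = 30.0  # Nyquist frequency is 15 Hz

fast = sample_cosine(25.0, FRAME_RATE_HZ, 30)  # above Nyquist
slow = sample_cosine(5.0, FRAME_RATE_HZ, 30)   # below Nyquist

# Frame by frame, the 25 Hz signal is indistinguishable from a 5 Hz one.
indistinguishable = all(abs(a - b) < 1e-9 for a, b in zip(fast, slow))
print(aliased_frequency(25.0, FRAME_RATE_HZ), indistinguishable)  # prints "5.0 True"
```

Once the frames are rendered, no amount of processing can tell the two signals apart, which is why the filtering has to happen before (or during) sampling.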
Note that aliasing has little to do with the visual system; it’s a signal processing effect, and it occurs in many modalities other than time. In the spatial domain, aliasing can cause artifacts like Moiré patterns. The image above shows the corresponding effect when a brick wall with spatial frequencies above the Nyquist frequency is sampled without first eliminating high frequencies through an analog filter; note the interference pattern at the bottom right.
The solution is to sample the video game world at a higher temporal frequency, then downsample before display. In the image domain, the corresponding trick is to render the image at, say, 4x the target resolution in each dimension, then pool the image in regions of 4×4 before display. This is known as anti-aliasing.
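Here is a toy sketch of the temporal version of this trick, assuming a one-dimensional screen and a single bright dot moving at constant speed. The function names and the choice of 4 sub-frames are illustrative assumptions, not how any particular engine does it.

```python
def render_snapshot(position, width):
    """Instantaneous 1-D frame: a single lit pixel at the dot's position."""
    frame = [0.0] * width
    frame[int(position) % width] = 1.0
    return frame


def render_supersampled(start, speed_px_per_frame, width, subframes=4):
    """Average several snapshots taken within one display frame interval."""
    accum = [0.0] * width
    for k in range(subframes):
        # Position of the dot at the k-th sub-frame of this display frame.
        pos = start + speed_px_per_frame * k / subframes
        for i, value in enumerate(render_snapshot(pos, width)):
            accum[i] += value / subframes
    return accum


# A dot moving 8 pixels per displayed frame, sampled 4 times per frame.
frame = render_supersampled(start=0, speed_px_per_frame=8, width=16, subframes=4)
print(frame[:8])  # prints "[0.25, 0.0, 0.25, 0.0, 0.25, 0.0, 0.25, 0.0]"
```

Instead of one sharp dot that jumps 8 pixels between frames, the displayed frame carries a dimmer streak along the path, which is a crude form of motion blur. More sub-frames give a smoother streak at a proportionally higher rendering cost.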
Cinema cameras apply their own analog anti-aliasing by integrating photons over the entire period the camera shutter is open (at most 1/24th of a second at 24 fps, and typically about half of that with a standard 180-degree shutter). This means that frames per second in video games and cinema cameras are not comparable.
It’s possible to fake spatio-temporal anti-aliasing by combining the current image with a spatially blurred version of previous frames. From a signal processing standpoint, this amounts to filtering the aliased image sequence so that its spatio-temporal frequency content approximates that of an ideally sampled version of the same sequence.
So then what’s the minimum frame rate at which a video game should be rendered to ensure that it doesn’t suffer from jitter or choppiness? There’s no rendering framerate above which aliasing effects are guaranteed to be eliminated. Once again, it has little to do with the human visual system and everything to do with the Nyquist-Shannon sampling theorem. You can always cook up a hypothetical situation which demands arbitrarily large numbers of frames per second to be viewed without artifacts.
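A rough sketch of this kind of frame blending is below, again on a one-dimensional screen. The 3-tap box blur, the blending weight of 0.6 and the function names are all assumptions of mine for illustration; real games use more elaborate filters.

```python
def box_blur(frame):
    """3-tap box blur with edge clamping."""
    n = len(frame)
    return [(frame[max(i - 1, 0)] + frame[i] + frame[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


def blend_with_history(current, history, weight=0.6):
    """Mix the current frame with a spatially blurred copy of the accumulated past frames."""
    blurred = box_blur(history)
    return [weight * c + (1.0 - weight) * h for c, h in zip(current, blurred)]


width = 12
history = [0.0] * width
for pos in (0, 4, 8):  # the dot jumps 4 pixels per rendered frame
    current = [0.0] * width
    current[pos] = 1.0
    history = blend_with_history(current, history, weight=0.6)

# The newest dot is brightest; older positions survive as a fading, widening trail.
print(round(history[8], 3), round(history[4], 3))
```

Because the history buffer is blurred again on every frame, older positions both fade and spread out, which hides the discrete jumps without rendering any extra frames.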
And the relevant question is really whether the choppiness bothers you or not, rather than whether it’s visible. Claypool, Claypool and Damaa (2006) report that performance saturates in a first-person shooter game at about 30 fps (above; notice that confidence intervals overlap between 30 and 60 fps). So these non-experts seemed to be equally good at 30 vs. 60 fps.
Of course, it might be different for an expert player – if you think that 60fps makes you a better player, by all means, go buy the latest GPU. But I haven’t heard a lot of people make the claim that anything above 60 fps is worth it – except, very importantly, for virtual reality (VR).
Cues in motion: VR displays
To orient ourselves in the world, we use a mix of vision, vestibular cues, and proprioception. Neurons in area MST of the visual cortex are exquisitely sensitive to matches or mismatches between vestibular cues and visual cues.
When different modalities of the input are contradictory, the effect is often a strong perception of self-motion. When there’s a mismatch between disparity and motion cues, for example, you get a feeling of vertigo. When there’s a mismatch between vestibular input and motion cues, like at the top of a roller coaster, you get a feeling of disorientation.
MST neurons control, with very low latency, visual reflexes that cause movement of the eyes and the head, such as the vestibulo-ocular reflex (VOR). The problem is that under contradictory input, these reflexes can create counterproductive self-motion which only serves to exacerbate the problem. This causes motion sickness.
VR is particularly prone to this issue because, unlike in a traditional video game, you’re moving and therefore generating your own visual motion. You also get disparity cues from the two screens, which is another potential source of mismatch that the brain tries to deal with. This one-two punch of delayed motion and disparity cues causes nausea.
In fact, the Oculus Rift refreshes at 90 fps. Driving such a screen is hard for an entry-level GPU, especially when you have to render two screens at once! Yet doing so will definitely increase your comfort level with the technology. So if you’re going to get a Rift, shell out the money for the computer to go with it!
Hecht S, & Smith EL (1936). Intermittent stimulation by light: VI. Area and the relation between critical frequency and intensity. The Journal of General Physiology, 19 (6), 979-89 PMID: 19872977
Spekreijse H, van Norren D, & van den Berg TJ (1971). Flicker responses in monkey lateral geniculate nucleus and human perception of flicker. Proceedings of the National Academy of Sciences of the United States of America, 68 (11), 2802-5 PMID: 5001396
Veitch, J., & McColl, S. (1995). Modulation of fluorescent light: Flicker rate and light source effects on visual performance and visual comfort. Lighting Research and Technology, 27 (4), 243-256 DOI: 10.1177/14771535950270040301
Ruck, P. (1961). Photoreceptor Cell Response and Flicker Fusion Frequency in the Compound Eye of the Fly, Lucilia sericata (Meigen). Biological Bulletin, 120 (3) DOI: 10.2307/1539540
M. Rucci, J. D. Victor, A. Casile, X. Kuang (2011). Fixational eye movements enhance sensitivity to high spatial frequencies in the retina and LGN. SFN 2011
Mark Claypool, Kajal Claypool, & Feissal Damaa (2006). The Effects of Frame Rate and Resolution on Users Playing First Person Shooter Games. Proceedings of SPIE