Updated March 2020
Gamers care a lot about framerate. You will find endless threads on the internet where gamers argue that 60 (or even 120!) frames per second is better than 30, and any game that performs below that threshold is sacrilege.
Why is 24 fps good enough for cinema, but the Oculus Rift needs 90? What’s the maximal framerate that humans can perceive?
This is a surprisingly complex question, and the answer involves the interaction of the visual system with the properties of the medium that does the presentation. So what is the right refresh rate for a movie, a game, or a VR display? Let’s break this down into three questions:
- At what frequency should a display refresh so that it doesn’t appear to flicker?
- At what frequency should a movie or video game be displayed so that it doesn’t appear choppy?
- What framerate should a VR display use?
Let’s look at each of them in turn.
The first question is straightforward. It’s been known for more than a hundred years that light flickering at high frequency appears stable. It’s an important phenomenon in everyday life, as many display and lighting technologies work by displaying very brief flashes of light many times a second. This includes the incandescent light bulb, fluorescent lights and cathode ray tubes (CRTs, the bulky glass tubes in old TVs).
It’s straightforward to measure the critical flicker fusion frequency in human observers by asking them to report their sensations when viewing a simple stimulus flickering on a CRT display with fast phosphors. You can recreate this experiment at home on a very low budget by flashing an LED with an Arduino (DIY instructions here).
The critical fusion frequency depends on the luminance of the stimulus and its size, as shown in the graph above (Hecht and Smith, 1936). For a large, high luminance stimulus covering the fovea, like a full screen white field on a CRT, flicker fusion occurs at about 60 Hz.
It’s interesting to note that there are cells in the LGN of primates (lateral geniculate nucleus, a relay between the eye and the brain through which the visual signal is forwarded) which respond to higher temporal frequencies than 60 Hz and are more sensitive than human observers to flicker (Spekreijse et al. 1971). That means that the signal is available somewhere in the brain but it isn’t available to consciousness. Cells in the visual cortex appear to discard the high-frequency information. It is possible, however, that such signals reach the brain through various other means (blindsight or miscellaneous projections to non-visual areas). Indeed, the 120 Hz flicker of older fluorescent lights has been found to cause cognitive deficits and headaches (Veitch and McColl 1995).
Many insects have much faster visual systems than we do. They have poor visual acuity, however, because they have so few photoreceptors; hence their vision is much more focused on the analysis of motion than ours. Ruck (1961) finds a critical fusion frequency of more than 200 Hz in the housefly, and cites a figure of 300 Hz for the honeybee. Indeed, you need special hardware to do visual stimulation for insect vision research (oscilloscopes were used in the old days, although this might have changed).
Smoothness in video games and movies
Flicker fusion is not especially relevant when you watch a movie or play a video game. A typical movie is shot at 24 Hz, yet that doesn’t mean that it will feel flickery when played back. Visible flicker would result if you flashed each image once with a fast display like a CRT. You can get around flicker in a movie by presenting it at a higher frame rate, repeating each frame several times, or simply presenting it on a display with built-in persistence like an LCD screen.
Each frame in a movie is only slightly different than the last — this is what enables compression algorithms like MPEG4. This is a very different situation than in the flicker fusion setup, where each frame is maximally different from the last: black, white, black, etc. The modulations in luminance are much smaller in a movie. Hence fusion occurs at much lower framerates than in the flicker fusion scenario, as is visible in the graph of flicker fusion frequency versus illuminance above.
Importantly, however, flicker can occur from a different source. If you look at the image of a DLP projector, and move your gaze outside the screen, you will notice fringes of color. This also occurs if you wave your hand in front of the projector, as shown above. In a single-chip DLP projector, the DLP chip produces black and white images, and the colors red, green and blue are created by a spinning color wheel. The wheel of common projectors can rotate at 4 or 5x the frame rate of the signal, so 240 or 300 Hz. This is a very high rate, yet clearly you’re able to discriminate such high temporal frequencies.
Between each flash of the projector’s image, your eyes can move up to a third of a degree of visual angle (assuming a peak saccadic velocity of 300 degrees per second, 300 Hz framerate, and 3 colors). Hence, slightly offset images in red, green and blue are captured on the retina, which you perceive as fringes of color.
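The one-third-of-a-degree figure is simple arithmetic. Here is a minimal sketch using the numbers assumed above (300 degrees per second peak velocity, a 300 Hz color wheel, 3 color segments per rotation); the function name is only for illustration.

```python
def gaze_shift_between_flashes(eye_velocity_deg_per_s, wheel_rate_hz, n_colors):
    """Degrees of visual angle the eye travels between successive color flashes."""
    # Each wheel rotation shows every color once, so flashes come n_colors times faster.
    flashes_per_second = wheel_rate_hz * n_colors
    return eye_velocity_deg_per_s / flashes_per_second


shift = gaze_shift_between_flashes(300.0, 300.0, 3)
print(f"{shift:.3f} degrees between color flashes")  # prints 0.333 degrees between color flashes
```

A slower eye movement or a faster wheel shrinks the offset between the red, green and blue images, which is why the fringes are most obvious during large, fast gaze shifts.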
Eye movements transform temporal frequencies into spatial frequencies. Human eyes have very good spatial frequency selectivity.
What’s happening is that temporal frequencies are translated to spatial frequencies through motion. Although you can’t discriminate flicker through its temporal aspect, adding motion will allow it to be discovered through its spatial signature. In fact, the visual system appears to exploit this effect to enhance its sensitivity to high spatial frequencies: fixational eye movements shift the spatio-temporal frequency spectrum of natural scenes towards frequencies neurons are more sensitive to (Rucci et al., SFN 2011).
Temporal aliasing is the main reason why videogames can look choppy at frame rates of 24 or 30 Hz. Let’s say a line moves from the left of the screen to the right. Its path can be represented as a two-dimensional image where one dimension represents its x position and the other represents time. In the graph shown below at the left, the line moves slowly, and its retinal image, which integrates over several frames, appears smooth with a bit of motion blur.
On the right, the same line is now moving faster. Now the retinal image shows a faded trail of the path of the line; this will be interpreted as choppy when viewed by a human observer. The problem is that a video game’s rendering pipeline generates instantaneous snapshots of the video game world at a set sampling frequency (say 30 Hz). But the Nyquist-Shannon sampling theorem implies that if things happen in the virtual world at higher temporal frequencies than half of the sampling rate (the Nyquist frequency), the result will be temporal aliasing, the creation of artifacts from inadequate sampling.
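To make the sampling argument concrete, here is a small sketch of what aliasing does to a pure oscillation. It assumes a sinusoidal signal sampled with instantaneous snapshots at a fixed rate; the function names are hypothetical and only illustrate the folding of frequencies around multiples of the sampling rate.

```python
import math


def apparent_frequency(signal_hz, sampling_hz):
    """Frequency at which a sampled sinusoid appears after aliasing."""
    # Sampling cannot distinguish f from f shifted by any multiple of the
    # sampling rate, so the signal shows up at the distance to the nearest multiple.
    nearest_multiple = round(signal_hz / sampling_hz) * sampling_hz
    return abs(signal_hz - nearest_multiple)


def sampled_cosine(signal_hz, sampling_hz, n_samples):
    """Instantaneous snapshots of a cosine, one per rendered frame."""
    return [math.cos(2 * math.pi * signal_hz * n / sampling_hz)
            for n in range(n_samples)]


for f in (10, 14, 16, 28, 45):
    print(f"{f} Hz sampled at 30 Hz looks like {apparent_frequency(f, 30)} Hz")

# The snapshots of a 28 Hz oscillation match those of a 2 Hz oscillation.
fast = sampled_cosine(28, 30, 30)
slow = sampled_cosine(2, 30, 30)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(fast, slow)))
```

Anything below the 15 Hz Nyquist frequency comes through unchanged, while a 28 Hz oscillation rendered at 30 frames per second is indistinguishable from a 2 Hz one.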
Note that aliasing has little to do with the visual system; it’s a signal processing effect, and it occurs in many modalities other than time. In the spatial domain, aliasing can cause artifacts like Moiré patterns. The image above shows the corresponding effect when a brick wall with spatial frequencies above the Nyquist frequency is sampled without first eliminating high frequencies through an analog filter; note the interference pattern at the bottom right.
The solution is to sample the video game world at a higher temporal frequency, and optionally downsample it before display. This is temporal anti-aliasing. Of course, that means more polygons to blit over time, which means you need a beefier video card. Cinema cameras apply their own analog anti-aliasing by integrating photons over the entire period the camera shutter is open (typically about half the frame interval, or 1/48th of a second at 24 fps). This means that frames per second in video games and video cameras are not comparable.
So then what’s the minimum frame rate at which a video game should be rendered to ensure that it doesn’t suffer from jitter or choppiness? There’s no rendering framerate above which aliasing effects are guaranteed to be eliminated. Once again, it has little to do with the human visual system and everything to do with the Nyquist-Shannon sampling theorem. You can always cook up a hypothetical situation which demands arbitrarily large numbers of frames per second to be viewed without artifacts.
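As a rough illustration of sampling at a higher temporal frequency and then downsampling, the sketch below renders a one-pixel line moving across a single row of pixels. Everything here is an assumption made for the example: the toy `render_snapshot` renderer, time measured in units of frames, and a plain average (box filter) over the frame interval, which is only one simple choice of downsampling filter.

```python
def render_snapshot(position_px, width):
    """Instantaneous snapshot: a one-pixel-wide bright line on a dark row."""
    row = [0.0] * width
    row[int(position_px) % width] = 1.0
    return row


def render_frame(frame_index, speed_px_per_frame, width, subsamples):
    """Average `subsamples` snapshots taken at evenly spaced times within one frame."""
    frame = [0.0] * width
    for i in range(subsamples):
        # Sample at the midpoint of each sub-interval; time is in units of frames.
        t = frame_index + (i + 0.5) / subsamples
        snapshot = render_snapshot(speed_px_per_frame * t, width)
        for x, value in enumerate(snapshot):
            frame[x] += value / subsamples
    return frame


# A line moving 8 pixels per frame across a 16-pixel row.
aliased = render_frame(0, 8.0, 16, subsamples=1)
blurred = render_frame(0, 8.0, 16, subsamples=8)
print(aliased)
print(blurred)
```

With one sample per frame the line is a single bright pixel that jumps 8 pixels between frames. With eight samples its brightness is spread evenly over the eight pixels it crossed, which is the motion blur a camera would produce if its shutter stayed open for the whole frame.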
And the relevant question is really whether the choppiness bothers you or not, rather than whether it’s visible. Claypool, Claypool and Damaa (2006) report that performance starts to saturate in a first-person shooter game at about 30 fps, with only marginal benefits at 60 fps. So these non-experts seemed to be equally good at 30 vs. 60 fps.
Of course, it might be different for an expert player – if you think that 60fps makes you a better player, by all means, go buy the latest GPU. But I haven’t heard a lot of people make the claim that anything above 60 fps is worth it – except, very importantly, for virtual reality (VR).
Cues in motion: VR displays
To navigate in the world, we use a mix of:
- Vision. Moving forward creates a lot of expansive motion in the visual field, which gets analyzed by specialized brain areas.
- Vestibular cues. The vestibule is the inner ear organ that reports the direction of gravity, acceleration and head rotation. It acts as the brain’s accelerometer and gyro.
- Proprioception. Proprioception is the sense of knowing where our body and our limbs are. Little sensors attached to our muscles tell our brains how much they’re stretched, and that helps us know where our body is in space.
Neurons in area MST of the visual cortex are exquisitely sensitive to matches or mismatches between vestibular cues and visual cues. When different modalities of the input are contradictory, the effect is often a strong perception of self-motion. MST neurons control, with very low latency, visual reflexes which cause movement of the eyes and the head, such as the vestibulo-ocular reflex (VOR). The problem is that under contradictory input, these reflexes can create counterproductive self-motion which only serves to exacerbate the problem. This causes motion sickness.
VR is particularly prone to this issue because:
- The display takes up most of the field of view. Bigger things cause a larger amount of motion discomfort.
- You’re generating your own motion through your body. Your brain knows what to expect, and when that expectation is not met, the brain is not happy, and you get motion sickness.
The discomfort can be much attenuated by keeping the motion-to-photon latency to an absolute minimum. The motion-to-photon latency is the time it takes for your own movement to affect what’s displayed on the screen. This is very well explained in this prescient article from 2012 by Michael Abrash, who went on to become chief scientist at Oculus Research (and my skip-manager when I was an engineer at Facebook Reality Labs). You can keep the motion-to-photon latency low in a number of ways:
- Having a fast (low-persistence, low-lag) display
- Having a high framerate. The motion-to-photon latency is bounded below by how long a frame is displayed. At 90 frames per second, a movement of the controllers right after a frame is presented will lag by up to 1/(90 Hz) ≈ 11 milliseconds, because that’s how long a frame lasts.
- Having very low-latency motion sensors and computer vision pipelines to estimate the location of controllers and the headset. Even if your display is 1000Hz, if it takes 200 ms to estimate the location of controllers, you will feel motion sick. Bad tracking really ruins the experience.
- Having well-optimized rendering pipelines that can use the controller and head location information at the very last minute. For example, asynchronous reprojection can take a stale frame and reproject it according to the latest estimate of controller and head location to fake a frame when the GPU cannot render frames fast enough. These tricks can really help improve the perception of a smooth framerate and help minimize motion sickness, as highlighted by John Carmack.
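The contributions listed above can be put into a back-of-the-envelope budget. The sketch below assumes a deliberately simplistic additive model (tracking time plus rendering time plus one full frame of display time); real pipelines overlap these stages and use prediction, so treat the function names and the model as illustrative only.

```python
def frame_period_ms(refresh_hz):
    """How long one frame stays on screen, in milliseconds."""
    return 1000.0 / refresh_hz


def worst_case_latency_ms(refresh_hz, tracking_ms, render_ms):
    """Simplistic additive estimate: tracking + rendering + one full frame."""
    return tracking_ms + render_ms + frame_period_ms(refresh_hz)


for hz in (72, 90, 1000):
    print(f"{hz} Hz -> {frame_period_ms(hz):.1f} ms per frame")

# A very fast display does not help if tracking is slow.
print(worst_case_latency_ms(1000, tracking_ms=200.0, render_ms=0.0))
```

Under this model a 1000 Hz display paired with 200 ms of tracking delay still ends up around 200 ms from motion to photon, which is why every stage of the pipeline has to be fast.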
The original Rift refreshed at 90 Hz, while newer machines like the Quest are now refreshing at a lower framerate of 72 Hz. This doesn’t mean that the visual system runs at 72 Hz or 90 Hz or anything like that. What it means is that the predictions the brain makes about vision, based on bodily movement, are sensitive to changes on the order of ~10-15 milliseconds. This is pretty remarkable when you think that the latency between a photon and the wave of activity in high-level cortex can run up to 200 milliseconds.
Over the years, many a reddit thread has linked to this article to make an argument for or against the necessity of high framerates for gaming. My own take is: it depends. If you’re a professional doing e-sports, every millisecond counts. For the rest of us folk playing Beat Saber on medium difficulty on the Quest, if it feels comfortable, that’s probably all that matters.
Hecht S, & Smith EL (1936). Intermittent stimulation by light: VI. Area and the relation between critical frequency and intensity. The Journal of General Physiology, 19 (6), 979-89 PMID: 19872977
Spekreijse H, van Norren D, & van den Berg TJ (1971). Flicker responses in monkey lateral geniculate nucleus and human perception of flicker. Proceedings of the National Academy of Sciences of the United States of America, 68 (11), 2802-5 PMID: 5001396
Veitch, J., & McColl, S. (1995). Modulation of fluorescent light: Flicker rate and light source effects on visual performance and visual comfort. Lighting Research and Technology, 27 (4), 243-256 DOI: 10.1177/14771535950270040301
Ruck, P. (1961). Photoreceptor Cell Response and Flicker Fusion Frequency in the Compound Eye of the Fly, Lucilia sericata (Meigen). Biological Bulletin, 120 (3) DOI: 10.2307/1539540
M. Rucci, J. D. Victor, A. Casile, X. Kuang (2011). Fixational eye movements enhance sensitivity to high spatial frequencies in the retina and LGN. SFN 2011
Mark Claypool, Kajal Claypool, & Feissal Damaa (2006). The Effects of Frame Rate and Resolution on Users Playing First Person Shooter Games. Proceedings of SPIE