Well, I’m back in the old country for the holidays, and boy do I feel like the smartest pickle in the jar for moving to California when it’s below zero here in Montreal (either scale).
There is one thing I miss about the old country: French Scrabble. I’ve spent an inordinate amount of time memorizing word lists and playing the game, and now it’s a pain to find a partner in the good ol’ US of A.
So I decided to start studying English Scrabble, and I figured I could use some stats to prioritize my learning*. Quackle is a Scrabble solver that has a (well-hidden) command line interface that can play games against itself. The AI agent, called ‘speedy player’, uses heuristics rather than Monte Carlo simulations to determine which move to play. It’s very fast, yet it has access to the full dictionary (TWL06, the North American tournament dictionary**) and plays a kick-ass game.
I let it run for 100,000 games and, with some Python glue (pandas mostly), compiled a list of the best words by various criteria: points per play, plays per game, and points per game (= points per play * plays per game). I grouped words by form (i.e., axe, axes, axed, etc. count as one root word) using a dictionary I found in Zyzzyva. The most useful word in Scrabble is…
Qi. Indeed, Qi is the only two-letter word that contains a Q, and one of a few dozen words that contain a Q but not a U. Therefore, it is played very frequently, in 7 out of every 10 games. The next 5 are:
- RE – as in do, ré, mi…
- ZA – short for pizza
- ER – an expression of hesitation
The top 50 is in fact dominated by such two-letter words, which, while often not valuable in themselves, can be used to attach other, more useful words.
It is surprising how skewed the distribution of word values is. While knowing Qi leads to a whopping +22 points per game advantage, the 101st most common word, joe, gives an expected improvement of around a single point. Thus, the most useful words are by far the short two-letter words, followed by three-letter words containing high-paying letters, followed by a very long tail of infrequently used words.
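For the curious, here is a minimal sketch of how the three criteria (points per play, plays per game, points per game) could be computed from a log of plays. It is not my actual script: the real analysis used pandas, while this version sticks to the standard library, and the `plays` log, the `roots` mapping and the `word_stats` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical play log: (game_id, word_played, points). Invented data,
# not real Quackle output.
plays = [
    (1, "QI", 22),
    (1, "AXES", 30),
    (2, "QI", 24),
    (2, "AXED", 28),
    (3, "ZA", 21),
    (3, "AXE", 26),
]

# Hypothetical inflection -> root mapping (the post took its mapping
# from a Zyzzyva dictionary; this tiny dict is a stand-in).
roots = {"AXES": "AXE", "AXED": "AXE"}


def word_stats(plays, roots):
    """Return per-root points per play, plays per game and points per game."""
    n_games = len({game for game, _, _ in plays})
    totals = defaultdict(lambda: [0, 0])  # root -> [number of plays, total points]
    for _, word, points in plays:
        root = roots.get(word, word)  # words without an entry are their own root
        totals[root][0] += 1
        totals[root][1] += points

    stats = {}
    for root, (count, total) in totals.items():
        points_per_play = total / count
        plays_per_game = count / n_games
        stats[root] = {
            "points_per_play": points_per_play,
            "plays_per_game": plays_per_game,
            # points per game = points per play * plays per game
            "points_per_game": points_per_play * plays_per_game,
        }
    return stats


stats = word_stats(plays, roots)
# Rank roots by their expected contribution per game, best first.
ranking = sorted(stats, key=lambda w: stats[w]["points_per_game"], reverse=True)
```

Grouping by root before aggregating is what lets AXE, AXES and AXED pool their plays into one entry.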
Verbs are an interesting subset of words, because they have a lot of alternate forms, and thus by learning a verb’s root you add several to your vocabulary. Here’s the top 20, which includes quite a few surprises:
- BE – including rare forms ART, WAST, WERT
- EAT – including rare form ET
- IN – to harvest: I IN, he INS, we INNED, you’re INNING
- EX – to cross out; I EX, he EXES, we EXED, you’re EXING
- AX – to work with an ax – like ex
- DO – including rare forms – DIDST, DOEST, DOST, DOTH, DOETH
- PI – to spill into disorder – PIED, PIING, PIEING, PIES
- AH and AAH – to express surprise
- OH – similar to AH
- UP – to raise
- ZAG – to turn sharply – ZAGGED, ZAGGING, etc.
- OPE – to open – OPED, OPING, OPES
- EYE – including EYING, EYEABLE
- JOW – to ring a bell
OUZO >> RAKI > ARAK >> PASTIS in Scrabble. Exactly the opposite of real life.
What about high-paying words (per play) that are used rather frequently (more than 1 out of every 1000 games)? We have:
- DISRATE – to lower in rank
- STEARIN – the solid portion of a fat
- RETSINA – a resin-flavored Greek wine
- TERTIAN – a recurrent fever
This list shows a consistent pattern – bingos created with low-paying letters (modulo the Q in antique) – letters which are highly likely to co-occur if you manage your leaves well.
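The filter behind this list (high points per play, played in more than 1 out of every 1000 games) can be sketched roughly as follows. The `summary` table, its counts and the `frequent_high_payers` helper are invented for illustration; only the threshold and the 100,000-game total come from the post.

```python
# Hypothetical per-word summary: word -> (times played, total points scored).
# All counts and point totals below are invented for illustration.
summary = {
    "RETSINA": (150, 10800),
    "QI": (70000, 1540000),
    "ZYZZYVA": (2, 260),
}
N_GAMES = 100_000  # number of simulated games mentioned in the post


def frequent_high_payers(summary, n_games, min_rate=1 / 1000):
    """Keep words played in more than `min_rate` of games, best payers first."""
    rows = []
    for word, (count, total) in summary.items():
        if count / n_games > min_rate:
            rows.append((word, total / count))  # (word, points per play)
    return sorted(rows, key=lambda row: row[1], reverse=True)


top = frequent_high_payers(summary, N_GAMES)
```

With this toy data, the rare word is dropped by the frequency cutoff and the remaining words are ordered by points per play.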
This brings us to the important subject of the optimal leave, that is, which combination of letters on the rack leads to the best plays. We can repeat the same exercise and compute the average number of points on a turn given a certain rack, etc. The top 10 in terms of total points per game are:
The best leaves in terms of total points are dominated by the letters EITRNASL, forming LATRINES. If we look at the highest-paying leaves per play, however, we find leaves dominated by the blank tile ? and high-paying letters – these leaves are infrequent, but when they can be placed for a bingo, with high-paying letters and assorted doubles and triples, they’re worth a ton.
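The same bookkeeping works for leaves. Below is a rough sketch, again with the standard library and invented data, that shows why the two rankings disagree: a common leave wins on total points per game, while a rare one can win on points per turn. The `turns` log and the `leave_stats` helper are hypothetical, and `?` stands for the blank.

```python
from collections import defaultdict

# Hypothetical turn log: (game_id, leave held going into the turn, points scored).
# Invented data for illustration.
turns = [
    (1, "ERS", 20),
    (1, "?Z", 60),
    (2, "SRE", 30),
    (2, "ERS", 25),
]


def leave_stats(turns):
    """Return per-leave points per turn and total points per game."""
    n_games = len({game for game, _, _ in turns})
    totals = defaultdict(lambda: [0, 0])  # leave -> [number of turns, total points]
    for _, leave, points in turns:
        key = "".join(sorted(leave))  # letter order on the rack does not matter
        totals[key][0] += 1
        totals[key][1] += points
    return {
        key: {
            "points_per_turn": total / count,
            "points_per_game": total / n_games,
        }
        for key, (count, total) in totals.items()
    }


leaves = leave_stats(turns)
best_per_game = max(leaves, key=lambda k: leaves[k]["points_per_game"])
best_per_turn = max(leaves, key=lambda k: leaves[k]["points_per_turn"])
```

Sorting the letters of each leave before grouping makes "ERS" and "SRE" count as the same leave.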
Here is the full list of plays, with annotations for letter count, whether they contain high paying letters, etc. It’s a Google Fusion Table, which is a pretty nice tool for sharing large databases.
* IOW I double-nerd-sniped myself
** A new version, TWL2014, has been announced, but it is not available in electronic form yet.