What if a computer program revealed what people want to read, even down to the punctuation? It could tell the likelihood of any given book becoming a bestseller. It could tell whether a given book had been written by a man or a woman. It could even tell who wrote it, as long as there was a large enough sample of prior work. And for those of us who are neither novelists nor publishers, it could tell us something important about our culture.
Reader, it exists. After four years of work, Jodie Archer, a former acquisitions editor, and Matthew Jockers, an academic specializing in computational analysis of style, have been able to “predict” which books were bestsellers and which were not with “an average accuracy of 80 percent.” This means that, out of a randomly selected group of 50 bestsellers and 50 non-bestsellers, the algorithm would classify about 40 of each correctly.
The authors are curiously secretive about what books went into deriving their algorithm, since the precise mix shouldn’t matter much if they are finding universal traits. They built a collection of “just under 5,000 books,” including “a diverse mixture of non-bestselling ebooks and traditional published novels, and just over 500 New York Times bestsellers.” (Note that the algorithm is going to reveal the tastes of the American reader, not all English-language readers, much less readers of other languages.) And we should prepare to have some assumptions challenged. There’s a prejudice among many readers of esoteric fare that bestsellers are badly written, escapist, and driven by cringe-making sex and implausible plot turns. But the results of the authors’ program suggest that sex doesn’t sell but realism—of a sort—does, and that bestsellers are carefully, even masterfully, crafted, down to the level of the individual sentence.
As to escapism, Americans’ idea of that means inhabiting somebody else’s job. Work is a riveting topic. The authors don’t explore this in detail, but those jobs tend to be emergency-room doctor or fiery litigator, not insurance analyst or dental hygienist. Other favored topics are “intimate conversation” and “human closeness.” Television caught on to this interest in work and talk long ago: Think of The Mary Tyler Moore Show, Friends, Seinfeld. But many so-called serious novelists avoid the world of work, unless it’s university teaching, presumably due to lack of experience.
The list of turnoffs is revealing as well: Fantasy, science fiction, revolutions, dinner parties, very dressed-up women, and dancing, as well as “the body described in any terms other than in pain or at a crime scene.” Sex, drugs, and rock ’n’ roll account for less than 1 percent of bestsellers’ content; sex sells only in a niche market. All in all, the no-dancing world of American bestsellers is one in which the Puritans of early New England would have been surprisingly comfortable.
Bestselling characters are American go-getters. They “need” rather than “want,” they “know, control, and display their agency. Their verbs are clean and self-assured. Characters in bestsellers more often grab and do, think and ask, look and hold. . . . [T]hey make things happen.” Characters in non-bestsellers are more apt to “murmur, protest, and hesitate.” The verb “do” appears often in bestsellers, “very” not so much. “Okay” and “ugh” are common. This frontier vigor extends even to titles: “‘The’ remains the most successful way to begin a title because it is a word that implies agency focused somewhere.”
As to structure, focus and simplicity work: “To get to 40 percent of the average novel, a bestseller uses only four topics.” One of these should be something many people fear: an accident, illness, or involvement in a lawsuit. And oddly enough, despite such relentless practicality, 9 of 10 recent debut novels that became instant bestsellers were written by women.
The authors are given to the adjective “winning,” as in “winning style,” “winning over readers,” and “winning prose.” They don’t like “long-winded syntax” and “the endless sentences of some classic writers who will write for three paragraphs without a period point.” Yet people still buy James Joyce and Henry James, and despite our apparent lack of interest in characters who hesitate, people still buy and go to see Hamlet.
I have a methodological quibble, too: Many of these books are also bestsellers in European translation, where the syntactic elements wouldn’t have the same weight. “His” and “her,” say, wouldn’t indicate much in languages where all nouns are gendered. So how important is syntax as opposed to plot and character?
We may have to wait for the sequel to find out.
Ann Marlowe is a writer in New York.