Artificial Intelligence | From the blog of Nicholas C. Rossis, author of science fiction, the Pearseus epic fantasy series and children's books

No one knows anything. William Goldman’s aphorism concerns Hollywood, but it’s also true of book publishing. Who would have guessed that 50 Shades would become a publishing phenomenon? Or Dan Brown’s The Da Vinci Code?

As an article on Wired explains, it was the latter’s success that prompted Jodie Archer, a Penguin UK employee, to wonder what made a successful book. She was still pondering that very question when she met Matthew L. Jockers, a cofounder of the Stanford Literary Lab, whose work in text analysis had convinced him that computers could peer into books in a way that people never could.

Soon the two of them went to work on the “bestseller” problem: How could you know which books would be blockbusters and which would flop, and why?

The result of their work—detailed in The Bestseller Code, out this month—is an algorithm built to predict, with 80 percent accuracy, which novels will become mega-bestsellers.

The Ingredients of Success

So, what does their algorithm like?

  • Young, strong heroines who are also misfits (the type found in The Girl on the Train, Gone Girl, and The Girl with the Dragon Tattoo).
  • No sex, just “human closeness.”
  • Frequent use of the verb “need.”
  • Lots of contractions.
  • Not a lot of exclamation marks.
  • Dogs, yes; cats, meh.

And that’s just the tip of the iceberg: in all, the “bestseller-ometer” has identified 2,799 features strongly associated with bestsellers.
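To make the idea of "features" a little more concrete, here is a toy Python sketch that measures three of the surface traits listed above: forms of the verb "need," contractions, and exclamation marks, each as a rate per 1,000 words. It is not Archer and Jockers' model, whose actual feature definitions aren't spelled out here; the function name `style_features` and the counting rules (for instance, treating any word with an internal apostrophe as a contraction) are my own assumptions.

```python
import re


def style_features(text: str) -> dict[str, float]:
    """Return per-1,000-word rates for a few surface features.

    Toy illustration only: the feature definitions below are assumptions,
    not the ones used by the actual bestseller model.
    """
    # Normalize curly apostrophes so "don’t" and "don't" count the same.
    text = text.replace("\u2019", "'")
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(words), 1)  # avoid dividing by zero on empty text
    lower = [w.lower() for w in words]

    # Forms of the verb "need": need, needs, needed, needing.
    need = sum(1 for w in lower if re.fullmatch(r"need(?:s|ed|ing)?", w))
    # Assumed (crude) rule: any word with an internal apostrophe counts as a
    # contraction, so possessives like "Amy's" are counted too.
    contractions = sum(1 for w in words if re.fullmatch(r"[A-Za-z]+'[A-Za-z]+", w))
    exclamations = text.count("!")

    scale = 1000 / n
    return {
        "need_per_1k": need * scale,
        "contractions_per_1k": contractions * scale,
        "exclamations_per_1k": exclamations * scale,
    }


sample = "I don't need you. She needed help! We're fine."
print(style_features(sample))
```

Presumably, a system like the bestseller-ometer combines many such measurements per book and learns which combinations tend to separate bestsellers from the rest; a handful of counts like these is only the first step.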

Everyone’s Doing It

Archer and Jockers are hardly alone in their efforts to crack the bestseller puzzle. Simon & Schuster hired its first data scientist last year. Macmillan Publishers has acquired the digital book publishing platform Pronoun, in part for its data and analytics capabilities. And a number of startups have created similar algorithms.

It might be easy to dismiss such endeavors, but consider how much more data is available now than in the recent past. Publishers used to have only unit sales to go on. Now, Amazon knows not only how many pages you read on your Kindle but also how long it took you to read them (did you race or slog through the book?). The only problem? Much like Joey not sharing his food, Amazon doesn’t share its data.


So, publishers are turning to services like Jellybooks, which they can hire to run virtual focus groups. Readers get free ebooks, often in advance of publication, in exchange for sharing data on how much, when, and where they read. JavaScript is embedded in the books, and at the end of each chapter, readers are asked to click a link that sends the data to Jellybooks.

In almost two years, the company has run tests for publishers in the US, England, and Germany, and uncovered one sobering fact: Most novels are abandoned before readers are halfway through them. Jellybooks’s findings can guide publishers on their marketing, and even on whether it’s worth signing an author again.
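For the curious, here is a minimal Python sketch of how per-chapter check-ins like these could be turned into that kind of finding. It assumes a hypothetical data model in which each reader is mapped to the last chapter they reported finishing (0 if none); Jellybooks' real data and analysis aren't public, and the function names are mine.

```python
def completion_curve(last_chapter: dict[str, int], total_chapters: int) -> list[float]:
    """Share of readers who finished at least chapter k, for k = 1..total_chapters."""
    n = len(last_chapter)
    if n == 0:
        return [0.0] * total_chapters
    return [
        sum(1 for ch in last_chapter.values() if ch >= k) / n
        for k in range(1, total_chapters + 1)
    ]


def abandoned_before_halfway(last_chapter: dict[str, int], total_chapters: int) -> float:
    """Fraction of readers whose last finished chapter falls before the midpoint."""
    if not last_chapter:
        return 0.0
    halfway = total_chapters / 2
    early = sum(1 for ch in last_chapter.values() if ch < halfway)
    return early / len(last_chapter)


# Made-up check-ins for a 20-chapter novel: reader ID -> last chapter finished.
readers = {"r1": 3, "r2": 20, "r3": 9, "r4": 10}
print(abandoned_before_halfway(readers, 20))  # 0.5
```

The full curve is arguably more useful than a single cutoff, since it shows the chapter where readers start drifting away.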

“Hollywood moguls might do test screenings for movies to decide on how much [marketing] budget a movie should get,” says Andrew Rhomberg, the founder of Jellybooks. “That was never done for books.”


The ability to know who reads what and how fast is also driving Berlin-based startup Inkitt. The website invites writers to post their novels for all to see. Inkitt’s algorithms examine reading patterns and engagement levels. For the best performers, Inkitt offers to act as literary agent, pitching the works to traditional publishers and keeping the standard 15 percent commission if a deal results. The site launched in January 2015 and now has 80,000 stories and more than half a million readers around the world.

We’re about to find out whether the approach works. Inkitt recently announced it’s partnering with Tor Books, part of Macmillan Publishers, to publish the young adult fantasy novel *Bright Star* next summer. Author Erin Swan, a 27-year-old marketing writer who lives in Spanish Fork, Utah, couldn’t get the attention of an agent or publisher when she tried the traditional route, but Inkitt dubbed *Bright Star* a winner, and now it’s heading to stores.

From Google Search to Amazon

Publishers are also trying out the reverse process. Callisto Media uses big-data analysis to find out where there’s an audience clamoring for a nonfiction book that doesn’t yet exist—then hires someone to write it.

Callisto’s founder and CEO, Ben Wayne, says his company collects about 60 million pieces of consumer data a month. For example, Callisto studies the search terms Amazon suggests when users start typing the first few letters of a query, and it found that people frequently searched for things that returned no results. “Consumers are searching for a piece of information, but no product exists to satisfy that consumer demand,” Wayne says. The approach has yielded titles that range from the obvious (The Medical Marijuana Dispensary: Understanding, Medicating, and Cooking with Cannabis) to the less so (Everyday Games for Sensory Processing Disorder).
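Here is a rough Python sketch of the "searches with no results" idea. The data model is hypothetical: a dictionary of search phrases with made-up volumes, a list of made-up existing titles, and a crude matching rule under which a title satisfies a query only if it contains every word of it. Callisto's actual pipeline isn't described in the article, so treat this as an illustration of the concept rather than their method.

```python
import re


def tokens(s: str) -> set[str]:
    """Lowercase word set, ignoring punctuation such as colons and hyphens."""
    return set(re.findall(r"[a-z0-9']+", s.lower()))


def unmet_demand(query_volume: dict[str, int], catalog: list[str], top: int = 5) -> list[tuple[str, int]]:
    """Return the highest-volume queries that no existing title satisfies.

    Assumed matching rule: a title satisfies a query if it contains every
    word of the query.
    """
    title_words = [tokens(t) for t in catalog]
    gaps = [
        (query, volume)
        for query, volume in query_volume.items()
        if not any(tokens(query) <= words for words in title_words)
    ]
    # Biggest unmet demand first.
    gaps.sort(key=lambda qv: qv[1], reverse=True)
    return gaps[:top]


# Hypothetical inputs: made-up titles and search phrases with made-up volumes.
catalog = ["Cannabis Cooking Basics", "A Beginner's Guide to Sourdough"]
queries = {"cannabis cooking": 900, "sensory processing games": 400, "sourdough for kids": 250}
print(unmet_demand(queries, catalog))
```

Real search data would need fuzzier matching than exact word overlap, but the principle is the same: rank what people ask for by how often they ask, then keep only what nobody sells yet.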

Callisto eagerly pursues niche topics, hence titles like The Hashimoto’s 4-Week Plan, aimed at readers with Hashimoto’s, an autoimmune disease. The company can turn a profit on a book that sells about 1,500 copies, whereas traditional publishers must sell several times that before they begin to break even. Callisto authors follow an outline dictated by data analysis and write quickly; the company aims to bring books to market in as little as nine weeks. After all, readers are Googling that information right now.

Interestingly enough, Publishers Weekly named Callisto Media one of the fastest-growing independent publishers for 2015 and 2016.

So, what do you think? Do you care if the next bestseller is picked by a human or a machine? Read the full post on Wired and tell me what you think!