The New York Times recently profiled Narrative Science, a start-up with roots at Northwestern University that generates stories from data. The piece cites a roundup of a Wisconsin-UNLV football game, written from the stats of that contest. You’d be hard-pressed to tell that it came from a machine, not a human. The software also works faster and more cheaply than a person. Is this the future of journalism?
In many ways, it is – but not just because this opens up a flood of new content to the world and potentially undercuts a generation of journalists who might otherwise have been hired to write those stories. It also highlights the increasing importance of data and how it’s being used in news and information.
Machine-generated stories aren’t entirely new. Organizations like Reuters and Bloomberg have been using computers to turn out simple stories for some time now – from generating headlines when economic data is released to short items about changes in analysts’ recommendations. They work reasonably well, there aren’t many mistakes in them, and they fill a basic need. It’s true they won’t win any Pulitzer Prizes, but then again they’re not meant to – any more than game roundups or stock market reports are supposed to.
But technology marches on, and the example cited by the NYT piece looks like a step up in sophistication. As machines learn and evolve, who knows how good such stories might become? As the Times notes:
The innovative work at Narrative Science raises the broader issue of whether such applications of artificial intelligence will mainly assist human workers or replace them. Technology is already undermining the economics of traditional journalism. Online advertising, while on the rise, has not offset the decline in print advertising. But will “robot journalists” replace flesh-and-blood journalists in newsrooms?
It’s a real question. Perhaps they won’t replace prize-winning investigative reporters, but what about the people who cover markets or games? The classic advice for people in situations like that is that you should learn a higher-value skill and leave the more routine work to machines, but it’s much easier to say that than to actually do it. For journalists caught in that position, it means having to move up the value-added chain pretty quickly.
Still, that assumes that the key job of a journalist is writing stories, rather than uncovering information – or even creating data. After all, the fundamental underpinning of many of these machine-writing programs is data, along with its increasing availability in forms that are easily processed.
The Narrative Science software can make inferences based on the historical data it collects and the sequence and outcomes of past games. To generate story “angles,” explains Mr. Hammond of Narrative Science, the software learns concepts for articles like “individual effort,” “team effort,” “come from behind,” “back and forth,” “season high,” “player’s streak” and “rankings for team.” Then the software decides what element is most important for that game, and it becomes the lead of the article, he said. The data also determines vocabulary selection. A lopsided score may well be termed a “rout” rather than a “win.”
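To make that description concrete, here is a minimal sketch of the general idea: score a few candidate angles from the game data, let the top-scoring angle become the lead, and let the margin of victory choose the verb. This is not Narrative Science's actual method. The field names, the scoring weights, the 21-point "rout" threshold and the teams are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GameStats:
    # All fields are hypothetical; a real feed would carry far more detail.
    home: str
    away: str
    home_score: int
    away_score: int
    largest_deficit_overcome: int  # biggest deficit the winner erased
    lead_changes: int


# One lead template per angle (angle names borrowed from the Times quote).
TEMPLATES = {
    "come from behind": "{winner} rallied from {deficit} points down and {verb} {loser}, {ws}-{ls}.",
    "back and forth": "After {changes} lead changes, {winner} {verb} {loser}, {ws}-{ls}.",
    "team effort": "{winner} {verb} {loser}, {ws}-{ls}.",
}


def score_angles(g: GameStats) -> dict:
    # Assumed, hand-picked weights; a real system would learn or tune these.
    return {
        "come from behind": g.largest_deficit_overcome / 10,
        "back and forth": g.lead_changes / 5,
        "team effort": 0.5,  # fallback when nothing else stands out
    }


def pick_angle(g: GameStats) -> str:
    # The highest-scoring angle becomes the lead of the story.
    scores = score_angles(g)
    return max(scores, key=scores.get)


def pick_verb(margin: int) -> str:
    # The data also drives vocabulary: a lopsided score is a rout.
    return "routed" if margin >= 21 else "beat"


def lead_sentence(g: GameStats) -> str:
    if g.home_score == g.away_score:
        raise ValueError("this sketch assumes the game has a winner")
    home_won = g.home_score > g.away_score
    winner, loser = (g.home, g.away) if home_won else (g.away, g.home)
    ws, ls = max(g.home_score, g.away_score), min(g.home_score, g.away_score)
    return TEMPLATES[pick_angle(g)].format(
        winner=winner, loser=loser, ws=ws, ls=ls,
        verb=pick_verb(ws - ls),
        deficit=g.largest_deficit_overcome,
        changes=g.lead_changes,
    )


# Two made-up games: a blowout and a narrow comeback.
blowout = GameStats("Team A", "Team B", 45, 10, 0, 0)
comeback = GameStats("Team A", "Team B", 27, 28, 14, 2)
print(lead_sentence(blowout))
print(lead_sentence(comeback))
```

Even in a toy like this, every word of the output traces back to a number in the input. Take away the stats and there is nothing left to write.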
Which is another way of saying that the program is based on parsing data – both from the game, and from previous games. Without those game stats, there would be no program, and no story. Sports and markets have long generated such data, of course, but it’s rarer in other areas, as are easy connections between disparate types of data. And that’s an opportunity for smart journalism enterprises, such as Politifact, which in effect created its own data about political promises and then used it to build meta-stories about how politicians are doing at keeping their word. It’s a small step from that to using the technology in Narrative Science to generate stories based on Politifact’s data.
So is the journalism in that hypothetical tie-up in the creation of the data or in the generation of the story?