Posted by: structureofnews | September 20, 2011

Machine Language

The New York Times recently profiled Narrative Science, a start-up with roots at Northwestern University that generates stories from data.  It cites a roundup of a Wisconsin-UNLV football game, based on the stats from that contest.  You’d be hard-pressed to tell that it came from a machine, not a human.  The software works faster than a person and costs less, too.  Is this the future of journalism?

In many ways, it is – but not just because this opens up a flood of new content to the world and potentially undercuts a generation of journalists who might otherwise have been hired to write those stories.  It also highlights the increasing importance of data and how it’s being used in news and information.

Machine-generated stories aren’t entirely new.  Organizations like Reuters and Bloomberg have been using computers to turn out simple stories for some time now – from generating headlines when economic data is released to short items about changes in analysts’ recommendations.  They work reasonably well, there aren’t many mistakes in them, and they fill a basic need.  It’s true they won’t win any Pulitzer Prizes, but then again they’re not meant to – any more than game roundups or stock market reports are supposed to.

But technology marches on, and the example cited by the NYT piece looks like a step up in sophistication.  As machines learn and evolve, who knows how good such stories might become?  As the Times notes:

The innovative work at Narrative Science raises the broader issue of whether such applications of artificial intelligence will mainly assist human workers or replace them. Technology is already undermining the economics of traditional journalism. Online advertising, while on the rise, has not offset the decline in print advertising. But will “robot journalists” replace flesh-and-blood journalists in newsrooms?

It’s a real question.  Perhaps they won’t replace prize-winning investigative reporters, but what about the people who cover markets or games? The classic advice for people in situations like that is to learn a higher-value skill and leave the more routine work to machines, but that’s much easier to say than to do.  For journalists caught in that position, it means having to move up the value-added chain pretty quickly.

Still, that assumes that the key job of a journalist is writing stories, rather than uncovering information – or even creating data.  After all, the fundamental underpinning of many of these machine-writing programs is data, and the increasing availability of it in forms that are easily processed.

The Narrative Science software can make inferences based on the historical data it collects and the sequence and outcomes of past games. To generate story “angles,” explains Mr. Hammond of Narrative Science, the software learns concepts for articles like “individual effort,” “team effort,” “come from behind,” “back and forth,” “season high,” “player’s streak” and “rankings for team.” Then the software decides what element is most important for that game, and it becomes the lead of the article, he said. The data also determines vocabulary selection. A lopsided score may well be termed a “rout” rather than a “win.”

Which is another way of saying that the program is based on parsing data – both from the game, and from previous games.  Without those game stats, there would be no program, and no story.   Sports and markets have long generated such data, of course, but it’s rarer in other areas, as are easy connections between disparate types of data.  And that’s an opportunity for smart journalism enterprises, such as Politifact, which in effect created its own data about political promises and then used it to build meta-stories about how politicians are doing at keeping their word.  It’s a small step from that to using the technology in Narrative Science to generate stories based on Politifact’s data.
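To make that dependence on data concrete, here is a rough sketch in Python of the two steps the Times passage describes: scoring possible angles to pick a lead, and letting the margin of victory drive the choice of verb.  This is not Narrative Science's actual method, which isn't public in that level of detail.  The field names, the weights, the point thresholds and the two teams are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Game:
    """Hypothetical box-score summary; every field name here is an assumption."""
    winner: str
    loser: str
    winner_score: int
    loser_score: int
    lead_changes: int              # times the lead changed hands
    largest_deficit_overcome: int  # biggest deficit the winner erased
    top_player: str
    top_player_points: int


def score_angles(game: Game) -> dict[str, float]:
    """Give each candidate angle a rough importance score (made-up weights)."""
    return {
        "come from behind": game.largest_deficit_overcome * 2.0,
        "back and forth": game.lead_changes * 3.0,
        "individual effort": game.top_player_points * 0.5,
    }


def pick_verb(margin: int) -> str:
    """Let the data choose the vocabulary (thresholds are illustrative)."""
    if margin >= 21:
        return "routed"
    if margin <= 3:
        return "edged"
    return "beat"


def write_lead(game: Game) -> str:
    """Pick the highest-scoring angle and build the lead sentence around it."""
    angles = score_angles(game)
    angle = max(angles, key=angles.get)
    verb = pick_verb(game.winner_score - game.loser_score)
    result = (f"{game.winner} {verb} {game.loser} "
              f"{game.winner_score}-{game.loser_score}")
    if angle == "come from behind":
        return f"Rallying from {game.largest_deficit_overcome} points down, {result}."
    if angle == "back and forth":
        return f"In a game with {game.lead_changes} lead changes, {result}."
    return f"Behind {game.top_player_points} points from {game.top_player}, {result}."


# Two invented games: a blowout and a close, see-saw contest.
blowout = Game("Alpha State", "Beta Tech", 45, 10, 0, 0, "J. Doe", 24)
thriller = Game("Beta Tech", "Alpha State", 31, 28, 5, 7, "A. Roe", 12)
print(write_lead(blowout))
print(write_lead(thriller))
```

Even in a toy like this, nothing happens without the stats: take away the lead changes or the scores and there is no angle to pick and no verb to choose.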

So is the journalism in that hypothetical tie-up in the creation of the data or in the generation of the story?



