The machine-writing capabilities of Narrative Science have been well covered, and pretty much every (human-written) story on the subject has mused – nervously – about the replacement of journalists by algorithms. So, too, does a new piece in Wired; but this one takes the discussion a step further – and points to a future where collaboration between machines and humans can enrich not just news but newsrooms as well.
If that conjures up images of the Borg in Star Trek, well, OK. It does. But there are good reasons to explore – and even embrace – this future.
First, some background. Narrative Science, as various people have written (including here), is a company in Chicago that has done some astounding work in generating stories and reports from data. The Wired piece cites a sample:
Friona fell 10-8 to Boys Ranch in five innings on Monday at Friona despite racking up seven hits and eight runs. Friona was led by a flawless day at the dish by Hunter Sundre, who went 2-2 against Boys Ranch pitching. Sundre singled in the third inning and tripled in the fourth inning … Friona piled up the steals, swiping eight bags in all …
OK, so it’s a little old-fashioned in style. But it sure doesn’t read like a machine wrote it. More importantly, it shows how just analyzing a pile of baseball statistics – given that the machine has no other information – can yield an understanding of an event. And that’s an important point to note in a world increasingly awash in data – more data than any person (or newsroom) can possibly analyze at any kind of scale and speed.
Leaving aside the actual writing that Narrative Science’s algorithms do, what matters here is how the program sifts through data to identify key elements in an event, and then surfaces them through machine-generated text. It’s not just that it’s smart data analysis; it’s smart automated data analysis.
So Narrative Science’s engineers program a set of rules that govern each subject, be it corporate earnings or a sporting event. But how to turn that analysis into prose? The company has hired a team of “meta-writers,” trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various “angles” from the data. Who won the game? Was it a come-from-behind victory or a blowout? Did one player have a fantastic day at the plate? The algorithm considers context and information from other databases as well: Did a losing streak end?
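To make the idea concrete, here is a minimal sketch of what rule-based “angle” detection over a box score might look like. The field names, thresholds and rules below are my own illustrative assumptions, not Narrative Science’s actual system:

```python
# Hypothetical sketch: detect story "angles" in a game-summary dict.
# Thresholds (e.g. a 10-run blowout) are arbitrary choices for illustration.

def find_angles(game):
    """Return a list of candidate story angles for a baseball game."""
    angles = []
    margin = abs(game["home_score"] - game["away_score"])
    winner = game["home"] if game["home_score"] > game["away_score"] else game["away"]

    if margin >= 10:
        angles.append(f"{winner} won in a blowout (margin of {margin})")
    elif game.get("trailed_entering_final") == winner:
        angles.append(f"{winner} came from behind to win")

    # A standout batter: perfect day at the plate with two or more at-bats.
    for p in game["players"]:
        if p["at_bats"] >= 2 and p["hits"] == p["at_bats"]:
            angles.append(f"{p['name']} went {p['hits']}-{p['at_bats']} at the plate")
    return angles

game = {
    "home": "Boys Ranch", "away": "Friona",
    "home_score": 10, "away_score": 8,
    "players": [{"name": "Hunter Sundre", "at_bats": 2, "hits": 2}],
}
print(find_angles(game))
```

The meta-writers’ templates would then turn each detected angle into a sentence; the point here is only that the angle-finding step is ordinary, checkable rules over structured data.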
Humans do this sort of pattern analysis well, of course. And algorithms are limited by their programming and the factors that they’re designed to look at. But machines have the advantage of being able to sift much more information than humans can, and at much faster speeds. So while (human) computer-assisted reporting teams will almost certainly do better work than automated data analysis systems, there’s only so much humans can do in the course of a day.
There are lots of reasons machine-generated text will find its way into newsrooms in the near future: Lower costs, broader coverage, greater personalization. But it may be that automated trawling for insights in large datasets is the most useful one in the long run.
When I was sent by Reuters in 1990 to help cover the Asian Games in Beijing, I prepared a Filofax (remember them?) full of statistics on swimming, one of the events I was assigned to cover. I didn’t want to be caught out if a new Asian Games, Olympic or world record was broken, and I knew I certainly wasn’t going to be able to keep all that information in my head. That was in the pre-internet days; these days I would probably be using Google to check statistics before filing a story. But it would make much more sense if I were paired with a machine/algorithm that could check the latest results against a host of databases, even before I started writing.
Shouldn’t we be building systems that do that automatically in newsrooms? Before a reporter writes a market report, shouldn’t an algorithm be checking that day’s close against a database of market data, alerting him or her to new records, 52-week highs, and so on? Similarly, shouldn’t smart algorithms be trawling through databases and regularly throwing up insights for beat reporters to follow up on – or dismiss? As the Wired piece notes:
Computers, with their flawless memories and ability to access data, might act as legmen to human writers. Or vice versa, human reporters might interview subjects and pick up stray details—and then send them to a computer that writes it all up.
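The market-report check described above is simple enough to sketch. This is a hypothetical illustration, with invented numbers and an arbitrary threshold for what counts as a notable daily move:

```python
# Hedged sketch of a pre-writing check: compare today's close against a
# year of history and alert the reporter to anything notable. The data
# and the 2% "big move" threshold are illustrative assumptions.

def market_alerts(today_close, history):
    """history: list of prior daily closes, most recent last (up to 52 weeks)."""
    alerts = []
    if today_close > max(history):
        alerts.append(f"New 52-week high: {today_close}")
    if today_close < min(history):
        alerts.append(f"New 52-week low: {today_close}")
    pct = 100.0 * (today_close - history[-1]) / history[-1]
    if abs(pct) >= 2.0:  # arbitrary threshold for a "big move"
        alerts.append(f"Moved {pct:+.1f}% on the day")
    return alerts

history = [100.0, 101.5, 99.8, 102.0]
print(market_alerts(104.5, history))
```

A reporter would still decide which alerts matter; the machine just makes sure nothing obvious is missed before the story is filed.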
Of course, doing all this at scale requires building systems that can learn about new datasets relatively quickly, and it sounds from the Wired piece as though Narrative Science is well down that path. It also highlights the need for newsrooms not only to have easy access to data, but also to create data from their daily reporting, to make it easier for machines to help sift for patterns. (A simple, if unsophisticated, example: if a newsroom logged the location of every car accident it reported on, an algorithm could look for patterns, or at least give background to a reporter sitting down to bang out a piece on the latest accident.)
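The accident-log example needs very little machinery. A sketch, with invented entries, of how logged locations could surface hot spots for the next reporter on the story:

```python
# Illustrative sketch: count accidents by logged location and surface
# repeat locations ("hot spots"). All entries below are invented.
from collections import Counter

accident_log = [
    {"date": "2012-04-01", "location": "Main St & 5th Ave"},
    {"date": "2012-04-09", "location": "Riverside Rd"},
    {"date": "2012-04-15", "location": "Main St & 5th Ave"},
    {"date": "2012-05-02", "location": "Main St & 5th Ave"},
]

def hot_spots(log, min_count=2):
    """Return (location, count) pairs for locations with repeat accidents."""
    counts = Counter(entry["location"] for entry in log)
    return [(loc, n) for loc, n in counts.most_common() if n >= min_count]

print(hot_spots(accident_log))
```

The value isn’t in the code, which is trivial, but in the habit of capturing structured data as a by-product of everyday reporting so that checks like this become possible.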
This isn’t to say that this will be – or should be – the center of a modern newsroom. But just as data analysis, visualization, interactive graphics and multimedia are part of the new toolkit that journalists have to be able to access, so too should automated data trawling be, to help surface insights in the mass of data we have access to. Without that help, all-human newsrooms risk drowning in data.