Posted by: structureofnews | April 29, 2012

The Cybernetic Newsroom

The machine-writing capabilities of Narrative Science have been well covered, and pretty much every (human-written) story on the subject has mused – nervously – about the replacement of journalists by algorithms. So, too, does a new piece in Wired; but this one takes the discussion a step further – and points to a future where collaboration between machines and humans can enrich not just news but newsrooms as well.

If that conjures up images of the Borg in Star Trek, well, OK.  It does.  But there are good reasons to explore – and even embrace – this future.

First, some background.  Narrative Science, as various people have written (including here), is a company in Chicago that has done some astounding work in generating stories and reports from data.  The Wired piece cites a sample:

Friona fell 10-8 to Boys Ranch in five innings on Monday at Friona despite racking up seven hits and eight runs. Friona was led by a flawless day at the dish by Hunter Sundre, who went 2-2 against Boys Ranch pitching. Sundre singled in the third inning and tripled in the fourth inning … Friona piled up the steals, swiping eight bags in all …

OK, so it’s a little old-fashioned in style.  But it sure doesn’t read like a machine wrote it.  More importantly, it shows how just analyzing a pile of baseball statistics – given the machine doesn’t have any other information – can yield understanding of an event.  And that’s an important point to note in a world which is increasingly awash in data – more data than any person (or newsroom) can possibly analyze at any kind of scale and speed.

Leaving aside the actual writing that Narrative Science’s algorithms do, what matters here is how the program sifts through data to identify key elements in an event, and then surfaces them through machine-generated text.  It’s not just that it’s smart data analysis; it’s smart automated data analysis.

So Narrative Science’s engineers program a set of rules that govern each subject, be it corporate earnings or a sporting event. But how to turn that analysis into prose? The company has hired a team of “meta-writers,” trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various “angles” from the data. Who won the game? Was it a come-from-behind victory or a blowout? Did one player have a fantastic day at the plate? The algorithm considers context and information from other databases as well: Did a losing streak end?
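To make the idea of rule-based "angles" concrete, here is a minimal sketch in Python. It is not Narrative Science's code, which isn't public; the function name `find_angles`, the blowout threshold and the sample line scores are all invented for illustration. It only shows how a few simple rules over a box score can answer questions like "who won?" and "was it a come-from-behind victory?"

```python
from itertools import accumulate

# Hypothetical threshold: a win by this many runs or more counts as a blowout.
BLOWOUT_MARGIN = 7


def find_angles(team_a, runs_a, team_b, runs_b):
    """Return story 'angles' from two lists of runs scored per inning."""
    total_a, total_b = sum(runs_a), sum(runs_b)
    if total_a == total_b:
        return [f"{team_a} and {team_b} tied {total_a}-{total_b}"]

    if total_a > total_b:
        winner, loser, win_runs, lose_runs = team_a, team_b, runs_a, runs_b
    else:
        winner, loser, win_runs, lose_runs = team_b, team_a, runs_b, runs_a

    win_total, lose_total = sum(win_runs), sum(lose_runs)
    angles = [f"{winner} beat {loser} {win_total}-{lose_total}"]

    if win_total - lose_total >= BLOWOUT_MARGIN:
        angles.append("blowout")

    # Compare the running score after each inning to see if the winner ever trailed.
    win_running = accumulate(win_runs)
    lose_running = accumulate(lose_runs)
    if any(w < l for w, l in zip(win_running, lose_running)):
        angles.append("come-from-behind victory")

    return angles


# Invented line scores for two made-up teams.
print(find_angles("Reds", [0, 0, 1, 4, 3], "Blues", [2, 1, 0, 0, 1]))
```

A real system would presumably have many more rules, plus the context from other databases that the Wired piece mentions (losing streaks and so on), but the shape is the same: data in, a ranked list of possible angles out.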

Humans do this sort of pattern analysis well, of course. And algorithms are limited by their programming and the factors that they’re designed to look at.  But machines have the advantage of being able to sift much more information than humans can, and at much faster speeds.  So while (human) computer-assisted reporting teams will almost certainly do better work than automated data analysis systems, there’s only so much humans can do in the course of a day.

There are lots of reasons machine-generated text will find its way into newsrooms in the near future: Lower costs, broader coverage, greater personalization.  But it may be that automated trawling for insights in large datasets is the most useful one in the long run.

When I was sent by Reuters in 1990 to help cover the Asian Games in Beijing, I prepared a filofax (remember them?) full of statistics on swimming, one of the events I was assigned to cover.  I didn’t want to be caught out if an Asian Games, Olympic or world record was broken, and I knew I certainly wasn’t going to be able to keep all that information in my head.  That was in the pre-internet days; these days I would probably be using Google to check statistics before filing a story.  But it would make much more sense if I were paired with a machine/algorithm that could check the latest results against a host of databases, even before I started writing.

Shouldn’t we be building systems that do that automatically in newsrooms?  Before a reporter writes a market report, shouldn’t an algorithm be checking that day’s close against a database of market data, alerting him or her to new records, 52-week highs, and so on?  Similarly, shouldn’t smart algorithms be trawling through databases and regularly throwing up insights for beat reporters to follow up on – or dismiss?  As the Wired piece notes:

Computers, with their flawless memories and ability to access data, might act as legmen to human writers. Or vice versa, human reporters might interview subjects and pick up stray details—and then send them to a computer that writes it all up.
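As a rough illustration of a machine acting as legman on a market report, the sketch below checks a day's close against a history of earlier closes and flags record closes and 52-week highs and lows. The function name `market_alerts`, the data layout (a list of date and close pairs) and the sample figures are assumptions made up for this example, not any real feed or newsroom system.

```python
from datetime import date, timedelta


def market_alerts(history, today, close):
    """Flag notable closes.

    history: list of (date, closing_value) pairs for earlier sessions.
    today, close: the session being written about.
    """
    alerts = []
    if not history:
        return alerts

    # All-time record: today's close beats every earlier close on file.
    if close > max(value for _, value in history):
        alerts.append("record close")

    # 52-week high/low: compare only against closes from the past 52 weeks.
    window_start = today - timedelta(weeks=52)
    recent = [value for day, value in history if day >= window_start]
    if recent:
        if close > max(recent):
            alerts.append("52-week high")
        if close < min(recent):
            alerts.append("52-week low")

    return alerts


# Invented closing values, for illustration only.
history = [
    (date(2010, 1, 4), 120.0),
    (date(2011, 6, 1), 95.0),
    (date(2012, 1, 3), 100.0),
    (date(2012, 4, 27), 105.0),
]
print(market_alerts(history, date(2012, 4, 30), 110.0))
```

The point is less the arithmetic than the workflow: the alerts would land in front of the reporter before he or she starts writing, to be used or ignored.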

Of course, doing all this at scale requires building systems that can learn about new datasets relatively quickly, and it sounds from the Wired piece that Narrative Science is well down that path.  It also highlights two needs: for newsrooms to have easy access to data, and for newsrooms to create data from their daily reporting, to make it easier for machines to help sift through patterns.  (A simple, if unsophisticated, example: If newsrooms logged the location of every car accident they reported on, an algorithm could look for patterns or at least give background to a reporter sitting down to bang out a piece on the latest accident.)
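The car-accident example could be sketched along these lines. Everything here is hypothetical: the log format (one dict per reported accident), the function names and the "hotspot" threshold are stand-ins chosen for illustration, and the sample entries are invented.

```python
from collections import Counter

# Hypothetical threshold: this many logged accidents makes a location worth flagging.
HOTSPOT_MIN = 3


def hotspots(log, min_count=HOTSPOT_MIN):
    """Return locations with at least min_count logged accidents, busiest first."""
    counts = Counter(entry["location"] for entry in log)
    return [loc for loc, n in counts.most_common() if n >= min_count]


def background_note(log, location, min_count=HOTSPOT_MIN):
    """Build a one-line background note for a reporter covering a new accident."""
    earlier = [entry for entry in log if entry["location"] == location]
    note = f"{len(earlier)} earlier accident(s) on file at {location}"
    if len(earlier) >= min_count:
        note += " (possible hotspot)"
    return note


# Invented log entries; a real log would come from the newsroom's own reporting.
log = [
    {"date": "2012-01-10", "location": "Main St & 1st Ave"},
    {"date": "2012-02-02", "location": "Main St & 1st Ave"},
    {"date": "2012-02-20", "location": "Harbor Rd"},
    {"date": "2012-03-15", "location": "Main St & 1st Ave"},
]
print(hotspots(log))
print(background_note(log, "Harbor Rd"))
```

None of this works unless the location is logged as structured data at the time of reporting, which is the real argument for newsrooms creating data as they go.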

This isn’t to say that this will be – or should be – the center of a modern newsroom.  But just as data analysis and visualization skills, interactive graphics and multimedia are part of the new toolkit that journalists have to be able to access, so too should be automated data trawling, to help surface insights in the mass of data we have access to.  Without that help, all-human newsrooms risk drowning in data.


Responses

  1. Professor Chua,

    I stumbled on your blog a while ago as I was doing research for my startup. I call you professor because you taught at my school, NYU!

    My team and I have been working for a long time on the startup where we hope to change the way people access the news, and more broadly in the future, all content on the internet. I’ve been waiting for the right time to contact you, and we’re going into beta soon, so I think it’s appropriate to reach out now. I’m doing it here because I couldn’t track down your email address.

    I know you must be incredibly busy and may not be available to meet in person (your ideas on journalism are really exciting, I’d love to discuss them with you), but perhaps you’d be interested in paying our site a visit? You can contact me at alebrahim.ali @ gmail.com.

    Thank you!

  2. […] Science which has earned coverage by the likes of the New York Times, Wired, and numerous blogs for its ability to automatically produce actual, readable stories of things like sports games or […]

  3. […] journalism – all very important advances – there’s less said about how we can marry all that power into our day-to-day work. Can investigative journalism aggressively leverage computational […]

  4. […] aren’t a good adjunct to human newsrooms. Indeed, integrating humans and machines in a “cybernetic newsrooms” is one great possible outcome, and one we need to pursue more […]

  5. […] Because this time around it features some very new technology that helps surface interesting statistical factoids (Donald Trump’s support is weak among Republicans making $100,000 or more a year) to help users dig into data – and take us one step closer to a cybernetic newsroom. […]

