On the other hand, sometimes work really helps feed it, as when we at Reuters – shameless plug time again – relaunch the Polling Explorer, a (free!) site that lets you dig into the nearly 100 million responses we’ve been collecting from Americans since 2012. It’s a great poll, a great resource, and now even easier to explore.
How much easier? I’m glad you asked.
Because this time around it features some very new technology that helps surface interesting statistical factoids (Donald Trump’s support is weak among Republicans making $100,000 or more a year) to help users dig into data – and take us one step closer to a cybernetic newsroom.
Which is not a bad thing, even if it conjures up images of the Borg. Honest.
Poynter had a nice piece on us, which explains the gist of the idea:
…this crush of data also means that many reporters are stepping up to a proverbial dinner buffet with a butter plate. With so much information and so little time to crunch it, data journalists have to make tough choices about the kinds of data sets they dive into and how long they can afford to spend analyzing them.
To help provide context for this sprawling repository, Reuters is using algorithms that sift through the data and surface potentially interesting interpretations.
Ken Ellis, the genius head of technology at Reuters, who with Mo Tamman created the Polling Explorer in the first place, built systems that trawl through the poll responses – from 2,500 people each week – as they come through and look for statistically significant deviations from the main trends. They could, for example, find subsets of groups that have significantly different responses, as in the case of Trump’s support mentioned above. Or they could find shifts in trends over time. Or they might note a particular data point staying flat even as all the others around it are rising – say, if support for a candidate among young people isn’t increasing even while his or her overall numbers are rising.
That’s not always the easiest kind of information to find, even for experienced data mavens, and while this capability doesn’t take away the need for a human brain to assess the quality of the insights presented, it certainly helps sort through the flood of information out there.
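For the curious, here is a rough sketch of what "look for subgroups that deviate from the rest" can mean in practice. This is not Ken's code, and I'm not claiming it is his method: it assumes a plain two-proportion z-test, the function names (`two_proportion_z`, `deviating_subgroups`) are my own, and the counts are invented purely for illustration.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def deviating_subgroups(subgroups, alpha=0.05):
    """Compare each subgroup with all other respondents combined.

    subgroups maps a label to (supporters, respondents). No correction for
    multiple comparisons is applied here; a real system would want one.
    """
    total_s = sum(s for s, _ in subgroups.values())
    total_n = sum(n for _, n in subgroups.values())
    findings = []
    for name, (s, n) in subgroups.items():
        z, p = two_proportion_z(s, n, total_s - s, total_n - n)
        if p < alpha:
            findings.append({
                "group": name,
                "share": s / n,
                "rest_share": (total_s - s) / (total_n - n),
                "z": z,
                "p": p,
            })
    return findings


# Invented counts (supporters, respondents) by income bracket; not real poll data.
toy_counts = {
    "under $50k": (180, 400),
    "$50k-$100k": (170, 400),
    "$100k+": (50, 200),
}
flagged = deviating_subgroups(toy_counts)
flagged_groups = {f["group"] for f in flagged}
```

With these made-up numbers the top income bracket is flagged as well below everyone else, while the middle bracket is not flagged at all. A production system would also have to handle survey weights and the sheer number of comparisons being run, which this sketch ignores.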
Ken’s algorithms pick the handful of findings that are most statistically significant, then turn them into sentences to make them more understandable. In the case above, it offers a “suggested poll reading” to look at how:
OK, so it’s not Shakespeare. And some of the suggested poll readings may be somewhat obvious – Democrats are generally more likely than Republicans to approve of the president’s handling of his job, for example.
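As a rough illustration of that pick-and-phrase step, here is a minimal sketch that sorts a list of findings by p-value, keeps the strongest few, and drops each into a sentence template. The field names, the template wording and the `suggested_readings` function are all hypothetical, and the findings are invented; the real system's phrasing and ranking rules aren't described here.

```python
def suggested_readings(findings, top_n=2):
    """Keep the top_n findings with the smallest p-values and phrase each one."""
    ranked = sorted(findings, key=lambda f: f["p"])
    sentences = []
    for f in ranked[:top_n]:
        direction = "higher" if f["share"] > f["rest_share"] else "lower"
        sentences.append(
            f"{f['measure']} is {direction} among {f['group']} "
            f"({f['share']:.0%}) than among other respondents "
            f"({f['rest_share']:.0%})."
        )
    return sentences


# Invented findings for illustration only; not real poll results.
toy_findings = [
    {"measure": "Support for Candidate A", "group": "respondents under 30",
     "share": 0.50, "rest_share": 0.40, "p": 0.03},
    {"measure": "Support for Candidate A", "group": "households earning $100k+",
     "share": 0.25, "rest_share": 0.44, "p": 0.0001},
    {"measure": "Support for Candidate A", "group": "rural respondents",
     "share": 0.42, "rest_share": 0.40, "p": 0.20},
]
readings = suggested_readings(toy_findings)
for line in readings:
    print(line)
```

Still not Shakespeare, but it shows why templated sentences are a reasonable first step: the hard part is deciding what is worth saying, not saying it.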
Others are less obvious: In a question about the most important issue facing America, for example, suggested poll readings include:
However you read them, this is a first step towards having algorithms regularly trawl through data, throw up suggested facts or factoids that we may not have noticed, and give us fresh leads to explore. And that’s a way to combine what machines do well – sift through huge chunks of information at speed – with what humans do well – assess information for meaning – to build a better newsroom.
Ken’s system currently works on polling data, but there’s no reason why it can’t work on financial information, crime statistics, city hall budgets or anything else. And building on the language generation capabilities here means we’re moving closer to a more robust and intelligent in-house machine-generated story capability.
Sometimes work can even be fun.