Posted by: structureofnews | October 18, 2011

Mining The Exhaust

Decades ago, I attended (or was made to attend) an army course on what they called “operational security,” or “opsec.” The idea was to learn to minimize the inadvertent clues about what we were planning to do that smart spies would pick up on: increased radio chatter just before an attack, say, or loads of senior officers’ cars parked outside camp during key meetings.

Call it pre-internet mining of data exhaust.

Understanding this kind of indirect information certainly isn’t an internet phenomenon. Just before Pearl Harbor, for example, US monitors noted a sharp increase in criticism of the US by Japanese radio stations. But the huge amount of information we throw off these days, and the relative ease of collecting and crunching it, mean we can analyze and understand those signals much better. According to Foreign Policy, which cited the Pearl Harbor example:

Kalev Leetaru, the assistant director for text and digital media analytics at the University of Illinois’s Institute for Computing in the Humanities, Arts, and Social Science, is one of the leading researchers in the emerging field of conflict early-warning. In a paper published this month in the peer-reviewed online technology journal First Monday, Leetaru argues that “computational analysis of large text archives can yield novel insights to the functioning of society.”

We know data exhaust mining already works at simpler levels. Consider the Twitter weather map. None of us tweets with the aim of helping Twitter build an accurate map of conditions around the country, yet tapping into the Twitter feed and looking for locations and weather-related words (“sunny,” “rainy,” “cloudy,” etc.) yields an impressive real-time look at weather across the US.
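To make the idea concrete, here is a minimal sketch of that kind of keyword counting. It assumes tweets already arrive as (location, text) pairs; the word list, the sample tweets, and the function names tally_weather and dominant_condition are all hypothetical, and this is not how the actual Twitter weather map was built.

```python
from collections import Counter, defaultdict
import re

# Hypothetical weather vocabulary; a real system would need a much richer list.
WEATHER_WORDS = {"sunny", "rainy", "cloudy", "snowy", "windy"}

def tally_weather(tweets):
    """Count weather words per location.

    `tweets` is an iterable of (location, text) pairs, an assumed input
    format; real tweets would first need their locations resolved.
    """
    counts = defaultdict(Counter)
    for location, text in tweets:
        # Lowercase and split into plain alphabetic words.
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in WEATHER_WORDS:
                counts[location][word] += 1
    return counts

def dominant_condition(counts):
    """Return the most-mentioned weather word for each location."""
    return {loc: c.most_common(1)[0][0] for loc, c in counts.items() if c}

# Made-up sample tweets, for illustration only.
sample = [
    ("Chicago", "So rainy today, forgot my umbrella"),
    ("Chicago", "Still rainy and cloudy downtown"),
    ("Phoenix", "Another sunny afternoon"),
]
print(dominant_condition(tally_weather(sample)))
# → {'Chicago': 'rainy', 'Phoenix': 'sunny'}
```

Nobody in the sample is trying to report the weather; the map simply falls out of counting what people happen to mention, which is the whole point of mining exhaust.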

Similarly, Google realized that searches for flu-related terms were a strong indicator of where flu outbreaks were likely to happen next. And, of course, there’s sentiment analysis as well: trying to gauge how players in the market are feeling and hence predict which way the market will go.

In many ways, these kinds of cues are much better than surveys or wisdom-of-crowds exercises like the Iowa Electronic Markets, which harness the collective consciousness of traders to predict political outcomes. Mining the data exhaust isn’t about any conscious preference; it’s about examining your collective behavior to understand what you really like, something you yourself might not even know.

And that may well include more complex issues than the weather, or whether you’re feeling under the weather. Google, for example, is trying to extend its flu-search approach to see whether searches for political terms could shed light on political trends. As Google put it:

As the campaign season heats up, one of the ways our Politics & Elections team has begun to participate in the conversation is by highlighting some of the more interesting trends from our search data around candidates, issues, and campaigns.

And the New York Times reports on US government efforts to understand Big Data as well, starting with scans of available online information in a number of Latin American countries.

All of which points to the brand new opportunities and challenges ahead of us – beyond privacy and use of our personal data.  Even without knowing any individual data, there’s potentially huge value in the aggregated information about what we’re doing – at least for whoever can collect that data, analyze it, and find correlations that matter.  And given how much of that exhaust we’re already throwing off – and collecting – it’s hard to imagine it can easily be regulated.

“People talk about oceans of information,” Leetaru says. “We’ve spent the last few decades looking at the waves. If you look below the surface, there’s a whole world of latent information that we’re just beginning to try to understand.”
