Or so the famous New Yorker cartoon by Peter Steiner asserted in 1993. And it may have been true then, but it’s less and less true now – thanks to the huge amounts of data we’re throwing off daily, and the massive increases in computing power since then.
And that raises all sorts of issues that go well beyond the normal debates over privacy and the use of personal data online, and into broader discussion about how best to manage a world where journalists can uncover lots more information about people – and so can everyone else, including governments, companies and criminals.
Exhibit A in this area is a 2009 term project at MIT – recently published in the peer-reviewed online journal First Monday – that inferred whether people on Facebook were gay from their social connections. In other words, the researchers didn’t need to ask if you had disclosed your sexual orientation on Facebook; they just needed to see who your friends were, and what their orientations were. As they note in the paper:
Public information about one’s coworkers, friends, family, and acquaintances, as well as one’s associations with them, implicitly reveals private information. Social networking Web sites, e–mail, instant messaging, telephone, and VoIP are all technologies steeped in network data — data relating one person to another. Network data shifts the locus of information control away from individuals, as the individual’s traditional and absolute discretion is replaced by that of his social network.
In other words, we’re losing control over what the world knows about us.
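To make the mechanism concrete, here is a minimal sketch of the general idea, not the MIT researchers' actual method: guess something a person hasn't disclosed by taking a majority vote over what their friends have disclosed. The names, the friend lists and the choice of home city as the attribute are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each person's friends, and what some people have
# chosen to disclose publicly. "dana" has disclosed nothing.
friends = {
    "dana": ["ali", "bea", "cy", "eve"],
    "ali": ["dana"],
    "bea": ["dana"],
    "cy": ["dana"],
    "eve": ["dana"],
}
disclosed = {"ali": "Boston", "bea": "Boston", "cy": "Austin", "eve": "Boston"}


def guess_attribute(person, friends, disclosed):
    """Return (most common value among friends, share of votes), or None."""
    votes = Counter(
        disclosed[f] for f in friends.get(person, []) if f in disclosed
    )
    if not votes:
        return None
    value, count = votes.most_common(1)[0]
    return value, count / sum(votes.values())


guess = guess_attribute("dana", friends, disclosed)
print(guess)
```

The guess is only as good as the assumption that friends resemble each other, and it says nothing certain about any one person. But notice that "dana" never appears in the disclosed data at all.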
To be sure, we’ve always given off that kind of information. But it’s easier to track it and examine it now, because so much of it is digital; and it’s much easier to analyze it because computers are much more powerful. As the MIT paper notes:
With the advent of computer–mediated communication, it has become disturbingly easy to log and track the web of human interactions. The phone company stores data on who calls whom, for example, and that data builds a social graph.
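As a rough illustration of how little it takes to turn a call log into a social graph, the sketch below counts calls between pairs of people and records who is connected to whom. The record format and the names are invented for the example; real phone records would look different.

```python
from collections import defaultdict

# Hypothetical call records as (caller, callee) pairs.
call_log = [
    ("ann", "bob"),
    ("bob", "ann"),
    ("ann", "cat"),
    ("dan", "cat"),
    ("ann", "bob"),
]


def build_graph(calls):
    """Return (call counts per pair, neighbors per person)."""
    edges = defaultdict(int)      # undirected pair -> number of calls
    neighbors = defaultdict(set)  # person -> people they talked to
    for caller, callee in calls:
        if caller == callee:
            continue  # ignore self-calls
        pair = tuple(sorted((caller, callee)))
        edges[pair] += 1
        neighbors[caller].add(callee)
        neighbors[callee].add(caller)
    return edges, neighbors


edges, neighbors = build_graph(call_log)
print(dict(edges))
print(dict(neighbors))
```

Nothing here looks at what anyone said. The graph comes entirely from who called whom.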
Or consider Twitter. Very few tweeters encode their location into their tweets – but if you want to figure out where someone is from, there are a host of ways to do it just from what they write, whom they follow and who follows them.
A 2010 paper by Zhiyuan Cheng, James Caverlee and Kyumin Lee at Texas A&M found that they could place about half of Twitter users within 100 miles of their actual location – and that the estimates get much better the more someone tweets.
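One simple way to guess location from text alone is to learn which words show up in tweets from which cities, then score a new tweet against each city. The sketch below is a toy word-count classifier with add-one smoothing. It is a stand-in for illustration, not the Texas A&M authors' model, and the training tweets and city labels are made up.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training tweets whose city is already known.
training = [
    ("Houston", "rodeo traffic on the loop again"),
    ("Houston", "bayou trail run before the rodeo"),
    ("Boston", "red line delays near fenway"),
    ("Boston", "snow on the common and fenway closed"),
]


def train(examples):
    """Count how often each word appears in each city's tweets."""
    word_counts = defaultdict(Counter)
    for city, text in examples:
        word_counts[city].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, vocab


def guess_city(text, word_counts, vocab):
    """Pick the city whose word counts best explain the text."""
    scores = {}
    for city, counts in word_counts.items():
        total = sum(counts.values())
        # Add-one smoothing so a word unseen in one city doesn't rule it out.
        scores[city] = sum(
            math.log((counts[w] + 1) / (total + len(vocab)))
            for w in text.lower().split()
            if w in vocab
        )
    return max(scores, key=scores.get)


word_counts, vocab = train(training)
print(guess_city("stuck near fenway on the red line", word_counts, vocab))
```

With more text per person there are more telltale words to score, which is consistent with the paper's finding that estimates improve the more someone tweets.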
Nice if you want to figure out if someone who purports to be an eyewitness to Greek riots is really in Montana; and overall the broader availability of such data and “data exhaust” is good for journalists, who now have the means to really take on bigger and more complex data projects – look at The Wall Street Journal’s series on Medicare abuse, for example. And even something like WhoRunsHK wouldn’t have been possible a decade ago.
But there are a lot of other uses for such information.
In Mexico, drug gangs are getting much more savvy about mining social media information to find out who’s rallying citizens against them. One woman was killed and beheaded after she posted information about crimes online and urged others to do the same.
“The narcos have people who are experts in communications,” said a journalist from Tamaulipas state, home to Nuevo Laredo, who asked to remain anonymous. “They are monitoring Internet sites, blogs, phone calls and the famous social networks on a daily basis.”
Governments aren’t blind to the possibilities here, either. A US Air Force First Lieutenant, James Okolica, wrote a thesis to show how analyzing email traffic in an organization could help identify disgruntled individuals. His goal was to help catch spies – but his project focused on a trove of emails from Enron, and it pointed out that, using his techniques, you could have fingered Sherron Watkins, the whistleblower in that case, early on. Would that have been a good thing? As the thesis puts it:
In today’s Information Age, one of the best sources of personal information available at work is an individuals’ email and internet activity. By datamining an organization’s email, it is possible to learn a lot about not only the organization, but of the individuals within it as well. Datamining can find potential insiders by finding individuals who feel alienated from the organization and/or who have interests contrary to the organization’s well-being.
The results show that by comparing the topics of emails that people send internally with the ones sent externally, a small number of employees (0.03% – 1.0%) emerge as having clandestine interests and the potential to become insider threats.
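The comparison the thesis describes, internal topics versus external topics, can be approximated very crudely. The sketch below swaps in raw word counts for a real topic model and measures how much each person's internal and external mail overlap using cosine similarity. The people, the messages and the scoring are all hypothetical, and this is not the method used in the thesis.

```python
import math
from collections import Counter


def bag(messages):
    """Collapse a list of messages into one word-count vector."""
    return Counter(w for m in messages for w in m.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical mailboxes, split by whether the recipient was inside the company.
mail = {
    "pat": {
        "internal": ["quarterly budget review", "budget meeting notes"],
        "external": ["budget review for vendor", "meeting notes attached"],
    },
    "lee": {
        "internal": ["quarterly budget review"],
        "external": ["fishing trip photos", "boat rental quote"],
    },
}

overlap = {
    person: cosine(bag(boxes["internal"]), bag(boxes["external"]))
    for person, boxes in mail.items()
}
print(overlap)
```

A low score only means someone writes about different things to outsiders than to colleagues, which could be a hobby as easily as a secret. That gap between the number and what it means is exactly why the Sherron Watkins question is uncomfortable.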
All of this isn’t necessarily a good thing or a bad thing per se – but it does mean that we’re in an age where we have much less control over who knows what about us. It’s one thing to campaign for more privacy about our personal data – but in many of these cases, it’s the information around us that’s giving us away.
So should there be more restrictions on how such data sets get joined up? On how much analysis anyone is allowed to do on them? As a journalist, I’m loath to have too many – or any – restrictions placed on what’s clearly public information. And, in any case, it’s clear that the government – and some companies, at least – will have that data. So we might as well level the playing field.
At the same time, it’s not going to be an entirely comfortable world when anyone can figure out lots about you simply by analyzing the digital crumbs you – and the people you interact with – leave around. It’s going to take a while for us to get used to this – and to figure out how to minimize the inevitable abuses.