There was an uplifting – for geeks, anyway – story in the New York Times a week or so ago, about a small team of data crunchers in City Hall helping solve some of New York’s (many) problems.
For the modest sum of $1 million, and at a moment when decreasing budgets have required increased efficiency, the in-house geek squad has over the last three years leveraged the power of computers to double the city’s hit rate in finding stores selling bootleg cigarettes; sped the removal of trees destroyed by Hurricane Sandy; and helped steer overburdened housing inspectors — working with more than 20,000 options — directly to lawbreaking buildings where catastrophic fires were likeliest to occur.
Pretty impressive stuff.
They also figured out, just by digging through and matching some public records, where the city’s most likely illegal grease-dumpers were, and sicced inspectors on them. (For the best explanation of why that matters – and one of the best examples of The Wall Street Journal’s famed funny “A-hed” stories on the front page – check out this piece by Barry Newman. You gotta love a story that starts: “Why wait until the next story about coagulated fat in sewers comes along when you can read this one now?”)
Makes you love the power in joining up data sets and crunching numbers, doesn’t it? But the same issue of the paper also carried a more cautionary tale – one about fears over how to regulate the inevitable questions of privacy as more and more of our lives are digitized, and more and more of it is available to governments and corporations.
Those are legitimate worries, and probably one of the big challenges ahead of us as societies is how the tension between “good” uses of data and privacy worries can be resolved. There are lots of arguments on both sides, and no simple answers. There’s interesting research going on about how people actually value privacy, rather than what they say they value. And there are clear cultural differences as well: Tax payments are public in Scandinavia, but secret in the US. Court records are open in America but names are redacted in many European countries.
Still, regardless of anyone’s position, privacy clearly matters. I certainly don’t want people prying into my life. But then again, a lot of what journalism is – and by that I mean the good, public service type of journalism, not the tabloid press – is about violating people’s privacy, albeit in the name of the public interest.
CEOs who take corporate jets to play golf, deadbeat dads who don’t pay alimony and cheating schoolteachers, among others, don’t want people prying into their lives, either. But there are compelling reasons to do so, and scraping public data, joining disparate databases, and analyzing that information are all tools modern journalists have used to uncover wrongdoing.
That’s not to say that we should toss privacy concerns aside in the name of journalism. We shouldn’t, not by a long shot. The flood of data, and the processing power of computers, mean that we can uncover far more about ordinary people than ever before – and that raises lots of ethical dilemmas. Even when the subject is of public interest – such as gun ownership – there are real questions about what officially public information should and shouldn’t be revealed.
But there’s also a lot of good that can be done with all that data, and all those tools.
And we don’t often – or often enough – make that case. The headlines – the ones we ourselves write – are dominated by fears and concerns about Big Data, and less often by stories of where it’s helped make our lives better, particularly where journalistic use of it has done so. Shouldn’t we make that case?
If there’s going to be a debate about how much we can and can’t do with data, we ought to make the case for the good as well as the bad.