It’s a question that comes up in a recent Columbia Journalism Review piece – part of a new regular feature, Between the Spreadsheets, that focuses on data journalism and visualization – that argues that “journalists should be leading by example in the open data debate.”
And there are certainly good arguments for more openness, not least because it helps bolster trust in stories when readers can delve into the data directly. But how much should journalists share? What are the moral, business, and public interest arguments for being more or less open?
This is one of those topics that has the potential to slip right into the ideology wars (Free vs. Paid! Bloggers vs. MSM!) so I’ll tread carefully and try to steer a middle way. I’m sure I’ll fail.
There’s a moral argument, and the CJR piece spells it out:
Data sharing is good news for journalism. Advocates for open data hinge their argument on the democratic need for transparency. If journalists are able to get their hands on data quickly and easily, they can work with it and reveal the stories behind the numbers to the public. By publishing spreadsheets of data from which they found those stories and allowing others to use that data, they’re also acting as platforms for hosting data. They’re walking the walk, supporting what they themselves are asking for: easily accessible data.
It’s true – it’s somewhat hypocritical to ask for data to be made available, and then not do the same yourself. On the other hand, journalists get people to talk to them for free, and then charge for the news that results from that interview. (Or at least, they try to charge, not always with success.) The value that journalists provide, in theory, is more the synthesis of information than simple access to it; but that, too, is changing.
There’s a public interest argument as well: giving access to data not only increases trust in stories, but also enables the public – and other journalists – to build on your work and take it in new and potentially valuable directions. That expands the journalist’s mission into the space filled by organizations such as Open Secrets – and maybe it’s a place we should be. Two organizations identified in the CJR piece that actively share data – the Texas Tribune and ProPublica – are non-profits, and it makes sense that sharing is a core part of their mission.
But what about the competitive spirit? If you’ve got a great dataset – whether public or not, that you’ve put time and effort into making usable – shouldn’t you be using it to drive great stories that your competitors can’t touch? Isn’t that one of the great advantages of data-driven journalism – that you can own a whole patch of coverage that no one else can really match? After all, it’s not like anyone is clamoring to publish their list of sources after they finish a six-month investigative project; they might need those people again for another story. And even after you’ve done a fantastic series of stories with a great set of exclusive data, a smart news organization knows how to go back to that well and mine it again and again.
That’s a natural tension between journalism for glory, if you like, and journalism for the public interest. They’re not always at odds – but they can be.
And then there’s the business imperative as well. How valuable is data as part of a new business model for journalism? The CJR piece cites the Guardian and the New York Times – both commercial enterprises – as enthusiastic data-sharers, and they are. And one can argue that providing open access to data can drive traffic, and in theory ad revenue.
But shouldn’t data be more than a traffic driver? What business advantages can it confer on a news organization, whether in terms of improved coverage that it can perhaps charge for, or broader reach or scale of coverage, or lower costs in creating content? PolitiFact, for example, can create pages that aggregate its data about political truth-telling – in effect, creating content out of the data structure it designed. Homicide Watch, through diligent collection of the details of every murder and its aftermath in DC, can generate statistics and stories about trends on a broad range of related topics.
At some level, what they’ve done is give themselves a head start and build an (albeit small) barrier to entry for competitors; there’s no reason why others can’t do the same thing, whether through a special process of data collection or analysis, a particular taxonomy, a metadata structure, or some proprietary algorithm. (Full disclosure – large parts of Thomson Reuters, where I work, are built around such systems.)
Should they give that secret sauce away?
There’s a practical element to this, of course – you can’t and shouldn’t try to protect what’s easily available elsewhere. There’s no point hanging on to data if you’re not going to use it or update it. And even for the data you want to keep, there are ways to share it that let people use it fairly extensively without giving them full access. For example, you can let people have large cuts of data, but not as it’s continuously updated. (Open Secrets, for example, makes money from custom research and licenses its data for commercial purposes.) It’s not an all-or-nothing proposition, especially given that technology can enable all sorts of fixes.
Which is to say, there’s clearly room for give and take in this area. More standard data structures are good, because they make it easier to plug one set of data into another – even if the data aren’t public. So we should certainly work on things like common formats, metadata, and taxonomies where we can.
And free access to data would certainly be good for humanity. And so would free food (and drink), music and books. So we need to figure out a balance – one that gets as much good out to the public as possible, while making it possible to keep doing the important work of getting data and analyzing it too.