Posted by: structureofnews | August 29, 2012

Data Wants To Be…

How open should journalists be about the data they have?  How much of it should they share with the world?

It’s a question that comes up in a recent Columbia Journalism Review piece – part of a new regular feature, Between the Spreadsheets, that focuses on data journalism and visualization – that argues that “journalists should be leading by example in the open data debate.”

And there are certainly good arguments for more openness, not least because it helps bolster trust in stories when readers can delve into the data directly.  But how much should they share?  What are the moral/business/public interest arguments for being more or less open?

This is one of those topics that has the potential to slip right into the ideology wars (Free vs. Paid! Bloggers vs MSM!) so I’ll tread carefully and try to steer a careful middle way.  I’m sure I’ll fail.

There’s a moral argument, and the CJR piece spells it out:

Data sharing is good news for journalism. Advocates for open data hinge their argument on the democratic need for transparency. If journalists are able to get their hands on data quickly and easily, they can work with it and reveal the stories behind the numbers to the public. By publishing spreadsheets of data from which they found those stories and allowing others to use that data, they’re also acting as platforms for hosting data. They’re walking the walk, supporting what they themselves are asking for: easily accessible data.

It’s true – it’s somewhat hypocritical to ask for data to be made available, and then not do the same yourself.  On the other hand, journalists get people to talk to them for free, and then charge for the news that results from that interview. (Or at least, they try to charge, not always with success.) The value that journalists provide, in theory, lies more in the synthesis of information than in simple access to it; but it’s true that’s changing as well.

There’s a public interest argument as well: Giving access to data not only increases trust in stories, but also enables the public – and other journalists – to build on your work and take it in new and potentially valuable directions.  That expands the journalist’s mission into the space filled by organizations such as Open Secrets – and maybe it’s a place we should be.  Two organizations identified in the CJR piece that actively share data are non-profits – Texas Tribune and ProPublica – and it makes sense that it’s a core part of their mission.

But what about the competitive spirit?  If you’ve got a great dataset – whether public or not – that you’ve put time and effort into making useable, shouldn’t you be using it to drive great stories that your competitors can’t touch?  Isn’t that one of the great advantages of data-driven journalism – that you can own a whole patch of coverage that no one else can really match?  After all, it’s not like anyone is clamoring to publish their list of sources after they finish a six-month investigative project; they might need those people again for another story.  And even after you’ve done a fantastic series of stories with a great set of exclusive data, a smart news organization knows how to go back to that well and mine it again and again.

That’s a natural tension between journalism for glory, if you like, and journalism for the public interest.  They’re not always at odds – but they can be.

And then there’s the business imperative as well.  How valuable is data as part of a new business model for journalism?  The CJR piece cites the Guardian and the New York Times – both commercial enterprises – as enthusiastic data-sharers, and they are.  And one can argue that providing open access to data can drive traffic, and in theory ad revenue.

But shouldn’t data be more than a traffic driver?  What business advantages can it confer on a news organization, whether in terms of improved coverage that it can perhaps charge for, or broader reach or scale of coverage, or lower costs in creating content?  Politifact, for example, has the ability to create pages that aggregate the data they have about political truth-telling – in effect, creating content out of the data structure they designed.  Homicide Watch, through diligent collection of the details of every murder and its aftermath in DC, can generate statistics and stories about trends on a broad range of topics related to that.

At some level, what they’ve done is give themselves a headstart and build an (albeit small) barrier to entry for competitors; there’s no reason why others can’t do the same thing, whether through a special process of data collection or analysis, a particular taxonomy, a metadata structure, or some proprietary algorithm.  (Full disclosure – large parts of Thomson Reuters, where I work, are built around such systems.)

Should they give that secret sauce away?

There’s a practical element to this, of course – you can’t and shouldn’t try to protect what’s easily available elsewhere.  There’s no point hanging on to stuff if you’re not going to use it, or update it.  And even for the data you want to keep, there are ways to show it in a manner that allows people to use it fairly extensively, but still not give them full access to the data.  For example, you can let people have access to large cuts of data, but not as it’s continuously updated.  (Open Secrets, for example, makes money from custom research and licenses its data for commercial purposes.)  It’s not an all-or-nothing proposition, especially given that technology can enable all sorts of fixes.

Which is to say, there’s clearly room for give and take in this area.  Having more standard data structures is good, because it means it’s easier to plug one set of data into another – even if they aren’t public.  So we should certainly work on things like having more common formats, metadata, taxonomies where we can.

And free access to data would certainly be good for humanity.  And so would free food (and drink), music and books.  So we need to figure out a balance – one that gets as much good out to the public as possible, while making it possible to keep doing the important work of getting data and analyzing it too.


  2. Reg,

    This is going to sound pedantic, but one of the problems I had with Codrea-Rado’s piece in CJR was that she seems to use the terms spreadsheet and database interchangeably. Sort of like having a discussion with a bunch of car guys, and hearing one natter on about the “motor” when he actually means the “engine.” Although it’s easy enough to make the intuitive leap, the lack of such a distinction is not something an engineer would be guilty of, and makes you wonder if the guy actually knows what he’s talking about.

    It’s probably also true that anyone who has ever been exposed to an open source software project is aware of the fact that they tend to be dependent on a cooperative effort; some people crank code, others design the UI. If you like the software, there’s the implication that everyone has something to contribute even if it’s working on a minor part of the technical docs, or logging a bug report. Open data can be seen as an extension now that databases have reached an advanced level.

    The comment by Codrea-Rado that journalists are like a child not willing to share a toy truck seems to ignore the elements of quid pro quo that go into projects with a lot of moving parts and are mostly driven by volunteers. This may be unfair, but she seems to be saying people are mean if they’re not willing to share their stuff, without any indication to what extent she herself is willing to put out any effort to share stuff back.

    It sometimes seems to me that we’ve reached the point where the free sharing of information has taken on an unquestioned moral high ground to the extent that there’s no longer room for debate; it’s now into the realm of dogma.

    But looking at it another way, you could also argue that while the sharing of information can be a good thing, there may in fact be an even higher moral ground which is that in order to preserve journalism as a societal entity, there needs to be a way to maintain it with sustainable business models.

    It’s interesting that on the day following Codrea-Rado’s piece, CJR did a profile of Richmond BizSense, which has evolved into a modestly successful online news org by, at least partly, going around and gathering up esoteric data such as building permits and court filings. Here’s the link:

    So you have to ask yourself: is the greater good for Richmond BizSense to be allowed to continue to provide a service of some value to the Richmond community, or should it simply roll over the toy truck of its scut work to open data because of some sort of perceived moral imperative of sharing?

    I don’t have an answer; this is merely observation.

  3. Perry, thanks for your comments – there’s a lot to chew over when it comes to how we perceive the value (or moral position) of open vs. less-open data. Like you, I don’t have an answer; but I think it’s worth sifting through the various types of arguments to try to come to some kind of coherent position.

    There are moral issues, certainly, as well as greater-good public interest ones, and straight-up business and commercial factors too. They don’t sit in isolation – or we might well have lived in a world where all newspapers were free.

    That’s not to say that how the world evolved in the past was somehow the “right” way; but we do need to get past broad dogma – on this, and on free vs. paid and other ideological issues of the day – and try to figure out, as business types say, what problems we’re trying to solve for.

