Back in 2009, Dan laid out his broad vision of what he called the Semantic Economy; it's hugely ambitious at some level, and very forward-looking. (I wrote about it a little while ago, and Dan kindly commented on that post.) This more recent piece focuses on the nuts and bolts – literally – of some of the steps we need to take to move along that road.
If my conclusions are correct, then our emerging information-based global economy will require interchangeable, robust data in the same way that our current economy requires that every finely threaded quarter-inch screw must have 28 threads per inch, machined to precisely the same height and pitch and thread-axis angle, regardless of whether the screw is manufactured in the People’s Republic of China or Alpharetta, Ga.
Which makes lots of sense. It’s not the sexiest thing to talk about, and grand theories will only go so far without a working CMS to power them, but this is critical not only for how-do-we-move-ahead practical reasons, but also because it can help change journalists’ mindsets about what’s important. As Dan points out, the current thinking about the work we do is that:
A story must be potentially interesting to a valuable audience, or it isn’t worth producing.
This is the fatal flaw with a theory of the press that is based on stories. It assumes that the only information that has value is the information that seems immediately interesting.
But what if baseball coverage worked that way? If our only records of Major League Baseball games were stories about dramatic, game-defining events, we’d never read about a bloop single in the scoreless third, because who cares about facts irrelevant to the story? The “story” is the game-winning RBI in the bottom of the ninth.
Yet if we didn’t keep track of all those otherwise boring statistics, we wouldn’t have access to the details of all those baseball games ever played – and more importantly, all the byproducts that have come out of them (fantasy baseball leagues and trading cards, among them). Luckily, baseball does keep all those stats (less luckily if you’re stuck between two baseball fans at a bar and all they do is cite stats at each other, but that’s another issue). Ditto stock tables vs. daily market reports, and all the resulting analysis that is enabled by being able to dissect a complete – and continuously updated – dataset rather than simply having an archive of stories.
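The stats-vs-stories point can be made concrete with a toy sketch. Everything below – the field names, the plays, the players – is invented for illustration; the idea is just that once each play is a structured record, you can answer questions nobody thought to write a story about at the time:

```python
# Hypothetical play-by-play records (invented schema and names).
plays = [
    {"inning": 3, "batter": "Smith", "event": "single", "rbi": 0},
    {"inning": 9, "batter": "Jones", "event": "single", "rbi": 1},
    {"inning": 5, "batter": "Smith", "event": "strikeout", "rbi": 0},
]

# The "story" only covers the dramatic moment: the game-winning RBI.
game_winner = max(plays, key=lambda p: (p["rbi"], p["inning"]))

# ...but the dataset also answers the boring questions later – every
# single, every at-bat by one player, aggregate totals, and so on.
singles = [p for p in plays if p["event"] == "single"]
smith_at_bats = [p for p in plays if p["batter"] == "Smith"]

print(game_winner["batter"])   # Jones
print(len(singles))            # 2
print(len(smith_at_bats))      # 2
```

The bloop single in the scoreless third survives in the data even though it never made the story – which is exactly what fantasy leagues and trading cards were built on.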
One key question, of course, is: Who collects all this data? Stock data comes from exchanges, and baseball stats come from scorekeepers. But journalists can and should be seeing how they can get into this activity as well – not because it’s exciting (it isn’t), but because it can help them write better stories, let them leverage a competitive advantage – sheer discipline – over others, and potentially power new products and business models.
As I noted in that earlier post, Dan’s vision and mine overlap to a great degree – although he generally has better analogies. I’m still largely focused on trying to come up with some specific news product that calls for a discrete set of information to be collected regularly as part of news gathering, and then repurposed. That doesn’t by any means undercut the notion of standardized data structures shared across multiple organizations – and as the franchising of Politifact has shown, you can get others to adopt your data structure if you’re successful. (Now if only Homicide Watch could pull off the same feat.) Organizations that share data structures – especially for datasets that can span areas of coverage or geographies – all potentially stand to gain if they can work out ways to compensate each other for use of each other’s data.
And speaking of money, Dan points to a neat idea in yet another post, which also looks promising. In the future, he suggests,
…journalists will produce machine-readable XML files first, with the human-readable narrative existing as a sub-set of that file.
My prediction? News organizations will give away their human-readable documents and sell their datasets, either directly to developers and researchers, or indirectly via their own informational products.
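Dan’s “machine-readable file first, narrative as a subset” idea can be sketched in a few lines. To be clear, the tags and fields below are a made-up illustration, not any real news-industry schema – just one way a story file could carry both the sellable data and the giveaway narrative:

```python
import xml.etree.ElementTree as ET

# Hypothetical "data first" story file: structured facts up top, with
# the human-readable narrative as just one element of the file.
story_xml = """
<story id="council-vote">
  <data>
    <vote measure="zoning ordinance" yes="5" no="2"/>
  </data>
  <narrative>
    City council passed the zoning ordinance Tuesday on a 5-2 vote.
  </narrative>
</story>
"""

root = ET.fromstring(story_xml)

# A developer or researcher buying the dataset reads the structured facts...
vote = root.find("data/vote")
margin = int(vote.get("yes")) - int(vote.get("no"))

# ...while a reader gets the free, human-readable subset.
narrative = root.findtext("narrative").strip()

print(margin)     # 3
print(narrative)  # City council passed the zoning ordinance Tuesday on a 5-2 vote.
```

The same file serves both audiences: give the narrative away, sell (or build products on) the structured layer around it.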
It’s an interesting riff on the current journalism business model, where readers get content at relatively low cost because advertisers are really paying the bills. Perhaps in the future data will pay the bills so content remains cheap (or free). And if that data is in a structure that’s easily interchanged/cross-analyzed with data from another organization, it’s all the more valuable.