Posted by: structureofnews | January 7, 2011

Semantic(s) Matter(s)

Trying to explain structured journalism – or even the problem it’s trying to solve – to people can be a challenge sometimes.   So it’s great when I find someone who very succinctly and neatly describes the issues we’re facing.

Dan Conover at Xark 3.0 lays out in stark terms what’s wrong with the information age we live in, and proposes an alternative “Semantic Economy.”  I’m not so sure we can get there as quickly as he imagines, but we’re certainly rowing in the same direction.  More on that further down, but first his diagnosis:

But what goes unsaid … is how horribly inefficient the information age has become. Networked media gives us instant (and too-cheap-to-meter) access to generally relevant answers, the tech industry gives us unprecedented and highly affordable processing power, and everyone with an ISP has as much publishing capability as they need. It sounds like the beginnings of a highly productive, highly profitable era – except it’s neither profitable nor productive at the moment (at least not if you define productivity by the dollars generated from work).

Information, he points out, is being largely created in one of two formats: free text and unsearchable databases.  Neither works particularly well if we’re trying to link information and create new value.

It’s as if we’ve come up with all the technologies needed to create a modern automobile, except it’s 1870, and the only fuel source we have for this remarkable new machine is coal.

Add to that the issue that we don’t have a business model that effectively monetizes the text, and the whole thing looks pretty nasty.

Consequently, we are stuck with a contemporary global economy in which very few people are reasonably well compensated for producing anything. In the same sense that our manufacturing sector has collapsed because third-world factories charge next to nothing for labor (often with resulting deficits in quality), so has our media economy collapsed because our only proven model for funding the creation of news and information comes from renting consumers’ attention to commercial interests. Since most people are happy to pay attention to low-grade schlock, the business case for producing high-quality, useable information is increasingly weak. Why invest in an expensive product with a lower rate of return when the cheap product makes you more money?


So what’s a publisher/journalist to do?  Dan’s plan harks back, in some ways, to the Semantic Web, which, as he points out, Clay Shirky took apart – with humor, but mercilessly – back in 2003.  But this time Dan wants to create a content management system that allows for more semantic integration, so that organizations can have their content, in effect, talk to each other more easily. (At least, I hope I’m understanding what he’s suggesting.)

In other words, if your story tags/structures a company name as a company name, and mine does as well, we may be able to – given scale and breadth – pull those two pieces of information into a new and hopefully profitable product.  And if you build in the appropriate tracking and licensing models, everyone can get a piece of that pie.
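
To make that concrete, here is a minimal sketch in Python. It assumes two hypothetical newsrooms that each tag entities in their stories as (type, name) pairs; the field names, the sample stories and the `shared_entities` helper are all invented for illustration and don't describe any real CMS.

```python
from collections import defaultdict

# Hypothetical tagged stories: each entity is a (type, name) pair.
stories = [
    {"org": "Newsroom A", "headline": "Acme posts record profit",
     "entities": [("company", "Acme Corp"), ("person", "Jane Doe")]},
    {"org": "Newsroom B", "headline": "Regulator fines Acme",
     "entities": [("company", "Acme Corp"), ("agency", "Trade Commission")]},
    {"org": "Newsroom B", "headline": "Local bakery expands",
     "entities": [("company", "Good Bread Ltd")]},
]


def shared_entities(stories):
    """Return entities tagged by more than one organization, with their mentions."""
    index = defaultdict(list)
    for story in stories:
        for entity in story["entities"]:
            index[entity].append((story["org"], story["headline"]))
    # Keep only entities that at least two different organizations tagged.
    return {
        entity: mentions
        for entity, mentions in index.items()
        if len({org for org, _ in mentions}) > 1
    }


result = shared_entities(stories)
for (kind, name), mentions in result.items():
    print(kind, name, mentions)
```

The match here is on the exact type and name, so it only works if both newsrooms spell the company the same way; in practice that is where shared identifiers or some normalization step would have to come in.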

There are many similarities to the notions embedded in structured journalism – among them, getting more out of free text by structuring it as we write it, rather than trying to reverse-engineer meaning from stories later on.  That builds us databases of information that, if organized well, can be recombined into new products.

But I think it’s that daily structure that matters – and perhaps that’s where I differ from Dan.  It would be great to have a CMS that allowed for all kinds of semantic structuring – and it can probably be built in short order.  But I suspect it makes more sense to build – or envision – the product you want first, and then try to engineer the CMS and journalists around it.

If you want to make Politifact, you have to set things up so you have a rating on each pronouncement that Obama makes; that doesn’t come naturally with a generic CMS – and even if an industry-wide CMS had that provision, how do I know that my rating matches your rating?  That comes instead from franchising the Politifact structure to other organizations, as they’ve done.  More importantly, it comes from restructuring your newsroom so they write the stories and rate the statements a certain way.
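
As a rough illustration of what "a rating on each pronouncement" means in data terms, here is a small Python sketch. The three-step `Rating` scale, the `RatedStatement` fields and the placeholder statements are assumptions made up for this example; they are not Politifact's actual schema or scale.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    # Hypothetical scale; a real newsroom would define and police its own.
    TRUE = "true"
    HALF_TRUE = "half true"
    FALSE = "false"


@dataclass(frozen=True)
class RatedStatement:
    speaker: str
    statement: str
    rating: Rating  # must be one of the agreed values, not free text


def tally_by_speaker(statements):
    """Count how many statements each speaker has at each rating."""
    counts = {}
    for s in statements:
        counts.setdefault(s.speaker, Counter())[s.rating] += 1
    return counts


statements = [
    RatedStatement("Speaker A", "Placeholder claim one", Rating.TRUE),
    RatedStatement("Speaker A", "Placeholder claim two", Rating.FALSE),
    RatedStatement("Speaker B", "Placeholder claim three", Rating.HALF_TRUE),
    RatedStatement("Speaker A", "Placeholder claim four", Rating.TRUE),
]

tallies = tally_by_speaker(statements)
print(tallies["Speaker A"][Rating.TRUE])
```

The point of the fixed scale is that the tally is only meaningful if every reporter picks from the same short list; the code can enforce the list, but not that two newsrooms mean the same thing by "half true."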

Similarly, the data structure for WhoRunsHK is set up in a particular way; it would be great if everyone adopted the same structure and we could all share information and build an even more robust product – but what works in Hong Kong may not work as well in the Philippines.

That’s not to say that having industry-wide standards isn’t a good thing – it is, and it’s a critical step as we move ahead.  But the likely path forward is some organizations building successful products and franchising them, or allowing others to duplicate the structure.  That’s organic growth, as Dan also champions, but I suspect it’ll come about through products, not standards.

But whichever way it grows, it’s a good thing as long as it grows.


  1. Thank you for the read, the link and your thoughts.

    You are most likely right about the timescale. My attempt in that first essay was to talk more about what separates us from these tools and these features, and from my perspective, they’re out there now, but not connected in useful ways. The new tasks are so frequently now just the assembly of existing capacities into new suites of tools that make exotic undertakings practical. Blogging wasn’t a new technology — it was an interface that made an old technology user-friendly and worth the effort. From my perspective, when that’s the situation, the horizon is still distant, but approaching at a speed that’s hard to determine.

    Also, I should clarify my thoughts on interoperable semantic structure. The original vision of a Semantic Web (upper-cased intentionally) was a Web of Meaning, based on people and organizations recognizing their mutual interest in making their ideas explicit. The belief was that a generic standard for conveying semantic data would be enough to spur further development, something along the lines of “If the W3C builds it, they will come.”

    I think the problem with that idea (beyond Shirky’s critique) is analogous to the development of blogging. People needed a tool that was easier to use and the prospect of better return on their investment of time. Blogging software that combined easy templating, composition, comment management and feed burning opened both of those doors simultaneously.

    We don’t have a tool like that today for writing in depth. Even our concept of hyperlinking is based on the notion that our links are elsewhere, not extensions of the piece we’re creating.

    So when I write about what might develop from the deployment of such tools, that’s not saying that the tools themselves will offer those features. To wit: news organizations using an SCMS would not necessarily share a common information architecture, or even the same XML schema. This would be determined and customized by the user.

    However, it makes sense that users would seek common standards of interoperability. So you build the tool so that it can adapt to the needs of users, but you bake in features that encourage the sharing of semantic structures. You don’t dictate whether users select NEWSML or IPTC or NITF, but you provide the mapping and normalization tools that make it easier for these schemas to communicate.

    My hope is that this creation of semantic value from the ground up will be more effective than the promulgation of standards from the top down. And I suspect that offering a tool that works in particular ways will have much greater effect on our semantic structures than any number of thoughtful standards released by W3C.

    My belief is based on the thought that there is tremendous value in creating usable meaning, and that if we can produce and publish meaning, we will create supply, demand and markets. If semantic markup pays for itself and turns a profit, people will create more of it. If it doesn’t, they won’t.

    I’ve spent more time thinking about journalism business models in recent years because I’ve seen that it’s the business of journalism — not our systems of ethics, not our aspirations, not even our outdated practices — that has driven and shaped the changes we’ve experienced. We don’t employ fewer reporters and editors now because we’ve learned to do the same job more efficiently — we do it because that’s all we can afford, and if that means degrading the product, so be it.

    So the question of how we can create fact-checking and fact-linking systems that function at the heart of journalism rather than on the fringes is, to me, an economic question. When I first tried to imagine credibility/trust systems in 2005, I didn’t understand semantic concepts, and so I imagined that linking facts to statements would still be a natural language text task (I like your term “free text” better). And every solution that I imagined was based on funding organizations and tasks out of some unspecified pool of cash.

    So I looked for solutions that not only created persistent value for information, but lend themselves to journalistic functions as well. In other words, I looked for the revenues first, because that’s what will attract investors, and tried to figure out ways to make better journalism an emergent property of such a system.

  2. Dan, thanks very much for this thoughtful reply, and for clarifying all the bits that I didn’t really fully understand.

    I’m with you about the importance of focusing on business models – an area long ignored by newsroom managers. I dived into the idea of structured journalism as much because I think that’s the way people consume information as because I wanted to see what business advantages a focused, disciplined newsroom brings to the table.

    And I think we agree, too, on the delicate balance between customization and interoperability; too much of the former and we have walled gardens that only talk to themselves, and too much of the latter and we only have basic, self-evident microtags that don’t aggregate into anything valuable (or we take forever to formulate the meaning of life).

    You’ve also helped me think through another balancing act – the balance between heavily structured information that aggregates well into defined products, and the value of potential discovery in less-formal structures. I’ll post on that soon.


  3. […] the Semantic Economy; it’s hugely ambitious at some level, and very forward-looking.  (I wrote about it a little while ago, and Dan kindly commented on that post then.)  This more-recent piece […]
