In the spirit of meta-meta structure, some thoughts on the various levels of structured journalism, and next steps.
At the basic level, it’s simply setting up forms to force (or allow) people to file information for listings in a structured way – so getting restaurants to give information about location, promotions, etc. in a way that goes straight into a database, rather than having to dig it out of a lot of free text. This already happens on some scale; we should push for it to happen even more. Newspapers could probably leverage people’s desire to be in print to get it adopted on a wider scale – executive appointments, events listings, contributed photos, etc. Imagine if executive appointments – so-and-so has just been named VP of sales at such-and-such company – were entered directly into a database that not only powered the story in the paper but also helped drive the paper’s recruitment business.
Next step up is to get reporters to write summaries of stories in a new data field, so that when people enter search terms they see story summaries rather than the first few grafs. Again, some places do this – but often it’s done by online editors rather than journalists; it should become standard practice.
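To make the executive-appointments idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Appointment` fields, the `story_line` and `recruitment_leads` helpers, and the sample names are invented, not taken from any real newsroom system. The point is only that one structured record can feed two products.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Appointment:
    """One form submission (hypothetical field names)."""
    person: str
    title: str
    company: str
    effective: date


def story_line(a: Appointment) -> str:
    """Render the record as the sentence that runs in the paper."""
    return (f"{a.person} has been named {a.title} at {a.company}, "
            f"effective {a.effective.isoformat()}.")


def recruitment_leads(appointments: list[Appointment], keyword: str) -> list[str]:
    """Companies with a recent appointment whose title mentions the keyword."""
    needle = keyword.lower()
    return sorted({a.company for a in appointments if needle in a.title.lower()})
```

The same list of `Appointment` records could be passed to `story_line` for the print item and to `recruitment_leads` for the recruitment side, with no re-keying in between.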
Then get reporters to file two versions of each story – one for the next day’s reading, and the other for reading six months later – it shouldn’t be hard; mostly it’s adding some context and rewriting some date references. All this would add is about 5 or 10 minutes of work per story – not a big price for improving the archive reading experience.
If there were technology to tag date fields so they could be changed quasi-automatically, that would be another step forward. But it’s just as easy to get reporters to do it.
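For what date tagging might look like, here is a rough sketch in Python. It assumes a made-up inline tag, `{date:YYYY-MM-DD}`, that a reporter would type in place of "yesterday" or "on Tuesday"; the tag syntax, the `render` function and the cut-off of one week are all assumptions, not an existing standard.

```python
import re
from datetime import date

# Hypothetical inline tag: {date:2024-03-05}
TAG = re.compile(r"\{date:(\d{4}-\d{2}-\d{2})\}")
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def render(text: str, reading_day: date) -> str:
    """Replace each date tag with wording that suits the day of reading."""
    def swap(match: re.Match) -> str:
        d = date.fromisoformat(match.group(1))
        age = (reading_day - d).days
        if age == 0:
            return "today"
        if age == 1:
            return "yesterday"
        if 1 < age < 7:
            return f"on {WEEKDAYS[d.weekday()]}"
        # Older (or future) dates get the full, archive-safe form.
        return f"on {MONTHS[d.month - 1]} {d.day}, {d.year}"
    return TAG.sub(swap, text)
```

Calling `render` on the same tagged sentence with tomorrow's date and with a date six months out gives the two versions of the date reference without the reporter rewriting anything. The added context for the archive version would still have to be written by hand.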
The next step is bigger: It’s building a self-contained product that is built entirely in structured journalism fields, à la PolitiFact. Everyone who writes for that product would write only – or largely – in that manner, and the product would be designed as a purpose-built vehicle. The South China Morning Post’s racing section online could be redone that way, or perhaps the Wall Street Journal’s Heard on the Street. The trick here is to figure out what readers might want – is it racing tips that can be sorted by jockey, weather conditions, etc.? – and then build the newsroom process that feeds that information into a database. That doesn’t conflict with writing great stories about racing – but it does mean that the database is the primary beast that needs to be fed. These products could be standalones that are profitable; they could be proof-of-concept projects; or they could just be relatively low-cost things that bring value to readers. Because they’re purpose-built, they probably wouldn’t talk to other data structures well.
A big psychological step ahead would be to get additional structured data from reporters as they wrote stories – say relationship data – and then use that to create an entirely different product, such as WhoRunsHK. This wouldn’t require building an entire taxonomy around all kinds of stories, but simply finding some common, useful data set out of a large subset of all stories, and mining it. Another way of thinking of this is that it’s a way of extracting more out of reporters’ heads while the information is fresh.
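As a sketch of how little structure that relationship data would need, here is one possible shape in Python. The `Relationship` record, its field names and the two helper functions are assumptions made up for this example; a real WhoRunsHK-style product would need much more, starting with a way to tell apart people who share a name.

```python
from collections import defaultdict
from typing import NamedTuple


class Relationship(NamedTuple):
    """One fact a reporter files alongside a story (hypothetical fields)."""
    person: str
    role: str
    organisation: str
    story_id: str


def index_by_person(rels: list[Relationship]) -> dict[str, list[Relationship]]:
    """Group every filed relationship under the person it describes."""
    index: dict[str, list[Relationship]] = defaultdict(list)
    for rel in rels:
        index[rel.person].append(rel)
    return dict(index)


def shared_organisations(index: dict[str, list[Relationship]],
                         a: str, b: str) -> set[str]:
    """Organisations that link two people, across all stories filed so far."""
    orgs_a = {rel.organisation for rel in index.get(a, [])}
    orgs_b = {rel.organisation for rel in index.get(b, [])}
    return orgs_a & orgs_b
```

Each record costs the reporter a few seconds at filing time, and the connections only become visible once many stories have contributed to the same index.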
Another area that could be explored is looking at classes of stories – disaster stories, market reports, sports coverage – that are to some extent heavily templated, and building data fields common to that kind of story. In disaster stories, that means how many people are dead, how many are missing, how much property damage there is, etc. Then when the AP or whoever files, they should be filling in those fields separately – or even just filing updates to those fields – so that the story can simply update those numbers easily, without the need for a full writethrough.
The biggest step is to work out broad taxonomies that cover huge ranges of stories – say fields such as summary, nut graf, new information, background, key dates, key people, key entities, etc. – and then find ways to build applications that would leverage that data in fresh and interesting ways. The holy grail there would be an application that can essentially rebuild a new story out of disparate story elements, addressing a reader’s specific interest rather than what the various writers tried to do.
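Here is a minimal sketch of the disaster-story case in Python, assuming an invented `DisasterFacts` record and a one-sentence template. The field names, the template wording and the sample figures are placeholders, not a real wire-service format.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class DisasterFacts:
    """The templated fields of a disaster story (hypothetical names)."""
    event: str
    dead: int
    missing: int
    damage_usd: int


TEMPLATE = ("{event}: {dead} dead, {missing} missing, "
            "damage estimated at ${damage_usd:,}.")


def render(facts: DisasterFacts) -> str:
    """Fill the story template from the current field values."""
    return TEMPLATE.format(**asdict(facts))


def apply_update(facts: DisasterFacts, **changes: int) -> DisasterFacts:
    """File an update to some fields only; everything else is carried over."""
    return replace(facts, **changes)
```

An update then looks like `apply_update(facts, dead=15, missing=22)` followed by `render`, which is the "no full writethrough" idea in miniature: the wire service sends two numbers and the sentence rebuilds itself.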
OK, that last one may be a tad too hard. But at least it would provide stronger search results and more easily understood information.
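For a sense of what the broad taxonomy could look like in practice, here is a small Python sketch. The `StructuredStory` fields follow the list above; the `search` and `briefing` functions, and the assumption that stories arrive ordered oldest to newest, are invented for illustration. The `briefing` function is a very crude stand-in for the holy-grail application, not a claim that the hard part is solved.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredStory:
    """One story broken into taxonomy fields (names taken from the list above)."""
    summary: str
    nut_graf: str
    new_information: str
    background: str
    key_dates: list[str] = field(default_factory=list)
    key_people: list[str] = field(default_factory=list)
    key_entities: list[str] = field(default_factory=list)


def mentions(story: StructuredStory, term: str) -> bool:
    """True if the term names one of the story's key people or entities."""
    needle = term.lower()
    return any(needle == name.lower()
               for name in story.key_people + story.key_entities)


def search(stories: list[StructuredStory], term: str) -> list[str]:
    """Search results made of reporter-written summaries, not first grafs."""
    return [s.summary for s in stories if mentions(s, term)]


def briefing(stories: list[StructuredStory], term: str) -> str:
    """Stitch a new piece from the elements of every matching story.

    Assumes `stories` is ordered oldest to newest: the latest nut graf leads,
    followed by each story's new information in order.
    """
    hits = [s for s in stories if mentions(s, term)]
    if not hits:
        return ""
    return "\n".join([hits[-1].nut_graf] + [s.new_information for s in hits])
```

Even the `search` half on its own would deliver the stronger search results; the `briefing` half shows why the fields need to be filed separately in the first place.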