Anyone who’s ever faced a thick file of archived material – whether diplomatic cables or personnel records – understands just how much work it can be to dredge through reams of documents to uncover gems.
Trying to power through even a hundred Wikileaked cables must be a daunting prospect, never mind the full 250,000 that will be on offer at some point. But if it’s that hard for us journalists, it was probably no easier for the State Department officials who were trying to make sense of events as cables flooded in from all over the world.
That’s the point that my friend Jeremy Wagstaff makes in a post on his Loose Wire blog, about data management in diplomatic cables – and Afghanistan. At heart it’s about how information isn’t really structured well in the cables.
Exactly. There are tags – or the equivalent of tags, but built to State Department specs – as well as other sortable fields, such as the originating embassy or the date the cable was sent, but not much else. So you can search for cables that are related to China, say, and filter them down to those sent from Mexico during certain dates – but it’s much harder to pull together all the bits that relate to reactions to a particular speech, policy or event. (Look here for an explanation of how to read a cable.)
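To make that concrete, here is a rough sketch in Python of the kind of search the existing fields allow. Everything in it is an assumption for illustration: the `Cable` record, its field names, the `filter_cables` helper and the sample records are all made up, and none of it reflects the State Department's actual schema or tag vocabulary.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Cable:
    """Hypothetical cable record: only the sortable fields plus free text."""
    origin: str        # originating embassy (illustrative label)
    sent: date         # date the cable was sent
    tags: frozenset    # subject tags (illustrative, not real State Department tags)
    body: str          # unstructured text of the cable


def filter_cables(cables, tag, origin, start, end):
    """Return cables carrying `tag`, sent from `origin`, dated start..end inclusive."""
    return [
        c for c in cables
        if tag in c.tags and c.origin == origin and start <= c.sent <= end
    ]


# Made-up sample records, purely to exercise the filter.
cables = [
    Cable("Mexico City", date(2009, 3, 2), frozenset({"CHINA", "TRADE"}), "sample text A"),
    Cable("Mexico City", date(2010, 1, 15), frozenset({"CHINA"}), "sample text B"),
    Cable("Beijing", date(2009, 3, 5), frozenset({"CHINA"}), "sample text C"),
]

matches = filter_cables(cables, "CHINA", "Mexico City", date(2009, 1, 1), date(2009, 12, 31))
for cable in matches:
    print(cable.origin, cable.sent.isoformat(), sorted(cable.tags))
```

Filtering on tag, embassy and date takes one line each. What the record has no field for is "this cable is a reaction to that speech", so that link can only be guessed at by searching the free text in `body`.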
So the gist is that there’s a wealth of information there – but it’s hard to pull it all together. We – or they – are drowning in data. As good an argument for more structuring of information as I’ve heard.
That’s not to say that people aren’t digging through and trying to make sense of it all. For example, Jonathan Stray, at the AP, has written about their efforts to visualize 390,000 Iraq war reports – it’s a fascinating project that goes beyond simply mapping incidents or doing keyword searches, and instead employs social science tools to dissect a huge document dump.
Still, it doesn’t answer the question of why the State Department doesn’t rebuild the architecture of its reports so that it’s faster and easier for its analysts to trawl through thousands of cables. As Jeremy notes:
If I were the U.S. government, I would take Cablegate as a wake up call. Not at the affrontery of this humiliation, but as a chance to rethink how its data is being gathered and made use of. Cablegate tells us that the world of the cable is over.
And so it should be for any organization that deals with large masses of information – not just media outlets.