Jonathan Stray of the AP has a fascinating post on how it visualized a massive document dump of the Iraq warlogs. It looks like this:
Maybe it’s not immediately clear to you what this is or says, but that’s part of the point. As Jonathan notes:
Visualization is metaphor. Certain details are thrown away, others are emphasized. The algorithms used to produce the visualization have their own sensitivities and blind spots. Without understanding these, a viewer will make false inferences.
It’s complicated, but very clearly explained, and I highly recommend putting in the effort to follow the technical details of cosine-similarity on TF-IDF vectors and other such things. Whether you care about Iraq military action reports or not, what emerges quickly is the sense that the reporter’s toolkit is getting bigger by the minute.
We learned some things about the Iraq war. That’s one sense in which our experiment was a success; the other valuable lesson is that there are a boatload of research-grade visual analytics techniques just waiting to be applied to journalism.
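For readers who want a feel for what "cosine similarity on TF-IDF vectors" means before diving into Jonathan's post, here is a minimal sketch in plain Python. It is not the AP's code: the function names `tfidf_vectors` and `cosine_similarity`, the whitespace tokenizer, the particular TF-IDF weighting, and the three sample strings are all my own assumptions, chosen only to illustrate the idea. Each document becomes a bag of weighted words, where words that appear in fewer documents get more weight, and two documents count as similar when their weighted words overlap.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a {term: weight} dict using a basic TF-IDF scheme.

    Assumption: tokens are lowercase whitespace-separated words, term frequency
    is count / document length, and idf is log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is not similar to anything
    return dot / (norm_a * norm_b)


# Invented sample text for illustration only; not taken from the actual logs.
reports = [
    "ied explosion on route near checkpoint",
    "ied explosion reported near market",
    "detainee transfer to police station",
]

vectors = tfidf_vectors(reports)
print(cosine_similarity(vectors[0], vectors[1]))  # shares several terms
print(cosine_similarity(vectors[0], vectors[2]))  # shares no terms
```

The first pair shares words, so it scores above zero; the second pair shares none, so it scores zero. A real analysis would need better tokenizing and a far larger corpus, and, as Jonathan warns, choices like these shape what the final picture appears to say.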
The question is, will we use them? And just as importantly, how will we use them? As I’ve noted before, it’s hard enough sometimes just to get reporters to embrace math; how are we going to get newsrooms full of people who are skilled in statistical analysis?
The short answer is that we probably won’t. That, in itself, may not be a terrible thing. Newsrooms have always specialized, with one guy in the corner who’s great at picking apart company reports, someone else who can charm their way into an interview with anyone, yet another person who’s a dogged reporter but can’t write, and so on. We don’t expect everyone to be an expert on everything. And plain shoe-leather reporting is still a killer app.
But we are in a world where data is increasingly ubiquitous. So we need to be able to understand and analyze it if we are to bring more value to information. And if we’re in a world of smaller and smaller newsrooms – as we certainly seem to be – then the flexibility to specialize also shrinks.
There is help on the way. Jonathan and the AP – and others – are aiming to build an open-source system that lets journalists visualize large documents, and they’ve applied for a Knight News Challenge grant for it. (Go there and vote for it.) And tools like Google Refine make it easier for non-specialists to get going.
But it takes more than tools. As Jonathan’s explanation makes clear, you have to really understand the tools you’re using, even if they’re free and freely available. That takes real effort and understanding, and adds to the already packed set of skills and knowledge journalists are expected to have.
How will we manage? How much does journalism education – both the formal, university-based kind, and shorter skills-based seminars – have to evolve to get us trained to function well in this new landscape? What collaborative networks can be formed for specialists to move from project to project, working with newsrooms that have neither the technical skills nor the time or money to acquire them?
Because it’s clear we can do much, much more. But first we have to master the tools.