I was talking – more precisely, listening – to some smart colleagues discussing data visualization the other day, and something one of them said struck me. He was talking about how visualizations could help users cut through complexity and a flood of information – but that depended on a level of trust. Not just in the quality of the data that’s being visualized, but in how it’s selected, analyzed and ultimately presented.
Perhaps that’s self-evident – and to some degree it is – but it’s an important issue that needs to be addressed as data becomes more and more part of the daily work of journalism.
We’re very used to the notion of disclosing – where we know them – the biases of the people we interview and quote. Readers, we hope, have a fairly well-attuned sense of what statements seem right, which are self-serving, and what appears to be pure hyperbole. (OK, so maybe I’m expecting a lot here – but certainly these are skills that we use in everyday life, whether sizing up the used-car salesman or figuring out if your teenaged son really did his homework.)
Data is different. We’re much less used to examining the fundamental biases of the data we work with. True, good data journalists – and there are a lot of them out there – do this as a matter of course, and they filter out a lot of the bad stuff before it even makes it into a story. But readers as a whole have less experience in questioning how data is collected and assembled and what the basic assumptions are that go into making databases.
As Susan McGregor, a professor at Columbia University’s Tow Center for Digital Journalism notes in a video interview:
Using data is often like using responses to an interview that someone else wrote. You don’t know necessarily what the biases or objectives were that went into collecting a certain set of data. It’s up to you as the journalist to research that, find out what the implications are, why were these particular questions asked, what did the answers really mean.
To say that a number is “true” is the same as saying that quoting someone is “true.” But we know not to quote out of context.
(The video is from a series that NPR’s Lam Thuy Vo made for a course she’s teaching at UMass Amherst; there’s one of me as well. I’m sure my mother will watch it, doubling the page views.)
Here’s a real example – in a very smart story, Sasha Chavkin at CJR takes apart the contradictory numbers about ad spending in this political season. It isn’t an investigative piece in the sense that it uncovers wrongdoing; but it does dissect in detail how the data on ad buys is collected, and shows how journalists often don’t look beyond the headline number.
Which is a longish way of saying there are two somewhat contradictory forces at play here. At one level – re my colleague’s comment – as we turn more and more to algorithms and visualizations to help us understand the world, we need to invest in them a level of trust that their inner workings make sense and aren’t biased (or broken). But, per Susan’s comments, it’s also important that we keep a reasonably high level of skepticism about the data we’re using, and work to educate readers about where the flaws in the numbers are, in the same way that we point out the biases of the people we interview.
Perhaps that lowers the overall faith in the stories we create – but in the longer term, one hopes, it gives people greater confidence in, and more understanding of, the data that increasingly pervades our lives.