What’s the difference between data journalism and computer-assisted reporting? Or, for that matter, database journalism (and thanks to whoever stuck my name in there) or computational journalism? And does it matter?
At some level, it’s one of those semantic things. We kinda know what we’re talking about, and for practitioners of the dark arts of journalism, that’s probably good enough. But there are enough differences, too, given how the world has changed and how journalists (and journalism) approach data that we should care, as a post by Alex Howard at O’Reilly Radar makes clear. He interviews Liliana Bounegru of datadrivenjournalism.net, who talks about how, while there’s a continuum from past practices to the present,
…in the past, investigative reporters would suffer from a poverty of information relating to a question they were trying to answer or an issue that they were trying to address. While this is, of course, still the case, there is also an overwhelming abundance of information that journalists don’t necessarily know what to do with. They don’t know how to get value out of data.
Which is a great way of framing it. It’s still hard to get data, as Liliana notes, and much of journalists’ effort continues to be tied up in writing FOIA requests or otherwise wrangling access to data. But the tide is turning, at least in broad terms. There’s more and more data available – even when it’s not officially available, people sometimes just create it themselves, even in China – and increasingly journalism has to focus its efforts on analyzing, visualizing and presenting conclusions from the data. That’s a broad shift from old-fashioned CAR, one that opens up both opportunities and challenges for newsrooms.
The flood of data now available means newsrooms can’t depend on small, dedicated CAR teams for data analysis anymore; that’s simply not scalable. Instead, data is a game everyone – or most everyone – in the newsroom has to get into. That means lots more training of existing staff or hiring people with data and numeracy skills. And newsrooms, by and large, can’t opt out if they want to stay relevant; the public availability of data (and data analysis tools) means that non-journalists will be crunching the numbers as well, often in competition with newsrooms. That may be great for the public interest – and there’s no reason why newsrooms shouldn’t collaborate more with their audience – but it doesn’t do much for the bottom line of news organizations.
Similarly, newsrooms don’t have the same kind of monopoly on publication that they once did. Anyone can set up a website; anyone can build a data visualization and post it for the world to see. That’s another area in which journalists now face competition.
On the flip side, there are real potential business models here – whether in creating databases and selling access to them, in offering data-gathering or analysis skills as consultants, and so on. That raises other issues – about where the line between journalism and research (or private investigation) lies – but it does move us away from depending only on advertising, subscription or grant money.
Regardless, data analysis is already a huge part of our lives, whether we realize it or not. A recent piece in the Sunday New York Times magazine highlighted the work done by statistician Andrew Pole, who works at Target, to try to identify pregnant women in the store’s database of shoppers. The idea was to present them with targeted sales pitches near their delivery dates, so that the store could inculcate new shopping habits in them. The trick was figuring out who was pregnant without asking them.
He ran test after test, analyzing the data, and before long some useful patterns emerged. Lotions, for example. Lots of people buy lotion, but one of Pole’s colleagues noticed that women on the baby registry were buying larger quantities of unscented lotion around the beginning of their second trimester. Another analyst noted that sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium and zinc. Many shoppers purchase soap and cotton balls, but when someone suddenly starts buying lots of scent-free soap and extra-big bags of cotton balls, in addition to hand sanitizers and washcloths, it signals they could be getting close to their delivery date.
As Pole’s computers crawled through the data, he was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.
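For the curious, here is roughly what "analyzed together" could look like in practice. This is only a sketch, and it is not Target's actual model: the product names, the weights, the baseline value and the `prediction_score` function are all made up for illustration. The idea is simply that each signal product nudges a shopper's score up, and a logistic function squeezes the total into a number between 0 and 1.

```python
import math

# Hypothetical weights for a handful of "signal" products.
# Illustrative values only; the real model and its numbers are not public.
WEIGHTS = {
    "unscented lotion": 1.2,
    "calcium supplement": 0.8,
    "magnesium supplement": 0.7,
    "zinc supplement": 0.7,
    "scent-free soap": 1.0,
    "large bag of cotton balls": 0.9,
}

# Assumed baseline: with no signal products, the score should be low.
BASELINE = -3.0


def prediction_score(basket):
    """Return a score between 0 and 1 for a set of purchased products."""
    # Add up the weights of every signal product found in the basket.
    total = BASELINE + sum(
        weight for product, weight in WEIGHTS.items() if product in basket
    )
    # The logistic function maps any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-total))


if __name__ == "__main__":
    few_signals = {"unscented lotion", "bread"}
    many_signals = {
        "unscented lotion",
        "calcium supplement",
        "zinc supplement",
        "scent-free soap",
        "large bag of cotton balls",
    }
    print(round(prediction_score(few_signals), 3))
    print(round(prediction_score(many_signals), 3))
```

In a real analysis the weights would be estimated from data, for instance from shoppers already known to be on a baby registry, rather than typed in by hand. The point for newsrooms is that the underlying arithmetic is not exotic.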
If journalism is to be relevant in an age when this level of data analysis is already taking place, we need to develop the skills and understanding that allow us to draw insights from the abundance of data that’s out there.