Who do you think you are?
I was looking at an interesting book, Whistling Vivaldi: How Stereotypes Affect Us and What We Can Do, by Claude Steele last night, and was struck by some experiments he describes that show how wildly variable test results can be, depending on how they’re described to the people who are taking them. And that should be a reminder about really understanding all the biases and factors that go into creating data.
White students at Princeton who were told their scores on a game of miniature golf reflected their natural athletic ability did much worse than those who were just told to play a round; Black students in the same experiment showed no such variation. On the other hand, when the Black subjects were told the game was a test of “sports strategic intelligence,” they did much worse than those who weren’t told anything.
What’s going on here? Steele makes the case that we internalize stereotypes – so White students are unconsciously living up to the gross generalization that they don’t have tremendous natural physical ability, and the Black students are similarly hobbled by an internalized view that they’re not great at strategic thinking.
It’s an interesting theory – and I confess I haven’t finished the book, so I don’t know much more than that – but the experimental results are remarkable. He cites another study where women do better on a math test when they’re told it doesn’t reflect gender differences than when they simply take the test – again, evidence that our internal perceptions of ourselves, and about what we are and aren’t supposed to be good at, affect our actual performance.
Which is really fascinating stuff, and throws up all sorts of questions about identity, stereotypes and self-stigmatization. But what has this got to do with data journalism?
Well, OK, not much. At least, not much directly. But it does point to the need to better understand all the biases in the data we collect and use. All data has bias, of course – in terms of what it’s intended to measure, how it was collected and categorized, what it does and doesn’t cover, and so on. And good journalists recognize that, and adjust for it, in the same way that good journalists understand the biases of their sources and adjust for those, too.
What these experiments highlight is how many more things could bias the data we get that aren’t visible at all. We may have girls’ and boys’ test scores, for example, and even study the questions in detail – but it’s hard to know, unless it’s documented somewhere, or part of an experiment, what was said to the test-takers, or how the test was described to them. (Other researchers have also cataloged the impact of “priming” people with images, words or numbers before tests, but there’s clearly lots we don’t know about what factors affect test results. And that’s just tests – what about the impact of extraneous factors on other types of data-gathering?)
Of course, that’s not to say we should throw all – or any – data, or data journalism, out. It’s a key part of the modern world, and hence should be a key part of modern journalism. But this is a lesson in making sure we understand as much as we can about how any piece of information is collected – even the parts of it that aren’t readily apparent – so that we can come to smarter, and more nuanced findings.