Where does data come from? Governments, of course. Official institutions like the World Bank. Companies that vacuum it up and sell it.
But increasingly, it comes from people, too.
Consider the Adjunct Project, which invites “contingent faculty” at universities around the country to submit to a database (actually, a Google spreadsheet) what they are paid for each 3-credit course they teach. It also asks whether the school is unionized and whether benefits are included, and seems to have 1,500 entries so far.
True, the data isn’t particularly clean, and there’s no real verification of either the information or the contributors. And crowdsourcing certainly isn’t all that novel an idea.
But it’s a huge step forward to gather information in a structured format so it can be much more easily analyzed. And it’s basically free – as opposed to having some university (or news organization) pay for a costly survey of adjunct rates.
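To make the “structured format” point concrete, here is a minimal Python sketch. It assumes a handful of hypothetical rows with fields like the ones the Adjunct Project asks about (pay per 3-credit course, whether the school is unionized, whether benefits are included); the field names, schools, and dollar figures are invented for illustration and are not taken from the actual spreadsheet. Once submissions share the same columns, a comparison like “median pay at unionized vs. non-unionized schools” takes a few lines instead of a survey.

```python
from statistics import median

# Hypothetical rows: field names and values are invented for illustration,
# loosely modeled on what the Adjunct Project asks contributors to report.
submissions = [
    {"school": "School A", "pay_per_course": 2700, "unionized": True, "benefits": False},
    {"school": "School B", "pay_per_course": 3100, "unionized": True, "benefits": True},
    {"school": "School C", "pay_per_course": 2000, "unionized": False, "benefits": False},
    {"school": "School D", "pay_per_course": 2400, "unionized": False, "benefits": False},
    {"school": "School E", "pay_per_course": None, "unionized": False, "benefits": True},  # incomplete entry
]


def median_pay_by_union_status(rows):
    """Group reported per-course pay by union status and return each group's median.

    Rows with missing pay are skipped, since crowdsourced data is rarely clean.
    """
    groups = {True: [], False: []}
    for row in rows:
        pay = row.get("pay_per_course")
        if pay is None:
            continue  # skip incomplete submissions rather than guessing a value
        groups[row["unionized"]].append(pay)
    return {
        "unionized" if status else "not unionized": median(values)
        for status, values in groups.items()
        if values
    }


print(median_pay_by_union_status(submissions))
# {'unionized': 2900.0, 'not unionized': 2200.0}
```

Skipping incomplete rows is one simple choice; with real crowdsourced data you would also want to flag outliers and duplicate submissions before drawing conclusions.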
The potential of crowdsourced data for journalism is the theme of a fascinating piece by Javaun Moradi at NPR, though he emphasizes automated collection of sensor data rather than data people contribute by hand. But the core idea is the same: citizen-sourced data is a great opportunity for news organizations.
If stage 1 of data journalism was “find and scrape data,” then stage 2 was “ask government agencies to release data” in easy-to-use formats. Stage 3 is going to be “make your own data,” and those sources of data are going to be automated and updated in real time.
He writes about a site called Pachube (pronounced “PATCH bay”), which is trying to get citizens to collect local air quality data by building an “open air quality sensor network.” The idea is for them to set up sensors around the country that would feed data into a central network.
Look outside your window — have you ever wondered what the quality of the air is out there? I mean RIGHT. OUT. THERE. 12 inches from your face. If so, you are out of luck. The air quality data collected by the government is likely sampled from far, far away and then applied to you on a regional level, almost completely useless from the standpoint of trying to understand or change the local dynamics of pollution that affect you. Not good.
Pachube’s goal is to be a center for such sensor data – after all, data is only valuable if there’s enough of it. And to some extent, that requires energizing people to want to collect information on a specific topic, regularly, and in a particular format. You’d have to worry about the quality of the data, but the possibilities are enormous.
Especially in less-open societies, where official data can be hard to get. The New York Times reported in January on a group of ordinary Chinese citizens who, fed up with government stalling on the pea soup that passes for air in Beijing, banded together to acquire their own air quality monitoring equipment and post the readings.
That began a chain reaction. Volunteers in Shanghai and Guangzhou purchased monitors in December, followed by citizens in Wenzhou, who are selling oranges to finance their device. Wenzhou donated $50 to volunteers in Wuhan, 140 miles inland.
In Beijing, air quality has been a huge issue for ages, but it took the release of this data to push the authorities to do something about it. Which goes to show that data doesn’t have to come from official sources, and that you don’t need a large organization to amass it. Or any formal organization at all.
But it also shows the importance of setting common standards so that people can easily contribute to a single dataset and use the numbers consistently. And it shows the enormous impact that simply collecting and disseminating numbers can have, especially when the government can’t really dispute the data, which is another great advantage of crowdsourced data.
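What a “common standard” might mean in practice can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical shared record format for volunteer PM2.5 readings (a station ID, a timestamp with a UTC offset, and a concentration in micrograms per cubic meter). The field names, validation rules, and sample readings are invented for illustration; they are not Pachube’s actual format, nor real data from Beijing or Shanghai. Each group’s submissions are checked against the shared format before being pooled, so malformed entries are counted and set aside instead of quietly corrupting the combined dataset.

```python
from dataclasses import dataclass
from datetime import datetime


# A hypothetical shared format for volunteer air-quality readings. The field
# names, units, and rules are illustrative assumptions, not the format used by
# Pachube or by any of the Chinese volunteer groups.
@dataclass(frozen=True)
class Reading:
    station_id: str
    timestamp: datetime  # must carry a UTC offset so readings from different cities line up
    pm25_ugm3: float     # fine particulate matter, micrograms per cubic meter


def parse_reading(raw: dict) -> Reading:
    """Convert one contributed record into the shared format, or raise an error."""
    station = str(raw.get("station_id", "")).strip()
    if not station:
        raise ValueError("missing station_id")
    ts = datetime.fromisoformat(raw["timestamp"])
    if ts.tzinfo is None:
        raise ValueError("timestamp needs a UTC offset")
    pm25 = float(raw["pm25_ugm3"])
    if pm25 < 0:
        raise ValueError("PM2.5 concentration cannot be negative")
    return Reading(station, ts, pm25)


def merge_contributions(batches):
    """Pool batches from different groups, keeping valid readings and counting rejects."""
    accepted, rejected = [], 0
    for batch in batches:
        for raw in batch:
            try:
                accepted.append(parse_reading(raw))
            except (KeyError, TypeError, ValueError):
                rejected += 1  # missing field, wrong type, or rule violation
    return accepted, rejected


# Invented sample batches from two hypothetical volunteer groups.
beijing = [
    {"station_id": "bj-01", "timestamp": "2012-01-15T08:00:00+08:00", "pm25_ugm3": 180.5},
    {"station_id": "bj-02", "timestamp": "2012-01-15 08:00", "pm25_ugm3": 150},  # no UTC offset
]
shanghai = [
    {"station_id": "sh-01", "timestamp": "2012-01-15T08:00:00+08:00", "pm25_ugm3": "95"},
    {"station_id": "", "timestamp": "2012-01-15T08:00:00+08:00", "pm25_ugm3": 60},  # no station
]

readings, rejected = merge_contributions([beijing, shanghai])
print(len(readings), rejected)  # 2 2
```

Requiring a UTC offset is the kind of small rule that matters once volunteers in different places contribute: without it, readings taken at the “same” hour cannot be compared reliably.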
What else can communities collect and release?