Posted by: structureofnews | March 17, 2012

The Data of Crowds

Where does data come from?  Governments, of course.  Official institutions like the World Bank.  Companies that vacuum it up and sell it.

But increasingly, it comes from people, too.

Consider the Adjunct Project, which invites “contingent faculty” at universities around the country to submit to a database (actually, a Google spreadsheet) details of what they get paid for each 3-credit course they teach.  It also asks whether the school is unionized and whether benefits are included, and seems to have 1,500 entries so far.

True, the data isn’t particularly clean; and there’s no real verification of the information or the contributors.  And certainly crowdsourcing isn’t all that novel an idea.

But it’s a huge step forward to gather information in a structured format so it can be much more easily analyzed.  And it’s basically free – as opposed to having some university (or news organization) pay for a costly survey of adjunct rates.
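To make that concrete, here is a minimal sketch of the kind of analysis a structured spreadsheet allows. The column names and every value below are invented for illustration; they are not the Adjunct Project's actual fields or data.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Hypothetical rows standing in for a spreadsheet export; the column
# names and all values are invented for illustration.
SAMPLE_CSV = """school,pay_per_course,unionized
College A,2500,yes
College B,3100,no
College C,not sure,yes
College D,4000,yes
College E,2900,no
"""

def median_pay_by_union_status(csv_text):
    """Group usable rows by the 'unionized' column and return median pay."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            pay = float(row["pay_per_course"])
        except ValueError:
            continue  # skip entries that are not a plain number
        groups[row["unionized"].strip().lower()].append(pay)
    return {status: median(values) for status, values in groups.items()}

print(median_pay_by_union_status(SAMPLE_CSV))
```

The arithmetic is trivial, and the unverified entry is simply skipped rather than checked. The point is only that once contributions arrive in consistent columns, a summary like this takes a few lines instead of a commissioned survey.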

That’s sort of the theme of a fascinating piece by Javaun Moradi at NPR about the potential of crowd-sourced data for journalism organizations, with an emphasis on automated collection of sensor data rather than crowd-contributed data.  But the core idea is that citizen-sourced data is a great opportunity for news organizations.

If stage 1 of data journalism was “find and scrape data”, then stage 2 was “ask government agencies to release data” in easy to use formats. Stage 3 is going to be “make your own data”, and those sources of data are going to be automated and updated in real-time.

He writes about a site called Pachube (pronounced “PATCH bay”), which is trying to get citizens to collect data on local air quality by building an “open air quality sensor network.”  The idea is for them to set up sensors around the country that would feed data into a central network.

Look outside your window — have you ever wondered what the quality of the air is out there? I mean RIGHT. OUT. THERE. 12 inches from your face. If so, you are out of luck. The air quality data collected by the government is likely sampled from far, far away and then applied to you on a regional level, almost completely useless from the standpoint of trying to understand or change the local dynamics of pollution that affect you. Not good.

Pachube’s goal is to be a center for such sensor data – after all, data is only valuable if there’s enough of it.   And to some extent, that requires energizing people to want to collect information on a specific topic, regularly, and in a particular format.  You’d have to worry about the quality of the data, but the possibilities are enormous.

That’s especially true in less-open societies, where official data can be hard to get.  The New York Times reported in January on a bunch of ordinary Chinese citizens who, fed up with government stalling on the pea soup that passes for air in Beijing, banded together to get their own air quality monitoring equipment and then posted the readings.

That began a chain reaction. Volunteers in Shanghai and Guangzhou purchased monitors in December, followed by citizens in Wenzhou, who are selling oranges to finance their device. Wenzhou donated $50 to volunteers in Wuhan, 140 miles inland.

In Beijing, air quality has been a huge issue for ages; but it took the release of this data to push authorities to do something about it.  Which goes to show that data doesn’t have to be an official thing, and that you don’t need a large organization to amass it.  Or any formal organization at all.

But it also shows the importance of setting common standards so that people can easily contribute to a single dataset and use the numbers in a consistent way.   And it shows the enormous impact simply collecting and disseminating numbers can have; especially when the government can’t really dispute the data – another great advantage of crowdsourced data.
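As a rough illustration of what a common standard might look like in practice, here is a small sketch that converts volunteers' readings into one shared record format. The field names, the unit table and the sample readings are all assumptions made up for this example; they do not describe Pachube's format or the Chinese volunteers' actual data.

```python
from datetime import datetime, timezone

# Hypothetical shared format: every reading names a city, a UTC timestamp,
# and a PM2.5 concentration in micrograms per cubic meter.
UNIT_FACTORS = {"ug/m3": 1.0, "mg/m3": 1000.0}

def normalize_reading(raw):
    """Convert one volunteer's reading to the shared format, or raise ValueError."""
    unit = raw.get("unit")
    if unit not in UNIT_FACTORS:
        raise ValueError(f"unknown unit: {unit!r}")
    value = float(raw["pm25"]) * UNIT_FACTORS[unit]
    if value < 0:
        raise ValueError("concentration cannot be negative")
    # Timestamps are assumed to arrive as ISO 8601 strings with a UTC offset.
    when = datetime.fromisoformat(raw["time"]).astimezone(timezone.utc)
    return {
        "city": raw["city"].strip().title(),
        "time_utc": when.isoformat(),
        "pm25_ug_m3": value,
    }

# Invented sample readings, reported in two different units.
readings = [
    {"city": "beijing", "time": "2012-01-15T08:00:00+08:00", "pm25": "210", "unit": "ug/m3"},
    {"city": "Shanghai ", "time": "2012-01-15T08:00:00+08:00", "pm25": "0.09", "unit": "mg/m3"},
]
for r in readings:
    print(normalize_reading(r))
```

Rejecting a reading with an unknown unit, rather than guessing, is a deliberate choice here: a shared dataset is only consistent if everything in it has passed through the same conversion.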

What else can communities collect and release?


  1. This is a really thoughtful post Reg! I’ve thought about disaster response — citizens put together an ad hoc geiger counter network after Fukushima — but hadn’t considered the applicability to closed societies.

    There’s also the promise of greater information awareness to help citizens make better choices. There’s an NYC project called “Don’t Flush Me” which uses an inexpensive proximity sensor to monitor when storm sewers are about to overflow. The sensor will send data to the cloud and when an “overflow” situation is about to occur, citizens can avoid flushing their toilets to prevent sewage from flowing into the harbor.

    I learned about this one via AJ Fisher’s piece on “Sensor Commons”

    • Javaun, I’d read the NYT piece a while back, but hadn’t thought through all the possibilities until I saw your post – sensors are great as a crowd-sourced source of data, since the information from them is more difficult to fake (I think), but I suspect crowd-entered data may have a real role, too, especially if there’s a big enough stream of data that can provide some basis for cross-checking and verification.
