Posted by: structureofnews | November 24, 2010

Data Wrangling Gets Easier

Google has a new tool – Google Refine 2.0 – which it describes as “a power tool for data wranglers.” It’s based on Freebase Gridworks, which Google picked up when it acquired Metaweb.

“Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.

“Freebase Gridworks 1.0 has already been well received by the data journalism and open government data communities (you can read how the Chicago Tribune, ProPublica and […] have used it) and we are very excited by what they and others will be able to do with this new release.”

It looks pretty funky, and seems reasonably straightforward to use. I haven’t tried it yet, but based on the instructional videos that have been posted, some stuff – basic data cleaning in a spreadsheet format – looks very simple and intuitive; the more complicated work – like linking fields across databases – looks, well, more complicated. But not show-stoppingly difficult.
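To make “basic data cleaning” a little more concrete, here is a rough Python sketch of one common cleanup step: grouping values that differ only in case, punctuation, spacing or word order so they can be merged. It is not Google Refine’s own code, and it isn’t meant to show how the tool works internally; the function names and the sample values are assumptions made up purely for illustration.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Reduce a value to a key that ignores case, punctuation, spacing and word order."""
    # Lowercase and strip punctuation (hypothetical normalization rules).
    cleaned = value.lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop duplicate words, and sort them.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group raw values whose fingerprints are identical."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


# Made-up sample column with inconsistent spellings of the same name.
names = ["Chicago Tribune", "chicago tribune ", "Tribune, Chicago", "ProPublica"]
clusters = cluster(names)

for key, members in clusters.items():
    print(key, "<-", members)
```

Each group that ends up with more than one member is a candidate for merging into a single spelling. A person still has to decide which spelling is the right one, which is the “think through how you’re going to get there” part.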

Overall, it looks like a great addition to any journalist’s toolkit, but it’s not a plug-and-play solution: you have to be able to visualize what you want and think through how you’re going to get there. It’s not a magical thing that does all the work for you. But it makes all the steps along the way easier. Plus: It’s free.

It’s tools like this that can help spread the use and understanding of data in newsrooms; the easier it is to get to an end point, the more people are willing to invest some effort in it. (That’s one reason it’s important to get journalists familiar enough with Excel that they use it as instinctively as Word.)

Tim Berners-Lee, in a recent speech, talked about how he sees data as the future of journalism.

“Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you’ll do it that way sometimes. But now it’s also going to be about poring over data and equipping yourself with the tools to analyse it and picking out what’s interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what’s going on in the country.”

I hope chatting in bars never goes away, but I do think we’ll be spending more time thinking about data and figuring out what it means.  The simpler the tools are to do that, the more we’ll do it – and the better the results will be.   So here’s hoping Google Refine moves us along that path.


