Posted by: structureofnews | December 23, 2010


A nice, Jeff Jarvis-like title that conjures up images of goose-stepping Nazis trampling on internet freedoms (or enforcing them – whatever).  But this post isn’t about powers of the state, or freedom to publish, or anything like that.  It’s about how databases are really only valuable when they’re complete – cover a universe totally – and are up-to-date.  But it does start with Jeff and Germany.

Last month, Jeff took Germany to task for pushing Google to allow homeowners to opt out of the company’s Street View photos of their homes. Those who don’t want photos of their buildings shown can ask for them to be pixelated, and Jeff’s not happy about it.

It is more offensive than I had imagined, a desecration of the public demanded and abetted by German politicians and media on a supposed privacy frenzy.

His point is that you can see the building on the street, and you can even take a photo of it for your album. But it can’t be shown in Google Street View. Germany, he says, “has stolen from the public.”

This is an issue of publicness. These are public visions now obscured. This is why I am writing a book about protecting the public, from assaults such as this. I can’t write it fast enough.

Needless to say, this got a lot of discussion going – a lot of it very intelligent and nuanced, about the difference between private and public ownership of information, about availability of information and ease of use and access, and the looming issues of public and private privacy.

A couple of days ago Ron Rosenbaum jumped into the fray with a scathing criticism of Jeff’s argument; while I don’t want to get into the middle of that catfight (and in any case, it looks like there’s a growing cottage industry in that business, including this earlier attack), he makes an interesting point about the principles behind a product like Street View.

In the case of Germany, a global corporation is—or was, before the opt-out was allowed—trying to monetize an individual’s privacy, a monetization that is worth more if the company can claim absolute total, or totalitarian completeness.

I’m not sure it’s all that sinister – but it’s certainly true that “completeness” adds a great deal of value to any database.  In fact, I could make the case that it’s fairly basic to creating any real value: Google Maps wouldn’t be worth much if you couldn’t be sure the maps were reasonably complete.  True, there are countries it doesn’t cover – but you know which ones, so those are universes that aren’t included in the “completeness” you expect.  What you do expect is not to come across streets that Google omitted to include because of some random glitch.

And so it is for any newsroom database project – better to be smaller, and more complete, than larger and with unpredictable gaps in it.  In other words, if you have one government department’s budget in detail, and with no omissions, it’s much better to build something based on that, than trying to reach for the entire government budget and not be sure what is and isn’t in it. At least if you’re trying to charge for it.

Similarly, databases are only good as long as the information in them is up to date. Otherwise, they’re functionally incomplete, with the equivalent of random gaps in them. Or like a Google Map that hasn’t been refreshed after major road works.

So this may seem self-evident, but it bears repeating.  Especially since it’s hard work to maintain databases over time, and even more so after prize season is over.   Unless newsrooms get drafted in to keep inputting information – even if it’s just a couple of minutes a day – it’s hard for any dedicated research or projects team to keep it going for any length of time.

True, you can build databases that just scrape or pull information from other, public sources. But there’s limited monetary value in that in the long run; you can’t build a competitive advantage off what someone else can do just as easily. I believe you need your own “secret sauce” to give products that edge, and usually that means some human involvement (although, presumably, it could also be some proprietary algorithm).

Which is another way of saying – figuring out, in advance, how to keep a database going is probably as important as building it in the first place.  And that means figuring out how to define the universe it covers so that the database is complete, and can stay that way.
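To make the idea of a "defined universe" concrete, here is a minimal sketch in Python of the kind of routine audit that could keep a database honest. Everything in it is an assumption for illustration: the `audit` function, the street names, the dates and the 30-day freshness threshold are hypothetical and not taken from any real newsroom system. It simply compares what the database holds against the list of things it claims to cover, and flags entries that haven't been touched recently.

```python
from datetime import date, timedelta


def audit(universe, records, today, max_age_days=30):
    """Check a database against its declared universe.

    universe: set of keys the database promises to cover (hypothetical).
    records:  dict mapping each key actually held to its last-updated date.
    Returns (missing, stale): keys absent from the database, and keys whose
    last update is older than max_age_days (an assumed threshold).
    """
    missing = sorted(set(universe) - set(records))
    cutoff = today - timedelta(days=max_age_days)
    stale = sorted(
        key for key, updated in records.items()
        if key in universe and updated < cutoff
    )
    return missing, stale


# Hypothetical example: a three-street "map" with one gap and one old entry.
universe = {"Main St", "Oak Ave", "Harbour Rd"}
records = {
    "Main St": date(2010, 12, 20),
    "Oak Ave": date(2010, 6, 1),
}

missing, stale = audit(universe, records, today=date(2010, 12, 23))
print(missing)  # ['Harbour Rd']
print(stale)    # ['Oak Ave']
```

The design point is that the universe is written down separately from the data. A smaller universe that passes this check every day is, by the argument above, worth more than a bigger one where nobody can say what is missing.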

Otherwise it may be a fine database or site to explore – and possibly even yield great stories – but it wouldn’t be something you want to depend on to navigate from one part of town to the other, and that’s critical if you want to charge people real money for it.



  1. Who cares what Jeff Jarvis thinks? I mean wtf, is this guy Aristotle? No – But look, he’s got loads of twitter followers so he must be relevant….Bullcrap.

    These are populist thought leaders and we need to take a step back.

    • Well, I wasn’t commenting on Jarvis as much as on Rosenbaum’s observation that the value of Google Street View lies in its completeness – or expectation of completeness – of data. If enough people opt out, that diminishes its monetary worth as a product. Not that all that many people have, in the grand scheme of things.

  2. @Anon – Although I would agree that Jarvis can be annoying, it should also be said that sometimes he has the ability to provide a unique view we wouldn’t be aware of via the usual conventional wisdom. I’ve found it helpful in reading Jarvis to keep in mind the following points:

    Jarvis’ journalism background, Entertainment Weekly and earlier at TV Guide, is from a subset of news which has always had a somewhat different perspective. In the odd symbiosis of paparazzi-driven Planet Hollywood, for example, normal rules of privacy don’t apply. It’s also why we’ve wound up with public figures such as Paris Hilton or the Kardashian sisters, who, in some bizarre form of circular logic, have become famous because they’re famous.

    As the author of What Would Google Do? it’s also evident that Jarvis has had something of a difficult time defending Google when the company has veered from the “do no evil” mantra. This has also been an issue when it comes to the topic, for example, of to what extent the Google tail tends to wag the Internet dog.

    Still, while keeping that back-story in mind, it should be said that the comments on BuzzMachine often raise the level of conversation, and that Jarvis has been open to criticism which doesn’t wander into the area of personal attack.

    @Reg – It strikes me that in this and previous posts about databases, there are some elements which have gone missing:

    The first, at least it seems to me, is that the type of data collected for the use of a newsroom should be different from what it would be for the purpose of marketing because the goals are different. One is about selling stuff; the other is about accuracy and connecting the dots. By accuracy I mean that we are able to make a distinction, semantic or otherwise, between the “Name: Bob Jones” who works as a carpenter in Chicago versus the “Name: Bob Jones” who sells used cars in Denver. From a marketing standpoint, the demographics of the two Bobs may, with the exception of locale, be virtually identical and that’s all that matters; from a news standpoint, the difference between the two Bobs can be a world apart.

    One of the things Jarvis tends to ignore as an advocate of “publicy” is the extent to which not everyone agrees with him. That some people, in the words of the Billie Holiday song, think some things “ain’t nobody’s business.” And under prior circumstances, where on the Internet no one really knows if you’re a dog after all, such a split of viewpoints hasn’t much mattered. But now it does.

    At least part of the business model for current social media such as Facebook is based on the idea that the sharing of personal data can be monetized and sold to third parties. Beyond the notion that this can be a good or bad thing depending on transparency, there is also, at least it seems to me, an insidious trend that those who don’t choose to share personal data are somehow anti-social. As if there’s an implied stigma.

    My point here is not to make a judgment about the rightness or wrongness of such a social media trend but instead to illustrate a pragmatic consideration for a newsroom wanting to get a database down to the level of making a two-Bobs fine-grain distinction.

    And the consideration is this: If you make a requirement for revealing personal data too obtrusive, you’re liable to wind up with dirty data.

    • Perry,

      Thanks for the thoughtful comments. I agree that the comment threads on Buzzmachine can actually be quite good, and are often more enlightening than the original post; but it would be much more helpful (and less annoying) if he would respond more directly to some of the questions raised.

      On the points about databases: You’re right, there are two different goals here, although I guess I’m making the distinction between the Google Street View product (which needs to be accurate, etc., in most details) and the business model of Street View (which is about aggregating demographic and behavioral data to sell targeted advertising.) I’m much less interested in the latter, mostly because I don’t think ad revenue will be a huge driver for news organizations; I think the trick for us is to build out robust data-driven information products off daily newsroom activities that we can charge for (and get ad revenue for as well.) And that does involve – or should involve – accuracy and connecting the dots.

      The site we just launched at the South China Morning Post, whorunsHK, is an example of accuracy and surfacing relationships or key people in Hong Kong that allow you to connect dots; we don’t charge for it yet, but in theory could in the future. In that case, the difference between the two Bob Jones is indeed critical. One of the key questions I think we’ll have over sites/products like this is completeness and up-to-date-ness, as I noted in the post. It’s (relatively) easy to get them going; hard to keep them going, at least on a cost-effective basis.

      The other issue you raise, about privacy and publicness, is an important one for journalism. To a large extent, what journalism does is take what’s private and make it public. It used to be very hard to do (it still is, but it’s easier now) and involved a lot of effort and (for better or worse) journalistic judgment. Now it can be done much faster, and with much less thought. Taking real estate records, say, or even the list (and photos) of people the cops booked the night before, and putting them online – even assuming 100% accuracy in those public records – significantly changes the dynamic of what’s public and what’s private. That’s despite the fact that, in theory, all that information was public before.

      Clearly, increasing the ease of access to information at some point tips over into a change in the kind of access, and we’re just starting to come to grips with that – not just journalists, but society as a whole. Hence Germany and opting out of Street View.

      Much of the debate has centered around the power of the state and large corporations to mine and abuse that information. As journalists get better at doing this, we’ll also fall into the crosshairs. That may be a good thing, or it may not. But it’ll certainly be lively.

      I’m not sure I actually addressed your points, but it did make me think more about this.
