Some quick thoughts, prompted by trawling through sites like Open Secrets and WhoRunsGov for the HKU talk, about data-based sites and what counts as competitive advantage in this day and age. Two things come to mind: One, building any kind of business off public records alone is a recipe for disaster – even marrying two sets of public information together isn’t much of a barrier to competition. Two, the real value in databases is in relationships, not in data fields.
To take them one at a time:
Providing cleaned-up data on campaign contributions, for example, is a great public service. Building nice visualizations around it helps people maximize use of the database, and letting it link to other products, such as Poligraft, adds to its utility. But can you make money from this? That’s a tougher challenge. The problem is that there’s very little competitive advantage here: The records are public, so anyone else can access them. It’s true that the data needs to be cleaned up, and there’s value in doing that, but the fundamental problem is that lots of other people can do that as well. Non-profits – or the government body that provides the data – can get better at giving it out for free, which makes it hard for you to charge for the service. And even if they don’t, other companies can duplicate your processes or database and undercut you. Perhaps it’s not surprising that both Open Secrets and Poligraft are funded by the Sunlight Foundation, a non-profit whose mission is to bring more transparency to government.
For a database to have value, it needs some kind of secret sauce: Something that you can provide that other people can’t. It doesn’t have to be truly secret stuff; it just needs to be something that isn’t easily or publicly available, so you have an edge in providing it. It could be something as simple as a rating you assign – such as Politifact’s Truth-o-Meter readings. It could be some kind of index of crime, say, that you make up. It could be relationship information – spouses, children – that, while widely known, isn’t something that can be scraped off a database or website easily.
Rushing to put out publicly-available data, even with a great visualization, doesn’t really build competitive advantage. You have to add something to it that helps you leverage the database effect* – so that the product not only builds value every day, but also gets harder for someone to match with each passing day.
*database effect: A term I made up. It’s like the network effect – networks increase in value as the number of participants grows – except with data. A database with one piece of information in it has little value. But each new item of information added to it increases the overall value of the database, and increases the cost for a competitor to duplicate it.
Which raises the second point: The real value in many databases isn’t in the data entries themselves, but in the relationships between the entries. In other words, it’s not so much that China Vitae has a hard-to-find collection of senior Chinese officials’ bios, although that’s not bad by itself. Its real value should be in how the people in the database relate to each other. Does official A know official B? How close are they? Did they work in the same place? Are they related to each other, by marriage or other kinship ties? And so on. The same applies to WhoRunsGov – how do congressmen and their staff know each other? How do voting records indicate affinity? How do shared campaign funders indicate likelihood of similar voting patterns? And so on.
Without strong relationship data in a database – and some way of effectively visualizing/surfacing that information – databases become lists, and pages of lists. Do you want to see so-and-so’s CV? Here it is. Bank XYZ’s use of TARP funds? Click here. That’s all well and good – and there are times such information is really valuable. But it underutilizes the real power of databases, which is to bring out patterns and enhance understanding of large sets of information. At the very least, databases should be allowing users to compare information in some useful way.
But to do this well means thinking through the potential relationships that are important first, so that the data structure/taxonomy is created properly from the start; and it also means designing the interface/product so that those relationships are easily surfaced.
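To make that concrete, here is a minimal sketch of what "thinking through the relationships first" might look like in practice. Everything in it is an assumption for illustration: the table names (`person`, `posting`), the officials, the ministries and the years are all made up, and a real project would need a much richer taxonomy. The point is only that once career postings are stored as their own records, the question "did official A and official B work in the same place at the same time?" becomes a simple query instead of a reading exercise.

```python
import sqlite3

# Hypothetical schema: postings are stored as their own records,
# so shared workplaces can be queried rather than read off CVs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE posting (
    person_id    INTEGER NOT NULL REFERENCES person(id),
    organisation TEXT    NOT NULL,
    start_year   INTEGER NOT NULL,
    end_year     INTEGER NOT NULL
);
""")

# Made-up sample data, purely for illustration.
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, "Official A"), (2, "Official B"), (3, "Official C")],
)
conn.executemany(
    "INSERT INTO posting VALUES (?, ?, ?, ?)",
    [
        (1, "Ministry X", 1995, 2000),
        (2, "Ministry X", 1998, 2003),
        (3, "Ministry X", 2005, 2008),
        (3, "Province Y", 1996, 2001),
    ],
)


def shared_postings(connection):
    """Return pairs of people whose time at the same organisation overlapped."""
    return connection.execute("""
        SELECT a.name, b.name, p1.organisation,
               MAX(p1.start_year, p2.start_year) AS overlap_start,
               MIN(p1.end_year, p2.end_year)     AS overlap_end
        FROM posting p1
        JOIN posting p2
          ON p1.organisation = p2.organisation
         AND p1.person_id < p2.person_id          -- each pair only once
        JOIN person a ON a.id = p1.person_id
        JOIN person b ON b.id = p2.person_id
        WHERE p1.start_year <= p2.end_year        -- the two date ranges overlap
          AND p2.start_year <= p1.end_year
        ORDER BY a.name, b.name
    """).fetchall()


for row in shared_postings(conn):
    print(row)
```

With this sample data the only pair that turns up is Official A and Official B, who overlapped at Ministry X from 1998 to 2000. Official C worked at the same ministry, but later, so no link is recorded. That distinction is lost if the database only stores a free-text CV per person.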
Different databases and different countries will have different things that are important. In China, it may be where people worked together; or it may be where an official’s proteges are that indicates true power or influence. In Washington, it may be how well congressional staff know each other; or it may be where funding comes from. In a database of school results, it may matter who the teacher is, or it may be the socio-economic background of the students. I have no idea – but this is where specialist – and journalist – knowledge is useful. We should have a good sense of what matters and what doesn’t, and look to build databases that show key relationships, including inventing and assigning our own indexes/rankings/scores to some of these relationships.
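As a rough illustration of what inventing our own score could look like, here is a small sketch. The `Tie` fields, the weights and the caps are all hypothetical; they stand in for exactly the kind of editorial judgement described above, and someone who knows the beat would choose different ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tie:
    """A hypothetical record describing the link between two people."""
    years_worked_together: int
    shared_funders: int
    related_by_kinship: bool


# Assumed weights, chosen so the maximum possible score is 100.
# These are an editorial call, not a standard formula.
POINTS_PER_YEAR = 4      # capped at 10 years    -> up to 40 points
POINTS_PER_FUNDER = 5    # capped at 5 funders   -> up to 25 points
KINSHIP_POINTS = 35      # marriage or family tie


def closeness_score(tie: Tie) -> int:
    """Combine several kinds of relationship into one 0-100 score."""
    score = POINTS_PER_YEAR * min(tie.years_worked_together, 10)
    score += POINTS_PER_FUNDER * min(tie.shared_funders, 5)
    if tie.related_by_kinship:
        score += KINSHIP_POINTS
    return score


colleagues = Tie(years_worked_together=3, shared_funders=2, related_by_kinship=False)
print(closeness_score(colleagues))  # 4*3 + 5*2 = 22
```

The arithmetic is trivial on purpose. The hard-to-copy part is the judgement behind the weights and the reporting needed to fill in the fields.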
That helps users – and it helps us. Finding, surfacing and updating those relationships – especially if they don’t come from a public source that can be updated easily – gives us a secret sauce in our databases that makes them hard to copy. Which is the first step towards finding a way to get value – meaning money – out of them.