More examples of what you can find when you dig through the tons of data we throw off each day:
The New York Times recently published an interesting piece about how researchers are trawling through the millions of data points on online dating sites to better understand, well, love. As the piece notes, people will tell you one thing – but how they actually behave is far more interesting. Not to mention far more authentic.
Andrew T. Fiore, a data scientist at Facebook and a former visiting assistant professor at Michigan State University, said that unlike laboratory studies, “online dating provides an ecologically valid or true-to-life context for examining the risks, uncertainties and rewards of initiating real relationships with real people at an unprecedented scale.”
“As more and more of life happens online, it’s less and less the case that online is a vacuum,” he added. “It is life.”
And that’s the point about mining “data exhaust,” broadly. There’s lots of information we don’t intend to tell researchers, but digital trails are the byproduct of modern living – and because they’re digitized, they’re that much easier to obtain and analyze.
Researchers of online dating information, for example, have figured out what daters are more likely to lie about, what kind of men women are really looking for (and vice versa) and how often someone will date outside their race. And much more.
Such deep dives into big data aren’t simply confined to matters of the heart, of course; they can also go into the spine and other parts of the body, as The Wall Street Journal’s great series on Medicare abuse documented last year. Combing through a huge trove of billing data allowed the paper to spot anomalies it could then investigate, uncovering cases of fraud.
The potential for finding gems in haystacks of data is clear, but it takes people with the skills and the desire to find them.
But a central problem is that Medicare hasn’t fully exploited its most valuable resource: its claims database, a computerized record of every claim submitted and every dollar paid out.
“That’s really the crux of the issue,” said Kimberly Brandt, who led Medicare’s antifraud efforts from 2004 through June of this year. She said the program is “definitely on the right path” to making better use of its database, “but it’s not going to be a flip of the switch or an easy transition.”
And, of course, there’s also the flip side of all that data – often public information – out there and how it can be analyzed to reveal things that perhaps shouldn’t be revealed. Another NYT story shows how smart algorithms can surface information we don’t particularly want surfaced – in this case, the identities of a blogger’s children. It focuses on how Klout, a site that figures out how influential you are on the social web, created a page for blogger Maggie Leifer McGary’s 13-year-old son simply because she was his Facebook friend.
The Klout kerfuffle is a parable of what can happen when you have an active digital social life. Not only do you leave your own digital footprints everywhere, but you can also drag your online friends with you from site to site, even if they have no interest in going there.
There’s almost certainly a debate coming about how best to regulate or deal with a world where so much can be known about us. Who knows how that discussion will play out, but in the meantime, we’re already living in that brave new world.