The question comes up as we increasingly turn to algorithms and digital platforms to manage many of the things we used to do offline. Offline, there are any number of laws and practices that regulate behavior. Online – well, that’s a whole new landscape. And one we ought to cover a lot more.
There was an interesting NYT piece from a little while back about start-ups that are trying to use data analysis to make lending decisions – not so much looking into people’s credit history, but throwing together thousands of seemingly unrelated pieces of information to predict borrowing (and repayment) behavior.
No single signal is definitive, but each is a piece in a mosaic, a predictive picture, compiled by collecting an array of information from diverse sources, including household buying habits, bill-paying records and social network connections. It amounts to a digital-age spin on the most basic principle of banking: Know your customer.
Does it make sense that people who capitalize properly are better credit risks than people who don’t? Well, the people who make the software not only don’t care, they’d rather not even try to understand it.
“It is important to maintain the discipline of not trying to explain too much,” said Max Levchin, chief executive of Affirm (one of the companies profiled). Adding human assumptions, he noted, could introduce bias into the data analysis.
True, it may be great that machine-learning systems that crunch lots of data can find correlations that allow more people more access to credit than under the traditional banking system. But what if it turns out that it’s denying credit to certain groups – not intentionally, but simply because that’s the way the data correlates?
The danger is that with so much data and so much complexity, an automated system is in control. The software could end up discriminating against certain racial or ethnic groups without being programmed to do so.
But if you wanted to program in less discrimination, how would you do it – and how much less would you want, if it wound up being less effective at channeling money to people who need it and aren’t being served by the existing banking system?
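To make that worry concrete, here is a minimal, purely hypothetical sketch (every name and number below is invented): a lending rule that never sees race, only a ZIP-code-derived score, can still approve two groups at very different rates if neighborhoods are segregated.

```python
import random

random.seed(0)

# Invented setup: two groups living in partly segregated ZIP codes,
# so group membership correlates with a ZIP-derived score.
applicants = []
for _ in range(1000):
    group = random.choice(["A", "B"])
    if group == "A":
        zip_score = random.gauss(0.7, 0.15)
    else:
        zip_score = random.gauss(0.5, 0.15)
    applicants.append((group, zip_score))

# The lending rule never looks at `group` -- only at the score.
def approve(zip_score):
    return zip_score > 0.6

rate = {}
for g in ("A", "B"):
    members = [a for a in applicants if a[0] == g]
    rate[g] = sum(approve(z) for _, z in members) / len(members)

print(f"approval rate, group A: {rate['A']:.2f}")
print(f"approval rate, group B: {rate['B']:.2f}")
```

Nothing in the rule mentions race; the gap comes entirely from the correlation between group and ZIP code – which is exactly why stripping the sensitive field out of the data is not, by itself, a fix.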
Or what if it isn’t the machine, but just a lot of people who, each acting on their own, wind up exhibiting discriminatory behavior en masse?
Consider this much-cited paper by Harvard Business School professors Benjamin Edelman and Michael Luca, who showed that non-Black hosts in New York City manage to charge, on average, 12 percent more than Black hosts on Airbnb, after correcting for location, ratings, quality and so on. In that case it wasn’t so much an algorithm determining prices as it was lots of individuals declining to pay Black hosts as much as non-Black hosts for similar accommodations. Is that Airbnb’s fault?
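The logic of “correcting for location, ratings, quality” can be illustrated with a crude stand-in: compare prices only between listings that match on those controls. This is not the authors’ actual method (they run regressions over real listing data); it is a stratified comparison in the same spirit, on invented numbers chosen for illustration.

```python
from collections import defaultdict

# Invented listings: (host_group, neighborhood, rating, nightly_price).
listings = [
    ("non-Black", "Harlem", 4, 110), ("Black", "Harlem", 4, 98),
    ("non-Black", "Harlem", 5, 130), ("Black", "Harlem", 5, 118),
    ("non-Black", "SoHo", 4, 210), ("Black", "SoHo", 4, 185),
    ("non-Black", "SoHo", 5, 240), ("Black", "SoHo", 5, 214),
]

# Compare prices only within the same (neighborhood, rating) stratum --
# a rough stand-in for "correcting for location, ratings, quality".
strata = defaultdict(dict)
for group, hood, rating, price in listings:
    strata[(hood, rating)][group] = price

gaps = []
for prices in strata.values():
    if "Black" in prices and "non-Black" in prices:
        gaps.append(prices["non-Black"] / prices["Black"] - 1)

avg_gap = sum(gaps) / len(gaps)
print(f"average within-stratum premium for non-Black hosts: {avg_gap:.1%}")
```

The point of stratifying (or of regression controls) is to rule out the innocent explanation – that non-Black hosts simply have listings in pricier neighborhoods or with better ratings – before attributing the gap to the host’s race.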
“African Americans have a lower acceptance rate than white folks in Airbnb,” Nikki Silvestri, executive director at Green For All, told her audience at SXSW Eco in October. “On Uber too, people were canceling rides if they got drivers whose appearance they didn’t like.”
(Some riders feel it may work the other way, too.)
Professors Edelman and Luca suggest that Airbnb could help cut down such discrimination by not posting photos of prospective hosts; but it isn’t clear that it’s the company’s job to police – or discourage – deplorable human behavior.
In any case, as this Fortune piece notes, there are so many proxies for race – and other factors – that simply eliminating photos may not be enough.
With only a tiny bit of data—the lowly ZIP code—it’s possible for marketers to infer a world of information about any given U.S. consumer.
With a ZIP code, a marketer can make a reasonable guess at a person’s income. With tools such as Prizm and Esri, they can probe deeper to determine education level and family composition, lifestyle and spending patterns, even hopes and dreams.
Even names can yield too much information, as a 2013 study of searches on Google showed. Searches that included “black-sounding” first names, such as DeShawn, Darnell and Jermaine, generated ads that suggested an arrest record significantly more often than searches for “white-sounding” names.
Are online ads suggesting the existence of an arrest record misleading if no one with that name actually has an arrest record? Assume the ads are free speech; what happens when these ads appear more often for one racial group than another? Not everyone is being equally affected by the free speech. Is that free speech or is it racial discrimination?
Good question. But we can’t even begin to really answer that question until we get a better handle on what the various algorithms in our lives are throwing out. As Nick Diakopoulos noted in a good 2014 paper, it’s not easy reverse-engineering black boxes. Sometimes even their creators don’t really know what’s inside.
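One common reverse-engineering move Diakopoulos describes is simple in principle: feed the black box carefully varied inputs and watch what comes out. A toy sketch – the scoring function below stands in for the black box, and all features and weights are invented:

```python
# Hypothetical black box: we can call it, but not read its internals.
def black_box_score(income, zip_score, capitalizes_properly):
    return 0.5 * income / 100_000 + 0.3 * zip_score + 0.2 * capitalizes_properly

# Probe: hold everything fixed, change one input, record the shift.
baseline = {"income": 60_000, "zip_score": 0.5, "capitalizes_properly": 1}

def sensitivity(feature, alt_value):
    probe = dict(baseline, **{feature: alt_value})
    return black_box_score(**probe) - black_box_score(**baseline)

for feature, alt in [("zip_score", 0.8), ("capitalizes_properly", 0)]:
    print(f"{feature}: score shifts by {sensitivity(feature, alt):+.3f}")
```

In practice this is far harder than the sketch suggests – real systems may rate-limit probes, behave non-deterministically, or respond to interactions among thousands of features – which is Diakopoulos’s point about why even creators can lose track of what’s inside.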
And then there’s the broader philosophical question about what society ought to do if it finds the machines are discriminating, or facilitating discrimination. One suggestion is for regular government audits of algorithmic decisions. Regardless of what proposals prevail, given how fast everything moves in the digital age, and how quickly these systems are spreading into our lives – will your digital trail affect your job prospects, or the prices you’re offered online? – it’s not too early for journalists to be flagging this as something we ought to discuss.
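One concrete form such an audit could take already exists in U.S. employment-discrimination analysis: the “four-fifths rule,” which flags a selection process when one group’s selection rate falls below 80 percent of the most-favored group’s. A minimal sketch, on invented outcome counts:

```python
# Invented audit data: (approved, total applicants) per group.
outcomes = {"group_1": (480, 800), "group_2": (300, 750)}

rates = {g: approved / total for g, (approved, total) in outcomes.items()}
highest = max(rates.values())

# Four-fifths rule: flag any group whose selection rate is
# under 80% of the most-favored group's rate.
flags = {g: r / highest < 0.8 for g, r in rates.items()}

for g in outcomes:
    status = "FLAG" if flags[g] else "ok"
    print(f"{g}: rate {rates[g]:.2f}, ratio {rates[g] / highest:.2f} -> {status}")
```

An auditor running this on a lender’s approval logs would need only group-level outcome counts, not the model’s internals – which is exactly why outcome audits are often proposed for systems nobody can fully explain.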