Posted by: structureofnews | June 18, 2017

Unpicking The Algorithm

Just wanted to flag an Upshot story that ran in the New York Times the other day, looking into the algorithm that the Chicago Police Department uses to predict who is most likely to be involved in a shooting (whether as the shooter or as the victim).

As I’ve mentioned before – and Nick Diakopoulos has campaigned about – we ought to be doing more to cover the algorithms that rule increasing parts of our lives, so it’s great when there’s a piece that does exactly that.

The story, by Jeff Asher and Rob Arthur, takes as a starting point the limited public information about the algorithm – known as the Strategic Subject List – that’s available, then tries to figure out how it works.

But using the publicly available data that (the CPD) have released, we reverse-engineered the impact of each characteristic on the final risk scores with a linear regression model. Because the department didn’t release all the information that the algorithm uses, our estimates of the significance of each characteristic are only approximate. But using what was available to us, we could predict risk score very accurately, suggesting that we are capturing much of the important information that goes into the algorithm.
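
For anyone curious what that kind of reverse-engineering looks like in practice, here is a minimal sketch. It is not the Upshot's code and it does not use the CPD data: the feature names, weights, rates and the "hidden" factor below are all invented for illustration. It builds a made-up scoring rule, hides one of its inputs (standing in for information the department didn't release), and then fits an ordinary least-squares regression to the published scores to see how much of the rule can be recovered.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations.
    Returns [intercept, coef_1, coef_2, ...]."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(rows, y, beta):
    """Share of the variance in y that the fitted model explains."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1 - ss_res / ss_tot

# Everything below is synthetic: hypothetical features, weights and rates.
FEATURES = ["shooting_victim", "assault_victim", "weapons_arrest", "gang_affiliation"]
TRUE_WEIGHTS = [30.0, 20.0, 5.0, 1.0]   # the "secret" rule we try to recover
RATES = [0.2, 0.3, 0.3, 0.16]           # how often each flag is set

rng = random.Random(0)
rows, scores = [], []
for _ in range(2000):
    flags = [1.0 if rng.random() < p else 0.0 for p in RATES]
    hidden = 10.0 if rng.random() < 0.1 else 0.0  # input we never get to see
    rows.append(flags)
    scores.append(50.0 + sum(w * v for w, v in zip(TRUE_WEIGHTS, flags)) + hidden)

beta = fit_ols(rows, scores)
r2 = r_squared(rows, scores, beta)

for name, b in zip(FEATURES, beta[1:]):
    print(f"{name:18s} {b:6.2f}")
print(f"R^2 = {r2:.3f}")
```

In this toy setup the fitted coefficients land close to the invented weights, and the fit is good but not perfect because of the input that was held back. That is the same shape of result the Upshot describes: approximate estimates of each factor's influence, with a high but imperfect ability to predict the score.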

It’s a nice piece of work that helps shed light on what’s almost certainly an important policy and policing tool in Chicago.  It isn’t clear if the algorithm works well or not – gun violence remains a problem – but just being able to show what factors are taken into account is already an important public service.

In particular, victims of assault and battery or shootings were much more likely to be involved in future shootings. Arrests for domestic violence, weapons or drugs were much less predictive. Gang affiliation, which applied to 16.3 percent of people on the list, had barely any impact on the risk score.

The algorithm has been updated several times, and (Illinois Institute of Technology lead researcher on the project Miles) Wernick noted that the variables of gang affiliation and narcotics arrests were dropped from the most recent version.

There’s nothing wrong in theory, of course, with using an algorithm such as this one to help prioritize the use of limited resources – or even to take human bias out of decision-making, so we shouldn’t be approaching these stories with a bias that algorithmic decision-making is a bad thing.

But while we can talk to humans about why they made certain decisions, it can be both much harder and much easier to do so with machines.  Easier because – unless it’s a machine-learning algorithm that’s a black box – there’s generally a codified path of logic we can follow to see how a particular decision was arrived at.  And harder because that codified path of logic is rarely disclosed.

Hence the need to better understand how these algorithms work: if not exactly, then at least with a sense of what kinds of variables they take into account.

Or, as the ACLU’s director of police practices in Illinois, Karen Sheley, was quoted in the Chicago Sun-Times as saying:

“If the government is going to outsource decision-making to a computer, the public should be able to examine how the decisions are made and whether it’s fair and effective.”

So far, the CPD hasn’t been tremendously forthcoming with those details, although they did release a list with names redacted that the Upshot used to reverse-engineer the algorithm.   As the NYT story notes:

To date, the Chicago Police Department has declined to release details of the algorithm, citing proprietary technology. (Last week, The Chicago Sun-Times and three independent journalists filed a Freedom of Information Act suit against Chicago and its police department to release full information on the algorithm.)

There’s clearly a lot more journalists and media organizations should do to keep an eye on how such algorithms work, what factors go into them, how they’re designed, who designs them, and how they’re regulated and overseen.  It’s a huge area that we don’t do anywhere near as good a job covering as we should, partly because it’s hard to get real information, but also partly because it doesn’t fall into some of the classic themes of journalism, with reasonably well-delineated notions of right and wrong, or standards we expect to be adhered to.

Because, frankly, we’re still at the start of really understanding what we expect from the algorithms that rule us – what does fair treatment mean, especially in an age of mass and granular personalization?  Should everyone be charged the same price for the same product, or is it OK when Staples prices products differently based on your location? Does it matter what factors a gun violence algorithm takes into account if it helps bring down deaths – not that it seems to?  And so on.

There’s much work that we need to dive into, and the sooner we get into high gear on this, the better.

