Can Machines (Fairly) Identify Criminals of the Future?


December 19, 2016
Samantha Apgar

“There’s software used across the country to predict future criminals,” a recent ProPublica article contended, “and it’s biased against blacks.” The software is COMPAS, a risk-assessment program created to evaluate the likelihood that an offender will commit a future crime. An analysis by ProPublica found that COMPAS was almost twice as likely to incorrectly flag black defendants as high risk for recidivism. Conversely, white defendants were frequently mislabeled as low risk. The company that created COMPAS disputed ProPublica’s analysis, asserting that its software is race-neutral.

These issues were at the core of the third workshop of the Optimizing Government project, held on November 3, 2016. In prior seminars, speakers introduced basic machine learning techniques and addressed some of the questions about fairness that those techniques raise. This session deepened the discussion by exploring fairness and performance trade-offs in machine learning. Funded by the Fels Policy Research Initiative, Optimizing Government brings together researchers at the University of Pennsylvania to collaborate on studying the implementation of machine learning in government. This session featured Michael Kearns (Professor and National Center Chair, Department of Computer and Information Science; Founding Director, Warren Center for Network and Data Sciences; Founding Director, Penn Program in Networked and Social Systems Engineering) and Sandra Mayson (Research Fellow, Quattrone Center for the Fair Administration of Justice).

Mayson suggested that it is possible that both ProPublica and the software’s creators are correct in their claims about whether COMPAS is fair. The real issue is that they are using different metrics of fairness. COMPAS’s creators emphasize equal predictive accuracy across populations: a high-risk label should mean the same thing regardless of race. But the base rate for arrest is higher among the black sub-population, which results in a higher chance of mistakenly labeling black defendants as high risk. ProPublica, by contrast, used the metric of disparate impact, focusing on the imbalance in error rates between groups. Ultimately, the bias uncovered in the COMPAS algorithm was a product of unfairness in the real world. Mayson used this compelling example to demonstrate that multiple, conflicting metrics of fairness can apply to the same predictive situation.
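The tension Mayson describes can be made concrete with arithmetic. The sketch below uses entirely invented numbers (group sizes, base rates, and a shared precision of 60%) to show how a predictor that is equally accurate for two groups, in the sense that the same fraction of those flagged high risk actually reoffend, can still produce a much higher false-positive rate for the group with the higher base rate:

```python
# Hypothetical illustration: equal precision (a calibrated high-risk label)
# plus unequal base rates forces unequal false-positive rates.
# All numbers below are invented for the example.

def error_rates(n, base_rate, flag_rate, ppv):
    """Derive false-positive and false-negative rates for one group, given
    group size, reoffense base rate, share flagged high risk, and the
    predictor's precision (PPV: fraction of flagged who reoffend)."""
    flagged = n * flag_rate
    true_pos = flagged * ppv
    false_pos = flagged - true_pos
    actual_pos = n * base_rate          # people who will reoffend
    actual_neg = n - actual_pos         # people who will not
    false_neg = actual_pos - true_pos
    fpr = false_pos / actual_neg        # non-reoffenders flagged high risk
    fnr = false_neg / actual_pos        # reoffenders labeled low risk
    return fpr, fnr

# Both groups get the same 60% precision, but Group A's higher base rate
# means more of Group A is flagged.
fpr_a, fnr_a = error_rates(n=1000, base_rate=0.5, flag_rate=0.6, ppv=0.6)
fpr_b, fnr_b = error_rates(n=1000, base_rate=0.3, flag_rate=0.3, ppv=0.6)

print(f"Group A: false-positive rate {fpr_a:.2f}, false-negative rate {fnr_a:.2f}")
print(f"Group B: false-positive rate {fpr_b:.2f}, false-negative rate {fnr_b:.2f}")
# Group A's false-positive rate (0.48) far exceeds Group B's (0.17),
# even though the high-risk label is equally accurate for both groups.
```

On one metric (precision of the label) the predictor treats the groups identically; on another (false-positive rate) it does not, and with different base rates no predictor can satisfy both at once.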

Kearns, to use his own words, got “into the weeds” on the technical aspects of fairness algorithms by adapting the “multi-armed bandit,” a classic mathematical learning problem, to illustrate the issue of fairness in a sequential decision-making process. Ultimately, the fairness metric he presented would ensure that, within a given pool of applicants, a worse applicant is never favored over a better one. Within a legal framework, this metric would most likely eliminate concerns regarding disparate treatment on the basis of non-merit-relevant factors. It might also eliminate disparate treatment on the basis of traits such as race, sex, or religion, although this would likely require independent criteria to ensure no discrimination occurs. It seems less likely that this metric would satisfy disparate impact fairness measures such as predictive parity or demographic/statistical parity.
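A minimal sketch of this idea, not Kearns’s exact algorithm, with invented reward probabilities: at each round, an arm may be disfavored only once the data show with confidence that some other arm is better. Arms whose confidence intervals still overlap the best arm’s are treated identically, chosen uniformly at random:

```python
import math
import random

# Three "arms" (e.g., applicant pools) with hypothetical, unknown quality.
random.seed(0)
true_means = [0.3, 0.5, 0.7]
counts = [0] * 3      # times each arm has been played
sums = [0.0] * 3      # total reward per arm

def interval(i, t):
    """Confidence interval for arm i's mean reward after t total rounds."""
    if counts[i] == 0:
        return 0.0, 1.0   # no data yet: the arm could be anything
    mean = sums[i] / counts[i]
    width = math.sqrt(2 * math.log(max(t, 2)) / counts[i])
    return mean - width, mean + width

for t in range(1, 3001):
    intervals = [interval(i, t) for i in range(3)]
    best_lower = max(lo for lo, _ in intervals)
    # Fairness constraint: any arm that might still be the best remains
    # eligible, and eligible arms are chosen uniformly -- a worse-looking
    # arm is never disfavored until the data prove it is worse.
    eligible = [i for i, (_, hi) in enumerate(intervals) if hi >= best_lower]
    arm = random.choice(eligible)
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    sums[arm] += reward

print("pulls per arm:", counts)
```

The cost of the constraint is visible in the output: the algorithm keeps playing weaker arms until it can statistically rule them out, so it learns the best arm more slowly than an unconstrained bandit would. That is the fairness/performance trade-off the workshop explored.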

Does the proposed algorithm prioritize the right fairness metrics? The Fels Policy Research Initiative continued this conversation on machine learning on December 9 with a session on Regulating Robo-Advisors Across the Financial Services Industry. The full presentation can be viewed online.
