Auditing Algorithmic Risk

1. The Ethical Matrix is based on a bioethical framework originally conceived by bioethicist Ben Mepham for the sake of running ethical experiments. For a detailed presentation, see C. O’Neil and H. Gunn, “Near-Term Artificial Intelligence and the Ethical Matrix,” chap. 8 in “Ethics of Artificial Intelligence,” ed. S.M. Liao (New York: Oxford University Press, 2020).

2. C. O’Neil, H. Sargeant, and J. Appel, “Explainable Fairness in Regulatory Algorithmic Auditing,” West Virginia Law Review, forthcoming.

3. See N. Bhutta, A. Hizmo, and D. Ringo, “How Much Does Racial Bias Affect Mortgage Lending? Evidence From Human and Algorithmic Credit Decisions,” Finance and Economics Discussion Series 2022-067, PDF file (Washington, D.C.: Board of Governors of the Federal Reserve System, October 2022), www.federalreserve.gov. Table A6 is particularly relevant.

4. M. Leonhardt, “Black and Hispanic Americans Often Have Lower Credit Scores — Here’s Why They’re Hit Harder,” CNBC, Jan. 28, 2021, www.cnbc.com.

5. B. Luthi, “How to Add Rent Payments to Your Credit Reports,” myFICO, Dec. 14, 2022, www.myfico.com.

6. The four-fifths rule is not a law but a rule of thumb from the U.S. Equal Employment Opportunity Commission. It says that selection rates for different groups of candidates for a job or promotion (such as people of different ethnicities) should not differ too much. Specifically, the rate for the group with the lowest selection rate should be at least four-fifths that of the group with the highest selection rate. See more at “Select Issues: Assessing Adverse Impact in Software, Algorithms, and Artificial Intelligence Used in Employment Selection Procedures Under Title VII of the Civil Rights Act of 1964,” U.S. Equal Employment Opportunity Commission, May 18, 2023, www.eeoc.gov.

7. “SB21-169 — Protecting Consumers From Unfair Discrimination in Insurance Practices,” Colorado Department of Regulatory Agencies, Division of Insurance, accessed April 24, 2024, https://doi.colorado.gov.

8. “3 CCR 702-10 Unfair Discrimination Draft Proposed New Regulation 10-2-xx,” Colorado Department of Regulatory Agencies, Division of Insurance, accessed April 24, 2024, https://doi.colorado.gov.

9. The draft regulations also define these terms. “Statistically significant” means having a p-value below 0.05, and “substantial” means a difference in approval rates, or in price per $1,000 of face amount, of more than 5 percentage points. The details of the further tests are beyond the scope of this article, but the main idea is to inspect whether “external consumer data and information sources” (that is, nontraditional rating variables, such as cutting-edge risk scores, which insurers often purchase from third-party vendors) used in underwriting and pricing are correlated with race in a way that contributes to the observed differences in denial rates or prices. If inspection shows they are, then the insurer must “immediately take reasonable steps developed as part of [its] risk management framework to remediate the unfairly discriminatory outcome.”

10. P. Liang, R. Bommasani, T. Lee, et al., “Holistic Evaluation of Language Models,” Transactions on Machine Learning Research, published online Aug. 23, 2023, https://openreview.net.

11. D. Hendrycks, C. Burns, S. Basart, et al., “Measuring Massive Multitask Language Understanding,” arXiv, published online Sept. 7, 2020, https://arxiv.org.

12. Liang et al., “Holistic Evaluation of Language Models.”

13. B. Edwards, “Anthropic’s Claude 3 Causes Stir by Seeming to Realize When It Was Being Tested,” Ars Technica, March 5, 2024, https://arstechnica.com.

14. A. Zou, Z. Wang, N. Carlini, et al., “Universal and Transferable Adversarial Attacks on Aligned Language Models,” arXiv, published online July 27, 2023, https://arxiv.org; and D. Ganguli, L. Lovitt, J. Kernion, et al., “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned,” arXiv, published online Aug. 23, 2022, https://arxiv.org.

15. L. McCarthy, “A Wellness Chatbot Is Offline After Its ‘Harmful’ Focus on Weight Loss,” The New York Times, June 8, 2023, www.nytimes.com.

16. K. Wells, “National Eating Disorders Association Phases Out Human Helpline, Pivots to Chatbot,” NPR, May 31, 2023, www.npr.org.

17. By “sketch,” we mean we are imagining the stakeholders and their concerns. Truly creating an Ethical Matrix for this use case would entail interviewing real representatives of these stakeholder groups. In this article, we approach it as a thought experiment.

18. C. Lecher, “NYC’s AI Chatbot Tells Businesses to Break the Law,” The Markup, March 29, 2024, https://themarkup.org.

19. G.A. Fowler, “TurboTax and H&R Block Now Use AI for Tax Advice. It’s Awful,” The Washington Post, March 4, 2024, www.washingtonpost.com.
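
The numeric thresholds described in notes 6 and 9 can be illustrated with a short Python sketch. It is not taken from the EEOC guidance or the Colorado draft regulation. The function names and the sample rates are hypothetical, the p-value is assumed to be supplied by whatever statistical test an auditor chooses, and "statistically significant" is assumed to mean a p-value below 0.05. The two draft-regulation conditions are reported separately because the notes do not say how they are combined.

```python
def four_fifths_ratio(selection_rates: dict[str, float]) -> float:
    """Return the lowest group selection rate divided by the highest.

    `selection_rates` maps a (hypothetical) group label to the share of
    that group's candidates who were selected, as a fraction in [0, 1].
    """
    if len(selection_rates) < 2:
        raise ValueError("need selection rates for at least two groups")
    highest = max(selection_rates.values())
    lowest = min(selection_rates.values())
    if highest <= 0:
        raise ValueError("the highest selection rate must be positive")
    return lowest / highest


def passes_four_fifths(selection_rates: dict[str, float]) -> bool:
    """Rule of thumb from note 6: the ratio should be at least 4/5."""
    return four_fifths_ratio(selection_rates) >= 0.8


def draft_regulation_flags(rate_a: float, rate_b: float, p_value: float) -> dict[str, bool]:
    """Apply the two definitions summarized in note 9 to approval rates.

    Assumptions: rates are fractions in [0, 1]; `p_value` comes from a
    test chosen by the auditor (the notes do not name one); significance
    is read as p < 0.05; "substantial" is a gap above 5 percentage points.
    """
    gap_in_points = abs(rate_a - rate_b) * 100
    return {
        "statistically_significant": p_value < 0.05,
        "substantial": gap_in_points > 5,
    }


# Hypothetical example: 60% of one group and 42% of another are selected.
example_rates = {"group_a": 0.60, "group_b": 0.42}
print(passes_four_fifths(example_rates))
print(draft_regulation_flags(0.60, 0.42, p_value=0.01))
```

In this made-up example the ratio is roughly 0.7, which falls short of four-fifths, and the 18-point gap exceeds the 5-point threshold. A real audit would also need the sample sizes and a defensible choice of significance test.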

“The MIT Sloan Management Review is a research-based magazine and digital platform for business executives published at the MIT Sloan School of Management.”
