
Compliance in AI Underwriting: Fair Lending and Bias Risks

AI underwriting models can produce discriminatory outcomes even without explicitly protected class inputs, due to proxy variables and feedback loops. Compliance requires model documentation, disparate impact testing, and ongoing monitoring — not just a one-time pre-deployment review.


I work in compliance at a mid-size regional carrier. When our actuarial team started integrating a third-party AI scoring model into our personal auto underwriting process, I got pulled in to review it from a regulatory standpoint. What I found was that we had no clear framework for evaluating algorithmic bias — no documented methodology, no testing protocol, and a vendor contract that said almost nothing about what data the model was trained on.

That experience pushed me to build one. What follows is the framework I use now, along with the specific legal traps that catch carriers off guard and the practical steps that actually reduce your exposure.

The Problem: Why AI Creates New Fair Lending Exposure

Traditional underwriting rules are relatively transparent. You can trace a declination to a specific rating factor — credit score, loss history, ZIP code. Regulators can audit those factors and compare them to filed rate rules.

AI underwriting models work differently. A gradient boosting model or neural network might incorporate hundreds of features, and the relationship between those features and the output isn’t always legible, even to the vendor. That opacity creates three distinct compliance risks.

Proxy discrimination. A variable that seems neutral can serve as a statistical proxy for race, national origin, or other protected characteristics. ZIP code is the classic example — it correlates strongly with race due to decades of residential segregation. But AI models can find subtler proxies: shopping hour patterns, device type, payment method cadence. The CFPB has been explicit that lenders can’t escape liability by claiming ignorance of proxy relationships in their models.

Feedback loops. If a model is trained on historical approval and loss data, and that historical data reflects discriminatory underwriting decisions, the model will learn to replicate those decisions. A model trained on 20 years of commercial property approvals that systematically underserved Black-owned businesses will continue to do so, not because of malicious intent, but because discrimination is baked into the training signal.

Disparate impact without intent. Under ECOA, the Fair Housing Act, and many state insurance statutes, discriminatory intent isn’t required. If a practice has a statistically significant disparate impact on a protected class and isn’t justified by business necessity, it can violate the law. AI models that no one designed to discriminate can still fail a disparate impact analysis.

Step 1: Inventory What Your Model Actually Uses

Before you can assess bias risk, you need to know what variables are in the model. This sounds obvious, but it’s frequently a problem with third-party vendor models.

Start by requesting the full feature list from your vendor. If they refuse to disclose it under a trade secret claim, that’s itself a red flag — and something you should escalate before the model goes live. For models you build internally, document every input at the time of model development, not after the fact.

For each variable, assess:

  • Is it directly correlated with any protected class at the ZIP code, census tract, or individual level?
  • Has it been flagged in prior litigation as a proxy variable? (ZIP code, credit score, and education level all have significant case law.)
  • Does the training data reflect a period or region with documented discriminatory practices?

There’s no automated tool that does this comprehensively. Tools like IBM AI Fairness 360 and Microsoft Fairlearn can help quantify disparate impact in outputs, but identifying proxy variables at the input stage requires human judgment from someone who knows both the data and the regulatory history.

Step 2: Run Disparate Impact Testing Before Deployment

The standard methodology is the four-fifths rule, borrowed from employment discrimination law. If the approval rate for a protected group is less than 80% of the approval rate for the highest-approved group, you have a potential disparate impact problem.
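As a sketch of the arithmetic, the four-fifths check is a simple ratio comparison. The group names and counts below are hypothetical; a real analysis would run against your production decision data with BISG-estimated group membership.

```python
# Hypothetical approval counts by estimated group -- illustrative only.
approvals = {"group_a": 820, "group_b": 610}
applications = {"group_a": 1000, "group_b": 1000}

def four_fifths_check(approvals, applications, threshold=0.8):
    """For each group, return (ratio to highest approval rate, flagged).

    A group is flagged when its approval rate falls below `threshold`
    (80%) of the highest group's approval rate -- the four-fifths rule.
    """
    rates = {g: approvals[g] / applications[g] for g in approvals}
    reference = max(rates.values())
    return {g: (rate / reference, rate / reference < threshold)
            for g, rate in rates.items()}

result = four_fifths_check(approvals, applications)
# group_b's approval rate (61%) is ~74% of group_a's (82%), so it is flagged.
```

Note that the four-fifths rule is a screening heuristic, not a legal safe harbor: ratios above 0.8 don't immunize you, and statistically significant disparities below that line still warrant analysis.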

For insurance underwriting, you often don’t have direct protected class data — insurance applications don’t ask for race. This is where Bayesian Improved Surname Geocoding (BISG) comes in. BISG uses surname and geographic data to estimate race probability at the applicant level. It’s not perfect, but it’s a defensible methodology that regulators and plaintiffs’ attorneys both use. The CFPB and HUD have accepted it in enforcement contexts.

Run your disparate impact analysis on:

  • Approval/declination rates
  • Premium tier placement
  • Rate-level differences for equivalent risk profiles

Document everything. If you find a disparate impact, the business necessity defense requires demonstrating that the variable causing the impact is actually predictive and that no less discriminatory alternative achieves comparable accuracy. That’s a high bar. In practice, if BISG analysis shows a significant disparity, you need to either modify the model or be prepared to defend it.

Step 3: Document Model Governance Formally

Regulators are increasingly asking for model governance documentation. The NAIC’s AI Principles — adopted by a majority of state insurance commissioners — call for accountability, transparency, and fairness in AI systems. California DOI has issued specific bulletin guidance on algorithmic underwriting. New York has used its existing unfair trade practices authority to examine AI models. Colorado passed SB 169 in 2021 requiring carriers to test for unfair discrimination in external consumer data and algorithms.

Your model governance documentation should include:

  • Model purpose and scope
  • Training data sources and vintage
  • Feature list with documented justification for inclusion
  • Disparate impact analysis results (pre-deployment and ongoing)
  • Model validation results from an independent team
  • Escalation process for detected bias
  • Monitoring cadence and thresholds that trigger review

This documentation isn’t just for regulators. When a plaintiff’s attorney subpoenas your model records in a discrimination case, this is what you’ll be producing. If the documentation doesn’t exist, it looks like you weren’t paying attention — or that you were deliberately avoiding creating a record.

Step 4: Establish Ongoing Monitoring, Not Just Pre-Deployment Checks

The most common mistake I see is treating bias review as a one-time event at model deployment. Models drift. The population applying for coverage changes. Economic conditions shift the distribution of inputs. A model that passed disparate impact testing in 2022 may be behaving differently in 2025.

Build a quarterly monitoring process that re-runs your BISG disparate impact analysis on production decisions. Set thresholds: if the approval rate ratio for any protected class drops below 0.85 relative to the reference group, the model gets flagged for human review before the next underwriting cycle.
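In code, the quarterly threshold check is straightforward. The group labels and rates here are hypothetical stand-ins for your BISG-weighted production numbers, and the 0.85 threshold is the one described above — tighter than the four-fifths screen, deliberately, so you catch drift before it becomes a legal problem.

```python
MONITORING_THRESHOLD = 0.85  # internal review trigger, stricter than 0.80

def flag_for_review(approval_rates, reference_group):
    """Return the groups whose approval-rate ratio against the
    reference group falls below the monitoring threshold."""
    ref = approval_rates[reference_group]
    return [g for g, rate in approval_rates.items()
            if g != reference_group and rate / ref < MONITORING_THRESHOLD]

# Hypothetical quarterly production rates by BISG-estimated group.
q3_rates = {"reference": 0.81, "group_1": 0.72, "group_2": 0.66}
flagged = flag_for_review(q3_rates, "reference")
# group_1's ratio (~0.89) passes; group_2's (~0.81) is flagged for review.
```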

Also monitor model explanations over time. Tools like SHAP (SHapley Additive exPlanations) let you decompose individual model decisions into feature contributions. If you start seeing a proxy variable like “distance from downtown” suddenly accounting for 30% of the model’s variance in a region with documented racial residential patterns, that should trigger a review even if your aggregate disparate impact numbers still look acceptable.
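Once you have a matrix of SHAP values for a quarter’s decisions (one row per decision, one column per feature — e.g. from `shap.TreeExplainer`), the attribution check reduces to comparing each feature’s share of total mean absolute attribution against a review threshold. The sketch below uses synthetic data and a hypothetical feature list; mean |SHAP| share is one reasonable proxy for “share of the model’s variance,” not the only one.

```python
import numpy as np

def attribution_shares(shap_values, feature_names):
    """Each feature's share of total mean |SHAP| attribution."""
    mean_abs = np.abs(shap_values).mean(axis=0)
    shares = mean_abs / mean_abs.sum()
    return dict(zip(feature_names, shares))

# Synthetic SHAP matrix standing in for a quarter of production
# decisions: the first feature dominates attribution by construction.
rng = np.random.default_rng(0)
shap_values = rng.normal(scale=[0.5, 0.1, 0.1], size=(1000, 3))
features = ["distance_from_downtown", "vehicle_age", "prior_claims"]

shares = attribution_shares(shap_values, features)
flagged = [f for f, s in shares.items() if s > 0.30]  # review threshold
```

Run this per region, not just in aggregate — the scenario described above is a proxy variable dominating in one geography while the book-wide numbers look fine.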

Common Mistakes That Create Liability

Relying on vendor attestations. A vendor saying their model is “fair and compliant” is not legal protection. You’re the entity making the underwriting decision and you’re accountable for the outcomes. Get specific — ask for their disparate impact test results on a dataset similar to yours, ask what methodology they used, and ask what they do when bias is detected.

Using protected class exclusion as a compliance strategy. Some teams think that if they simply exclude race, sex, and national origin from model inputs, they’re covered. They’re not. Proxy variables can recreate protected class signal without the protected class label appearing in the data.

Treating adverse action notices as sufficient documentation. AI-based declinations require meaningful adverse action notices under ECOA. “Score below threshold” is not a specific enough reason. The CFPB’s 2023 guidance made clear that generic adverse action reasons don’t satisfy the requirement when AI is involved. Your notices need to reflect the actual principal reasons for the decision, which means your model needs to produce interpretable explanations at the individual level.

Skipping the legal review on training data licensing. Some AI underwriting vendors use third-party consumer data — telematics, retail behavior, alternative credit signals. Make sure your data licensing agreements permit the specific use case you’re deploying. Using data outside its licensed purpose is a separate regulatory exposure on top of the bias analysis.

What to Do This Week

If you’re using an AI underwriting model and haven’t done a formal disparate impact review, start there. You don’t need an expensive consulting engagement to run a basic BISG analysis on your approval data — someone with Python and access to the Census surname list can do a preliminary screen.

If you’re evaluating a new vendor model, add a contract requirement that the vendor provide feature documentation and disparate impact test results as a condition of deployment. Vendors who’ve done this work will have the documentation. Vendors who push back are telling you something important.

Finally, read Colorado SB 169 and California DOI Bulletin 2022-5 if you haven’t. Even if you don’t operate in those states, they’re the best preview available of where the rest of state insurance regulation is heading. The days of buying a black-box model, deploying it, and hoping no one asks questions are over.
