Interpretability

Official Definition

The degree to which a cause-and-effect relationship within an AI system can be observed, measured, and tested to characterize errors and understand their origins.

Source: AIEOG AI Lexicon (Feb 2026), adapted from NIST AI 100-1

What interpretability means in plain language

Interpretability is the degree to which humans can understand how an AI system works internally. While explainability focuses on describing what a model does (its input-output relationship), interpretability focuses on understanding why and how it does it (the internal mechanics).

A highly interpretable model is one where you can trace the logic from input to output and understand the causal chain. A decision tree is interpretable because you can follow each branching decision. A deep neural network with millions of parameters is typically not interpretable because the interactions are too complex for human comprehension.
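
As a concrete illustration, the sketch below (a minimal example using scikit-learn on synthetic data; the feature names are invented for illustration) trains a shallow decision tree and prints its complete decision logic as human-readable rules:

```python
# Minimal sketch: a decision tree's full logic can be printed and audited.
# Uses scikit-learn; the data and feature names are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
feature_names = ["income", "debt_ratio", "credit_age", "utilization"]  # illustrative only

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Every prediction can be traced through these if/else branches by a human reviewer.
print(export_text(tree, feature_names=feature_names))
```

No comparable printout exists for a deep neural network; its millions of parameters do not decompose into rules a reviewer can follow.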

The distinction between explainability and interpretability matters for governance. Explainability can be achieved through post-hoc techniques applied to any model. Interpretability is a property of the model itself and requires choosing architectures that support it.

Why it matters in financial services

Interpretability supports multiple regulatory and governance objectives:

  - Root cause analysis when models produce unexpected results
  - Bias identification, by understanding which features drive decisions
  - Debugging and error correction
  - Demonstrating to examiners that the institution understands its models
  - Confidence in model behavior under novel conditions

Regulators increasingly expect institutions to demonstrate an understanding of how their AI models work, not just what they produce. For high-risk applications such as credit decisioning and BSA/AML, that understanding is a baseline supervisory expectation.

Key considerations for compliance teams

  1. Assess interpretability needs. Determine the level of interpretability required for each AI use case based on its risk profile.
  2. Favor interpretable models. When performance allows, choose inherently interpretable model architectures over black-box alternatives.
  3. Use interpretability tools. For complex models, deploy interpretability tools (feature importance, partial dependence plots, attention maps) to provide insight; see the sketch after this list.
  4. Document interpretability assessments. Record the level of interpretability achieved for each model and any limitations.
  5. Include interpretability in validation. Validators should assess whether the model’s behavior can be understood well enough to evaluate fitness for purpose.
  6. Train model users. Ensure the people using model outputs understand how the model works at a level appropriate to their role.
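
As a concrete illustration of the tools named in consideration 3, the sketch below (a minimal example using scikit-learn and synthetic data; the model, dataset, and parameters are placeholders, not a recommended configuration) computes permutation feature importance and renders a partial dependence display for a gradient-boosted classifier:

```python
# Minimal sketch of two post-hoc interpretability tools: permutation feature
# importance and partial dependence. scikit-learn with synthetic placeholder data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Permutation importance: how much does shuffling each feature degrade the score?
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: {score:.3f}")

# Partial dependence: the model's average predicted response as one feature
# varies (requires matplotlib to render the plot).
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
```

Outputs like these provide post-hoc insight into a complex model rather than making it interpretable in the strict sense used above, so their limitations belong in the documentation described in consideration 4.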
