Data quality/validity

Official Definition

The usefulness, accuracy, and correctness of data for its application.

Source: AIEOG AI Lexicon (Feb 2026), NIST Big Data Interoperability Framework: Volume 4

What data quality/validity means in plain language

Data quality refers to whether data is fit for its intended use. Accurate, complete, timely, consistent, and relevant data is high quality. Data that is missing values, contains errors, is outdated, or does not represent the population it claims to represent is low quality.

For AI systems, data quality is not just a nice-to-have. It directly determines model performance. An AI model trained on low-quality data will produce low-quality outputs, regardless of how sophisticated the algorithm is. The principle is straightforward: bad data in, bad decisions out.

Data quality has multiple dimensions:

  • Accuracy. Does the data correctly represent reality?
  • Completeness. Is the data missing important values or records?
  • Timeliness. Is the data current enough for its intended use?
  • Consistency. Is the data formatted and coded consistently across sources?
  • Representativeness. Does the data adequately represent the population or conditions the AI system will encounter?
  • Relevance. Is the data actually useful for the intended application?

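Several of these dimensions can be measured directly. The sketch below shows one way to compute simple completeness, timeliness, and consistency scores over a batch of records; the field names, records, and code sets are purely illustrative, not drawn from any real dataset or standard.

```python
from datetime import date

# Hypothetical records; field names are illustrative only.
records = [
    {"id": 1, "income": 52000, "state": "CA", "updated": date(2025, 1, 10)},
    {"id": 2, "income": None,  "state": "ca", "updated": date(2023, 6, 1)},
    {"id": 3, "income": 71000, "state": "NY", "updated": date(2025, 1, 12)},
]

def completeness(records, field):
    """Share of records with a non-missing value for `field`."""
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

def timeliness(records, field, max_age_days, today):
    """Share of records updated within `max_age_days` of `today`."""
    fresh = sum(1 for r in records if (today - r[field]).days <= max_age_days)
    return fresh / len(records)

def consistency(records, field, allowed):
    """Share of records whose value for `field` is in an allowed code set."""
    ok = sum(1 for r in records if r.get(field) in allowed)
    return ok / len(records)

today = date(2025, 1, 15)
print(round(completeness(records, "income"), 2))              # missing income in one record
print(round(timeliness(records, "updated", 365, today), 2))   # one stale record
print(round(consistency(records, "state", {"CA", "NY"}), 2))  # lowercase "ca" fails
```

Scores like these do not capture accuracy or representativeness, which usually require comparison against an external ground truth or population benchmark, but they give a repeatable baseline that can be tracked over time.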
Why it matters in financial services

Data quality is a recurring theme in regulatory guidance and examination findings. Regulators expect financial institutions to maintain high data quality standards across all business functions, and AI systems heighten that expectation.

  • Model risk. The OCC’s Comptroller’s Handbook identifies data quality as a critical factor in model risk. Models built on poor data are unreliable, and institutions are expected to assess and document data quality as part of model development and validation.
  • BSA/AML. Transaction monitoring effectiveness depends on the quality of underlying transaction data. Missing, inaccurate, or delayed data can cause monitoring systems to miss suspicious activity.
  • Fair lending. Data quality issues can introduce bias into lending models. Incomplete data on certain demographic groups can lead to models that perform poorly for those groups.
  • Regulatory reporting. Inaccurate data flowing into regulatory reports (Call Reports, HMDA data, SAR filings) creates regulatory exposure.

Key considerations for compliance teams

  1. Establish data quality standards. Define minimum quality requirements for data used in AI systems, including accuracy, completeness, timeliness, and representativeness thresholds.
  2. Assess quality before model training. Conduct formal data quality assessments before using any dataset for model development. Document findings and any remediation steps taken.
  3. Implement automated quality monitoring. Deploy automated checks in data pipelines that flag quality issues in real time before they reach AI models.
  4. Document data quality decisions. When data quality issues are identified, document the decision to proceed, remediate, or reject the data, along with the rationale.
  5. Include data quality in validation. Model validation should assess the quality of the data the model was trained on and the data it processes in production.
  6. Report on data quality. Include data quality metrics in governance reporting to ensure leadership has visibility into this foundational risk factor.

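Steps 3 and 4 above can be combined into a simple automated gate: measured quality metrics are compared against minimum thresholds, and a failing batch is held for review rather than silently passed to the model. A minimal sketch, with threshold values and metric names chosen for illustration only:

```python
# A minimal quality gate for a data pipeline. Thresholds and metric
# names are illustrative assumptions, not regulatory standards.

def quality_gate(metrics, thresholds):
    """Compare measured quality metrics against minimum thresholds.

    Returns a list of failure messages; an empty list means the
    batch may proceed to model training or inference.
    """
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None or value < minimum:
            failures.append(f"{name}: {value} below required {minimum}")
    return failures

thresholds = {"completeness": 0.98, "timeliness": 0.95, "consistency": 0.99}
batch_metrics = {"completeness": 0.97, "timeliness": 0.99, "consistency": 0.995}

failures = quality_gate(batch_metrics, thresholds)
if failures:
    # Hold the batch and record the decision and rationale (step 4)
    # instead of feeding questionable data to the model.
    print("Batch held for review:", failures)
else:
    print("Batch passed quality gate")
```

Writing the pass/fail decision and its rationale to a log or governance system at this point also produces the documentation trail that steps 4 and 6 call for.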
Related terms

Data lineage, Training data, Bias, Data poisoning, Documentation, Structured data, Unstructured data

