AI Data Governance

Review training data quality against EU AI Act Article 10 requirements.

What Is AI Data Governance?

Learn how Article 10 of the EU AI Act establishes data governance requirements for high-risk AI systems. Review a training dataset for representativeness, data quality, leakage, and unnecessary personal data before model training can proceed.

What You'll Learn in AI Data Governance

AI Data Governance — Training Steps

  1. Article 10: Data Governance

    Article 10 of the EU AI Act establishes data governance requirements for high-risk AI systems. Training, validation, and testing data must meet strict quality criteria:

      - Data must be relevant to the task the AI system is designed to perform.
      - Data must be sufficiently representative of the population the model will serve.
      - Data must be as free of errors and as complete as possible in view of the intended purpose.
      - Data governance practices must address potential biases that could lead to discriminatory outcomes.

    Poor data leads to biased AI, and biased AI leads to legal liability. Under the EU AI Act, data governance is not merely a best practice; it is a legal obligation.

  2. Dataset Review Request

    An email arrives from Marcus Rodriguez, the AI Team Lead. The team is preparing to train ChurnPredict v3, and the dataset needs a compliance review before training can begin. The email links directly to the dataset on the DataOps platform.

  3. Issue 1: Regional Underrepresentation

    The DataOps platform loads the ChurnPredict v3 dataset review. The regional distribution of the training data stands out immediately: the dataset is heavily concentrated in one region, even though the model is designed to serve all four regions equally.
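
    The check described above can be sketched in a few lines. This is a minimal illustration, not the DataOps platform's actual logic: the 72% North figure comes from the review summary, while the other region names, their splits, and the 10-point tolerance are assumptions made for the example.

```python
from collections import Counter


def region_shares(regions):
    """Return each region's share of the records as a fraction of the total."""
    counts = Counter(regions)
    total = sum(counts.values())
    return {region: count / total for region, count in counts.items()}


def flag_imbalance(shares, expected_regions, tolerance=0.10):
    """List regions whose share deviates from an equal split by more than tolerance."""
    target = 1 / len(expected_regions)
    return sorted(
        region
        for region in expected_regions
        if abs(shares.get(region, 0.0) - target) > tolerance
    )


# Hypothetical sample: 72 of 100 records from North; the other splits are invented.
sample = ["North"] * 72 + ["South"] * 10 + ["East"] * 9 + ["West"] * 9
shares = region_shares(sample)
flagged = flag_imbalance(shares, ["North", "South", "East", "West"])
```

    With an equal-split target of 25% per region, every region in this sample is flagged: North is overrepresented and the other three are underrepresented.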

  4. Issue 2: Stale Pre-Pandemic Data

    The data collection timeline reveals another concern. A significant portion of the records predate the pandemic, which fundamentally shifted customer behavior.
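
    One way to quantify this concern is to measure the share of records collected before a cutoff date. The sketch below assumes each record carries a collection date and uses 1 January 2021 as the cutoff. Both are assumptions for illustration; the sample is built to match the 38% figure reported in the review summary.

```python
from datetime import date


def stale_share(collected_on, cutoff):
    """Fraction of records collected strictly before the cutoff date."""
    if not collected_on:
        return 0.0
    stale = sum(1 for d in collected_on if d < cutoff)
    return stale / len(collected_on)


# Hypothetical sample: 38 of 100 records dated 2019-2020, the rest from 2023.
dates = (
    [date(2019, 6, 1)] * 20
    + [date(2020, 3, 1)] * 18
    + [date(2023, 1, 15)] * 62
)
share = stale_share(dates, cutoff=date(2021, 1, 1))
```

    Whether stale records are excluded or down-weighted is a modeling decision; the function only reports how much of the dataset is affected.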

  5. Issue 3: Data Leakage

    A closer look at the feature list reveals a critical data quality problem that would undermine the model entirely.
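
    A simple heuristic for spotting this kind of problem is to look for features whose every value maps to exactly one target label. The sketch below is illustrative only: account_status is the feature named in the review summary, while the churned target column, the plan feature, and all values are hypothetical. The cardinality limit is an assumption that keeps unique identifiers from being flagged.

```python
from collections import defaultdict


def leaky_features(rows, target, max_cardinality=10):
    """Return low-cardinality features whose every value maps to one target label."""
    if not rows:
        return []
    suspects = []
    for feature in rows[0]:
        if feature == target:
            continue
        # Collect the set of target labels seen for each value of this feature.
        labels_by_value = defaultdict(set)
        for row in rows:
            labels_by_value[row[feature]].add(row[target])
        low_cardinality = 1 < len(labels_by_value) <= max_cardinality
        deterministic = all(len(labels) == 1 for labels in labels_by_value.values())
        if low_cardinality and deterministic:
            suspects.append(feature)
    return suspects


# Hypothetical rows: account_status mirrors the target, plan does not.
rows = [
    {"account_status": "closed", "plan": "basic", "churned": 1},
    {"account_status": "active", "plan": "basic", "churned": 0},
    {"account_status": "closed", "plan": "premium", "churned": 1},
    {"account_status": "active", "plan": "premium", "churned": 0},
]
suspects = leaky_features(rows, target="churned")
```

    A flagged feature is a candidate for manual review, not proof of leakage; a genuinely strong predictor can look the same on a small sample.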

  6. Knowledge Check: Data Representativeness

    Before the review continues, a knowledge check tests your understanding of the regional distribution issue.

  7. Issue 4: Unnecessary Personal Data

    The final section of the review reveals a compliance risk that extends beyond the AI Act into GDPR territory.
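
    Removing or pseudonymizing direct identifiers can be sketched as follows. This is one possible approach, not a prescribed method: the column names, the customer_ref field, and the choice of a keyed SHA-256 digest (HMAC) are all assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical column names for the direct identifiers listed in the review.
PII_FIELDS = {"name", "email", "phone", "address"}


def pseudonymize(value, key):
    """Keyed SHA-256 digest: the same value and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_pii(record, key, id_field="email"):
    """Drop PII fields, keeping only a pseudonymous customer reference."""
    cleaned = {k: v for k, v in record.items() if k not in PII_FIELDS}
    cleaned["customer_ref"] = pseudonymize(record[id_field], key)
    return cleaned


# Hypothetical record and key; a real key would come from a secrets manager.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "555-0100",
    "address": "1 Example Street",
    "tenure_months": 14,
}
cleaned = strip_pii(record, key=b"example-key")
```

    Pseudonymized data is still personal data under GDPR, so the key needs to be stored separately and protected. If the model has no need to link records back to a customer, dropping the fields outright is the simpler option.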

  8. Review Summary

    Alice has completed the data governance review. Four critical issues must be resolved before model training can proceed:

      - Severe regional underrepresentation: 72% North region data for a model serving 4 regions equally. The dataset must be rebalanced to adequately represent all deployment regions.
      - Stale pre-pandemic data: 38% of records from 2019-2020 no longer reflect current customer behavior. These records should be excluded or weighted appropriately.
      - Data leakage: the account_status feature directly encodes the target variable and must be removed to prevent artificially inflated training accuracy.
      - Unnecessary PII: raw names, emails, phone numbers, and addresses create GDPR exposure without contributing to churn prediction. These fields must be removed or pseudonymized.

  9. File a Compliance Report

    Identifying gaps is only half the job. To demonstrate compliance with Article 10, the data governance review must be documented and routed to the AI Team Lead and the Data Protection Officer, so that model training stays paused until the issues are resolved.

  10. Submit the Compliance Report

    Alice fills in the report with the findings, the four gaps mapped to Article 10 and GDPR, and the actions the AI team must complete before training resumes.
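
    The report's contents can be pictured as a small data structure. The sketch below is hypothetical: the class and field names are invented for illustration and do not reflect the DataOps platform's actual report format. The findings, recipients, and legal mappings are taken from the review summary.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    issue: str
    legal_basis: str
    required_action: str


@dataclass
class ComplianceReport:
    dataset: str
    recipients: list
    findings: list = field(default_factory=list)

    def training_blocked(self):
        """Training stays paused while any finding remains on the report."""
        return len(self.findings) > 0


report = ComplianceReport(
    dataset="ChurnPredict v3",
    recipients=["AI Team Lead", "Data Protection Officer"],
    findings=[
        Finding("Regional underrepresentation", "EU AI Act Article 10",
                "Rebalance the dataset across all deployment regions"),
        Finding("Stale pre-pandemic data", "EU AI Act Article 10",
                "Exclude or down-weight 2019-2020 records"),
        Finding("Data leakage via account_status", "EU AI Act Article 10",
                "Remove the account_status feature"),
        Finding("Unnecessary PII", "GDPR",
                "Remove or pseudonymize names, emails, phones, and addresses"),
    ],
)
```

    Here training_blocked() returns True until the findings list is cleared, mirroring the rule that training resumes only after the AI team completes the required actions.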