Regulation

The EU AI Act for document-processing tools: what actually applies

The AI Act's risk pyramid is famous; how it lands on a document classifier or redaction tool is less so. A practical read on prohibited practices, high-risk Annex III categories, transparency obligations, and the timeline that matters.

8 min read · By SafeMediAI Editorial

The EU AI Act, Regulation (EU) 2024/1689[1], is famously structured as a risk pyramid. Most public discussion has focused on the top of the pyramid (prohibited practices and high-risk systems) and on the separate track for general-purpose AI. The middle, where document-processing tools usually sit, has received less attention. It is also where most enforcement effort is likely to end up, because that is where most actual deployments are.

This piece is for organisations deploying or building document-processing AI (redaction, classification, OCR-plus-NLP pipelines, generative document summarisation, contract analysis) for use in the EU. It walks through the parts of the AI Act that actually apply, the parts that do not (despite what marketing might suggest), and the timeline that determines when each obligation lands.

The risk pyramid in one paragraph

The Act establishes four risk tiers. Prohibited practices (Article 5) are banned outright.[2] High-risk systems (Article 6 plus Annex III) face the strict conformity assessment, documentation, risk-management, data-governance, logging, and human-oversight obligations.[3] Limited-risk systems face transparency obligations under Article 50 (the AI-interaction disclosure and AI-generated-content labelling rules).[5] Minimal-risk systems face no Act-specific obligations beyond the general AI-literacy duty in Article 4 and voluntary codes of conduct. General-purpose AI models (GPT-class foundation models) face a separate set of obligations under Articles 51-55, regardless of where the eventual deployment sits.

For document-processing AI, the question is which tier you are in. The answer is more nuanced than the marketing wants it to be.

What is prohibited

Article 5's prohibitions have applied since 2 February 2025.[6] The list is short but specific: subliminal, manipulative, or deceptive techniques, exploitation of vulnerabilities due to age, disability, or social or economic situation, social scoring (by public or private actors), real-time remote biometric identification in publicly accessible spaces for law enforcement (with narrow exceptions), predicting criminal offences based solely on profiling or personality traits, untargeted scraping of facial images from the internet or CCTV footage to build facial-recognition databases, emotion recognition in workplaces and education institutions (except for medical or safety reasons), and biometric categorisation to infer sensitive attributes.

Document-processing tools almost never trigger Article 5. Two near-cases worth flagging:

  • An AI tool that infers sexual orientation, religion, political views, or trade-union membership from documents would fall under the biometric-categorisation prohibition if the inference uses biometric data, and is at least in the danger zone otherwise. Document classifiers that label by political affiliation, for example, need very careful purpose limitation.
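  • An emotion-recognition feature in an employment context (sorting CVs by inferred tone, scoring candidate enthusiasm) sits next to the workplace emotion-recognition prohibition. The Act defines emotion recognition as inference from biometric data (Article 3(39)), so tone analysis of text alone likely falls outside Article 5. The same feature applied to video interviews or voice recordings would likely be caught, and CV scoring is high-risk under Annex III's employment category either way.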

For redaction, document classification by sensitivity, OCR, and standard contract analysis, Article 5 does not apply.
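The purpose-limitation point for classifiers can be made operational with a crude pre-screen over a classifier's label set. The sketch below is a hypothetical helper, not a legal test: `flag_sensitive_labels` and its keyword lists are assumptions that loosely mirror the sensitive attributes named in Article 5(1)(g). A flagged label is a prompt for human review of purpose and data source; an empty result says nothing about compliance.

```python
import re

# Hypothetical keyword lists, loosely mirroring the sensitive attributes in
# Article 5(1)(g). They are illustrative assumptions, not an official list.
SENSITIVE_ATTRIBUTES = {
    "race": ("race", "racial", "ethnic"),
    "political opinions": ("political", "party affiliation"),
    "trade-union membership": ("trade union", "trade-union", "union member"),
    "religious or philosophical beliefs": ("relig", "faith", "belief"),
    "sex life or sexual orientation": ("sexual orientation", "sex life"),
}


def flag_sensitive_labels(labels: list[str]) -> dict[str, list[str]]:
    """Map each classifier label to the sensitive attributes it appears to target.

    Keywords must start at a word boundary, so 'Trace log' does not match
    'race'. Only flagged labels are returned.
    """
    flagged: dict[str, list[str]] = {}
    for label in labels:
        text = label.lower()
        hits = [
            attribute
            for attribute, keywords in SENSITIVE_ATTRIBUTES.items()
            if any(re.search(r"\b" + re.escape(k), text) for k in keywords)
        ]
        if hits:
            flagged[label] = hits
    return flagged


if __name__ == "__main__":
    labels = ["Invoice", "Contract", "Political affiliation: left", "Medical record"]
    print(flag_sensitive_labels(labels))
```

Matching on a word boundary trades a few false positives for simplicity; labels phrased in ways the keyword list does not anticipate will slip through, which is why a reviewer still reads the full label set.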

High-risk classification: when Annex III bites

This is where most of the work for document-processing AI lives. A system is high-risk under Article 6(2) if it falls within Annex III's list of use cases. Annex III enumerates eight high-risk categories:[4]

  1. Biometrics (remote biometric identification, categorisation by sensitive attributes, emotion recognition).
  2. Critical infrastructure (safety components for transport, water, gas, heating, electricity, digital infrastructure).
  3. Education and vocational training (admissions, evaluation, behaviour monitoring).
  4. Employment, worker management, self-employment (recruitment, performance evaluation, work allocation).
  5. Access to essential private and public services (creditworthiness, public benefits, emergency dispatch).
  6. Law enforcement (risk assessment of natural persons, polygraph-like tools, evidence evaluation, profiling).
  7. Migration, asylum, border control (risk assessment, identity verification, examination of applications).
  8. Administration of justice and democratic processes (judicial decision-making support, election-influence systems).

A document-processing tool can land in several of these depending on use. Worked examples:

  • A CV-screening tool that ranks candidates is high-risk under category 4.
  • A document classifier that supports a public administration in deciding access to a benefit is high-risk under category 5.
  • An evidence-analysis system used by police investigators is high-risk under category 6.
  • A migration-application triage tool used by an asylum authority is high-risk under category 7.
  • A legal-research assistant that supports judicial decisions is high-risk under category 8.

Conversely:

  • A general-purpose redaction tool used to redact internal documents before publication is not, on its face, high-risk.
  • A document classifier for internal records management with no decision-making impact on a natural person is not high-risk.
  • A generative summariser used for a private commercial document review is not high-risk.

The decisive question for borderline cases is whether the AI system makes or substantially supports a decision about a natural person in one of the Annex III contexts.
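The decisive question can be written down as a first-pass triage step. This is a sketch under stated assumptions: the `UseCase` schema and `annex_iii_triage` function are hypothetical, the caller is assumed to have already mapped each use to a specific Annex III point (the Act lists specific use cases within each area, not whole areas), and the Article 6(1) product-safety route is ignored. The output is a label that tells you which written assessment to do next, not a substitute for it.

```python
from dataclasses import dataclass
from typing import Optional

# Annex III areas, numbered as in the list of eight categories above.
ANNEX_III_AREAS = {
    1: "Biometrics",
    2: "Critical infrastructure",
    3: "Education and vocational training",
    4: "Employment, worker management, self-employment",
    5: "Access to essential private and public services",
    6: "Law enforcement",
    7: "Migration, asylum, border control",
    8: "Administration of justice and democratic processes",
}


@dataclass
class UseCase:
    """One deployment of a document-processing tool (hypothetical schema)."""
    description: str
    annex_iii_point: Optional[int]        # None if no Annex III use case matches
    supports_decision_about_person: bool  # makes or substantially supports one


def annex_iii_triage(use_case: UseCase) -> str:
    """First-pass label for a use case, following the decisive question."""
    point = use_case.annex_iii_point
    if point is None:
        return "outside Annex III"
    if point not in ANNEX_III_AREAS:
        raise ValueError(f"unknown Annex III point: {point}")
    if use_case.supports_decision_about_person:
        return f"presumptively high-risk (Annex III point {point}: {ANNEX_III_AREAS[point]})"
    return f"Annex III context (point {point}): assess the Article 6(3) carve-out"


if __name__ == "__main__":
    examples = [
        UseCase("CV-screening tool that ranks candidates", 4, True),
        UseCase("Internal records-management classifier", None, False),
        UseCase("Preparatory redaction for a benefits caseworker", 5, False),
    ]
    for case in examples:
        print(f"{case.description} -> {annex_iii_triage(case)}")
```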

The Article 6(3) carve-out

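Article 6(3) introduces an important softening: if an AI system in an Annex III category does not pose a significant risk of harm to health, safety, or fundamental rights, including by not materially influencing the outcome of decision-making, it is not high-risk despite the category match. The softening never applies to a system that performs profiling of natural persons; such a system is always high-risk. The provider has to document this assessment before placing the system on the market, and must still register the system in the EU database.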

The Act lists four conditions, any one of which can support the assessment:

  • The AI system performs a narrow procedural task.
  • The system improves the result of a previously completed human activity.
  • The system detects decision-making patterns or deviations from prior patterns and is not meant to replace or influence the previously completed human assessment without proper human review.
  • The system performs a preparatory task to an assessment relevant for the Annex III use case.

For document-processing tools, this is where many deployments will actually land. In an Annex III context, a redaction tool used as a preparatory step before a human reviewer makes the final judgment on what to disclose may qualify for the 6(3) carve-out. The provider has to write the assessment down and keep it.

The carve-out is not a free pass. The documentation has to be substantive and survive supervisory review. "Our tool is just a preparatory step" without an actual analysis is not a defensible position.
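One way to keep the decision and its reasoning together is to record the Article 6(3) assessment as structured data. The sketch below assumes a hypothetical `CarveOutAssessment` record; the four condition strings paraphrase Article 6(3)(a) to (d), and the function also encodes the rule that profiling of natural persons rules the carve-out out. Code can only check that a written analysis exists; whether it is substantive enough to survive supervisory review remains a human judgment.

```python
from dataclasses import dataclass, field

# Paraphrases of the four Article 6(3) conditions (any one can suffice).
CONDITIONS = (
    "narrow procedural task",
    "improves result of previously completed human activity",
    "detects decision patterns without replacing human assessment",
    "preparatory task to an Annex III assessment",
)


@dataclass
class CarveOutAssessment:
    """Hypothetical record of a provider's Article 6(3) assessment."""
    performs_profiling: bool
    conditions_met: set[str] = field(default_factory=set)
    written_analysis: str = ""  # the documented reasoning kept for supervisors


def carve_out_available(a: CarveOutAssessment) -> bool:
    """True only if the carve-out can plausibly be relied on.

    Profiling keeps the system high-risk, no condition means no carve-out,
    and a blank analysis is treated as no assessment at all.
    """
    unknown = a.conditions_met - set(CONDITIONS)
    if unknown:
        raise ValueError(f"unrecognised conditions: {sorted(unknown)}")
    if a.performs_profiling or not a.conditions_met:
        return False
    return bool(a.written_analysis.strip())


if __name__ == "__main__":
    redaction = CarveOutAssessment(
        performs_profiling=False,
        conditions_met={"preparatory task to an Annex III assessment"},
        written_analysis="Tool proposes redactions; a caseworker decides on disclosure.",
    )
    print(carve_out_available(redaction))
```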

Limited-risk: Article 50 transparency

Article 50 imposes transparency obligations on a separate class of systems:[5]

  • Systems that interact directly with natural persons (chatbots, conversational AI) must disclose they are AI.
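  • Providers of systems that generate synthetic audio, image, video, or text content must mark the outputs in a machine-readable format so they are detectable as AI-generated, and deployers of deepfake image, audio, or video content must disclose that it is artificially generated or manipulated.
  • Deployers of systems that generate or manipulate text published to inform the public on matters of public interest must disclose that the text is AI-generated, unless it has undergone human review or editorial control and a natural or legal person holds editorial responsibility for it.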

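For document-processing AI, this lands most clearly on generative summarisation and drafting tools. The provider has to build machine-readable marking into generated text, unless the system only performs an assistive function for standard editing or does not substantially alter the input. Where a deployer publishes generated text to inform the public on matters of public interest, the AI-generated nature must also be disclosed. The exception for editorial human oversight is significant for newsroom and law-firm use cases.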
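The Article 50 triggers for a document tool reduce to a handful of yes/no facts about the system and how its output is used. A minimal sketch, assuming a hypothetical `SystemProfile` of feature flags; it ignores the Article 50(1) exception for cases where AI interaction is obvious from the context, and leaves out the image, audio, and video rules entirely.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical feature flags describing one document-processing tool."""
    interacts_with_people: bool           # e.g. a chat interface over a document set
    generates_synthetic_text: bool        # summaries, drafts, rewritten passages
    only_assistive_editing: bool          # standard editing, input not substantially altered
    published_on_public_interest: bool    # deployer publishes output to inform the public
    under_editorial_responsibility: bool  # human review plus a person holding editorial responsibility


def article_50_duties(p: SystemProfile) -> list[str]:
    """Return the Article 50 duties this profile appears to trigger (sketch only)."""
    duties: list[str] = []
    if p.interacts_with_people:
        duties.append("provider: inform users they are interacting with an AI system (Art. 50(1))")
    if p.generates_synthetic_text and not p.only_assistive_editing:
        duties.append("provider: mark generated text in a machine-readable format (Art. 50(2))")
    if (
        p.generates_synthetic_text
        and p.published_on_public_interest
        and not p.under_editorial_responsibility
    ):
        duties.append("deployer: disclose that published text is AI-generated (Art. 50(4))")
    return duties


if __name__ == "__main__":
    # A summariser whose output a newsroom edits and publishes under editorial responsibility.
    summariser = SystemProfile(False, True, False, True, True)
    for duty in article_50_duties(summariser):
        print(duty)
```

Note that editorial responsibility removes only the deployer's disclosure duty; the provider-side marking duty still attaches to the generated text.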

General-purpose AI obligations

If you are building on top of a foundation model (GPT, Claude, Gemini, Llama), Articles 51-55 apply separately to the provider of that model. Most document-processing AI built on top of foundation models will not itself be a GPAI provider; the upstream model provider is.

That said, two indirect implications matter:

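  • Downstream providers can rely on the upstream model's documentation for parts of their own conformity assessment, but only if the upstream provider has actually produced it; Article 53(1)(b) requires GPAI providers to supply integration documentation to downstream providers. Providers of free and open-source models without systemic risk are exempt from that duty under Article 53(2), so builds on lightly documented open models carry more compliance work for the downstream provider.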
  • Models with systemic risk (the very largest foundation models, presumed under Article 51(2) when cumulative training compute exceeds 10^25 floating-point operations, and also designable by the Commission) face additional obligations the upstream provider must meet. Downstream deployers should track which models cross that threshold, because procurement decisions become slightly higher-stakes.
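For a buyer, tracking the systemic-risk threshold mostly means comparing the upstream provider's stated training compute against the Article 51(2) presumption of more than 10^25 floating-point operations. The sketch below assumes such a figure is available, and offers a rough estimator for when it is not, based on the widely used 6 × parameters × training-tokens rule of thumb for dense transformers. That heuristic is an assumption, not part of the Act, and the function names and example model sizes are hypothetical.

```python
# Article 51(2): presumption of high-impact capabilities (and so systemic
# risk) when cumulative training compute is greater than 10^25 FLOPs.
SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25


def approx_training_flops(parameters: float, training_tokens: float) -> float:
    """Rough estimate via the 6 * N * D rule of thumb for dense transformers.

    This heuristic is an assumption and is not defined in the AI Act.
    """
    return 6.0 * parameters * training_tokens


def presumed_systemic_risk(cumulative_training_flops: float) -> bool:
    """Article 51(2) presumption only; Commission designation on other
    criteria is invisible to a compute check."""
    return cumulative_training_flops > SYSTEMIC_RISK_FLOP_THRESHOLD


if __name__ == "__main__":
    # Hypothetical model sizes; prefer the upstream provider's own figure.
    for params, tokens in [(70e9, 15e12), (400e9, 15e12)]:
        flops = approx_training_flops(params, tokens)
        print(f"{params:.0e} params, {tokens:.0e} tokens -> {flops:.1e} FLOPs, "
              f"presumed systemic risk: {presumed_systemic_risk(flops)}")
```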

The timeline

The dates that matter for document-processing AI deployers:[6]

  • 2 February 2025: Article 5 prohibitions in force, plus the AI literacy obligation under Article 4 (organisations using AI must ensure their staff have sufficient AI literacy).
  • 2 August 2025: GPAI obligations apply to models placed on the market from this date. Member states designate competent authorities. Penalty framework activates.
  • 2 February 2026: Deadline for the Commission to publish guidelines on Article 6 high-risk classification, including practical examples of high-risk and non-high-risk use cases.
  • 2 August 2026: The big date. High-risk system obligations (Annex III) apply to systems placed on the market or put into service from this date. Conformity assessment, risk management, data governance, logging, human oversight, post-market monitoring all become enforceable. The Article 50 transparency obligations also apply from this date.
  • 2 August 2027: GPAI models placed on the market before August 2025 must achieve full compliance.

For a document-processing tool that may be high-risk, the practical planning timeline is now. Conformity assessment is not a paper exercise; the data-governance and risk-management requirements need to be designed into the system, not bolted on. Vendors who wait until July 2026 to start will not be ready.
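The timeline is easy to keep as data in a compliance tracker. A minimal sketch, assuming a hypothetical tracker that only needs to answer "what applies on this date" and "what comes next"; the labels abbreviate the bullets above and carry no legal weight. It ignores the transitional rules for systems already on the market, which depend on when a given system was placed on the market.

```python
from datetime import date
from typing import Optional

# Milestones from the timeline above, kept in date order.
MILESTONES: list[tuple[date, str]] = [
    (date(2025, 2, 2), "Article 5 prohibitions and Article 4 AI literacy"),
    (date(2025, 8, 2), "GPAI obligations for new models; authorities and penalties"),
    (date(2026, 2, 2), "Commission guidelines on Article 6 classification due"),
    (date(2026, 8, 2), "Annex III high-risk obligations for new systems"),
    (date(2027, 8, 2), "Full compliance for GPAI models placed on the market before August 2025"),
]


def milestones_in_effect(on: date) -> list[str]:
    """Labels of every milestone whose date falls on or before `on`."""
    return [label for start, label in MILESTONES if start <= on]


def next_milestone(on: date) -> Optional[tuple[date, str]]:
    """The first milestone strictly after `on`, or None once all have passed."""
    for start, label in MILESTONES:
        if start > on:
            return start, label
    return None


if __name__ == "__main__":
    today = date.today()
    print("In effect:", milestones_in_effect(today))
    print("Next:", next_milestone(today))
```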

A practical checklist for vendors and buyers

Six questions surface the AI Act posture of a document-processing tool.

  1. Are you high-risk under Annex III for any of your typical use cases? Worked, written assessment per use case.
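  2. If yes, where are you in the conformity-assessment process? For Annex III categories 2 to 8 this is internal control under Annex VI, with no notified body; biometric systems (category 1) may need one. Ask about harmonised standards, technical documentation, and the EU declaration of conformity.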
  3. If Article 6(3) carve-out applies, is the assessment documented? Vendors selling into Annex III contexts without this assessment are selling a deployment risk.
  4. What Article 50 transparency labelling does your system implement? Specifically for generative outputs.
  5. What GPAI model are you built on, and what is the upstream provider's compliance posture?
  6. What documentation do you provide to deployers to support their own compliance? Conformity, data governance, post-market monitoring, incident reporting.

For buyers, the answers should be specific and on paper. For vendors, having the answers ready is a competitive advantage that compounds as 2026 approaches.
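Buyers sending the six questions to several vendors may want to track answers and supporting documents per question. A hypothetical sketch: the question ids, the `VendorResponse` record, and the document references are illustrative assumptions. `open_items` lists any question still missing either a written answer or a document on file, so a "not applicable" answer still needs a written pointer, for example to the Annex III assessment that justifies it.

```python
from dataclasses import dataclass, field

# The six checklist questions, keyed by short ids (the ids are hypothetical).
QUESTIONS = {
    "annex_iii": "Are you high-risk under Annex III for any typical use case?",
    "conformity": "Where are you in the conformity-assessment process?",
    "art_6_3": "If the Article 6(3) carve-out applies, is the assessment documented?",
    "art_50": "What Article 50 transparency labelling does the system implement?",
    "gpai": "Which GPAI model is it built on, and what is the upstream posture?",
    "deployer_docs": "What documentation supports the deployer's own compliance?",
}


@dataclass
class VendorResponse:
    vendor: str
    answers: dict[str, str] = field(default_factory=dict)   # question id -> written answer
    evidence: dict[str, str] = field(default_factory=dict)  # question id -> document reference


def open_items(resp: VendorResponse) -> list[str]:
    """Question ids still lacking a written answer or a supporting document."""
    return [
        qid
        for qid in QUESTIONS
        if not resp.answers.get(qid, "").strip() or not resp.evidence.get(qid)
    ]


if __name__ == "__main__":
    resp = VendorResponse(
        "ExampleVendor",
        answers={"annex_iii": "Yes: category 4 for CV ranking"},
        evidence={"annex_iii": "annex-iii-assessment.pdf"},
    )
    for qid in open_items(resp):
        print(f"{qid}: {QUESTIONS[qid]}")
```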

A closing note

The AI Act is the first comprehensive, horizontal AI regulation to pass and enter into force anywhere. It is broader and stricter than most early-2024 predictions had it. The instinct of many vendors has been to treat it as a future problem, an Article 5 problem (prohibited practices), or a GPAI problem (foundation models). For document-processing AI, none of those framings is accurate.

The real shape of the Act, for this category of tools, is Article 6, Annex III, the 6(3) carve-out, and the 2 August 2026 deadline. That is the work, and it is the work that the better vendors are already doing.

References

  1. Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (the AI Act)
  2. AI Act, Article 5: Prohibited AI practices
  3. AI Act, Article 6: Classification rules for high-risk AI systems
  4. AI Act, Annex III: List of high-risk AI systems referred to in Article 6(2)
  5. AI Act, Article 50: Transparency obligations for certain AI systems
  6. European Commission, AI Act Implementation Timeline