Compliance

HIPAA-ready AI for document processing: what 'ready' actually means

HIPAA does not certify AI tools. It sets standards for how covered entities handle protected health information, and those standards are surprisingly concrete. A working guide to Safe Harbor, Expert Determination, BAAs, and where AI vendors get this wrong.

8 min read · By SafeMediAI Editorial

The first thing to know about "HIPAA-ready AI" is that there is no such thing as a HIPAA-certified product. HIPAA does not certify software. It sets standards that covered entities and their business associates must meet. Whether an AI tool helps them meet those standards or quietly puts them in violation depends on questions that have nothing to do with the marketing copy.

This piece is for healthcare technology buyers, in-house compliance teams, and vendors who want to claim "HIPAA readiness" without bluffing. It walks through what the Privacy Rule and Security Rule actually require when AI touches protected health information, the two paths to de-identification, where Business Associate Agreements bite, and the deployment patterns that hold up.

The two doors out of HIPAA

HIPAA applies to protected health information (PHI) held by covered entities (health care providers such as hospitals, health plans, and health care clearinghouses) and their business associates. The most common strategic move when building AI on top of health data is to take the data out of HIPAA's scope by de-identifying it. There are exactly two regulatory paths to that exit, both defined in 45 CFR § 164.514.[2]

Safe Harbor

Safe Harbor is the checklist method.[1] Remove all 18 specifically enumerated identifiers from the record, and have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual. The 18 identifiers are concrete: name, geographic subdivision smaller than state, all elements of dates more granular than year, phone, fax, email, social security number, medical record number, health plan beneficiary number, account number, certificate or license number, vehicle identifier, device identifier, web URL, IP address, biometric identifier (fingerprint, voiceprint), full face photographic image, and any other unique identifying number, characteristic, or code.

The strengths of Safe Harbor: deterministic, auditable, fast to integrate into a redaction pipeline. The weakness: it is a blunt instrument. Stripping geography below state level (at most the first three ZIP digits survive, and only where that prefix covers more than 20,000 people) can destroy the analytical utility of the data for, say, regional public-health work. Dates finer than year are routinely needed for survival analysis. Safe Harbor is the right method for high-volume, low-context disclosures (research datasets, contractor-accessible logs, AI training sets where geography is incidental). For analytical uses that need finer-grained data, it is often too coarse.
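As a rough illustration of why Safe Harbor integrates easily into a pipeline, the sketch below applies its main transformations to one structured record. The field names, the `low_population_zip3` lookup (which a real pipeline would derive from Census data), and the function name are all assumptions made for the example. It covers structured fields only and is not a complete implementation of the rule.

```python
from datetime import date

# Hypothetical field names for direct identifiers; a real schema will differ.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "phone", "fax", "email", "ssn", "medical_record_number",
    "health_plan_beneficiary_number", "account_number", "license_number",
    "vehicle_identifier", "device_identifier", "url", "ip_address",
    "biometric_identifier", "photo",
}


def safe_harbor_transform(record: dict, low_population_zip3: set[str]) -> dict:
    """Drop direct identifiers, coarsen dates to year, truncate ZIP, cap age."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # removed outright
        if isinstance(value, date):
            out[field] = value.year  # keep only the year
        elif field == "zip":
            zip3 = str(value)[:3]
            # Prefixes covering 20,000 or fewer people are replaced with "000".
            out[field] = "000" if zip3 in low_population_zip3 else zip3
        elif field == "age":
            out[field] = min(value, 90)  # 90 stands for "90 or older"
        else:
            out[field] = value
    return out


record = {
    "name": "Jane Doe",
    "zip": "03601",
    "admitted": date(2023, 5, 17),
    "diagnosis": "J45",
    "age": 93,
}
print(safe_harbor_transform(record, low_population_zip3={"036"}))
```

The determinism is the point: the same input always yields the same output, which is what makes the method auditable. It is also visibly lossy, since the admission date collapses to a year and the ZIP code to a prefix or to "000".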

Expert Determination

Expert Determination is the risk-based method.[2] A qualified statistician or scientist analyzes the specific dataset, the anticipated recipients, and the external data those recipients could reasonably access, then concludes that the risk of re-identification is "very small." The expert documents the methodology and the finding. The covered entity holds that documentation, ideally with annual review.

Expert Determination is the right method for data that needs analytical fidelity, especially anything used to train or evaluate AI models. Done well, it can preserve information that Safe Harbor strips. Done poorly, it is a piece of paper that does not survive a regulator's audit.
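To make "very small risk" less abstract, here is a minimal sketch of one measurement an expert might include in the analysis: the size of the smallest group of records sharing the same quasi-identifier values (often called k in k-anonymity). This is an assumption about what such an analysis could contain. The rule itself does not name a metric or a numeric threshold, and the column names below are hypothetical.

```python
from collections import Counter


def smallest_group_size(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Return k: the size of the smallest group sharing quasi-identifier values."""
    if not rows:
        raise ValueError("cannot measure an empty dataset")
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def worst_case_risk(rows: list[dict], quasi_identifiers: list[str]) -> float:
    """Worst-case chance of picking the right record within a group: 1 / k."""
    return 1 / smallest_group_size(rows, quasi_identifiers)


rows = [
    {"zip3": "100", "birth_year": 1980},
    {"zip3": "100", "birth_year": 1980},
    {"zip3": "100", "birth_year": 1981},
]
print(smallest_group_size(rows, ["zip3", "birth_year"]))  # one record is unique
print(smallest_group_size(rows, ["zip3"]))                # coarser grouping
```

A number like this is an input to the expert's judgement, not a substitute for it. The determination also has to account for who receives the data and what outside data they could link it to.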

When AI enters the picture

Three deployment patterns dominate AI use against health records. Each has a distinct compliance shape.

Pattern 1: Identified PHI in, redacted output out

This is the workflow most "HIPAA-ready" AI redaction tools claim to support: a clinical note or discharge summary enters the system, an NER model surfaces PHI, a reviewer approves redactions, the output goes downstream.

The compliance posture is straightforward if the vendor is a business associate. The covered entity needs a Business Associate Agreement under 45 CFR § 164.502(e),[4] the BAA needs the substantive provisions for which HHS publishes sample language,[5] and the vendor's environment needs to actually implement the administrative, physical, and technical safeguards of the Security Rule. The administrative safeguards of § 164.308 alone include risk analysis, workforce training, sanctions, contingency planning, and periodic security evaluations.[3]

The frequent failure mode: a vendor offers a "free trial" or "self-service" tier that ingests PHI without a BAA. Under HIPAA there is no de minimis carve-out. The covered entity is on the hook the moment the data crosses the boundary without a contract.
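On the vendor side, one way to avoid this failure mode is to refuse PHI at the ingest boundary unless a BAA is on record for the tenant. The sketch below assumes a hypothetical `Tenant` record and `ingest_document` entry point. How a real system learns that a payload contains PHI (declared by the customer, detected, or assumed by default) is a design decision left out here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


class MissingBAAError(Exception):
    """Raised when PHI would be accepted without a BAA in force."""


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    baa_effective: Optional[date] = None   # None means no BAA was ever signed
    baa_terminated: Optional[date] = None


def require_baa(tenant: Tenant, today: date) -> None:
    """Raise unless a BAA is in force for this tenant on the given day."""
    if tenant.baa_effective is None or tenant.baa_effective > today:
        raise MissingBAAError(f"{tenant.tenant_id}: no BAA in force")
    if tenant.baa_terminated is not None and tenant.baa_terminated <= today:
        raise MissingBAAError(f"{tenant.tenant_id}: BAA terminated")


def ingest_document(tenant: Tenant, contains_phi: bool, payload: bytes, today: date) -> str:
    """Accept a document, but only take PHI from tenants with a BAA in force."""
    if contains_phi:
        require_baa(tenant, today)  # trial and self-service tiers fail here
    return f"accepted {len(payload)} bytes for {tenant.tenant_id}"
```

The check runs before any processing, so a trial account without a contract never gets PHI across the boundary in the first place.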

Pattern 2: De-identified data for AI training and evaluation

This is where Expert Determination earns its keep. If a hospital wants to use its records to train or fine-tune an AI model, the cleanest path is to run the records through a de-identification pipeline that produces a dataset the expert determines to be of very small re-identification risk, then train on that dataset.

The traps are subtle. Free-text clinical notes contain identifiers in surprising places: a patient saying "I saw Dr. Hansen last week at the Lillehammer clinic" inside a transcribed appointment recording can blow Safe Harbor compliance for the whole file even if the structured fields are clean. Generative models trained on insufficiently de-identified data can memorize and reproduce PHI verbatim, which is both an Expert Determination failure and a separate breach. The 2024 wave of research on training-data extraction from large language models is the relevant technical literature here.
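A cheap first line of defence against identifiers hiding in free text is a pattern scan over the notes after the structured fields have been cleaned. The sketch below is illustrative only: the patterns are assumptions covering a few well-formed identifier shapes, and pattern matching alone will miss most names and places, which is why pipelines typically add an NER model and human review.

```python
import re

# Illustrative patterns only; real identifier formats vary widely.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "titled_name": re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.?\s+[A-Z][a-z]+"),
}


def find_residual_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits.extend((label, match.group(0)) for match in pattern.finditer(text))
    return hits


note = "I saw Dr. Hansen last week at the Lillehammer clinic"
print(find_residual_identifiers(note))
# → [('titled_name', 'Dr. Hansen')]
```

On the example sentence the scan flags the clinician's name but not "Lillehammer", which is exactly the kind of miss that makes free text the hard part of de-identification.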

Pattern 3: AI assistance on PHI inside the covered entity

Increasingly common: a hospital deploys a generative assistant for clinicians on identified records, without sending data out of the covered entity's environment. This is not de-identification; it is keeping PHI inside the covered entity's own compliance boundary.

The compliance posture here turns on whether the model provider counts as a business associate. For on-prem or VPC-isolated deployments where the model never sees data outside the covered entity's environment, often not. For cloud-hosted models with prompts containing PHI, almost always yes. The decisive question is whether the model provider can access the PHI; access does not require a human reading it, only the technical possibility.
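The access test described above can be written down as a simple screening rule for an internal inventory of AI deployments. This is a sketch under stated assumptions: the `Deployment` fields are hypothetical, and the result is a prompt to involve counsel and obtain a BAA, not a legal determination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str
    prompts_contain_phi: bool
    # Technical possibility of access, not whether a human ever reads the data.
    provider_can_access_data: bool


def provider_likely_business_associate(deployment: Deployment) -> bool:
    """Screen for deployments where the model provider probably needs a BAA."""
    return deployment.prompts_contain_phi and deployment.provider_can_access_data


deployments = [
    Deployment("on-prem assistant", prompts_contain_phi=True, provider_can_access_data=False),
    Deployment("cloud-hosted API", prompts_contain_phi=True, provider_can_access_data=True),
]
for d in deployments:
    print(d.name, provider_likely_business_associate(d))
```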

Where AI vendors get this wrong

Four claims recur in vendor materials and are worth flagging.

"HIPAA-compliant" with no BAA on offer

A vendor that processes PHI without offering a Business Associate Agreement is not HIPAA-compliant, full stop. The covered entity cannot use the tool on real PHI without violating § 164.502(e). "Compliant" without a BAA means "designed in a way that would be compliant if a BAA existed." That is a marketing claim, not a regulatory status.

"De-identified output" without specifying the method

If the vendor cannot tell you whether the de-identification approach is Safe Harbor or Expert Determination, the de-identification claim is not auditable. Asking which method, against which identifier list, with what risk threshold, is the first compliance test of any redaction vendor.

"Self-hosted means automatic HIPAA"

Self-hosting solves one specific problem (data egress to the vendor) and creates another (the customer now operates the system, which means the customer is on the hook for all of the Security Rule safeguards on that deployment). The vendor is not absolved; the customer simply carries more of the load. A serious self-hosted offering ships with the documentation, configurations, and review artefacts the customer's compliance program needs to integrate.

"HIPAA equals encryption"

Encryption is one technical safeguard under § 164.312, and it matters, but it is the minimum. The Security Rule's administrative safeguards under § 164.308 require risk analyses, workforce training, sanctions, incident response, and contingency planning that a "we use AES-256" assertion does not address.[3]

A buyer's compliance checklist

For organisations evaluating an AI document-processing tool that will touch PHI, six questions separate serious vendors from marketing-led ones.

  1. Will you sign a Business Associate Agreement? Specifically: do you sign the customer's BAA, or do you provide your own, and what is non-negotiable in it?
  2. Where is data processed and stored? Region, tenancy model, encryption at rest and in transit, and what happens after the processing job finishes.
  3. Which de-identification method do you implement? Safe Harbor (18 identifiers) or Expert Determination (which expert, which methodology, last review date)?
  4. How does your model interact with the data during processing? Does training happen on customer data, ever? Are prompts retained for any purpose, ever?
  5. How do you log access? Per-user, per-record, with timestamps, retained for what period, exportable for audits?
  6. What is your breach notification process? Specifically the speed, the channel, and the level of detail.

The answers do not need to be lengthy. They need to be specific.
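A procurement team could track vendor responses to the six questions with something as small as the sketch below. The question keys and the list of vague phrases are assumptions chosen for illustration. The check only flags answers that are missing or lean on stock phrases, and it cannot judge whether a specific answer is true.

```python
# Hypothetical keys, one per question in the checklist above.
QUESTIONS = (
    "baa",
    "data_location",
    "deid_method",
    "model_data_use",
    "access_logging",
    "breach_notification",
)

# Hypothetical examples of phrases that signal a non-answer.
VAGUE_PHRASES = ("industry standard", "best practice", "as needed", "tbd")


def review_answers(answers: dict[str, str]) -> dict[str, str]:
    """Return a finding ("missing" or "vague") for each unsatisfactory answer."""
    findings = {}
    for question in QUESTIONS:
        answer = answers.get(question, "").strip()
        if not answer:
            findings[question] = "missing"
        elif any(phrase in answer.lower() for phrase in VAGUE_PHRASES):
            findings[question] = "vague"
    return findings


vendor = {
    "baa": "We sign our own BAA; the breach notification window is not negotiable.",
    "data_location": "Industry standard cloud hosting.",
    "deid_method": "Safe Harbor, all 18 identifiers.",
}
print(review_answers(vendor))
```

An empty result means every question received a non-empty answer with none of the flagged phrases. That is the starting point for review, not the end of it.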

What "HIPAA-ready" should mean

A defensible use of the phrase is: the tool is designed to be deployed inside a covered entity's compliance posture; it is offered with a Business Associate Agreement when PHI will touch it; it implements de-identification under a named, documented method; and its Security Rule posture (administrative, physical, technical) is documented at a level a customer's compliance program can audit.

A misleading use: any version of the phrase that lacks any of the above.

The good news for genuinely capable tools is that the honest version of "HIPAA-ready" is also the better marketing position. Healthcare buyers are increasingly led by compliance and risk; the vendor who can answer the six questions above without hedging is the vendor who closes the deal. The other vendors run extended trials that never become contracts.

References

  1. U.S. Department of Health & Human Services, Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule
  2. 45 CFR § 164.514: Other requirements relating to uses and disclosures of protected health information
  3. 45 CFR § 164.308: Administrative safeguards (Security Rule)
  4. 45 CFR § 164.502(e): Business associate contracts
  5. HHS, Business Associate Contracts (sample provisions and guidance)