June 30, 2026

Getting Your AI Data Ready for the EU AI Act: A Plain-English Checklist

When companies stumble on the EU AI Act, it’s usually not the clever AI model that trips them up — it’s the paperwork behind the data. The single most common weak spot is being unable to show how an AI was built and what data it learned from. This guide turns that into a plain-English checklist focused on the data and training behind your AI, and then explains what each part means for the data itself, for your company, and for the customers your AI affects.

The short version

Under this law, if you can’t show your work, you can’t prove you followed the rules.
The data behind high-stakes AI must be relevant to the job, representative of the people it affects, and checked for errors and bias.
You need plain records of how each important system was built, trained, and tested.
High-stakes systems must log their activity automatically, and those logs generally have to be kept for at least six months.
Using someone else’s AI doesn’t remove your responsibility for how you use it.
Heavily changing or rebranding a bought-in AI can make you legally its maker.

Why the data is the real test

AI learns from data, so the law cares a lot about that data being right. In plain terms, the training data behind a high-stakes AI should be relevant to the job, represent the real people it will affect, and be checked for errors and bias as far as reasonably possible. A system can be brilliantly engineered and still fail the test if nobody can explain where its data came from or show it was fair.

Training data (in plain terms): The examples an AI learns from. If those examples are narrow or biased, the AI will be too — which is why the law focuses so heavily on getting the data right.
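To make “represent the real people it will affect” concrete, here is a minimal Python sketch that compares each group’s share of the training data with its expected share of the people the AI will affect. The group labels, population shares, and 5-percentage-point tolerance are hypothetical assumptions for illustration. The law does not set a numeric threshold, and a real check would cover every group that matters for your use case.

```python
from collections import Counter

def representation_gaps(train_groups, population_shares, tolerance=0.05):
    """Return groups whose share of the training data differs from their
    expected share of the affected population by more than `tolerance`."""
    counts = Counter(train_groups)
    total = sum(counts.values())
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            # Positive = over-represented, negative = under-represented.
            gaps[group] = round(observed - expected, 3)
    return gaps

# Hypothetical example: age bands in a loan-scoring training set.
train = ["18-34"] * 70 + ["35-64"] * 25 + ["65+"] * 5
population = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}
print(representation_gaps(train, population))
# {'18-34': 0.4, '35-64': -0.25, '65+': -0.15}
```

In this made-up example, older applicants are clearly under-represented, which is exactly the kind of finding worth recording along with what you did about it.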

The Checklist

Work through these and, crucially, keep the evidence. Under this law, a control you can’t prove is treated as if it doesn’t exist.

1. Sorting out your data

Write down where each batch of training data came from and how you got it.
If it involves people’s personal information, record that you had the right to use it.
Check that the data fairly represents the real people the AI will affect.
Look for bias that could unfairly affect someone’s health, safety, or rights — and note what you did about it.
Keep versions of your data so you can trace back if a problem shows up later.
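One lightweight way to cover these points is a provenance register with one entry per data batch, where a content hash doubles as the version ID. The sketch below is a minimal illustration under that assumption, not a required format. The field names, the `make_record` helper, and the sample batch are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import date

@dataclass(frozen=True)
class DatasetRecord:
    """One entry in a hypothetical data-provenance register."""
    name: str
    source: str              # where the batch came from
    how_obtained: str        # e.g. licensed, collected in-house, purchased
    has_personal_data: bool
    legal_basis: str         # your recorded right to use personal data ("" if none)
    recorded_on: str         # ISO date this register entry was created
    sha256: str              # fingerprint of the exact bytes; doubles as version ID

def fingerprint(data: bytes) -> str:
    """Content hash: changing even one byte gives a different version ID."""
    return hashlib.sha256(data).hexdigest()

def make_record(name: str, source: str, how_obtained: str, data: bytes,
                has_personal_data: bool, legal_basis: str = "") -> DatasetRecord:
    """Build a register entry, refusing personal data with no recorded right to use it."""
    if has_personal_data and not legal_basis:
        raise ValueError(f"{name}: personal data needs a recorded legal basis")
    return DatasetRecord(name, source, how_obtained, has_personal_data,
                         legal_basis, date.today().isoformat(), fingerprint(data))

batch = b"id,age,outcome\n1,34,approved\n2,61,declined\n"
record = make_record("loans-2026-06", "partner bank export", "licensed", batch,
                     has_personal_data=True,
                     legal_basis="data-sharing agreement (hypothetical)")
print(json.dumps(asdict(record), indent=2))
```

Because the hash changes whenever the batch changes, a problem found later can be traced to the exact version the model was trained on.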

2. Writing down how the AI was built

Describe, in plain records, what the AI is for and how it was designed.
Note how it was trained and which data was used.
Record how well it performs, how accurate it is, and where its limits are.
Keep your test results and update the records whenever you make a big change.
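Keeping these records as structured data makes gaps easy to spot. The sketch below assumes a hypothetical set of section names that mirror this checklist (purpose, design, training, data, accuracy, limits, test results). It is a simplification for illustration, not the law’s official list of required contents.

```python
from dataclasses import dataclass, field

# Hypothetical section names mirroring the checklist items above.
REQUIRED_SECTIONS = ("intended_purpose", "design", "training_data",
                     "training_process", "accuracy", "known_limits", "test_results")

@dataclass
class TechDoc:
    """Plain records for one AI system (hypothetical structure)."""
    system: str
    version: str
    sections: dict = field(default_factory=dict)
    change_log: list = field(default_factory=list)

    def missing(self) -> list:
        """Required sections that are absent or empty."""
        return [s for s in REQUIRED_SECTIONS if not self.sections.get(s)]

    def record_change(self, new_version: str, summary: str, updated: dict) -> None:
        """Big change: log it, bump the version, and refresh the affected sections."""
        self.change_log.append((self.version, new_version, summary))
        self.version = new_version
        self.sections.update(updated)

doc = TechDoc("loan-scorer", "1.0", {
    "intended_purpose": "Rank consumer loan applications for human review",
    "training_data": "loans-2026-06 (see provenance register)",
    "accuracy": "see evaluation report v1",
})
print(doc.missing())
# ['design', 'training_process', 'known_limits', 'test_results']
```

Calling `record_change` on each big update keeps the version history next to the records it describes.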

3. Keeping a paper trail while it runs

Set up the system to automatically log what it does.
Make sure those logs can’t be quietly altered.
Keep the logs for at least six months (longer if other laws require it).
Use the logs to spot problems early, not just to explain them after the fact.
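A common technique for making logs tamper-evident is hash chaining: each entry’s hash covers the previous entry’s hash, so quietly editing any past entry breaks the chain from that point on. The sketch below combines that with a six-calendar-month retention check. The `AuditLog` class, the event fields, and the assumption that no other law requires a longer period are all illustrative.

```python
import calendar
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

def keep_until(logged_at: datetime, months: int = 6) -> datetime:
    """Same day `months` calendar months later (clamped to that month's last day)."""
    index = logged_at.month - 1 + months
    year, month = logged_at.year + index // 12, index % 12 + 1
    day = min(logged_at.day, calendar.monthrange(year, month)[1])
    return logged_at.replace(year=year, month=month, day=day)

def can_delete(logged_at: datetime, now: datetime) -> bool:
    """Assumes no other law requires a longer retention period."""
    return now >= keep_until(logged_at)

def _digest(prev_hash: str, entry: dict) -> str:
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class AuditLog:
    """Append-only, hash-chained log: editing any past entry breaks the chain."""

    def __init__(self):
        self.entries = []  # list of (entry, hash) pairs

    def append(self, event: dict, when: Optional[datetime] = None) -> None:
        when = when or datetime.now(timezone.utc)
        entry = {"time": when.isoformat(), "event": event}
        prev = self.entries[-1][1] if self.entries else "genesis"
        self.entries.append((entry, _digest(prev, entry)))

    def verify(self) -> bool:
        """Recompute every hash from the start; False if anything was altered."""
        prev = "genesis"
        for entry, stored in self.entries:
            if _digest(prev, entry) != stored:
                return False
            prev = stored
        return True

# Hypothetical events from a loan-scoring system.
log = AuditLog()
log.append({"input_id": "app-1042", "score": 0.73, "reviewer": "j.doe"})
log.append({"input_id": "app-1043", "score": 0.41, "reviewer": "j.doe"})
print(log.verify())  # True
log.entries[0][0]["event"]["score"] = 0.99  # a quiet edit to an old entry
print(log.verify())  # False
```

Hash chaining shows that logs were altered; it does not stop someone from rebuilding the whole chain. In practice you would also store the latest hash somewhere the AI system itself cannot write to, such as a separate write-once store.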

4. People, roles, and ongoing duties

Make sure a real person can oversee and step in on any high-stakes AI.
Be clear whether you’re the maker or just a user of each system — the duties differ.
Register high-stakes systems where required before putting them on the market.
Report serious problems to the authorities quickly: generally within 15 days of becoming aware of them, and sooner for the most severe cases.
Review your risk checks and records regularly, not just once.
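Reporting deadlines are easy to track in code. This minimal sketch counts from the date you became aware of a serious incident. The incident records are hypothetical, and the 15-day default is only the general outer limit, since the law sets shorter windows for some of the most severe incidents.

```python
from datetime import date, timedelta

# General outer limit from the checklist; some severe incidents have shorter windows.
REPORT_WINDOW_DAYS = 15

def report_by(aware_on: date, window_days: int = REPORT_WINDOW_DAYS) -> date:
    """Latest date to report, counted from the day you became aware of the incident."""
    return aware_on + timedelta(days=window_days)

def overdue(incidents: list, today: date) -> list:
    """IDs of incidents that are still unreported after their deadline."""
    return [i["id"] for i in incidents
            if not i["reported"] and today > report_by(i["aware_on"])]

# Hypothetical incident register.
incidents = [
    {"id": "INC-1", "aware_on": date(2026, 7, 1), "reported": False},
    {"id": "INC-2", "aware_on": date(2026, 7, 10), "reported": False},
    {"id": "INC-3", "aware_on": date(2026, 6, 1), "reported": True},
]
print(report_by(date(2026, 7, 1)))            # 2026-07-16
print(overdue(incidents, date(2026, 7, 20)))  # ['INC-1']
```

Pass a smaller `window_days` for incident types that carry a shorter legal deadline.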

What it means for companies: maker vs. user

A common and costly surprise: buying a certified AI tool does not hand all responsibility to the seller. The maker’s paperwork proves they built it properly; it does not prove you’re using it properly. You still have your own duties — overseeing it, keeping logs, using it lawfully. The table shows who owns what:

Responsibility	If you're the maker	If you're the user
Getting the data right	Mainly your job	Check it's fit for your use
Records of how it was built	You create and keep them	Get the key documents from the maker
Official sign-off before sale	Required	Not required
Human oversight	Build in the ability	Actually do it day to day
Keeping activity logs	Make logging possible	Keep the logs (6+ months)
Reporting serious problems	Report to authorities	Report and tell the maker

One trap to flag: if you take a bought-in AI, heavily change it, or sell it under your own brand, you can legally become its “maker” and inherit all the heavier duties. Imagine a delivery company that takes a ready-made route-planning AI, retrains it on its own data, and rebrands it. Those steps can quietly turn the company from a user into a maker overnight.
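The maker-versus-user question can be run across your whole AI inventory as a simple first-pass screen. This sketch encodes only the plain-English triggers described in this guide (building it yourself, selling it under your own brand, heavily changing a bought-in system). The field names are hypothetical, and the screen flags candidates rather than settling the legal question.

```python
def role_for(system: dict) -> tuple:
    """First-pass screen: 'maker' if any trigger applies, else 'user'.

    Triggers are the plain-English ones from this guide; field names are hypothetical.
    """
    triggers = {
        "built_in_house": "built it yourself",
        "own_brand": "sells or uses it under your own brand",
        "heavily_modified": "heavily changed a bought-in system",
    }
    reasons = [why for key, why in triggers.items() if system.get(key)]
    return ("maker" if reasons else "user"), reasons

inventory = [
    {"name": "bought-in CV screener", "built_in_house": False},
    # The delivery-company example: retrained on its own data and rebranded.
    {"name": "route planner", "own_brand": True, "heavily_modified": True},
]
for system in inventory:
    role, reasons = role_for(system)
    print(system["name"], "->", role, reasons)
```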

What it means for end customers

All this paperwork might sound like box-ticking, but it exists for the person on the receiving end. The data rules mean an AI judging your loan or reading your scan should be tested on data that includes people like you. The logging rules mean that if an AI decision goes wrong, there’s a record to investigate it. And the human-oversight rule means you’re not stuck arguing with a machine that has no off-switch. Good data discipline isn’t bureaucracy — it’s what makes AI decisions about real people fairer and more accountable.

Where Shaip fits in

Most of this checklist comes down to one thing: having good data and being able to prove it. That’s our core job. Shaip’s data annotation services deliver well-documented training data with clear records of where it came from and how it was checked — the evidence this law expects. For AI that works with images and video, Shaip’s computer vision data solutions help make sure that data is accurate and fairly represents real-world variety. In short, we help you build AI on data you can stand behind when someone asks how it was made.

The bottom line

An EU AI Act data checklist boils down to a simple rule: if you can’t show how your data was chosen, handled, and tested, you can’t show you followed the law. Sort out your data, write down how your AI was built, keep a paper trail, and be clear on who’s responsible. Companies that treat good record-keeping as part of building AI — not an afterthought — will be ready when the rules bite, and the people their AI affects will be better protected for it.

Frequently asked questions

What trips most companies up under the EU AI Act?

Missing records. The most common problem is being unable to show how an AI was built and what data it learned from. Many organisations run AI with no clear record of its data sources or how it was tested, which makes proving compliance impossible.

What does the law expect from training data?

That the data behind a high-stakes AI is relevant to the job, fairly represents the people it will affect, and has been checked for errors and bias as far as reasonably possible — with records to back that up.

How long do I need to keep AI activity logs?

Generally at least six months for users of high-stakes systems, and longer if other laws require it. The logs should be automatic and tamper-resistant so problems can be traced.

If I buy a certified AI tool, am I off the hook?

No. The seller’s certification shows they built it properly, but you still have your own duties — overseeing it, keeping logs, and using it lawfully. Responsibility for how you use the AI stays with you.

Can using someone else's AI make me its “maker”?

Yes — if you heavily change it or sell it under your own brand, you can legally become the maker and take on the heavier duties, including the full set of build-and-test records.

When do these data rules actually apply?

After the 2026 changes, the strictest rules apply from December 2027 for standalone high-stakes AI and August 2028 for AI built into products. Getting your data records in order now avoids an expensive scramble later.
