Getting Your AI Data Ready for the EU AI Act: A Plain-English Checklist

When companies stumble on the EU AI Act, it’s usually not the clever AI model that trips them up — it’s the paperwork behind the data. The single most common weak spot is being unable to show how an AI was built and what data it learned from. This guide turns that into a plain-English checklist focused on the data and training behind your AI, and then explains what each part means for the data itself, for your company, and for the customers your AI affects.

The short version

  • Under this law, if you can’t show your work, you can’t prove you followed the rules.
  • The data behind AI must be fair, suitable for the job, and checked for obvious problems.
  • You need plain records of how each important system was built, trained, and tested.
  • High-stakes systems must keep activity logs — generally for at least six months.
  • Using someone else’s AI doesn’t remove your responsibility for how you use it.
  • Heavily changing or rebranding a bought-in AI can make you legally its maker.

Why the data is the real test

AI learns from data, so the law cares a lot about that data being right. In plain terms, the training data behind a high-stakes AI should be relevant to the job, represent the real people it will affect, and be checked for errors and bias as far as reasonably possible. A system can be brilliantly engineered and still fail the test if nobody can explain where its data came from or show it was fair.

Training data (in plain terms): The examples an AI learns from. If those examples are narrow or biased, the AI will be too — which is why the law focuses so heavily on getting the data right.

The Checklist

Work through these and, crucially, keep the evidence. Under this law, a control you can’t prove is treated as if it doesn’t exist.

1. Sorting out your data

  • Write down where each batch of training data came from and how you got it.
  • If it involves people’s personal information, record that you had the right to use it.
  • Check that the data fairly represents the real people the AI will affect.
  • Look for bias that could unfairly affect someone’s health, safety, or rights — and note what you did about it.
  • Keep versions of your data so you can trace back if a problem shows up later.
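
As a rough illustration of the list above, here is a minimal Python sketch of one entry in a data register. Everything in it is an assumption made for illustration: the `DatasetRecord` fields, the `make_record` helper, and the sample values are hypothetical, and the law does not prescribe this format. The idea is simply to store the origin and checks next to a fingerprint of the exact data, so that a later version can be told apart from an earlier one.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field
from datetime import date


@dataclass
class DatasetRecord:
    """One entry in a hypothetical data register (field names are illustrative)."""
    name: str
    source: str                # where the batch came from
    obtained_how: str          # e.g. licensed, collected in-house, purchased
    personal_data_basis: str   # why you had the right to use it, or "n/a"
    bias_checks: list = field(default_factory=list)  # what you checked and did
    recorded_on: str = field(default_factory=lambda: date.today().isoformat())
    content_sha256: str = ""   # fingerprint of the exact data described


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest; any change to the data changes it."""
    return hashlib.sha256(data).hexdigest()


def make_record(data: bytes, **details) -> DatasetRecord:
    """Build a register entry tied to the exact bytes of one data version."""
    return DatasetRecord(content_sha256=fingerprint(data), **details)


# Two versions of the same (made-up) batch get two distinct fingerprints.
batch_v1 = b"age,region,outcome\n34,north,approved\n"
batch_v2 = batch_v1 + b"61,south,declined\n"

details = dict(
    name="loans-sample",
    source="internal CRM export",
    obtained_how="collected in-house",
    personal_data_basis="customer contract",
    bias_checks=["compared age bands with the customer base"],
)
rec1 = make_record(batch_v1, **details)
rec2 = make_record(batch_v2, **details)

print(json.dumps(asdict(rec1), indent=2))
print("versions differ:", rec1.content_sha256 != rec2.content_sha256)
```

Keeping one such record per data version is one simple way to trace back later: if a problem appears, the fingerprint shows which version a model was actually trained on.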

2. Writing down how the AI was built

  • Describe, in plain records, what the AI is for and how it was designed.
  • Note how it was trained and which data was used.
  • Record how well it performs, how accurate it is, and where its limits are.
  • Keep your test results and update the records whenever you make a big change.

3. Keeping a paper trail while it runs

  • Set up the system to automatically log what it does.
  • Make sure those logs can’t be quietly altered.
  • Keep the logs for at least six months (longer if other laws require it).
  • Use the logs to spot problems early, not just to explain them after the fact.
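
One common way to make quiet edits detectable is a hash chain, where each log entry stores a hash of the entry before it. The Python sketch below shows the idea under stated assumptions: the function names, the entry layout, and the 183-day stand-in for "six months" are all hypothetical, and this is not a prescribed or certified logging design.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

GENESIS = "0" * 64                 # placeholder "previous hash" for the first entry
RETENTION = timedelta(days=183)    # assumed stand-in for "at least six months"


def _digest(prev_hash: str, timestamp: str, event: str) -> str:
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps([prev_hash, timestamp, event]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: list, event: str, when: datetime) -> None:
    """Add an entry that is chained to the current end of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    timestamp = when.isoformat()
    log.append({
        "timestamp": timestamp,
        "event": event,
        "prev": prev_hash,
        "hash": _digest(prev_hash, timestamp, event),
    })


def verify(log: list) -> bool:
    """Recompute every hash; an edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["timestamp"], entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


def may_delete(entry: dict, now: datetime) -> bool:
    """True only once the entry is older than the assumed retention period."""
    return now - datetime.fromisoformat(entry["timestamp"]) > RETENTION


log = []
start = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
append_entry(log, "application 1 scored: refer to human", start)
append_entry(log, "human reviewer overrode the score", start + timedelta(minutes=5))
intact = verify(log)

# Simulate someone quietly changing an old entry in a copy of the log.
tampered = [dict(entry) for entry in log]
tampered[0]["event"] = "application 1 scored: approve"
tampered_ok = verify(tampered)

print(intact, tampered_ok)
```

A chain like this makes edits detectable rather than impossible: someone who can rewrite the whole log can also recompute every hash, so in practice the latest hash would need to be kept somewhere the log's operator cannot change.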

4. People, roles, and ongoing duties

  • Make sure a real person can oversee and step in on any high-stakes AI.
  • Be clear whether you’re the maker or just a user of each system — the duties differ.
  • Register high-stakes systems where required before putting them on the market.
  • Report serious problems to the authorities quickly — generally within 15 days.
  • Review your risk checks and records regularly, not just once.
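
The reporting window in the list above is easy to lose track of during an incident, so it can help to calculate it the moment a problem is logged. This small Python sketch assumes the clock starts on the day you become aware of the problem and uses the general 15-day figure from this checklist; both are assumptions to confirm against the rules that apply to your case, and the function names are hypothetical.

```python
from datetime import date, timedelta

REPORTING_WINDOW_DAYS = 15  # general figure from the checklist; confirm for your case


def report_due_by(became_aware: date, window_days: int = REPORTING_WINDOW_DAYS) -> date:
    """Latest reporting date, counting from the day you became aware."""
    return became_aware + timedelta(days=window_days)


def days_left(became_aware: date, today: date) -> int:
    """Days remaining before the deadline (negative once it has passed)."""
    return (report_due_by(became_aware) - today).days


aware = date(2025, 3, 3)
print(report_due_by(aware))                # 2025-03-18
print(days_left(aware, date(2025, 3, 10)))  # 8
```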

What it means for companies: maker vs. user

A common and costly surprise: buying a certified AI tool does not hand all responsibility to the seller. The maker’s paperwork proves they built it properly; it does not prove you’re using it properly. You still have your own duties — overseeing it, keeping logs, using it lawfully. The table shows who owns what:

Responsibility                | If you're the maker      | If you're the user
Getting the data right        | Mainly your job          | Check it's fit for your use
Records of how it was built   | You create and keep them | Get the key documents from the maker
Official sign-off before sale | Required                 | Not required
Human oversight               | Build in the ability     | Actually do it day to day
Keeping activity logs         | Make logging possible    | Keep the logs (6+ months)
Reporting serious problems    | Report to authorities    | Report and tell the maker

One trap to flag: if you take a bought-in AI and heavily change it, or sell it under your own brand, you can legally become its “maker” and inherit all the heavier duties. Imagine a delivery company that takes a ready-made route-planning AI, retrains it on its own data, and rebrands it. That one move can turn the company from a user into a maker.

What it means for end customers

All this paperwork might sound like box-ticking, but it exists for the person on the receiving end. The data rules mean an AI judging your loan or reading your scan should be tested on data that includes people like you. The logging rules mean that if an AI decision goes wrong, there’s a record to investigate it. And the human-oversight rule means you’re not stuck arguing with a machine that has no off-switch. Good data discipline isn’t bureaucracy: it’s what makes AI decisions about real people fairer and more accountable.

Where Shaip fits in

Most of this checklist comes down to one thing: having good data and being able to prove it. That’s our core job. Shaip’s data annotation services deliver well-documented training data with clear records of where it came from and how it was checked — the evidence this law expects. For AI that works with images and video, Shaip’s computer vision data solutions help make sure that data is accurate and fairly represents real-world variety. In short, we help you build AI on data you can stand behind when someone asks how it was made.

The bottom line

An EU AI Act data checklist boils down to a simple rule: if you can’t show how your data was chosen, handled, and tested, you can’t show you followed the law. Sort out your data, write down how your AI was built, keep a paper trail, and be clear on who’s responsible. Companies that treat good record-keeping as part of building AI — not an afterthought — will be ready when the rules bite, and the people their AI affects will be better protected for it.

Missing records. The most common problem is being unable to show how an AI was built and what data it learned from. Many organisations run AI with no clear record of its data sources or how it was tested, which makes proving compliance impossible.

The law expects the data behind a high-stakes AI to be relevant to the job, to fairly represent the people it will affect, and to have been checked for errors and bias as far as reasonably possible, with records to back that up.

Users of high-stakes systems should generally keep logs for at least six months, and longer if other laws require it. The logs should be automatic and tamper-resistant so problems can be traced.

Buying a certified AI tool does not make you compliant on its own. The seller’s certification shows they built it properly, but you still have your own duties: overseeing it, keeping logs, and using it lawfully. Responsibility for how you use the AI stays with you.

You can also become the maker of an AI you bought. If you heavily change it or sell it under your own brand, you can legally become the maker and take on the heavier duties, including the full set of build-and-test records.

After the 2026 changes, the strictest rules apply from December 2027 for standalone high-stakes AI and August 2028 for AI built into products. Getting your data records in order now avoids an expensive scramble later.
