Critical Review · Law & Policy

The EU AI Act and Copyright: Can Regulation Harmonize Innovation with Creator Rights?

The EU AI Act requires general-purpose AI providers to publish a summary of the content used to train their models. But does transparency actually protect creators, or does it merely formalize a system where their work is used without meaningful consent? Recent legal analyses examine the Article 53(1)(d) transparency requirement and its limits.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Generative AI models are trained on vast quantities of copyrighted material—books, articles, images, music, code. The creators of this material typically receive no compensation, no notification, and no ability to opt out. The EU has attempted to address this through two overlapping regulatory instruments: the 2019 Digital Single Market (DSM) Copyright Directive (which created text and data mining exceptions with an opt-out mechanism) and the 2024 AI Act (which requires general-purpose AI providers to disclose information about their training data). Whether these instruments actually protect creators or merely create a veneer of legitimacy over existing practices is a question that recent legal scholarship is actively debating.

The Research Landscape

The Regulatory Architecture

Bochkova (2025) provides the clearest mapping of how the EU AI Act and the DSM Copyright Directive interact—and where they conflict. The AI Act's Article 53(1)(d) requires providers of general-purpose AI models to "put in place a policy to comply with Union copyright law" and to "draw up and make publicly available a sufficiently detailed summary of the content used for training."

The DSM Directive's Article 4 allows text and data mining (TDM) of copyrighted works for any purpose, provided that the rightsholder has not "expressly reserved" their rights against mining. This creates an opt-out mechanism—creators must actively declare that their work cannot be mined, rather than AI developers needing to seek permission.
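The Directive does not prescribe how a reservation must be expressed, though for content made publicly available online it must generally be machine-readable. One emerging convention is the W3C TDM Reservation Protocol (TDMRep), which signals a reservation through a `tdm-reservation` property. The sketch below shows how a crawler honoring the opt-out might interpret such a record; the single-object JSON shape is a simplifying assumption (TDMRep also defines HTTP-header and file-based variants), not a requirement of the Directive itself.

```python
# Sketch: interpreting a machine-readable TDM rights reservation.
# Assumes a TDMRep-style record with a "tdm-reservation" property
# (1 = rights reserved, 0 = mining permitted); the exact record shape
# here is an illustrative assumption.
import json

def is_tdm_reserved(tdmrep_json: str) -> bool:
    """Return True if the record expressly reserves TDM rights."""
    try:
        policy = json.loads(tdmrep_json)
    except json.JSONDecodeError:
        return False  # no valid policy found: no express reservation
    if not isinstance(policy, dict):
        return False
    return policy.get("tdm-reservation") == 1

# A compliant crawler would skip works whose rightsholder reserves rights:
print(is_tdm_reserved('{"tdm-reservation": 1}'))  # True  -> do not mine
print(is_tdm_reserved('{"tdm-reservation": 0}'))  # False -> mining permitted
```

Note the asymmetry the opt-out model creates: absence of a signal (including a missing or malformed policy file) defaults to "mining permitted", which is exactly the burden-shifting that critics highlight.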

Bochkova identifies the tension: the AI Act requires transparency about training data, but the DSM Directive allows use without permission as long as no opt-out has been declared. Transparency about how your work was used is cold comfort if the use itself was legal without your consent.

Article 53(1)(d) in Practice

Khaliq et al. (2025) ask a pointed question: what does Article 53(1)(d) actually change for generative AI development? Their analysis examines the practical implications of the transparency requirement.

The answer, they argue, is less than advocates hope. The "sufficiently detailed summary" requirement is vague: what level of detail is sufficient? Listing every copyrighted work in a training dataset of billions of items is impractical; listing broad categories ("books published before 2024") is uninformative. The regulation does not define "sufficient," leaving enforcement to future interpretation.

More fundamentally, transparency does not create bargaining power. Even if creators know their work was used for training, they cannot retroactively negotiate compensation unless the use was illegal. And under the DSM Directive's TDM exception, the use is legal unless the creator has opted out—creating a system where the default is use without payment, and the burden falls on creators to protect their own rights.

Transparency as Partial Solution

Buick (2024), with 28 citations, provides the most balanced assessment. The paper acknowledges that transparency is insufficient but argues it is necessary as a precondition for more effective protections. Without knowing what training data was used, creators cannot identify infringement, negotiate licensing, or demonstrate harm. Transparency does not solve the problem, but its absence makes the problem invisible.

Buick proposes a "transparency stack" with three levels:

  • Dataset documentation: What data was collected, from where, and under what terms.
  • Processing documentation: How data was cleaned, filtered, and transformed.
  • Usage documentation: Which models were trained on which data, and what those models are used for.

Each level provides different stakeholders with different actionable information: creators can check whether their work was included (level 1), data protection authorities can verify processing legality (level 2), and regulators can assess compliance with the AI Act (level 3).
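Buick's three levels can be pictured as linked documentation records. The sketch below is purely illustrative: the field names and structure are assumptions for exposition, not a schema from the paper or from the AI Act.

```python
# Illustrative only: one possible shape for transparency-stack records.
# All field names are assumptions, not an official or proposed schema.
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:         # level 1: collection provenance
    source: str              # where the data came from
    terms: str               # licence or exception relied upon
    opt_out_honored: bool    # whether an express TDM reservation was checked

@dataclass
class ProcessingRecord:      # level 2: cleaning, filtering, transformation
    step: str                # e.g. "deduplication", "quality filter"
    affected_sources: list[str]

@dataclass
class UsageRecord:           # level 3: model-to-data linkage
    model: str
    datasets: list[str]
    uses: list[str] = field(default_factory=list)

# A creator queries level 1; a regulator joins levels 1-3 by source name.
books = DatasetRecord(source="public-web-books",
                      terms="DSM Art. 4 TDM exception",
                      opt_out_honored=True)
```

The point of the layering is that each record type answers a different stakeholder's question without forcing providers to publish one monolithic, item-by-item inventory.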

France and the TDM Exception

Lee (2025) examines how France—historically one of the strongest defenders of creator rights in Europe—has implemented the DSM Directive's TDM provisions. France transposed the Directive with additional restrictions, requiring TDM users to demonstrate "lawful access" to the works they mine and interpreting the opt-out mechanism broadly.

The French approach creates a more protective environment for creators but also more friction for AI developers. The practical question is whether France's interpretation will influence other EU member states or remain an outlier. If member states implement the TDM exception differently, the "single market" in AI training data will be fragmented.

Critical Analysis: Claims and Evidence

| Claim | Evidence | Verdict |
|---|---|---|
| The EU AI Act and DSM Directive create conflicting signals for AI developers | Bochkova's regulatory analysis | ✅ Supported — transparency ≠ permission creates confusion |
| Article 53(1)(d) transparency requirement is practically vague | Khaliq et al.'s analysis of "sufficiently detailed" | ✅ Supported — no clear standard exists |
| Transparency is necessary but insufficient for creator protection | Buick's transparency stack proposal | ✅ Supported — without transparency, other protections cannot function |
| France's stronger TDM interpretation may fragment the EU market | Lee's comparative analysis | ⚠️ Uncertain — depends on other member states' transposition choices |

Open Questions

  • Technical compliance: How can AI developers practically comply with training data transparency requirements when datasets contain billions of items from automated web scraping?
  • Collective licensing: Could collective licensing organizations (analogous to music performing rights societies) provide a practical mechanism for creator compensation?
  • Global fragmentation: If the EU, US, China, and Japan take different approaches to AI training data copyright, how do global AI developers comply with all simultaneously?
  • Retroactive application: Do transparency requirements apply to models already trained, or only to future training runs? The answer affects whether existing harms can be identified and remedied.
What This Means for Your Research

For legal scholars, the EU's evolving approach provides a natural experiment in regulatory design for AI copyright—with implications for jurisdictions worldwide that are developing their own frameworks.

For AI developers, the practical compliance challenge of Article 53(1)(d) is immediate and unresolved. Early investment in training data documentation may reduce future regulatory risk.


References (4)

[1] Bochkova, I. (2025). AI Regulation and IP Governance: The EU AI Act, the DSM Copyright Directive, and Ukraine's Harmonisation Path. Digital Economy and Society.
[2] Khaliq, F., Mir, S., & Tariq, I. (2025). Licensing or litigation? Measuring what Article 53(1)(d) of the European Union (EU) AI Act really changes for generative AI. Academic Journal.
[3] Buick, A. (2024). Copyright and AI training data—transparency to the rescue? Journal of Intellectual Property Law & Practice, 20(3), 182–192.
[4] Lee, W.-W. (2025). AI Learning Data and Copyright: Analysis of EU and France's TDM Regulations. Korean Journal of Digital Property Studies, 38(3).
