Law & Policy

AI Regulation Meets Copyright: The Unresolved Tension at the Heart of the EU AI Act

The EU AI Act and the DSM Copyright Directive were designed to govern different aspects of the digital economy. But generative AI sits at their intersection, creating legal ambiguities that neither framework anticipated. Five papers examine how this regulatory gap is being navigated, and why it matters for every jurisdiction watching the EU experiment.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Two of the European Union's landmark regulatory instruments, the AI Act (2024) and the Digital Single Market (DSM) Copyright Directive (2019), were conceived in different eras of the AI conversation. The DSM Directive was drafted when "AI" primarily meant recommendation algorithms and automated content moderation. The AI Act was drafted as generative AI was emerging but before ChatGPT demonstrated to a global audience what large-scale text-and-data mining could produce. The result is a regulatory architecture with a gap at its center: generative AI systems that consume copyrighted works as training data and produce new works that may compete with the originals.

This gap is not a drafting error that can be fixed with an amendment. It reflects a genuine conceptual tension between two legitimate policy objectives: promoting AI innovation (which requires access to large-scale training data) and protecting creators' rights (which requires control over how their works are used). The resolution of this tension will shape the global AI industry, because the EU's regulatory choices, whether through the Brussels Effect or deliberate emulation, tend to become de facto global standards.

The Input Problem: Copyrighted Works as Training Data

Riccio (2024) examines the legality of using copyrighted works as AI training data under the DSM Directive's text-and-data mining (TDM) exceptions. The Directive provides two relevant provisions:

Article 3 permits TDM for scientific research by research organizations and cultural heritage institutions, without requiring rightsholder consent. This exception is broad and mandatory across EU member states.

Article 4 permits TDM for any purpose, including commercial, unless rightsholders have expressly opted out through "machine-readable means" (typically a robots.txt file or metadata declaration).
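In practice, an Article 4 opt-out is most commonly expressed through robots.txt directives aimed at known AI crawlers. A minimal illustration follows; the user-agent strings shown (GPTBot and CCBot) are those published by OpenAI and Common Crawl respectively, and the list of relevant crawlers changes over time, so this is a sketch rather than a complete or authoritative opt-out:

```
# robots.txt — illustrative TDM opt-out targeting known AI crawlers.
# Blocking a crawler here signals that the site's content should not
# be fetched by that bot; it does not retroactively affect data
# already collected.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

A related (still-emerging) alternative is page-level metadata such as the W3C community draft TDM Reservation Protocol, which expresses the reservation in a `tdm-reservation` declaration rather than per-crawler rules. Neither mechanism resolves the asymmetry Riccio identifies: both assume the rightsholder controls the server configuration of every site where the work appears.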

The practical implications are significant. For AI companies, Article 4 creates a default-permissive regime: they can use any publicly available copyrighted content as training data unless the rightsholder has affirmatively opted out. For creators, Article 4 shifts the burden of protection from the user to the rightsholder: a reversal of the traditional copyright default that requires permission before use.

Riccio argues that this framework, while legally coherent, creates a "structural asymmetry" between large AI companies (which have the technical capacity to scrape the internet at scale) and individual creators (who may not know that opting out is possible, may not have the technical means to implement machine-readable restrictions, and may not be able to monitor compliance). The opt-out mechanism assumes an informed, technically capable rightsholder, an assumption that describes well-resourced publishers but not individual artists, photographers, or writers.

The Output Problem: Who Owns AI-Generated Works?

If the input problem concerns training data, the output problem concerns what the AI produces. Khadka (2025) surveys the legal landscape across jurisdictions, identifying three approaches to AI-generated content ownership:

The "no author, no copyright" approach (US, EU default): Copyright requires a human author. Works generated by AI without meaningful human creative contribution are not copyrightable. This is the position of the US Copyright Office and the default interpretation under EU law.

The "employer/programmer as author" approach (UK, India, partially): Some jurisdictions attribute copyright in computer-generated works to the person who arranged for the work's creationโ€”typically the AI system's operator or developer. The UK's Copyright, Designs and Patents Act 1988, Section 9(3), takes this approach.

The "sui generis rights" approach (proposed): Several scholars and the European Parliament's JURI Committee have proposed new categories of intellectual property rights specifically designed for AI-generated worksโ€”rights that would be narrower than full copyright but would provide some commercial protection.

Each approach has consequences. The "no copyright" approach means AI-generated content enters the public domain immediately, potentially flooding markets with free content that competes with copyrighted human-created works. The "employer as author" approach concentrates IP ownership in the hands of AI companies. The "sui generis" approach requires new legislation that no jurisdiction has yet enacted.

The Harmonization Challenge

Bochkova (2025) examines how non-EU jurisdictions navigate harmonization with the EU's dual regulatory framework. Using Ukraine as a case study (a country that has committed to EU regulatory alignment as part of its accession process), the paper maps the practical challenges of implementing both the AI Act and the DSM Directive in a legal system that was not designed for either.

The harmonization challenges are instructive for any jurisdiction contemplating similar alignment:

  • Definitional gaps: The AI Act's definition of "AI system" and the DSM Directive's definition of "text and data mining" do not map neatly onto each other. A system that falls within the AI Act's regulatory perimeter may or may not be conducting TDM as defined by the Directive.
  • Institutional capacity: Effective regulation requires both AI expertise (to assess system capabilities and risks) and IP expertise (to adjudicate copyright claims). Few national regulatory bodies possess both.
  • Enforcement asymmetry: The AI companies that need to be regulated are typically headquartered in the US or China, while the creators and users who need protection are distributed globally. Jurisdictional mismatch makes enforcement difficult.

Shu (2025) proposes a conceptually innovative approach from Chinese legal scholarship: the "copyright expectancy right" (版权期待权). The framework addresses what Shu calls the "trilemma" of AI copyright governance: the simultaneous need to protect data sources, incentivize AI development, and maintain legal certainty.

The copyright expectancy right would function as a conditional right that attaches to training data at the point of ingestion and "matures" into full compensation if the AI system's output substantially incorporates recognizable elements of the original work. Unlike the opt-out approach (which is binary: either all use is permitted or all use is prohibited), the expectancy right creates a graduated system where compensation scales with the degree to which the training data contributed to the output.

This approach addresses a genuine limitation of current frameworks. The opt-out mechanism in the DSM Directive does not distinguish between training uses that produce substantially similar output (which intuitively should require compensation) and training uses that contribute marginally to a model's general capabilities (which intuitively should not). The expectancy right framework provides a mechanism for this distinctionโ€”though the practical challenges of determining "substantial incorporation" in a model with billions of parameters are considerable.
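The graduated logic of the expectancy right can be made concrete with a small sketch. Everything below is a hypothetical illustration, not a mechanism from Shu (2025): the function name, the linear scaling rule, and the idea of a numeric `attribution_score` are all assumptions, and, as the text notes, actually measuring such a score in a billion-parameter model is an open problem.

```python
def expectancy_payout(attribution_score: float, base_royalty: float,
                      threshold: float = 0.1) -> float:
    """Hypothetical graduated compensation rule for an "expectancy right".

    attribution_score: assumed estimate, in [0, 1], of how much a training
        work contributed to a given output (how to compute this is the
        open problem the text identifies).
    base_royalty: full royalty owed if the output substantially
        incorporates the work (score == 1.0).
    threshold: below this, the contribution is treated as marginal and
        the expectancy never "matures" into compensation.
    """
    if not 0.0 <= attribution_score <= 1.0:
        raise ValueError("attribution_score must be in [0, 1]")
    if attribution_score < threshold:
        return 0.0  # marginal contribution: no compensation matures
    # Linear scaling between the marginality threshold and full incorporation
    return base_royalty * (attribution_score - threshold) / (1.0 - threshold)
```

The contrast with the opt-out regime is visible in the shape of the function: opt-out yields a step function (all or nothing at the rightsholder's election), while the expectancy right yields a payout that grows with the degree of incorporation.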

Yang (2025) analyzes 30 representative copyright cases involving AI-generated works from around the world and identifies three core regulation paths:

Creator qualification confirmation: Defining the dominant role of human creators in producing AI-generated content. Yang's analysis finds that establishing clear human-creator primacy improves copyright clarity by 35–42%.

Right scope definition: Reasonably delineating the boundaries of copyright protection for AI-generated works. The study finds that clear scope definitions reduce infringement disputes by 40–45%.

Multi-stakeholder collaborative governance: Engaging developers, users, platforms, and rightsholders in coordinated governance frameworks. Yang finds that such collaborative approaches enhance regulatory effectiveness by 32–38%.

These three paths are not mutually exclusive but complementary. Yang argues that effective copyright governance for AI-generated works requires all three operating in concert: confirming who qualifies as a creator, defining what rights attach, and establishing governance mechanisms that involve all stakeholders.

Claims and Evidence

| Claim | Evidence | Verdict |
| --- | --- | --- |
| The DSM Directive's opt-out mechanism adequately protects creators | Riccio (2024): structural asymmetry between large AI companies and individual creators | ❌ Refuted |
| AI-generated works are copyrightable under existing law | Khadka (2025): "no author, no copyright" is the majority position globally | ❌ Refuted (in most jurisdictions) |
| EU regulatory harmonization is straightforward for non-EU countries | Bochkova (2025): definitional gaps, institutional capacity limits, enforcement asymmetry | ❌ Refuted |
| Graduated compensation models (expectancy rights) are technically feasible | Shu (2025): conceptually innovative but untested; attribution in billion-parameter models is an open problem | ⚠️ Uncertain |
| There is scholarly consensus on AI authorship | Yang (2025): three distinct regulation paths needed simultaneously, each with measurable but partial effects | ⚠️ Uncertain (complementary paths, no single solution) |

Open Questions

  • Will the EU AI Act's transparency requirements for generative AI effectively protect copyright? Article 50 (in the enacted Regulation EU 2024/1689) requires disclosure of AI-generated content, but disclosure does not address the training data question. Is transparency sufficient, or merely a substitute for substantive regulation?
  • Can technical solutions (content provenance, watermarking, AI attribution) substitute for legal solutions? The C2PA standard and similar initiatives provide technical infrastructure for tracking content provenance. But technical standards work only if adoption is universal, and adoption is voluntary.
  • How will courts handle the first major AI copyright case? Several cases are pending (New York Times v. OpenAI, Getty v. Stability AI). The outcomes will set precedents on questions the regulatory framework has left unresolved.
  • Should training data compensation be ex ante (licensing) or ex post (damages)? Collective licensing models provide ex ante compensation but require standardization. Litigation provides ex post compensation but is expensive and slow. The optimal regime likely involves both.
  • What happens when jurisdictions disagree? If the EU requires opt-out protection, China develops expectancy rights, and the US relies on fair use, AI companies operating globally face a fragmented compliance landscape. Will this lead to regulatory arbitrage, or to convergence?

Implications

The unresolved tension between AI regulation and copyright law is not a niche legal issue: it is a question about how the economic value generated by AI systems will be distributed across society. If training data use remains largely uncompensated, the value flows to AI companies and their investors. If creators can effectively assert rights over training data, the value is shared more broadly.

The EU's approach of regulating AI and copyright through separate instruments that do not fully account for each other is unlikely to be the final word. A coherent framework will need to address input (training data), process (model development), and output (AI-generated content) as an integrated system rather than as separate legal problems. The jurisdictions that develop such frameworks first will have significant influence over the global governance of AI.

References

[1] Riccio, G.M. (2024). AI, Data Mining and Copyright Law: Remarks about Lawfulness and Efficient Choices. Proc. MIPRO 2024.
[2] Bochkova, I. (2025). Artificial Intelligence Regulation and Intellectual Property Governance: The EU AI Act, the DSM Copyright Directive, and Ukraine's Two-Stage Harmonisation Path. Law & Innovations, 4, 29–36.
[3] Khadka, R. (2025). Navigating the Legal Landscape of AI-Created Content: Intellectual Property, Accountability, and Regulation. International Journal of Science and Engineering Management.
[4] Yang, S. (2025). Artificial Intelligence-Generated Works: Copyright Dilemmas, Theoretical Disputes and Regulation Paths. Modern Economics & Management Forum, 6(6), 4634.
[5] Shu, W. (2025). Copyright Expectancy Right: Paradigm Reconstruction of AI Training Data Governance.
