Law & Policy

AI Regulation Meets Copyright: The Unresolved Tension at the Heart of the EU AI Act

The EU AI Act and the DSM Copyright Directive were designed to govern different aspects of the digital economy. But generative AI sits at their intersection, creating legal ambiguities that neither framework anticipated. Five papers examine how this regulatory gap is being navigated, and why it matters for every jurisdiction watching the EU experiment.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Two of the European Union's landmark regulatory instruments, the AI Act (2024) and the Digital Single Market (DSM) Copyright Directive (2019), were conceived in different eras of the AI conversation. The DSM Directive was drafted when "AI" primarily meant recommendation algorithms and automated content moderation. The AI Act was drafted as generative AI was emerging but before ChatGPT demonstrated to a global audience what large-scale text-and-data mining could produce. The result is a regulatory architecture with a gap at its center: generative AI systems that consume copyrighted works as training data and produce new works that may compete with the originals.

This gap is not a drafting error that can be fixed with an amendment. It reflects a genuine conceptual tension between two legitimate policy objectives: promoting AI innovation (which requires access to large-scale training data) and protecting creators' rights (which requires control over how their works are used). The resolution of this tension will shape the global AI industry, because the EU's regulatory choices, whether through the Brussels Effect or deliberate emulation, tend to become de facto global standards.

The Input Problem: Copyrighted Works as Training Data

Riccio (2024) examines the legality of using copyrighted works as AI training data under the DSM Directive's text-and-data mining (TDM) exceptions. The Directive provides two relevant provisions:

Article 3 permits TDM for scientific research by research organizations and cultural heritage institutions, without requiring rightsholder consent. This exception is broad and mandatory across EU member states.

Article 4 permits TDM for any purpose, including commercial, unless rightsholders have expressly opted out through "machine-readable means" (typically a robots.txt file or metadata declaration).
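In practice, an Article 4 opt-out is most commonly expressed through robots.txt directives aimed at known AI crawlers. A minimal illustration follows; the user-agent strings shown (GPTBot and CCBot) are those published by OpenAI and Common Crawl respectively, and the list of relevant crawlers changes over time, so this is a sketch rather than a complete or authoritative opt-out:

```
# robots.txt — illustrative TDM opt-out targeting known AI crawlers.
# Blocking a crawler here signals that the site's content should not
# be fetched by that bot; it does not retroactively affect data
# already collected.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

A related (still-emerging) alternative is page-level metadata such as the W3C community draft TDM Reservation Protocol, which expresses the reservation in a `tdm-reservation` declaration rather than per-crawler rules. Neither mechanism resolves the asymmetry Riccio identifies: both assume the rightsholder controls the server configuration of every site where the work appears.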

The practical implications are significant. For AI companies, Article 4 creates a default-permissive regime: they can use any publicly available copyrighted content as training data unless the rightsholder has affirmatively opted out. For creators, Article 4 shifts the burden of protection from the user to the rightsholder: a reversal of the traditional copyright default that requires permission before use.

Riccio argues that this framework, while legally coherent, creates a "structural asymmetry" between large AI companies (which have the technical capacity to scrape the internet at scale) and individual creators (who may not know that opting out is possible, may not have the technical means to implement machine-readable restrictions, and may not be able to monitor compliance). The opt-out mechanism assumes an informed, technically capable rightsholder, an assumption that describes well-resourced publishers but not individual artists, photographers, or writers.

The Output Problem: Who Owns AI-Generated Works?

If the input problem concerns training data, the output problem concerns what the AI produces. Khadka (2025) surveys the legal landscape across jurisdictions, identifying three approaches to AI-generated content ownership:

The "no author, no copyright" approach (US, EU default): Copyright requires a human author. Works generated by AI without meaningful human creative contribution are not copyrightable. This is the position of the US Copyright Office and the default interpretation under EU law.

The "employer/programmer as author" approach (UK, India, partially): Some jurisdictions attribute copyright in computer-generated works to the person who arranged for the work's creationโ€”typically the AI system's operator or developer. The UK's Copyright, Designs and Patents Act 1988, Section 9(3), takes this approach.

The "sui generis rights" approach (proposed): Several scholars and the European Parliament's JURI Committee have proposed new categories of intellectual property rights specifically designed for AI-generated worksโ€”rights that would be narrower than full copyright but would provide some commercial protection.

Each approach has consequences. The "no copyright" approach means AI-generated content enters the public domain immediately, potentially flooding markets with free content that competes with copyrighted human-created works. The "employer as author" approach concentrates IP ownership in the hands of AI companies. The "sui generis" approach requires new legislation that no jurisdiction has yet enacted.

The Harmonization Challenge

Bochkova (2025) examines how non-EU jurisdictions navigate harmonization with the EU's dual regulatory framework. Using Ukraine as a case study (a country that has committed to EU regulatory alignment as part of its accession process), the paper maps the practical challenges of implementing both the AI Act and the DSM Directive in a legal system that was not designed for either.

The harmonization challenges are instructive for any jurisdiction contemplating similar alignment:

  • Definitional gaps: The AI Act's definition of "AI system" and the DSM Directive's definition of "text and data mining" do not map neatly onto each other. A system that falls within the AI Act's regulatory perimeter may or may not be conducting TDM as defined by the Directive.
  • Institutional capacity: Effective regulation requires both AI expertise (to assess system capabilities and risks) and IP expertise (to adjudicate copyright claims). Few national regulatory bodies possess both.
  • Enforcement asymmetry: The AI companies that need to be regulated are typically headquartered in the US or China, while the creators and users who need protection are distributed globally. Jurisdictional mismatch makes enforcement difficult.

Shu (2025) proposes a conceptually innovative approach from Chinese legal scholarship: the "copyright expectancy right" (版权期待权). The framework addresses what Shu calls the "trilemma" of AI copyright governance: the simultaneous need to protect data sources, incentivize AI development, and maintain legal certainty.

The copyright expectancy right would function as a conditional right that attaches to training data at the point of ingestion and "matures" into full compensation if the AI system's output substantially incorporates recognizable elements of the original work. Unlike the opt-out approach (which is binary: either all use is permitted or all use is prohibited), the expectancy right creates a graduated system where compensation scales with the degree to which the training data contributed to the output.

This approach addresses a genuine limitation of current frameworks. The opt-out mechanism in the DSM Directive does not distinguish between training uses that produce substantially similar output (which intuitively should require compensation) and training uses that contribute marginally to a model's general capabilities (which intuitively should not). The expectancy right framework provides a mechanism for this distinctionโ€”though the practical challenges of determining "substantial incorporation" in a model with billions of parameters are considerable.
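The graduated logic of the expectancy right can be made concrete with a small sketch. Everything below is a hypothetical illustration, not a mechanism from Shu (2025): the function name, the linear scaling rule, and the idea of a numeric `attribution_score` are all assumptions, and, as the text notes, actually measuring such a score in a billion-parameter model is an open problem.

```python
def expectancy_payout(attribution_score: float, base_royalty: float,
                      threshold: float = 0.1) -> float:
    """Hypothetical graduated compensation rule for an "expectancy right".

    attribution_score: assumed estimate, in [0, 1], of how much a training
        work contributed to a given output (how to compute this is the
        open problem the text identifies).
    base_royalty: full royalty owed if the output substantially
        incorporates the work (score == 1.0).
    threshold: below this, the contribution is treated as marginal and
        the expectancy never "matures" into compensation.
    """
    if not 0.0 <= attribution_score <= 1.0:
        raise ValueError("attribution_score must be in [0, 1]")
    if attribution_score < threshold:
        return 0.0  # marginal contribution: no compensation matures
    # Linear scaling between the marginality threshold and full incorporation
    return base_royalty * (attribution_score - threshold) / (1.0 - threshold)
```

The contrast with the opt-out regime is visible in the shape of the function: opt-out yields a step function (all or nothing at the rightsholder's election), while the expectancy right yields a payout that grows with the degree of incorporation.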

Yang (2025) analyzes 30 representative copyright cases involving AI-generated works from around the world and identifies three core regulation paths:

Creator qualification confirmation: Defining the dominant role of human creators in producing AI-generated content. Yang's analysis finds that establishing clear human-creator primacy improves copyright clarity by 35–42%.

Right scope definition: Reasonably delineating the boundaries of copyright protection for AI-generated works. The study finds that clear scope definitions reduce infringement disputes by 40–45%.

Multi-stakeholder collaborative governance: Engaging developers, users, platforms, and rightsholders in coordinated governance frameworks. Yang finds that such collaborative approaches enhance regulatory effectiveness by 32–38%.

These three paths are not mutually exclusive but complementary. Yang argues that effective copyright governance for AI-generated works requires all three operating in concert: confirming who qualifies as a creator, defining what rights attach, and establishing governance mechanisms that involve all stakeholders.

Claims and Evidence

| Claim | Evidence | Verdict |
| --- | --- | --- |
| The DSM Directive's opt-out mechanism adequately protects creators | Riccio (2024): structural asymmetry between large AI companies and individual creators | ❌ Refuted |
| AI-generated works are copyrightable under existing law | Khadka (2025): "no author, no copyright" is the majority position globally | ❌ Refuted (in most jurisdictions) |
| EU regulatory harmonization is straightforward for non-EU countries | Bochkova (2025): definitional gaps, institutional capacity limits, enforcement asymmetry | ❌ Refuted |
| Graduated compensation models (expectancy rights) are technically feasible | Shu (2025): conceptually innovative but untested; attribution in billion-parameter models is an open problem | ⚠️ Uncertain |
| There is scholarly consensus on AI authorship | Yang (2025): three distinct regulation paths needed simultaneously, each with measurable but partial effects | ⚠️ Uncertain (complementary paths, no single solution) |

Open Questions

  • Will the EU AI Act's transparency requirements for generative AI effectively protect copyright? Article 50 (in the enacted Regulation EU 2024/1689) requires disclosure of AI-generated content, but disclosure does not address the training data question. Is transparency sufficient, or merely a substitute for substantive regulation?
  • Can technical solutions (content provenance, watermarking, AI attribution) substitute for legal solutions? The C2PA standard and similar initiatives provide technical infrastructure for tracking content provenance. But technical standards work only if adoption is universal, and adoption is voluntary.
  • How will courts handle the first major AI copyright case? Several cases are pending (New York Times v. OpenAI, Getty v. Stability AI). The outcomes will set precedents on questions the regulatory framework has left unresolved.
  • Should training data compensation be ex ante (licensing) or ex post (damages)? Collective licensing models provide ex ante compensation but require standardization. Litigation provides ex post compensation but is expensive and slow. The optimal regime likely involves both.
  • What happens when jurisdictions disagree? If the EU requires opt-out protection, China develops expectancy rights, and the US relies on fair use, AI companies operating globally face a fragmented compliance landscape. Will this lead to regulatory arbitrage, or to convergence?

Implications

The unresolved tension between AI regulation and copyright law is not a niche legal issue: it is a question about how the economic value generated by AI systems will be distributed across society. If training data use remains largely uncompensated, the value flows to AI companies and their investors. If creators can effectively assert rights over training data, the value is shared more broadly.

The EU's approach of regulating AI and copyright through separate instruments that do not fully account for each other is unlikely to be the final word. A coherent framework will need to address input (training data), process (model development), and output (AI-generated content) as an integrated system rather than as separate legal problems. The jurisdictions that develop such frameworks first will have significant influence over the global governance of AI.

References

[1] Riccio, G.M. (2024). AI, Data Mining and Copyright Law: Remarks about Lawfulness and Efficient Choices. Proc. MIPRO 2024.
[2] Bochkova, I. (2025). Artificial Intelligence Regulation and Intellectual Property Governance: The EU AI Act, the DSM Copyright Directive, and Ukraine's Two-Stage Harmonisation Path. Law & Innovations, 4, 29–36.
[3] Khadka, R. (2025). Navigating the Legal Landscape of AI-Created Content: Intellectual Property, Accountability, and Regulation. International Journal of Science and Engineering Management.
[4] Yang, S. (2025). Artificial Intelligence-Generated Works: Copyright Dilemmas, Theoretical Disputes and Regulation Paths. Modern Economics & Management Forum, 6(6), 4634.
[5] Shu, W. (2025). Copyright Expectancy Right: Paradigm Reconstruction of AI Training Data Governance.
