Training AI models and copyright challenges: Chinese and EU insights

Meyke Rietveld and Clemens Molle of Bird & Bird, along with Emma Ren from the association team at Beijing Lawjay Partners, discuss the first copyright cases in China and the EU dealing with training AI models, and newly enacted legislation in this field

As AI models evolve rapidly, scrutiny of the copyright issues involved in training them is increasing, leading to a global surge in litigation. Courts in China and the EU are currently hearing their first cases on this subject. AI model providers should closely monitor the outcomes of these cases, which have the potential to influence the AI training landscape. Additionally, by August 2025, providers that offer general-purpose AI models in the EU will need to comply with new copyright obligations introduced by the AI Act (Regulation (EU) 2024/1689).

China

In China, the first case to address whether AI training constitutes copyright infringement is pending before the Beijing Internet Court. The hearing of the case took place in June 2024.

The case was brought by four Chinese illustrators against the developers of Trik AI, a generative AI painting app released by the famous Chinese lifestyle platform Xiaohongshu. The app is particularly strong at generating traditional Chinese-style paintings. The illustrators claim, among other things, that Xiaohongshu’s use of their work to train its AI model infringed their reproduction rights. Xiaohongshu, on the other hand, contends that using the plaintiffs’ works for AI training should be considered ‘fair use’ under Article 24 of the Chinese Copyright Law. A judgment in this case has not yet been issued.

From a technical perspective, establishing copyright infringement in China depends on whether the AI model has made a ‘temporary’ or ‘permanent’ reproduction of a copyrighted work during the training process. Under the Chinese Copyright Law, the general rule is that copyright infringement occurs only when a permanent copy is made.

The fair use doctrine in China

Article 24 of the Chinese Copyright Law exhaustively lists 12 situations in which the use of copyrighted material constitutes permitted fair use, but it is difficult to categorise the use of copyrighted materials in AI training within any of these scenarios. Nonetheless, the Chinese Supreme People’s Court’s judicial policy document offers some flexibility in applying the fair use doctrine under special circumstances, providing factors under which the use of works may be determined reasonable for promoting technological innovation and business development.

The Notice of the Supreme People’s Court on Issuing the Opinions on Issues concerning Maximizing the Role of Intellectual Property Right Trials in Boosting the Great Development and Great Prosperity of Socialist Culture and Promoting the Independent and Coordinated Development of Economy, issued in 2011, states: “Under special circumstances as necessary for promoting technological innovation and business development, a use of works may be determined as reasonable after consideration of the nature and purposes of use, the nature of works used, the quantity and quality of the part of works used, impacts of use on potential markets or values, and other factors, provided that such use neither contravenes the normal use of the works nor results in unreasonable damage to the lawful interests of the author. For acts of copying, drawing, photographing, or videotaping artistic works set up or exhibited at outdoor public sites, the use of results from such acts in a reasonable manner and within a reasonable extent shall be determined as reasonable, whether the use is for commercial purposes or not.”

However, treating the training of AI on copyrighted material as fair use would require the Beijing Internet Court to take a bold stance, as the law does not explicitly support this interpretation.

Questions to be resolved in revision of regulations

A revision of China’s Regulations for the Implementation of the Copyright Law is under way, and it remains to be seen whether it will include limitations on rights related to AI training data. In the academic field, some scholars argue that AI training does not aim to exploit or replicate the substantive expression of existing works. Instead, they argue, it seeks to learn the patterns of style and content production, transforming training data into probability distributions that generate specific texts or images, without mirroring the original work’s expression.

Additionally, AI models typically incorporate random seeds into their algorithms, making it unlikely that they will produce content substantially similar to prior works. Even if similarities do occur, they are more likely to relate to overall style, an element that is not protected by copyright law. Furthermore, requiring AI model companies to obtain copyright licences for training data is challenging in China. China’s collective copyright management organisations are still developing, and the sheer volume of training data makes it difficult in practice for AI companies to secure licences.

Balancing the protection of copyright holders with the needs of AI development in a practically feasible manner is an important policy and legal issue that needs to be addressed.

EU

As in China, AI litigation is on the rise in the EU, including litigation over the input side of AI. In 2023, the Prague Municipal Court was the first to rule on the output side of AI, holding that copyright protection would not apply to certain AI-generated works. In July 2024, the District Court of Hamburg held a hearing on the input side of AI, in a case between LAION e.V. and the German photographer Robert Kneschke. This is the first case in the EU to address the use of copyrighted material to train AI models.

First, to outline the relevant legal framework, EU law provides for an exhaustive list of exceptions and limitations to copyright, which allow a third party to perform certain acts that a copyright holder would otherwise be able to prohibit. These exceptions and limitations are laid down in the InfoSoc Directive (2001/29) and the Directive on Copyright in the Digital Single Market (2019/790), as transposed into the national laws of EU member states.

One of these exceptions, which has gained significant traction in the EU, relates to reproductions and extractions for text and data mining (TDM) purposes. Under this exception, TDM of copyrighted works is only permitted if the rights holders have not expressly reserved their rights against such use.

This reservation by rights holders must be made in an appropriate manner, such as in machine-readable form for content that is publicly available online. With the recent enactment of the AI Act, EU legislators have taken the stance that training AI models with copyrighted material falls within the scope of TDM. Consequently, rights holders may choose to opt out of having their copyrighted works used for training AI models.

The Kneschke v LAION case

In the German LAION case, the key issue is whether downloading an image of Kneschke from a stock site and using it to build the LAION 5B open data set for AI training qualifies as a permissible act of TDM. Notably, the stock site’s general terms and conditions contained an ‘opt-out’ provision, aimed at preventing TDM (“YOU MAY NOT […] Use automated programs, applets, bots or the like to access the Bigstock.com website or any content thereon for any purpose, including, by way of example only, downloading Content, indexing, scraping or caching any content on the website”).

The main question that remains, however, is whether this opt-out meets the requirement of being ‘machine readable’. Kneschke argues that it is machine readable, because it is in English and formatted in HTML. LAION argues that it is not machine readable, since the opt-out is not provided in a specific standardised format, such as robots.txt. The District Court of Hamburg is expected to render its decision in September 2024, but it is already evident that there is a need for more precise guidance and standardisation on the practical implementation of opt-outs.
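To make LAION's notion of a standardised, machine-readable format more concrete, the following Python sketch parses a robots.txt file with the standard library's urllib.robotparser and checks whether a given crawler may fetch a URL. The robots.txt content, the crawler name ExampleTDMBot and the example.com URLs are assumptions made purely for illustration and do not come from the case. Whether a robots.txt rule satisfies the legal requirement of machine readability is precisely the point that remains unsettled.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is barred from the whole site,
# while all other crawlers are barred only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    image_url = "https://example.com/images/photo.jpg"
    print(may_fetch(ROBOTS_TXT, "ExampleTDMBot", image_url))
    print(may_fetch(ROBOTS_TXT, "SomeOtherBot", image_url))
```

A program can evaluate a rule of this kind without interpreting natural language. A prohibition written into a site's terms and conditions, by contrast, would first have to be located and understood, which is the gap between the two parties' positions.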

Meanwhile, several collective copyright management organisations in the EU are trying to collectively opt out all their works from TDM or are strongly encouraging copyright holders to use the TDM opt-out themselves, with digital tools being suggested to facilitate this process, despite the prevailing legal uncertainties (see, for example, https://pictoright.nl/en/category/news/ and https://federatiebeeldrechten.nl/optout).

New compliance obligations under the AI Act

In parallel, the hotly debated AI Act introduces copyright compliance obligations for providers of ‘general-purpose AI models’. This adds a layer of complexity but will hopefully lead to more legal certainty in time. First, by August 2025, providers of general-purpose AI models must have implemented policies that identify and comply with TDM opt-outs, using state-of-the-art technologies as needed. Secondly, providers must disclose summaries of the content used to train their AI models, following a template yet to be issued by the European AI Office.
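As a rough illustration of how a provider might approach these obligations in a data pipeline, the Python sketch below excludes works flagged as opted out and counts the remaining works per source domain. Everything in it is hypothetical: the CandidateWork record, the rights_reserved flag, the function names and the example URLs are assumptions. The per-domain count is not the European AI Office template, which has yet to be issued, and the sketch does not show how an opt-out would be detected in the first place.

```python
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CandidateWork:
    """A work considered for training (hypothetical record layout)."""
    url: str
    rights_reserved: bool  # True if a TDM opt-out was detected for this work


def apply_opt_out_policy(candidates):
    """Split candidates into works kept for training and works excluded."""
    kept = [c for c in candidates if not c.rights_reserved]
    excluded = [c for c in candidates if c.rights_reserved]
    return kept, excluded


def summarise_sources(works):
    """Count works per source domain, as raw material for a content summary."""
    return Counter(urlparse(w.url).netloc for w in works)


if __name__ == "__main__":
    candidates = [
        CandidateWork("https://example.com/a.jpg", rights_reserved=False),
        CandidateWork("https://example.com/b.jpg", rights_reserved=True),
        CandidateWork("https://example.org/c.jpg", rights_reserved=False),
    ]
    kept, excluded = apply_opt_out_policy(candidates)
    print(len(kept), len(excluded))
    print(dict(summarise_sources(kept)))
```

Keeping the excluded works in a separate list, rather than silently dropping them, is a design choice made here so that a provider could later show which opt-outs were honoured.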

The future of copyright and training AI models

Overall, the cases in the EU and China, as well as others in the US and the UK, are merely the beginning, signalling that many more legal challenges concerning the input side of AI are yet to come.