The Battle over Copyright in the Age of AI

Originally published in German translation as "Wie mit den Urheberrechtsdebatten um KI umgegangen werden könnte" in Tagesspiegel Background.

In the last few years, Generative AI models have begun to produce strikingly fluent language, polished images, and appealing music from within themselves, arguably demonstrating autonomous creativity. Names like ChatGPT and Stable Diffusion are in the news. Jina AI is part of an emerging German 'AI sector', encompassing a growing collection of startups that provide AI-centered software and services. New Generative AI technologies have important consequences for intellectual property (IP) law, and how they are ultimately handled will strongly affect the development of this sector.

There is some debate about whether AI output can be IP at all, and there is little clear law on this subject. Copyright requires works to reflect human creativity, but in practice creating works using AI models requires significant human involvement. Humans construct textual "prompts" to stimulate image production, then modify those prompts based on the result, and select among multiple productions. AI-generated texts and music follow much the same pattern. This is comparable to the way a photographer takes a picture, and photographs enjoy clear IP protection.

A bigger problem is the way Generative AI models are made. Large AI models are expensive to produce and come primarily from a few sources, particularly OpenAI (which is mostly controlled by Microsoft), Meta (Facebook/Instagram), and Alphabet (Google/DeepMind). Jina AI and other start-ups are highly dependent on access to them.

These models are trained to produce their outputs using large quantities of human-made written text, images, music, etc., and the most useful ones have all been trained using data taken from the Internet without the consent of its owners. Writers and musicians have not responded very strongly to this unauthorized use, but it has evoked strong reactions from visual artists. “AI art is theft” has become a viral slogan.

There are lawsuits challenging Generative AI underway in California and the UK, and we can assume that there will be cases under EU law. These suits argue that Generative AI models are derivative works infringing on IP rights in their training data. Furthermore, they allege that anything created using those models is equally an infringing work, because neural network models merely store and then “recombine” the works of others. AI advocates challenge that characterization.

Under the laws of the US, UK, and countries inheriting their legal traditions, principles of “fair use” or “fair dealing” create broad but ambiguous exceptions to copyright, which might apply to AI training. But “fair use” has no equivalent in European and German law. Allowed exceptions to copyright are listed in Article 5 of the EU Copyright and Information Society Directive, and contain nothing to help AI models. Since the 2013 Ashby decision by the ECHR, European courts have begun to find an implicit “fair use” doctrine in the European Convention’s guarantee of freedom of expression, but this is unlikely to apply to for-profit software.

We cannot predict the future, but it is possible, perhaps likely, that European courts will find that models trained with unlicensed data infringe copyright. There is a good chance that US courts will also reject claims of "fair use". Whatever the outcome, this will impact the entire field. Copyrighted works are in the training data of all major Generative AI models, including ChatGPT and AI models that generate music, not just the ones that make images.

Nonetheless, even if legal claims fail, there is a strong moral case that Generative AI profits from the work of uncompensated IP owners. Regulatory law might attempt to compensate them.

One possible regulatory solution is a compulsory licensing regime akin to the ones the music industry already uses and is comfortable with. The publishing industry is unafraid of AI, and authors might readily accept such a scheme. Visual artists, however, might resist.

An alternative would be for the AI industry to fund a common training dataset and pay creators directly for the use of their work. This would completely eliminate IP claims against Generative AI models, and would create a locus for action over other concerns with AI. It would also put at least a little bit of money in the pockets of creators, who are rarely well compensated for their work.

Although such a body raises concerns about unfair barriers to market entry, it has benefits. An industry association can self-regulate in ways that governments cannot. For example, objectionable imagery may be legally protected from state action as free speech, but an industry body can forbid training AI models to produce pornography or graphic violence, and can restrict consumers' usage through the terms in their software licenses.

Furthermore, Generative AI models consume an extraordinary amount of electricity to train. An industry body can verify the environmental conduct of its members, and eliminate the temptation to train models in places where energy is cheap but environmentally unfriendly.

We want the emerging German AI sector to prosper, and for that we need a stable, competitive, well-regulated environment. Uncertainty about these IP issues risks undermining our open source tech stacks and overturning our infant industry, causing an exodus to less scrupulous countries. Taking control of Generative AI's training data with tools like these is a promising way to regulate the industry while continuing to encourage German and European growth in the AI sector: it relieves smaller firms like Jina AI of uncertainty and retains a highly competitive market environment.