I have never been a fan of big tech companies.
I often prefer privacy-focused browsers over Google’s. If work and family didn’t demand otherwise, I’d happily live on an off-grid farm, far away from WhatsApp and social media.
One of the reasons I’m not a fan of big tech companies is that they tend to put profitability over everything else, sometimes including user safety and privacy.
I understand that businesses need to be profitable. But in an ideal world, one would hope these companies would go about earning profits in a way that didn't undermine the rights of communities.
For instance, let’s consider OpenAI’s submission before a committee at the House of Lords (the upper house of the UK parliament) earlier this month.
OpenAI said it would be impossible to train its artificial intelligence (AI) models without access to copyright-protected material.
OpenAI made that submission in response to the Communications and Digital Select Committee’s inquiry into large language models (LLMs) – a type of AI algorithm that uses deep learning techniques and large data sets to generate new content.
LLMs are what power popular generative AI tools like ChatGPT and Google Bard.
OpenAI’s submission seems to suggest that it should be able to use copyrighted material to train its models unless rights owners specifically opt out, because, in its view, voluntarily securing a licence for every work on the internet is simply not viable.
However, several AI models out there use fully licensed material, including tools created by large companies like Getty Images and Nvidia.
So, what’s stopping OpenAI from taking an opt-in approach rather than forcing copyright owners to opt out every time their works are used?
I, like many others, suspect it’s simply a question of scale and profitability. OpenAI can’t be as profitable unless it has free access to copyright-protected material.
Faulty arguments
OpenAI’s argument, according to its submission to the committee, is this: “Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today's citizens.”
Throughout its submission to the committee and in previous court documents defending copyright lawsuits filed against it, OpenAI has maintained that “copyright law does not forbid training”.
I’ll let the courts decide the viability of that argument.
What OpenAI seems to be betting on is a possible legal loophole that would allow it to appropriate the work of copyright owners.
This includes individual authors, artists, and other creators.
It’s worth highlighting that many of those creators have received inadequate compensation for their protected works and are far from being as profitable as big tech companies.
I’m not sure how OpenAI has the cheek to argue that it should be able to use copyrighted works for free unless the owner opts out, especially when you consider that tech companies would be unlikely to allow others to use their proprietary technology without a licence.
A report published by the Massachusetts Institute of Technology in December 2023 showed that every new entrant, startup, and research company depends on the computing infrastructure of big tech companies such as Amazon, Microsoft, and Google to train their systems.
All of those major companies charge a licensing fee before allowing such use.
Indeed, tech companies have insisted for years that those who want to use their inventions must compensate them.
Take standard-essential patent owners, for example. They won’t allow implementers to use their technology for free simply because doing so would meet the needs of citizens.
Why should artists and writers be expected to make that concession to OpenAI when the company, despite what it may claim to the contrary, is primarily driven by profits rather than working towards the greater good?
Also, with Microsoft reportedly being its largest investor, one would assume that OpenAI can afford to secure the rights to swathes of copyright-protected work.
Poor efforts
OpenAI attempted to defuse some of the controversy around using copyrighted material to train its tools by introducing an opt-out mechanism for creators a few months ago.
But some users found the opt-out procedure “so onerous that it nearly seems as though it was created specifically to fail”.
Wouldn’t it have been fairer for OpenAI to provide an opt-in method for rights owners instead?
But that would make life difficult for the company, which seemingly wants to shift the burden onto the creators.
OpenAI’s actions seem to suggest that it’s okay to use someone else’s IP without authorisation until the owner asks the infringer to stop such use.
Even then, the rights owner would not be compensated for the past infringement of their rights.
Large technology companies cry foul when others use their technology without a licence, regardless of how essential that technology might be to users. So it’s unfair to expect artists and authors to give any leeway to profit-making entities.
I sincerely hope that legislators and courts will see through OpenAI’s argument that allowing the use of copyright-protected works for AI training is for the greater good.
Otherwise, the ones who suffer the most will be those who have already been compensated inadequately for their contribution to the creative economy.
Meanwhile, the tech giants will get richer.