We have recently seen the publication of the Government’s Copyright and AI Consultation paper. This is my take on it.
I co-chair the All Party Parliamentary Group for AI, chaired the AI Select Committee and wrote a book earlier this year on AI regulation. Before that I had a career as a lawyer defending copyright and creativity, and in the House of Lords I have been my Party’s creative industries spokesperson. For me, the question of IP and AI is absolutely the key issue to have arisen in relation to Generative AI models. It is one thing to use tech, another to be at the mercy of it.
It is a major issue not just in the UK but around the world. Getty and the New York Times are suing in the United States, as are many writers, artists and musicians, and it was at the root of the Hollywood actors’ and writers’ strikes last year.
Here in the UK, as the Government’s intentions have become clearer, the temperature has risen. We have seen the creation of a new campaign, the Creative Rights in AI Coalition (CRAIC), across the creative and news industries, and Ed Newton-Rex gathering over 30,000 signatures from creators and creative organisations.
But with the new government consultation, which came out a few days ago, we are now faced with a proposal for a text and data mining exception, an issue we thought was settled under the last Government. It starts from the false premise of legal uncertainty.
As the News Media Association say:
The government’s consultation is based on the mistaken idea—promoted by tech lobbyists and echoed in the consultation—that there is a lack of clarity in existing copyright law. This is completely untrue: the use of copyrighted content by Gen AI firms without a license is theft on a mass scale, and there is no objective case for a new text and data mining exception.
There is no lack of clarity over how AI developers can legally access training data. UK law is absolutely clear that commercial organisations – including Gen AI developers – must license the data they use to train their Large Language Models (“LLMs”).
Merely because AI platforms such as Stability AI are resisting claims doesn’t mean the law in the UK is uncertain. Nor should developers find ‘it difficult to navigate copyright law in the UK’.
AI developers have already, in a number of cases, reached agreement with news publishers. OpenAI has signed deals with publishers like News Corp, Axel Springer, The Atlantic, and Reuters, offering annual payments between $1 million and $5 million, with News Corp’s deal reportedly worth $250 million over five years.
There can be no excuse of market failure. There are well-established licensing solutions administered by a variety of mechanisms and collecting societies, and there should be no uncertainty about the existing law. We have some of the most effective collective rights organisations in the world. Licensing is their bread and butter.
The Consultation paper says that “The government believes that the best way to achieve these objectives is through a package of interventions that can balance the needs of the two sectors.” Ministers Lord Vallance and Feryal Clark MP seem to think we need a balance between the creative industries and the tech industries. But what kind of balance is this?
The government is proposing to change the UK’s copyright framework by creating a text and data mining exception where rights holders have not expressly reserved their rights—in other words, an ‘opt-out’ system, where content is free to use unless a rights holder proactively withholds consent. To complement this, the government is proposing: (a) transparency provisions; and (b) provisions to ensure that rights reservation mechanisms are effective.
The government has stated that it will only move ahead with its preferred ‘rights reservation’ option if the transparency and rights reservation provisions are ‘effective, accessible, and widely adopted’. However, it will be up to Ministers to decide what provisions meet this standard, and it is clear that the government wishes to move ahead with this option regardless of workability, without knowing whether its own standards for implementation can be met.
Although it is absolutely clear that the unlicensed use of copyright works to train AI models is contrary to UK copyright law, the law around transparency of these activities has not caught up. As well as using pirated e-books in their training data, AI developers scrape the internet for valuable professional journalism and other media, in breach of both the terms of service of websites and copyright law, for use in training commercial AI models.
At present, developers can do this without declaring their identity, or they may take IP that was scraped to appear in a search index and use it for the completely different commercial purpose of training AI models.
How can rights owners opt out of something they don’t know about? AI developers will often scrape websites or access pirated material before they launch an LLM in public, which means there is no way for IP owners to opt out before their material is taken and included in these models. And once the models have been trained, the commercial value has already been extracted from IP scraped without permission, with no way to delete that data from the models.
The next wave of AI models responds to user queries by browsing the web to extract valuable news and information from professional news websites. This is known as Retrieval Augmented Generation (RAG). Without payment for extracting this commercial value, AI agents built by companies such as Perplexity, Google and Meta will effectively free-ride on the professional hard work of journalists, authors and creators. At present such crawlers are hard to block.
This is incredibly concerning, given that no effective ‘rights reservation’ system for the use of content by Gen AI models has been proposed or implemented anywhere in the world, making the government proposals entirely speculative.
As the NMA also say, what the government is proposing is an incredibly unfair trade-off: giving the creative industries a vague commitment to transparency, whilst giving away the rights of hundreds of thousands of creators to Gen AI firms. While creators are desperate for a solution after years of copyright theft by Gen AI firms, making a crime legal cannot be the solution to mass theft.
We need transparency and a clear statement about copyright. We absolutely should not expect artists to have to opt out. AI developers must: be transparent about the identity of their crawlers; be transparent about the purposes of their crawlers; and have separate crawlers for distinct purposes.
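To illustrate why that transparency matters, consider what ‘rights reservation’ currently amounts to in practice: a publisher blocking, in its robots.txt file, each AI crawler it happens to know about. A rough sketch might look like the lines below (the user-agent tokens shown, such as OpenAI’s GPTBot and Google’s Google-Extended, are examples of crawlers that publicly identify themselves; many others do not).

    # Illustrative robots.txt: ask known AI crawlers not to access any part of the site
    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /

A file like this is purely advisory. It only works if a crawler declares its true identity and purpose and chooses to honour it, it cannot reach material that has already been scraped or obtained from pirated copies, and a small rights holder cannot realistically keep such a list up to date. That is exactly why an opt-out regime puts the burden in the wrong place.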
Unless news publishers and the broader creative industries can retain control over their data – making UK copyright law enforceable – AI firms will be free to scrape the web without remunerating creators. This will not only reduce investment in trusted journalism, but it will ultimately harm innovation in the AI sector. If less and less human-authored IP is produced, tech developers will lack the high-quality data that is the essential fuel in generative AI.
Amending UK law to address the challenges posed by AI development, particularly in relation to copyright and transparency, is essential to protect the rights of creators, foster responsible innovation, and ensure a sustainable future for the creative industries.
This should apply regardless of the country in which the scraping of copyright material or the training of the model takes place, provided the developers market their product in the UK.
It will also ensure that AI start-ups based in the UK are not put at a competitive disadvantage by the ability of international firms to conduct training in a different jurisdiction.
It is clear that AI developers have used their lobbying clout to persuade the government that a new exemption from copyright in their favour is required. As a result, the government seems to have sold out to the tech bros.
In response the creative industries and supporters such as myself will be vigorously opposing government plans for a new text and data mining exemption and ensuring we get answers to our questions:
What led the government to do a u-turn on the previous government’s decision to drop the text and data mining exemption it proposed?
What estimate has it made of the damage to the creative industries from implementing its clearly favoured option of a TDM exception plus opt-out?
Is damaging the most successful UK economic sector for the benefit of US AI developers what it means by balance?
Why has it not included the possibility of an opt-in TDM exception among its consultation paper options?
What is the difference between rights reservation and opting out? Isn’t this pure semantics?
What examples of successful, workable opt-outs or rights reservation from TDM can it draw on, particularly for small rights holders? What research has it done? The paper essentially admits that effective technology is not there yet. Isn’t it clear that the EU opt-out system under the Copyright Directive has not delivered clarity?
What regulatory mechanism, if any, does the government envisage if its proposal for a TDM exception with rights reservation/opt-out is adopted? How are creators to be sure any new system would work in the first place?