KR21 Principles on Artificial Intelligence, Science and Research

Today KR21 launches two papers on AI, Science and Research. One is a short briefing summarising some of the key questions and main policy issues around AI. The second, more in-depth document argues that AI regulation needs to take a sector-specific approach, one that identifies the particular requirements of a European AI environment which proactively supports science, research and knowledge valorisation.

AI is a general purpose technology which can be deployed horizontally across numerous sectors of our economy and society to the benefit of humanity. It has, in particular, a major role to play in accelerating science, research and innovation, which will create jobs, boost competitiveness, and help solve global challenges. In fact, hardly a day goes by without news of a momentous medical discovery by researchers using AI models.1

As with so much of the wider research landscape, public-private partnerships are an essential part of the AI research scene. Such collaborations help bring new investment into public research, but are also in many senses synonymous with the knowledge valorisation agenda.

AI, as with any technology, brings with it both benefits and challenges, and so raises the question of how governments can act to maximise the positives while addressing the negatives.2 It is possible to regulate AI in ways that make the most of its ability to promote research productivity, but it is also possible to do so in a way that will hold Europe back indefinitely.

This blog explores the particular case of copyright regulation. This is because, as a body of law which governs access to information and data, it affects how information technologies, such as AI, can work, and in turn how far they can deliver on their potential to accelerate research and enable new markets to develop.

Why copyright matters for AI in – and for – research

KR21 has become increasingly concerned about the effect of copyright on AI. Because AI is an information technology that frequently relies on third-party content (most obviously the internet, but also many other sources including scientific journals, books and TV), the way copyright is designed has a defining impact on how far AI supports science, research and innovation.

This is because in order to function such information technologies must make copies, and so (while AI uses were not conceived of at the time of the Statute of Anne or the Berne Convention) copyright applies. As a result, it can provide rightsholders with excessive control over all the different ways in which these technologies are applied.

As is well known, volume, and more accurately volume of the right type of data (“veracity”), supports the creation of high-performing AI models. This is why maximising the amount of data a model can train on is so important. In short, it reduces the possibility of bias and facilitates a model being able to make accurate predictions.

By contrast, restrictions on the data available (for example, by allowing arbitrary opt-outs which have nothing to do with veracity, and a lot to do with profit extraction) increase the chance of bias and poor predictions.
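The mechanism can be sketched with a deliberately simple toy example. Everything in it is an assumption made for illustration: the corpus, the two source types, the 60% opt-out rate and the use of a plain average as a stand-in for a "model" are all hypothetical, not real data or a real training pipeline. It only shows that removing data selectively (opt-outs concentrated in one kind of source) skews what is learned, whereas removing the same amount of data evenly does not.

```python
from statistics import fmean

# Hypothetical corpus of (source_type, measurement) pairs.
# The values are invented purely to illustrate the mechanism.
corpus = [("open", 0.0)] * 500 + [("licensed", 1.0)] * 500


def estimate(docs):
    """Stand-in for a model: the average measurement in the training data."""
    return fmean(value for _, value in docs)


def keep_by_pattern(docs, pattern):
    """Keep documents whose position matches a repeating True/False pattern."""
    return [d for i, d in enumerate(docs) if pattern[i % len(pattern)]]


full = estimate(corpus)

# Opt-out scenario (assumed): 60% of licensed sources withdraw, open ones stay.
open_docs = [d for d in corpus if d[0] == "open"]
licensed_docs = [d for d in corpus if d[0] == "licensed"]
kept_licensed = keep_by_pattern(licensed_docs, [False, False, False, True, True])
opt_out = estimate(open_docs + kept_licensed)

# Comparison: the same number of documents (300) removed evenly across both types.
even_pattern = [False] * 3 + [True] * 7
even = estimate(keep_by_pattern(corpus, even_pattern))

print(f"full corpus:        {full:.3f}")
print(f"selective opt-outs: {opt_out:.3f}")
print(f"even removal:       {even:.3f}")
```

In this sketch both reduced corpora contain 700 documents, but only the one shaped by selective opt-outs drifts away from the full-corpus result. Real models and real corpora are far more complex, so this should be read as an intuition aid rather than evidence.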

As pointed out in the recent AI paper from the European Commission’s Group of Chief Scientific Advisors:

“access to vast datasets is critical for AI development, yet copyright laws … constrain data accessibility. Future data needs are forecasted to outstrip supply. This poses challenges for training data-intensive AI algorithms.”

[Figure: illustration of the impact that data volume and veracity on the input side have on the accuracy of AI’s output]

In short, an unfavourable copyright regime which limits the amount of data that an organisation can train its model on, in spite of it already having legal access to that data, makes a jurisdiction more hostile to AI-based R&D and promotes bias. This is the case in the EU, where Article 4 of the Directive on Copyright in the Digital Single Market (CDSM Directive), by allowing for opt-outs, has major negative implications for the public-private partnerships that are essential to advancing research. The situation in the UK is also less than clear: there, researchers must count on the vagaries of the temporary copying exception.

As a result, while a publicly funded organisation that has licensed content is free to mine it or use it for training, its private partner is not. For the partner to do so would require renegotiating all the licences to ensure that each and every one provides the rights needed – something that, given the likely volume of negotiations, is highly unlikely to happen. Irrespective of the cost, the chilling effect is such that projects of any complexity are likely never to get off the ground. Further, it incentivises private actors to relocate to more favourable legal regimes – most commonly the US – while European researchers can only look on in envy at what colleagues elsewhere are able to do.

In sum, this is a critical issue when we are talking about research. There is much to question (and plenty of risk) in a regulatory regime which actively reduces the quality of research outputs and increases the chance of poor predictions, and yet this is what such provisions do.

More broadly, in making Europe a less safe place to do machine learning, this regime flies in the face of many a political statement from the likes of Vestager and von der Leyen about making the EU a haven for high-quality AI. While this may be the political aspiration, the reality is that European copyright law frustrates and undermines this goal.

Right problem, wrong solution?

As highlighted in the introduction, there are plenty of questions and concerns around the development of AI. One that has received particular attention, due to the loud voice that the entertainment industry has in public debate, is the impact of generative AI models on artistic/creative outputs, even if these are still of widely varying quality.

While without a doubt there are problems posed by AI that warrant careful consideration, the core issues for the entertainment industry are nothing new. This is because copyright was designed centuries ago to prevent “substantial similarity” (essentially plagiarism), where the works of an author are imitated without their authorisation. Thus, if an AI model is trained on someone’s work and produces something substantially similar to the original, it is an open-and-shut case of copyright infringement.

It is in this vein that in 2022 the Israeli Ministry of Justice issued an opinion stating that using copyright-protected works for AI training is generally permitted under the flexible copyright exception of fair use. However, where the work of one copyright owner has been used and the output is similar to their work, and used in the same market to compete, this is an infringing act.3 This is entirely as it should be: copyright legitimately doing what it is designed to do, namely preventing the use of an author’s work to produce something similar.

KR21’s concern is that the all-too-frequent prophecies of doom from entertainment industry lobbyists (similar to what we saw when the internet entered the mainstream in the 1990s) lead to interventions that put science, research and innovation at risk (as we see in Art. 4 of the CDSM Directive). Moreover, it can take a long time to deal with mistakes. Let us recall that nearly three decades later the removal of explicit freedoms in copyright law by technological protection measures (TPMs) still has not been adequately resolved by European legislators. Much the same can be said of the failed EU efforts around sui generis database rights or orphan works.

Of course, it is likely that the goal of the entertainment and wider content industry is not just to prevent the production of similar works that directly compete with existing work. Rather, this is a strategy from publishers and others to control the AI production cycle – from the inputs and the application of the model to their outputs. This is not what copyright was intended to do, but aligns with a wider trend to seek to control and/or monetise all citizens’ and corporations’ use of information, and in turn stretch control over all aspects of the digital economy.

This is not in the public interest. While today we are focused on using (legitimately accessed) content to support AI training and data analysis across the economy, the issues were fundamentally the same with web-indexing to support search before. Given the public interest in search engines and high-quality search algorithms, regulators determined that placing such technologies beyond the reach of copyright law would serve society best.

In sum, where outputs from AI are substantially similar, they are a clear infringement and no changes to the law are required, as this is what copyright is designed to address. By contrast, frustrating the ability of computers to analyse data where outputs are not similar is not the goal of copyright law, and such analysis should not be prevented simply because computers copy. In the interests of science, technological advancement and the public, a back-to-basics approach to copyright’s application to AI is needed, one that focuses on substantial similarity and not the unavoidable mechanical workings of computer technologies.

A solution that fits – A better deal for R&D

Copyright is a cultural policy and in spite of its regulation of all forms of communication in the digital era, policy makers still often act as if the cultural industries are the only thing it affects – arguably an example of highly successful regulatory capture. All too often the wider needs of society and industry are ignored when it comes to regulating the scope of copyright.

One example is the lobbying of Members of the European Parliament by the entertainment industry, which resulted in major amendments to the EU AI Act with no impact assessment or informed discussion on how they will affect scientific collaborations, public-private partnerships and the knowledge valorisation agenda. For instance, while Art. 52c.1(d) of the AI Act, which requires a summary of the works used in the training process, is potentially reasonable (and in line with pre-existing practices in research ethics), it could also open the door to monitoring (and so control) by rightsholders such as scientific publishers or the newspaper industry.

AI transparency in the context of knowledge valorisation needs to be balanced with academic freedom, and in particular the right of academic researchers to pursue their own goals without the chilling effect of being constantly monitored and questioned by content owners or their representatives. Transparency is integrally interwoven with research ethics and rightfully should be led by academic institutions and others who already work in the field – not corporate rightsholders whose interests lie in profit, not progress.

The ability of the research sector to develop its own solutions underlines that it is not only realistic, but also beneficial, to take a more intelligent approach to AI regulation. This would avoid the populist traps of seeing generative AI as the only type of AI, and (comparatively tangential) entertainment uses as the only application of AI.

In developing policy around AI it will be vital to differentiate between those sectors of the economy and society where the interests of the creative industry are legitimately relevant and those where they are not. Market norms around popular consumer-oriented AI models have little to do with regulating AI models that drive progress in robotics, health diagnosis, science, logistics, medicine, the environment, and more.

In short, the route to a regulatory framework that helps realise the potential of AI research and knowledge valorisation lies in identifying domain-specific issues and solutions, based on the understanding that AI is deployed across all sectors of the economy and society: there is no one-size-fits-all. Furthermore, given the importance of science, R&D, and knowledge valorisation, the needs of the research sector should be front and centre in all policy making around AI.4

Other countries are leading by example.

Japan and Singapore have passed copyright laws (often explicitly in order to support innovation) that maximise the amount of data that non-commercial and commercial researchers can train on, so that locally based AI initiatives can flourish. The US has, through case law, confirmed that the data analysis that takes place to create machine learning algorithms is fair use, while, as noted above, Israel’s government has issued a legal opinion stating that, with the exception of training on the works of a single copyright owner, training of AI falls under fair use.5

As a result, science, research, knowledge valorisation and innovation are enabled, not least by allowing public-private partnerships that maximise both investment and impact.

Meanwhile in Europe, public research institutions and knowledge valorisation are held back by the artificial and unworkable distinction that EU / UK copyright law makes between commercial and non-commercial research. Despite European governments’ claimed strong support for public-private partnerships in research, the reality is that this distinction makes working with and sharing information between partners in knowledge valorisation projects impossible.

Ultimately, Europe’s hostile regulatory approach to AI-based R&D makes it an unattractive place to develop information technologies. Such an approach is self-defeating, weakening or even cancelling out the impact of policy and financial support to encourage collaborations between public and private actors to drive research and innovation.

If Europe is serious about a better deal for European innovators, it’s time to remove artificially created barriers that hinder growth. Like the rest of the world, we need to support research as it is today, with a regime that not only enables research, but also allows for it to feed through into real-world results.

New KR21 Publications

  1.
  2. Note that this article is not intended to cover important issues such as privacy, data protection, online manipulation of individuals and electorates, etc.
  3. n3.
  4.
  5. See the following KR21-funded report looking at how flexible copyright exceptions, which make no distinction between commercial and non-commercial research, are increasingly being adopted in different civil, common and hybrid law countries across the globe to support business and technological development: Copyright and Open Norms in Seven Jurisdictions: Benefits, Challenges & Policy Recommendations.

23 April 2024