Knowledge Rights 21 co-signs public statement calling on the UK government to safeguard AI innovation

Knowledge Rights 21, alongside other representatives of the education and research sector, has been participating in the UK government’s multi-stakeholder discussions on an AI Code of Practice. Today we co-sign a public statement (see below) alongside many others (including representatives from the education and research sector, big and small industry, Wikimedia UK and Creative Commons) calling on the UK to introduce a Code of Practice that supports AI innovation and makes the UK a safe place to do AI.

The stakeholder dialogue on a Code of Practice is the result of a U-turn by the British Government in February 2023, abandoning its prior commitment to introduce a broad copyright exception for text and data mining that would not have made an artificial distinction between non-commercial and commercial uses. Given that applied research so often bridges these two, treating them differently risks simply chilling innovative knowledge transfer and collaboration between public institutions and the private sector.

Unfortunately, and in the face of significant lobbying from the creative industries (something we see also in Washington, Tokyo and Brussels), the UK government moved away from clarifying language to support the development of AI in the UK. 

As both this and this (paywalled) article highlight, AI could be stopped in its tracks by copyright. AI certainly poses many challenges, from product safety, privacy and job security to excessive market concentration among a few large tech actors. Copyright issues, however, are often tangential at best. Where they are relevant, we believe the focus should be not on copyright but on regulation targeted at potentially competing downstream uses, together with meaningful cultural funding and support programmes.

For example, we have favourable tax regimes for certain forms of publishing, we have public lending rights, and there has been much discussion of a universal basic income. (This could even be funded in part by general taxation on large digital firms, as the G7 has started to realise.) Amendments to copyright law are, however, not part of this.

Bringing together actors from the education and research sector, entertainment industry, publishers, big and small tech, startups, scaleups and think tanks, the Code of Practice discussions that Knowledge Rights 21 has participated in over the course of the year have covered a wide array of topics, resulting in much debate and exploration of commonalities and differences of opinion between the different sectors. They have, however, underlined that too often a very narrow view of what AI is and does prevails. If this is allowed to guide decision-making, we risk seeing major harm to the things that matter.

Therefore, in addition to safeguarding existing flexibilities in UK copyright law, in signing the statement below Knowledge Rights 21 is particularly keen that the Code of Practice:

  • is sector specific, and differentiates between those sectors of the economy and society where the interests of the creative industry are legitimately relevant and those where they are not. For example, we believe that the market norms around popular consumer-oriented AI models have little to do with AI models that regulate robotics, health diagnosis, science, logistics, medicine, the environment etc.

  • supports the development of safe AI. By seemingly questioning existing legal norms that organisations rely on to undertake data analytics, the UK government is potentially making the UK an unsafe place to do AI. Given the pivotal role the UK has played in the Bletchley Declaration, it seems the left hand of British government doesn’t know what the right hand is doing. In B2C markets where AI outputs are purely artistic in nature, limiting access to data on the face of it presents few societal issues. However, given that the volume and, intertwined with it, the veracity of data are crucial for training safe AI models, anything that affects these two factors can promote bias and poor predictive accuracy. This would mean that models developed in countries that are more innovation and research friendly than the UK, and that as a matter of statute allow use of publicly accessible information and legally licensed content (like Japan, Singapore, US, Israel, Taiwan etc), are likely to have safer and more accurate outputs. This is vitally important in scientific, medical and environmental fields, but also for combating undesirable online phenomena such as fake news that will inevitably permeate and corrode politics and society. For example, if your AI model doesn’t train on a wealth of good news, how can it learn to identify fake news?

  • supports flexible access to online and licensed data for all. Not only will restrictions on access to online information promote bias, but they will also unduly benefit (digital) market incumbents who have deep pockets and can afford to pay for information that others cannot. This will disproportionately affect not only UK scaleups, startups and SMEs but also UK universities and researchers who are engaged in knowledge transfer activities, job-sharing and partnerships with the private sector, all of which greatly benefit the economy. The UK is good at knowledge transfer (KT), and UK taxpayer funded research (via the government agency, UK Research and Innovation (UKRI), etc.) is funded in order for it to have a real-world market impact. Research and knowledge sharing do not stop at the doors of academia – the UK government should support KT, not hinder it.

  • ensures that universities and other research organisations have access to subscribed content they have paid for speedily reinstated. The Code of Practice could and should make a real difference to the ability of universities to undertake text and data mining in line with s29A of the UK Copyright Act, which is currently being frustrated by technological protection measures, often resulting in databases suddenly being shut down by publishers when text and data mining is detected. This study by LIBER and the UK Libraries and Archives Copyright Alliance found that in over 20% of cases access to paid-for content from publishers took a month or more to be switched back on. In some instances it was never reinstated at all. We believe a Code of Practice on this point could really make a difference to the ability of researchers to undertake text and data mining on their digital collections. International best practice should be followed, and access should be reinstated by publishers within 72 hours of being informed of erroneous suspension by technological protection measures.

If, as an organisation, you would like to support the statement, please let us know: info@knowledgerights21.org


We urge the UK Government to ensure the UK is a favourable place to develop and use safe AI, by clarifying that public and legally accessed data is available for AI training and analysis in its Code of Practice.  

We, the undersigned organisations, welcome the Intellectual Property Office’s efforts in taking forward the AI recommendations contained within the Vallance Report. While many other countries have clarified their intellectual property laws to support AI and innovation, the UK has yet to introduce a text and data mining exception to explicitly support knowledge transfer and commercial AI. Given this, the Code of Practice provides a particularly important opportunity to provide clarity and ensure that the UK remains an attractive place to undertake and invest in machine learning.  

As rights holders, researchers and innovators, we understand the importance of a well-functioning IP system which strikes an appropriate balance between protecting intellectual property rights and providing the necessary limits and exceptions to those rights, in order to ensure we have the right incentives to create, innovate and develop knowledge. Even without an explicit commercial text and data mining exception, other exceptions and legal doctrines will allow for text and data mining on copyrighted works. 

Whilst questions about the copyright implications of new technologies have arisen in the past, this is the first time that such a debate risks entirely halting the development of a new technology.

AI relies on analysing large amounts of data. Large-scale machine learning, in particular, must be trained on vast amounts of data in order to function correctly, safely and without bias. Safety is critical, as highlighted in the Bletchley Declaration. In order to achieve the necessary scale, AI developers need to be able to use the data they have lawful access to, such as data that is made freely available to view on the open web or to which they already have access by agreement.

Any restriction on the use of such data or disproportionate legal requirements will negatively impact on the development of AI, not only inhibiting the development of large-scale AI in the UK but exacerbating further pre-existing issues caused by unequal access to data. 

It will create barriers to entry and raise costs for new entrants. 

It would also mean that, unlike in other countries, AI model developers would be unable to train their models on publicly available data in the UK without an explicit licence from each rightsholder. In addition to making the UK uncompetitive in AI markets, it will disproportionately affect small and medium-sized enterprises and knowledge transfer, and hinder open source development of AI.

Importantly, text and data mining techniques are not only used to train AI. Text and data mining techniques are necessary to analyse large volumes of content, often using AI, to detect patterns and generate insights, without needing to manually read everything. Such analysis is regularly needed across all areas of our society and economy, from healthcare to marketing, climate research to finance.   

We believe that in order to support and incentivise researchers and innovators, the UK is best served by a balanced copyright system that encourages the many exciting economic and social opportunities that AI makes possible.  

In order that the UK remains competitive in scientific and technology markets, the government should ensure that a Code of Practice: 

  • Recognises that even without an explicit commercial text and data mining exception, exceptions and limits on copyright law exist that would permit text and data mining for commercial purposes.  
  • Recognises that the UK operates in an international environment where global norms to support AI are well developed. We observe that countries such as the US, Israel, South Korea, Singapore and Japan have broad fair use doctrines or text and data mining exceptions of differing levels of flexibility aimed at supporting research and technological advancement. As these countries have concluded, we believe that even small differences in clarity in IP regimes can result in big effects on the economy.
  • Recognises the broad application of AI across many other sectors of the economy; not least health, the environment, bio-science, agriculture, transport, logistics etc. The Code of Practice meetings have focused on the requests of the creative industries. The report should reflect this by focusing on the outputs of AI systems that are relevant to that sector.
  • Supports the Prime Minister’s vision that the UK becomes a world leader on safe and responsible AI. The ability to train AI models on broad and varied data sets that are publicly available or legally accessed under agreement will enable the development of safe, ethical and unbiased AI. The code should emphasise the role of government in supporting high-functioning AI by ensuring that all entities are able to develop AI using the necessary scale of data. 
  • Avoids introducing frictions around using data necessary to develop safe AI. Any measures that discourage the use of broad and varied data sets will have a serious and negative impact on all sectors of the economy using AI. AI is predicated on the three Vs – velocity of processing power as well as volume and veracity (data that is required to ensure the models are accurate). Introducing frictions that hinder use of publicly available or legally accessed data for training therefore prevents the necessary use of data of all types that is required to make sure the models have high levels of predictive accuracy and avoid bias. We must support accurate AI and not hinder it. 

In terms of specific features of a Code of Practice, the following should be included: 

  • Explicit reference to the idea-expression dichotomy, existing exceptions, limitations and the implied licence doctrine that allow the processing of data that a person or organisation already has legal access to. We must not interfere with laws that support the very functioning of the internet.
  • Clarification that broad and varied data sets that are publicly available online remain available for analysis, including text and data mining, without the need for licensing.
  • Clarification that the Code of Practice does not undermine the operation of s29A (Copies for text and data analysis for non-commercial research).
  • Support and encouragement for the creative industry sector to standardise data access agreements, data and schemas, in order to promote the uptake of AI.
  • The establishment of service level agreements with content providers to swiftly address instances where access to paid for and legally accessed data is erroneously suspended, contrary to the terms of the agreement entered into. This is a particularly acute issue for scientific researchers in universities. The UK should follow international best practice and require that access for data analysis and AI is reinstated within a maximum of 72 hours.
  • Support for collaboration between AI developers and content creators to mitigate the risk of AI tools being used to infringe intellectual property rights and promote disinformation.

IP Federation

European Alliance for Research Excellence

The Entrepreneurs Network

Knowledge Rights 21

Creative Commons

Wikimedia UK

Research Libraries UK

Libraries and Archives Copyright Alliance

OpenUK

BSA | The Software Alliance


Featured image by luckey_sun (CC BY-SA 2.0)