With the availability and popularity of easy-to-use large language model (LLM) artificial intelligence (AI) systems such as ChatGPT, New Zealand (NZ) copyright law provides both benefits and risks for AI users who educate the systems and use them to generate written materials, images and songs. Even AI systems like GitHub Copilot used to help write computer programs may encounter copyright issues because programs fall within the copyright category of literary works.
AI can be viewed like a human brain, albeit a rather simple one, with a functional architecture comprising inputs and outputs, a processor and a memory. Training information specific to a designated subject matter is fed into the input and passed to the processor, which indexes the information and stores it in the memory. When a human has a problem relating to that subject matter, the problem is set out in appropriate terms and input to the processor. Using the memory index, the processor accesses all the stored information that might match the defined problem, combines it to provide potential answers and outputs these to the human questioner.
As to the risks, legal issues are now more likely to arise than with the longer-standing scientific use of AI for things like managing orchards and vineyards, let alone autonomous robots, cars and lawn mowers. For example, when using new generative AI there is a possibility that the output could include parts of copyright-protected text or images that were included in the data input during the AI learning phase. Such 'data' could also include music in the form of sound recordings. All of these might constitute copyright works and, unless their partial inclusion in generated AI outputs was authorised by the copyright owner, such outputs may constitute copyright infringement.
Copyright works and copyright infringement
Copyright is a legal right (a property right) given to 'creators' such as authors and artists. For there to be a copyright issue in relation to AI some of the data collected and input into AI must constitute copyright 'works'. Those that are relevant to AI include literary works (texts and even computer programs), artistic works (images including photographs), musical works, films and sound recordings.
However, data which is pure information does not normally attract copyright, nor do ideas as opposed to the creative expression of an idea. Where 'data' is a copyright work and not just information, the copyright in it will be infringed by making an unauthorised copy of it. Using data from open datasets will generally avoid potential copyright issues because its use is licensed (authorised) by the data providers.
Copyright is infringed by carrying out, without permission, one or more of the exclusive acts restricted to copyright owners. Two of those particularly relevant to AI training and AI outputs are (i) copying a work and (ii) communicating (transmitting) a work to the public over the internet.
While the new AI is giving rise to many legal issues, and governments are looking at law reform to deal with these, this article focusses on copyright issues. Like human intelligence, AI must be provided with and memorise information on topics, which it can then use when required to provide potential solutions to specific questions, requests and prompts put to it. Copyright issues may arise from both the inputting of 'training' materials and the generation of 'answers'.
The new AI has caused copyright issues to arise for (i) creators of the materials which are used for teaching an AI system, (ii) AI companies and (iii) the latter’s users.
New Zealand copyright benefits for those who arrange for the creation of works generated by AI
The New Zealand Copyright Act 1994 provides that copyright may subsist in a literary, dramatic, musical or artistic work which is 'computer generated', and designates that the author of such a work will be 'the person by whom the arrangements necessary for the creation of the work are undertaken'. The Act also specifies that a 'computer generated work' is a work generated by a computer in circumstances such that there is no human author of the work. These provisions cover works generated by AI, even those derived from high-level ideas or requirements input into an AI system by a human.
The owner of this copyright will be the author unless they initiated the work as an employee, in which case the employer will be the owner.
The NZ provisions were derived from those in the UK Copyright, Designs and Patents Act 1988 (CDPA). Similar copyright law was adopted for computer generated works in countries such as Ireland, India and Hong Kong. This is contrary to US copyright law, which does not allow copyright to subsist in anything not authored by a human. The guide issued by the US Copyright Office on 16 March 2023 confirms this, stating that an AI generated work will only attract copyright if there is sufficient human authorship of the work. An example given in the guide that meets this requirement is 're-elaborating' a work initially generated by AI.
Like the US, Australia and most European Union countries do not provide for copyright to subsist in purely computer generated works. As in the US, the European Union Software Directive requires that the author of a copyright work be a human.
Despite AI, in its modern form, not existing in 1988-1994, computers were used in that era to generate new works or at least derivative works or adaptations. For example, a computer running a compiler program would convert (or translate) a new program written in source code by a human into machine code, which was the only code a computer could execute to provide any function or solve any problem. Computer generated works under the NZ Copyright Act must be differentiated from works produced by a human using the computer as a 'tool'. The simplest example in this category would be someone using Microsoft Word to type and revise a document. A technical drawing produced with a computer-aided design (CAD) system would also fall into this category. The computer users would in these cases be the authors of the literary or artistic works generated.
As with all works that may potentially receive copyright protection a computer-generated work must be 'original' to attract copyright. It is considered that this criterion should be assessed objectively in the same manner as a work created by a human author. The question should be 'if this work was created by a human would it be considered original?'
New Zealand copyright risks for those who educate AI or use AI to generate works
AI inputs and outputs
Collecting LLM AI learning data
Apart from tedious manual data extraction from websites, the following three techniques and sources are commonly used, each presenting a greater or lesser risk of copyright infringement.
- Web scraping: Using readily available software, including scraping software provided as apps in AIs such as ChatGPT. Scraping web pages results in copying and storing text, images and sound recordings from those pages.
- Open-source datasets: There are many open-source datasets available on the internet covering a multitude of topics and, being 'open-source', they are usually free of copyright problems. Kaggle is one example of a source of such datasets.
- Synthetic datasets: These datasets are generated using computer programs rather than extraction from real world data. While they are useful when real world data is difficult to obtain, an additional benefit is that the data they contain will not have been acquired by simple copying.
AI generated outputs
The potential copyright issue with AI outputs is that they may contain, for example, text which includes passages literally taken from input texts stored in the AI memory. Under NZ law it could be an infringement of copyright if such text passages while not a complete copy, are a copy of a substantial part of any input text. Further, substantiality is not determined by the quantity of text taken, but rather by its quality.
The same criterion is used for assessing whether an image may infringe copyright. Therefore, if the AI generates a synthetic image which, say, includes a female face extracted from a stored photograph of an Olympic pole vaulter clearing the bar, that may be sufficient to constitute copyright infringement.
While NZ courts have not yet considered AI copyright matters, cases have been launched in the US. An example involving photographs (millions of them) copied off the net into an AI system is the 2023 US case brought by Getty Images against Stability AI. Getty alleges that Stability AI used the photographs to train the Stable Diffusion AI art generation system, and that the system has generated outputs incorporating some of the Getty photographs.
In 2023 an AI generated song was also launched on TikTok and Spotify which sounded like a Drake and The Weeknd song and attracted millions of listeners. Universal Music Group persuaded the streamers to remove the song, but a US court has yet to decide whether the song infringed copyright or any other law. An important copyright issue will be whether a song which mimics artists' voices, lyrics and musical styles amounts to an infringement of copyright.
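Returning to text outputs, the question of whether a generated passage literally reproduces part of an input text can be given a rough first-pass screen in code. The following is a minimal sketch only, not a legal test: the function names and the `min_words` threshold are hypothetical choices made for illustration. It measures quantity alone, whereas substantiality is assessed by quality, so a flagged passage would still need human and legal review.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(output_text, source_text):
    """Return the longest run of consecutive words found in both texts."""
    out, src = tokenize(output_text), tokenize(source_text)
    best_len, best_end = 0, 0
    # prev[j] holds the length of the shared run ending at out[i-2], src[j-1]
    prev = [0] * (len(src) + 1)
    for i in range(1, len(out) + 1):
        curr = [0] * (len(src) + 1)
        for j in range(1, len(src) + 1):
            if out[i - 1] == src[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return out[best_end - best_len:best_end]


def flag_for_review(output_text, source_text, min_words=8):
    """Flag an output whose longest shared run reaches min_words.

    min_words is an arbitrary illustrative threshold, not a legal standard.
    """
    run = longest_shared_run(output_text, source_text)
    return len(run) >= min_words, " ".join(run)


if __name__ == "__main__":
    source = "The quick brown fox jumps over the lazy dog near the river bank today"
    output = "Our story begins: the quick brown fox jumps over the lazy dog, and then it sleeps."
    print(flag_for_review(output, source))
```

A short run of shared words can still be qualitatively important, and a long run of commonplace words may not be, so the threshold serves only to prioritise which outputs to examine more closely.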
Assessing the risks of court cases
A website owner will often not own any copyright in texts and images contained in its web pages which are scraped to form AI inputs. Such copyright may be owned by companies or individuals who have limited ability to finance legal actions.
Scraped materials may include pages from online journals and magazines. New Scientist is one such publication and has argued that copyright is failing it: like its regular readers, private AI scrapers should at least have to pay a subscription. Doing so would increase the odds that the publisher refrains from commencing court proceedings to enforce copyright.
New Zealand copyright distinguished from US copyright
The purpose of having AI generate a work, such as a text, an image or music, may also be relevant in determining whether there has been copyright infringement. However, NZ copyright law provides fewer exceptions to acts which could constitute copyright infringement than US law does. The US has a significant exception to copyright infringement in the doctrine of 'fair use'. Under this doctrine a copier will have a defence if, for example, the copies they make are not made for commercial purposes and/or if they are 'transformative' versions of the original work.
Thus, in the US, web page scraping for AI teaching data might be considered fair use, and AI outputs which simply incorporate the 'style' of one or more inputs might be considered transformations of them and not copies. By contrast, NZ has only somewhat restricted 'fair dealing' exceptions, for example where the copying of a work is purely for the purposes of research or private study.
However, the NZ Copyright Act does have remedies similar to those provided under the US Digital Millennium Copyright Act. Where a streamer that falls within the definition of an 'internet service provider' stores, say, a pirated sound recording, it must delete the recording as soon as it becomes aware that it is infringing, to avoid itself becoming a copyright infringer. This remedy is available even before a case against the 'pirate' and streamer is brought to a court.
 There have been no court decisions on these provisions yet.