The New York Times has filed a lawsuit against OpenAI and Microsoft, alleging copyright infringement. According to the complaint, the companies unlawfully copied millions of Times articles to train AI models such as ChatGPT, which can now deliver the same information instantly, putting them in direct competition with the Times.
This lawsuit is part of a broader wave of legal actions over the scraping of online content, without compensation, to train large language models. Creators who publish their work online, including actors, writers, and journalists, worry that AI companies will use their material to build competing chatbots and information services without paying them appropriately.
The New York Times’ lawsuit stands out for targeting the most prominent AI players, OpenAI and Microsoft. Microsoft holds a significant investment in OpenAI and a non-voting observer seat on its board.
In its recently filed complaint, the Times argues that it has an obligation to inform its subscribers, and that Microsoft and OpenAI’s unauthorized use of its content to build competing AI products threatens its ability to fulfill that mission. The Times acknowledges that the companies copied content from many sources, but says they placed particular emphasis on Times content, in effect profiting from the newspaper’s substantial journalistic investment without permission or compensation.
OpenAI responded by expressing its respect for content creators and owners, saying it is committed to working with them to ensure they benefit from AI technology and new revenue models. The company said it has had productive conversations with the New York Times and hopes to find a mutually beneficial way to collaborate, as it does with many other publishers.
Microsoft has not yet responded to the lawsuit. The New York Times first objected when it learned that its work had been used to train the companies’ AI models. Beginning in April, the Times says it tried to negotiate fair compensation and licensing terms with OpenAI and Microsoft, but the talks did not produce a resolution.
The heart of the dispute is whether OpenAI and Microsoft’s use of the Times’ content qualifies as “fair use.” Fair use is a legal doctrine that permits limited use of copyrighted material without permission, particularly for transformative purposes. The Times strongly disputes that it applies here, arguing that ChatGPT and Microsoft’s Bing chatbot (branded “Copilot”) offer essentially the same service the Times does, so using its content without payment for that purpose is not fair use.
Resisting the advance of AI technology.
The Times, along with other prominent news organizations such as CNN, took steps earlier this year to block OpenAI’s web crawler, GPTBot, from scanning their websites for content. In separate but related actions, comedian Sarah Silverman and two authors sued Meta and OpenAI in July, alleging that the companies’ AI language models were trained on copyrighted material from their books without their consent or knowledge. Neither company has commented publicly on those lawsuits, and in November a judge dismissed most of their claims.
Furthermore, a group of well-known fiction writers, in partnership with the Authors Guild, initiated a separate class action lawsuit against OpenAI in September, claiming that the company’s technology was unlawfully utilizing their copyrighted works.
In its lawsuit, The Times asserts that the datasets employed to train OpenAI’s latest large language models, which power its AI tools, likely incorporated millions of works owned by The Times. In a 2019 snapshot of one of these datasets, known as Common Crawl, the New York Times website ranked as the third most prevalent source of information in English, following Wikipedia and a collection of US patent documents, according to the complaint.
The Times argues that because these AI tools were trained on its content, they can reproduce Times articles verbatim, closely summarize them, and mimic the paper’s writing style, and the complaint cites numerous examples. The complaint also says the tools sometimes attribute false information to The Times.
For instance, the complaint cites an incident in which ChatGPT provided a user with the first three paragraphs of the Pulitzer Prize-winning 2012 article “Snow Fall: The Avalanche at Tunnel Creek” after the user said they could not access the article behind The Times’ paywall.
The news organization also alleges that Microsoft’s Bing search engine, which integrated OpenAI’s technology earlier this year, copies and categorizes Times content to generate more extensive and detailed responses compared to conventional search engines.
The complaint further states that by providing Times content without permission or authorization, the defendants’ tools undermine The Times’ relationship with its readers and lead to financial losses in terms of subscriptions, licensing, advertising, and affiliate revenue.
Adopting AI technology, within boundaries.
Opposing the advance of AI is akin to trying to stop the tide. Publications like The New York Times recognize that the technology is here to stay and that they will need to embrace it; what they want is a future that includes fair compensation for their contributions.
Diane Brayton, the Executive Vice President and General Counsel of The New York Times, conveyed in a memo to the publication’s staff that they recognize the potential of generative AI for the public and journalism. Simultaneously, they believe that the success of AI development should not come at the expense of journalistic institutions. They insist that the use of their content to create AI tools should be accompanied by permission and an agreement that reflects the fair value of their work, as prescribed by the law.
In its lawsuit, The New York Times seeks billions of dollars in damages, though it has not specified an exact figure for the alleged infringement of its copyrighted material. It also seeks a permanent injunction barring Microsoft and OpenAI from continuing the alleged infringement. Additionally, The Times is pursuing the “destruction” of GPT and any other AI models or training datasets that incorporate its journalism.
This lawsuit by The Times has the potential to set a precedent for the broader industry. The question of whether using copyrighted material to train AI models violates the law remains an unresolved legal matter. Dina Blikshteyn, a partner in the artificial intelligence and deep learning practice group at law firm Haynes Boone, anticipates an increase in similar lawsuits in the future, and she believes that the issue might eventually reach the Supreme Court, providing definitive case law. Currently, there are no specific legal precedents for large language models and AI because the technology is relatively new.