The intellectual property (IP) world — the complex labyrinth of laws, lobbyists, court rulings, corporate interests, and legal treatises that protects creativity — can be bewildering at times. This is the world where Disney lobbied Congress to extend the copyright term, helping keep Mickey Mouse out of the public domain as the U.S. copyright term grew from 28 years to 95 years. This is also the world that has stood in the way of the Internet’s holy grail: a universal digital library — every book findable, preserved, and lawfully accessible.
Tech has long chased the “all the books in the world” dream. Google pursued it with Google Books, which co-founder Sergey Brin dubbed “A library to last forever.” Today, Anthropic, Meta, and other AI companies are making fresh runs at comprehensive digital text collections to train their large language models (LLMs). Many of these models, as the array of copyright cases against the companies shows, have been trained on collections from shadow libraries such as LibGen.
In June, the U.S. District Court (N.D. Cal.) in Bartz v Anthropic PBC noted that Anthropic, maker of Claude, had amassed a central library of “all the books in the world” to retain “forever”, built both from shadow-library collections such as LibGen and from mass scanning. In August, closer to home, the Delhi High Court, in Elsevier Ltd. v Alexandra Elbakyan, passed an interim order blocking access to Sci-Hub, another shadow library. There is a stark, structural inevitability to shadow libraries today: either you open the stacks yourself, or you consult an AI model that already has.
Shadow libraries such as LibGen and Sci-Hub now power much of AI training, and they are before the courts in both the U.S. and India: the U.S. cases focus on model training; India’s on access to knowledge. It is a tragedy of our times that, despite our technology, there is no central digital library of all the books of the world (the grand collection of mankind’s creative works in a remotely accessible form) because of one obstacle: copyright law.
The closest we have come to creating one is the Library of Congress, the world’s largest collection of printed books, which, ironically, was the byproduct of a provision of the U.S. copyright law of 1790 called the “mandatory deposit rule”. The legal deposit system, which originated in France, requires publishers to give free copies of every published work to a national library so that they can be collected and preserved for the future. Many countries later adopted this system to build their national libraries, and some now have provisions for collecting non-print (digital) copies of publications. Japan, an early mover on flexible AI training rules, has used electronic legal deposit since 2013 to funnel digital works into the National Diet Library. As for searchable digital collections, the European Patent Office boasts the “world’s largest prior art collection”, which had to be built to satisfy a requirement of patent law: the need to examine patent applications against all the knowledge that has gone before.
Cost of a legal copy
In Bartz v Anthropic PBC, Anthropic had to destructively scan the print copies it purchased to create a legal digital copy that it could use for training. It “stripped the bindings from the print books, cut the pages to workable dimensions, and scanned those pages”, the court noted, “discarding each print copy while creating a digital one in its place”. The court held this format change to be fair use, the copyright doctrine that lets one use a copyrighted work without permission in certain circumstances. However, the court held that downloading from shadow libraries was not lawful access and did not amount to fair use.
So, you will now have to “strip the bindings” of every book you have purchased, “cut the pages”, “scan” them, and “discard each print copy”, if you need to create a legal copy that qualifies for fair use. What a staggering waste of trees, water, electricity, and other resources, just to satisfy the mandate of a broken law. But there is hope: as Disney’s lobbying to extend the copyright term shows, in the IP world anything is possible. A country should be able to legislate provisions that treat access to knowledge as fair use, allow AI models to train on material regardless of its origin, and mandate legal deposit rules for building its national digital library.
Knowledge is a public good
The Delhi High Court in Elsevier Ltd. v Alexandra Elbakyan has grappled with the question of whether shadow libraries such as Sci-Hub and LibGen can lawfully operate. In August, it issued an interim order directing the Department of Telecommunications and the Ministry of Electronics and Information Technology to block access to Sci-Hub for alleged copyright infringement. But the better question would have been why such libraries exist. LibGen and Sci-Hub fill a gap left by the state’s failure to build large-scale, digitally accessible libraries for every citizen, which is precisely what shadow libraries attempt to provide. In economic terms, they are a response to a public-goods problem: a market failure to provide access to knowledge.
Any nation that accepts the moral responsibility for this failure will not cut its citizens’ access to knowledge until it builds its own digital library of all the books of the world. And it is not hard to build such a library using copyright law. If a country sets out to build the next Library of Congress, it will very likely extend the mandatory deposit rule to cover the submission of DRM-free digital copies and enable the gifting of digital copies to libraries, just as the Library of Congress did.
The administration of different forms of IP through a single office, as has happened in India, the U.K., Singapore and Saudi Arabia, is yet another reason to make these changes. Mandatory deposit lets such an office use copyright law to collect a copy of everything published, which will, in turn, help it discharge the examination function under patent law (which requires large databases for searching prior art).
The surest thing any country can do to improve its human capital is to provide unhindered access to knowledge. Building a library of all the books in the world will be the first step. The fact that it can now be built on the foundation of a digital intellectual property office is something all countries should ponder.
Feroz Ali is a WIPO Neutral. He held the inaugural IPR Chair at the Indian Institute of Technology, Madras. Views expressed are personal.
Published – September 03, 2025 12:21 am IST