The Surprising History of Google's Push to Scan Millions of Library Books

Nearly 20 years ago, Google made an ambitious play to digitize the content of some of the world’s largest research libraries.

It seemed like the beginning of a new era, when scholars and the public could make new connections and discoveries in the kind of mass digital library that had previously been the stuff of science fiction. But it soon became clear the actual plan would turn out to be far more controversial than its organizers probably ever imagined.

On this week’s EdSurge Podcast, we tell the story of this ambitious book-scanning effort that sparked an epic legal battle among publishers, authors and technologists. Somehow, it’s a story that seems largely forgotten.

To do that, we connected with Roger C. Schonfeld, co-author of the new book, “Along Came Google: A History of Library Digitization.” Schonfeld is a longtime leader in the library community and is a program director at Ithaka S+R, a nonprofit education consultancy.

We came away wondering why people don’t talk more about this bit of recent edtech history, and what lessons could still be learned from it.

EdSurge: Not too long ago, it was pretty rare to have full texts of books scanned and available, right?

Roger C. Schonfeld: Not that long ago at all, you know, 15 years ago it was pretty uncommon, actually. So the way that folks discovered books was really different. You browsed a card catalog or you went to a bookstore, or you browsed the stacks. It was a very, very different experience.

So remind us what Google did back around 2004.

There had been a whole series of efforts to digitize library materials beforehand. And that's something that's really important to bear in mind. Our story isn't just, there was zero and then there was Google. Our story is that there was actually a lot of activity taking place. The Internet Archive was active. Carnegie Mellon University was active. Lots and lots of individual libraries and library collaborations were active in digitization.

But the efforts were separated. They didn't scale. They were often risk-averse and concerned about digitizing materials that were still in copyright. There were all sorts of limitations—and that's to take nothing away from the great work that was done.

And then along came Google. And what actually happened was that this dream that librarians and technologists and others had for decades—of expanding access to knowledge and making access to book collections widespread—that dream found the catalyst that was necessary in order to make it happen at the scale that was necessary to potentially achieve the vision.

And the catalyst had a couple of elements to it. Some people will really focus on, ‘Well, gosh, Google had unlimited money, you know, comparatively speaking.’ But in fact, the amount of money that Google invested was an amount of money that some foundations might have been willing to invest—that certainly 50 or a hundred universities collectively easily could have invested. So it was not actually literally the amount of money that they brought.

They also brought some technology; they engineered some new ways to do book scanning faster and more effectively. But I would argue that the thing that Google brought was actually a kind of catalytic role of saying, ‘This is going to happen, and this is going to happen quickly, and we're gonna work with whoever's willing to do it with us.’

And instead of trying to do something in a kind of consensus-driven collaboration across dozens, or a hundred, major university libraries, they said, ‘let's find five that are willing to work with us, and we're gonna use secrecy and other kinds of approaches to get these five to move on the speed that we want to move on’—if I can call it this—on sort of a Silicon Valley timeline rather than a more traditionally academic timeline.

Listen to the entire interview on this week’s EdSurge Podcast.