HathiTrust offers full-text search of millions of digitized books and journals

February 9, 2010
  • umichnews@umich.edu

ANN ARBOR—A year after its launch by 25 leading U.S. research libraries, HathiTrust Digital Library announces a service that will transform how researchers use the more than 1.6 billion pages (4.6 million volumes) in its collections.

The breakthrough enables full-text search across the entire library. Researchers can now search both public domain and in-copyright works by keyword or phrase.

Based on open source Solr/Lucene technology, the service expands on an experimental search of public domain volumes introduced in November 2008. Full-text search will continue to be supported across the repository as it grows at a rate of hundreds of thousands of volumes every month.

“The HathiTrust partners are pleased to offer a search service that helps mine this growing body of authoritative library materials,” said John Wilkin, HathiTrust executive director and associate university librarian at the University of Michigan. “HathiTrust continues to distinguish itself with its reliability and with its efforts to broaden the availability of digitized library collections in the flow of scholarly discourse. We see this valuable discovery service as one in a series of major steps HathiTrust is taking to shed light on this vast body of material.”

In combination with the HathiTrust Digital Library’s carefully curated bibliographic data, the new functionality allows researchers to more efficiently locate items relevant to their research. It also lays the foundation for future services such as full-text search with faceted browsing, advanced search, “more like this” options, and tools that can be used in computational research.

The effort to provide full-text search across the repository has yielded valuable benchmarking data, methods, and code for the broader large-scale search community, Wilkin said.

The HathiTrust partners are committed to developing the repository and its services to meet the long-term needs of their academic communities and to offer a unique resource on the Web for scholarship and research.

HathiTrust (http://www.hathitrust.org) is a collaboration of the 13 universities of the Committee on Institutional Cooperation, the University of California system, and the University of Virginia. It currently includes digitized volumes from the University of Michigan, the University of California, Indiana University, and the University of Wisconsin.