Google Who?

March 28, 2011

The Google Books project has been put on ice, delaying what some academic librarians had hoped would be a watershed moment in the accessibility and searchability of digital texts. But a pair of library services scheduled to be announced today show that even as the world’s most high-profile digital search-and-retrieval effort has been set back, smaller, academically oriented projects are hoping to continue making electronic texts more discoverable.

The first comes from the HathiTrust Digital Library, a cooperative based at the University of Michigan that owes much of its 8.2-million-work collection to duplicate copies of books scanned by Google, and from ProQuest, the popular journal and newspaper aggregator. The two are teaming up to let students and scholars run searches against the full texts of every item in the HathiTrust archive. The second comes from the Copyright Clearance Center, which is offering a digital retrieval service that it says will cut the lag time in delivering individual journal articles from five days to five minutes.

Officials at ProQuest and HathiTrust think their service could vastly improve the ability of students to find obscure but relevant book content using a search tool that is as simple to use as Google’s. Most library search databases currently query only titles, authors, and “metatags” (keywords referring to certain themes in the work), says John Law, vice president of discovery services at Serials Solutions, the ProQuest division that developed the search tool, which is called Summon. That means books with relevant chapters or passages that those basic identifiers do not capture are left out of search results.

But the new tool will use advanced algorithms, a la Google, to troll every word of every book, monograph, journal, and magazine held in the HathiTrust Digital Library that is also either in the library’s print collection or part of the “public domain,” a body of non-copyrighted works that comprises at least 20 percent of HathiTrust’s digital holdings. If the work is under copyright, Summon directs the user to where it can be found in the stacks. If it is in the public domain, Summon links to the full electronic text.
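The routing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Summon's actual implementation: the `Work` record, its field names, and the returned strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Work:
    # Hypothetical record; the field names are assumptions for illustration.
    title: str
    public_domain: bool
    in_local_print_collection: bool


def is_searchable(work: Work) -> bool:
    """A work is searched if it is public domain or held locally in print."""
    return work.public_domain or work.in_local_print_collection


def result_action(work: Work) -> str:
    """Decide what a search hit would offer the user."""
    if not is_searchable(work):
        return "not included in results"
    if work.public_domain:
        # Public-domain works can be opened in full online.
        return "link to full electronic text"
    # Copyrighted works are searchable, but the reader is sent to the shelf.
    return "direct user to the print copy in the stacks"
```

Under these assumptions, a copyrighted book that the library owns in print is still found through its full text, but the reader is sent to the shelf rather than to a digital copy.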

The idea is to make library catalog searches simpler and more like Google. Recent studies suggest that students tend to rely on the company’s popular search engine as a starting point for research. Andrew Asher, an anthropologist at the Ethnographic Research in Illinois Academic Libraries (ERIAL) Project, has done research indicating that students, their expectations primed by Google’s simple search function and the faith it inspires, tend to favor similarly straightforward tools, even when doing academic research.

This finding has prompted different reactions within academe, with some saying librarians and professors need to do a better job steering students toward more discriminating scholarly research tools, and others saying that the methods popularized by Google are here to stay and libraries would do well to imitate the simple search in order to appeal to students.

Law, the Summon developer, falls in the latter camp. Princeton University, he says, presents students starting research projects with hundreds of possible starting points. While it is great that Princeton makes so many resources available to students, students can be paralyzed by choice, Law argues. “Libraries need to be as simple, easy and fast to access and use as commercial alternatives like Google,” he says. “Having a search box for the library that is easy for users is important.”

Imitating Google-type searches of libraries’ print holdings has been difficult. Aside from the obvious challenges of duplicating the effectiveness of the company’s closely guarded search formulas, many libraries simply do not own full digital texts of many of their print collections, and therefore have no choice but to rely on searches that troll through titles, abstracts, and metatags, rather than full texts. But by aggregating the digital copies from many different libraries in one searchable archive, HathiTrust — which was founded in 2008 and has quickly grown to include contributions from 52 research libraries — offers an unprecedented opportunity for libraries to search the full texts of works they own in print but have not digitized.

For example: Library A might not have a digital copy of Alexis de Tocqueville’s Democracy in America, but as long as Library B does, and has contributed it to HathiTrust, a student using the Summon tool for a research project on early American poetry at Library A might discover Tocqueville’s brief but insightful musings on “the sources of poetic inspiration in the democratic age” hidden deep in HathiTrust’s digital copy from Library B, even though it is doubtful that a search of titles and abstracts would have pointed her in the direction of the French political thinker.
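The difference between the two kinds of search can be illustrated with a toy example in Python. It is a minimal sketch, not how Summon or HathiTrust works internally: the `Record` type, the sample tags, and the one-line text excerpt are assumptions made for illustration, and a plain substring match stands in for a real relevance-ranked index.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    # Hypothetical catalog record; field names are assumptions.
    title: str
    author: str
    tags: List[str]
    full_text: str


def metadata_search(records: List[Record], term: str) -> List[str]:
    """Match only titles, authors, and tags, as a traditional catalog would."""
    needle = term.lower()
    hits = []
    for r in records:
        fields = [r.title, r.author] + r.tags
        if any(needle in f.lower() for f in fields):
            hits.append(r.title)
    return hits


def full_text_search(records: List[Record], term: str) -> List[str]:
    """Match the same metadata fields plus every word of the text."""
    needle = term.lower()
    hits = []
    for r in records:
        fields = [r.title, r.author, r.full_text] + r.tags
        if any(needle in f.lower() for f in fields):
            hits.append(r.title)
    return hits


archive = [
    Record(
        title="Democracy in America",
        author="Alexis de Tocqueville",
        tags=["politics", "democracy"],  # assumed tags, for illustration
        full_text="... the sources of poetic inspiration in the democratic age ...",
    )
]
```

With this sample data, `metadata_search(archive, "poetic")` finds nothing, while `full_text_search(archive, "poetic")` returns Democracy in America, which is the gap the student at Library A would otherwise fall into.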

A recent study by the Online Computer Library Center predicts that by 2014 HathiTrust’s digital archive will mirror 60 percent of works currently held in print by the major U.S. research libraries.

The reach of the Summon service therefore stands to be significant, says Law. “It really is unlocking the hidden content in the library,” he says. “It’s really going to have a massive, massive impact on usage of libraries’ collections.”

Golden Retriever

Another offering for libraries scheduled to be announced today is a service from the Copyright Clearance Center, a Massachusetts-based nonprofit, called Get It Now. Designed to eliminate inefficiencies in inter-library lending of journal articles, Get It Now allows students who want to read articles from journals to which their libraries do not subscribe to get a digital copy of the article e-mailed to them in minutes, rather than having a librarian send away for a photocopied version from another library.

The old way tended to take 5 to 10 days, says Gerry Hanley, senior director for academic technology services at the California State University chancellor’s office, which has been piloting the service for a year. The new way takes 5 to 10 minutes.

Get It Now essentially allows college libraries to purchase individual articles for students for less than it would cost, on average, to get a copy made and sent from another library. “The service has been a boon to graduate students and faculty who have had access to a greater scope of digital content than what was previously available through licensed content agreements,” Hanley wrote in an e-mail.

Some journals do allow colleges to purchase single articles on demand, but by using the Copyright Clearance Center as an intermediary, colleges avoid the hassle of negotiating those discrete exchanges with different publishers, says Tim Bowen, a product manager at the center. And rather than paying publishers for each exchange with a credit card, the libraries would pay the Copyright Clearance Center for the articles students order each month.

During the California State pilot, the center has charged about $24 per article, according to Hanley. The cost to the university of ordering a copy through inter-library loan is often higher. The biggest part of that cost is royalties. As of 2005, the average cost of royalties for an article acquired through inter-library loan was about $29, says Hanley. In some cases it can run higher. And then there are the postage and labor costs.

“When a library adds up the various unit costs: rush fees and other marginal costs of an inter-library loan transaction, it is not uncommon to find that filling a request through inter-library loan can make this content some of the most expensive, per-use content, that a library purchases in the course of a year,” Hanley says.

However, Get It Now is not necessarily a super-saver, says Hanley. There are upfront costs to implementing the service, he says. And of course, when you make ordering articles quicker and easier — users need only to click on the “Get It Now” button in their library’s discovery engine to place an order — patrons might be more apt to do so. The delivery mechanism is more efficient, but Get It Now expands access more than it trims costs, Hanley says, noting that some California State libraries might have to charge user fees to help subsidize the expense.

“Since this service does carry a new cost for libraries, libraries have had to explore where they might find the cost savings in their budgets to cover the expense of offering the new patron-driven services,” Hanley says. “Publishers, too, have had to be flexible adopting a business model that supports selling content by the article. This is not an easy or comfortable adjustment for publishers or libraries to make, but both are necessary for new patron-driven services to flourish.”



Comments on Google Who?

  • Fantastic!
  • Posted by Lola Estelle on March 28, 2011 at 2:30pm EDT
  • Copyright issues aside, I'm happy to see the libraries banding together to provide a non-corporate alternative to Google. Granted, Summon is a commercial product as well, but it's only a form of access to the content -- ultimately, ownership of the content remains in the hands of those who care about the public preservation of knowledge, rather than business concerns.
  • Remembering MARC and ILL
  • Posted by Fred Stielow, Dean of Libraries at American Public University Systems on March 28, 2011 at 2:30pm EDT
  • Fortunately, the searching issue mentioned will be transitory. Libraries pioneered automated information storage and retrieval with then-innovative MARC formatting in the late 1960s. It indexed and soon replaced catalog cards for the overwhelming bulk of books. Although this resource will prove as hard to give up as the massive card catalogs in the 1980s, it must eventually give way. The Web revolution with full text and relevance engines, like Google, will triumph. The particular responses to the Google settlement, however, carry more serious threats to established library services. In effect, they look to commoditize and individually charge for what have been collective services. Unless closely guarded, the library's right-of-sale and ability to share amongst libraries with InterLibrary Loan will become a business for others with digital materials--apparently even for works in the Public Domain.
  • Posted by Barbara Fister on March 28, 2011 at 3:46pm EDT
  • "Get it Now" sounds like a slick way to reroute money from libraries directly to publishers without any assets accruing to the institution paying the bills. In fact, the library pays for the service as well as the article. Slick!

    Our interlibrary loan of articles takes about 24 hours. That seems pretty quick. And frankly, students have figured out another way to get an article that they badly need if ILL isn't the answer. They contact the author.
  • Posted by Cathy Doyle, library on March 29, 2011 at 10:45am EDT
  • Barbara, we can get a journal article within 20 minutes sometimes if it's in our consortium in electronic form! Our ILL staff just came back from the ILLiad conference very excited about the new CCC product (and we aren't often excited about them here!). For articles that we do need to pay copyright fees for, this will streamline our workflow and reduce our costs from the $45 we're paying Elsevier now to a more manageable $22-24/article. It will also allow us to take our Acquisitions person out of the loop, dealing with everything on one bill.
  • Posted by Barbara on March 29, 2011 at 9:30pm EDT
  • I know, I know. Twenty minutes beats 24 hours. But when libraries exist to pay the pay-per-view needs (and get nothing that can be shared by anyone else or retained for future scholars) I really worry about where we're headed. Why not give everyone a credit card?

    We should be investing in open access, not in slicker ways to pay corporations. Yeah, I know, I'm dreaming. We are but handmaidens of our patrons. Their immediate need is our command. Future scholars are not available at the moment to give us worried looks and ask what's in it for them.

    How interesting to have two such radically different models--one library designed, one designed by corporations--discussed in one article.