Mellon Grant to Expand Text Mining Research by Unsworth and Team

Posted: Oct. 19, 2010

GSLIS Dean John Unsworth, English Department faculty member Ted Underwood, and a team of fellow researchers will explore text mining as a tool for understanding the humanities through a two-year grant awarded to Stanford University. Stanford University Librarian Mike Keller will serve as principal investigator, with Unsworth and Michael Welge, director of the Automated Learning Group at Illinois's National Center for Supercomputing Applications, serving as co-principal investigators. Matthew Jockers, of Stanford University, will serve as project director.

Awarded in the amount of $761,000 by The Andrew W. Mellon Foundation, the grant will fund use cases by participants at four universities: Dan Cohen, from the Center for History and New Media at George Mason University; Ted Underwood, from the English Department at the University of Illinois; Tanya Clement, Associate Director of Digital Cultures and Creativity at the University of Maryland; and Franco Moretti, the Danily C. and Laura Louise Bell Professor of English and Comparative Literature at Stanford University.

The new grant comes at a time of increasing interest in text mining as a technique for producing new research insights into the humanities. It will expand on work completed as part of three other Mellon-supported projects:

  • The Software Environment for the Advancement of Scholarly Research (SEASR), which focuses on the development of leading-edge digital humanities initiatives;
  • The NORA project, a two-year effort to produce software for discovering, visualizing, and exploring significant patterns across large collections of full-text humanities resources in existing digital libraries; and
  • Metadata Offer New Knowledge (MONK), a continuation of the NORA project with co-PI Martin Mueller of Northwestern University, now available as a library service to faculty, staff, and students at all CIC institutions.

“As more and more cultural heritage materials are digitized, it becomes increasingly important to explore computational methods for exploring and understanding those materials at ‘library scale,’” said Dean Unsworth, adding, “This is not a substitute for traditional modes of reading and argumentation; rather, it is a new way of finding evidence for those readings and arguments, a kind of ‘attention prosthetic.’”

Goals of the project include sharing research findings through peer-reviewed publications in print and online, as well as further developing infrastructure for text mining. Software development will focus on creating or adapting SEASR modules to explore specific research questions from the use cases, and on extending work done in MONK so that researchers can assemble their own collections from digital repositories.

The main emphasis will be on developing, coordinating, and investigating research questions posed by the participating humanities scholars. Examples include exploring the evolution of literary style in the 19th-century Anglophone novel (Moretti), analyzing oral features of Gertrude Stein’s poetry, prose, and plays in both text and audio recordings (Clement), automatically classifying historical documents concerning the events of 9/11 by topic and visualizing the results (Cohen), and understanding the impetus for changes in diction during the Romantic era (Underwood).

“This is already an established research topic,” Underwood remarked, “because Romantic writers knew the language was changing, and spent a lot of time arguing about the social implications of word choice. But until recently we’ve had no way to pose the most obvious questions you’d want to ask about the topic: How did diction actually change? When? Which genres were affected first? It’s a problem that seems to cry out for analysis at a larger scale.”
