About
Research in the humanities is more data-intensive now than it has ever been. Scholars rely on extremely large, multimedia corpora involving everything from spontaneous speech to poetry to judicial records, in languages both ancient and modern. This massive influx of new data means that, for the first time in history, results obtained from close analysis of specific texts can readily be tested against the quantitative characteristics of entire literary genres, styles, and dialects. This Visioning project centers on such data-rich humanities research: building tools, conducting research, and bringing these ideas and techniques to the classroom.
Highlights
January 2009
Noah Constant, Chris Davis, Chris Potts, and Florian Schwarz released the UMass Amherst Linguistics Sentiment Corpora. This is a collection of over 700,000 online reviews in Chinese, English, German, and Japanese.
December 2008
Chris Potts posted at Language Log on using large email corpora to detect emotional linguistic devices and understand what they say about social networks: Swearing and social networks.
November 2008
On November 4, the Supreme Court began hearings on the case of the "fleeting expletive". As part of his ongoing NSF grant, Chris Potts commented on the case in a Wall Street Journal article, which also featured a graphic based on his work on automatically detecting word and phrase connotations using large corpora.
October 2008
Rex Wallace and colleagues have launched The Etruscan Texts Project, which "makes available to the scholarly community in a user-friendly format recently recovered Etruscan inscriptions."
September 2008
Chris Potts and Florian Schwarz have posted a draft of Exclamatives and heightened emotion: Extracting pragmatic generalizations from large corpora. Potts and Schwarz collected about 18 million words of book and hotel reviews for this paper. They use this large document collection to develop a quantitative perspective on how emotion is expressed in language.
August 2008
Julie Hayes has published her database French Translators, 1600-1800: An Online Anthology of Prefaces and Criticism on ScholarWorks@UMassAmherst.
June 2008
The highly collaborative paper Expressives and identity conditions has been accepted for publication in Linguistic Inquiry. The paper uses thousands of documents from the Internet to support its central claims.