About

Research in the humanities is more data-intensive now than it has ever been. Scholars rely on extremely large multimedia corpora containing everything from spontaneous speech to poetry to judicial records, in languages both ancient and modern. This influx of new data means that, for the first time, results obtained from close analysis of specific texts can readily be tested against the quantitative characteristics of entire literary genres, styles, and dialects. This Visioning project centers on such data-rich humanities research: building tools, conducting research, and bringing these ideas and techniques into the classroom.

Swearing and emoticons in social networks

The full project description

Highlights

January 2009 Noah Constant, Chris Davis, Chris Potts, and Florian Schwarz released the UMass Amherst Linguistics Sentiment Corpora. This is a collection of over 700,000 online reviews in Chinese, English, German, and Japanese.
December 2008 Chris Potts wrote a Language Log post, Swearing and social networks, on using large email corpora to detect emotional linguistic devices and on what those devices reveal about social networks.
November 2008 On November 4, the Supreme Court heard oral arguments in the "fleeting expletive" case. Drawing on research from his ongoing NSF grant, Chris Potts commented on the case in a Wall Street Journal article. The article also featured a graphic based on his work on automatically detecting word and phrase connotations using large corpora.
October 2008 Rex Wallace and colleagues have launched The Etruscan Texts Project, which "makes available to the scholarly community in a user-friendly format recently recovered Etruscan inscriptions."
September 2008 Chris Potts and Florian Schwarz have posted a draft of Exclamatives and heightened emotion: Extracting pragmatic generalizations from large corpora. Potts and Schwarz collected about 18 million words of book and hotel reviews for this paper. They use this large document collection to develop a quantitative perspective on how emotion is expressed in language.
June 2008 The highly collaborative paper Expressives and identity conditions has been accepted for publication in Linguistic Inquiry. The paper uses thousands of documents from the Internet to support its central claims.