Computer Science Talk: Noah Smith
Noah Smith
Language Technologies Institute and Machine Learning Department
Carnegie Mellon
Hidden Grammar: Advances in Data-Driven Models of Language
Monday, March 9, 1:30 pm Computer Science Building, Room 151
Earlier this year, Noah Constant, Chris Davis, Chris Potts, and Florian Schwarz released the UMass Amherst Linguistics Sentiment Corpora:
The UMass Amherst Linguistics Sentiment Corpora consist of n-gram counts extracted from over 700,000 online product reviews in Chinese, English, German, and Japanese. The files are UTF-8 encoded text. They are formatted to be read in as R data frames, but they can easily be manipulated with other tools.
This data collection effort and research that makes use of it were supported by an NSF grant and by a UMass Amherst College of Humanities and Fine Arts Visioning Grant.
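For readers unfamiliar with the term, here is a minimal Python sketch of what extracting n-gram counts from a tokenized review looks like. It is only an illustration: the function name `ngram_counts` and the sample sentence are hypothetical, and this is not the code or the file format used to build the corpora.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens in a list of tokens."""
    # Hypothetical helper for illustration; not part of the corpus release.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Made-up sample text, split on whitespace as a stand-in for real tokenization.
review = "not bad , not bad at all".split()
bigrams = ngram_counts(review, 2)
print(bigrams[("not", "bad")])
```

Counts like these, aggregated over many reviews, are the kind of table that can then be loaded into R or any other tool.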
Update [Thanks Joe]: The website is now up.
The second North East Computational Phonology Workshop takes place on Saturday, November 15, at Yale. Here's a tentative schedule; the precise timing of everything is still being sorted out:
| Joe Pater (UMass Amherst) | Emergent simplicity bias in a Gradual Maximum Entropy Learner |
| Bruce Tesar (Rutgers) | Learning phonological grammars for output-driven maps |
| Sarah Eisenstat (Brown) | Learning underlying forms together with constraint weights |
| Mark Johnson (Brown) | Improving word segmentation by also learning syllable structure |
| Jennifer Michaels (MIT) | Summing up constraint interactions: Chain shifts in a split additive model |
| Giorgio Magri (MIT) | A convergent version of the GLA for standard OT |
| Gaja Jarosz & J. Alex Johnson (Yale) | Comparing phonotactic cues to word boundaries in three languages |
An NSF Symposium on Semantic Knowledge Discovery, Organization and Use will take place at NYU, November 14-15. Registration is free. The program is chock-a-block with leading lights in NLP research, from academia and industry.
[Thanks Aynat!]
A summer group for people interested in learning Java has been meeting on Thursday at 2pm. Pat Pratt (Linguistics undergrad) has ably taken the lead. If you're interested in joining, let Karen Jesney know.
Michael Becker, Joe Pater, and Chris Potts launched OT-Help on November 1. OT-Help is a suite of software that facilitates solving large, complex phonological systems using OT and Harmonic Grammar. It is easy to use, fully documented, and available for download. Michael's announcement is on phonoloblog.
The first meeting of the Northeast Computational Phonology Circle will be held in the Department of Linguistics' Donald and Margaret Freeman lounge, Saturday, November 10, starting at noon. There is a growing interest in computational methods in phonological theory, and the northeast has a particularly dense population of people working in this area. This meeting aims to bring these people together in an informal setting to share results, ideas, and maybe even software. All are welcome, but please contact Joe Pater if you are coming so he can buy enough bagels for lunch (which can be eaten during the first presentation!).
[Thanks Joe!]
Andrew McCallum is teaching his Computational Linguistics course this fall, Tuesdays and Thursdays, 2:30-3:45, in CMPSCI 140.
Here's Andrew's blurb about the course:
This Fall I will be teaching undergraduate Natural Language Processing again. This course is designed to introduce both Computer Science and Linguistics students to the exciting and intertwined topics of (1) using computational and statistical methods to give insight into observed human language phenomena, and (2) making computers perform various useful tasks with human languages, web pages, email, etc.
It typically attracts a fun, interdisciplinary group of engaged undergraduates.
The prerequisites are light: students should merely have some facility with programming, and familiarity with basic math (exponents, logs, elementary probability).
Even if you aren't sure you'd like to take the course, you are welcome to simply show up at the first lecture, September 4, Tuesday, 2:30pm in UMass Computer Science Building Room 140.
Chris Potts will be guest lecturing in Barbara Partee's Mathematical Linguistics class on November 14, 16, and 21. All the meetings are 1:00-2:15 pm in Herter 640.
November 14: Computation for theoretical linguistics --- when and where is it useful to take an algorithmic perspective?
November 16: The basics of game theory: strategic games with pure and mixed strategies, the minimax algorithm, equilibria, and signaling games. Chris will review the requisite background notions from probability theory.
November 21: Linguistic applications of game theory, with attempts to apply the lessons
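For anyone who wants a preview of the November 16 material, here is a minimal Python sketch of the minimax idea for a two-player zero-sum strategic game with pure strategies. It is an illustration, not material from the lectures: the function names and the payoff matrices are hypothetical, and payoffs are assumed to be stated from the row player's point of view.

```python
def maximin_row(payoffs):
    """Row player's pure strategy that maximizes their worst-case payoff."""
    worst = [min(row) for row in payoffs]
    best = max(range(len(payoffs)), key=lambda i: worst[i])
    return best, worst[best]


def minimax_col(payoffs):
    """Column player's pure strategy that minimizes the row player's best payoff."""
    columns = list(zip(*payoffs))
    worst = [max(col) for col in columns]
    best = min(range(len(columns)), key=lambda j: worst[j])
    return best, worst[best]


def pure_equilibrium(payoffs):
    """Return a (row, column) saddle point, or None if no pure one exists."""
    i, lower = maximin_row(payoffs)
    j, upper = minimax_col(payoffs)
    return (i, j) if lower == upper else None


# Hypothetical payoff matrices (row player's payoffs).
with_saddle = [[3, 1], [4, 2]]
matching_pennies = [[1, -1], [-1, 1]]
print(pure_equilibrium(with_saddle))
print(pure_equilibrium(matching_pennies))
```

When the two values differ, as in matching pennies, no pure-strategy equilibrium exists, which is where mixed strategies come in.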
Barbara Partee's Mathematical Linguistics has a series of guest lectures coming up:
The department server is now running WebExp2, the Web-based experiment software developed at the University of Edinburgh. At present, we just have some demos up. Contact Florian if you'd like to set up an experiment. Here are links to two of the demos. The first shows off the WebExp2 interface. The second is like Hot or Not, but it's science.
The comp4ling project is flourishing with a bunch of new utilities for doing linguistics (and writing up your results). The latest additions:
Chris Potts, Tim Beechey, and Aynat Rubinstein have begun the comp4ling project (view the announcement here). The first algorithm is up: it is a CGI/Perl implementation of Paul Dekker's Predicate Logic with Anaphora.
Watch this space for additional algorithms and other goodies as the summer progresses.
Suggestions for algorithms to include in the collection are very welcome. Send such suggestions to Chris.
LaTeX is used by mathematicians, physicists, logicians, and computer scientists the world over, and it was designed by one of the leading theoretical computer scientists. But the Wikipedia example of LaTeX in action is quite clearly a linguistics example.
By the way, do check out Knuth's answer to his FAQ "When did you stop using email?"
[Thanks Chris D!]
From John Kingston:
Colleagues,
The 1000th person has signed up!!!
We hit 1000 just now (17.15:01 Thursday 4 May 2006) for the number of people who've signed up to run in experiments for course credit through the experimental sign up database. This began less than two years ago, so it's been a real success. On behalf of all experimenters, I'd like to thank everyone who's been willing to grant course credit to students who participate in experiments, and I'd like to thank Youri for creating the web-based signup procedure we've been using.
John
The Experimental Sign-up Database is getting close to its 1000th sign-up. The database was established in February 2004 (according to WHISC). Youri Zabbal wrote the code, based on John Kingston's vision.
A truly inter-subdisciplinary group of UMass Amherst linguists met yesterday (April 12), and will meet again at 10:00 am on April 19, to use algorithms and techniques from linear programming to find a general method for determining whether a given pattern of violation marks has a consistent constraint weighting. A preliminary Perl/CGI implementation is described and linked to here. The group has been working slowly but steadily towards a linguistically customized version of the famous simplex algorithm for solving linear systems.
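To make the consistency question concrete, here is a minimal Python sketch. It is not the group's Perl/CGI implementation and it does not use simplex: it swaps in Fourier-Motzkin elimination over exact fractions, which answers the same yes/no feasibility question but only scales to small inputs. The assumptions are mine: weights must be non-negative, each winner must have a weighted violation total at least 1 lower than its loser (equivalent to strictly lower, since weights can be rescaled), and the function name and data layout are hypothetical.

```python
from fractions import Fraction


def has_consistent_weighting(pairs, n_constraints):
    """Decide whether non-negative weights exist that favor every winner.

    pairs: list of (winner_violations, loser_violations), each a tuple of
    violation counts, one per constraint. Hypothetical layout for illustration.
    """
    # Each row (a, b) encodes the linear inequality a . w >= b.
    rows = []
    for winner, loser in pairs:
        diff = [Fraction(l - w) for w, l in zip(winner, loser)]
        rows.append((diff, Fraction(1)))  # loser total minus winner total >= 1
    for k in range(n_constraints):
        unit = [Fraction(0)] * n_constraints
        unit[k] = Fraction(1)
        rows.append((unit, Fraction(0)))  # assumed: weight k is non-negative

    # Fourier-Motzkin: eliminate one weight at a time.
    for k in range(n_constraints):
        lower = [r for r in rows if r[0][k] > 0]  # lower bounds on weight k
        upper = [r for r in rows if r[0][k] < 0]  # upper bounds on weight k
        rows = [r for r in rows if r[0][k] == 0]
        for la, lb in lower:
            for ua, ub in upper:
                # Scale both rows so the coefficients of weight k cancel.
                sl, su = 1 / la[k], 1 / -ua[k]
                combined = [sl * x + su * y for x, y in zip(la, ua)]
                rows.append((combined, sl * lb + su * ub))

    # Every remaining row reads 0 >= b, which holds only when b <= 0.
    return all(b <= 0 for _, b in rows)


# Hypothetical two-constraint examples.
consistent = [((0, 1), (1, 0))]
contradictory = [((0, 1), (1, 0)), ((1, 0), (0, 1))]
print(has_consistent_weighting(consistent, 2))
print(has_consistent_weighting(contradictory, 2))
```

A simplex-based solver would additionally return a concrete weighting and handle far larger systems; this sketch only reports whether one exists.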
Chris Potts is guest lecturing in Andrew McCallum's Computational Linguistics class today, 2:30-3:45, in CS 140. He will be talking about his dabbling in computation for pragmatics.