Introduction to Natural Language Processing (CISC882)
Fall 2009

Time: T H 2:00-3:15
Place: 102A Smith Hall
Professor: Kathy McCoy
Office: Room 201, 77-79 E. Delaware Avenue
Office Hours: T 3:30-5:00, H 9:00-10:30, by appointment
Email: mccoy@cis.udel.edu
Phone: 302-831-1956

Description:

This course provides an introduction to the field of computational linguistics, also called natural language processing (NLP) - the creation of computer programs that can understand and generate natural languages (such as English). We will use natural language understanding as a vehicle to introduce the three major subfields of NLP: syntax (which concerns itself with determining the structure of an utterance), semantics (which concerns itself with determining the explicit truth-functional meaning of a single utterance), and pragmatics (which concerns itself with deriving the context-dependent meaning of an utterance when it is used in a specific discourse context). The course will introduce both knowledge-based and statistical approaches to NLP, illustrate the use of NLP techniques and tools in a variety of application areas, and provide insight into many open research problems.

Prerequisites: CISC681 - Introduction to Artificial Intelligence

Text:

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition, by Jurafsky and Martin. Please check the online errata for each chapter as you read it.

Requirements:

Concepts taught in class will be reinforced with assignments (both problem sets and programming), a project (which will be presented to the class), and a midterm exam. The midterm will be a take-home exam, due around October ??th.

There are several possibilities for the final project. In all cases, the final project will be presented to the class as a whole, and a write-up will also be expected.

You are expected to participate actively in class discussion.

Grade Basis (approximate): homeworks/projects (35%), final project (30%), exam (25%), class participation (10%).

Syllabus (evolving and subject to change!).  On the syllabus I have left up roughly what I covered the last time this course was taught. At the end of the syllabus, I list topics that could be covered as time permits (and depending on the selection of final projects). Please give me your suggestions for topics!

As the course goes on, I will put slides/materials for each class lecture up on the web. I will try to post these early so that you can print them out and take notes on them during class.

Please note that many of the materials/slides are borrowed from the NLP courses of Julia Hirschberg, Diane Litman, James Martin, and Johanna Moore. Thanks also to Owen Rambow for the introduction to CFGs.

CALENDAR

Date
Topic Reading Assignments
9/01
Course Overview, Introduction
Print of Course Overview, Introduction
Chapter 1  
9/03
More Introduction...
 Assignment 1 due September 22nd
9/08
Regular Expressions and Automata
Print of Regular Expressions and Automata
Chapter 2, Perl Introduction by Patrick Ryan
 
9/10
Regular Expressions and Automata (second part)
Print of Regular Expressions and Automata (second part)
and
Finite Automata, Words, and the Lexicon
Print of Finite Automata, Words, and the Lexicon
Chapter 2, Perl Introduction by Patrick Ryan
 
9/15
Morphology and Finite State Transducers
Print of Morphology and Finite State Transducers
Chapter 3  
9/17
N-Grams
Print of N-Grams
Chapter 4 (through 4.4?)  
9/22
Continue with N-Grams
Assignment 2 due 10/13
ASSIGNMENT 1 DUE
9/24
Finish N-Grams
Word Classes and Part of Speech Tagging
Print of Word Classes and Part of Speech Tagging
Chapter 5   
9/29
more on Part of Speech Tagging
Chapter 5  
10/01
Finish Part of Speech Tagging
Context-Free Grammars for English
Print of Context-Free Grammars for English
Chapter 12    
10/06
Finish Context-Free Grammars for English
Start Parsing with CFGs,
Print of Parsing with CFGs
Chapter 13  
10/08
Guest Lecturer: Keith Trnka!!!
Words/N-Grams/Evaluation on Corpora

PDF of Keith's Language Modeling slides 6/page
PDF of Keith's Language Modeling slides 1/page  
 
10/13
More discussion of Context-Free Grammars for English
Test Files for Assignment 2 Competition
ASSIGNMENT 2 TECHNICALLY DUE - Test files released. Prepare spreadsheets for Thursday's class discussion.

10/15
Discussion of Assignment 2 -- Competition for NLP Belt

10/20
Finish Parsing with CFGs; Earley Algorithm
Print of Earley Algorithm
Chapter 13.4
10/22
Guest Lecturer: Charlie Greenbacker
Generating Referring Expressions

 
10/27
Guest Lecture #2: Keith Trnka!!!!
Word Prediction and Topic/Style Modeling

 
10/29
Finish Earley Algorithm
Features and Unification
Print of Features and Unification
Chapter 15
Midterm Exam Questions 1-3 Due
11/03
Representing Meaning
Print of Representing Meaning
Chapter 17
Midterm Exam Question 4 Due
11/05
Representing Meaning
Chapter 17
11/10
Finish Representing Meaning; Semantic Analysis
Print of Semantic Analysis
Chapter 18
11/12
Finish up Semantic Analysis -- Intro to Compansion Project
11/17
Discourse Processing: resolving anaphora, focusing, centering    
11/19
More Discourse: Centering, RAFT/RAPR, Pronoun Generation?    
11/24
CLASS PROJECT PRESENTATIONS    
11/26
HAPPY THANKSGIVING!!    
12/01
CLASS PROJECT PRESENTATIONS    
12/03
CLASS PROJECT PRESENTATIONS    
12/08
CLASS PROJECT PRESENTATIONS    
???
CLASS PROJECT PRESENTATIONS    
???
Final Class Project Due before 1:00pm
Final Reports Due
Project Reports
TOPIC LISTING (From Text)
Topic Reading  

Course Overview, Introduction Chapter 1  

Regular Expressions and Automata Chapter 2, Perl Introduction by Patrick Ryan


Words and the Lexicon Chapter 2  

Morphology and Finite State Transducers Chapter 3  

N-Grams Chapter 6 (through 6.4)

Word Classes and Part of Speech Tagging Chapter 8 (through 8.4)  

Context-Free Grammars for English Chapter 9  

Parsing with CFGs, Chapter 10  

Earley Algorithm Chapter 10  

Features and Unification Chapter 11  

Representing Meaning Chapter 14  

Semantic Analysis Chapter 15  

Discourse Chapter 18  

Natural Language Generation Chapter 20  

Project Presentations    

Probabilistic Models of Spelling Chapter 5.1-5.6, and pieces of the rest of the chapter

 

More on Part of Speech Tagging Chapter 8.5 - 8.7  

Lexicalized and Probabilistic Parsing Chapter 12  

Lexical Semantics Chapter 16  

Word Sense Disambiguation and Information Retrieval Chapter 17  

Dialogue and Conversational Agents Chapter 19  

Machine Translation Chapter 21  

Academic Integrity:

Assignments must be your own individual work, unless explicitly stated otherwise. You must do the work without undue help from other people, and you must not present material from resources such as the Web, books, papers, code listings, and other people as your own. You may talk to each other about concepts and techniques, but you must not discuss specific solutions or approaches to solutions. Web resources will be very useful in this course and we will encourage class discussion of the use of such resources with their proper citations. Copying or paraphrasing someone's work, or permitting your own work to be copied or paraphrased, even in part, is not allowed and will result in an automatic grade of 0 for the assignment.

Interesting Links (besides resources available from J&M):

Stanford University Natural Language Processing Lab: Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources

Chapters 1 and 2:

Classic NLP programs

Chapter 3:

AT&T Labs - Research Finite State Machine Library

Chapter 11:

Michael Collins' Parser (requires a tagger to work).

Chapter 15:

Appelt and Israel's information extraction tutorial (IJCAI-99).

Chapter 16:

FrameNet.

Chapter 19:

Allen's Dialogue Modeling for Spoken Language Systems tutorial (ACL Workshop 1997).

Books you may find useful (to borrow or find in the library):

  • Natural Language Understanding, by James Allen, 1995.
  • Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze, 1999.
  • A Comprehensive Grammar of the English Language, by Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik, 1985.

Thanks:

Some of the materials used in this course borrow from the NLP courses of Julia Hirschberg, Diane Litman, James Martin, and Johanna Moore, whose courses were themselves influenced by others. Thanks also to Owen Rambow for the introduction to CFGs.