Introduction to Natural Language Processing (CISC882)
Fall 2007

Time: T H 12:30-1:45
Place: 426 Smith Hall
Professor: Kathy McCoy
Office: Room 201, 77-79 E. Delaware Avenue
Office Hours: T H 9:00-10:30, by appointment
Email: mccoy@cis.udel.edu
Phone: 302-831-1956

*** NEW *** Pictures from the WWNLPC Belt Competition ***
Don't Mess with Charlie Greenbacker takes the Belt from Dynamite Dan the Blaster Blanchard!!!!

Description:

This course provides an introduction to the field of computational linguistics, also called natural language processing (NLP) - the creation of computer programs that can understand and generate natural languages (such as English). We will use natural language understanding as a vehicle to introduce the three major subfields of NLP: syntax (which concerns itself with determining the structure of an utterance), semantics (which concerns itself with determining the explicit truth-functional meaning of a single utterance), and pragmatics (which concerns itself with deriving the context-dependent meaning of an utterance when it is used in a specific discourse context). The course will introduce both knowledge-based and statistical approaches to NLP, illustrate the use of NLP techniques and tools in a variety of application areas, and provide insight into many open research problems.

Prerequisites: CISC681 - Introduction to Artificial Intelligence

Text:

Speech and Language Processing by Jurafsky and Martin.  Please check the online errata for the text for each chapter as you read it.  Please let me know if you find undocumented errors.

Requirements:

Concepts taught in class will be reinforced with assignments (both problem sets and programming), a project (which will be presented to the class), and exams. The midterm exam will most likely be a take-home, due around October 18th. The final will be given during final exam week and will cover the second half of the course. We will need to discuss the nature of the final projects, as I have several different possibilities in mind. You are expected to participate actively in class discussion.

Grade Basis (approximate): homeworks (35%), project (25%), exams (35%), class participation (5%).

Syllabus (evolving and subject to change!).  The syllabus lists the topics that I will cover. At the end of the syllabus, I list topics that I would like to cover as time permits (and depending on the selection of final projects). Please give me your suggestions for topics!

I will also keep a calendar that will be filled in as the semester goes on with the slides/materials for each specific lecture.

Please note that many of the materials/slides are borrowed from the NLP courses of Julia Hirschberg, Diane Litman, James Martin, and Johanna Moore. Thanks also to Owen Rambow for the introduction to CFGs.

CALENDAR

Date | Topic | Reading | Assignments
8/28 | Course Overview, Introduction | Chapter 1 |
8/30 | Regular Expressions and Automata | Chapter 2, Perl Introduction by Patrick Ryan | Assignment 1 due 9/18
9/4 | Regular Expressions and Automata (second part) | Chapter 2, Perl Introduction by Patrick Ryan |
9/6 | Finite Automata, Words, and the Lexicon | |
9/11 | Morphology and Finite State Transducers | Chapter 3 |
9/13 | N-Grams | Chapter 6 (through 6.4) |
9/18 | Continue with N-Grams | | Assignment 2 due 10/11; ASSIGNMENT 1 DUE
9/20 | More on N-Grams | |
9/25 | Word Classes and Part of Speech Tagging; Print of Word Classes and Part of Speech Tagging | Chapter 8 |
9/27 | More on Part of Speech Tagging | |
10/2 | Finish POS Tagging | Chapter 9 |
10/4 | NEEDED TO RESCHEDULE CLASS -- K. OUT!! | |
10/8 | Moved from 10/4 - Make-Up Class; Context-Free Grammars for English; Print of Context-Free Grammars for English | |
10/9 | Finish Context Free Grammars; Test Files for Assignment 2 Competition | | ASSIGNMENT 2 TECHNICALLY DUE - Prepare spreadsheets for Thursday's class discussion
10/11 | Discussion of Assignment 2 -- Competition for NLP Belt | | Assignment 3 due 10/23; NEW: Test and Solution File for Assignment 3
10/16 | NEED TO RESCHEDULE CLASS -- K. OUT OF TOWN!! | |
10/18 | Parsing with CFGs; Print of Parsing with CFGs | Chapter 10-10.3 |
10/22 | Make-up Class moved from 10/16; Finish Parsing with CFGs; Earley Algorithm; Print of Earley Algorithm | Chapter 10.4 |
10/23 | Features and Unification; Print of Features and Unification | Chapter 11 | Assignment 3 Due; Class Exam Due NEW DATE: 11/1
10/25 | NO LONGER NEED TO RESCHEDULE CLASS -- K. NOT OUT OF TOWN!! | |
10/25 | Representing Meaning; Print of Representing Meaning | Chapter 14 |
10/30 | Representing Meaning | Chapter 14 |
11/1 | Finish Representing Meaning; Semantic Analysis; Print of Semantic Analysis | Chapter 15 | NEW: Midterm Exam Due
11/6 | Finish Up Semantic Analysis -- Intro to Compansion Project | |
11/8 | Discourse Processing: resolving anaphora, focusing, centering | |
11/13 | More Discourse: Centering, RAFT/RAPR, Pronoun Generation? | |
11/15 | CLASS PROJECT PRESENTATIONS | |
11/20 | CLASS PROJECT PRESENTATIONS | |
11/22 | HAPPY THANKSGIVING!! | |
11/27 | CLASS PROJECT PRESENTATIONS | |
11/29 | CLASS PROJECT PRESENTATIONS | |
12/4 | CLASS PROJECT PRESENTATIONS | |
??? | Final Class Project Due before 1:00pm | | Final Reports Due; Project Reports

TOPIC LISTING (From Text)

Topic | Reading
Course Overview, Introduction | Chapter 1
Regular Expressions and Automata | Chapter 2, Perl Introduction by Patrick Ryan
Words and the Lexicon | Chapter 2
Morphology and Finite State Transducers | Chapter 3
N-Grams | Chapter 6 (through 6.4)
Word Classes and Part of Speech Tagging | Chapter 8 (through 8.4)
Context-Free Grammars for English | Chapter 9
Parsing with CFGs | Chapter 10
Earley Algorithm | Chapter 10
Features and Unification | Chapter 11
Representing Meaning | Chapter 14
Semantic Analysis | Chapter 15
Discourse | Chapter 18
Natural Language Generation | Chapter 20
Project Presentations |
Probabilistic Models of Spelling | Chapter 5.1-5.6, and pieces of the rest of chapter
More on Part of Speech Tagging | Chapter 8.5 - 8.7
Lexicalized and Probabilistic Parsing | Chapter 12
Lexical Semantics | Chapter 16
Word Sense Disambiguation and Information Retrieval | Chapter 17
Dialogue and Conversational Agents | Chapter 19
Machine Translation | Chapter 21

Academic Integrity:

Assignments must be your own individual work, unless explicitly stated otherwise. You must do the work without undue help from other people, and you must not present material from resources such as the Web, books, papers, code listings, and other people as your own. You may talk to each other about concepts and techniques, but you must not discuss specific solutions or approaches to solutions. Web resources will be very useful in this course and we will encourage class discussion of the use of such resources with their proper citations. Copying or paraphrasing someone's work, or permitting your own work to be copied or paraphrased, even in part, is not allowed and will result in an automatic grade of 0 for the assignment.

Interesting Links (besides resources available from J&M):

Chapters 1 and 2:

Classic NLP programs

Chapter 3:

AT&T Labs - Research Finite State Machine Library

Chapter 8:

The LT POS HMM part of speech tagger

Chapter 11:

Michael Collins' Parser (requires a tagger to work).

Chapter 15:

Appelt and Israel's information extraction tutorial (IJCAI-99).

Chapter 16:

Framenet.

Chapter 19:

Allen's Dialogue Modeling for Spoken Language Systems tutorial (ACL Workshop 1997).

Books you may find useful (to borrow or find in the library):

  • Natural Language Understanding, by James Allen, 1995.
  • Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze, 1999.
  • A Comprehensive Grammar of the English Language, by Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, Jan Svartvik, 1985.

Thanks:

Some of the materials used in this course borrow from the NLP courses of Julia Hirschberg, Diane Litman, James Martin, and Johanna Moore, whose courses themselves were influenced by others. Thanks also to Owen Rambow for the introduction to CFGs.