Introduction to Natural Language Processing
CISC882 -- Fall 2012

Time: T H 9:30-10:45  Place: 102A Smith Hall
Professor:  Kathy McCoy Office:   Room 108, Human Language Technologies Lab, aka "The Tea House"; 100 Elkton Road
Office Hours:  T 8:00-9:15, H 11:00-12:30, and by appointment
Email:  mccoy@cis.udel.edu Phone:  302-831-1956

Description:

This course provides an introduction to the field of computational linguistics, also called natural language processing (NLP) - the creation of computer programs that can understand and generate natural languages (such as English). We will use natural language understanding as a vehicle to introduce the three major subfields of NLP: syntax (which concerns itself with determining the structure of an utterance), semantics (which concerns itself with determining the explicit truth-functional meaning of a single utterance), and pragmatics (which concerns itself with deriving the context-dependent meaning of an utterance when it is used in a specific discourse context). The course will introduce both linguistic (knowledge-based) and statistical approaches to NLP, illustrate the use of NLP techniques and tools in a variety of application areas, and provide insight into many open research problems.

Prerequisites: CISC681 - Introduction to Artificial Intelligence

Text:

Speech and Language Processing - An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition, by Jurafsky and Martin.  Please check the online errata for each chapter of the text as you read it.

Requirements:

Concepts taught in class will be reinforced with assignments (both problem sets and programming), a project (which will be presented to the class), and a midterm exam. The midterm exam will be a take-home exam, due around October ??th.

There are several possibilities for the final project. In all cases, the final project will be presented to the class as a whole, and a write-up will also be expected.

You are expected to participate actively in class discussion.

Grade Basis (approximate): homework/projects (35%), final project (30%), exam (25%), class participation (10%).

Syllabus (evolving and subject to change!).  On the syllabus I have left up roughly what I covered the last time this course was taught. At the end of the syllabus, I list topics that could be covered as time permits (and depending on the selection of final projects). Please give me your suggestions for topics!

As the course goes on, I will put slides/materials up on the web for the class lecture. I will try to put these up early so that you can make a printout to take notes on during the class.

Please note that many of the materials/slides are borrowed from the book's website and from the NLP courses of Julia Hirschberg, Diane Litman, James Martin, Kathy McKeown, and Johanna Moore. Thanks also to Owen Rambow for the introduction to CFGs.

CALENDAR - Second Half WILL Change

Date
Topic Reading Assignments
8/28
Course Overview, Introduction
Print of Course Overview, Introduction
Chapter 1  
8/30
Regular Expressions and Automata
Print of Regular Expressions and Automata
Chapter 2, Perl Introduction by Patrick Ryan
Assignment 1 - Stock market Question Answering - due September 18th

Test File assign1-wsj_2300.txt
9/04
Finish up lecture 2 - Regular Expressions and Automata

A short lecture on Words and the Lexicon
Print of a short lecture on Words and the Lexicon

Morphology and Finite State Transducers
Print of Morphology and Finite State Transducers
Chapter 3  
9/06
Continue with Morphology and Finite State Transducers Chapter 3  
9/11
N-Grams
Print of N-Grams
Chapter 4 (through 4.7)  
9/13
Continue with N-Grams Chapter 4 (through 4.7)  
9/18
Finish N-Grams

Context-Free Grammars for English
Print of Context-Free Grammars for English
Chapter 12.1-12.4  ASSIGNMENT 1 DUE September 19th, noon
9/20
Assignment 1 Results -- Candy-Bar Competition

Continue with Context-Free Grammars
Chapter 12.1-12.4   Assignment 2 due 10/10 - midnight

GetScanTime.zip Evaluation Script
9/25
Finish Context-Free Grammar for English Chapter 12.1-12.4    
9/27
Start Parsing with CFGs
Print of Parsing with CFGs
Chapter 13  
10/02
Some English Analysis
Print of Some English Analysis
Chapter 13  
10/04
More Parsing with CFGs; CKY and Earley Algorithms
Print of CKY and Earley Algorithms
Chapter 13  
10/09
More Parsing; Start Statistical Parsing
Print of Statistical Parsing
Chapter 14
10/11
More Statistical Parsing Chapter 14
10/16
Finish Statistical Parsing; Start Unification Grammars
Print of Unification Grammars
 
10/18
Discussion of Assignment 2 -- Competition for NLP Belt      
10/23
No Class - Kathy out of town     
10/25
No Class - Kathy out of town   
10/30
Postponed Class - University Classes Canceled due to Hurricane Sandy   Midterm Exam Due - Extension Given to November 1st
11/01
Finish Unification Grammars; Representing Meaning
Print of Representing Meaning
Chapter 17 Midterm Exam Due 
11/06
NO CLASS - Election Day!    
11/08
Finish Representing Meaning; Semantic Analysis
Print of Semantic Analysis
Chapter 18
11/13
Finish up Semantic Analysis; Lexical Semantics
Print of Lexical Semantics
Chapter 19   
11/14
*** Wednesday Evening Make-Up Class Marathon ***
6:30pm-9:00pm
Finish Lexical Semantics;
Begin Question Answering, Information Retrieval, and Text Summarization
Print of Question Answering, Information Retrieval, and Text Summarization
Chapter 23  Assignment 3 Out... 
11/15
More on Information Retrieval, and Text Summarization    
11/20
Discourse Coherence
Print of Discourse Coherence
Chapter 21
11/22
HAPPY THANKSGIVING!!    
11/27
More Discourse: Rhetorical Structure Theory, Anaphora, Centering    
11/29
Continue Anaphora/Coherence
Print slides on Focusing/Centering  
   
12/04
Anaphora Resolution      
12/05
*** Wednesday Evening Class ***
6:30pm-9:00pm; 102A Smith
CLASS PROJECT 3 PRESENTATIONS/EVALUATIONS
   
12/13
Take-Home Final Exam Due before 12:30pm

TOPIC LISTING (From Text)

Topic Reading  

Course Overview, Introduction Chapter 1  

Regular Expressions and Automata Chapter 2, Perl Introduction by Patrick Ryan


Words and the Lexicon Chapter 2  

Morphology and Finite State Transducers Chapter 3  

N-Grams Chapter 4

Word Classes and Part of Speech Tagging Chapter 5 (through 5.6)  

Context-Free Grammars for English Chapter 12  

Parsing with CFGs Chapter 13  

Earley Algorithm Chapter 13.4  

Statistical Parsing Chapter 14  

Features and Unification Chapter 15  

Representing Meaning Chapter 17  

Semantic Analysis Chapter 18  

Lexical Semantics Chapter 19  

Word Sense Disambiguation Chapter 20  

Discourse Chapter 21  

Applications Chapters 22, 23, 24  

Academic Integrity:

Assignments must be your own individual work, unless explicitly stated otherwise. You must do the work without undue help from other people, and you must not present material from resources such as the Web, books, papers, code listings, and other people as your own. You may talk to each other about concepts and techniques, but you must not discuss specific solutions or approaches to solutions. Web resources will be very useful in this course and we will encourage class discussion of the use of such resources with their proper citations. Copying or paraphrasing someone's work, or permitting your own work to be copied or paraphrased, even in part, is not allowed and will result in an automatic grade of 0 for the assignment.

Interesting Links (besides resources available from J&M):

Stanford University Natural Language Processing Lab's "Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources"

Chapters 1 and 2:

Classic NLP programs

Chapter 3:

AT&T Labs - Research Finite State Machine Library

Chapter 11:

Michael Collins' Parser (requires a tagger to work).

Chapter 15:

Appelt and Israel's information extraction tutorial (IJCAI-99).

Chapter 16:

FrameNet.

Chapter 19:

Allen's Dialogue Modeling for Spoken Language Systems tutorial (ACL Workshop 1997).

Books you may find useful (to borrow or find in the library):

  • Natural Language Understanding, by James Allen, 1995.
  • Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze, 1999.
  • A Comprehensive Grammar of the English Language, by Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, Jan Svartvik, 1985.

Thanks:

Some of the materials used in this course borrow from the NLP courses of Julia Hirschberg, Diane Litman, James Martin, and Johanna Moore, whose courses themselves were influenced by others. Thanks also to Owen Rambow for the introduction to CFGs.