CISC882 - Introduction to Natural Language Processing - Assignment 1

Due: Tuesday, September 18, 2007

Exercises

(These exercises are borrowed heavily from Johanna Moore, University of Edinburgh.)

  1. ELIZA (30 points total).
    • Implement a small version of your own ELIZA in Perl. Include enough rules to hold a conversation that is at least 10 exchanges long.

For those of you who already have some Perl experience, do Jurafsky and Martin 2.2. Stick with the Rogerian psychotherapy domain and implement your program in Perl.

      1. Provide rules such that for a given user input, there is more than one option (as in the example on pages 32-33 of J&M).
      2. When more than one rule can apply, select a rule at random.
      3. The original ELIZA had a "memory" mechanism. When no pattern matched the input, it said "Tell me more about X", where X was some topic that the user mentioned earlier in the dialogue; i.e., X was something that appeared in an input that the user typed in previously. Add such a history mechanism to your program.
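
As a rough starting point, the random rule selection and the memory mechanism described above might be sketched as follows. This is a minimal sketch, not a reference solution: the rule table, the names @rules, @memory and respond, and the fallback reply "Please go on." are all hypothetical, and the sketch makes no attempt at pronoun swapping (e.g. "my" to "your").

```perl
use strict;
use warnings;

# Hypothetical rule table: each rule pairs a pattern with several response
# templates. The captured text is substituted for %s in the chosen template.
my @rules = (
    [ qr/\bI need (.+)/i, [ 'Why do you need %s?', 'Would getting %s really help you?' ] ],
    [ qr/\bI am (.+)/i,   [ 'How long have you been %s?', 'Why do you think you are %s?' ] ],
);

my @memory;    # topics captured from earlier user inputs

sub respond {
    my ($input) = @_;
    $input =~ s/[.!?]+\s*$//;    # drop trailing punctuation

    # Collect every rule that applies to this input.
    my @matches;
    for my $rule (@rules) {
        my ($pattern, $templates) = @$rule;
        push @matches, [ $1, $templates ] if $input =~ $pattern;
    }

    if (@matches) {
        # Pick one applicable rule at random, then one of its responses.
        my ($topic, $templates) = @{ $matches[ rand @matches ] };
        push @memory, $topic;
        return sprintf( $templates->[ rand @$templates ], $topic );
    }

    # No pattern matched: fall back on something the user said earlier.
    return 'Tell me more about ' . $memory[ rand @memory ] . '.' if @memory;
    return 'Please go on.';
}

print respond($_), "\n" for 'I need a holiday.', 'It rains a lot here.';
```

A full program would call respond once per line read from the user and would need many more rules than the two shown here to sustain 10 exchanges.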
  2. Jurafsky & Martin 2.1 (15 points)
  3. Jurafsky & Martin 2.4 (15 points)
  4. Jurafsky & Martin 2.5 (15 points)
  5. Jurafsky & Martin 2.6 (15 points)
  6. Exercise 7: (70 points total)

Using the FSAs you've just designed, write a program in Perl that puts XML-like tags around time and date specifications. For example:

    • INPUT: a text in English.

    • OUTPUT: the same text with all date and time expressions marked by <TIME> and </TIME> (for both dates and times).

    • SAMPLE INPUT: Christmas is celebrated on the 25th of December.

    • SAMPLE OUTPUT: <TIME> Christmas </TIME> is celebrated on <TIME> the 25th of December </TIME>.

    • SCOPE: At a minimum, your program should be able to process all time and date expressions in the following files:

as well as all time and date expressions listed in exercises 2.4-2.6 in J&M, page 54.

    • SUBMIT: your source code, the output of your program on the files mentioned above, and a text file listing all time and date expressions that your program can handle.
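
To make the expected input/output behaviour concrete, here is a minimal sketch of the tagging step, assuming the FSAs are written out as Perl regular expressions. The vocabulary is illustrative only: the names $month, $ordinal, $holiday, $time_expr and tag_times are hypothetical, and the handful of patterns shown covers far fewer expressions than the assignment requires.

```perl
use strict;
use warnings;

# Illustrative building blocks; a real solution needs a much larger vocabulary.
my $month   = qr/(?:January|February|March|April|May|June|July|August|September|October|November|December)/;
my $ordinal = qr/\d{1,2}(?:st|nd|rd|th)/;
my $holiday = qr/(?:Christmas|Easter|Thanksgiving)/;

# Alternatives are tried left to right at each position in the text.
my $time_expr = qr{
    \b (?:
        [Tt]he \s+ $ordinal \s+ of \s+ $month      # the 25th of December
      | $month \s+ \d{1,2} (?: , \s* \d{4} )?      # September 18, 2007
      | \d{1,2} : \d{2} (?: \s* [ap]m )?           # 9:30 or 9:30 pm
      | $holiday                                   # named holidays
    ) \b
}x;

# Wrap every match in <TIME> ... </TIME>, leaving all other text unchanged.
sub tag_times {
    my ($text) = @_;
    $text =~ s{($time_expr)}{<TIME> $1 </TIME>}g;
    return $text;
}

print tag_times('Christmas is celebrated on the 25th of December.'), "\n";
```

On the sample input, this sketch should reproduce the sample output given above. Extending it means adding alternatives to $time_expr, one for each FSA you designed.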