John Chen


Old Contact Information

University of Delaware
Department of Computer and Information Sciences

Office: NLP/AI Lab (Greenhouse)
77-79 East Delaware Avenue, Newark, DE 19711
Phone: (302) 831-3183
Fax: (302) 831-4091
Email: jchen@cis.udel.edu

I was a PhD student in the Department of Computer and Information Sciences at the University of Delaware; I have since graduated. I now work for Columbia University.


Research

My interests lie in computational linguistics. My advisor is Professor K. Vijay-Shanker. I am looking at methods by which natural language can be parsed both efficiently and accurately. With regard to efficiency, one approach is to add a preprocessing step that assigns a tag to each word of the input sentence, each tag giving detailed instructions as to how a parser should relate that word to the other words in the sentence. This is called supertag disambiguation, or supertagging. Supertagging achieves exceptional efficiency (linear time) but suffers in accuracy.

Accuracy suffers for four reasons: the model does not consider enough context when making disambiguation decisions; its training corpus contained mistakes; the basic model makes poor independence assumptions (a legacy of its heritage in part-of-speech tagging models); and it faces sparse data problems (quite pervasive in statistical models of natural language processing).

To address the first problem, we developed models that consider more context while remaining linear time. To address the second problem, we developed a grammar extraction procedure that produces a mistake-free training corpus from the Penn Treebank, a large bracketed corpus of Wall Street Journal text. To address the third problem, we are going to see whether chart parsing, with a sounder statistical model, can be employed to boost accuracy. We realize that there are already many statistical chart parsers out there. We differ from the others by basing our chart parsing on tree adjoining grammar; that is, the units of grammar that our chart parser manipulates are much larger than those usually used. Consequently, we are exploring new issues such as how to do smoothing in this model. This smoothing will also address the fourth and last problem that we identified as hindering accurate supertagging.

We are also curious whether a hybrid supertagging and chart parsing model can combine the advantage of the former (i.e., efficiency) with the hypothesized advantage of the latter (i.e., accuracy). In support of this idea, we have shown empirically that while supertagging's accuracy may be low, a slight increase in ambiguity, obtained by coarsening the tags a little, is enough to increase accuracy significantly.
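
As a rough illustration of the supertagging step described above, here is a minimal sketch in Python. It is not the model from the publications below: it assumes a simple bigram tag model with add-one smoothing (a stand-in for the smoothing issues mentioned above), decoded with Viterbi search. The toy corpus, the supertag names (Det, NounInit, IntransVerb, TransVerb), and the function names train and supertag are all hypothetical. For a fixed tag set, decoding time grows linearly with sentence length.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each word is paired with a made-up supertag name.
TRAIN = [
    [("the", "Det"), ("dog", "NounInit"), ("barks", "IntransVerb")],
    [("the", "Det"), ("dog", "NounInit"), ("chases", "TransVerb"),
     ("the", "Det"), ("cat", "NounInit")],
    [("the", "Det"), ("cat", "NounInit"), ("sleeps", "IntransVerb")],
]


def train(corpus):
    """Collect emission and tag-bigram counts from a tagged corpus."""
    emit, trans = Counter(), Counter()
    tag_count, prev_count = Counter(), Counter()
    vocab = set()
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            emit[(tag, word)] += 1
            trans[(prev, tag)] += 1
            tag_count[tag] += 1
            prev_count[prev] += 1
            vocab.add(word)
            prev = tag
    return emit, trans, tag_count, prev_count, vocab


def supertag(words, model):
    """Return the most probable tag sequence under the bigram model."""
    if not words:
        return []
    emit, trans, tag_count, prev_count, vocab = model
    tags = sorted(tag_count)
    vocab_size = len(vocab) + 1  # one extra slot for unknown words
    num_tags = len(tags)

    def log_emit(tag, word):
        # Add-one smoothing keeps unseen (tag, word) pairs above zero.
        return math.log((emit[(tag, word)] + 1) / (tag_count[tag] + vocab_size))

    def log_trans(prev, tag):
        # Add-one smoothing over tag bigrams.
        return math.log((trans[(prev, tag)] + 1) / (prev_count[prev] + num_tags))

    # Viterbi: for a fixed tag set, the work per word is constant.
    best = {t: log_trans("<s>", t) + log_emit(t, words[0]) for t in tags}
    backpointers = []
    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            p = max(tags, key=lambda q: best[q] + log_trans(q, t))
            new_best[t] = best[p] + log_trans(p, t) + log_emit(t, word)
            pointers[t] = p
        best = new_best
        backpointers.append(pointers)

    # Follow the backpointers from the best final tag.
    sequence = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        sequence.append(pointers[sequence[-1]])
    return list(reversed(sequence))


model = train(TRAIN)
print(supertag(["the", "dog", "sleeps"], model))
# prints ['Det', 'NounInit', 'IntransVerb']
```

A real supertagger would assign elementary trees of a tree adjoining grammar rather than these four labels, and the tag set would be far larger, which is where the sparse data problem becomes pressing.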


Publications

John Chen and Owen Rambow. Use of Deep Linguistic Features for the Recognition and Labeling of Semantic Arguments. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, Sapporo, Japan, 2003.

John Chen, Srinivas Bangalore, Owen Rambow, and Marilyn Walker. Towards Automatic Generation of Natural Language Generation Systems. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, 2002.

John Chen, Srinivas Bangalore, Michael Collins, and Owen Rambow. Reranking an N-Gram Supertagger. In Proceedings of the Sixth International Workshop on Tree Adjoining Grammars and Related Frameworks, Venice, Italy, 2002.

Alexis Nasr, Owen Rambow, John Chen, and Srinivas Bangalore. Context-Free Parsing of a Tree Adjoining Grammar Using Finite-State Machines. In Proceedings of the Sixth International Workshop on Tree Adjoining Grammars and Related Frameworks, Venice, Italy, 2002.

Srinivas Bangalore, John Chen, and Owen Rambow. Impact of Quality and Quantity of Corpora on Stochastic Generation. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, Pittsburgh, Pennsylvania, 2001.

John Chen and K. Vijay-Shanker. Towards a Reduced Commitment, D-theory Style TAG Parser. In Harry Bunt and Anton Nijholt, editors, Advances in Probabilistic and Other Parsing Technologies, pages 141-160. Kluwer Academic Publishers, Boston, Massachusetts, 2000. Also appeared in Proceedings of the 5th International Workshop on Parsing Technologies, Cambridge, Massachusetts, 1997.

John Chen and K. Vijay-Shanker. Automated Extraction of TAGs from the Penn Treebank. In Proceedings of the 6th International Workshop on Parsing Technologies, Trento, Italy, 2000.

John Chen, Srinivas Bangalore, and K. Vijay-Shanker. New Models for Improving Supertag Disambiguation. In Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics, Bergen, Norway, 1999.

Robert Frank, K. Vijay-Shanker, and John Chen. Dominance, Precedence, and C-Command in Description-based Parsing. In Proceedings of the 7th Congress on Formal Languages and Natural Languages, La Seu d'Urgell, Spain, 1996.


Dissertation

John Chen. Towards Efficient Statistical Parsing using Lexicalized Grammatical Information. Thesis, 2001. [pdf], [ps]


Last modified: July 15, 2002