Information Filtering and Classification

Michael J. Pazzani

University of California, Irvine

The vast amount of information available on the Internet has given rise to a number of agents for locating relevant, useful or interesting information for a given individual. Such agents perform tasks such as prioritizing, filtering, or sorting electronic mail; filtering newsgroup articles and locating interesting articles in unread newsgroups; "clipping" articles from on-line news services; constructing queries for Internet search engines to find relevant information; guiding a user to find relevant information on the World Wide Web; notifying a user when a significant change occurs to a web site; or providing access to information relevant to a user's current tasks. This tutorial focuses on the technology for filtering and classifying information.

To perform such tasks, a profile of the user's interests must be created. In this tutorial, we will focus on the learning and representation of user profiles, the methods for collecting user feedback, and the representation of information sources. This tutorial will first review findings from several decades of research on information retrieval, focusing on approaches to information filtering and classification. Next, machine learning approaches to classification will be described, including decision trees, nearest neighbor algorithms, Bayesian classifiers and neural networks. We will discuss how they may be used to learn user profiles. The relationship between machine learning and classic approaches from information retrieval will be discussed. Finally, recent developments such as collaborative filtering, efficient rule learners, combining multiple models, weighted majority algorithms and infinite attribute models will be described.

The technology will be illustrated with examples from a variety of information agents including LIRA, NewsWeeder, WebWatcher, WebDoggie, Fab, WiseWire, SavvySearch, FAQFinder, InfoFinder, Letizia, firefly, Syskill & Webert, DICA and the Remembrance Agent.

Michael Pazzani is a professor and department chair in Information and Computer Science at the University of California, Irvine.  He has been active in Machine Learning research for the past decade with numerous publications in IJCAI, AAAI, and the International Machine Learning Conference. He has taught a variety of courses including Introduction to Artificial Intelligence at the undergraduate level (8 times), Natural Language Processing at the graduate level and graduate seminars in Machine Learning and Information Retrieval.