CISC-683: Data Mining
Project
Students in CISC-683 are required to complete a
project. You have a great
deal of latitude in selecting the topic of your project. My objective
is for you to have the opportunity to work on something that interests you
and to learn from it.
Important dates:
-
Thurs. Nov. 6: Initial project description due
-
Thurs. Dec. 11: Final project due
Below I've provided links to several sites that
have data repositories.
Links to sites with publicly available datasets --- There is overlap
among the datasets provided at the different sites:
-
University of California Irvine Data Mining Repository: a large repository of datasets
that serves as a benchmark for comparing data mining techniques
-
University of California Irvine Machine Learning Repository: a large repository of datasets
supplied by individuals, with some overlap with the Data Mining Repository
-
ACM Data Mining and Knowledge
Discovery Cup Center: contains links to instructions and datasets
for the annual KDD contest
-
Links to a variety of large
datasets: These are very large datasets, but many of them are not
well described.
-
Links to
criminal justice datasets: Some large datasets from international
sources, but their format may make them more difficult to use
-
Links to
datasets: Many of these are statistical or come without
descriptions of the attributes, and so may not be of much use.
However, others
(such as the baseball dataset) are interesting.
- Financial and Economic Datasets --- lots of overlap among them:
First link
Second link
Third link
- Asteroid dataset
-
Insurance dataset:
This dataset was used in the CoIL (Computational Intelligence and Learning
Cluster) competition.