CISC-683: Data Mining

Project


Students in CISC-683 are required to do a project. You have a great deal of latitude in selecting the topic of your project. My objective is that you have the opportunity to do something of interest to you and that you learn from it.

Important dates:

Below I've provided links to several sites that have data repositories.



    Links to sites with publicly available datasets --- There is overlap among the datasets provided at the different sites:

  1. University of California Irvine Data Mining Repository: a large repository of datasets that serves as a benchmark for comparing data mining techniques

  2. University of California Irvine Machine Learning Repository: a large repository of datasets supplied by individuals, with some overlap with the Data Mining Repository

  3. ACM Data Mining and Knowledge Discovery Cup Center: contains links to instructions and datasets for the annual KDD contest

  4. Links to a variety of large datasets: These are very large datasets, but many of them are not well described

  5. Links to criminal justice datasets: Some large datasets from international sources, but their format may make them more difficult to use

  6. Links to datasets: Many of these are purely statistical or come without descriptions of the attributes, and so may not be of much use. However, others (such as the baseball dataset) are interesting

  7. Financial and Economic Datasets --- lots of overlap among them:
    First link
    Second link
    Third link

  8. Asteroid dataset

  9. Insurance dataset: This dataset was used in the CoIL (Computational Intelligence and Learning Cluster) competition.







