Prerequisite
This course requires basic knowledge of statistics and artificial intelligence.

Course Description
The course introduces students to the fundamentals of data mining theory and algorithms. In addition to building a strong mathematical foundation, the course places heavy emphasis on the analysis and mining of real data sets with popular data mining tools such as Weka, KNIME and R. Covered topics include classification (k-nearest neighbors, classification trees, naïve Bayes, artificial neural networks), regression, clustering (k-means, fuzzy c-means, hierarchical clustering), association rules and text mining. Feature selection, data cleaning, data transformation, model evaluation and data visualization are also covered in detail. By the end of the course, students are expected to be able to model and interpret large, complex data sets using predictive and descriptive data mining methods.

Course Outline
Overview of Data Mining: Definition, Origins of Data Mining, Applications of Data Mining, Data Mining vs. OLAP and SQL

Data Preparation/Wrangling: Feature Ranking, Feature Discretization, Normalization, Outlier Detection Techniques

Classification/Supervised Learning: Classification Tree, Naïve Bayes, Neural Networks, k-NN Classifier, Logistic Regression

Clustering/Unsupervised Learning: K-Means, Fuzzy c-Means, Self-Organizing Map

Model Evaluation: Confusion Matrix, Recall and Precision, ROC Curve

Patterns and Association Mining: Apriori Algorithm

Discussion: Case Studies and Research Papers

Reference Books
Introduction to Data Mining by Tan, Steinbach and Kumar (2006)

Data Mining: Concepts and Techniques by Han and Kamber (2011)

Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank (2011)

R and Data Mining: Examples and Case Studies by Yanchang Zhao (2013)

Data Science for Business by Provost and Fawcett (2013)

Applied Predictive Modeling by Kuhn and Johnson (2013)

Journals
IEEE Transactions on Knowledge and Data Engineering (IEEE)

Data Mining and Knowledge Discovery (Springer)

Data and Knowledge Engineering (Elsevier)

WIREs Data Mining and Knowledge Discovery (Wiley)

International Journal of Knowledge Engineering and Data Mining (Inderscience)

Marks Distribution
Two Midterms - 30%

Final - 40%

Projects - 20%

Research Paper Presentations - 10%